
Star Schema and Technical Keys

Status
Not open for further replies.

kalyan

Programmer
Jun 25, 1999
I am new to data warehousing concepts, but I have a good background and experience in Oracle development. I want to know in detail about:<br>
STAR SCHEMA<br>
TECHNICAL KEYS<br>
Any help in this regard will be highly appreciated.<br>
<br>
Cheers<br>
Kalyan
 
Do you mean surrogate keys? You might want to be more specific about what you want to know.
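Since the thread never shows one, here is a minimal sketch of what a surrogate ("technical") key usually looks like in a warehouse dimension, using SQLite purely for illustration; every table and column name is invented. The dimension row gets its own meaningless integer key, kept separate from the operational ("natural") key, so the warehouse survives source-system key reuse and can version changed rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate (technical) key
    source_customer_no TEXT,                         -- natural key from the OLTP system
    name TEXT
)""")

# The same operational customer can appear twice (e.g. after a name change);
# each version gets a fresh surrogate key of its own.
cur.execute("INSERT INTO dim_customer (source_customer_no, name) VALUES ('C-042', 'Acme')")
cur.execute("INSERT INTO dim_customer (source_customer_no, name) VALUES ('C-042', 'Acme Ltd')")

keys = [k for (k,) in cur.execute(
    "SELECT customer_key FROM dim_customer WHERE source_customer_no = 'C-042'")]
print(keys)  # [1, 2]
```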
 
A star schema is the opposite of a normal OLTP RDBMS design: it is almost completely denormalised. This is because we don't want to discriminate between data types (customer data, sales data, product data); data is data.<br>
In a star schema, all the transaction data is held in a central 'fact' table. This is held at the lowest level of querying and keyed by its dimensions. For example, a database holding sales may be dimensioned by customer, product and time. There will then be some extra columns showing how much was bought, what its value is, and so on; these are measures. The schema turns into a star when reference tables containing the translations for the dimension ids are added.
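The layout described above can be sketched concretely. This is a minimal, hypothetical example using SQLite; the table and column names (fact_sales, dim_customer, and so on) are illustrative, not prescribed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dimension ("reference") tables translate the ids used in the fact table.
cur.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

-- Central fact table: one row per sale at the lowest level of querying,
-- keyed by its dimensions, with the measures (quantity, value) alongside.
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer,
    product_id  INTEGER REFERENCES dim_product,
    time_id     INTEGER REFERENCES dim_time,
    quantity    INTEGER,
    sale_value  REAL
);
""")

cur.execute("INSERT INTO dim_customer VALUES (1, 'Acme Ltd', 'North')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.execute("INSERT INTO dim_time VALUES (1, '1999-06-25', '1999-06', 1999)")
cur.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 10, 250.0)")

# A typical star query: join the fact table to its dimensions and aggregate a measure.
row = cur.execute("""
    SELECT c.region, p.category, SUM(f.sale_value)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    JOIN dim_product  p ON p.product_id  = f.product_id
    GROUP BY c.region, p.category
""").fetchone()
print(row)  # ('North', 'Hardware', 250.0)
```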
 
For a more complete discussion, see Ralph Kimball, "The Data Warehouse Toolkit". Bear in mind that Kimball's is not the only voice when it comes to decision support database design. <br>
<br>
You may get the impression from reading Kimball that a star schema is always the best approach to designing a decision support database. Far more prominent practitioners than I have differing opinions on that subject. A decision support database is usually not designed as a strictly normalized structure; however, a normalized design enhanced with tables of summarized data often makes more sense for a high-volume database.<br>
<br>
Although the star schema has limitations, it is a powerful data structure when properly designed. Its main strength is that it allows one to do highly sophisticated reporting using simple SQL statements. A star schema can be mapped directly to some OLAP tools and is compatible with all of them. It is a simple, easily understood design. This reduces the need for skilled programmers and supports end-user computing. With all that going for it, one would think that everyone would be jumping on the star schema bandwagon. <br>
<br>
The star schema approach imposes some constraints. Star schema designs can be difficult to update incrementally under some circumstances. Because this type of design is usually focused on a specific business context, the implementation of new reporting requirements often requires design and construction of additional tables. The most effective way to improve performance with this type of database is to enhance the design by adding summarized tables called aggregates. Adjusting control parameters or adding and deleting indices is usually less effective.
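The aggregate tables mentioned above can be sketched as follows: a minimal SQLite illustration, with invented names, showing a summarized table derived entirely from the detail fact table so that queries can hit it instead of scanning detail rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE fact_sales (product_id INTEGER, month TEXT, sale_value REAL);
INSERT INTO fact_sales VALUES (1, '1999-05', 100.0), (1, '1999-05', 50.0), (2, '1999-06', 75.0);

-- The aggregate is derived entirely from the detail fact table, trading
-- storage for query speed.
CREATE TABLE agg_sales_month AS
SELECT product_id, month, SUM(sale_value) AS total_value, COUNT(*) AS n_sales
FROM fact_sales
GROUP BY product_id, month;
""")

rows = cur.execute("SELECT * FROM agg_sales_month ORDER BY product_id").fetchall()
print(rows)  # [(1, '1999-05', 150.0, 2), (2, '1999-06', 75.0, 1)]
```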
 
Kalyan,<br>
When you do the analysis and design of your DWH, a good tip would be to start by interviewing all the stakeholders in the project to determine which business units will ultimately use the DWH.<br>
Once you've identified these areas, speak to representatives from each area to determine their high-level information requirements. Document these requirements using a Subject Area Model. This model serves as a blueprint for future evolutionary development of the DWH. Try to keep this model at a high enough level of abstraction so that you don't get caught up in "analysis paralysis", but include enough detail to make a meaningful model. A medium-sized corporation should have about 20 subject areas. Spend about two weeks building this model.<br>
<br>
Once you have this signed off, you can then focus on one particular business unit at a time. At this point you can begin to gather detailed information requirements, and this is when ER modelling kicks in. A star schema is one of a few variations of schemas available, but remember you can't denormalise until you have normalised a model. So my advice would be to do a normalised ER model (3rd NF would probably suffice in most cases).<br>
<br>
Each iteration or build of your DWH focuses on the specific needs of a separate business unit, but you want to avoid islands of info (independent datamarts). So the idea is to integrate the needs of new business units into your ER model. A normalised ER Model (logical) is most suitable when it comes to integrating additional pieces in an evolutionary way.
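The "normalise first, then denormalise" advice above can be illustrated with a toy example, assuming a hypothetical 3NF customer/region pair of tables collapsed into one flat dimension table for the star schema; all names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
-- Normalised (3NF) model: region attributes live in their own table.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       region_id INTEGER REFERENCES region);
INSERT INTO region VALUES (1, 'North');
INSERT INTO customer VALUES (10, 'Acme Ltd', 1);

-- Denormalised dimension derived from the 3NF model: the region name is
-- folded straight into the customer row.
CREATE TABLE dim_customer AS
SELECT c.customer_id, c.name, r.region_name
FROM customer c JOIN region r ON r.region_id = c.region_id;
""")

dim_rows = cur.execute("SELECT * FROM dim_customer").fetchall()
print(dim_rows)  # [(10, 'Acme Ltd', 'North')]
```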
 
SASMAN, <br>
<br>
I couldn't agree more strongly about the ERD, although some experts would say that it is not necessary. There is a lot of debate about how detailed this model should be, and whether it should lead to a normalized database design, or whether it should become the basis for denormalizing into a star schema. <br>
<br>
I think that the determining factors mostly have to do with the scale of the warehouse project. Enterprise scale data warehouse projects often fail to meet their objectives because of the failure to focus on specific business cases, the tendency to include all of the data as far as the eye can see, and because the long time frame for such a project tends to negate any advantage the enterprise might derive from it in the first place. On the other side, departmental data warehouses are often difficult to integrate with other warehouses at the departmental level, leading to duplicated effort and inconsistent statistics.<br>
<br>
I doubt that there can be much value in an ERD that is developed in two weeks, unless the business function that defines the subject matter is very limited in scope. The information gathered and conclusions drawn in such a short time are almost always contradicted by subsequent, more detailed analysis. <br>
<br>
Further, most organizations don't maintain a separate logical model. The most common practice in this regard is to discard the logical model once the database has been built. In organizations where this is the case, it will be difficult to get acceptance of this new procedural paradigm and to convince management that the logical model has value independent of its role as a step in the design process. <br>
<br>
I have found that a relatively high level, sparsely attributed business area ER model, enhanced by detailed subject area ER models is a good way to control integration of new subject matter into the warehouse. So long as these products are maintained separately from the physical design, either design paradigm can be used for the physical database. <br>
<br>
By the way, if you design your star or "constellation" schema around a set of conforming dimensions, it is usually easier to integrate new subject matter than it would be in a normalized design having the same scope. The trick is to extend the scope of the initial analysis so that it covers all of the subject matter that may eventually be included.
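The conforming-dimensions idea can be sketched minimally: two hypothetical fact tables (a sales mart and a returns mart) share one physical dimension table, so a label like "month" means exactly the same thing in both result sets. All names here are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales   (time_id INTEGER, sale_value REAL);
CREATE TABLE fact_returns (time_id INTEGER, refund_value REAL);
INSERT INTO dim_time VALUES (1, '1999-06');
INSERT INTO fact_sales VALUES (1, 200.0);
INSERT INTO fact_returns VALUES (1, 25.0);
""")

# Both marts group by the SAME dimension table, so their month labels agree.
sales = cur.execute("""
    SELECT t.month, SUM(f.sale_value) FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id GROUP BY t.month""").fetchall()
returns = cur.execute("""
    SELECT t.month, SUM(f.refund_value) FROM fact_returns f
    JOIN dim_time t ON t.time_id = f.time_id GROUP BY t.month""").fetchall()
print(sales, returns)  # [('1999-06', 200.0)] [('1999-06', 25.0)]
```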
 
WHArchitect,<br>
At last, someone who understands.<br>
Just to clear up some misperceptions.<br>
<br>
The Subject Area Model is not specifically an ERD in the traditional sense of the word. It is a highly abstracted model representing the subject areas in an enterprise and the relationships between those subjects. As far as two weeks to do the model is concerned, most of the Architect's time will probably be spent arranging or waiting for meetings with organisational stakeholders rather than modelling. When it comes down to the model itself, a corporation probably has around 20 subject areas, and to model these in a subject model is quite possible in two weeks.<br>
<br>
You voiced doubts about the value of any model that gets produced in two weeks. The value that the Subject Area Model brings to the table is that of a high-level conceptual schema for the entire enterprise data warehouse. What is that, you may ask? Well, to have an outline of everything that will end up in the data warehouse sounds pretty valuable to me, and to get this outline in two weeks sounds like a bargain. Remember you want to avoid long time frames to delivery, but you still need some idea of how everything fits together.<br>
<br>
I think your sparsely attributed business area ER models are probably just another form of subject area model. I would like to qualify one thing, though: my subject area models are not aligned with organisational business units as defined in the enterprise organisation structure, but rather with the areas of interest about which the enterprise would like to have information.<br>
<br>
The scope of some data warehouse projects has had less to do with their failures than the approach that was used to develop those DWHs. If you want to manage the risks, get a competent project team, use a proven methodology and deploy proven enabling technology.<br>
<br>
If you're interested in knowing more about proven methodologies, and in some insight into the types of enabling technologies I'm referring to, then email me at loginjames@extra.co.nz. If you think this subject is worth airing in this forum then let me know.<br>
<br>
If an organisation can't see the benefit of maintaining a separate logical model, it is a real shame, and I'll go further and say that it is incumbent on us DW Architects to convince them of its value. Given that your logical model is hardware- and software-independent, it's a really valuable tool to have. It is the reference document that rises above all the techno-jargon and represents the business' information in business speak. Because of its independence of H/W and S/W, it enables physical scalability while maintaining a representation of the logic of the data. Furthermore, it is not a model that the project team goes off into a huddle for a year or more to develop; it is developed incrementally with each iteration of the warehouse build.<br>
<br>
When it comes to data warehousing there are a number of models involved. Let's examine them.<br>
1. Subject area model - high level big picture strategic conceptual blue-print for iterative builds.<br>
2. Detailed logical model - narrow in scope, focussed on one business unit's needs. It is added to (evolved) with each iteration of a DWH build. When DWH builds have covered every business unit, this model will represent a logical corporate data model.<br>
3. Process models - you have to document where you are going to extract operational data from, how you are going to transform it (application of business rules etc) and how you are going to load the DWH.<br>
4. Physical data model - some lucky DBA or warehouse administrator has to implement and maintain the system at a physical level. A model would be handy.<br>
5. Let's not forget about application models for the fancy GUI interfaces we might develop to enable our user communities to exploit the DWH.<br>
6. These applications are more than likely going to need the data structured for performance so we're looking at another set of process models and data models.<br>
<br>
Hey... is this beginning to sound like a lot of work? Do you think organisations will want to carry all this overhead? Wouldn't a CASE tool that handled all of this be really handy? Funny thing is, I know of one... interested?<br>
<br>
Just one last thought. Downstream dependent datamarts exist to enable efficient exploitation of the data warehouse. The physical structure of these marts is all-important because we want to keep our users happy with the system's response times and perhaps simplify navigation. So when you're considering denormalising any data structures, it should be in this area. Think about structures like multi-dimensional databases, star schemas and the like to speed up your OLAP. BUT think twice about denormalising your DWH repository. Remember that the DWH repository serves as an integrated data store whose purpose is to supply downstream data marts, and it should therefore be structured for minimal data redundancy and maximum performance of data loading.<br>
<br>
Cheers for now<br>
SASMAN
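Item 3 in the list above (extract, transform, load) can be sketched as a toy pipeline: rows are extracted from a mock operational table, a business rule is applied, and the result is loaded into the warehouse. Every name here is invented, and a real DWH load would of course be far more involved.

```python
import sqlite3

# Mock operational (OLTP) source.
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE oltp_orders (order_no TEXT, amount_cents INTEGER, status TEXT);
INSERT INTO oltp_orders VALUES ('A1', 10050, 'SHIPPED'), ('A2', 500, 'CANCELLED');
""")

# Mock warehouse target.
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_orders (order_no TEXT, amount REAL)")

# Extract, then transform (business rules: drop cancelled orders, convert
# cents to currency units), then load.
for order_no, cents, status in src.execute("SELECT * FROM oltp_orders"):
    if status != 'CANCELLED':
        dwh.execute("INSERT INTO fact_orders VALUES (?, ?)", (order_no, cents / 100.0))

loaded = dwh.execute("SELECT * FROM fact_orders").fetchall()
print(loaded)  # [('A1', 100.5)]
```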
 
A few points, Sasman. <br>
<br>
You have surmised correctly that I am talking about a high level ERD. What I am talking about is not what you have described as a "subject area model". I believe that we both agree that a star schema can be characterized as a radically denormalised relational database design. This begs the question, "denormalized from what?" The obvious answer is, "from a normalized model". There is an advantage to identifying dimensions early rather than late. There is also an advantage in basing the definition of these dimensions on the broadest possible analysis.<br>
<br>
No doubt you are familiar with articles written by fellow practitioners in this field that explain the denormalization and analysis techniques that can be used to derive a star schema from a detailed, normalized logical data model. The same techniques can be applied to a less detailed model, even one that is not, technically, normalized, in order to derive dimensions that are reusable: "conforming dimensions" in Ralph Kimball's parlance. Even if the central warehouse is to be a normalized database, conforming dimensions are necessary in order to assure consistent query results from one data mart to the next where a dimensional approach is selected.<br>
<br>
The subject matter for this high-level data model is the business area that forms the context of the data warehouse, plus whatever external data are necessary to satisfy the various business cases that can be anticipated at project initiation. There is no need to be finicky at this point; it is best to paint with a broad brush. This model becomes the framework from which new subject matter will hang. As new subject matter is added, this model is the source of fundamental entities that will seed the detailed logical model. The high-level model will, in return, be enhanced with any new fundamental entities discovered during creation of the detailed model. The scope of each detailed model is determined by the business case or cases that it is intended to represent. Each of the detailed models is then used to enhance the physical design of the central warehouse, as well as the design of the data mart that will actually be used to deliver the information.<br>
<br>
About the central data warehouse and normalization: Normalization is good. It is better for the central warehouse to be rigorously normalized, in textbook third normal form at the very least. This is an article of faith. However, the size of a data warehouse may make it difficult, if not impossible, to maintain the data within the available operational window. In practice, most medium to large data warehouses are denormalized to some extent. I agree with you that this should be kept at a minimum.
 
Your discussions are interesting and informative. However, for your novice readers I think<br>
brief examples to accompany your essays would be extremely beneficial.
 
Point well taken, gg. We are dealing with broad generalities here and trying not to write whole chapters on the subject. Like most of us, I also have to work for a living. That leaves less time than I would like to devote to forums like this one. I will try to be more specific and give examples, even if it means avoiding broad statements like those above.
 