Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

"Monster" dimension causing problems

Status
Not open for further replies.

jcoira

Programmer
Sep 26, 2001
51
0
0
US

Hi,

I am designning a data mart and have a problem with one dimension table growing as fast as the fact table, that is slowing down the response times. I am using SQL 2000 and MS Analysis Services.

We receive trucks loaded with oranges and they are graded to get some quality attributes. I have one fact table to keep track of each graded load. The structure is something like (all keys are surrogate keys)

Time_key,
Variety_key (variety of orange carried in the load),
Load_key (key of load received),
NetWeight, Quality1, Quality2,...(all the facts)

The problem is that I have a dimension table to store attributes of the load such as LoadType, Status, RegistrationNumber, etc. For each load I receive and grade I insert one record in the fact table with all the quality attributes, BUT at the same time I insert all attributes of that load in the dimension table. Therefore, the dimension table grows as fast as the fact table.

With less than 200,000 loads the response of my Analysis Services cubes is really bad.

Any suggestions of what I can do to improve the design or/and the response times from MSAS?

Thanks,
// jcoira
 
If LoadType, Status and any others have common attributes you may want to make these actual dimensions. for example if you have status' that are Arrived, Inspected, Rejected and Approved, you would be better off having a Status dimension that keys to the fact records using their own surogate keys.

"Shoot Me! Shoot Me NOW!!!"
- Daffy Duck
 

Thanks for the reply MDXer,

What I actually need is to get rid of the load dimension or make the design to work well with it. I have around 20 attributes in the LOAD dimension, which is growing in parallel with the fact table. I think that is causing MS Analysis Services to give me slow response times.

// jcoira
 
do you actually do any analisys based upon the load dimension?

"Shoot Me! Shoot Me NOW!!!"
- Daffy Duck
 

Users want to see member properties of the load dimension. Sometimes they need to verify some of the member properties when making analysis.

// jcoira
 
If these member properties are common among loads then as I said previously make them dimensions. without seeing the data it is pretty difficult to say.

"Shoot Me! Shoot Me NOW!!!"
- Daffy Duck
 

What is the advantage of making them dimensions?

All I need is to get rid of the load dimension, which is growing at the same speed as the fact table. For each record in the fact table I insert one record in the load dimension. I think that is the cause of the bad performance, but I don't know how to modify the design to take care of it.

// jcoira
 
If you are getting one dimension row for each fact row, you probably do not have a dimension. You probably have an event. Events belong in fact tables. In this case, the event is the receipt of a load. There appears to be only two dimensions, time and variety. Everything else is facts associated with the receiving event. Facts do not necessaritly have to be numeric, although they often are.


Sometimes the grass is greener on the other side because there is more manure there - original.
 

I would say YES, it is like an event. Each time I receive a load with oranges I must record some quality measures(numeric additive facts) and attributes of the load like the carrier, the status and like 20 more.

Is it ok to put those 20 attributes as non-additive measures in the fact table?

Thanks,
// jcoira
 
Yes, use the fact table. The user might be interested in counting them, for instance. The status of a load is a fact, and the user might be interested in knowing all loads with status of "not unloaded", or "not quality inspected".

Sometimes the grass is greener on the other side because there is more manure there - original.
 
After further thought, you probably need a carrier dimension, a status dimension, etc, to hold the description of the codes, and as MDXer said, use a surrogate (or synthetic as the latest jargon goes) key to join to the carrier dimension. If you do not have code values for carrier and status, I would consider adding them.

The fact table still holds the event of the receiving, since the status and carrier of each load is not unique to that load, you might want to pull them out into their own dimensions.

Sometimes the grass is greener on the other side because there is more manure there - original.
 
My main concern is not which dimension to use, but what to do with this LOAD dimension. I just dont feel comfortable including 20 attributes that describe loads into the fact table. On the other hand the load dimension with several thousands of loads is killing the performance of MS Analysis Services. I guess I will try to squeeze the attributes in the fact table and see what happens.

Thanks for the support.
// jcoira
 
If you have a good understanding of the business requirements, you could put the most popular attributes in the fact table and the least used in a separate "dimension" table.

If you want, send your attribute list and I'll take a stab at what I think your star schema should look like.

Sometimes the grass is greener on the other side because there is more manure there - original.
 
This is how the fact table and load dimension look like. Notice that I have the following dimensions in the design, and one surrogate key in the fact table for each one.

Dimensions:

Time_SK, Season_SK, Load_SK, Grower_SK, CropVar_SK, Supplier_SK, DestWarehouse_SK, Fieldsman_SK, Contract_SK

Load dimension:

CREATE TABLE [dbo].[DIMLoad] (
[Load_SK] [int] IDENTITY (1, 1) NOT NULL ,
[LoadNumber] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LoadStatus] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LoadTypeDescription] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TicketNumber] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ReceivingUOMCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ReceivedDate] [datetime] NULL ,
[TimeIn] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TimeOut] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CertificateNumber] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CarrierVendorNumber] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[RegistrationNumber] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[OriginLot] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DestLot] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DGLNumber] [numeric](18, 5) NULL ,
[LoaderNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DriverNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TransportDistance] [numeric](18, 5) NULL ,
[TrailerNumber] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TruckCode] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TripCode] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TRDistance] [numeric](18, 5) NULL ,
[TRDistanceUOM] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TRRate] [numeric](18, 5) NULL ,
[TRRateUOM] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TRDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LoaderRate] [numeric](18, 5) NULL ,
[TRType] [varchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO

Fact table:

CREATE TABLE [dbo].[FACTGrading] (
[Time_SK] [int] NOT NULL ,
[Season_SK] [int] NOT NULL ,
[Load_SK] [int] NOT NULL ,
[Grower_SK] [int] NOT NULL ,
[CropVar_SK] [int] NOT NULL ,
[Weighbridge_SK] [int] NOT NULL ,
[Supplier_SK] [int] NOT NULL ,
[DestWarehouse_SK] [int] NOT NULL ,
[Fieldsman_SK] [int] NOT NULL ,
[Contract_SK] [int] NOT NULL ,
[GrossWeight] [numeric](18, 5) NULL ,
[NetWeight] [numeric](18, 5) NULL ,
[TotalWeight] [numeric](18, 5) NULL ,
[TotalWeightSG] [numeric](18, 5) NULL ,
[AllocatedQty] [numeric](18, 5) NULL ,
[PaidWeight] [numeric](18, 5) NULL ,
[Quality1_W] [numeric](18, 5) NULL ,
[Quality1_P] [numeric](18, 5) NULL ,
[Quality2_W] [numeric](18, 5) NULL ,
[Quality2_P] [numeric](18, 5) NULL ,

................

[Quality20_W] [numeric](18, 5) NULL ,
[Quality20_P] [numeric](18, 5) NULL ,
) ON [PRIMARY]
GO

 
The first thing I notice is that the Load dimension has a "Truck Code" and "Trip Code". Do these have lookups in yet another table (i.e. snowflake design)? If so, that will hurt your performance considerably and you should denormalize the Truck and Trip Code descriptions into the dimension.

I still believe that since the fact table and Load dimension share a 1:1 cardinality, they are actually one very wide fact table. If, as I mentioned earlier today, you can determine that certain attributes are not used as often as others, you could establish a 1:1 relationship between the load fact table and (say) the load additional fact table.

I stick by my guns that you need a fact table for the Load Event. Now, I see that there is a Load Status in the Load dimension. Does this status change from receiving to unloading to complete? If so, then that status field is definitely part of the fact. Receiving UOM would also seem to belong in the fact.

Another point to ponder. I see that you have at Time key in the Fact table, but also have a Received Date in the Load dimension. Seems to me that the Received Date belongs in the fact (unless those two fields are duplicates).

I also see another FACT tab;e, perhaps, called Load Transportation which would consist of alot of the fields in the LOAD RECEIPT fact table. This would be an event-based fact table.

TicketNumber
ReceivedDate
TimeIn
Time Out
CertificateNumber
CarrierVendorNumber
RegistrationNumber
OriginLot
DestLot
DGLNumber
LoaderNumber
DriverNumber
TransportDistance
TrailerNumber
TruckCode
TripCode
TRDistance
TRDistanceUOM
TRRate
TRRateUOM

Well, I've given you a few more ideas to think about. You might also want to review the Transportation and Procurement chapters of The Data Warehouse Toolkit, 2nd edition, by Ralph Kimball.

Sometimes the grass is greener on the other side because there is more manure there - original.
 
Thanks johnherman

You have given me a good starting point to improve the design and the performance. I will try to include the load dimension in one fact table and see what happens.

I will also take a look at Kimball's book, as you suggested.

// jcoira

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top