Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Confused about FACT tables

Status
Not open for further replies.

BOOZ

Programmer
Oct 16, 2002
2
US
I am totally new to data warehousing. I have an Oracle table that has Meeting data in it, and I have child tables with attendees, sites, etc. To start all I want to do is create a warehouse that can tell me how many meetings were in the month of Jun, or in FY 2002. Do I need to put the date of the meeting in the fact table? I created a table called
mtg_by_month that had monthid (1 for Jan, etc.), and Tot_no_conf where I just did a
count when I loaded in the data. However, what would I put in my fact table besides monthid. And then if I wanted it by FY, would that have to be a seperate fact table?
Thanks for any advice/help - Booz
 
Hi Booz,
What you have just created is a VERY simple, periodic snapshot table. Technically, there is nothing else you need to put in it if all you want to know is the total number of meetings in a month, for all sites, for all attendees.

But I have to wonder how much useful Business Intellegence you're getting out of that table. :)

I suspect you really don't need an aggregate table at all. They are usually created to speed reporting because of massive numbers of fact rows. Just how many meetings does your company hold? ;-)

Try creating a fact table where each row represents a single attendee at a single site and a single meeting on a single date. The dimensions would be Attendee, site, Meeting, and Date.

And if you haven't yet, make sure you read Ralph Kimball's book "Data Warehousing Toolkit (2nd Edition)".

Hope that helps. Good luck,
Matt :)
 
Thanks Matt, actually we hold/host hundreds of thousands of meetings every year, mostly thru teleconferencing (which count in the database). I actually ordered the book already. It seems like I am having a hard time with the concept of how the dimension table is used. I created a dimension table for month - but what am I actually using it for, it is really just a lookup table for the user? - thanks again.
 
Hello Matt,

We discussed aggregates before in another thread quite extensively and I won't bore the forum with another discussion on that topic. I am honestly interested in your remark about :

*** But I have to wonder how much useful Business Intellegence you're getting out of that table ***

Since i am trying to build up reports that go beyond just gathering month's data in big tables, I was wondering if you could derive a higher level of BI with this sort of data at a more atomic level. Can you comment on this , perhaps with an example......? T. Blom
Information analyst
tbl@shimano-eu.com
 
I think you are keeping the month level data in the "Tot_no_conf" FACT table . In the "mtg_by_month" DIMENSION keep the records with different id for months of each year, starting with the earliest record you are loading i.e keep different id for Jan2002, Jan2003 etc. You also keep the start date as another column in the DIMENSION table. You load this data when you load the corresponding data in the FACT tabel.

When you want the record for a particular month you query the DIMENSION table get the month id for it, then go to the FACT to get the record for it.

If you want the data for the FY (Financial Year) then you can get the id s for the months that fall in the FY and aggregate from your month table (Max 12 rows if your data in the FACT table is at a monthly level!!!)

This may solve your immediate question but try going for a detailed diensional model as suggested by others.
Thanks!
 
Hi Booz,

While hundreds of thousands of meetings a year may sound like a lot, in DW terms, it's still pretty small. :) Don't assume the numbers will be a problem as long as you aren't severly limited in processing power or storage space.

As far as concept of Dimensions go, don't think of them as lookup tables. Think of them as the beginning point of any question about your facts. Most filtering criteria in your queries will come from your dimension tables.

The date table in particular is hit a lot because almost every business question involves time periods. "How did I do last month compared to the previous month? How much have I sold year-to-date compared to year-to-date last year?"

Probably the most important point of dimensions is that they are used over and over by many fact tables. Don't think of your date table as being a lookup table for your monthly meeting facts, think of your monthly meeting fact table as being one of many facts that tie to dates.

And I would make your date table "grain" be daily. On every row add a column that indicates what the month is. For example, you'd have rows for 2002-01-01 and a 2002-01-02 and both of them would have a Year_Month column of 2002-01. This will let you ask both daily and monthly questions. If necessary, you could also create a monthly level view of the date table.

Kimball has a great discussion on dimensions in general and the date dimension in particular.

Good luck,
Matt :)
 
Hi Blom,

I'll try to avoid re-thrashing the aggregate table discussion again too. But it might creep back in. :)

My comment about the BI value of Booz's original table was based on the fact that all you could apparently get out of it was the number of meetings held each month. Nothing about attendees, nothing about Sites, not evan any information about the individual meetings. How many questions are you going to answer when the only answer you can give is, "We had 25,326 meeting in May."?

But the real point is, even if someone asks that question and I can answer it accurately, wouldn't you expect them to ask a follow-up question? "How did these break out by site?" "What were the average number of attendees?" "How many were overbooked?" None of those questions can be answered from that table. It answers one question.

To me, the value of BI is not it's ability to answer the question the user asked, but rather to answer the NEXT question they will ask.

That question may roll up to a higher level of summary or it may drill down to a lower level of detail or it may shift over to another, related area of the business ("Gee, I'm not selling many 40 megawatt widgets this month, but sales are down all across the world. It's obviously not a sales problem, per se. Let's take a look in manufacturing and shipping of widgets.")

The difficulty with aggregate tables like Booz originally proposed (oops, aggregates slipped in again) is that they are designed to answer specific questions but are of limited help after that question is asked. They don't help much with the next question.

If I keep the facts at the atomic level, I can always roll up but if I pre-aggregate, I lose detail that I can never regain. That's why Kimball always recommends keeping the atomic facts, even if you create aggregate tables.

Now obviously, there is a potential performance hit in always rolling up the detail so folks look for ways to address that problem. Creating aggregate tables is one of those ways. If all you have is SQL, it is about the only way. Of course, most users don't actually write SQL, they use some BI application (or at least they would prefer to). BI tools can broaden your options. If you can get acceptable performance and be able to ask the NEXT question, then it's good BI, with or without aggregate tables.

My goal is to build the simplest data structure possible to answer the broadest range of questions in the shortest possible time. As I said before, you can always add complexity to a solution. You can't add simplicity.

I hope I clarified rather than confused.
Matt :)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top