Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

A question about defining dimensions

Status
Not open for further replies.

Timicho

Programmer
Apr 18, 2009
2
US
Hi everyone, I'm currently in the process creating a data warehouse for my company. The purpose of this data warehouse is to track work that is being done on various projects, customer tracking, and billing for work done (A fairly typical request). I've identified a few business processes that would be beneficial to track in this warehouse and I choose one of these process to begin with.

The current process I'm attempting to define is the work done by a resource for a project every week (sine my company tracks work on weekly basis). From what I've determined so far the fact table would be the charges for work (hours worked, hourly cost, etc...). The dimensions I have are Resource, Task, Project, Date (and potentially more as I continue to define the schema). However as I've been defining this process something keeps bothering me, whether or not I should have two seperate dimensions for Task and Project.

A Project is simply as collection of tasks with potentially other data tagged on (description, project number,...), and a task is a description of work to be done for a project. I've been debating with myself whether I should keep them separate or simply include the Project information with the task(i.e. project number, description,...). Both seem to have positives and negatives. If I keep them separate then there will be far less redundancy, but at the cost of an addition join (more of a snowflake schema, due to a foreign key in Task back to the Project it belongs to). If I combine them I get the advantage of less joins which will give better query performance, but at the cost of redundancy.

So finally getting to my question (sorry for the long winded explanation), what would people here do given this situation? I know there are some very experienced people here and I though it would be useful to get those opinions. If you need me to explain anything further I will be more than happy to.

Thanks,
Tim

 
I would keep them together. Don't worry about redundancy. Here's an example:

Code:
Project Task Dimension
-----------------------
ProjectTaskKey INT
ProjectCode  CHAR
ProjectName VARCHAR
TaskName VARCHAR
TaskType VARCHAR


Sample Data
----------------------
ProjectTaskKey:  1
ProjectCode:  PR2009003
ProjectName:  Upgrade Database Version
TaskName:  Backup current schema
TaskType:  Maintenance  

ProjectTaskKey:  2
ProjectCode:  PR2009003
ProjectName:  Upgrade Database Version
TaskName:  Create new database diagram
TaskType:  Developent

In this scenario, you have one dimension at the Task granularity. You can easily roll these up to Project, which is an attribute in the dimension. I've also added a TaskType which you should consider if you haven't done so. This will allow you to analyze the amount of time and money spent on certain types of tasks, regardless of Project.
 
I on the other hand would probably seperate the two, depending on the technologies used. The reason for this would be to evaluate the frequency of a task across all projects. This info would be beneficial in targeting what skills sets may be needed when adding staff to support more projects.

I typically build all my data to be analyzed in to MOLAP Cubes.

Lets say you have a BI project that is always behind schedule and everyone knows and agrees it is because the number of people asigned to the project can't handle the amount of work, but this is just 1 of say 20 projects you have. If I am using a MOLAP structure as my data source I would not have a unique TASK because if I had 3 projects each with ETL tasks they possibly would map to seperate projects. So if I was trying to analyze the amount of ETL work I would probably have to implement some custom grouping.

If I had a Tasks Dimension that contained a Hierarchy that was set in a way to where Tasks were unique and like tasks belonged to a single parent I could more easily analyze things.

TASKS
Project Management
Product Spec Review
Status Meetings
Project Plan Development
Documentation
Source to Target
ERD
Data Dictionary
Training Materials
Operational Documents
Development
ETL Development
Report Development
Defect Resolution
Cube Development
Custom MDX


This structure would support without custom groupings.

1) How much time as an organization is spent in all three major tasks PM, Documentation and Development

2)How much time is spent in these 3 areas for each project.

3) How much time is spent across the orginization on each Task

4) How much time is spent as per project on each task.


Number 4 would be more dificult if the tasks were children of Project but allows people to quickly see what specific resources are needed for a project.

Project A has 3 ETL developers assigned to it and they average 90 hours of ETL development a week and deliver projects ahead of milestones. Project B has 2 ETL Developers and averages 90 hours of development a week but is currently 2 weeks behind schedule. The structure as shown above could easily show this with the result possibly being ok Lets Move 2 resources to project B for a week to get back on schedule. Or even better you can tell what percentage of work is a specific task to better target what skills you really need in future employees.

You really just need to wiegh your requirements and see what is the better fit for you. I typically will preak subject areas out seperately and even though Projects contain tasks doesn't mean that in all analysis they are the same subject.

If you compare it to a sales model I see Projects are being a Purchase and tasks being the products purchased.
 
MDXer said:
The reason for this would be to evaluate the frequency of a task across all projects.

That's where the TaskType attribute comes into play in my model. You could easily have multiple hierarchies in my model. Consider the following:

Code:
*  Project
   *  Task

*  Project Type
   *  Task Type
      *  Task Sub-Type


Project:       Project Data Mart Development
Task:          ETL from Project Database to Staging Area
Project Type:  Datawarehouse Development
Task Type:     ETL
Task Sub-Type: Staging ETL

Project:       Project Data Mart Development
Task:          ETL from Project Staging Area to Project ODS
Project Type:  Datawarehouse Development
Task Type:     ETL
Task Sub-Type: ODS ETL

Project:       Project Data Mart Development
Task:          ETL from Project ODS to Project Dimension
Project Type:  Datawarehouse Development
Task Type:     ETL
Task Sub-Type: Dimensional ETL

So, in the above model, you could easily look for inefficiencies in your ETL group, or even specific ETL type. So the way I look at is that Task is a child of Project, and TaskType is a child of ProjectType. Children should be kept with Parents in the dimensional model. Therefore the main decision would be whether or not to separate the two hierarchies into separate dimensions. I would keep them together.
 
Thank you very much for your opinions on this. I think both bring valid arguments to the table. However, I'm tended to sway more towards RiverGuy's approach. As i've been defining this schema I keep feeling like the two dimensions should really be combined. I like the idea of simply adding the project and task type to the task table, because its still allows me to rollup to the task type or project level, and I believe it will be simpler for management to understand when they review the proposed schema. All that said I really like the analogy MDXer made in comparing this to sales model. I agree very much that the project can be compared to purchase and the individual tasks would be the products, from a conceptual level.

Again thank you very much both MDXer and RiverGuy. You both gave me very insightful opinions and gave me a better understanding on how to attack this.

Thanks,
Tim

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top