Hi everyone, I'm currently in the process creating a data warehouse for my company. The purpose of this data warehouse is to track work that is being done on various projects, customer tracking, and billing for work done (A fairly typical request). I've identified a few business processes that would be beneficial to track in this warehouse and I choose one of these process to begin with.
The current process I'm attempting to define is the work done by a resource for a project every week (sine my company tracks work on weekly basis). From what I've determined so far the fact table would be the charges for work (hours worked, hourly cost, etc...). The dimensions I have are Resource, Task, Project, Date (and potentially more as I continue to define the schema). However as I've been defining this process something keeps bothering me, whether or not I should have two seperate dimensions for Task and Project.
A Project is simply as collection of tasks with potentially other data tagged on (description, project number,...), and a task is a description of work to be done for a project. I've been debating with myself whether I should keep them separate or simply include the Project information with the task(i.e. project number, description,...). Both seem to have positives and negatives. If I keep them separate then there will be far less redundancy, but at the cost of an addition join (more of a snowflake schema, due to a foreign key in Task back to the Project it belongs to). If I combine them I get the advantage of less joins which will give better query performance, but at the cost of redundancy.
So finally getting to my question (sorry for the long winded explanation), what would people here do given this situation? I know there are some very experienced people here and I though it would be useful to get those opinions. If you need me to explain anything further I will be more than happy to.
Thanks,
Tim
The current process I'm attempting to define is the work done by a resource for a project every week (sine my company tracks work on weekly basis). From what I've determined so far the fact table would be the charges for work (hours worked, hourly cost, etc...). The dimensions I have are Resource, Task, Project, Date (and potentially more as I continue to define the schema). However as I've been defining this process something keeps bothering me, whether or not I should have two seperate dimensions for Task and Project.
A Project is simply as collection of tasks with potentially other data tagged on (description, project number,...), and a task is a description of work to be done for a project. I've been debating with myself whether I should keep them separate or simply include the Project information with the task(i.e. project number, description,...). Both seem to have positives and negatives. If I keep them separate then there will be far less redundancy, but at the cost of an addition join (more of a snowflake schema, due to a foreign key in Task back to the Project it belongs to). If I combine them I get the advantage of less joins which will give better query performance, but at the cost of redundancy.
So finally getting to my question (sorry for the long winded explanation), what would people here do given this situation? I know there are some very experienced people here and I though it would be useful to get those opinions. If you need me to explain anything further I will be more than happy to.
Thanks,
Tim