Hello,
I have been thinking about the best schema for a DW for clickstream analysis including sales.
What comes to mind first for web behavior is a page_view fact table with dimensions like session, user, and referrer. We also want to track sales, however, in relation to this behavior. So, can sales itself then become a dimension table of page_view?
Also, we want a classic sales_fact table with the usual dimensions like product, time and customer.
My question is though, it seems we have sales information then in two different places, one for analyzing in relation to web surfing, and another as an actual fact table. (In addition "user" for clickstream is also a customer dimension for sales_fact.
How best should we manage this? Is it a bad thing to have two separate star schemas where user and customer and identical and where sales (from page_view) and sales_fact are also quite similar? It seems like we are adding a lot of duplicate information. Is there a more elegant way to handle this?
I appreciate your input.
I have been thinking about the best schema for a DW for clickstream analysis including sales.
What comes to mind first for web behavior is a page_view fact table with dimensions like session, user, and referrer. We also want to track sales, however, in relation to this behavior. So, can sales itself then become a dimension table of page_view?
Also, we want a classic sales_fact table with the usual dimensions like product, time and customer.
My question is though, it seems we have sales information then in two different places, one for analyzing in relation to web surfing, and another as an actual fact table. (In addition "user" for clickstream is also a customer dimension for sales_fact.
How best should we manage this? Is it a bad thing to have two separate star schemas where user and customer and identical and where sales (from page_view) and sales_fact are also quite similar? It seems like we are adding a lot of duplicate information. Is there a more elegant way to handle this?
I appreciate your input.