Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Cube Design for Survey Data

Status
Not open for further replies.
Oct 13, 2005
17
US
Hi All,

I'm struggling with the appropriate way to design my database/cube. I'll start with some background. I'm trying to develop a cube that will allow users to see results of survey data. This survey data consists of what we call demographic questions and metric questions. the demographic questions describe the respondent (ie. gender, age, geography) and the metric questions are the actual facts we are collecting and aggregating on (ie. On a scale of 1 - 7 how happy are you at work?). My current setup is a fact table with answers to the survey questions, FactResponse. So if it's a metric question, it will have a fk in to the question table. It will also have a responseValue column that you can aggregate on. This table includes a respondentID. So a respondent ID would have multiple rows in the fact table. (one for every question in the survey including the demographic questions) I have a few dim tables, dimSurvey (which of our many surveys this fact is from), dimOrganization (which company the response belongs to), dimDate (when the survey was completed) and where I'm struggling the most dimQuestion.

Details on dimQuestion: This table includes a row for every answer choice for every question. so the question, "On a scale of 1 - 7 how happy are you at work?" there are 7 rows. the columns of this table are basically a PrimaryKey, Question Text, QuestionAnswer. Keep in mind that this table also includes the demographic questions (ie. "What is your gender?" has 2 rows "Male" and "Female")

So what i need to develop is a cube where i can get the average of the question "How happy are you at work" and slice it by multiple demographics.

So how happy are males between the ages of 18-25? The answer would be the average of ResponseValue column of the FactReponse table for that particular question that matches those demographics. (of course the demographics are on seperate rows within that fact table)

So to me, while trying to design this, i find myself apparently needing to slice by the dimQuestion dimension multiple times.(so that i'm slicing by gender and age). I can't grasp in my head the appropriate way to set this up and i'd love to hear some opinions from people with more experience than me.

Some notes to consider. Each time a company takes a survey, they are able to add their own custom questions specific to their company. So they may add a business unit demographic question that has their specific BUs and i'd need to cut by that. (it would be an additional question in the survey) Likewise they could add a custom metric question that would need to be aggregated on.
 
SSAS is a Business Intelligence tool....so it's kind of hard to utilize all of the Business Intelligence features while remaining completely generic.

If this was my project, I would create a standard Demographics dimension as well as a Location dimension (for your business unit example) in addition to other possibilities. I would ETL the demo questions (what gender are you) to the Demographics dimension. Obviously, not all surveys will have the same demographics questions, so you could end up with a bunch of UNKNOWN attributes. Conversely, you could break this out further and store foreign keys in your fact table to individual dimensions such as Income Range, State of Residence, etc. You would ultimately end up with a bunch of dimension records describing the survey and respondent in addition to some fact records describing the respondents' ratings. In other words, not all questions are created equal.

This creates a lot more work on the ETL side, but I think if you look at this a Header/Response scenario instead of just a bunch of random questions on a form, you'll have an easier time analyzing the data.
 
Thanks for the reply,

If I'm following you correctly, you would have a dimension for each custom demographic (such as business unit). But every time we upload data, we could have new custom demographic questions, so a new dimension would need to be built for each of those. This seems like it would quickly spiral out of control. We'd like to have the number of dimensions fixed.
 
Nope. I've never worked for a survey company, but I know they are out there, and I'm saying that I assume that they have figured out the baseline demographics. They know that many businesses are interested in geographic region, and gender, and income level and "distance located to us." So you would create those dimensions and attributes before-hand, and then map them to the proper places in your ETL.

Yes, you're going to get a lot of one-off, never-thought-of-that things. In those cases, you have two choices--add them to your generic survey dimension and try to make due with your generic analysis, or modify your standard dimensions to include new attributes.

Again, since SSAS is designed mostly for use within the Organization, it's kind of hard to make much use out of it without including specifics about the Organization. So if you go completely generic, you're going to end up with something like a two-level hierarchy of survey questions thrown together in a big mix and match list. True, you could whip up some custom MDX for each client to produce their reports, but if you go that route, why even use SSAS? Just use the relational database.

 
I'd also be inclined to say that the usefulness of an SSAS application is a result of the effort and precision put into architecting the solution (in addition to other variables). So if you want to just design something very basic, set it on auto-pilot, and expect to fulfill all of your clients' needs, I'm betting it won't cut the mustard.
 
ok, well regardless of the custom question issue. What if i wanted to cut by 2 standard demographics (say age and gender) if they are both in the dimDemographic dimension, I can't have the same dimension on 2-axis in a query.

The reasoning for using SSAS is we have a lot of data that needs to be quickly filtered. Our fact table contains a few million rows, and this is growing steadily every month. The queries we right from the relational database are complex and slow.

I'm in the process of putting together some proof of concepts to see if SSAS will speed things up. So far it's been dramatically for the queries i've been able to run, but i haven't the solution to these issues you see in my original post.

Thanks for the comments, I would love to hear more suggestions.
 
Age and Gender would be two separate attributes, which means two separate hierarchies, which you can easily place one on the x-axis and one on the y-axis.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top