Difficult Crosstab report - need your design advice! 1

dlutzker · Aug 4, 2004

I want to create a crosstab report showing what programs/courses students are likely to take next.
Here is the crosstab I would like to create:

Program_2 Name
Program_1 Name Metric: Count of Programs Completed
Program_A2 Program_B2 Program_C2
------------------------------------------------------------
Program_A1 --- 1000 2000
Program_B1 500 --- 1000
Program_C1 1000 3000 ---

Program_n1 is the program taken prior to Program_n2.
I have a star/snowflake hybrid model. There is a time dimension for the date the program was completed, Student dim, Program dim, and Programs_Completed_indicator in the Program_fact table.
Any ideas how to create this report?
I think I will need an additional view in the model, perhaps with a ranking of Program and Student over time.
Then define an attribute for each possible rank, using something like

Code:

ApplySimple("CASE WHEN #0 = 1 THEN #1 ELSE NULL end", [Program_Sequence_Rank], [Program_1])

Another Attribute:

Code:

ApplySimple("CASE WHEN #0 = 2 THEN #1 ELSE NULL end", [Program_Sequence_Rank], [Program_1])

And so on.
This seems an ugly way to do it, and will limit the scope of the analysis. Is there a better way??

All help will be appreciated!
Dave

entaroadun · Aug 4, 2004

Could you please go into your data model in greater detail?

dlutzker · Aug 5, 2004

Program_Completer_Fact
Program_Completer_Fact_ID (PK)
Program_Dim_ID (FK)
Start_Time_Dim_ID (FK)
End_Time_Dim_ID (FK)
Contact_LU_ID (FK)
Days_To_Complete (int)

Completer_End_Time_Dim_VW
View is defined as select * from Time_Dim

Contact_LU
Contact_LU_ID (PK)
Chapter_LU_ID (FK)
Job_LU_ID (FK)
Contact_Type_LU_ID (FK)
Company_LU_ID (FK)
Matriculation_Time_Dim_ID (FK)
Contact_Name1
Contact_Name2

We have found the need to add a snapshot (materialized view)
Completer_Exam_Seq_SS
defined as

Code:

SELECT
    oef.program_dim_id,
    oef.contact_lu_id,
    SUBSTR(
	MAX(DECODE(oef.taken_sequence, 1, ', ' || ed.exam_name, NULL)) ||
    MAX(DECODE(oef.taken_sequence, 2, ', ' || ed.exam_name, NULL)) ||
    ...
    MAX(DECODE(oef.taken_sequence, 20, ', ' || ed.exam_name, NULL)), 3) taken_exam_sequence,
    SUBSTR(
    MAX(DECODE(oef.taken_sequence, 1, DECODE(oef.exam_pass_flag, 1, ', ' || ed.exam_name, NULL), NULL)) ||
    MAX(DECODE(oef.taken_sequence, 2, DECODE(oef.exam_pass_flag, 1, ', ' || ed.exam_name, NULL), NULL)) ||
    ...
    MAX(DECODE(oef.taken_sequence, 20, DECODE(oef.exam_pass_flag, 1, ', ' || ed.exam_name, NULL), NULL)), 3) passed_exam_sequence
FROM ordered_exam_fact_ss oef,
program_completer_fact cf, 
exam_dim ed
WHERE
(oef.contact_lu_id = cf.contact_lu_id AND oef.program_dim_id = cf.program_dim_id)
AND oef.exam_dim_id = ed.exam_dim_id
GROUP BY
    oef.program_dim_id,
    oef.contact_lu_id;

This snapshot has more to do with requirements for reports from the Exam_Fact schema, which shares a few dimensions (Time, Program, Contact). It is on the list of tables for the Completer and Program attributes. It uses Ordered_Exam_Fact_SS which adds a column using the rank() function:

Code:

SELECT
    ef.*,
    RANK() OVER (
        PARTITION BY program_dim_id, contact_lu_id
        ORDER BY exam_time_dim_id, exam_fact_id
    ) taken_sequence
FROM exam_fact ef
WHERE exam_sat_flag = 1

entarodun - I was hoping you would be around to help me out, because I really don't know how to go about this. If you want more info, I am on hand to supply it.
And anyone else who wants to add their 2 cents, please go ahead!
Dave

entaroadun · Aug 5, 2004

Could you go into greater detail the taken_exam_sequence and passed_exam_sequence columns? Also, please provide the structure of the oef and exam_dim tables.

dlutzker · Aug 5, 2004

entaroadun - thanks for taking the time, but I don't think these tables will help in solving the problem. They were added to a different star-schema, Exam_Fact. I am working on creating something similar for the Program_Completer_Fact. The tables you are asking about were ranking exams taken by the students, which is at a lower level. Still, here is what you asked for....
oef is the alias for ordered_exam_fact_ss, which is defined in the second code block in my first post.
taken_exam_sequence is the column showing the order in which a given student took all his exams. It is a varchar() field that is the concatenation of the names of all the exams that that student took. There is one row for each student - program combination.
passed_exam_sequence is the same as the taken_exam_sequence, but it includes only those exams that were passed.
This was an ugly solution that we used to meet a requirement for a report showing the most common sequence of exams that students used to complete their program.

The current requirement is to show what programs students choose next after completing a program. I am thinking along the lines of using something similar to the above, but for program_completers:

Code:

Create table SDW_Completer_Program_Seq as
select program_short_label, f.contact_lu_id, time_date, RANK() OVER (
        PARTITION BY f.contact_lu_id
        ORDER BY f.end_time_dim_id, f.program_dim_id) program_seq
from    sdw_completer_fact f join sdw_program_dim p on f.program_dim_id = p.program_dim_id
    join sdw_completer_end_time_dim_vw t on f.end_time_dim_id = t.end_time_dim_id
where f.end_time_dim_id > 0 
order by f.contact_lu_id, program_seq

Now, I am not sure what to do with this. My report will not show a varchar() field of program names, so I can't do the same thing that we did for exams.
What do you think? I am working on some ideas now, using ApplySimple in two Program_name attributes, one for the rows and one for the columns. I will post my idea when I have some sql...
Thanks.
Dave

entaroadun · Aug 5, 2004

OK, I think I understand the requirement. I have some ideas, but first let me ask you a question: how do you want to deal with students (contacts) who start a second program before completing the first one?

entaroadun · Aug 5, 2004

OK, I see from the MV you are creating that you are not looking at program start time at all. I would simply build another fact table. However, first I would alter SDW_Completer_Program_Seq to use program_dim_id instead of the description.

Code:

CREATE TABLE SDW_Completer_Program_Next AS
SELECT  scps1.contact_lu_id
  ,     scps1.program_dim_id AS program_dim_id_1
  ,     scps1.program_seq AS program_seq_1
  ,     scps2.program_dim_id AS program_dim_id_2
  ,     scps2.program_seq AS program_seq_2
FROM    SDW_Completer_Program_Seq scps1
  JOIN  SDW_Completer_Program_Seq scps2
    ON  scps1.contact_lu_id = scps2.contact_lu_id
    AND scps1.program_seq + 1 = scps2.program_seq

Model your students to contact_lu_id and program to program_dim_id_1. Create a new attribute called "Next Program" and point it to program_dim_id_2.

Put the Next Program attribute into your crosstab.

entaroadun · Aug 5, 2004

Oh, and make Next Program a one-to-one parent of Program.

dlutzker · Aug 5, 2004

entaroadun - good question. I would ideally like to include someone who started another program, but not completed it in the analysis. However, that data will only be found in a different schema, and I fear that it will be overly complex too achieve.

I would also like to include the progression that a student makes from their 2nd completed program to their 3rd completed program. This is also difficult, but more realistic.

I am willing to compromise on the incomplete program. I am less willing to compromise the 3rd, 4th etc. completed programs.

As a last resort, I can provide an analysis of 1st to 2nd programs on one report, and 2nd to 3rd on a separate report, and so on.

Isn't this fun??
Thanks.
Dave

dlutzker · Aug 5, 2004

I like your suggestion. I could perhaps add another FK to the existing fact table for Next_Program_Dim_ID?
Do you think there are advantages to that? I try to explain some of the funky things that I have seen Microstrategy do (the sql engine) and it seems to use all the information it has about the schema whenever it wants.

With an additional fact table, won't I run the risk of having it used when I don't want it used, and resulting in bad data??

Between these two alternatives, your suggestion and an added FK column, which do you think is better? Pros and cons?

Thanks.
Dave

entaroadun · Aug 5, 2004

There's really no difference between the two. It really depends on your ETL. Personally, I would put the sequence number and the next program ID in the fact table instead of creating a new table, just for the performance lift.

MSTR doesn't use tables willy-nilly, if you know EXACTLY what you want. The biggest issue I have seen is cardinality: if you have tables that 'drop' records because they represent relationships that don't apply to all elements of a given attribute.

The fact table I proposed falls into that category, but I really don't see how MSTR would use it.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Difficult Crosstab report - need your design advice! 1

dlutzker

Programmer

entaroadun

Programmer

dlutzker

Programmer

entaroadun

Programmer

dlutzker

Programmer

entaroadun

Programmer

entaroadun

Programmer

entaroadun

Programmer

dlutzker

Programmer

dlutzker

Programmer

entaroadun

Programmer

Similar threads

Part and Inventory Search

Sponsor