Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Left Join Performance and Missing Data Issues

Status
Not open for further replies.

sxschech

Technical User
Jul 11, 2002
1,033
US
Apologies for the long SQL statement. I removed the SELECT and GROUP BY portions and am only showing the FROM section (hopefully that will be enough to go on). When we run the query on only one college 'SC' and Student Population 'T' for example, it takes about two minutes and comes close to the numbers we are expecting. If we comment out Population 'T' in order to include all Population, it goes from 2 mins to 30 mins. Any thoughts about how we might speed up the query and more importantly, why when we include more than one college the results are lower overall. In the example for SC when run COLLEGE='SC' only, the total is 135, when run using COLLEGE in ('LA','SC','BN','NS','ED'), the total for SC shows 75. If you need more of the SQL statement, please let me know. We are doing multiple Left Joins as we need to get different status and dates and be able to show one record if it exists per person out of the multiple status and dates. sxschech.cutoff is a table that contains a "Cutoff Date", the date we want to restrict on.
Code:
            from sxschech.cutoff, mst_person p, 
            mst_Admissions_Application a   

LEFT JOIN 
            (SELECT *
             FROM (SELECT decision.*,
                     ROW_NUMBER () OVER (PARTITION BY person_uid ORDER BY decision_date DESC, decision_number DESC)
                                                                          dt
                   FROM mst_admissions_decision decision, sxschech.CUTOFF 
                   WHERE decision_date IS NOT NULL AND decision_date <= cutoff_dt
                   and academic_period in ('200830','200840')
                  )
             WHERE dt = 1 
            ) ad 
        ON  a.APPLICATION_NUMBER = ad.APPLICATION_NUMBER 
        AND a.ACADEMIC_PERIOD = ad.ACADEMIC_PERIOD 
        AND a.Academic_period in ('200830','200840')
        AND a.PERSON_UID = ad.PERSON_UID
        AND a.Curriculum_Priority = 1

            LEFT OUTER JOIN
   (select entity_uid, state_province, 
    row_number() over (partition by entity_uid order by address_type, preferred_address_ind desc, address_seq_no desc) as rnum
    from mst_address
    where address_type in ('MA', 'PR')
    and address_status_ind is null                        -- active and current
    and (address_end_date is null or address_end_date > sysdate) ) d
     ON a.person_uid = d.entity_uid
    and d.rnum = 1
    
             LEFT OUTER JOIN mst_admissions_attribute aa
     ON a.person_uid         = aa.person_uid
    and a.academic_period    = aa.academic_period
    and a.application_number = aa.application_number
    and aa.admissions_attribute in ('FG', 'NFG', 'DFG')
    
            LEFT OUTER JOIN deposit_jl de                          -- overroad mtt_ar_deposit, now includes deposits, prepaids and released B.lott 6/26
     ON a.person_uid = de.account_uid
    and a.academic_period = de.academic_period 
    and de.detail_code <> 'TDEP'           
    and de.academic_period > '200800' 
      
     LEFT OUTER JOIN academic_study e
     ON a.person_uid = e.person_uid
    and a.academic_period = e.academic_period
    AND a.student_level = e.student_level

       LEFT OUTER JOIN mst_admissions_requirement r                    -- waived deposits
     ON a.person_uid = r.person_uid
    and a.academic_period = r.academic_period
    and a.application_number = r.application_number
    and r.requirement = 'TDEP' 
    and r.academic_period > '200800'
    and r.requirement_deadline_date IS NULL 

      LEFT OUTER JOIN
       (SELECT *
             FROM (SELECT person_uid,academic_period, application_number, decision_date as acc_date,
                     ROW_NUMBER () OVER (PARTITION BY person_uid ORDER BY decision_date, decision_number)
                                                                          dt
                   FROM mst_admissions_decision, sxschech.CUTOFF 
                   WHERE decision_date IS NOT NULL AND decision_date <= cutoff_dt
                   and academic_period in ('200830','200840')
                   and DECISION IN ('AE', 'AC', 'CC', 'CD', 'CP', 'CX', 'DC', 'HN', 'N6', 'N7', 'N8', 
                    'PB', 'PH', 'PT', 'QA', 'TL', 'TO', 'US', 'WA')
                  )
             WHERE dt = 1 
            ) acc 
        ON  a.APPLICATION_NUMBER = acc.APPLICATION_NUMBER 
        AND a.ACADEMIC_PERIOD = acc.ACADEMIC_PERIOD 
        AND a.Academic_period in ('200830','200840')
        AND a.PERSON_UID = acc.PERSON_UID
        AND a.Curriculum_Priority = 1   

WHERE p.PERSON_UID = a.PERSON_UID
  AND a.application_date <= cutoff_dt 
  AND a.Academic_period in ('200830','200840')
      AND SUBSTR(a.PROGRAM,1,2) <> 'ND' 
      AND a.PROGRAM IS NOT NULL  
      AND a.STUDENT_LEVEL = 'UG' 
      AND a.STUDENT_POPULATION <> 'S'
      AND a.STUDENT_POPULATION = 'T'
--      AND college in ('LA', 'SC', 'BN', 'NS', 'ED')
      AND a.college = 'SC'
 
I think the problem probably lies with the joins where you have included a constant value e.g.

ON a.APPLICATION_NUMBER = acc.APPLICATION_NUMBER
AND a.ACADEMIC_PERIOD = acc.ACADEMIC_PERIOD
AND a.Academic_period in ('200830','200840')
AND a.PERSON_UID = acc.PERSON_UID
AND a.Curriculum_Priority = 1

or

ON a.person_uid = d.entity_uid
and d.rnum = 1

Move all of these out of the join conditions and either into where clauses if they are from the main tables or into the inline views if they are associated with the secondary tables.

 
Dagon,

I tried your suggestion and commented out all the academic_period and other fields that had '= 1' (curriculum_priority, dt rnum) and put them in the where clause at the bottom of the query. In running only for 'SC', the count dropped from 135 to 126 (this is not good as the total should be 135). I looked at the unmatched records and it seems related to the address query. If I take this out, I get 135, if I leave in and don't make it =1, then I get multiple records. I then did a SELECT * on everything else and added the outer join to the overall query for the address portion which yielded the correct result.

I have not seen an improvement in the time it takes to run the query. However, if I take out the Address portion (where we grab the state) altogether and run for 'SC' and comment out 'T', the time dropped by 1/2 from 30 mins to 15 mins. This still seems quite long and won't provide us one of the fields we need displayed. Also, this is only with one college. When I open it up to all the colleges, the query takes one hour to run and counts are down. Even without the address portion, SC, for example shows 76 rather than 135. I don't understand why the count per college would go down when I add more colleges.

Address portion
Code:
       ) MAIN
                   LEFT OUTER JOIN
(select entity_uid, state_province, 
 row_number() over (partition by entity_uid order by address_type, preferred_address_ind desc, address_seq_no desc) as rnum
 from mst_address
 where address_type in ('MA', 'PR')
   and address_status_ind is null                        -- active and current
   and (address_end_date is null or address_end_date > sysdate) ) d
     ON MAIN.PIDM = d.entity_uid
    and d.rnum = 1
 
What exactly is it you are counting ? Is that the number of records from mst_Admissions_Application with college = 'SC' or something else ?

You need to establish what the granularity of the query is. Start with just a query against mst_Admissions_Application. Does this give you the counts you want or do you have to join to some other tables ? Work towards the smallest query possible that gives the correct counts, then start adding the other tables to it. Each one should be joining one-to-one with the base query, so shouldn't change the counts.

I'd also avoid mixing the ANSI join syntax with the Oracle join syntax. For example, in the original query, you use LEFT OUTER for everything except MST_PERSON, which you join using Oracle syntax.
 
sx,

may I also suggest that you consider a divide and conquer approach. What I mean is to start with just the data tables and starting from the statement of requirement, and with no reference whatsoever to the existing query, build a query up piece by piece, using inner joins where possible. At each stage check that you're getting and avoiding the expected rows.

By slowly building it up and doing frequent timing tests, you can isolate those parts of the query causing trouble, and then take remedial action.

Trying to fix a 'whopper' in one go is always difficult.

This looks like a bit of sql server style sql ported to oracle - is that correct?

Regards

T
 
Hi Dagon and thargtheslayer, was out on Friday.

I ended up creating a table from mst_Admissions to get the specific population. I have rewriting the query to look at created table for the total counts. So far, using the new table as the source, is giving the correct counts for all the colleges. Also, this cut down the running time from hours to 3 mins. I still have some more tweaking to do, but looks like this will work. I was originally trying to avoid creating more tables, but since the table route seems to be providing the counts and speed, I'll continue in that direction.

The reason for the left joins was that we need the total population that mst_Admissions provides. The other tables can have more than one record per person or no records at all.

I have a mixture of join syntax, because the query is cobbled together from queries that were created by a consultant and my trying to learn Oracle after previosly working mostly in MS-Access. WHen I have a difficulty figuring out a join, I copy the query to Access and then use the graphic query editor, then copy it back to Oracle.

Sometimes when I try the (+), with Oracle, the join doesn't seem to work, but when I do the copy to Access and paste back to Oracle, the join then works.

I still wonder why, if the query is essentially the same it works to give the correct counts off the created table rather than when I run it off mst_Admissions directly.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top