Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

JOINS, JOINS, and more JOINS

Status
Not open for further replies.

shadedecho

Programmer
Oct 4, 2002
336
US
ok... I have developed a reputation on this site for long complicated posts (I like to think that's because I don't ask questions that are simple because I somehow figure those out on my own...) anyway, this one is no different --
NOTE: not for the faint of heart!

This question relates to a web-based timesheet system I have written for my company, and all the tables that store the information each employee enters daily. i have a hugely complex set of cross-linked tables, and I am trying to do one master query across all of those tables (using joins) to come up with a result set that has a field for each of the variables and calculated values that I need, and each record in the result set corresponds to one employee.

Here's the foundation for my question:

I have an employee table, which is inner-linked on a many-to-one basis with the SupID (supervisor's employee ID) field pointing to another record's EmpID (employee ID).

I should now mention that all other records are related to an employee record ONLY by a calculated value... by this, I mean that a "UID" is created on the fly with MD5(CONCAT(`FirstName`,`LastName`,`EmpID`)) from the employee table and that UID (32 character string) is what is stored in each timesheet or child or previous adjustment record.

Next, I have a timesheet table, which has the main record of a timesheet for a certain pay period, the unique identifier is TSID. Then, linked to the main timesheet record are records from two "child" tables, project_worked and leavetime_used, both on a many-to-one relationship to the main timesheet record (linking to the TSID).

The project_worked records are flagged with a &quot;default_holiday&quot; if they are holiday pay, and one of several other (irrevlevant) codes if not. So I have to get a total of non-holiday hours (flag <> 'default_holiday') and a total of holiday hours (flag = 'default_holiday').

The project_worked (and leavetime_used) records are stored with a &quot;date&quot; field, meaning they correspond to a particular day that an employee logged time for. overtime is calculated based on calendar weeks (sun-sat), but our pay period starts and ends in the middle of the calendar week, so overtime calculation has to &quot;look back&quot; to previous days, all the way back to the sunday preceeding the payperiod startdate, to get an accurate calendar week total. The way our pay period dates fall, it is POSSIBLE for there to be a total of 3 full calendar weeks in question that the OT calculation would need to occur on (ie, the pay period starts on a saturday, the last day of a period, and ends 16 days later, on a monday).

Also, we have this notion of &quot;previous adjustments&quot; which are logged in the current time period, but are adjusting the last day of the previous pay period. these are all stored in one table, but are flagged with a field to be &quot;project_worked&quot; or &quot;leavetime_used&quot;. the previous adjustments are line items, that each have their own primary key, but related previous adjustments (ie, adjustments for an employee in a particular pay period) all have a common &quot;documentID&quot;. also, previous adjustments are linked to a &quot;current pay period&quot; with the &quot;period_start_date&quot;.

So, the challenge is, I need to have the query use the SUM() function to sum up based on the &quot;total_time&quot; field in my child tables like project_worked, leavetime_used, and previousTS_adjustments. Also, I need a calendar week total using SUM from each of the 3 possible calendar weeks (don't worry, I programatically input the appropriate start and end dates when creating this query). This way, I can examine each week's total to see if > 40 to be displayed elsewhere as OT.

Sound complicated enough?

Here's the query I've written so far to attempt this feat...

Note, $$$___$$$ denotes a variable that I programatically build into the query string in PHP, so just assume it to be some constant.

Code:
SELECT emp.`actual_name`,
       emp.`EmpID`,
       MD5(CONCAT(emp.`FirstName`,emp.`LastName`,emp.`EmpID`)) AS 'empUID',
       emp.`SupID`,
       sup.`actual_name`,
       MD5(CONCAT(sup.`FirstName`,sup.`LastName`,sup.`EmpID`)) AS 'supUID',

       ts.`TSID`,
       SUM(ts_pw.`total_time`) AS 'ts_pw_total',
       SUM(ts_lt.`total_time`) AS 'ts_lt_total',
       SUM(ts_pw_holiday.`total_time`) AS 'ts_holiday_total',
       SUM(ts_pw_week1.`total_time`) AS 'ts_pw_week1_total',
       SUM(ts_pw_week2.`total_time`) AS 'ts_pw_week2_total',
       SUM(ts_pw_week3.`total_time`) AS 'ts_pw_week3_total',

       pa.`documentID`,
       SUM(pa_pw.`total_time`) AS 'pa_pw_total',
       SUM(pa_lt.`total_time`) AS 'pa_lt_total'


FROM `employee` emp INNER JOIN `employee` sup ON sup.`EmpID` = emp.`SupID` 
     LEFT OUTER JOIN `timesheet` ts ON ts.UID = MD5(CONCAT(emp.`FirstName`,emp.`LastName`,emp.`EmpID`))
     LEFT OUTER JOIN `project_worked` ts_pw ON ts.`TSID` = ts_pw.`TSID` AND ts_pw.`flag` <> 'default_holiday'
     LEFT OUTER JOIN `project_worked` ts_pw_holiday ON ts.`TSID` = ts_pw_holiday.`TSID` AND ts_pw_holiday.`flag` = 'default_holiday'
     LEFT OUTER JOIN `project_worked` ts_pw_week1 ON ts.`TSID` = ts_pw_week1.`TSID` AND ts_pw_week1.`flag` <> 'default_holiday' AND (TO_DAYS(ts_pw_week1.`date`) BETWEEN TO_DAYS('$$$calendar1start$$$') AND TO_DAYS('$$$calendar1end$$$'))
     LEFT OUTER JOIN `project_worked` ts_pw_week2 ON ts.`TSID` = ts_pw_week2.`TSID` AND ts_pw_week1.`flag` <> 'default_holiday' AND (TO_DAYS(ts_pw_week2.`date`) BETWEEN TO_DAYS('$$$calendar2start$$$') AND TO_DAYS('$$$calendar2end$$$'))
     LEFT OUTER JOIN `project_worked` ts_pw_week3 ON ts.`TSID` = ts_pw_week3.`TSID` AND ts_pw_week1.`flag` <> 'default_holiday' AND (TO_DAYS(ts_pw_week3.`date`) BETWEEN TO_DAYS('$$$calendar3start$$$') AND TO_DAYS('$$$calendar3end$$$'))
     LEFT OUTER JOIN `leavetime_used` ts_lt ON ts.`TSID` = ts_lt.`TSID`
     LEFT OUTER JOIN `previousTS_adjustments` pa ON pa.`UID` = MD5(CONCAT(emp.`FirstName`,emp.`LastName`,emp.`EmpID`))
     LEFT OUTER JOIN `previousTS_adjustments` pa_pw ON pa_pw.`documentID` = pa.`documentID` AND pa_pw.`type` = 'work'
     LEFT OUTER JOIN `previousTS_adjustments` pa_lt ON pa_lt.`documentID` = pa.`documentID` AND pa_lt.`type` = 'leave'

WHERE sup.`EmpID` = emp.`SupID` AND

      ts.`start_date` = '$$$periodstart$$$' AND
      ts.`end_date` = '$$$periodend$$$' AND
      ts.`status` = 'processed' AND

      pa.`status` = 'processed' AND
      pa.`period_start_date` = '2003-03-12'

GROUP BY emp.`actual_name`

so, the good news is that I don't get any syntax errors from all that stuff. bad news, it is taking INCREDIBLY too much time to execute... right now, the query has been running for nearly 10 minutes and the page just timed out. I only have about 200 employee records, and each employee has one timesheet (so far) and maybe 30 child records. This is NOT a huge data set at this point, I must be doing something incredibly inefficient for this to be occuring.

As you can see in the query, I select out some fields, and then select out some &quot;SUM&quot;s of the &quot;total_time&quot; field in the various other tables... because i need to get the total in several different fashions (holiday, non-holiday, week1, etc etc) I am joining the same table to the mix several times over, each time applying a different ON clause for that particular condition, like the date being between the start and end dates of the respective calendar week.

Can anyone help me out in where I might be going wrong with this query?
 
let me add this note, if its not apparent from my description...

the result set should have 16 fields across (see my SELECT statement) -- emp.`actual_name`,emp.`EmpID`,'EmpUID',....,'ts_pw_week1_total',...

going down the result set vertically, there will be one record for each employee, and there should be 0's (or Null's) for each SUM of a field with conditions that cause that table's JOIN to fail.

a record might look like:

John Smith 601 dlal4aow4lak4wmlklr4kmr2 .... 0 0 35 36.7 ....

 
Using the output of MD5 is not a good idea for a table index. The algorithm can map multiple inputs to the same hash value. The chances are remote, but it could be a problem one day.

In general, I recommend that you reduce some of the joins in your query by looping through employees in PHP. Want the best answers? Ask the best questions: TANSTAAFL!
 
FYI: I said in my post that the chances are remote. This is an important index in your schema -- do you want a low probability of a key collision or no probability of a key collision? Using an integer auto_increment index both reduces the chances of a key collision to zero and makes searches faster. Even if you use a bigint, you only have to make comparisons over a pair of 8-byte values. Versus 32 bytes each with an MD5 key. But hey, it's your app.


Now back to that octopus orgy of a query of yours. Chances are that you have some join in there that MySQL is interpreting as a cross join. MySQL is then trying to return some enormous number of records. Have you run that thing in a non-web-based MySQL admin app to see what it does without PHP's script runtime limits getting in the way?

Again, I recommend that you use record looping in PHP and a series of less complex queries. Instead of trying to get it all at once, create a group of simpler queries that pull out the classes of data you need. Let PHP perform the aggregation. Want the best answers? Ask the best questions: TANSTAAFL!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top