Order by question

LarrySteele · May 31, 2012

I'm reading from a DB2 table (read: not under my control) that's known to be a mess. It's an amalgamation of job assignment, position assignment, and employment status.

Since this one table contains three disparate data domains, each time they add a new row, they often add repeating data. For example, if a person receives a new job assignment, but their position and status don't change, then they'll repeat the position and status data in the new row. Same thing can happen for position changes or status changes. It's a mess.

I'm trying to pull position assignment information. It's fairly straightforward using a distinct clause - not rocket science. The problem is when an employee has two or more position records on the same date. Here's an example:

Code:

POS_DATE    |  ACTION  | POS_NAME  |  BLAH...
2005-05-16  |  JR      | ANALYST   |
2005-10-03  |  RO      | ASSOC BSA |
2005-10-03  |  RR      | ANALYST   |
2006-01-02  |  L       | ASSOC BSA |
2006-01-02  |  RO      | ASSOC BSA |
2006-09-25  |  PH      | BSA       |

In the example above, row 2 and 3 are in reverse order. Row 3 should appear before row 2. This person held an analyst position, then associate BSA - did not go back and forth between positions as suggested by the output.

The source table contains a record sequence number per employee and when using that column, records appear in the correct order. Of course, adding that essentially voids the distinct since by definition every sequence number is unique. We end up with something like this:

Code:

POS_DATE    |  ACTION  | POS_NAME  | SEQ_NBR | BLAH...
2005-05-16  |  JR      | ANALYST   | 1
2005-05-16  |  JR      | ANALYST   | 2
2005-05-16  |  JR      | ANALYST   | 3
2005-05-16  |  JR      | ANALYST   | 4
2005-10-03  |  RR      | ANALYST   | 5
2005-10-03  |  RO      | ASSOC BSA | 6
2006-01-02  |  RO      | ASSOC BSA | 7
2006-01-02  |  RO      | ASSOC BSA | 8
2006-01-02  |  RO      | ASSOC BSA | 8
2006-01-02  |  L       | ASSOC BSA | 10
2006-09-25  |  PH      | BSA       | 11

As you can see, records 2-4 are repeats of 1 (and there are others). I tried sneaking in an order by clause on sequence number without including said column in output. SQL Server was having none of that nonsense: "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."

Any suggestions on how I might be able work around this?

Thanks in advance,
Larry

bborissov · May 31, 2012

Code:

...
ORDER BY CASE WHEN POS_NAME = 'ANALIST' THEN 1
              WHEN POS_NAME = 'ASSOC BSA' THEN 2
              ....
              ELSE 99999 END, 
          POS_NAME
...

Borislav Borissov
VFP9 SP2, SQL Server 2000,2005 & 2008.

LarrySteele · May 31, 2012

Sometimes the answer just appears right after posting...

The solution is to use row_number() over. Then, partition by every column (except the sequence number)...
[tt]
row_number() over (partition by POS_DATE, ACTION, POS_NAME, BLAH... order by employee_id, SEQ_NBR) as rec_seq
[/tt]
Then query the results; filtering for records where REC_SEQ=1. Order by employee_id and SEQ_NBR and the right records are there in the right order.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Order by question

LarrySteele

Programmer

bborissov

Programmer

LarrySteele

Programmer

Similar threads

Part and Inventory Search

Sponsor