Breaking ties when using max(date)

pezzypezpez · Jul 24, 2006

I am trying to find information on how DB2 (and Informix, since we are switching from Informix to DB2) breaks ties and returns data using the following type of select statement:

select max(date(claim_status_hist.claim_status_dt))
from claim_status_hist
where claim_status_cd = 1;

The specific is that although in this example claim_status_dt is of data type TIMESTAMP, using the date function ignores the time portion of the timestamp and only reads out the date. If you have two rows of claim_status_cd = 1 which have the same date and only differ by time, how does DB2 decide which one to return when using the MAX function before the DATE function?

I understand that we should be using max(timestamp) rather than max(date), but we are dealing with legacy code here. DB2 (and informix) both return a single date, but we are unable to figure out how each system decides which row to return. Informix does it one way, and DB2 seems to do it another way. Any help with this would be greatly appreciated.

Thanks!

blom0344 · Jul 25, 2006

1.What makes you think that DB2 would not look - internally - to the time portion?
2.The aggregate function returns 1 value. Why would you want to know from which record this originates if both the records yield this value?

Ties Blom

pezzypezpez · Jul 25, 2006

Thanks for the quick response.
These are both very good questions, and I hope I can answer them.

First, I have been told by the DB2 analysts I have working on this say that this is not the case. They say the date function returns a set of values and then DB2 picks the max one, but if there is a duplicate value they don't know why it picks one duplicate over the other. I have not verified this myself since my role on this project is project manager, not DBA. I am about to load DB2 on my laptop myself to try to verify this independently. I submitted this question in hopes that the answer was out there and well known.

Second, the reason it is important to know which record this originates is the following:

Lets say you have a claim with a claim status code and a date on which that claim status code was entered. Right now, if the claim has its status changed on the same day we are unable to determine which is the actual latest claim status, because the code currently being used is max(date) rather than max(timestamp). DB2 returns one of the claim status codes, but we can't figure out why it picks one over the other. Informix does the same thing, but Informix does not appear to pick the same one. The databases appear to be using different tie breaker criteria.

The reason we are hoping to get this information is because the effort to change the code to max(timestamp) is too great for us to take on right now due to a pending deadline. If we can better understand the criteria by which each system chooses one max date over another, it would be very helpful.

Thank you in advance.

MarcLodge · Jul 25, 2006

Hi,

I am assuming that you are not just picking up the date of the claim, but other info as well. If that is the case, you would be better off using the following type of SQL:

select date(a.claim_status_dt)
from claim_status_hist a
where a.claim_status_cd = 1
and a.claim_status_dt = (select max(b.claim_status_dt)
from claim_status_Hist b
wher b.claim_status_cd = a.claim_status_cd);

If you have any other pertinent key fields they should be included in the link in order to make a good join.

The select max will then work on the latest timestamp, so you will pick up the correct row, but only display/return the date.

Hope this helps.

Marc

blom0344 · Jul 25, 2006

I think Marc came up with an elegant solution. Should be in order for at least a big star if it does what you need.

If it is true what you are saying about the time portion not being considered we may need to find someone close to the core to get to the bottom of this.

Should add a nice alinea to Graeme Birchall's standard reference for DB2

Please let us know if you could get a definite result from your laptop!

Ties Blom

hagelnx · Aug 14, 2006

I had a very similar problem, loading records from files.

Large numbers of records loaded, to be converted to accounts. A particular "card number" had an expiration date (timestamp), there were two other identifiers involved, call them sid and eid, and a primary key.

Code:

SELECT 
  i.record_id
FROM (
    SELECT 
      h.RECORD_ID
     ,row_number() OVER 
        (PARTITION BY h.sid, h.eid order by expiration_date desc) as r
    FROM  
      RECORD h
    WHERE 
      VALIDATED = 1 
     ) i
        where i.r = 1;

This gives me a collection of record id values, "unique-ified". I happen to know that the timestamp is created from a record in the text form "19720101" as "YYYYMMDD", so there's no HHMMSS business getting in there.

The point is, I pick off the first one in each ranking. When I was using an inner query with MAX(expiration_date) I would get more than one id for each sid:eid business key where the ties occurred.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Breaking ties when using max(date)

pezzypezpez

IS-IT--Management

blom0344

Technical User

pezzypezpez

IS-IT--Management

MarcLodge

Programmer

blom0344

Technical User

hagelnx

Programmer

Similar threads

Part and Inventory Search

Sponsor