Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations sizbut on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Finding twins in pupil database 1

Status
Not open for further replies.

cjashwell

MIS
Jul 10, 2001
80
GB
I am using an Oracle 9i db but I thought I would post this question here as a) there is no specific Oracle SQL forum available, b) the Oracle developer forum looks pretty sparse, and c) I thought it might be interesting to see if/how my problem can be solved in 'pure' ANSI SQL. If I'm in the wrong place please do let me know and I'll try elsewhere.

Here is the background to my question. I have a pupil database consisting of a number of tables, each containing tens of thousands rows. I need to find TWIN pupils, and the criteria I'm using to determine a twin are:

1. Same birthdate
2. Same current address

The two tables involved are PUPIL and PUPIL_ADDRESS.

PUPIL contains a pupil_id column and various other columns such as forename, surname, birthdate, etc.

PUPIL_ADDRESS contains pupil_id (as a FK to PUPIL), address_id (as a FK to ADDRESS), main_address_ind (a column restricted to values M for Main - meaning current address - and P for Previous, meaning an old address), address_start_date (the date the address started for the record), address_end_date (the date the address ended for the record, if applicable - can be NULL).

In order to find the twins in the dataset I can run the following query:
Code:
select p.pupil_id
, p.forenames
, p.surname
, p.BIRTHDATE
, pa.address_id
from 
pupil p
, pupil_address pa
where
p.pupil_id = pa.pupil_id
and pa.main_address_ind = 'M'
and exists
(select 1
 from pupil p2, pupil_address pa2
 where p2.pupil_id = pa2.PUPIL_ID
 and pa2.main_address_ind = 'M'
 and p2.birthdate = p.birthdate
 and p2.pupil_id <> p.pupil_id
 and pa2.address_id = pa.address_id
 )
order by p.birthdate, p.surname, p.forenames

So, I'm selecting five columns from two tables, I have joined PUPIL and PUPIL_ADDRESS on the PUPIL_ID column, and I have set the condition that the MAIN_ADDRESS_IND is 'M' to restrict the addresses to current addresses only.

I have used the 'exists' condition with a sub-query to ensure that the rows returned are those where there are related records in the same table where the PUPIL_ID cols don't match but the BIRTHDATE and ADDRESS_IND cols do match (also with the same restriction on the address). I hope this makes sense - and I hope ANSI SQL allows the 'exists' condition! The query does work on Oracle, I hasten to add.

My question is this. Can I transform this query so that instead of outputting a standard list like this:

Code:
PUPIL_ID[TAB]FORENAME[TAB]SURNAME[TAB]DOB[TAB]ADDRESS_ID
106458[tab]Samatha[tab]Brown[tab]01/06/1982[tab]161583
107262[tab]Sasha[tab]Brown[tab]01/06/1982[tab]161583

I could use some kind of sub-query to have the first pupil details and the twin on the same line:

Code:
PUPIL_1_ID[TAB]PUPIL_1_FNAME[TAB]PUPIL_1_SNAME[TAB]PUPIL_2_ID[TAB]
PUPIL_2_FNAME[TAB]PUPIL_2_SNAME
106458[TAB][tab]Samanatha[tab][tab]Brown[tab][tab]107262[tab][tab]
Sasha[tab][tab]Brown

Imagining there are ten twins in the dataset, I would like the output to be five rows, with the first of the twins followed by the second on the same row.

I hope this makes sense. If anyone could give me any help with this I'd be most grateful.

Best wishes,
cj ashwell
 
Something like this ?
Code:
SELECT P1.pupil_id, P1.forenames, P1.surname
, P2.pupil_id, P2.forenames, P2.surname
  FROM pupil P1
 INNER JOIN pupil P2 ON P1.birthdate = P2.birthdate
 INNER JOIN pupil_address PA1 ON P1.pupil_id = PA1.pupil_id
 INNER JOIN pupil_address PA2 ON P2.pupil_id = PA2.pupil_id
 WHERE P1.pupil_id <> P2.pupil_id
   AND PA1.address_id = PA2.address_id
   AND PA1.main_address_ind = 'M'
   AND PA2.main_address_ind = 'M'

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Thanks for the reply, PH. Although this neatly 'pivots' the list so that the twins are on the same 'row', it still returns the complete dataset.

So I get, for example:

Code:
PUPIL_ID[tab]FORENAMES[TAB]SURNAME[TAB]PUPIL_ID_1[TAB]FORENAMES_1[TAB]SURNAME_1
106458[TAB][TAB]Samantha[TAB][TAB]Brown[TAB][TAB]107262[TAB][TAB]Sasha[TAB][TAB]Brown
107262[TAB][TAB]Sasha[TAB][TAB]Brown[TAB][TAB]106458[TAB][TAB]Samantha[TAB][TAB]Brown

What I'm after (and I think I may be looking for something illogical!!) is to exclude from the result set the twin, so that I end up with the format as above but only one 'row' in the results.

EG:
Code:
PUPIL_ID[tab]FORENAMES[TAB]SURNAME[TAB]PUPIL_ID_1[TAB]FORENAMES_1[TAB]SURNAME_1
106458[TAB][TAB]Samantha[TAB][TAB]Brown[TAB][TAB]107262[TAB][TAB]Sasha[TAB][TAB]Brown

Instead of 10 rows in the output I'd get 5.

Any ideas?
 
And what result do you except for triplets etc? (Where is the third forename found?)
 
Replace this:
WHERE P1.pupil_id <> P2.pupil_id
with this:
WHERE P1.pupil_id < P2.pupil_id

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
PH - brilliant! Exactly what I was looking for!!

I didn't think of this, but I see how it works (I think!).

Many thanks :)

(BTW, JarlH, the triplets (etc) issue is a good point, but there are so few they can be resolved manually)
 
Why assume twins will always be born on the same day? What will you resolve when twins are born on different days?

ex: 01/01/07 11:55 PM and 01/02/07 12:05 AM?

Can you use something like P1.birthdate between P2.birthdate-1 and P2.birthdate+1

?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top