Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL SELECT Statement Needed

Status
Not open for further replies.

deepgrewal22

Instructor
May 2, 2005
108
I have a table with about 800,000 records and the records have attributes as such:

PLANT_ID, PROJ_NUM, START_DATE, END_DATE

The problem that I am having is that some plants have more than one entry where the only difference is the dates. This is unnecessarily increasing the amount of records within this table.

*Unfortunately, the primary key is a combination of ALL of the fields. Therefore, no 2 records can be exactly the same at anytime.

Code:
SELECT * FROM acct_to_proj;

PLANT_ID   PROJ_NUM   START_DATE    END_DATE                  
---------- ---------- ------------- -------------
PLANT1     03423200   22-MAR-2007   11-MAY-2007                 
PLANT1     03423200   12-MAY-2007   02-JAN-2100                 
PLANT2     03423200   22-MAR-2007   11-MAY-2007                 
PLANT2     03423200   12-MAY-2007   02-JAN-2100                 
PLANT3     03423200   22-MAR-2007   02-JAN-2100                 
PLANT4     03423200   22-MAR-2007   11-MAY-2007                 
PLANT4     03423200   12-MAY-2007   02-JAN-2100                 
PLANT5     03423200   22-MAR-2007   02-JAN-2100                 
PLANT1     03423272   20-JUN-2000   31-OCT-2001                 
PLANT1     03423272   01-NOV-2001   15-AUG-2012                 
PLANT2     03423272   20-JUN-2000   31-OCT-2001                 
PLANT2     03423272   01-NOV-2001   15-AUG-2012                 
PLANT3     03423272   20-JUN-2000   31-OCT-2001                 
PLANT4     03423272   01-NOV-2001   15-AUG-2012                 
PLANT4     03423272   20-JUN-2000   31-OCT-2001                 
PLANT5     03423272   01-NOV-2001   15-AUG-2012

QUESTION:
----------
What would the query syntax be to only select those records whose project numbers (PROJ_NUM) appear twice for the same plant (PLANT_ID)?



Deep Grewal
Don't say the M word; you sound ignorant.
 
Try
Code:
SELECT * FROM my_table
WHERE (plant_id, proj_num) IN 
(SELECT plant_id, proj_num FROM my_table
 GROUP BY plant_id, proj_num
 HAVING count(*) = 2);
 
Carp,

Thanx for the reply. I tried that coding and it didn't work. Identical project numbers were not listed next to each other as records. Also the grouping seems to repeat. Plant 1 appears multiple times in groups along with the other plants. Meaning that Plant 1 is grouped for about 20-30 rows, then other plants are grouped for about 20-30 rows, and the grouping repeat. So further on down in the data set, Plant 1 appears again grouped for about 20-30 records.

Deep Grewal
Don't say the M word; you sound ignorant.
 
Your requirement didn't specify they should be right next to each other. That can be easily remedied:

Code:
SELECT * FROM my_table
WHERE (plant_id, proj_num) IN 
(SELECT plant_id, proj_num FROM my_table
 GROUP BY plant_id, proj_num
 HAVING count(*) = 2)
ORDER BY plant_id, proj_num, start_date;

The other possible problem I see with your requirements is that you only wanted the project numbers that appear in the same plant twice. Does this mean if the same project appears three or four times, you don't care? If you actually meant to say "more than once", then replace
"HAVING count(*) = 2" with "HAVING count(*) > 1".
If this still doesn't work for you, let us know and we'll see what we can do.
 
Carp,

Brilliant! That worked out very well. I should have been more explicit in my original post. Some PLANT_ID || PROJ_NUM have 2 sets of dates as opposed to the ideal 1 set. My goal is to identify all PLANT_ID || PROJ_NUM that have 2 sets of dates and then (somehow, any ideas?) remove the second record while adjusting the dates of the first record.

THIS:
Code:
PLANT_ID   PROJ_NUM   START_DATE    END_DATE                  
---------- ---------- ------------- -------------
PLANT1     03423200   22-MAR-2007   11-MAY-2007                 
PLANT1     03423200   12-MAY-2007   02-JAN-2100

SHOULD CONSOLIDATE TO THIS:
Code:
PLANT_ID   PROJ_NUM   START_DATE    END_DATE                  
---------- ---------- ------------- -------------
PLANT1     03423200   22-MAR-2007   02-JAN-2100


Deep Grewal
Don't say the M word; you sound ignorant.
 
If everything else in the rows are identical, then you should be able to get pretty close to what you want with something along the lines of
Code:
SELECT plant_id, proj_num, <other identical columns>,
MIN(start_date), MAX(end_date)
FROM my_table
GROUP BY plant_id, proj_num, <other identical columns>;
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top