
SP delete group by blah!!!


Dineedshelp

Technical User
Sep 27, 2002
Hi,

I need to create a stored proc that will delete duplicate records held within a table. I can get the records I want to remove, but can't seem to put the right statements in the right place to delete just those rows!!! Please help!!!!

So far:

SELECT bookinguniqueid
FROM dbo.Schedule
GROUP BY bookinguniqueid
having count (bookinguniqueid) > 1

Tried EXISTS, IN, all sorts!!!

 
Firstly, that criterion will delete ALL occurrences of any duplicated rows - are you sure that's what you want? Usually you would want to leave just one instance and get rid of the other duplicates.

--James
 
How does this work for you? This will delete every row with a duplicated bookinguniqueid - not all but one of them. Do you want to keep one row per bookinguniqueid?

Code:
-- deletes every row whose bookinguniqueid appears more than once (no row is kept)
DELETE s
FROM dbo.Schedule AS s
  INNER JOIN (SELECT bookinguniqueid
              FROM dbo.Schedule
              GROUP BY bookinguniqueid
              HAVING COUNT(bookinguniqueid) > 1) AS d
    ON s.bookinguniqueid = d.bookinguniqueid

--John [rainbow]
-----------------------------------
Behold! As a wild ass in the desert
go forth I to do my work.
--Gurnie Hallock (Dune)
 
thanks, but you're right, it deletes all the duplicate records and the original records. I need to keep the original records and just delete the duplicates.



Does that make sense???
 
How do you want to identify which duplicate to keep? A common way is if you have a date field and you want to keep the earliest:

Code:
-- date_added is just an example column; this keeps the earliest row per bookinguniqueid
DELETE s1
FROM schedule s1
WHERE date_added <> (
    SELECT MIN(date_added)
    FROM schedule
    WHERE bookinguniqueid = s1.bookinguniqueid
  )

--James
 
I don't have a date field - the only unique id is the bookinguniqueid field. It doesn't matter which record we keep, they are both the same.

Thanks for your help so far!!!
 
Are all the fields in the duplicate rows the same? If so you can do:

Code:
SELECT DISTINCT *
  INTO #temp
FROM schedule

TRUNCATE TABLE schedule

INSERT schedule
SELECT * FROM #temp

--James
 
tried the above, but I still have duplicates.............argh!!!!!!!!!!!
 
um, something's not right here

you say "the only unique id is the bookinguniqueid field"

well, it can't be unique because in your first post you are looking for multiple instances of it --

SELECT bookinguniqueid
FROM dbo.Schedule
GROUP BY bookinguniqueid
having count (bookinguniqueid) > 1


rudy | r937.com | Ask the Expert | Premium SQL Articles
SQL for Database-Driven Web Sites (next course starts May 8 2005)
 
rudy, I'm assuming that the bookinguniqueid should be the unique id but isn't, which is why Dineedshelp is trying to remove the duplicates.

Dineedshelp, if my last query didn't work you must have some field which differs between rows with the same bookinguniqueid. Tell us what that is and we can provide you with a query that should work.

--James
 
If you don't have anything unique amongst these records, may I suggest adding another column with an identity value? Temporarily at least. That way it'll fill in unique values automatically and you can do your filter on the min or max of that value.
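For example (a rough sketch only - rowid is just a made-up name for the temporary column, and this assumes nothing else is writing to the table while the script runs):

Code:
-- add a temporary identity column so every row gets a unique number
ALTER TABLE dbo.Schedule ADD rowid INT IDENTITY(1,1)
GO
-- keep the lowest rowid per bookinguniqueid and delete the rest
DELETE s
FROM dbo.Schedule AS s
WHERE s.rowid > (SELECT MIN(s2.rowid)
                 FROM dbo.Schedule AS s2
                 WHERE s2.bookinguniqueid = s.bookinguniqueid)
GO
-- drop the helper column once the duplicates are gone
ALTER TABLE dbo.Schedule DROP COLUMN rowid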



Catadmin - MCDBA, MCSA
Beware the error of pre-emptive poultry inventory!
 
Two scenarios:

a) suppose the "duplicates" (same booking ID) are caused by sloppy DB/app design and concurrency issues (two users attempt to SELECT MAX() the last ID at the same time). Both records contain valid data; the only problem is the duplicated wannabe primary key.

b) suppose the data is filled in by a web site/application. One of the columns is DateInserted, default GETDATE(), blah. The user double-clicks the submit button or re-POSTs the data due to World Wide Wait hiccups. One of the rows is duplicated, but DISTINCT/temp table won't eliminate it because the DateInserted values differ.

In other words: checking for distinct bookingIDs is too narrow, DISTINCT * is too wide.

The obvious course of action is to choose fewer columns "somewhere between" for the DISTINCT/GROUP BY checks and then rely on an identity column to remove everything except one row per group (rough sketch below). Which columns exactly, we can only guess without knowing the table structure and possibly some sample data.

Even after all that, dupes from a) would still require manual correction (a new unique booking ID).
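For illustration only - this assumes a temporary identity column like the one Catadmin suggested (called rowid here), and the GROUP BY list is only a guess at what really defines a duplicate:

Code:
-- keep one row (the lowest rowid) per group and delete the rest;
-- the GROUP BY list below is a placeholder, not the definitive set of columns
DELETE s
FROM dbo.Schedule AS s
WHERE s.rowid NOT IN (SELECT MIN(s2.rowid)
                      FROM dbo.Schedule AS s2
                      -- group on whichever columns really define "the same booking"
                      GROUP BY s2.bookinguniqueid)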

------
heisenbug: A bug that disappears or alters its behavior when one attempts to probe or isolate it
schroedinbug: A bug that doesn't appear until someone reads source code and realizes it never should have worked, at which point the program promptly stops working for everybody until fixed.
 
Hi All,

Thanks for the help so far, but even after a weekend of trying everything I have only come up with the following, and am hoping you might point me in the right direction!!!

Code:
SELECT DISTINCT bookinguniqueid, ApexID, BookingReference, Location_Ref, --Payroll_No,
       Schedule_Date, Start_Time, Finish_Time, Cust_Order_No, Category, Staff_Grade, Notes,
       TransportRequired, Mileage_Rate_ID, tempIT_Status, Shift_Type, Booking_Status,
       TransportChargeable, PayStaff, WardReference
INTO #temp
FROM schedule


Now the only field that causes a problem - the one I have commented out - is Payroll_No; with all the other fields the correct number of records comes over.

The payroll no, as you may have guessed, is assigned to each member of staff; some staff have one, some don't.

Maybe it's me, but can anyone think why this could be causing a problem?
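I'm guessing the duplicate rows have different Payroll_No values (e.g. one filled in and one NULL), so DISTINCT treats them as separate rows? Something like this should show whether that's the case:

Code:
-- list the bookings that still have more than one row and how their Payroll_No values vary
SELECT bookinguniqueid,
       COUNT(*) AS rows_for_booking,
       COUNT(DISTINCT Payroll_No) AS distinct_payroll_nos,
       SUM(CASE WHEN Payroll_No IS NULL THEN 1 ELSE 0 END) AS null_payroll_nos
FROM dbo.Schedule
GROUP BY bookinguniqueid
HAVING COUNT(*) > 1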

Thanks for your help so far!!!
 
Just a thought - could I export these records to another table, then find out the payroll number from a separate table using the apexid? I know this can be done, but my skills in SQL don't stretch that far!!!!
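Something along these lines might do it (a sketch only - dbo.Staff and its ApexID/Payroll_No columns are guessed names for wherever the payroll numbers actually live):

Code:
-- sketch only: dbo.Staff is an assumed name for the separate payroll table
SELECT DISTINCT
       s.bookinguniqueid, s.ApexID, s.BookingReference, s.Location_Ref,
       st.Payroll_No,    -- looked up via ApexID instead of taken from Schedule
       s.Schedule_Date, s.Start_Time, s.Finish_Time, s.Cust_Order_No, s.Category,
       s.Staff_Grade, s.Notes, s.TransportRequired, s.Mileage_Rate_ID, s.tempIT_Status,
       s.Shift_Type, s.Booking_Status, s.TransportChargeable, s.PayStaff, s.WardReference
INTO #temp
FROM dbo.Schedule AS s
  LEFT JOIN dbo.Staff AS st
    ON st.ApexID = s.ApexID

The Schedule table could then be reloaded from #temp with the same TRUNCATE/INSERT steps as in James's earlier example.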
 