Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Deleting duplicate records from cursor

Status
Not open for further replies.

TheLazyPig

Programmer
Sep 26, 2019
94
PH
Hi!

I wanted to delete only the old records but I ended up deleting both.

To get duplicate records I used having count(*) > 1 then Delete from region.

Code:
SET DELETED ON
SET SAFETY OFF

LOCAL pcPopFile,pcCurrDir

*----------------------------------
pcPopFile    = "C:\BACKUP_FILES\NETPAY"
pcCurrDir    = "C:\BACKUP_FILES\Region"
pcScrPath    = pcCurrDir+"\230308\region.dbf"
pcBckPath    = pcCurrDir+"\230308\regionbk.dbf"
pcRegion     = 'NCR' &&REGION
pcAsOf       = GOMONTH(DATE(2023,02,01),1)-.1 &&ASOF
pcBkDate     = DATE(2023,02,28)
pnCtr        = 0
*----------------------------------

IF ATC("REGION.DBF",pcScrPath) = 0
	MESSAGEBOX("Invalid File Name!",48)
	
ELSE 	
	
	USE (pcScrPath) IN 0 SHARED 
	
	SELECT region
	COUNT TO lnRegCnt
	
	WAIT WINDOW NOWAIT "Checking for duplicate records....."
	
	**To Check Duplicate Records by Region	
	SELECT * ;
	FROM region ;
	WHERE RG+DIV+STA+EMPNO+FNAME+MI+LNAME IN ( ;
	    SELECT RG+DIV+STA+EMPNO+FNAME+MI+LNAME ;
		FROM region ;
		WHERE RG = pcRegion ;
		GROUP BY 1 ;
		HAVING COUNT(*)>1 ;
	) ORDER BY EMPNO ;
	INTO CURSOR curDup READWRITE

	llWithDup = reccount("curDup") > 0

	IF llWithDup 
	   MESSAGEBOX("With Duplicate Record Found!")
	   
	   **Duplicates that are less than asof date
	   IF asof < pcAsOf
	       
	       **get count of duplicate
	       SELECT curDup	 
	       COUNT TO lnDupCnt
		   WAIT WINDOW NOWAIT "No. of Duplicate Records : " + TRANSFORM(lnDupCnt,"###,###,###.##")
		   
		   **to insert duplicate record to regionbk for history
		   WAIT WINDOW NOWAIT "....Backup Records...."
		   =regBackUp()
		   MESSAGEBOX("Successfully Inserted Records!")
		   
		   **delete duplicate records where asof less than pcAsOf 
		   [highlight #FCE94F]=deleteDuplicate()[/highlight]
		   		   
		   MESSAGEBOX("Successfully Deleted!")
		   
	   ENDIF
	ENDIF
	
	**count of all deleted records from region
	SELECT * FROM region WHERE DELETED()
	COUNT TO lnDelCnt
	WAIT WINDOW NOWAIT "No. of All Deleted Records : " + TRANSFORM(lnDelCnt,"###,###,###.##")
    

	SELECT region
	COUNT TO lnTotCnt
	
	MESSAGEBOX("Total Count : " + TRANSFORM(lnTotCnt,"###,###,###.##"))
 
ENDIF


**To backup deleted records from region to regionbk
PROCEDURE regBackUp
CLOSE ALL

  USE (pcBckPath) ALIAS RegionBk
  SELECT 0
  USE (pcScrPath) IN 0 SHARED ALIAS Region

  SELECT * ;
	FROM region ;
	WHERE RG+DIV+STA+EMPNO IN ( ;
	    SELECT RG+DIV+STA+EMPNO ;
		FROM region ;
		WHERE RG = pcRegion ;
		GROUP BY 1 ;
		HAVING COUNT(*)>1 ;
	) AND asof < pcBkDate ;
	INTO CURSOR curDup

	COUNT TO lnCurDup
	WAIT WINDOW NOWAIT "No. of Records Backup : " + TRANSFORM(lnCurDup,"###,###,###.##")

	**Insert Duplicate Records in RegionBK (old records) 
	SELECT Regionbk
	APPEND FROM DBF('curDup')
	
	WAIT WINDOW NOWAIT "Append RegionBK with Duplicate Records....."

	**Replace blank deleted date to SYSDATE
	SELECT RegionBK 
	REPLACE Deleted WITH DATE() FOR EMPTY(Deleted) IN RegionBK
	
	WAIT WINDOW NOWAIT "Updated empty deleted with system date....."

ENDPROC

**To Delete Duplicate Records
PROCEDURE [highlight #FCE94F]deleteDuplicate[/highlight]
CLOSE ALL

  USE (pcScrPath) IN 0 SHARED ALIAS Region

      SELECT * ;
	FROM region ;
	WHERE RG+DIV+STA+EMPNO IN ( ;
	    SELECT RG+DIV+STA+EMPNO ;
		FROM region ;
		WHERE RG = pcRegion ;
		GROUP BY 1 ;
		HAVING COUNT(*)>1 ;
	) AND asof < pcBkDate ;
	INTO CURSOR toDelete
	
	SELECT region
	DELETE FROM region WHERE RG+DIV+STA+EMPNO IN (SELECT * ;
	                                                FROM region ;
	                                               WHERE RG+DIV+STA+EMPNO IN ( ;
	                                                 SELECT RG+DIV+STA+EMPNO ;
		                                           FROM region ;
		                                          WHERE RG = pcRegion ;
		                                          GROUP BY 1 ;
		                                          HAVING COUNT(*)>1 ;
	                                                ) AND asof < pcBkDate)

        **get count of deleted
	SELECT toDelete
	COUNT TO lnDelCnt
	WAIT WINDOW NOWAIT "No. of Deleted Records : " + TRANSFORM(lnDelCnt,"###,###,###.##")
		   	 
ENDPROC

How can I remove only the records from selected duplicates...

Thank you!
 
LazyPig, you will no doubt get some good answers from the experts in this forum.

However, it is not at all helpful when someone posts a huge chunk of code, without any explanation of what it does or what it is meant to achieve, and then expects forum members to work through it to try to understand it. (The same is true of your post in thread184-1820806.)

The first step in solving a problem is to be able to explain the problem clearly and concisely.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
The innermost SELECT, does SELECT RG+DIV+STA+EMPNO... GROUP BY 1 HAVING COUNT(*)>1.

That identifies records which are double by repeated values in all those fields. Fine. And then you delete all of them, right. You want to keep the first record with that value combination, then you have to do that and not do your DELETE.


Chriss
 
Okay, I understand.

The program is working there's no error found. I checked the region table if duplicates were deleted correctly, I used the empno that has duplicate record but both of them is gone instead of removing only the previous date.

Sample record from table region. The highlighted row should be deleted.
Screenshot_2023-03-14_174303_blucsd.png



Below is my code for deletion.

Code:
**To Delete Duplicate Records
PROCEDURE deleteDuplicate
CLOSE ALL

  USE (pcScrPath) IN 0 SHARED ALIAS Region

      SELECT * ;
	FROM region ;
	WHERE RG+DIV+STA+EMPNO IN ( ;
	    SELECT RG+DIV+STA+EMPNO ;
		FROM region ;
		WHERE RG = pcRegion ;
		GROUP BY 1 ;
		HAVING COUNT(*)>1 ;
	) AND asof < pcBkDate ;
	INTO CURSOR toDelete
	
	SELECT region
	DELETE FROM region WHERE RG+DIV+STA+EMPNO IN (SELECT * ;
	                                                FROM region ;
	                                               WHERE RG+DIV+STA+EMPNO IN ( ;
	                                                 SELECT RG+DIV+STA+EMPNO ;
		                                           FROM region ;
		                                          WHERE RG = pcRegion ;
		                                          GROUP BY 1 ;
		                                          HAVING COUNT(*)>1 ;
	                                                ) AND asof < pcBkDate)

        **get count of deleted
	SELECT toDelete
	COUNT TO lnDelCnt
	WAIT WINDOW NOWAIT "No. of Deleted Records : " + TRANSFORM(lnDelCnt,"###,###,###.##")
		   	 
ENDPROC


Thank you!
 
Okay, you're one step further. You have to define which of the two records to keep, this isn't automatically decidable. So all your code should do is display the duplicates and let you decide which of them is relevant. You might want to sum some other data into the record you want to keep. And in some cases you might also have three records in the same grouping.

It can only be decided by code, if you give an order that identifies the one record to keep as first in that order, for example. But is there even an ordering criterion that moves the most important record to the top, always, and is all data in the second and maybe further records just waste that can go away?

Define more precisely what you want to do and you can solve it yourself. And that's very likely simpler for you than needing to explain the conditions to us without needing a long shot of explaining the meaning of your data.

Chriss
 
Hi TheLazyPig,

May I add: it is good practice to add a PKey (primary key) to all your records - it allows you to uniquely identify them.

Since this does NOT seem to be the case, and since your tables don't seem to be indexed either you'll have to locate the record to delete.

Please have a look at the code below. It works without a PKey, assumes that there are only a few duplicates and that the first located record is the one to be deleted. On larger tables it might be slowing down.

Btw, you may want to check the validating procedure in order to avoid to add duplicates.

Enjoy!

Code:
*!*	Check for duplicates - Found records are hilighted, bold and italic

PUBLIC go_Form

SET DELETED ON 

go_Form = CreateObject ("frmForm")
go_Form.Visible = .T.
go_Form.Show

READ Events
CLOSE ALL
CLEAR ALL

***********************************************************************

DEFINE CLASS frmForm As Form
  Width = 540
  MinWidth = 540
  Height = 390
  MinHeight = 390
  AutoCenter = .T.
  Caption = "Identical Code2Check = Duplicate"
  
	ADD OBJECT grdNames as Grid WITH ;
		Visible = .t., Top = 12, Left = 12, Width = 540 - 24, Height = 390 - 66, DeleteMark = .F., Anchor = 15

	ADD OBJECT cmdCancel As CommandButton WITH ;
		Width = 60, Height = 30, Left = 12, Top = 390 - 42, Caption = "Release", Anchor = 6

	ADD OBJECT cmdDelete As CommandButton WITH ;
		Width = 90, Height = 30, Left = 78, Top = 390 - 42, Caption = "Delete Dupes", Anchor = 6
		
	[highlight #FCE94F]	PROCEDURE cmdDelete.Click()
			LOCAL lcCode2Check as Character
			LOCAL ARRAY laDupes[1]

			SELECT cDrawNo + cBallNo + cCode ;
				FROM csrDemo GROUP BY 1 HAVING COUNT(cDrawNo + cBallNo + cCode) >= 2 ;
				INTO ARRAY laDupes
				
			FOR lni = 1 TO ALEN(laDupes)
				IF VARTYPE(laDupes[lni]) = "C"
					lcCode2Check = laDupes[lnI]
					SELECT csrDemo
					LOCATE FOR csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = lcCode2Check
					DELETE NEXT 1

				ENDIF
			ENDFOR 

			ThisForm.Refresh() 

		ENDPROC[/highlight] 
			
		
	PROCEDURE Load()
	
		CREATE CURSOR csrdemo (cDrawNo C(5), cBallNo C(3), cPartNo C(10), cCode C(1))
		INSERT INTO csrdemo VALUES ('8342C','130','61225-350',"N")
		INSERT INTO csrdemo VALUES ('8342C','130','61225-350',"N")
		INSERT INTO csrdemo VALUES ('8342C','130','61225-350',"N")
		INSERT INTO csrdemo VALUES ('8342D','130','61225-350',"Y")
		INSERT INTO csrdemo VALUES ('8342C','131','834X-131',"Y")
		INSERT INTO csrdemo VALUES ('8342D','131','834X-131',"Y")
		INSERT INTO csrdemo VALUES ('8342C','132','834X-132',"N")
		INSERT INTO csrdemo VALUES ('8342D','132','834X-132',"N")
		INSERT INTO csrdemo VALUES ('8342C','133','61225-362',"Y")
		INSERT INTO csrdemo VALUES ('8342D','133','61225-362',"Y")
		INSERT INTO csrdemo VALUES ('8342C','137','834X-137',"Y")
		INSERT INTO csrdemo VALUES ('8342D','137','834X-137',"Y")
		INSERT INTO csrdemo VALUES ('8342C','141','834X-141',"Y")
		INSERT INTO csrdemo VALUES ('8342D','141','834X-141',"N")
		INSERT INTO csrdemo VALUES ('8342D','141','834X-141',"N")
		INSERT INTO csrdemo VALUES ('8342C','150','67242-107',"N")
		INSERT INTO csrdemo VALUES ('8342D','150','67242-107',"N")
		INSERT INTO csrdemo VALUES ('8342C','152','8332-124',"N")
		INSERT INTO csrdemo VALUES ('8342D','152','8332-124',"Y")
		INSERT INTO csrdemo VALUES ('8342C','153','834X-153',"Y")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"Y")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"N")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"Y")
		INSERT INTO csrdemo VALUES ('8342C','154','834X-154',"Y")
		INSERT INTO csrdemo VALUES ('8342D','154','834X-154',"Y")
	ENDPROC 
	
	PROCEDURE Refresh()
	
*!*	BEGIN of code to check for duplicates		

		SELECT csrDemo.*, csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode as cCode2Check, csrDupes.lDupes ;
			FROM csrDemo ;
				JOIN (SELECT cDrawNo, cBallNo, cCode, .T. as lDupes FROM csrDemo GROUP BY 1, 2, 3 HAVING COUNT(cDrawNo + cBallNo + cCode) >= 2) csrDupes ;
					ON csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = csrDupes.cDrawNo + csrDupes.cBallNo + csrDupes.cCode;
			UNION ALL ;
				SELECT csrDemo.*, csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode as cCode2Check, csrNoDupes.lDupes ;
					FROM csrDemo ;
					JOIN (SELECT cDrawNo, cBallNo, cCode, .F. as lDupes FROM csrDemo GROUP BY 1, 2, 3 HAVING COUNT(cDrawNo + cBallNo + cCode) < 2) csrNoDupes ;
						ON csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = csrNoDupes.cDrawNo + csrNoDupes.cBallNo + csrNodupes.cCode ;			
			ORDER BY 2, 1 ;
			INTO CURSOR csrAll READWRITE 
			
*!*	END of code to check for duplicates

		WITH ThisForm.grdNames
			.ColumnCount = -1
			.RecordSource = "csrAll"
			.Column1.Header1.Caption = "DrawNo"
			.Column1.Width = 60
			.Column2.Header1.Caption = "BallNo"
			.Column2.Width = 72
			.Column3.Header1.Caption = "PartNo"
			.Column3.Width = 72
			.Column4.Header1.Caption = "in Stock"
			.Column4.Width = 60
			.Column5.Header1.Caption = "Code2Check"
			.Column5.Width = 90
			.Column6.Header1.Caption = "Duples?"
			.Column6.Width = 48
			.SetAll("DynamicFontItalic", "csrAll.lDupes", "Column")	  
			.SetAll("DynamicFontBold", "csrAll.lDupes", "Column")	  
			.SetAll("DynamicBackColor", "ICASE(csrAll.lDupes, RGB(225,0,0), ;
											MOD(ASC(RIGHT(csrAll.cDrawNo, 1)), 2) = 0, RGB(200,200,200), ;
											RGB(254,254,254))", "Column")	  	
		ENDWITH 

	ENDPROC 

	PROCEDURE cmdCancel.Click()
		CLEAR Events
		ThisForm.Release

	ENDPROC
  
	PROCEDURE Destroy()
		CLEAR Events
		ThisForm.Release

	ENDPROC
ENDDEFINE

***********************************************************************

hth

MarK
 
Thanks Mark,


I think this is a big favor of picking up the idea to show duplicates and implementing it as an example. I think you can do that, as you can do forms and logic, you just have to overccome your own name here. It's almost implied you want to keep the record with the latest asof date. MarK's remark...

MarK said:
You may want to check the validating procedure in order to avoid to add duplicates.

...is really one more important piece of advice to not get into such a bad situation. If an employee's netpay changes, it changes, why are there two records at all? It's viable, I guess for tax office/IRS to keep the history of salaries, but I'd recommend doing so in separate history tables. The other way to look at it is that it could be the intended strategy to keep current and data history in one DBF for some reason and then navigate/select currently valid data by picking the records with max asof date. Even more so, if net pay is something that changes per month and includes bonus payments, provisions, etc. As an outsider, I have no idea how to handle such data and won't give you a solution you then simply use, as it doesn't error. If the source code doesn't error, that's no proof of it being correct.

All in all, I can't overcome the urge to tell you that while being lazy can also lead to exactly the best tidy solutions that make everything easy to use and maintain, in this case, there are some things to do you can't just outsource to the forum and so you have to overcome your self given name. There is no "on-size-fits-all-cases" solution to remove duplicates of that sort, where you have to decide which one is the one record to keep. You can prevent full duplicate records or also detect them and then be quite sure if they are not expected you can get rid of any of them. If you had PKs, that again would need further looking for which record has been referenced by other tables in 1:n relationships and may cascade the need to look into more data to find out which to keep.

Technical advice on how to prevent duplicates is indexing a group of fields as one expression like your RG+DIV+STA+EMPNO expression is, with a candidate index. There is an index type in VFP, a unique index, that's a red herring in that aspect. It would still allow duplicates to be created. Just if you SET ORDER to such an index, it suppresses all but the first of them. So in your case, it "keeps" the oldest one, not the latest one.

Chriss
 
This is how I remove duplicates in a file with Swedish postal numbers:
Code:
CLOSE DATABASES
CLEAR
USE \FOXDEV\POSTNUM2 ORDER pstnr EXCLUSIVE ALIAS aktuell
INDEX ON pstnr+gata+fron+till TAG test
SELECT pstnr+gata+fron+till AS test, COUNT(*) AS ant, * FROM ALIAS() GROUP BY 1 HAVING ant >1
* This creates a CURSOR with duplicates from the DBF above
IF RECCOUNT()>0
	?RECCOUNT()
	INDEX ON test TAG test
	SELECT aktuell
	SET RELATION TO pstnr+gata+fron+till INTO Query
	DELETE FOR NOT EOF('Query')
	* The code above deletes ALL duplicate rows regardless of how many they are
	APPEND FROM DBF('Query')
	* Code above reads back only one duplicate
	* The column TEST and ANT will not be included since they're not present in the original table
	PACK && Take away deleted rows
ELSE
	* No duplicates
ENDIF
 
Dan Olson said:
* The code above deletes ALL duplicate rows regardless of how many they are
Yes, and that's what TheLazyPig doesn't want to do but keep one record.

And if I had to limit my advice to one sentence it's: Concentrate on the row you want to keep and how to find that.

Chriss
 
Hi Chris,

Thanks for the hint. You find the demo code below.

Code:
*!*	Check for duplicates - Found records are hilighted, bold and italic

PUBLIC go_Form

SET DELETED ON 

go_Form = CreateObject ("frmForm")
go_Form.Visible = .T.
go_Form.Show

READ Events
CLOSE ALL
CLEAR ALL

***********************************************************************

DEFINE CLASS frmForm As Form
  Width = 540
  MinWidth = 540
  Height = 390
  MinHeight = 390
  AutoCenter = .T.
  Themes = .F.
  Caption = "Identical Code2Check = Duplicate"
  
	ADD OBJECT grdNames as Grid WITH ;
		Visible = .t., Top = 12, Left = 12, Width = 540 - 24, Height = 390 - 66, ReadOnly = .T., DeleteMark = .F., Anchor = 15

	ADD OBJECT cmdCancel As CommandButton WITH ;
		Width = 60, Height = 30, Left = 12, Top = 390 - 42, Caption = "Release", Anchor = 6

	ADD OBJECT cmdDelete As CommandButton WITH ;
		Width = 90, Height = 30, Left = 84, Top = 390 - 42, Caption = "Delete Dupes", Anchor = 6
		
[highlight #8AE234]		PROCEDURE cmdDelete.Click()
			LOCAL lcCode2Check as Character
			LOCAL ARRAY laDupes[1]

			SELECT cDrawNo + cBallNo + cCode ;
				FROM csrDemo GROUP BY 1 HAVING COUNT(cDrawNo + cBallNo + cCode) >= 2 ;
				INTO ARRAY laDupes
				
			FOR lni = 1 TO ALEN(laDupes)
				IF VARTYPE(laDupes[lni]) = "C"
					lcCode2Check = laDupes[lnI]
					SELECT csrDemo
					= INDEXSEEK(lcCode2Check, .T., "CSRDEMO")
					SKIP 1 
					DELETE while csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = lcCode2Check

				ENDIF
			ENDFOR 

			ThisForm.ShowDupes() 

		ENDPROC 
[/highlight]
	ADD OBJECT cmdRecall As CommandButton WITH ;
		Width = 90, Height = 30, Left = 186, Top = 390 - 42, Caption = "Recall All", Anchor = 6
			
			PROCEDURE cmdRecall.Click()
				SELECT csrDemo
				Recall ALL 
				
				WITH ThisForm
					.cmdDelete.Enabled = .T.
					.ShowDupes()
				ENDWITH 
			
			ENDPROC 
		
	PROCEDURE Load()
	
		CREATE CURSOR csrdemo (cDrawNo C(5), cBallNo C(3), cPartNo C(10), cCode C(1))
		
		INSERT INTO csrdemo VALUES ('8342C','130','61225-350',"N")
		INSERT INTO csrdemo VALUES ('8342C','130','61225-350',"N")
		INSERT INTO csrdemo VALUES ('8342C','130','61225-350',"N")
		INSERT INTO csrdemo VALUES ('8342D','130','61225-350',"Y")
		INSERT INTO csrdemo VALUES ('8342C','131','834X-131',"Y")
		INSERT INTO csrdemo VALUES ('8342D','131','834X-131',"Y")
		INSERT INTO csrdemo VALUES ('8342C','132','834X-132',"N")
		INSERT INTO csrdemo VALUES ('8342D','132','834X-132',"N")
		INSERT INTO csrdemo VALUES ('8342C','133','61225-362',"Y")
		INSERT INTO csrdemo VALUES ('8342D','133','61225-362',"Y")
		INSERT INTO csrdemo VALUES ('8342C','137','834X-137',"Y")
		INSERT INTO csrdemo VALUES ('8342D','137','834X-137',"Y")
		INSERT INTO csrdemo VALUES ('8342C','141','834X-141',"Y")
		INSERT INTO csrdemo VALUES ('8342D','141','834X-141',"N")
		INSERT INTO csrdemo VALUES ('8342D','141','834X-141',"N")
		INSERT INTO csrdemo VALUES ('8342C','150','67242-107',"N")
		INSERT INTO csrdemo VALUES ('8342D','150','67242-107',"N")
		INSERT INTO csrdemo VALUES ('8342C','152','8332-124',"N")
		INSERT INTO csrdemo VALUES ('8342D','152','8332-124',"Y")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"Y")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"Y")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"Y")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"N")
		INSERT INTO csrdemo VALUES ('8342D','153','834X-153',"Y")
		INSERT INTO csrdemo VALUES ('8342C','154','834X-154',"Y")
		INSERT INTO csrdemo VALUES ('8342D','154','834X-154',"Y")
		
		INDEX on cDrawNo + cBallNo + cCode TAG xcDBCheck
		
		SET ORDER TO xcDBCheck 
		
	ENDPROC 
	
	PROCEDURE Init()
		This.ShowDupes()

	ENDPROC 
	
	PROCEDURE ShowDupes()
		LOCAL ARRAY laDupes[1]
	
*!*	BEGIN of code to check for duplicates		

		SELECT csrDemo.*, csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode as cCode2Check, csrDupes.lDupes ;
			FROM csrDemo ;
				JOIN (SELECT cDrawNo, cBallNo, cCode, .T. as lDupes FROM csrDemo GROUP BY 1, 2, 3 HAVING COUNT(cDrawNo + cBallNo + cCode) >= 2) csrDupes ;
					ON csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = csrDupes.cDrawNo + csrDupes.cBallNo + csrDupes.cCode;
			UNION ALL ;
				SELECT csrDemo.*, csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode as cCode2Check, csrNoDupes.lDupes ;
					FROM csrDemo ;
					JOIN (SELECT cDrawNo, cBallNo, cCode, .F. as lDupes FROM csrDemo GROUP BY 1, 2, 3 HAVING COUNT(cDrawNo + cBallNo + cCode) < 2) csrNoDupes ;
						ON csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = csrNoDupes.cDrawNo + csrNoDupes.cBallNo + csrNodupes.cCode ;			
			ORDER BY 2, 1 ;
			INTO CURSOR csrAll READWRITE 
			
		SELECT COUNT(lDupes) FROM csrAll WHERE lDupes INTO ARRAY laDupes
		
		IF VARTYPE(laDupes[1]) = "N" AND  laDupes[1] > 0
			ThisForm.cmdDelete.Enabled = .T.

		ELSE
			ThisForm.cmdDelete.Enabled = .F.

		ENDIF 
		
			
*!*	END of code to check for duplicates

		WITH ThisForm.grdNames
			.ColumnCount = -1
			.RecordSource = "csrAll"
			.Column1.Header1.Caption = "DrawNo"
			.Column1.Width = 60
			.Column2.Header1.Caption = "BallNo"
			.Column2.Width = 72
			.Column3.Header1.Caption = "PartNo"
			.Column3.Width = 72
			.Column4.Header1.Caption = "in Stock"
			.Column4.Width = 60
			.Column5.Header1.Caption = "Code2Check"
			.Column5.Width = 90
			.Column6.Header1.Caption = "Duples?"
			.Column6.Width = 48
			.SetAll("DynamicFontItalic", "csrAll.lDupes", "Column")	  
			.SetAll("DynamicFontBold", "csrAll.lDupes", "Column")	  
			.SetAll("DynamicBackColor", "ICASE(csrAll.lDupes, RGB(225,0,0), ;
											MOD(ASC(RIGHT(csrAll.cDrawNo, 1)), 2) = 0, RGB(200,200,200), ;
											RGB(254,254,254))", "Column")	  	
		ENDWITH 

	ENDPROC 

	PROCEDURE cmdCancel.Click()
		CLEAR Events
		ThisForm.Release

	ENDPROC
  
	PROCEDURE Destroy()
		CLEAR Events
		ThisForm.Release

	ENDPROC
ENDDEFINE

***********************************************************************

Enjoy

MarK
 
MarK,

I see what you did by SKIP 1 and DELETE WHILE. But I don't think that's the way to go, as it only sorts by group crieter, what's necessary is knowing by which further field to sort within the groups.

My strtegy would be:
A) find relevant duplicate records by just using this part of the SELECT-SQL:
Code:
SELECT * FROM region ;
	       WHERE RG+DIV+STA+EMPNO IN ( ;
	       SELECT RG+DIV+STA+EMPNO ;
		FROM region ;
		WHERE RG = pcRegion ;
		GROUP BY 1 ;
		HAVING COUNT(*)>1 ;
	       ) AND asof < pcBkDate

B) index that result in the relevant order putting the record to keep at the top. That would need an index on RG+DIV+STA+EMPNO+something to primarily sort by groups and +something being whatever is necessary to put the record you want to keep at the top of each group. And I dont lnow, what.

If that was depending on the asof date, it would need to be something in reverse order of DTOS(asof). Because at first you need something that is of character type, to be able to add it to RG+DIV+STA+EMPNO and on the other side adding DTOS(asof) wold put the most relevant record at the bottom of each group.

So, to not overcommplicate things, I'd rather handle finding the MAX(asof) within each group by
Code:
SELECT RG+DIV+STA+EMPNO, MAX(asof) as asoftokeep;
		FROM region ;
		WHERE RG = pcRegion and asof<pcBkDate;
		GROUP BY 1 ;
		HAVING COUNT(*)>1
Using that, I'd still not be sure whether that's the groups you want to determine, TheLazyPIg, as that might just fail to detect duplicates if one of the group records is before pcBkDate and one of them is after pcBkDate

So you better sort this out yourself, The LazyPig. You can, of course, take guidance from MarK on how to present this data. The strategy to locate the top of a group, then SKIP 1, and then DELETE WHILE is fine, as long as you find a way to sort the group data. And that requires an index that has more than just the expression you group by.

What's easier to sort by is using SQL ORDER BY, as that's enabling you to sort not by one expression, but by a list of fields, and you can ask to sort by ASCendin or DESCending order individually, so you could do SQL ORDER BY RG, DIV, STA, EMPNO, asof DESC.

Chriss

PS: Also notice whatever cursor you prepare to display the duplicates, you want to finally do ddeletions in the region.dbf, not in that cursor. That requires to identify the corresponding record, so what you need there is a primary key in region.dbf.
 
Hi Chris,

You're right - additional criteria are required to know which record to keep. Thanks for sharing your ideas.

Have a nice day.

MarK
 
Thank you for the replies.

I created a new table (region2) same column as the region, which I use to append the correct records from duplicates. Delete all duplicate records then PACK. After that, I append region2 to region.

I used ZAP to region2 before append so I'll only insert the current records from duplicate.

 
I wonder about all the data that's not in groups with dplicates, but I guess you've got that right, all in all.

Chriss
 
Chris Miller said:
Quote (Dan Olson)
* The code above deletes ALL duplicate rows regardless of how many they are
Yes, and that's what TheLazyPig doesn't want to do but keep one record.

But I AM keeping one record by reading it back in the next line, right?
 
Dan,

what do you mean? do you mean APPEND FROM DBF('Query')?

Well, and that makes some fields empty, doesn't it? You don't skip one record from deleting and you don't get it back by only appending the queried fields, do you?

It may work somehow in your special case, but not generally, does?

Assume I have fields 1-10, detect duplicates by using fields 3-5 for grouping, then what about the content in fields 1,2,6-10? By append from the query I don't add these values back, do I?

Aside from the fact that TheLazyPig has solved the problem, what you would get back is a record that belongs to this group, but you lost all the detail data. If you're okay with that, fine. But if you want to keep one of the records data in all fields, that's not leading to that goal, does it?

Chriss
 
My example assumes that ALL duplicates are exactly the same.
My solution won't work otherwise as you pointed out...
 
Hi,

Below the demo code for deleting duplicates. It assumes that the latest record of the duplicates should be kept. I made it a two steps approach in order to show the different aspects. However these two steps could easily be made one. Enjoy.

Code:
*!*	Check for duplicates - delete those you don't want to keep

PUBLIC go_Form

SET DELETED ON 

go_Form = CreateObject ("frmForm")
go_Form.Visible = .T.
go_Form.Show

READ Events
CLOSE ALL
CLEAR ALL

***********************************************************************

DEFINE CLASS frmForm As Form
  Width = 630
  MinWidth = 630
  Height = 420
  MinHeight = 420
  AutoCenter = .T.
  Themes = .F.
  Caption = "Identical Code2Check = Duplicate"
  
	ADD OBJECT grdNames as Grid WITH ;
		Visible = .t., Top = 12, Left = 12, Width = 630 - 24, Height = 420 - 66, ReadOnly = .T., DeleteMark = .F., Anchor = 15

	ADD OBJECT cmdCancel As CommandButton WITH ;
		Width = 90, Height = 30, Left = 12, Top = 420 - 42, Caption = "Release", Anchor = 6

		PROCEDURE cmdCancel.Click()
			CLEAR Events
			ThisForm.Release

		ENDPROC 

	ADD OBJECT cmdDelete As CommandButton WITH ;
		Width = 90, Height = 30, Left = 114, Top = 420 - 42, Caption = "Update", Anchor = 6, Enabled = .F., BackColor = RGB(0, 180, 180)
		
		PROCEDURE cmdDelete.Click()
			LOCAL liPKey as Integer
			LOCAL ARRAY laDupes2Delete[1]
			
			SELECT iPKey FROM csrAll WHERE !(lKeep) INTO ARRAY laDupes2Delete

			FOR lni = 1 TO ALEN(laDupes2Delete)
				IF VARTYPE(laDupes2Delete[lni]) = "N"
					liPKey = laDupes2Delete[lnI]
					= INDEXSEEK(liPKey, .T., "CSRDEMO")
					DELETE IN csrDemo

				ENDIF
			ENDFOR
			
			WAIT WINDOW + TRANSFORM(ALEN(laDupes2Delete)) + " records deleted" TIMEOUT .5

			WITH ThisForm
				.ShowAll() 
				.ShowGrid()
			ENDWITH 
		ENDPROC 

	ADD OBJECT cmdEdit As CommandButton WITH ;
		Width = 150, Height = 30, Left = 216, Top = 420 - 42, Caption = "Show dupes to keep", Anchor = 6
		
		PROCEDURE cmdEdit.Click()
			LOCAL ARRAY laRecords2Keep[1]
		
			SELECT cCode2Check, MAX(dDate) FROM csrAll WHERE lDupes GROUP BY 1 INTO ARRAY laRecords2Keep
			
			IF ALEN(laRecords2Keep) > 1
				FOR lnI = 1 TO ALEN(laRecords2Keep) STEP 2
					UPDATE csrAll SET lKeep = .T. WHERE cCode2Check = laRecords2Keep[lnI] AND dDate = laRecords2Keep[lnI + 1]

				ENDFOR
	
				ThisForm.cmdDelete.Enabled = .T.
	
			ENDIF 
		ENDPROC 			

	ADD OBJECT cmdRecall As CommandButton WITH ;
		Width = 90, Height = 30, Left = 378, Top = 420 - 42, Caption = "Recall All", Anchor = 6
			
		PROCEDURE cmdRecall.Click()
			SELECT csrDemo
			Recall ALL 
				
			WITH ThisForm
				.ShowAll()
				.ShowGrid()
			ENDWITH 
		ENDPROC 
		
	PROCEDURE Load()
	
		CREATE CURSOR csrdemo (iPKey I AUTOINC NEXTVALUE 100, cDrawNo C(5), cBallNo C(3), cPartNo C(10), cCode C(1), dDate D)
		
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','130','61225-351',"N", DATE() - 31)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','130','61225-350',"N", DATE() - 32)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','130','61225-352',"N", DATE() - 29)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','130','61225-350',"Y", DATE() - 28)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','131','834X-131',"Y", DATE() - 10)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','131','834X-131',"Y", DATE() - 11)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','132','834X-132',"N", DATE() - 12)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','132','834X-132',"N", DATE() - 5)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','133','61225-362',"Y", DATE() - 7)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','133','61225-362',"Y", DATE() - 13)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','137','834X-137',"Y", DATE() - 52)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','137','834X-137',"Y", DATE() - 60)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','141','834X-141',"Y", DATE() - 51)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','141','834X-141',"N", DATE() - 42)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','141','834X-141',"N", DATE() - 5)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','150','67242-107',"N", DATE() - 8)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','150','67242-107',"N", DATE() - 9)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','152','8332-124',"N", DATE() - 7)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','152','8332-124',"Y", DATE() - 17)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','153','834X-153',"Y", DATE() - 18)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','155','834X-153',"Y", DATE() - 16)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','155','834X-153',"Y", DATE() + 3)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','153','834X-153',"N", DATE() + 1)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','153','834X-153',"Y", DATE() - 29)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342C','154','834X-154',"Y", DATE() - 27)
		INSERT INTO csrdemo (cDrawNo, cBallNo, cPartNo, cCode, dDate ) VALUES ('8342D','154','834X-154',"Y", DATE() - 31)
		
		INDEX on iPKey TAG xiPKey    
		
		SET ORDER TO xiPKey 
		
	ENDPROC 
	
	PROCEDURE Init()
		WITH This
			.ShowAll()
			.ShowGrid()
		ENDWITH 
	ENDPROC 

	PROCEDURE ShowAll()
	
*!*	BEGIN of code to show all data

		SELECT csrDemo.*, csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode as cCode2Check, csrDupes.lDupes, csrDupes.lKeep ;
			FROM csrDemo ;
				JOIN (SELECT cDrawNo, cBallNo, cCode, .T. as lDupes, .F. as lKeep FROM csrDemo GROUP BY 1, 2, 3 HAVING COUNT(cDrawNo + cBallNo + cCode) >= 2) csrDupes ;
					ON csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = csrDupes.cDrawNo + csrDupes.cBallNo + csrDupes.cCode;
			UNION ALL ;
				SELECT csrDemo.*, csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode as cCode2Check, csrNoDupes.lDupes, csrNoDupes.lKeep ;
					FROM csrDemo ;
					JOIN (SELECT cDrawNo, cBallNo, cCode, .F. as lDupes, .T. as lKeep FROM csrDemo GROUP BY 1, 2, 3 HAVING COUNT(cDrawNo + cBallNo + cCode) < 2) csrNoDupes ;
						ON csrDemo.cDrawNo + csrDemo.cBallNo + csrDemo.cCode = csrNoDupes.cDrawNo + csrNoDupes.cBallNo + csrNodupes.cCode ;			
			ORDER BY 3, 2 ;
			INTO CURSOR csrAll READWRITE 
			
*!*	END of code to show all data

		WITH This
			.cmdDelete.Enabled = .F.
			.ShowGrid()
		ENDWITH 
	ENDPROC 

	PROCEDURE ShowGrid()
		WITH ThisForm.grdNames
			.ColumnCount = -1
			.RecordSource = "csrAll"
			.DeleteColumn(1)
			
			.Column2.Header1.Caption = "DrawNo"
			.Column2.Width = 60
			
			.Column3.Header1.Caption = "BallNo"
			.Column3.Width = 72
			
			.Column4.Header1.Caption = "PartNo"
			.Column4.Width = 72
			
			.Column5.Header1.Caption = "in Stock"
			.Column5.Width = 60
			
			.Column6.Header1.Caption = "Date"
			.Column6.Width = 90

			.Column7.Header1.Caption = "Code2Check"
			.Column7.Width = 90
			
			.Column8.Header1.Caption = "Duples?"
			.Column8.Width = 60
			.Column8.NewObject("chkDupes","CheckBox")
			.Column8.CurrentControl = "chkDupes"
			.Column8.chkDupes.Caption = ""
			.Column8.chkDupes.Visible = .T.
			.Column8.Sparse = .F.
			
			.Column9.Header1.Caption = "Keep?"
			.Column9.Width = 60
			.Column9.NewObject("chkKeep","CheckBox")
			.Column9.CurrentControl = "chkKeep"
			.Column9.chkKeep.Caption = ""
			.Column9.chkKeep.Visible = .T.
			.Column9.Sparse = .F.

			.SetAll("DynamicFontItalic", "csrAll.lDupes", "Column")	  
			.SetAll("DynamicFontBold", "csrAll.lDupes", "Column")	  
			.SetAll("DynamicBackColor", "ICASE(csrAll.lDupes, RGB(225,0,0), RGB(0,240,240))", "Column")  	

		ENDWITH 
	ENDPROC 

	PROCEDURE Destroy()
		CLEAR Events
		ThisForm.Release

	ENDPROC
ENDDEFINE

***********************************************************************

MarK
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top