Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Speeding up this .prg

Status
Not open for further replies.

inteleserve

IS-IT--Management
Dec 13, 2002
75
US
Does anyone know how I can speed this up? Its just not cutting it anymore with 2,000,000 record lists. Its just a simple program that finds two records with identicle phone number fields and tags them both with "X" for deletion when completed. It would really help me out....


Thanks,

Ross


*********************************************************

close all
clear
use temp.dbf
index on phone tag phone
use temp.dbf order phone exclusive
set status off
set scoreboard off
set console off
set device to screen
STORE RECCOUNT() to Xtotal
STORE 0 TO nHowMany
DO WHILE !EOF()
nHowMany = nHowMany + 1
IF nHowMany % 1000 = 0 &&... every 1000 records
@12,20 say str((nHowMany / Xtotal)*100) + "% done deduping..."
ENDIF
store phone to p1
skip 1
store phone to p2
if p1=p2
replace state WITH "X"
skip -1
replace state WITH "X"
skip 1
endif
enddo
 

Im not sure about speed, I have nothing to test.

Code:
SELE COUNT(phone) - 1 as Dupes,phone ;
FROM myTable ;
HAVING COUNT(phone) > 1 ;
ORDER BY 2 ;
GROUP BY phone INTO cursor myCursor

Now you can loop through your table that holds the duplicates against your original table

Mike Gagnon

If you want to get the best response to a question, please check out FAQ184-2483 first.
 
SELECT * from temp GROUP BY phone

If you specifically want the most recent value, you'd need a way to 'know' which one is later, such as an date.

Brian

 
What I am looking for is to have a new list with both records that had identical phone fields to be tagged a certain way. So I could:

1. know which records were duplicated

2. if a record was found more then once, they need to be tagged a certain way so all of the 5073455611 phone numbers can be taken out.












 
inteleserve

The suggestion I made gives you a cursor of all records that are duplicates. Do you need more?

Mike Gagnon

If you want to get the best response to a question, please check out FAQ184-2483 first.
 
This code leaves you with only non-duplicates:
SELECT * from temp GROUP BY phone

To get the list of what was duplicate:
SELECT *,sum(1) as N from temp GROUP BY phone HAVING n>1
 
Hi Ross

Change your index to structural (.cdx) and save yourself the time of indexing. I think that it also helps with Rushmore if you decide to change to select.

If you have 2,000,000 records and progress is being shown as a 0 - 100% you should only change the display every 20,000 records.

store phone to p1
skip 1
if p1=phone && Why take the time to "store" to P2?
replace state WITH "X"
skip -1
replace state WITH "X"
skip 1
endif

Test this code to see how much time you can save not doing a modulus.

tnHowMany = tnHowMany + 1
IF nHowMany = 1000 &&... every 1000 records ...20,000???
nHowMany = nHowMany + 1000 && initialize nHowMany at program start
@12,20 say str((nHowMany / Xtotal)*100) + "% done
tnHowMany = 0
deduping..."
ENDIF

Regards,

Mike
 
Oops!
Typo.
IF tnHowMany = 1000

Regards,

Mike

PS Did you intend to mark both entries? For deletion, or what?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top