
Append Program

PremiumL (MIS) - Nov 3, 2004

I have two tables:

Mail.dbf
Phone.dbf

I need to match the address and zip code fields in mail.dbf to phone.dbf. When I get a match, I need the program to append the phone number field (the field is called phone) from phone.dbf to mail.dbf. The address and zip code fields are called address and zip. I'm dealing with tables that have over 1 million records in them, so I was looking for a program that can be pretty quick. Thanks for all the help in advance.

Thanks,

Jarrett Chanzes
 
I tried to run that program, but it says "alias not found". What should I do? Thanks for the help.

Thanks,

Jarrett
 
* The fastest way to do that is this program.
* Note that the content of the address field needs to be consistent between the two tables, otherwise there will be no match!

Code:
close all
use mail in 0
use phone in 0

select mail
index on zip+address tag sometag  && if zip is of type char
* if zip is of type numeric:
* index on str(zip,5)+address tag sometag
set order to sometag

select phone
index on zip+address tag sometag  && if zip is of type char
* if zip is of type numeric:
* index on str(zip,5)+address tag sometag
set order to sometag

* mail.dbf must already contain a phone field to receive the value
select mail
scan
  if seek(zip+address, "PHONE")  && if zip is of type char
    * if zip is of type numeric:
    * if seek(str(zip,5)+address, "PHONE")
    replace mail.phone with phone.phone
  endif
endscan
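
(Note: the code above assumes mail.dbf already has a phone column to receive the value. If it doesn't, one would have to be added first; a minimal sketch, with the column width being a guess:)

Code:
* assumption: mail.dbf lacks a phone column; C(20) is a guessed width
use mail exclusive
alter table mail add column phone C(20)
use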

Rob.


 
None of the provided solutions are fast or organized enough. Relations will handle all details at top speed if you use them right.

Code:
clos data
erase ("temp.dbf")
use phone
* extract only the key fields and non-empty phones to a slim temp file
sort on zip,address to temp fields zip,address,phone for not empty(phone)
clos data
use temp alias source
index on zip+address tag zau unique
set order to zau
sele 0
use mail alias target
set rela to zip+address into source
* the relation positions SOURCE for each TARGET record; post where a match was found
repl phone with source.phone for not eof("SOURCE")
use
sele source
dele tag all
clos data
erase ("temp.dbf")

This simple sequence is so valuable that I have made it a generic PRG.

DO UPDATE WITH ...6 parameters...

A few DO UPDATE statements can shovel data around in a real hurry.
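
The actual PRG and its parameter list were not posted, so the following is only a hypothetical sketch of what such a generic routine might look like; all six parameter names are invented for illustration.

Code:
* update.prg -- hypothetical reconstruction of a generic "post one
* field between tables" routine; the parameter names are invented
lparameters tcSource, tcTarget, tcKeyExpr, ;
            tcFieldList, tcSrcField, tcTgtField
clos data
erase ("temp.dbf")
use (tcSource)
* keep only the key fields and the field being posted
* (COPY is used here for simplicity; the sequence above sorts first)
copy to temp fields &tcFieldList for not empty(&tcSrcField)
clos data
use temp alias source
index on &tcKeyExpr tag zau unique
set order to zau
sele 0
use (tcTarget) alias target
set rela to &tcKeyExpr into source
repl &tcTgtField with source.&tcSrcField for not eof("SOURCE")
use
sele source
dele tag all
clos data
erase ("temp.dbf")

A call for this thread's task might then look like:

Code:
do update with "phone", "mail", "zip+address", ;
   "zip,address,phone", "phone", "phone"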

>I need the program to append the phone number field

This request is confusing. I have posted the phone number from phone.dbf to mail.dbf, overwriting whatever may have been in mail.dbf in the matching records. It is not obvious what 'append' means here, but many different things can be accomplished with slight modifications to the REPLACE statement; one variation is sketched below.
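
For instance, to fill in only blank phone numbers instead of overwriting existing ones (an illustrative variation, not code from the thread):

Code:
* post only where the target's phone is still empty
repl phone with source.phone for not eof("SOURCE") and empty(phone)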
 
Severach,

Sorting to another temporary dbf AND creating an index on the fields you sorted on for sure isn't the fastest way.
It also isn't the most organized way.
It's a way of spending CPU time on something that isn't needed.

Your "so valuable" sequence looks like a very strange solution to me.

Rob.


 
>Sorting to another temporary dbf AND creating an index on the fields you sorted on

If the Fox team were a bit smarter, SORT+INDEX could have been provided as a single operation, which would make it faster without exception. Perhaps SQL subselects already do this.

>for sure isn't the fastest way.

Sorting to a temporary file is almost always faster. It could only be slower because of specific conditions of phone.dbf that neither of us knows. If phone.dbf is all of:

* well ordered on our posting key,
* has few fields other than those required for the post,
* has few empty phone numbers or other kinds of phone numbers that we don't want to post,

then creating the sorted temporary file will waste the time used to sort. If any of those are not true, then creating a sorted and indexed temporary file will be faster.

I post phone numbers between large files just as the OP wants. I used to index the original file, which is 300,000 records and 165MB, in random order with respect to the posting key, but it was much too slow. Now I perform a 10-second SELECT which extracts the 150,000 postable phone numbers and orders them on the posting key into a 4MB file.

Remembering that the keys in mail.dbf are expected to be in random order (it was not sorted on the key), it should be clear that randomly traversing 165MB of randomly ordered records via an index is extremely slow; randomly traversing a 165MB file in perfect order is better but almost as slow; and randomly traversing a 4MB file held entirely within the Fox buffer and in perfect order is super fast. Dramatically different speeds, same result.

Now my large original files are permanently read-only, because I have learned that every time it seems like I need to index them to post something, indexing a sorted temporary file ends up being significantly faster and always safer. I didn't believe it either at first, but constant work with large files quickly made it obvious that sorting before indexing provides far more gain than it costs. The OP is working on files about 5x larger than mine, and my techniques with indexed sorted files and relations will perform well long after the other techniques have become too slow to be useful.
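
The extract itself wasn't posted; using this thread's field names, that 10-second SELECT might look something like the following sketch (the source table name bigfile is invented):

Code:
* pull only the postable records, already ordered on the posting
* key, into a slim temporary table
select zip, address, phone ;
    from bigfile ;
    where not empty(phone) ;
    order by zip, address ;
    into table temp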

The reason why sorting a randomly ordered file is fast, while traversing a randomly ordered file via an index is slow, is a problem of optimization. SORT and INDEX ON are operations for which you specify the input and the output but not the method, so every optimization known for sorting can be used to produce the sorted output. Traversing a file via an index additionally specifies the method: every record must be visited one by one in key order. By clipping and sorting the file before indexing, you hand those optimizations over already done: the keys are in perfect order and the file is as small as possible. The following sequences produce the same output. The INDEX branch is permanently disabled because it will always be slower.

Code:
use unsorted
if .f.
  * permanently disabled: this branch is always slower
  index on sortkey tag sortkey
  copy to sortedfil  && super slow
  dele tag sortkey
else
  sort on sortkey to sortedfil
endif

Whether posting from a temporary file is safer depends on yet more factors we don't know. If phone.dbf is read-only (due to the read-only attribute, read-only media, or a read-only network connection), or if phone.dbf has indices on it that we can't safely delete, then it is necessary to create the temporary file whether or not it is faster.

If it is known that indexing the original file is safe and fast, then the temporary file can be avoided by removing a few lines. The OP asked for fast, and what I provided is always fast without knowing anything special about the original files.

>It also isn't the most organized way.

It is far more organized from the computer's point of view, and it is the computer that will punish you for doing it wrong.
 
Mike: Your code is biased.

The creation of the index file for the first two methods is done outside the timing for those methods. The impact on the final timings is minor (probably because of caching).

I moved the creation and deletion of the ixRel tags into each of the first two tests.

Some empirical tests (files on local hard drive):

Set relation : 0.649
Lookup : 0.753
Sort + Index : 4.189

Added 3 junk fields to tblPhone
Set relation : 0.622
Lookup : 0.704
Sort + Index : 1.952

Added 3 junk fields to tblMail
Set relation : 0.571
Lookup : 0.646
Sort + Index : 1.271

4 runs with quasi-randomized tblPhone using RAND()
Set relation : 0.657 / 0.676 / 0.665 / 0.669
Lookup : 0.751 / 0.762 / 0.755 / 0.766
Sort + Index : 1.271 / 2.168 / 1.225 / 1.214

4 runs with knrecs = 1 million
Set relation : 7.524 / 7.464 / 7.483 / 7.451
Lookup : 8.621 / 8.614 / 8.585 / 8.692
Sort + Index : 17.127 / 16.619 / 17.409 / 16.166

knrecs = 1 million, default directory on network share
Set relation : 7.536 / 7.536 / 7.457 / 7.544
Lookup : 8.645 / 8.671 / 8.722 / 8.673
Sort + Index : 41.518 / 40.087 / 40.165 / 40.079


My conclusions:
Set relation edges out Lookup.
Both have a roughly 2:1 performance advantage over Sort+Index.
Sort+Index degrades badly when using a network share.
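
The benchmark code itself isn't shown here; the following is only a minimal sketch of how such timings might be taken, assuming tables tblMail and tblPhone as described, with tblPhone's zip+address tag already set as its master order (names are illustrative):

Code:
* hypothetical timing wrapper -- not the benchmark actually run above
local lnStart
lnStart = seconds()
select tblMail
set rela to zip+address into tblPhone  && tblPhone ordered on zip+address
repl phone with tblPhone.phone for not eof("tblPhone")
set rela to
? "Set relation :", seconds() - lnStart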
 
>The production of the file via SORT or COPY TO is not necessary for this task.

It's not necessary, but it's usually faster to do what looks like extra work.

>Added 3 junk fields to tblPhone

My original file is 500 characters per record * 300,000 records, for 150MB in random order. When I extract and sort the keys plus the postable phone numbers to a temporary file, it's 150,000 records * 23 characters per record, for about 4MB in key order. At 4MB the entire file fits within the Fox cache, so having it in key order isn't that significant, and a SORT FIELDS ... isn't much slower than a COPY FIELDS ... . When the file is larger than the Fox buffer, sorting before indexing can make a substantial difference.

>I wanted to show it was acceptable that they be there in advance and not built on the fly.

As can be seen above, indices on my original files are too slow to be useful, whether created on the fly or there all the time.

>Sort+Index degrades badly when using a network share.

Your times should be far worse than average if you sort back to the network share, because all caching is defeated. Your times should be far better than average if you sort from the network share to a local file and post from that.
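
For example (paths invented for illustration), the temp file can be pulled from the share but written to and posted from the local disk:

Code:
* read phone.dbf across the network, but build the temporary file
* locally so the sort, index, and posting all run against local I/O
use \\server\share\phone.dbf
sort on zip,address to c:\temp\temp ;
    fields zip,address,phone for not empty(phone)
use c:\temp\temp alias source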

>Set relation edges it from lookup

I've never used LOOKUP() and I'm surprised that it can keep up with relations. If REPLACE can process LOOKUP() almost as fast as it processes relations, then there may be cases where LOOKUP() is faster, when the conditional can avoid calling LOOKUP() until the last possible moment.
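
For reference, a LOOKUP()-based version of the same post might look like this sketch (it assumes a zip+address tag named zau on phone.dbf; this is not code from the thread):

Code:
* LOOKUP() fetches phone.phone for the first PHONE record whose
* zau key matches zip+address, and sets FOUND("PHONE") accordingly
local lcPhone
select mail
scan
  lcPhone = lookup(phone.phone, zip+address, phone.zip, "zau")
  if found("PHONE")
    replace phone with lcPhone
  endif
endscan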
 