
Mass deletion of duplicated files 2


TreborOz

Programmer
May 7, 2002
7
AU
Our FM 5.5 Pro DB has more than 30,000 records, of which 20,000 are duplicates [ie each record appears 3 times]. Is there a simple way to mass delete 2 of the 3, leaving us with a clean, duplicate-free DB?
 
Let's see if I can describe this.
It's easier to do than to explain.
I use this technique as a standard in all my databases, protected by a password and usually accessed only by the developer for maintenance purposes.
For this workaround, you should understand global fields and know how to define fields.

!!!!Caution!!!!! Deleting records cannot be undone. Run this script to delete duplicate records on a copy of your database file only. Always keep a backup of your database before performing any script that includes steps that delete records.

1. Create a unique value for each record, e.g. a ClientID field. Some records contain the same identification number; these are the duplicate records you want to find.
If your database doesn't have a unique identification field, define a calculation field to create one from existing fields. For example, combine first name, last name, and birthday to create a unique identification for each client or person.
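Such a calculation could look something like this (field names assumed; set the result type to Text):

FirstName & LastName & DateToText( Birthday )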

2. Add two fields:
2.1. A text field called "Del". When the script finds a duplicate record, it places an "x" in this field to mark the record.
2.2. A global field called "Global", to store the unique identification while comparing records. Global should have the same data type as the ClientID field.

3. Display the "Del" field.
Create a layout that displays the Del field, or add the field to an existing layout. You'll use this layout to store the Find settings in the following step.

4. Store settings: store the Sort and Find settings for the script.

4.1. Sort: in Browse mode, choose Sort from the Mode menu. If fields appear in the Sort Order list, click Clear All. Click ClientID, choose Ascending order, click Move, and click Done.

4.2. Find: choose a layout that displays the Del field. In Find mode, type an X into the Del field and then switch to Browse mode (you don't have to perform the find request).

5. Define the script (now comes the tricky part).
In ScriptMaker, define the Find Duplicates script:

Find All
Sort [Restore, No dialog]
Go to Record/Request/Page [First]
Replace [No dialog, "Del", ""]
Set Field ["Global", ClientID]
Loop
  Go to Record/Request/Page [Exit after last, Next]
  If [Global = ClientID]
    Set Field ["Del", "X"]
  Else
    Set Field ["Global", ClientID]
  End If
End Loop
Perform Find [Restore]
When you perform the Find Duplicates script:

1. It finds all records, and then sorts them by the ClientID field, so that records with the same ClientID are grouped together.

2. Starting with the first record, the script copies the value from the ClientID field into the Global field.

3. The script goes to the next record and compares the value in ClientID with the value in Global.
3.1. If the values match, the record is a duplicate, and the script puts an X in the Del field.
3.2. If the values don't match, FM copies the ClientID value into Global and doesn't change the Del field.

4. The script repeats step 3 until it reaches the last record in the file.

5. The script finds all records with an X in the Del field, and displays the found set of the duplicate records.

Now delete the duplicate records.
To delete the duplicate records, you can manually delete the found set using the Delete All command from the Mode menu after running the script.
This way you can check the found set and stay in control of the deletion before making any mistake.
Once the script reliably finds the duplicate records, you can add the Delete All command as the last step of the script, as shown below.
This automates the whole process.
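In other words, the automated version simply ends with the find followed by the delete (step names as in FM 5; again, run this on a copy first):

Perform Find [Restore]
Delete All Records [No dialog]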

Hope this will help

Happy FM - John

 
Wow,

Thanks John! I will now go away and try it. On rereading your post, it does seem a sound solution, for which I am very grateful.

Cheers

TreborOz
 
Hi John,

Ran into problems near the end of the Script.

1. Created a new unique-value field by combining 2 alpha and one numeric field in the master file. Called it UniqueID.
2. Created 2 new fields: a] a text field called "DEL" [including the inverted commas]; b] a global field called GLOBAL.

Can view all three fields in the Master File

I believe I did all that was required for the Store Settings step for the script, but don't know how to check this at this point.

In defining the script, I have several issues:
1. NO 'Find All' option is available in my list.
2. Replace - how do I define the "" after "Del"?
3. Set Field - how do I define the "X" after "Del"?

I can't get past this point.

Am I missing something? Or just thick?

Cheers

TreborOz
 
Hi,

There can be some differences in labels between the versions of FM...

1. Find All = Show All Records.
2. Replace: click Specify, choose 'Replace with calculated result', and enter "" (the calculated result must be text).
3. Set Field: select Del, click Specify, and enter "X" (the calculated result must be text).
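In ScriptMaker the two steps should then read roughly like this (the display varies a little between versions):

Replace [No dialog, "Del", ""]
Set Field ["Del", "X"]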

If you have any trouble let me know.

HTH

John
 
John, TreborOz,
A word of caution. Identifying duplicate records where the records correspond to people is a nightmare. (I know an academic who is making a fortune identifying genuine duplicates for insurance companies and the like.)
A ClientID consisting of <Firstname & LastName & Birthdate> is usually not adequate if you are going to just delete the duplicates.

Deletion can be made reversible. I have a solution in which I export the records to be deleted before deleting them. Undo = import.
In TreborOz's case, I would use the (inadequate) ClientID definition above, export the apparent duplicates, delete them, and then use a relationship between the exported records and the retained ones to ensure the other data matches.
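As a sketch, assuming FM 5 step names, with the export order and file name stored with the Export step:

Perform Find [Restore]
Export Records [Restore, No dialog]
Delete All Records [No dialog]

The Export step would be set up to write the found set to, say, a .tab file; undo then equals Import Records from that file.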
Cheers,
Paul J.
 
Hi Paul,

yes you're right about the caution.

A ClientID consisting of <Firstname & LastName & Birthdate> is usually not adequate if you are going to just delete the duplicates.
This always needs some manual control.

It was only given as an example of using a 'unique' ID.

It depends on how the unique ID is set up and how the db is used.

One thing to consider about using IDs like ClientID etc. is to make absolutely sure you have complete control over them and never allow user entry in that field. Once a record is created via a script attached to a button, that ID should never change.

This is extremely vital when you have child records associated with them. Example: Invoice and Line Items. Each line item is associated with the invoice. If that invoice ID ever changes, it will orphan every line item on that invoice, and you'd spend considerable time trying to find the orphans and re-linking the data.

I give all of my files a PrimaryID_key, which is an auto-entered calculation into a text field. The calculation is the concatenation of the FileMaker internal RecordID, plus the current Time and the current Date.


Right( "000000" & Status(CurrentRecordID), 6 ) & "-" & Right( "000000" & Status(CurrentTime), 6 ) & "-" & Right( "000000" & Status(CurrentDate), 6 )

This generates a value something like: 007865-040474-730452

But in some apps using the internal Record ID is just not a good idea. Which puts us back to square one...

I think every developer has to search for a watertight unique-ID system for every app.

Thanks Paul for pointing that out.

John
 
Another dumb question, but...

For deduping--why not just create some sort of hashcode using client data--maybe, LN+left(FN,4)+zip or something more specific to your records--

--then create a self-join relationship, with the hashcode as the relation--then create a serial number field and a calculated field defined like this:
If( counter = selfjoinrelationshipname::counter, "unique", "duplicate" )?

Then do a find in the calculated field for all "duplicate", eyeball them via a few different sort orders--or better yet, export them--and then delete them?
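For the hashcode itself, the calculation would be something along these lines (field names assumed, adjust to your file):

LastName & Left( FirstName, 4 ) & Zip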

I do a TON of deduping, and this seems to be a fairly simple solution which has worked for my purposes--what are the drawbacks to this vs. the one described earlier, other than the difficulty of getting an adequate hashcode?

I'm fairly new to FileMaker--and I'm trying to learn the best ways to do things, so if there's a better way than the one I'm using, I'd love to have it in my toolbox!!

Thanks...
--Regina
 
In all my years of developing I learned one thing.

You almost never have THE solution, more often just A solution.
You can use a workaround in applications A and B, and it will work fine, but it may not work for app C.

If the solution works for you, in your application, and you feel fine with it, just use it.
Which doesn't mean you shouldn't switch when you find a better one.

That's part of the learning curve.
 