Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations sizbut on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

combine two tables minus duplicates

Status
Not open for further replies.

rssgcmy

Programmer
Dec 27, 2007
4
Hi all

I have two tables from two different branches that i need to combine while at the same time remove any duplicates. I used the following SQL statement:

Code:
Select * From table1 Union Select * From table2 Into Table MERGED2

However, some duplicates are not removed by virtue of having one different field value. For example:

table 1

01092009, 300.00, REC1234, HQ222

Table 2

01092009, 300.00, REC1234, BR222


The above records are identical except having different branch codes. How to compare the two records by ignoring the branch code while keeping only one as the other is a duplicate? Thanks in advance
 
Dear Dave

The actual fields in the table are many. What needs to be excluded is only one field. I am not too familiar with using SQL commands which may be peanuts to you guys. Any guidance is appreciated.
 
You might try something like this:

Code:
Select * From table1 ;
  Union Select * From table2 ;
  where Field4 NOT IN (Select Field4 from Table1);
  Into Table MERGED2

asuming Field 4 is the name of the field that might contains the duplicate values.

Note that the above is OTTOMH ("off the top of my head") and not tested at all.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk
 

Also not tested, but I would try something like this (also not tested):

Code:
SELECT * FROM Table1 INTO CURSOR Tmp1 READWRITE
ALTER TABLE Tmp1 DROP COLUMN Field4   && since you are not interested in its value anyway

SELECT * FROM Table2 INTO CURSOR Tmp2 READWRITE
ALTER TABLE Tmp2 DROP COLUMN Field4

SELECT * FROM Tmp1 UNION SELECT * FROM Tmp2 INTO TABLE Merged2
USE IN Tmp1
USE IN Tmp2

Hi Mike,

Sorry, but it looks like your query won't work, as all the records should be picked up based on all other fields' values.

Say, both tables have all the same records, but Table1 has all Field4 values equal to "A", and Table2 has all Field4 values equal to "B". In this case, Field4 NOT IN (Select Field4 from Table1 is true, and all the records in Table2 will be added - but they shouldn't. And the other way around, if all the different records in both tables which really should be merged, but they wouldn't if they all (or part of them) have the same value 'A" in Field4.

At least, that's the way I understood the task and your solution.
 
Stella,

Yes, I can see you're right (in your comments re my proposed solution). I wasn't allowing for the possibility of the duplicates being in the same table. There's no reason why they shouldn't be.

I did say it was OTTOMH.

Another idea would be to do a UNION ALL, and then do another pass to eliminate the ones that are duplicated on Field4. But you'd still have to write a list of all the fields (except Field4), which is what the questioner was trying to avoid.

But your ALTER TABLE / DROP COLUMN solution looks like a good possiblity.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk
 
Mike,

I wasn't allowing for the possibility of the duplicates being in the same table.

Frankly, I wasn't even thinking about duplicates in the same table - defining duplicates as records identical in all the fields without taking into account Field4, as we called it.

I was thinking more in the direction that the Field4 is not important for this task, and I don't know what's in it, and it could be anything at all - like, say, a logical value, which can only be .T. or .F., so it will be almost unavoidably repeated many times, whether the rest of the record has duplicates or not.

But your ALTER TABLE / DROP COLUMN solution looks like a good possiblity.

Thanks. And I cannot even test it as it is :) - VFP6 doesn't have READWRITE clause in SELECT-SQL, so I would have to rewrite it.
(I would, probably, SELECT into tables, then ERASE them after I am done - or just plain old list the fields in the original UNION.
If there are too many fields to type, I would LIST STRUCTURE to a text file, then copy it back to the program and edit.)
 
Dear Mike and Stella

Thanks for the insightful solutions. Removing the "offending" field is the probable solution since in combining the records, what is important is keeping the payment records with the exact date, amount paid, receipt number etc from being duplicated so as to have an accurate outstanding balance. The removed field can be added later after the procedure has been completed. Thanks again.
 

rssgcmy,

That raises the question, do you really need that field from the start, if you are not interested in its contents and are going to drop it and then create again later?

Is it some sort of work field/flag/indicator, etc., required by the program, or you can live without it until the point where you already done combining records from the two branches?
 
Dear Stella

I do need the field because it is the identifier for the branch concerned. The problem occurs when both branches keyed in the exact details of a payment made to update their records. This is not supposed to happen if the client follows the procedure of exchanging updated data through email at the end of the day where discrepancies can be addressed immediately. Only once and a while when they decided to exchange data, it shall become cluttered with duplicated payments and naturally they came running asking for help. This client has been using the software for more than 11 years and old habits die hard. That is why i decided to write the procedure for them to execute whenever this problem occurs. The identifier field needs to be excluded to effectively track the duplicates.
 
I'm not sure if this is resolved nor am I too clear on SQL... if you can index on your fields, index on one field that stores the same information regardless of which branch you are at. I would use field3 seeing as that looks like the record identifier. Import everything into one table as you originally posted.

I don't know how to write SQL but if you think you can translate this code written in vfp into SQL then by all means:

set order to FIELD3 &&Again, field3 seems the best candidate
go top
do while not eof()
store FIELD3 to mFIELD3
skip
do while FIELD3 = mFIELD3
delete
skip
enddo
enddo
count for deleted() to mdeleted
if mdeleted > 0
set filter to deleted()
go top
brow
endif


That code does what you need it to. I'd recommend creating a backup of whichever database you want to use it on.

On ending note I should repeat my previous statement: I don't know SQL. This was written in VFP for use on dbf3's. If that language is no good to you, atleast you have an idea of what your program needs to do.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top