Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Comparison of tables 1

Status
Not open for further replies.

sdmy

Programmer
Dec 3, 2004
12
0
0
US
Please let me know how can i compare two tables with same data in Oracle which has huge volume of data. My requirement is like this:
I have two environment ..1)Oracle Source ---informatica ETL---Oracle target and 2) Oracle Source ---Datastage ETL---Oracle target. (OS is UNIX Solaris for both)
Once the data get generated into Oracle, i need to compare the table data to find out whether the data gerneated using the ETL tools are in Sync or not. The table will have huge volume of data.
I tried with some DBcomparison tools, but it takes very long time for comparison.

One thing i can do is, i can write the table data into a flat file and do the data comparison.
The problem in this approach is, 1) the order of record may be different and so i need to the orderby based on index columns in the table itself before writing into flat file. i.e I need to make sure that the record order should be the same in both the flat files.
2) I need to skip the columns that holds the time data for the comparison, otherwise all the records will show as mismatch.

Please suggest me the ways to implement this or any new way (sql procedure/shell scripting/or any other method) to solve this issue.

All your help is really appreicated.

Thanks in Advance.
 
SDMY,

Welcome to Tek-Tips. (I noticed you joined just today.) I hope we can be helpful to you.

Oracle provides extremely powerful and fast comparison capabilities via its (Venn-diagram-style) set operators, UNION, MINUS, and INTERSECT.

To find out what rows reside in TABLE_A that do not appear in TABLE_B (leaving off the "time data" columns, and not needing to sort the rows), you would say:
Code:
SELECT col1, col2, col3,...,coln FROM TABLE_A
MINUS
SELECT col1, col2, col3,...,coln FROM TABLE_B;

The above code does a positional column-for-column comparison between the two tables.

Then to identify rows in TABLE_B that do not appear in TABLE_A, you would do just the opposite:
Code:
SELECT col1, col2, col3,...,coln FROM TABLE_B
MINUS
SELECT col1, col2, col3,...,coln FROM TABLE_A;

If you have additional questions, please let us know.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
@ 02:03 (04Dec04) UTC (aka "GMT" and "Zulu"),
@ 19:03 (03Dec04) Mountain Time
 
Thanks for your reply and it's very helpful.
Since the tables exists in a separate databases, i can create a dblink from one database and access the other by selecting the required columns and do the DBComparison.

Does the query (usage of "minus") will run fast for the huge volume of data? Any performance bottlenecks?
 
SDMY,

Conventional wisdom is that Oracle's set operators (UNION, MINUS, and INTERSECT) are the fastest processing per row of any row-processing code except for a full-table scan. Use them in good health and peace of mind.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
@ 20:52 (07Dec04) UTC (aka "GMT" and "Zulu"),
@ 13:52 (07Dec04) Mountain Time
 
I heard that using MINUS to find out the table differences is not reliable if you are comparing millions of records. If yes, is there any alternative way (Stored procedures/triggers) for doing the comparison.
Also, i want the differences to be copied into excel or text file...
Any help and suggestions are appreciated.

Thx in AdV
 
SDMY,

I have never heard of any problems with any volume of data using MINUS or any of the other set operators. If they were somehow unreliable, Oracle would be jeopardising their software's credibility...something they avoid at any costs. Any aspirsions you may have heard on this topic, I believe you can safely disregard. I would bet the the problem was a PICNIC issue: Problem In Chair Not In Ccomputer. [wink]

Then, if you wish to copy the differences to a .csv (Comma-Separated Values file for Excel) or a text file, that is not a problem with MINUS...you just formulate your SELECT using this pseudo code:
Code:
set echo off
set pagesize 0
set feedback off
set trimspool on
spool diffs.csv
SELECT col1||','||col2||','||...||colx
from 
((SELECT <table_a> MINUS <table_b>) UNION
 (SELECT <table_b> MINUS <table_a>));
spool off

Let us know your outcomes.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
@ 03:09 (22Dec04) UTC (aka "GMT" and "Zulu"),
@ 20:09 (21Dec04) Mountain Time
 
Thanks mufasa for your reply. The code works good.

The real problem that i'm facing is, i have to compare the output for around 300 tables from both the databases and each table has an average of around 50 to 60 columns. Moreover, there are 4 to 7 columns with date field, which i need to skip during the comaparison since they definitely won't match between both the tables.

Please let me know how to do this.

Thanks in advance
 
SDMY,

Unfortunately, Oracle doesn't have a set of "easier" commands when comparing 300+ tables. The "easy" commands they offer are "MINUS" and "UNION" since they relieve you from creating individual compare commands for each of the 50-60 columns in each table.

To actually compare each pair of tables, ignoring the date columns, you "SELECT <exp1>, <exp2>,...<exp60> FROM..." for each of the tables in the comparison pair. Just remember to omit the date columns that you do not want to compare.

If you don't want to code up each column, then write a "SQL-writing-SQL" script to access USER_TAB_COLUMNS and assemble the comparison scripts for you.

So the bottom line is, you need to put on your Nike's and "Just Do It".

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
@ 19:55 (22Dec04) UTC (aka "GMT" and "Zulu"),
@ 12:55 (22Dec04) Mountain Time
 
Thanks.

"If you don't want to code up each column, then write a "SQL-writing-SQL" script to access USER_TAB_COLUMNS and assemble the comparison scripts for you"

Please suggest me how to do the above.

Thanks
 
1) The date columns are the same across all the 300 tables.

Is there any way that i can exclude these columns and use the following query..
SELECT col1||','||col2||','||...||colx
from
((SELECT <table_a> MINUS <table_b>) UNION
(SELECT <table_b> MINUS <table_a>));


2) Even if i exclude the timepart of the date columns will also work.

How to implement this scenario?
Is there any Stored procedures for this???

Thanks in ADV
 
Hi,
In order to create a script for your self for each table you can simply run the following procedure.
Chnage the YTL_DIR path as per the settings of your database.

Code:
create or replace procedure myseq is
CURSOR C1 IS
 SELECT TABLE_NAME FROM USER_TABLES;

Cursor First_Table_Cur(P_TAB VARCHAR2) IS
   Select  COLUMN_NAME,
        DATA_TYPE
   From User_Tab_Columns
   Where Table_name =P_TAB;

FILEHANDLER             UTL_FILE.FILE_TYPE;
L_DIR                   VARCHAR2(200) := '/u02/testfiles';
L_SQL_TEXT  		VARCHAR2(4000):=NULL;
L_FINAL_STR 		VARCHAR2(4000):=NULL;

BEGIN
   L_DIR :='Z:\u315\admin\dbn005\utlfile\upload'
;
   FOR I IN C1 LOOP
      L_SQL_TEXT:=NULL;
      L_FINAL_STR:=NULL;
       FOR J IN First_Table_Cur(I.TABLE_NAME) LOOP
          IF J.DATA_TYPE='DATE' THEN
               NULL;
          ELSE
            IF L_SQL_TEXT IS NULL THEN
              L_SQL_TEXT:=J.COLUMN_NAME;
            ELSE
              L_SQL_TEXT:=L_SQL_TEXT||','||J.COLUMN_NAME;
            END IF;
          END IF;
      END LOOP;
      L_FINAL_STR:=' SELECT '||L_SQL_TEXT||' FROM ((SELECT '||L_SQL_TEXT||' FROM '||I.TABLE_NAME||
                   ' MINUS SELECT '||L_SQL_TEXT||' FROM '||I.TABLE_NAME||'@DBLINK)'||
                   ' UNION (SELECT '||L_SQL_TEXT||' FROM '||I.TABLE_NAME||'@DBLINK'||
                   ' MINUS SELECT '||L_SQL_TEXT||' FROM '||I.TABLE_NAME||'));';
       FILEHANDLER:=UTL_FILE.FOPEN(L_DIR,'MYSCRIPT.SQL','W');
       UTL_FILE.PUT_LINE(FILEHANDLER,L_FINAL_STR);
END LOOP;
UTL_FILE.FCLOSE(FILEHANDLER);
EXCEPTION
  WHEN OTHERS THEN
         UTL_FILE.FCLOSE(FILEHANDLER);
         DBMS_OUTPUT.PUT_LINE('AAAA');
END;
/

HTH
Regards
Himanshu
 
Hi,

Thanks for sending me the proc. I tried to use the same procedure to get the DDL's and i got struck with the following error.
ORA-01756: quoted string not properly terminated

set server output on;
create or replace procedure myseq is
CURSOR C1 IS
SELECT table_name FROM USER_TABLES;
Cursor rst_Cur(P_TAB VARCHAR2) IS
Select COLUMN_NAME, DATA_TYPE From User_Tab_Columns Where Table_name ='rst';
FILEHANDLER UTL_FILE.FILE_TYPE;
L_DIR VARCHAR2(200):= 'c:/sample';
L_SQL_TEXT VARCHAR2(4000):=NULL;
L_FINAL_STR VARCHAR2(4000):=NULL;
BEGIN
FOR I IN C1 LOOP
L_SQL_TEXT:=NULL;
L_FINAL_STR:=NULL;
FOR J IN rst_Cur(I.TABLE_NAME) LOOP
IF J.DATA_TYPE='DATE' THEN
NULL;
ELSE
IF L_SQL_TEXT IS NULL THEN
L_SQL_TEXT:=J.COLUMN_NAME;
ELSE
L_SQL_TEXT:=L_SQL_TEXT||','||J.COLUMN_NAME;
END IF;
END IF;
END LOOP;
L_FINAL_STR:=' SELECT L_SQL_TEXT FROM ((SELECT L_SQL_TEXT FROM I.rst
MINUS SELECT L_SQL_TEXT FROM I.cst@link2
UNION (SELECT L_SQL_TEXT FROM I.cst@link2
MINUS SELECT L_SQL_TEXT FROM I.rst));
FILEHANDLER:=UTL_FILE.FOPEN(L_DIR||abc.sql,'W');
UTL_FILE.PUT_LINE(FILEHANDLER,L_FINAL_STR);
END LOOP;
UTL_FILE.FCLOSE(FILEHANDLER);
EXCEPTION
WHEN OTHERS THEN
UTL_FILE.FCLOSE(FILEHANDLER);
DBMS_OUTPUT.PUT_LINE('AAAA');
END;
/

Note: I'm running the proc in windows Operating system. rst,cst are the table names with the same structure and link2 is the dblink name.

Please let me know the mistakes that i have made in the code..

Thanks
 
Hi,

Thanks for your reply.
The tool cannot be used if the database is huge and has lot of tables and the table has more columns and huge volume of data. The database that i'm having has all the above said properties...

Any other ideas???

Thanks
 
Hi sdmy ,
As you have not implemented the code the way I have written it it is giving you errors.

Your code
Code:
      L_FINAL_STR:=' SELECT L_SQL_TEXT FROM ((SELECT L_SQL_TEXT FROM I.rst
                     MINUS SELECT L_SQL_TEXT FROM I.cst@Link2 
                     UNION (SELECT L_SQL_TEXT FROM I.cst@Link2
                     MINUS SELECT L_SQL_TEXT FROM I.rst));

Should be written as
Code:
L_FINAL_STR:=' SELECT L_SQL_TEXT FROM ((SELECT'||L_SQL_TEXT||' FROM '||I.rst||
                     ' MINUS SELECT '||L_SQL_TEXT||' FROM '||I.cst||'@Link2'|| 
                     ' UNION (SELECT '||L_SQL_TEXT||' FROM '||I.cst||'@Link2'||
                     ' MINUS SELECT '||L_SQL_TEXT||' FROM '||I.rst||'))';

Also do not hardcode Table_name ='rst'

Use the varibale as mentioned in my code else this sql will write only one statement in the file as many times as the number of tables in your DB.

HTH
Regards
Himanshu
 
I think that using db_link to compare tables is quite inefficient. It seems to be even better to spool table data to flat files and compare them using some file comparing utility. But if your database is really huge...
BTW what's the business need to compare 2 huge databases? Do you need to merge their data? Isn't it better to use some kind of timestamp or any other stamp and make incremental comparison? Or add some staging area with intermediate data.

Regards, Dima
 
Thanks for your reply.
I'm getting problems in the following part of the code:
-----------------------------------------------------------
FILEHANDLER UTL_FILE.FILE_TYPE;
L_DIR VARCHAR2(200):= 'c:/sample';
L_SQL_TEXT VARCHAR2(4000):=NULL;
L_FINAL_STR VARCHAR2(4000):=NULL;
BEGIN
L_DIR :='c:/sample';
-------------------------------------
FILEHANDLER:=UTL_FILE.FOPEN(L_DIR||abc.sql,'W');
UTL_FILE.PUT_LINE(FILEHANDLER,L_FINAL_STR);
END LOOP;
UTL_FILE.FCLOSE(FILEHANDLER);
EXCEPTION
WHEN OTHERS THEN
UTL_FILE.FCLOSE(FILEHANDLER);
DBMS_OUTPUT.PUT_LINE('AAAA');
END;
------------------------------------------------------------

Answering to Dima,
The volume of the data is in millions in both the tables with the same source data. I'm migrating from one tool to another tool and i want to check whether the new tool works properly or not for the same source data.

Any suggestions/help.

 
SDMY,

I'm sure that Dima's first question will be, "When you say, 'I'm getting problems in the following part of the code...', what type of problems are you having: syntax errors, run-time errors, or logic errors?" Whatever type of error you are receiving, Dima will want to see a copy-and-paste of whatever error evidence you are receiving.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
@ 20:43 (29Dec04) UTC (aka "GMT" and "Zulu"),
@ 13:43 (29Dec04) Mountain Time
 
Again, do you need to compare the results against the same source data? Or the source data of different sources? Millions is too wide: 2 000 000, 900 000 000? For the first number I still suggest to spool them to flat files and move via network and compare using OS, not Oracle utilities. The reason is that obtainig such number of records via sql*net is quite complex task (better unload data locally, and then move archived) and comparison field-by-field and row-by-row even using optimized Oracle techniques is far more complex than simple one way byte-by-byte file comparison.

Regards, Dima
 
SDMY,
Again there is a syntax error in your code.

Instead of writing
Code:
FILEHANDLER:=UTL_FILE.FOPEN(L_DIR||abc.sql,'W');

You are supposed to write
Code:
FILEHANDLER:=UTL_FILE.FOPEN[b](L_DIR,'abc.sql','W');[/b]
Also please ensure that directory 'c:/sample' exist for the utl_file parmaeter in your DB initialization parameter file & you have read/write access to it.

HTH
Regards
Himanshu
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top