Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Selecting Duplicating records 1

Status
Not open for further replies.

greatwhite1

Programmer
Nov 13, 2019
4
CA
Hi new to this site, hoping someone might be able to help me with this SQL question.

So I have a table called T_EE_EVALUATIONS that has duplicate records within in it. The table has the following structure below. I need to find the duplicate address where the house_id is not the same as it's duplicate address. Neither ID's are P keys. I tried this but it doesn't really work the way I want it to work.

Select E.eval_id , E.house_id , E.ClientCity, E.ClientAddr, E.Houseregion, T.eval_id , T.house_id, T.ClientCity , T.ClientAddr , T.Houseregion
From T_EE_EVALUATIONS E INNER JOIN
(Select eval_id , house_id,ClientCity, ClientAddr, Houseregion, COUNT(*) AS CountOf
FROM T_EE_EVALUATIONS GROUP BY eval_id , house_id,ClientCity, ClientAddr, Houseregion-- HAVING COUNT(*)>1
) T
ON E.ClientCity || E.ClientAddr || E.Houseregion = T.ClientCity || T.ClientAddr || T.Houseregion



eval_id , house_id , ClientCity, ClientAddr, Houseregion
12345 56455 Quebec 34 Anyplace Quebec
12345 23765 Quebec 34 Anyplace Quebec
12345 94213 Quebec 34 Anyplace Quebec
13456 23456 Russell 214 mystreet Ontario
13456 33456 Russell 214 mystreet Ontario
14526 14567 Quispamsis 456 Thatstreet New Brunswick
14526 13567 Quispamsis 456 Thatstreet New Brunswick
14526 16789 Quispamsis 456 Thatstreet New Brunswick
19534 19432 Halifax 23 Apple Street Nova Scotia
19534 19432 Halifax 23 Apple Street Nova Scotia
 
Code:
SELECT *
    FROM T_ee_evaluations
   WHERE (Eval_id,
          Clientcity,
          Clientaddr,
          Houseregion) IN (  SELECT Eval_id,
                                    Clientcity,
                                    Clientaddr,
                                    Houseregion
                               FROM T_ee_evaluations
                           GROUP BY Eval_id,
                                    Clientcity,
                                    Clientaddr,
                                    Houseregion
                             HAVING COUNT (*) > 1)
ORDER BY Eval_id,
         Clientcity,
         Clientaddr,
         Houseregion;

Bill
Lead Application Developer
New York State, USA
 
I wasn't able to run your script above because I keep running out of temp space and our DBA refuses to increase it.

However I did some research and can up with this using a Join it seems to work but I am just not sure as I was expecting a larger set of records, although I don't know for sure.


Select T1.eval_id, T1.house_id, T1.Clientcity || T1.Clientaddr || T1.Houseregion, T2.eval_id, T2.house_id, T2.Clientcity || T2.Clientaddr || T2.Houseregion

From T_ee_evaluations T1 join T_ee_evaluations T2

On T1.Clientcity || T1.Clientaddr || T1.Houseregion = T2.Clientcity || T2.Clientaddr || T2.Houseregion AND T1.house_id <> T2.house_id
AND T1.eval_id = T2.eval_id

order by T1.Clientcity || T1.Clientaddr || T1.Houseregion, T1.eval_id
 
Try the following. It should be much easier on your temp space

Code:
WITH
    Find_rows
    AS
        (  SELECT /*+ materialize */ 
                  Eval_id,
                  Clientcity,
                  Clientaddr,
                  Houseregion
             FROM T_ee_evaluations
         GROUP BY Eval_id,
                  Clientcity,
                  Clientaddr,
                  Houseregion
           HAVING COUNT (*) > 1)
  SELECT B.*
    FROM T_ee_evaluations B, Find_rows A
   WHERE     A.Eval_id = B.Eval_id
         AND A.Clientcity = B.Clientcity
         AND A.Clientaddr = B.Clientaddr
         AND A.Houseregion = B.Houseregion
ORDER BY B.Eval_id,
         B.Clientcity,
         B.Clientaddr,
         B.Houseregion;

Bill
Lead Application Developer
New York State, USA
 
Thanks Bill but I still can't run it.

I have never used a statement like this going to read a bit about it.

ORA-01652: unable to extend temp segment by 128 in tablespace TEMP_1
01652. 00000 - "unable to extend temp segment by %s in tablespace %s"
*Cause: Failed to allocate an extent of the required number of blocks for
a temporary segment in the tablespace indicated.
*Action: Use ALTER TABLESPACE ADD DATAFILE statement to add one or more
files to the tablespace indicated.
 
To be able to help we need some information on your current indexes, please run the following script and past the results into this thread

SELECT A.Table_owner,
Owner Index_owner,
A.Index_name,
Uniqueness,
Status,
LISTAGG (B.Column_name, ',') WITHIN GROUP (ORDER BY Column_position) Columns
FROM All_indexes A
INNER JOIN All_ind_columns B
ON (A.Index_name = B.Index_name) AND (Owner = Index_owner)
WHERE A.Table_name = 'T_EE_EVALUATIONS'
GROUP BY Owner,
A.Table_owner,
A.Index_name,
Uniqueness,
Status;

By the way your DBA is there is support you, your not there to support the DBA. If you are unable to do such a simple query your DBA either doesn't have adequate indexes on the table or their temp table space is extremely undersized. For example at my shop we have allocated 120 gig just for temp space because we have to support a retail operation of over 100 locations. Storage is cheap

Bill
Lead Application Developer
New York State, USA
 
I asked if they would increase the temp space and the reply I received is it was always enough in the past and will will need to improve my query.
 
It's sad that your DBA is not knowledgeable enough to understand that when you are crunching large data sets you need the temp space to do the work. For your DBA to say that because it was enough in the past you will never need any additional space is a sign of a DBA who hasn't been working as a DBA very long. Do you expand your temp space every single time someone runs out, of course not but to say they will NEVER increase the space no matter what the need is is not a good sign. Send me the results of the index query for the table and I might be able to figure out a way to do it. Building a standalone table for the sub select and then using the table will not work either because that's what the " /*+ materialize */ " hint does and it still blew out.

You might want to go to your boss or the DBA's boss and bring your problem up to them. They may decide that they don't the data from the query or they may inform the DBA to increase the temp space. By the way, assuming the space is available on your storage array/drives the command to expand the temp tablespace takes about 30 seconds to write and run.

Good Luck

Bill
Lead Application Developer
New York State, USA
 
Is eval_id part of the house address? If not, remove it from your subquery; it may cut down the sorting required:
Code:
... IN
(SELECT Clientcity,
       Clientaddr,
       Houseregion
  FROM T_ee_evaluations
 GROUP BY Clientcity,
          Clientaddr,
          Houseregion
  HAVING COUNT (*) > 1) ...
If you need to keep eval_id in the search for duplicates and there is no way to gain more temp space (or dropping eval_id doesn't help), you may need to create a temporary auxiliary table:
Code:
CREATE TABLE temp_table AS (SELECT 
                  Eval_id,
                  Clientcity,
                  Clientaddr,
                  Houseregion
             FROM T_ee_evaluations
         GROUP BY Eval_id,
                  Clientcity,
                  Clientaddr,
                  Houseregion
           HAVING COUNT (*) > 1)
Then you can just pick out the records from your main table that correspond to the houses in your temp table. Once that's done, you can drop the temp table. It's klugey, but it appears you are caught between a rock and an inertia-rich DBA.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top