Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

question with duplicates 3

Status
Not open for further replies.

daddypost

Technical User
Oct 21, 2004
98
US
Okay, i've got a table that has duplicate rows(except one column). In other words, rows will have the same ID#(cookie), same visit_date, but different entry page.

ie.

765-432 06-07-2007 affiliate
765-432 06-07-2007 search.

I'm trying to find a way to combine these rows and make the third field a concatenation of the two entry pages,

765-432 06-07-2007 affiliate/search.


How in the world would I be able to do that? The reason I need to do this is I'm tying the iD# to the ID# associated with orders, so when I'm totaling everything up, I'm getting double what I should, but at the same time, I can't just ignore any key where there are dups, because this probably accounts for 50% of the records? Any help would be extremely beneficial. Thanks.
 
First, Daddy, if your bathtub is overflowing, turn off the water...if you don't want duplicate rows, ensure that whatever is creating them, instead looks for existing rows with the same identifying columns, then does an UPDATE to the existing row's ENTRY_PAGE column, instead of doing an entirely new INSERT.

Once you know that no new duplicates are posting to your table, you can set about resolving the "duplicates" problem with a one-time remedy script. (If you can indulge me for a bit of time this afternoon, I'll post a sample of a one-time remedy script after I resolve a pressing issue for my regular job.[smile])

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
unfortunately, this is data that is coming over from Omniture, which breaks it down this way. There is no way around the data issues. That is why I need to figure out a way to sort them out and combine mostly duplicated rows. Thanks for your help.
 
Daddy, Although the data may come across this way from Omniture, there is no law that says that you must live with it in that form. Must you retain their data feed in their form for some business/technical reason, or can you not post their feed data to a table that behaves the way you want it to behave?

Daddy said:
...when I'm totaling everything up, I'm getting double what I should
Can you not just fix your "totaling" query? What is the query you are using now?

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
here is just two examples of what I'm dealing with
1 5/1/2007 5059447697749064422-720588443029085730 Feeds
2 5/1/2007 5059447697749064422-720588443029085730 Search
3 5/1/2007 5059454251869154762-720589177468485635 Affiliate
4 5/1/2007 5059454251869154762-720589177468485635 Email

everytime I join the long key value to the same value in my purchase table(which shows $ purchased by key value) then I get two values, one for each record. So when I total, I'm getting double the amount. I can't just delete one record where there are duplicated key values, I need to combine them to show just one record with a combination of what channel the key came from (ie. feeds/search, affiliate/email) I know I don't have to keep the info in the way it was sent, however, it contains what I need. I just have to manipulate it to do what I want, and unfortunately I don't know how to do that.
 
See if this works for you, substitute your own table , column names etc.. If it does you can then join this with your ppurchase table and your duplicated totals problem will also be gone


select
big_key,
max(sys_connect_by_path(feed_type, ' ' )) ft
from (
select big_key,
feed_type,
row_number() over partition by big_key
order by feed_type) rn
from your_table
)
start with rn = 1 connect by prior rn = rn-1
and prior big_key = big_key
group by big_key
order by deptno
 
This is what I typed in:
select
visitor_id,
max(sys_connect_by_path(channel, ' ' )) ft
from (
select visitor_id,
channel,
row_number() over partition by visitor_id
order by visitor_id) rn
from nc_first_touch
)
start with rn = 1 connect by prior rn = rn-1
and prior visitor_id = visitor_id
group by visitor_id

gave me the following error:
ORA-30484; missing window specification for this function.????????
Thanks for all the help everyone.
 
Daddy,

What I recommend, then, is your creating a view that organises your feed table in the way you want. You can then use that view to join to your purchases table to give you the correct dollar amounts.

I'm posting an example of that solution, below, using your sample data from above. (Note: the "brains" of the code below [the "matrix" function) appears in the last section, but in real life, you should compile the "matrix" function prior to attempting to create your view.)

Section 1 -- Sample data:
Code:
select * from feed;

ID DT         KEY                                      CHANNEL
-- ---------- ---------------------------------------- --------
 1 5/1/2007   5059447697749064422-720588443029085730   Feeds
 2 5/1/2007   5059447697749064422-720588443029085730   Search
 3 5/1/2007   5059454251869154762-720589177468485635   Affiliate
 4 5/1/2007   5059454251869154762-720589177468485635   Email
Section 2 -- Creation of "FEED_VIEW" and query of that view:
Code:
create or replace view feed_view as
select dt,key
      ,substr(matrix('select channel from feed where dt = '''||dt
                     ||''' and key = '''||key||''''),1,60) keys
  from feed
 group by dt,key
/

View created.

select * from feed_view;

DT         KEY                                      KEYS
---------- ---------------------------------------- ---------------
5/1/2007   5059447697749064422-720588443029085730   Feeds,Search
5/1/2007   5059454251869154762-720589177468485635   Affiliate,Email
*******************************************************************
Section 3 -- The "matrix" function:
Code:
CREATE OR REPLACE FUNCTION matrix (query_in in VARCHAR2) RETURN VARCHAR2 IS
    incoming    varchar2(4000);
    hold_result varchar2(4000);
    c sys_refcursor;
Begin
    open c for query_in;
    loop
        fetch c into incoming;
        exit when c%notfound;
        hold_result := hold_result||','||incoming;
    end loop;
    return ltrim(hold_result,',');
END;
/

Function created.
The "matrix" function represents some pretty advanced SQL principles, so let us know of questions you have.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Oops, missing opening bracket before the
partition key word

select
visitor_id,
max(sys_connect_by_path(channel, ' ' )) ft
from (
select visitor_id,
channel,
row_number() over (partition by visitor_id
order by visitor_id) rn
from nc_first_touch
)
start with rn = 1 connect by prior rn = rn-1
and prior visitor_id = visitor_id
group by visitor_id
 
SantaMufasa;
That matrix really did the trick. If you have time sometime in the near future, could you give me a short explanation of how that works. specifically :
exit when c%notfound;
hold_result := hold_result||','||incoming;

I get that c is the cursor but I'm confused as how the query itself grabs both channels and combines them. Anyway, thank you so much for your help. It worked wonders and saved me a ton of time.
 
Daddy said:
...could you give me a short explanation of how that works?
People say that I don't do anything "short". <grin>


But I'll try to be brief, yet clear, in my explanation of how matrix works. Let's analyse each procedural statement in the function. Here is a restatement of the code, followed by statement-by-statement explanations:
Code:
CREATE OR REPLACE FUNCTION matrix (query_in in VARCHAR2) RETURN VARCHAR2 IS
    incoming    varchar2(4000);
    hold_result varchar2(4000);
    c sys_refcursor;
Begin
    open c for query_in;
    loop
        fetch c into incoming;
        exit when c%notfound;
        hold_result := hold_result||','||incoming;
    end loop;
    return ltrim(hold_result,',');
END;

1) open c for query_in; -- Populate a cursor named "c" with the results of a query. The query to use to populate the cursor is the query we pass into the function using the argument "query_in". (Remember, we re-execute this function for each row that results from the GROUP BY statement in the calling (outer) query. So, each row in the outer query calls this "matrix" function. The query we pass to the "matrix" function gets recoded for each calling row. Using your data, there are two grouped rows (since we GROUP BY dt,key). The first row calls "matrix", passing this query into the function:
Code:
select channel from feed
 where dt  = '5/1/2007'
   and key = '5059447697749064422-720588443029085730'
If you were to execute the above first query by hand, the results would be:
Code:
Feeds
Search
So, those two "rows" of data would reside in the cursor, "c".

2) loop...end loop; -- Execute the intervening commands until something gets you out of the loop.

3) fetch c into incoming; -- copy the first "row" of data from "c" ("Feeds") into the 4000-byte-maximum memory variable, "incoming".

4) exit when c%notfound; -- Check the current value of the automatically created, boolean (TRUE/FALSE/NULL) cursor variable, "notfound". If the "fetch" command successfully retrieved a row of data from the cursor "c", then the value of "c%notfound" is false; eventually (when we loop enough times to have read all of the row data in "c"), Oracle will "not find" data, so the value of "c%notfound" becomes TRUE. When "c%notfound" becomes TRUE, then execution passes to the command following the "END LOOP" statement.

5) hold_result := hold_result||','||incoming; -- "hold_result" starts out NULL (since we did not initialise "hold_result" to any value). The first time through, the above command stores:
Code:
,Feeds
...in "hold_result".

6) (reaches the "END LOOP" and goes back to the top of the LOOP, and executes at Step 3. At Step 5, the contents of "hold_result" become:
Code:
,Feeds,Search
The next time through the LOOP, Oracle finds no data and exits the LOOP to...

7) return ltrim(hold_result,','); -- Oracle "left trims" any commas from the beginning of "hold_result", yielding "Feeds,Search". This value RETURNs to the calling program.

Let me know if questions remain.


[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
Santa,

You never cease to amaze me.
star.gif
from me too!

Fee

The question should be [red]Is it worth trying to do?[/red] not [blue] Can it be done?[/blue]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top