Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Compare Two Tables

Status
Not open for further replies.

mrbti

Technical User
Feb 28, 2011
5
GB
I have this sql query that compares two tables (t1 to t2) and creates a new table with anything that doesn't exist in the 'site' colum of t1:


CREATE TABLE newtable
SELECT t1.site, t1.rank
FROM t1
LEFT OUTER
JOIN t2
ON t2.site = t1.site
WHERE t2.site IS NULL


This works fine but I need the sql query to compare 'site' fields in t1 and t2 and recognise 'google.com/users/test.html' or 'google.com' or ' to be the same domain therefore not outputting it to the new table.

Perhaps with regexp? Not sure how to go about this though.

Any help would be appreciated!
 
Auctually that was round the wrong way. Correct SQL query is:

SELECT t2.site, t2.rank
FROM t2
LEFT OUTER
JOIN t1
ON t1.site = t2.site
WHERE t1.site IS NULL
 
Hi

Will this query be executed frequently ? Will the table content change frequently ? I would think to introduce a separate column for the stripped down domain name.

The MySQL documentation String Functions page's User Contributions section there is a sub_domain() function posted by Balaji Devarajan. I would try that first.

Feherke.
 
Thanks for replt but I'm using LOAD DATA LOCAL INFILE to load data into a table from a .csv file.

Not sure if you can alter the data as it goes in with this.

The csv file has 1,000,000 rows.
 
Hi

mrbti said:
Not sure if you can alter the data as it goes in with this.
There is no need for that, you can add it in the next step. Given the amount of data, an index would also be helpful :
Code:
[b]load[/b] [b]data[/b] [b]infile[/b] [green][i]'whatever.csv'[/i][/green] [b]into[/b] [b]table[/b] t2[teal];[/teal]

[b]alter[/b] [b]table[/b] t2 [b]add[/b] [b]column[/b] domain [maroon]varchar[/maroon][teal]([/teal][purple]100[/purple][teal]);[/teal]

[b]create[/b] [b]index[/b] t2_domain_idx [b]on[/b] t2 (domain)[teal];[/teal]

[b]update[/b] t2 [b]set[/b] domain[teal]=[/teal]sub_domain[teal]([/teal]site[teal]);[/teal]

[b]create[/b] [b]table[/b] newtable [b]as[/b]
[b]select[/b] t2[teal].[/teal]site[teal],[/teal] t2[teal].[/teal]rank
[b]from[/b] t2
[b]left[/b] [b]join[/b] t1 [b]using[/b] [teal]([/teal]domain[teal])[/teal]
[b]where[/b] t1[teal].[/teal]site [b]is[/b] [b]null[/b][teal];[/teal]
I suppose t2 is the table with new data and t1 is the table with already processed data. t1 should have a similar domain column but it would be filled in another moment of processing.

Feherke.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top