Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Comparing data within the same table

Status
Not open for further replies.

luvmy2Js

IS-IT--Management
Sep 1, 2010
43
US
I have a table that a set of data from 2 sources. The difference is the employeeid has letters in the front when coming from table b:

employee id [Pay Group ID]
WD506197 US01
506197 USA-1

Below is the code I have so far, but I am not sure how to distinguish the tables and if this is going to work:

Select
a.[employee id], a.[Pay Group ID],
'Analysis' = Case when a.[Pay Group ID] = b.[Pay Group ID] then ''
when a.[Pay Group ID] <> b.[Pay Group ID]
then 'MisMatch' end
from whse_AssignEmployeetoPayGroup a
inner join whse_AssignEmployeetoPayGroup b on b.[employee id] = a.[employee id]


I want something like this as the results:

employee id [Pay Group ID]
WD506197 'MisMatch'


Any help is greatly appreciated!
 
Is it always 'WD', or can the 2 letters be different? Is it always 2 letters? Does one table have the 2 letters and the other doesn't, or is there a mixture within the tables?


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
It is always WD and it is a mixture within the same table.

Thanks,
 
I totally missed the fact that you were doing a self join, so it's all the same table.

I suppose the easiest thing would be to replace WD with an empty string for your comparisons. Something like this:

Code:
Select 
a.[employee id], a.[Pay Group ID],
'Analysis' = Case when a.[Pay Group ID] = b.[Pay Group ID] then ''
     when a.[Pay Group ID] <> b.[Pay Group ID]
     then 'MisMatch' end 
from whse_AssignEmployeetoPayGroup a
inner join whse_AssignEmployeetoPayGroup b on [!]Replace([/!]b.[employee id][!], 'WD', '')[/!] = [!]Replace([/!]a.[employee id][!], 'WD', '')[/!]

Using a function like this in the where clause will certainly cause the query to take longer to execute. Ultimately, you would be better suited to fix your data. The way I see it, if all of the rows should be compared to every other row, and the only 2 letter prefix is always WD, and the WD doesn't really mean anything (in terms is distinguishing data), then you don't need it. It would be like adding a 'species' column to your employee table, where every row had the same value of 'human'.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Correct, I need to compare the 'WD1234' value to the value '12345' to see which fields do NOT match.
 
What is the SQL Server version?
Code:
Select 
a.[employee id], a.[Pay Group ID],
'Analysis' = Case when a.[Pay Group ID] = b.[Pay Group ID] then ''
     when a.[Pay Group ID] <> b.[Pay Group ID]
     then 'MisMatch' end 
from whse_AssignEmployeetoPayGroup a
inner join whse_AssignEmployeetoPayGroup b
on a.[Employee ID] = substring(b.[Employee ID],3,len([Employee ID])) and left(b.[Employee ID],2) = 'WD'



PluralSight Learning Library
 
markros,

Each record *that matches* will be joined to itself, but the 'Analysis' column would end up being an empty string. The goal is to see which columns don't match, so I presume these rows would be filtered out anyway.

Without more information it's hard to say. I think your query assumes that there will always be a version of the employee id without the WD, whereas mine does not make this assumption. For example, if there were multiple rows with an employeeid = '12345', your query would not match those rows together to compare the [pay group id], but mine would.

Without having more information (and possibly some sample data), it's difficult to know which version is correct.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Actually, looking at the original desired output, I think we may want to slightly change my query (and simplify). It makes an assumption that there are only 2 records always for an EmployeeID - one from the table a with the 'WD' and one from the table b without 'WD'

Code:
Select 
a.[employee id], a.[Pay Group ID],
'Analysis' = Case when a.[Pay Group ID] = b.[Pay Group ID] then ''
     when a.[Pay Group ID] <> b.[Pay Group ID]
     then 'MisMatch' end 
from whse_AssignEmployeetoPayGroup a
inner join whse_AssignEmployeetoPayGroup b
on a.[Employee ID] = 'WD' + b.[Employee ID] where a.[Employee ID] like 'WD%'

PluralSight Learning Library
 
The data is all in one table.. Here is sample data from the table:

Employee ID PayGroupID
500007 ARG-1
WD500007 AR01
500008 ARG-1
WD500008 AU01
500012 ARG-1
WD500012 AT01

There may be instances where there isn't a corresponding "WD" record or vice versa. The ultimate goal is to compare the records to see which employee has different data because it should be the same although it is coming from from 2 different sources (but it is all stored in a warehouse table)...

Thanks again for all your help with this!!
 
Just for giggles, can you run this query to see if it works ok for you?

Code:
Select	Replace([Employee Id], 'WD', '') As [Employee Id]
From	@Temp
Group By Replace([Employee Id], 'WD', '')
Having Count(Distinct PayGroupId) > 1

If I am not mistaken, this will return all the employees that appear in the table multiple times but with different PayGroupId.

Please let me know.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
It does return everyone in the table who has more than 1 id? What about those that are there with "WD" but not the other id? I need those to show as well. The objective is to find records that do not match and ultimately if there is one and not the other it would be considered a mismatch?
 
Suppose your data looked like this:

[tt][blue]

Employee Id PayGroupID
-------------------- --------------------
500007 ARG-1
WD500007 AR01

500008 ARG-1
WD500008 AU01

500012 ARG-1
WD500012 AT01

500019 Red
WD500019 Red

WD500020 Blue

500021 Green
[/blue][/tt]

Do you want to return the last 2 rows (WD500020 & 50021)? Should the '50019' row be returned?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Based on the data you provided the only rows that shouldn't be returned would be:

500019 Red
WD500019 Red

Because they match; the others don't and those are the ones I need to return.

Thanks a million!!!
 
Last question (I think)...

Should there (normally) be exactly 2 rows in the table for each employee?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
There should, but of course there could be instances where there is only one..
 
How about this:

Code:
Select   Replace([Employee Id], 'WD', '') As [Employee Id], 'Pay Group Mismatch' As Analysis
From     whse_AssignEmployeetoPayGroup
Group By Replace([Employee Id], 'WD', '')
Having   Count(Distinct [Pay Group Id]) > 1

Union All

Select  T.[Employee Id], 'Only one row'
From    whse_AssignEmployeetoPayGroup T
        Inner Join (
          Select Replace([Employee Id], 'WD', '') As EmployeeId
          From   whse_AssignEmployeetoPayGroup
          Group By Replace([Employee Id], 'WD', '')
          Having Count(*) = 1
          ) As A
          On Replace(T.[Employee Id], 'WD', '') = A.EmployeeId

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Try

Code:
Select 
a.[employee id], a.[Pay Group ID],
'Analysis' = Case when a.[Pay Group ID] = b.[Pay Group ID] then ''
     when a.[Pay Group ID] <> b.[Pay Group ID]
     then 'MisMatch' end 
from whse_AssignEmployeetoPayGroup a
inner join whse_AssignEmployeetoPayGroup b
on a.[Employee ID] = 'WD' + b.[Employee ID] where a.[Employee ID] like 'WD%' and a.[Pay Group ID] <> b.[Pay Group ID]

UNION ALL
SELECT
a.[employee id], a.[Pay Group ID],
'Only One Instance (Table A)'
from whse_AssignEmployeetoPayGroup a
where not exists (select 1 from 
whse_AssignEmployeetoPayGroup b
where a.[Employee ID] = 'WD' + b.[Employee ID])
UNION ALL
SELECT
a.[employee id], a.[Pay Group ID],
'Only One Instance (Table b)'
from whse_AssignEmployeetoPayGroup a
where not exists (select 1 from 
whse_AssignEmployeetoPayGroup b
where b.[Employee ID] = 'WD' + a.[Employee ID])


[url=http://www.pluralsight.com/main/olt/?ac=1905603345]PluralSight Learning Library[/url]
 
Code:
Select 
a.[employee id], a.[Pay Group ID],
'Analysis' ='MisMatch'  
from whse_AssignEmployeetoPayGroup a
inner join whse_AssignEmployeetoPayGroup b
on a.[Employee ID] = 'WD' + b.[Employee ID] where a.[Employee ID] like 'WD%' and a.[Pay Group ID] <> b.[Pay Group ID]

UNION ALL
SELECT
a.[employee id], a.[Pay Group ID],
'Only One Instance (Table A)'
from whse_AssignEmployeetoPayGroup a
where not exists (select 1 from 
whse_AssignEmployeetoPayGroup b
where a.[Employee ID] = 'WD' + b.[Employee ID])
UNION ALL
SELECT
a.[employee id], a.[Pay Group ID],
'Only One Instance (Table b)'
from whse_AssignEmployeetoPayGroup a
where not exists (select 1 from 
whse_AssignEmployeetoPayGroup b
where b.[Employee ID] = 'WD' + a.[Employee ID]

PluralSight Learning Library
 
markros,

I built a table variable for testing.

Code:
Declare @Temp Table([Employee Id] VarChar(20), [Pay Group ID] VarChar(20))

Insert Into @Temp Values('500007'  ,'ARG-1')
Insert Into @Temp Values('WD500007','AR01')

Insert Into @Temp Values('500008'  ,'ARG-1')
Insert Into @Temp Values('WD500008','AU01')

Insert Into @Temp Values('500012'  ,'ARG-1')
Insert Into @Temp Values('WD500012','AT01')

Insert Into @Temp Values('500019'  ,'Red')
Insert Into @Temp Values('WD500019','Red')

Insert Into @Temp Values('WD500020','Blue')

Insert Into @Temp Values('500021','Green')

I am getting an error with your code.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top