Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Percentage of similar records in Table.

Status
Not open for further replies.

ded

Programmer
Nov 3, 2000
41
GB
I have been wracking my brains trying to solve this problem but I have come to the conclusion that it possibly cannot be done in SQL. I have a table which has three fields:
AutoID INTEGER
MID varchar(50)
CategoryID INTEGER

Example Data Could be
1,DD45, 1
2,DD45, 3
3,DD45, 6
4,DD45, 8
5,RT56, 3
6,RT56, 8
7,UJ78, 1

MID is effectively a userid and CategoryID contains a id of a category to which the user has subscribed. The problem is that i am trying to write a SELECT statement which when given a MID will return a list of records of all other users in the table and the percentage(or number) of categories the users have in common with given MID. So for the example data above if MID DD45 was given the following records would be returned.

RT56, 50%
UJ78, 25%

Any ideas?
 
--a derived table will do this - following is syntax for SQL Server - what are you using?

create table #temp(
AutoID INT not null,
MID varchar(10) null,
CategoryID INT null)
INSERT into #temp values (1,'DD45', 1)
INSERT into #temp values (2,'DD45', 3)
INSERT into #temp values (3,'DD45', 6)
INSERT into #temp values (4,'DD45', 8)
INSERT into #temp values (5,'RT56', 3)
INSERT into #temp values (6,'RT56', 8)
INSERT into #temp values (7,'UJ78', 1)

SELECT MID, Count(CategoryID)/T1.AllCatCount
FROM #temp
, (SELECT CAST(Count(DISTINCT CategoryID) AS REAL) AS AllCatCount FROM #temp) AS T1
WHERE MID = 'UJ78'
GROUP BY MID, T1.AllCatCount
 
cheers for that. I am afraid that it doesn't solve the problem fully. Perhaps I didn't explain clearly what the problem is. As far as I can see, the solution which you proposed will only ever return 1 record because it is limited by the WHERE clause. What I need are records returned for each MID in the DB with the percentage of the categoryIDs for that MID which are the same as the supplied MID. Using some code which I got to return the number of similar ones.

Code:
SELECT mi1.MID, COUNT(mi1.CategoryID) As Similar
FROM MemberInterests mi1
WHERE mi1.CategoryID IN (SELECT CategoryID FROM MemberInterests WHERE MID='DD032900145704')
GROUP BY mi1.MID

but not the total of categoryIDs for each MID so therefore no percentage. I used the code you gave and mixed the two to get this

Code:
SELECT mi1.MID, (Count(mi1.CategoryID)/T1.AllCatCount*100) As Similar
FROM MemberInterests AS mi1, (SELECT CAST(Count(DISTINCT CategoryID) AS REAL) AS AllCatCount FROM MemberInterests WHERE MID=mi1.MID) AS T1
WHERE mi1.CategoryID IN (SELECT CategoryID FROM MemberInterests WHERE MID='DD032900145704')
GROUP BY mi1.MID, T1.AllCatCount

but unfortunately it doesn't work. I am unsure as to where the bug is but I think it has something to do with WHERE clause in the SELECT CAST bit.

Thanks for your help.
 
You can use != (Not equal)

For MID DD45 use

SELECT ....
FROM ....
WHERE != DD45

This will list all MIDs' except DD45.



Ged Jones

Top man
 
Changing the equal to not equal - that has a subtle but profound impact. Good one, Ged.
Regarding the CAST issue, the only reason the CAST exists in this is that you are dividing an integer by an integer, and the result will be an integer. Which is not always the commutative result that you want. If you want to get around using CAST, then just multiply the numerator by 100 first. Your result will be in percentage points ie 75 instead of .75.
 
I think I see what the problem is now. It is the fact that the SELECT CAST clause selects the total number of occurances of a category instead of the total number of categories for a particular user. In my first example, DD45 would have 4, RT56 2 and UJ78 1. I have also realised my example data was bad because for each user 100% of their categories existed within DD45s categories. If we had another category to RT56 ie 99 this would make his percentage 67%.

So in pseudocode the problem is

display each user who is not DD45 and the percentage of their categories which they have in common with DD45.

The percentage being the number of the categories for the user which exist in the list of categories of DD45 divided by the total number of categories for the user.

Thanks for all your help so far.

 
OK - I think I understand - I changed the data slightly.
I'm not convinced this is the best solution (it strikes me as very inefficient, crude, etc) but seems to work. But is this what you had in mind?

create table #temp(
AutoID INT not null,
MID varchar(10) null,
CategoryID INT null)
INSERT into #temp values (1,'DD45', 1)
INSERT into #temp values (2,'DD45', 3)
INSERT into #temp values (3,'DD45', 6)
INSERT into #temp values (4,'DD45', 8)
INSERT into #temp values (5,'RT56', 3)
INSERT into #temp values (6,'RT56', 8)
INSERT into #temp values (7,'RT56', 99)
INSERT into #temp values (8,'UJ78', 1)

SELECT #temp.MID
, T2.CatCount
, SUM(CASE WHEN CategoryID = T1.CommonCat THEN 1 ELSE 0 END) AS CommonCat
, 1.0 * SUM(CASE WHEN CategoryID = T1.CommonCat THEN 1 ELSE 0 END)/T2.CatCount AS Percentage
FROM #temp
INNER JOIN (SELECT MID, Count(DISTINCT CategoryID) AS CatCount FROM #temp GROUP BY MID) AS T2
ON #temp.MID = T2.MID
, (SELECT DISTINCT CategoryID AS CommonCat FROM #temp WHERE MID = 'DD45') AS T1
WHERE #temp.MID <> 'DD45'
GROUP BY #temp.MID, T2.CatCount

produces

MID CatCount CommonCat Percentage
--------- ----------- ---------- --------------------------
RT56 3 2 .666666666666
UJ78 1 1 1.000000000000
Malcolm
wynden@telus.net
November is &quot;be kind to dogs and programmers&quot; month. Or is that &quot;dogs or programmers&quot;?
 
Thank you very much for your help. That works perfectly. I will make it a stored procedure to speed it up and then that should be fine. If I find time I will sit down and try and work out how it works and maybe try and optimise it. Once again thanks for your help.

Duncan
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top