Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations dencom on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Query to create tag cloud

Status
Not open for further replies.

HestonJames

Programmer
Aug 11, 2008
187
GB
Hello Guys,

Looking for a little help forming this query as the concept is currently confusing me. The end goal is to pull data which I can use to create a 'tag' or 'keyword' cloud.

I have three tables which look like this:

Media
------
Media_ID


Media_Lnk_MediaTag
------------------
Media_ID
MediaTag_ID


MediaTag
--------
MediaTag_ID
TagText


What I want to do it pull all the MediaTag's for a given Media_ID record along with a calculated percentage popularity for the MediaTag against any other Media_IDs is has relationships with.

Does that make sense? So for instance if the count of relationships to other media looks something like this:

TagText | Count_Of_Relationships_To_Other_Media
---------------------------------------------
Hello | 5
Goodbye | 5
Hello Again | 10

Then it would return me a record set like this to me:

TagText | Percentage_Of_Relationships_To_Other_Media
---------------------------------------------
Hello | 25%
Goodbye| 25%
Hello Again | 50%

I'm sorry if that's not a particularly good explanation, if you have any questions then please shoot away and I'll try my best to answer them.

Cheers guys, really appreciate this.

Heston
 
Something like:
Code:
SELECT Tbl1.Media_ID,
       MediaTag.TagText,
       Tb1l.Cnt*1.0/Tbl2.Cnt AS Percentage
FROM MediaTag
INNER JOIN (SELECT Media_ID, MediaTag_ID, COUNT(*) AS Cnt
                   FROM Media_Lnk_MediaTag
            GROUP BY Media_ID, MediaTag) Tbl1
      ON MediaTag.MediaTag_ID = Tbl1.MediaTag_ID
INNER JOIN (SELECT Media_ID, COUNT(*) AS Cnt
                   FROM Media_Lnk_MediaTag
            GROUP BY Media_ID) Tbl2
      ON Tbl1.Media_ID = Tbl2.Media_ID
NOT TESTED!

Borislav Borissov
VFP9 SP2, SQL Server 2000/2005.
 
Hi Borislav,

Thanks a great deal for the suggestion but I got a little confused when trying to work with it and couldn't get it to run. :-(

I've managed to get a query half way myself, this queries the tags for a particular media and gives a count for each one:

Code:
Select	MediaTag.Text,
		Count(OtherMessages.Media_ID)
From	Media_Lnk_MediaTag
Inner Join	MediaTag 
			On Media_Lnk_MediaTag.MediaTag_ID = MediaTag.MediaTag_ID
Left Join (
			Select	Media_ID,
					MediaTag_ID
			From	Media_Lnk_MediaTag
			) As OtherMessages
			On	MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID
Where	Media_Lnk_MediaTag.Media_ID = 1248
Group By MediaTag.Text

I'm stuck on trying to figure out how to get it to calculate the percentage for me though, at the moment this returned the recordset which looks like:

TagText | Count_Of_Relationships_To_Other_Media
---------------------------------------------
Hello | 5
Goodbye | 5
Hello Again | 10

Any suggestions on how to expand upon this to get it giving the percentages too?

Thanks Borislav.

Heston
 
Ok, if This give you the right result, then you need the total count of TAGs for that Media:
Code:
Select MediaTag.Text,
       Count(OtherMessages.Media_ID)*100.0/AllMessages.Cnt AS Prc
From    Media_Lnk_MediaTag
Inner Join MediaTag
            ON Media_Lnk_MediaTag.MediaTag_ID = 
               MediaTag.MediaTag_ID
INNER Join (
            Select Media_ID,
                   MediaTag_ID
            From Media_Lnk_MediaTag) As OtherMessages
            On MediaTag.MediaTag_ID = 
               OtherMessages.MediaTag_ID
INNER Join (
            Select Media_ID,COUNT(*) AS Cnt
            From Media_Lnk_MediaTag
            Where Media_Lnk_MediaTag.Media_ID = 1248) As AllMessages
            On MediaTag.Media_ID = AllMessages.Media_ID
Where Media_Lnk_MediaTag.Media_ID = 1248
Group By MediaTag.Text
Again this is NOT tested.

Borislav Borissov
VFP9 SP2, SQL Server 2000/2005.
 
Hey Borislav,

Thanks for that, I've run the code, with a minor change:

Code:
Select	MediaTag.Text,
		Count(OtherMessages.Media_ID) * 100.0 / AllMessages.Cnt AS Prc
From	Media_Lnk_MediaTag
Inner Join	MediaTag 
			On Media_Lnk_MediaTag.MediaTag_ID = MediaTag.MediaTag_ID
Left Join (
			Select	Media_ID,
					MediaTag_ID
			From	Media_Lnk_MediaTag
			) As OtherMessages
			On	MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID
Inner Join (
            Select	Media_ID,
					COUNT(*) AS Cnt
            From	Media_Lnk_MediaTag
            Where	Media_ID = 1248
			Group By	Media_ID
			) As AllMessages
            On Media_Lnk_MediaTag.Media_ID = AllMessages.Media_ID
Where	Media_Lnk_MediaTag.Media_ID = 1248
Group By	MediaTag.Text,
			AllMessages.Cnt

and it returns a result set but one that doesnt look like I expect it to:

newtag1 | 100.000000000000
newtag2 | 33.333333333333
newtag3 | 66.666666666666

We've got a total of almost 200% :-s :-D

I feel we're getting closer but we're not quite there.

Heston
 
:)
Could you please post some example data from all tables?

Borislav Borissov
VFP9 SP2, SQL Server 2000/2005.
 
Hi,

I haven't looked in detail at the code, but why 'OtherMessages'? Shouldn't you be calculating the fraction ThisMessage (MediaTagId and MediaId) as a percentage of AllMessages (just MediaId)?

Puzzledly,
Simon.
 
Simon, I think you may be right :)

For you both, here is a table variable setup which represents the data in the first post:

Code:
DECLARE @MediaTag TABLE (
	MediaTag_ID int IDENTITY(1,1) PRIMARY KEY,
	Text varchar(50) Collate Database_Default
)

INSERT INTO @MediaTag (Text)
VALUES		('hello')
INSERT INTO @MediaTag (Text)
VALUES		('goobye')
INSERT INTO @MediaTag (Text)
VALUES		('hello again')

DECLARE @Media_Lnk_MediaTag TABLE (
	MediaTag_ID int,
	Media_ID int
)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 1)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 1)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 1)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 5)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 5)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 5)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 6)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 7)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 8)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 9)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 10)

Select	MediaTag.Text,
		OtherMessages.Counter As Counter
From	@Media_Lnk_MediaTag As Media_Lnk_MediaTag
Inner Join	@MediaTag As MediaTag
			On Media_Lnk_MediaTag.MediaTag_ID = MediaTag.MediaTag_ID
Left Join (
			Select	Count(Media_ID) As Counter,
					MediaTag_ID
			From	@Media_Lnk_MediaTag As Media_Lnk_MediaTag
			Group By	MediaTag_ID
			) As OtherMessages
			On	MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID
Where	Media_Lnk_MediaTag.Media_ID = 1
Group By	MediaTag.Text,
			OtherMessages.Counter

See if that helps at all :-D

Thanks,

Heston
 
But then you don't care about from WHAT media you got TAGs.
Here:
Code:
DECLARE @MediaTag TABLE (
    MediaTag_ID int IDENTITY(1,1) PRIMARY KEY,
    Text varchar(50)
)

INSERT INTO @MediaTag (Text) VALUES        ('hello')
INSERT INTO @MediaTag (Text) VALUES        ('goobye')
INSERT INTO @MediaTag (Text) VALUES        ('hello again')

DECLARE @Media_Lnk_MediaTag TABLE (
    MediaTag_ID int,
    Media_ID int
)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (1, 1)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (2, 1)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 1)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (1, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (1, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (1, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (1, 5)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (2, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (2, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (2, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (2, 5)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 5)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 6)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 7)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 8)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 9)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID) VALUES        (3, 10)


SELECT * FROM @Media_Lnk_MediaTag Where Media_ID = 1

Select MediaTag.Text,
       OtherMessages.Counter As OtherMsgCnt,
       AllMessages.Counter As AllMsgCnt,
       OtherMessages.Counter*100.0/AllMessages.Counter AS Prc
From   @Media_Lnk_MediaTag As Media_Lnk_MediaTag
Inner Join @MediaTag As MediaTag  On Media_Lnk_MediaTag.MediaTag_ID = MediaTag.MediaTag_ID
INNER JOIN ( Select Count(Media_ID) As Counter,
                    MediaTag_ID
                    From @Media_Lnk_MediaTag As Media_Lnk_MediaTag
            Group By MediaTag_ID ) As OtherMessages
      ON MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID

INNER JOIN (Select Count(Media_ID) As Counter
                    From @Media_Lnk_MediaTag As Media_Lnk_MediaTag
            ) As AllMessages
      ON MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID

Where Media_Lnk_MediaTag.Media_ID = 1
But I don't get it. You want only for Media_ID = 1, but you count ALL, not just for that Media_ID.


Borislav Borissov
VFP9 SP2, SQL Server 2000/2005.
 
o.k. I've made a quick amendment to the test data as the current query has a few problems.

Code:
DECLARE @MediaTag TABLE (
	MediaTag_ID int IDENTITY(1,1) PRIMARY KEY,
	Text varchar(50) Collate Database_Default
)

INSERT INTO @MediaTag (Text)
VALUES		('hello')
INSERT INTO @MediaTag (Text)
VALUES		('goobye')
INSERT INTO @MediaTag (Text)
VALUES		('hello again')
[b]
INSERT INTO @MediaTag (Text)
VALUES		('some more')
INSERT INTO @MediaTag (Text)
VALUES		('some more')
[/b]
DECLARE @Media_Lnk_MediaTag TABLE (
	MediaTag_ID int,
	Media_ID int
)
[b]
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(4, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(5, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(4, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(5, 3)
[/b]

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 1)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 1)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 1)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(1, 5)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(2, 5)

INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 2)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 3)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 4)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 5)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 6)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 7)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 8)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 9)
INSERT INTO @Media_Lnk_MediaTag (MediaTag_ID, Media_ID)
VALUES		(3, 10)

Select MediaTag.Text,
       OtherMessages.Counter As OtherMsgCnt,
       AllMessages.Counter As AllMsgCnt,
       OtherMessages.Counter*100.0/AllMessages.Counter AS Prc
From   @Media_Lnk_MediaTag As Media_Lnk_MediaTag
Inner Join @MediaTag As MediaTag  On Media_Lnk_MediaTag.MediaTag_ID = MediaTag.MediaTag_ID
INNER JOIN ( Select Count(Media_ID) As Counter,
                    MediaTag_ID
                    From @Media_Lnk_MediaTag As Media_Lnk_MediaTag
			Group By MediaTag_ID ) As OtherMessages
      ON MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID

INNER JOIN (Select Count(Media_ID) As Counter
                    From @Media_Lnk_MediaTag As Media_Lnk_MediaTag
            ) As AllMessages
      ON MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID
Where Media_Lnk_MediaTag.Media_ID = 1

Here I've added a couple more tags to the data and assigned them to other Media records, not that with ID of 1. However this effects the overall results but they should stay the same. The percentage should be calculated from the sum of the counts returned in my original query.

So, when we have:

TagText | Count_Of_Relationships_To_Other_Media
---------------------------------------------
Hello | 5
Goodbye | 5
Hello Again | 10

Percentages should be calculated like:

TagText | Count_Of_Relationships_To_Other_Media
---------------------------------------------
Hello | 5 / 20 * 100
Goodbye | 5 / 20 * 100
Hello Again | 10 / 20 * 100

Tags which are not assigned to this media record should not be taken into account for the percentage, only those tags assigned to this media.

Does that make sense?

Cheers for your help guys,

Heston
 
Ok, I came into work this morning with a fresh brain and managed to get as working model for the query:

Code:
With	Counts As	(
		Select MediaTag.Text,
			   OtherMessages.Counter As OtherMsgCnt
		From   @Media_Lnk_MediaTag As Media_Lnk_MediaTag
		Inner Join @MediaTag As MediaTag  On Media_Lnk_MediaTag.MediaTag_ID = MediaTag.MediaTag_ID
		INNER JOIN (
					Select	Count(Media_ID) As Counter,
							MediaTag_ID
					From	@Media_Lnk_MediaTag As Media_Lnk_MediaTag
					Group By MediaTag_ID
					) As OtherMessages
					ON MediaTag.MediaTag_ID = OtherMessages.MediaTag_ID
		Where Media_Lnk_MediaTag.Media_ID = 1
		)
Select	Text,
		100.0 * OtherMsgCnt / (
								SELECT SUM(OtherMsgCnt)
								FROM Counts 
								)
								AS pct
From	Counts
Group By	Text,
			OtherMsgCnt

I'm not too happy as it looks a little messy so I'm guessing there will be ways to optimize it? who knows? Does the job for now though :)

Thanks guys,

Heston
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top