SQL Problem With Joins

ralphiooo · Jul 20, 2006

Hi, i posted the other day about a problem i encountered when making a new site, trouble is i have another one. First here's my table structure:

players
- player_id (primary key)
- player_name

player_points
- player_id
- points
- date_created

teams
- team_id (primary key)
- team_name

team_players
- team_id
- player_id
- date_created

What i need to do is list all the players from the team_players table for a particular team and calculate the number of points accumulated for that player since that player was put in their team.

Here's what i have put together so far:

Code:

SELECT tp.player_id, p.player_name, t.team_name, SUM(pp.points) AS points
FROM team_players tp
LEFT OUTER JOIN players p ON tp.player_id = p.player_id 
LEFT OUTER JOIN teams t ON p.team_id = t.team_id
LEFT OUTER JOIN player_points pp ON p.player_id = pp.player_id
WHERE tp.team_id = $team_id AND pp.date_created >= tp.date_created
GROUP BY pp.player_id
ORDER p.player_name

The trouble i'm having is that if the player was put in the team_players table after any records being inserted into the player_points table then nothing is returned since, this condition:

pp.date_created >= tp.date_created

returns false. I tried changing the query to:

Code:

SELECT tp.player_id, p.player_name, t.team_name, SUM(pp.points) AS points
FROM team_players tp
LEFT OUTER JOIN players p ON tp.player_id = p.player_id 
LEFT OUTER JOIN teams t ON p.team_id = t.team_id
LEFT OUTER JOIN player_points pp ON p.player_id = pp.player_id AND pp.date_created >= tp.date_created
WHERE tp.team_id = $team_id
GROUP BY pp.player_id
ORDER p.player_name

but i got the same result. I thought that using a left outer join on the table would take care of this but i guess there's a mistake in my logic.

Appreciate it if someone could show me what i am doing wrong. Thanks

rac2 · Jul 20, 2006

Essentially you will need to have columns which store the date of real world events, such as date_points_scored and date_joined_team, instead of relying on database events such as date that a row was created in order to get the results you wish.

As to the difference between having the condition

pp.date_created >= tp.date_created

in the WHERE clause or in the JOIN clause, as you have discovered these are equivalent and have nothing to do with the OUTER JOIN.

Because, placing this condition in the JOIN clause results in a deficient subset of the player_points table; as you thought this deficiency does not eliminate any rows because of the OUTER JOIN. Nonetheless, there will be no data from the player_points table due to this condition. There will be rows but the columns will have NULL values. The aggregate functions using those columns will ignore them because of the NULLs.

Placing the condition in the WHERE clause eliminates the rows, hence they cannot be aggregated.

guelphdad · Jul 20, 2006

Your group by clause will almost always be incorrect when you include columns in your select statement that are not in your group by clause. all other databases will throw an error message and not run the query. Mysql warns about GROUP BY HIDDEN FIELDS in the manual and says you could end up with incorrect data.

As to your problem I believe you will make a simple change to your player_points table by adding a column for the teamid when the points are created.

As it is right now say a player starts the year with Boston, moves to Toronto, gets traded to Montreal and then to San Francisco. The only way you would have this information is in your team_players table and I think there is too much unnecessary coding needed to figure out what points were created by the player when he was playing for Montreal for instance.

ralphiooo · Jul 20, 2006

Hi cheers for your suggestion guelphdad. I'm just thinking this one over. Here's an example (with your idea):

Say i have 100 teams with 10 players from each team (200 players in total to select from) this gives me 1000 records in the player_points table. If i update this every week then it gives me 52,000 records. This is just 100 teams and if the thing grows really big it could be multiple thousands of teams giving me a potential million records in the player_points table.

Using the same example above with my previous method i would have 1000 records in the team_players table and 10400 records (pretty much constant since the number of players to select from shouldn't change much) in the player_points table.

Appreciate it if you could tell me if your method will be efficient enough if i have to group it up by player_id's and then sum the points to calculate a team total).

Thanks

guelphdad · Jul 21, 2006

How about if you show us some sample data from your tables. that might make your problem clearer.

raphael232 · Jul 21, 2006

Hi, here's some sample to help you:

players (200 rows):
player_id | player_name
-----------------------
1 | Player 1
2 | Player 2
3 | Player 3
4 | Player 4
5 | Player 5
...
200 | Player 200

teams (100 rows):
team_id | team_name
-------------------
1 | Team 1
2 | Team 2
3 | Team 3
...
100 | Team 100

player_points (i decided it would be better to put the points field in the team_players table, instead of the other way around - so now this table is redundant).

team_players (100 teams * 10 players * 52 weeks = 52,000):
team_id | player_id | date_created | points
-------------------------------------------
1 | 1 | 01/01/06 | 0
1 | 2 | 01/01/06 | 1
1 | 23 | 01/01/06 | 0
1 | 34 | 01/01/06 | 7
1 | 67 | 01/01/06 | 5
1 | 123 | 01/01/06 | 0
1 | 132 | 01/01/06 | 2
1 | 145 | 01/01/06 | 0
1 | 155 | 01/01/06 | 2
1 | 178 | 01/01/06 | 0
...
1 | 1 | 08/01/06 | 0
1 | 2 | 08/01/06 | 1
1 | 23 | 08/01/06 | 0
1 | 34 | 08/01/06 | 7
1 | 67 | 08/01/06 | 5
1 | 123 | 08/01/06 | 0
1 | 132 | 08/01/06 | 2
1 | 145 | 08/01/06 | 0
1 | 155 | 08/01/06 | 2
1 | 178 | 08/01/06 | 0
...

The above shows 2 weeks of data for 1 team. Over 52 weeks there would be 52,000 rows in this table.

Now say i wish to list all the players in a team and sum their points i would put:

Code:

SELECT SUM(tp.points), p.player_name
FROM team_players tp
LEFT OUTER JOIN players p ON tp.player_id = p.player_id 
GROUP BY tp.player_id
WHERE tp.team_id = $team_id

Would this be efficient enough. Eventually if i had 1000 teams then i would have 520,000 rows in the team_players table.

Appreciate if you could help. In the past i've only really dealt with small databases and have never had to go into thousands of rows of data.

Cheers

r937 · Jul 21, 2006

step back a second and take a moment to learn how to write GROUP BY clauses

your SELECT clause will often have a mix of aggreaget expressions and non-aggregate expressions --

foo, bar, sum(qux), count(*)

every non-aggregate column in the SELECT clause must(*) be in the GROUP BY clause

you have

SELECT SUM(tp.points), p.player_name
...
GROUP BY tp.player_id

and that's not right, is it

change it to:

GROUP BY p.player_name

^{(*) according to standard sql, which mysql ignores -- mysql lets you get away with non-standard sql, hence many mysql developers never actually learn how grouping is supposed to work}

r937.com | rudy.ca

guelphdad · Jul 21, 2006

I understand your concern about how big the table will be. An important thing to remember is this. IF your tables are properly normalized AND IF your tables are properly indexed then you will have very little trouble returning results from a query. Yes even millions of rows will return relatively quickly.

When tables are not normalized, then your queries can have to do more than necessary at that point you might not be able to use an index or your query might be more taxing.

For instance, a lot of time people want to store a list of values in a single column. When that happens either you can't use an index, or to find out information your query gets complicated.

Take a sec to look at what Rudy has posted above and see if you understand it. Test it that way and see if your results are what you expect.

One final thing I'd like to point out, DO NOT store your dates as you are doing. You are storing them in a varchar or char field. Doing so, you then lose all ability to use DATE and TIME functions (one of the things that will slow your queries down). Store them in a DATE format column in the format of yyyy-mm-dd.

Dave

ralphiooo · Jul 21, 2006

Hi, cheers for your help. I seem to be having this going good, however i've encountered one further problem when updating the player points. To update the points i run a script and it inserts in the team_players table. The trouble i'm having is that if the page fails executing half way through then it needs to be re run therefore for the rows which were processed the first time you get duplicate entries.

Is there anyway in mysql you can only make the changes commit to the database if the script has run? Please note i am programming with PHP if it helps.

Thanks

guelphdad · Jul 21, 2006

Yes look into using TRANSACTIONS. Check the mysql manual, make sure you are using the proper version (I forget what version is the earliest to support transactions). You need to use InnoDB tables to use transactions.

ralphiooo · Jul 22, 2006

Hi, cheers for your help people. Think i'm heading in the right direction now.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

SQL Problem With Joins

ralphiooo

Programmer

rac2

Programmer

guelphdad

Technical User

ralphiooo

Programmer

guelphdad

Technical User

raphael232

Programmer

r937

Technical User

guelphdad

Technical User

ralphiooo

Programmer

guelphdad

Technical User

ralphiooo

Programmer

Similar threads

Part and Inventory Search

Sponsor