Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Joining to the max previos row

Status
Not open for further replies.
Jul 19, 2003
132
NZ
I have a task table that I need to derive for each task which of the previous tasks was the parent.

The parents are found by ranking the tasks by various criteria then joining the task table to the previous task and comparing them. Depending on a series of business rules (compared with the previous task) then if the new task meets those rules it's a parent (called a Placement).

I end up with two tables, one is all the tasks ranked (1 milion rows), the other is the Placements (300,000 rows) with only those tasks that are parents. I thought this would be the hard part done.

I then want to select for each task id the parent id by looking for the max task id that is <= the current task id.

I've tried various options, added indexes, cte's, tables, views. It takes an icredibly long time, 19 hours+.

Here are a couple of queries I've tried, I'm sure there is a better way of doing this but I'm drawing a blank, any ideas?

Insert INTO dbo.PlacementTaskLink
SELECT
MAX(Placement_Id) AS Placement_Id,
Fact_Tasks_Id
FROM
(SELECT
p.Placement_Id,
t.Fact_Tasks_Id
FROM
dbo.TaskResultRanked as t CROSS JOIN
dbo.Placement p
WHERE p.RankRows <= t.RankRows) x
GROUP BY
Fact_Tasks_Id;

Insert INTO dbo.PlacementTaskLink
Select
(Select max(Placement_Id) from Placement p where p.RankRows <= tr.RankRows) as Placement_Id,
Fact_Tasks_Id
from
TaskResultRanked tr

Thanks

Bruce

 
did you create an index on Placement with the following columns RankRows , Placement_Id in this order?

and I would try the following after the index is added

Code:
Insert INTO dbo.PlacementTaskLink
Select
      (Select top 1 Placement_Id 
       from Placement p 
       where p.RankRows <= tr.RankRows
       order by RankRows
              ,Placement_id desc) as Placement_Id
      ,Fact_Tasks_Id
from TaskResultRanked tr

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Inserting 1.m^2 rows could be where your slowness is coming in.

CROSS JOIN on a table with 1m+ rows is less than ideal.
Isn't there some other qualifier other than the id that you can use to restrict down the joined rows?

Code:
FROM 
dbo.TaskResultRanked as t CROSS JOIN 
dbo.Placement p
WHERE p.RankRows <= t.RankRows) x

Code:
FROM 
dbo.TaskResultRanked as t 
LEFT JOIN dbo.Placement p ON
[tab]p.RankRows <= t.RankRows) x

Need you to make a script with a temp table and sample data to clean up your query.

Lodlaiden

You've got questions and source code. We want both!
There's a whole lot of Irish in that one.
 
fredericofonseca I'll try the clustered index as you suggest but the query you suggested won't work as it returns row 1 as the placement id in all cases.

Qik3Coder yes the join you suggest is what I rtried first, I chanbged to a cross join and a where statement when I weas starting to get desperate.

Thanks

Bruce

 
Got one error on my query. just noticed after your latest comment

Code:
Insert INTO dbo.PlacementTaskLink
Select
(Select max(Placement_Id) from Placement p where p.RankRows <= tr.RankRows) as Placement_Id,
Fact_Tasks_Id
from
TaskResultRanked tr


But in any case... I misread your query. the above query is retrieving the highest placement id where p.ranks is less than tr.ranks
As such the index I suggested is incorrect, and any index on this case will be of less use than what I though.

However try this index instead
index column placement_id
include RankRows (might not be required if the only 2 columns are these two and if you create the index as clustered)

and this query

Code:
Insert INTO dbo.PlacementTaskLink
Select
      (Select top 1 Placement_Id 
       from Placement p 
       where p.RankRows <= tr.RankRows
       order by Placement_id desc
      ) as Placement_Id
      ,Fact_Tasks_Id
from TaskResultRanked tr

Note that as I said above, this query should return exactly the same number of rows as your own query.
if you notice I am ordering by "placement_id desc" (e.g. the MAX number is returned first), and as there is a "top 1" clause, only 1 record is returned.
This construct equates to a select max()


Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Everything I tried failed however changing from p.RankRows <= tr.RankRows to p.Placement_Id <= tr.Fact_Tasks_Id changed the runtime from 19 hours to 3 seconds, I couldn't believe it.

Both of the above do the same thing, I'm not sure why I was fixated on using rankrows.

Thanks

Bruce

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top