Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

BIG problem while implementing pagination in back-end 4

Status
Not open for further replies.

orlyotero

Programmer
Jun 2, 2005
21
US
Hi all,

I'm really facing a bad problem.
I'm using SQLServer as back-end and ASP as front-end.
I need to display information from a table(joined with other 2 or 3 at most) that has more than 54,000 records, so after spending some time reading, I decided to implement pagination in back-end.
There are two major approach to implement pagination:
1) Retrieve more than 54,000 records to front-end and move the set pointer row by row up to the desired position, a very procedural solution on a relational model.
2) And this one that was what common sense told me to choose.

Let SQL Sever do what it is good for. I read a SP in a forum and I adapted it to my needs. The SP just retrieves the desired amount of records, say for instance 25 from page 1(meaning records 1 to 25), or 25 records from page 2(meaning records 26 to 50), and so on. This way front-end just uses 25 rows every time(no matter what page, they are just 25). So once front-end got the records, it's an easy job, but the problem is when sql server try to retrieve the records.
The following query took 1 min and 40 sec to show a result in query analizer(obviously my asp scripts timed out at all times, can't imagine 20 user executing the same script):

SELECT R.*, PB.pb_PubType, PB.pb_PubName, PR.pr_Desc FROM tblRecorder R
LEFT OUTER JOIN tblPublication PB ON R.rc_Directory=PB.pb_RecNo
LEFT OUTER JOIN tblProducts PR ON R.rc_Product=PR.pr_ProdNo

WHERE R.rc_AdNo IN

( SELECT TOP 25 R.rc_AdNo FROM tblRecorder R
LEFT OUTER JOIN tblPublication PB ON R.rc_Directory=PB.pb_RecNo
LEFT OUTER JOIN tblProducts PR ON R.rc_Product=PR.pr_ProdNo

WHERE 1=1 AND R.rc_AdNo NOT IN
( SELECT TOP 25225 R.rc_AdNo FROM tblRecorder R
LEFT OUTER JOIN tblPublication PB ON R.rc_Directory=PB.pb_RecNo
LEFT OUTER JOIN tblProducts PR ON R.rc_Product=PR.pr_ProdNo

WHERE 1=1
ORDER BY rc_ListName ASC )
ORDER BY rc_ListName ASC )
ORDER BY rc_ListName ASC

there is a total of 54687 rows, meaning 2188 pages of 25 rows each.
While retrieving the first pages(page 1 to 200), no problem, a very quick reponse, the same while retrieving the last pages(2000-), the big problem is while retrieving the middle pages(1000, 1010, etc.)

Now, if in the above query, 25 rows are switched to retrieve 50 rows, the query got executed in a fraction of second, obviously it is not a data domain value since the 50 selected rows includes the original 25 rows that took 2 minutes to be retrieved, plus 25 more records that doen't belong to the 25225 rows set that meets the same criteria.

I really dont know what is going on with this, I have read about views, etc, etc, etc, and I dont know why is this happening.

If I decide to choose 50 rows per page, what about when the table doubles the current amount?

Any idea? any comment or suggestion or help or anything will be greatly appreciated.


Regards,
Orlando Otero

P.S. sorry about my english.
 
I'm not an ASP person (yet). However, my first thought is to check your ASP connection. Are you using connection pooling? Are you only opening the connection for the length of time that you need it? If you have the connection open for the entire time of the select, that might be the problem. Try having the connection open only long enough to grab the first 25 records, close it, and then re-open it when the user goes to page 2, etc.

Check with the ASP group to make sure the above suggestion is feasible.

My second thought is maybe you need to use a cursor in T-SQL to limit the rowset pulled for each page. I know, cursors = Bad Thing <tm>, but sometimes they are neccessary. ISTR that Cursors have the functionality to do exactly what you want pagination to do, but I don't know that they would be any faster than what you currently have.

My last thought is to use "openrowset", but I've never had the need for that command, so I'm not sure if it will even help you out.

Anyone else have suggestions? (or corrections?)



Catadmin - MCDBA, MCSA
"If a person is Microsoft Certified, does that mean that Microsoft pays the bills for the funny white jackets that tie in the back???
 
Thanks,

Catadmin, the problem doesn't has to be with ASP(yet), I'm runing the SP from query analyzer, which I think doesn't time out. query analyzer spent 2 minutes executing the above query

jbenson001, I really don't know how nested select work internally, I don't know if nested select create cursor, but I just have to wait 2 min to get result from the execution of the above query, that is the only code I execute, no explicit cursor, no other piece of code.

Regards,
OO
 
Nested Select statements don't create cursors, they are subqueries. You actually have to Declare and Open a cursor to have one in your code.

And I apologize, I didn't realize you were testing in QA. I thought your query time was from the ASP window.

Have you checked your index structure on the tables in question? Also, go to Query -> Show Execution Plan on the menu the next time you run your code. It will tell you where the major bog down of your code is and knowing that will help us help you.



Catadmin - MCDBA, MCSA
"If a person is Microsoft Certified, does that mean that Microsoft pays the bills for the funny white jackets that tie in the back???
 
Lemme try to reformulate the problem: if you google for some information and get 54,000 results, would you rather:

- hit Next-Next-Next 2000+ times
- try to narrow search criteria and get fewer results?

Paging is OK to take some stress off web server and make web pages reasonably small (fast load). But paging without persistently stored search results - which is often not acceptable due to excessive storage - kills back-end server because with each web page server must repeat query from scratch.

Also: in order to get Nth row, server must materialize all previous (N-1) rows. Whatever you do, this is inevitable. With number of rows growing SQL Server uses less efficient joins (hash or merge, compared to initial nested loops) - and at some point it may decide to go table scan instead of using any index available. So personally I'd suggest you to avoid paging, or at least make deep paging harder (no "Go to last page" link or "Go to page" input field).

Yup, there are ways to optimize code but like I elaborated this would make execution faster but wouldn't solve problem in general.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Thanks for all replays guys,

The example explicitly creates a temporal table, let me explain a litle bit more about the code:

I coded some SPs that retrieve lists or sets or collections from different tables, say for instance, 1 SP for retrieve categories, another for retrieve messages, another for retrieve users(the real app is a little more complex), but these SPs that their set results has nothing in commun call the same SP, name spGetPageData.

The Caller SPs receive parameters used as criteria, these Callers setup some variables(@Fields, @From, @Where, @PageSize, @PageNumber, etc) and use them as input for SP named spGetPagedData, which is who builds and executes the dynamic query, since almost ten or more SPs call the same spGetPagedData to implement pagination and since the sets retrieved are different in composition and Structure, I can't create and fill a Temporal table because all fields of Two SPs caller results are going to be different.

The query above written is one example of what spGetPagedData generates, and it takes 2 minutes in got executed. by the way, I didn't iterate on any list or any cursor, just copy and paste that query in query analizer, run the query and WAIT for the response.

Another point is that to navigate to page n it is not necesarely to navigate to page n-1 since I display First, Previous, Next, Last page buttons, BUT, I display too a combo with 2188 values, each corresponding to a particluar page, that way if the USER wants to go to page 2000 just select 2000 in the combo box.
Another thing is the query is filetered by some criteria, the criteria is choosen by the user in the interface or presentation page, if the user doesn;t choose to filter the set, there are 54,000 records in the table that meets no restriction since the user didn't restrict it.

Ideas?

Regards, OO
 
Combo with 2188 values [shocked]? Even that cracks out 50-100kB of HTML. Simple text box with some validation is much more efficient for a price of being less fancy.

Back to original query:
Code:
...  WHERE R.rc_AdNo IN
( SELECT TOP 25 R.rc_AdNo FROM tblRecorder R
[b]    LEFT OUTER JOIN tblPublication PB ON R.rc_Directory=PB.pb_RecNo
    LEFT OUTER JOIN tblProducts PR ON R.rc_Product=PR.pr_ProdNo [/b]
...
Lines in bold are not necessary - left joins are used adn IN() checks only column from table R.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Vongrunt, you say:

"Paging is OK to take some stress off web server and make web pages reasonably small (fast load). But paging without persistently stored search results - which is often not acceptable due to excessive storage - kills back-end server because with each web page server must repeat query from scratch."

The ASP programming model is stateless, meaning you can't cache recordsets or other data server-side, easily, nor at all if you want scalability. Frankly, a technique like that I just posted, that requeries for each page request should be acceptable for MOST web applications and data sets. If the entire resultset is more than a couple hundred records, than paging in this manner won't be effective anyway, as no user wants to have to page, page, page, page to get to what they want to work with (then you need additional filters, or somesuch).

TJR
 
I'm going to try the textbox option, anyway this webapp is the company's Intranet application, not an Internet public app, and our Intranet is very fast with theorically 80 users, at most 30 in practice.

About the other point, in that particular query, the left joins are not necessary but If the user decides to filter the set, chossing for instance Publication Name, this particular field isn't in the main table, that's why they are in the query. Both LEFT JOINS use tables with fields that in this case weren't used to filter, but they could be used.

The execution plan tells 46% in the inner select, It got executed separately in a fraction of second, on the other hand the 2 most inner select got executed in almost 2 min(the problem must be here), and If I choose in the inner select 50000 instead of 25225 the query got executed in fraction of second, isn't this a contradiction?

Thanks guys a lot for your comments suggestions and time

Regards,
OO
 
TJR If I remember well, I think I tried the PageSize and AbsoultePage technique as the article shows, it doesn;t work well to me and I didn't know why, and thereafter I decided to use the SP version.

I'm using OLEDB Provider for ODBC, I don't know if this had some influence in the fact that PageSize, AbsolutePage and even RecordCount didn't get updated correctly.

The work I'm doing is a continuation of another person 2 years work, I tried to use Native OLEDB provider and there are a lot of erros in rs("aField") references, so I went back to the ODBC version, anyway, I'm going to try one more time with the articles code.

Thanks a lot,

Regards,
OO
 
TJRTech said:
The ASP programming model is stateless, meaning you can't cache recordsets or other data server-side, easily, nor at all if you want scalability. Frankly, a technique like that I just posted, that requeries for each page request should be acceptable for MOST web applications and data sets.
That brings us to another contradiction: deep paging always stinks, "shallow" paging is often not necessary because you can display all rows at once. Relatively tiny zone where paging is actually useful is not worth any server-side complication, so unless SQL Server gets LIMIT clause in SP5 personally I'd also implement it via ADO - and still disallow/discourage deep paging.

One thing people are rarely aware of: internal query optimization greatly depends on row estimates. I remember one case when paging worked nicely up to first 1000 rows and after that went X times slower. Reason - radically different exec plan. I guess this is exactly the same case here - after all, 54,000 rows are not much.

Another possibility that comes to mind are locks. If queried data undergoes constant changes, writers will block costly paging reads. Because SQL2k has no snapshot isolation developers are sometimes forced to use NOLOCK - usually no big deal for web searches & similar stuff but with dirty reads you never know.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
vongrunt, actually we have a production app and a development-testing app. They use different databases with the same structure.

I'm almost sure that at this moment there are two users at most accessing the testing app(it's lunch time), and I'm one of them, and just searching, so I don't think that write locks could be the problem at this moment.
For the time being I'm thinking on a mix of back-end and front-end pagination, if connection or command times out, uses the alternate. I particulary don't like the solution, hoping any news, but at least a set could be presented.

Thanks for the replays.

Regards,
OO
 
OK. There is one nifty way to simulate server-side paging without excessive I/O - but I need to know which column(s) in result set are unique (perhaps primary key from R table?) and which are ORDERed BY (rc_ListName?).

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
OK, PK rc_AdNo field from tblRecorder,

and the ORDER BY cluase could contain:
_ rc_ListName(String) from tblRecorder,

rc_Directory is a field in tblRecorder that references a directory id in tblPublications, and the user could choose to order the Publication name, that left us as follow:
_ pb_PubName(String) from tblPublication(one of the reason for outer join in the sample query),

rc_Prod is a field in tblRecorder that references a product id in tblProduct, and the user could choose to order by the Product name, that left us as follow:
_ pr_DescName(String) from tblProduct(the other reason for the other left join in the sample query),

_ rc_ListPhone(String)
_ rc_SaleAmount(Currency)
Both are fields of tblRecorder

Probably left joins could be changed to use inner join and that would left some rows out of the set, since those rows has no much meaning, but I don't think this is the main reason of this delay.

I'm wondering if is there another way to express the initial query using T-SQL?, I mean, using temp tables or "while" or any other alternative?

Regards,
OO
 
Something you should be aware of. Anytime you do a Select *, it's going to do a tablescan, which takes tons and tons of time. You have one listed in your initial post:

SELECT R.*, PB.pb_PubType, PB.pb_PubName, PR.pr_Desc

That first item there. Do you really need every single last column from table R? Regardless, try parsing out every single column by name. Harder to type in at the start, but it should save you some query time when things are run. And if you don't need every column, just list the ones you do need, which will save even more time.



Catadmin - MCDBA, MCSA
"If a person is Microsoft Certified, does that mean that Microsoft pays the bills for the funny white jackets that tie in the back???
 
I'm not going to need R.*, now the query is:

SELECT R.rc_AdNo, R.rc_ListPhone, R.rc_SaleAmount, PB.pb_PubType, PB.pb_PubName, PR.pr_Desc FROM tblRecorder R
LEFT OUTER JOIN tblPublication PB ON R.rc_Directory=PB.pb_RecNo
LEFT OUTER JOIN tblProducts PR ON R.rc_Product=PR.pr_ProdNo WHERE R.rc_AdNo IN
( SELECT TOP 25 R.rc_AdNo FROM tblRecorder R
LEFT OUTER JOIN tblPublication PB ON R.rc_Directory=PB.pb_RecNo
LEFT OUTER JOIN tblProducts PR ON R.rc_Product=PR.pr_ProdNo WHERE 1=1 AND R.rc_AdNo NOT IN
( SELECT TOP 25225 R.rc_AdNo FROM tblRecorder R
ORDER BY R.rc_ListName ASC )
ORDER BY R.rc_ListName ASC )
ORDER BY R.rc_ListName ASC

and it saves now between 20 and 30 sec, a big advance, thanks a lot Catadmin,
 
Anytime! @=)



Catadmin - MCDBA, MCSA
"If a person is Microsoft Certified, does that mean that Microsoft pays the bills for the funny white jackets that tie in the back???
 
Let's try to make it even faster (I simplified query, joins are easy to add):
Code:
-- "bookmark" vars
declare @rc_ListName varchar(30)
declare @rc_AdNo int

-- find unique values for 1000th row according to ORDER BY
select top 1000 @rc_AdNo = rc_AdNo, @rc_ListName = rc_ListName
from tblRecorder 
order by rc_ListName, rc_AdNo

-- get 50 rows after "bookmark", same ORDER BY
select top 50 <some columns>
from tblRecorder 
where (rc_ListName > @rc_ListName or (rc_ListName = @rc_ListName and rc_AdNo > @rc_AdNo))
order by rc_ListName, rc_AdNo
The trick is to have "unique ORDER BY" - that way it is possible to get unique "bookmark" with SELECT TOP... @variable query and then extract page of rows immediately after bookmark (same ORDER BY).

If things still get slow, you can play with join hints in first SELECT and eventually clean up unnecessary SQL (optimizer is smart but not too smart :X).

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top