Reducing CPU impact of regexp matching

ChrisHunt (Programmer)
Jul 12, 2002
I'm having problems with my ISP over a script I've written to search my website. Apparently it's consuming a large percentage of CPU time when it runs, to the detriment of other users.

What it does is loop through all the HTML files in the site, slurp each one's body content into a variable, and then run the search pattern over it with a regexp. My thinking is that when those pages are long the regexp processing is very expensive, and because all this happens at the heart of a tight little loop, it really hits the CPU. Maybe I'm wrong and it's all the disk I/O, but the same principle applies.
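In outline the loop looks something like this (much simplified, with made-up paths and names; the real thing is longer and more involved):
Code:
use strict;
use warnings;
use File::Find;

my $term  = shift or die "usage: search.pl <term>\n";
my $match = qr/\Q$term\E/i;        # literal, case-insensitive match
my @hits;

find(sub {
    return unless /\.html?$/;      # HTML files only
    open my $fh, '<', $_ or return;
    local $/;                      # slurp the whole file
    my $html = <$fh>;
    my ($body) = $html =~ m{<body[^>]*>(.*?)</body>}is;
    push @hits, $File::Find::name
        if defined $body and $body =~ $match;
}, '/path/to/site');               # made-up document root

print "$_\n" for @hits;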

So what I'm thinking is, would it help if I added some pauses like this:
Code:
 select(undef, undef, undef, 0.05);
into the heart of the loop, giving other processes 50 milliseconds (or some other period) to get in and do their thing. Or is the pause itself just going to spin in another loop within the loop? Is it possible to run a Perl script at reduced priority, like with the Unix [tt]nice[/tt] command? (It's running on a Unix server.)

It's hard for me to explore and test this issue myself, as I don't have shell access to the server in question. I just have to tweak my script and see if they scream.

I don't really want to post the source here, as it's rather long and involved. I can make it available if people really want to see it (may take a while as I'm away for the weekend).

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
couldn't you just run it nicely?

nice -n 19 yourscript
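If you can't change how the script gets launched, it can also renice itself at startup through the core POSIX module. A minimal sketch, assuming the system lets an unprivileged process raise its own niceness:
Code:
use strict;
use warnings;
use POSIX ();

# POSIX::nice adds to the current nice value; 19 asks for the
# lowest priority. It returns undef and sets $! on failure.
defined POSIX::nice(19)
    or warn "couldn't renice: $!\n";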

Kind Regards
Duncan
 
There's a nice script called Perlfect Search which builds an index of the site at regular intervals, with searches then run against the index instead of the raw files. Building the index is itself quite CPU-intensive IIRC, but you might be able to agree a good time to schedule it with the ISP, or you could index a local copy of the site and upload the index on a regular basis.

The big thing about search functionality, for the most part, is that it's quick, and on-demand regexing of a whole site would be far from quick in my book, unless we're talking about very few files.
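To give a flavour of the indexing approach, here's a toy sketch (nothing like Perlfect's real format, and the paths and filenames are invented): an offline pass, run from cron or against a local copy, builds a word-to-pages hash and stores it, so the search CGI never has to touch the HTML at all.
Code:
use strict;
use warnings;
use File::Find;
use Storable qw(store);

my %index;    # word => { page-name => 1 }

find(sub {
    return unless /\.html?$/;
    open my $fh, '<', $_ or return;
    local $/;                          # slurp the file
    my $text = <$fh>;
    $text =~ s/<[^>]+>/ /g;            # crude tag stripping
    $index{lc $1}{$File::Find::name} = 1
        while $text =~ /(\w{3,})/g;    # index words of 3+ chars
}, '/path/to/site');                   # invented document root

store \%index, 'site.idx';             # invented index filename

The search side is then just [tt]retrieve('site.idx')[/tt] and a hash lookup per word, which costs next to nothing however big the site gets.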

Hope this gives you a few ideas

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 