CPU Load Suspension

Status
Not open for further replies.

mpalmer12345

Programmer
Feb 16, 2004
59
US
I am running a Perl cgi-bin program that is causing my account to be suspended due to excessively high cpu load. All the program does is read in a site's html code via LWP::Simple and process it using Perl s// commands. Can anybody give me some general tips as to how I can write Perl code to minimize the cpu load, or whether I can even do this well enough to make a difference?
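A useful first step for any of the tips that follow is to measure how much CPU a run actually uses, so that two versions of the script can be compared. The sketch below uses Perl's built-in times, which returns the user and system CPU seconds consumed so far by the current process. process_page is a hypothetical stand-in for the real processing, and the 100,000-character input is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real page processing:
# blot out every non-whitespace character.
sub process_page {
    my ($html) = @_;
    (my $out = $html) =~ s/\S/./g;
    return $out;
}

# times() returns (user, system, child user, child system) CPU seconds so far.
my ($user0, $sys0) = times;
my $result = process_page("x" x 100_000);    # made-up 100,000-character input
my ($user1, $sys1) = times;

# Report on STDERR so the numbers are not mixed into the HTML sent to the browser.
printf STDERR "processed %d chars using %.2fs user + %.2fs system CPU\n",
    length($result), $user1 - $user0, $sys1 - $sys0;
```

Running the old and new versions of a routine through the same harness gives a rough idea of whether a rewrite makes a real difference.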
 
Young Palmer,
First thing, ask your ISP to state what counts as an excessively high CPU load, and what your script is doing to cause it.

Chances are your ISP has just realised that his CPUs aren't cutting the biscuit anymore, and he (they) can charge accordingly.

Let me know how you get on

--Paul

There is only so hard a processor can work before the ISP needs to re-invest

ROI - a needful thing
 
Thanks for that tip! Here's what he sent me. I can't pretend to know what the references mean, so if somebody else does, I'd love to hear about it.

I can also post the code in my cgi-bin file, if that would help get to the source of the problem.


The problem stems from the server load (CPU usage) being very high, so
other users on the server had trouble accessing their sites/email. Hence the
suspension. Here is the reference.

--------------------------------
162 processes: 154 sleeping, 4 running, 2 zombie, 2 stopped
CPU0 states: 15.2% user  1.4% system  0.0% nice  0.0% iowait  82.2% idle
CPU1 states: 99.0% user  0.4% system  0.0% nice  0.0% iowait   0.1% idle
CPU2 states: 86.2% user  0.3% system  0.0% nice  0.0% iowait  12.4% idle
CPU3 states: 99.2% user  0.3% system  0.0% nice  0.0% iowait   0.0% idle
Mem:  2064400k av, 2050832k used,   13568k free,       0k shrd,  173888k buff
                   1269668k actv,  287528k in_d,   43952k in_c
Swap: 4192956k av,  154716k used, 4038240k free                 1394308k cached

  PID USER  PRI NI SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
19760 arose  25  0 2156  808    60 R    99.1  0.0 696:27   3 HTMLWStrip01b.c
 1962 arose  25  0 2036 2036    60 R    98.1  0.0 648:33   2 HTMLWStrip01b.c
19667 arose  25  0 2156  136    60 R    97.9  0.0 698:00   1 HTMLWStrip01b.c
 
Okay, because it's so short, I figure I may as well post the code I've written that is causing the problem, minus a few details. I invite anybody to tell me how I might revise this code to lessen the CPU load, or otherwise tighten it up to run more efficiently.

The program basically takes a user-supplied URL, fetches that page's HTML, replaces all image links with a blank image, and blots out the text. It's something I do as art, so don't ask!

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use CGI;

my $cgi  = CGI->new;
my $url  = $cgi->param("urlval");
my $text = defined $url ? get($url) : undef;   # THIS GRABS THE URL'S HTML CODE (undef ON FAILURE)
print "Content-type: text/html\n\n";
# print "Content-type: text/plain; charset=iso-8859-1\n\n";

# STOP HERE IF THERE IS NOTHING TO PROCESS
if (!defined $text) {
    print "Could not fetch that URL.\n";
    exit;
}

my $repl = ".";

# REMOVES STRAY NEWLINES THAT ADD EXTRA BLANKS
$text =~ s/>(?:\r?\n)+</></g;

# CONVERT NAMED HTML ENTITIES (&amp;, &nbsp;, ...) TO A SINGLE REPL
$text =~ s/&[a-zA-Z]+;/$repl/g;

# REPLACE IMAGE LINKS WITH BLANK JPGS
# (HANDLES QUOTED OR UNQUOTED SRC VALUES, INCLUDING SRC AS THE LAST
# ATTRIBUTE; THE OLD VERSION REFERRED TO A NONEXISTENT $3)
$text =~ s/(<img\b[^>]*?\bsrc=)(?:"[^"]*"|'[^']*'|[^\s>]+)/${1}"blank.jpg"/ig;

sub makeXs {    # TO BE USED IN THE REPLACEMENTS BELOW
    my $s = shift;
    $s =~ s/\S/$repl/g;
    return $s;
}

# REPLACE ALT= WITH DOTS
$text =~ s/alt="([^"]+)"/'alt="' . makeXs($1) . '"'/gie;
$text =~ s/alt='([^']+)'/"alt='" . makeXs($1) . "'"/gie;

# REPLACE VAL= WITH DOTS
$text =~ s/value=([^>]+)>/'value=' . makeXs($1) . '>'/gie;

# MAIN
# COPY TAGS AS-IS, BUT BLOT OUT THE TEXT BETWEEN EACH '>' AND THE NEXT '<'
my $tlen = length($text);
my $tsub = "";
my $xx   = 0;
while ($xx < $tlen) {
    my $subs = substr($text, $xx, 1);
    $tsub .= $subs;
    my $flag = 0;
    if ($subs eq "\n") { $tsub .= "\n"; }
    if ($subs eq ">") {
        $flag = 1;
        # WALK THROUGH THE TEXT RUN UP TO THE NEXT '<' (OR THE END OF THE PAGE)
        while ($subs ne "<" && $xx < $tlen) {
            $xx += 1;
            $subs = substr($text, $xx, 1);
            if ($subs ne " " && $subs ne "\n") { $tsub .= $repl; }
            if ($subs eq "\n") { $tsub .= "\n"; }
            if ($subs eq " ")  { $tsub .= " "; }
        }
    }
    if ($flag == 1) {
        # STEP BACK SO THE OUTER LOOP COPIES THE '<', AND DROP THE REPL
        # THAT WAS APPENDED FOR IT
        $xx -= 1;
        chop $tsub;
    }
    $xx += 1;
}

print "$tsub\n";
 
Hmmmm....

This:
[tt]
  PID USER  PRI NI SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
19760 arose  25  0 2156  808    60 R    99.1  0.0 696:27   3 HTMLWStrip01b.c
 1962 arose  25  0 2036 2036    60 R    98.1  0.0 648:33   2 HTMLWStrip01b.c
19667 arose  25  0 2156  136    60 R    97.9  0.0 698:00   1 HTMLWStrip01b.c
[/tt]
is the output you get from the Unix top command; it tells you about the processes running on the machine. Pay particular attention to the TIME column. You have/had three instances of HTMLWStrip01b.c running, each of which has consumed over 10 hours of processing time.
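As a quick check of that figure, and assuming this version of top prints TIME as minutes:seconds, 696:27 is 696 * 60 + 27 = 41,787 CPU seconds, or about 11.6 hours. The short sketch below applies the same conversion to the three process lines from the ISP's report; cpu_hours is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a top TIME value such as "696:27" into hours,
# assuming the format is minutes:seconds.
sub cpu_hours {
    my ($time) = @_;
    my ($min, $sec) = split /:/, $time;
    return ($min * 60 + $sec) / 3600;
}

# The three process lines from the ISP's report.
my @procs = (
    "19760 arose 25 0 2156 808 60 R 99.1 0.0 696:27 3 HTMLWStrip01b.c",
    "1962 arose 25 0 2036 2036 60 R 98.1 0.0 648:33 2 HTMLWStrip01b.c",
    "19667 arose 25 0 2156 136 60 R 97.9 0.0 698:00 1 HTMLWStrip01b.c",
);

for my $line (@procs) {
    my @f = split ' ', $line;    # whitespace-separated; TIME is column 11 (index 10)
    printf "PID %s: %.1f hours of CPU time\n", $f[0], cpu_hours($f[10]);
}
```

This prints 11.6, 10.8 and 11.6 hours for PIDs 19760, 1962 and 19667 respectively.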

Is it possible that while you were developing the script you did some test runs where the bit below "# MAIN" ran in an infinite loop? Maybe it still does for some malformed web pages? Have them kill off those three processes (if they haven't already done so) and see if the problem persists. It should be possible (and would be more efficient) to replace that nested loop with another s// expression, but my regular expression skills aren't up to suggesting how to do it.

-- Chris Hunt
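Along the lines Chris suggests, here is a minimal sketch of how the character-by-character loop under "# MAIN" might be collapsed into one substitution that reuses the script's own makeXs helper. It assumes the intent is to blot out every non-whitespace character between a '>' and the next '<' (or the end of the page). It is not an exact drop-in: unlike the original loop, it does not double newlines, and it leaves tabs and carriage returns as they are. The name blot_text_runs is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $repl = ".";

# The script's own helper: replace every non-whitespace character with $repl.
sub makeXs {
    my $s = shift;
    $s =~ s/\S/$repl/g;
    return $s;
}

# Blot out the text between each '>' and the next '<' (or the end of the
# string) in a single pass, leaving the tags themselves untouched.
sub blot_text_runs {
    my ($html) = @_;
    $html =~ s/>([^<]*)/'>' . makeXs($1)/ge;
    return $html;
}

print blot_text_runs("<p>Hello world</p>\n<b>Hi</b>"), "\n";
```

In the script, something like $text = blot_text_runs($text); followed by print "$text\n"; would stand in for everything from my $tlen onwards.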
 
Thanks for the info!

So I gather that an infinite loop in a cgi-bin program keeps running even after the user breaks off the request? Hm.
 
I think that's true. Look at it from the web server's point of view: it receives a request from your browser for a particular page, runs the script to generate the page, then sends it back to you. It has no way of knowing that you've hung up in the meantime.

That's my belief, anyway. The CGI protocol may actually be clever enough to pick up on such events; I'm not really an expert on how it works.

-- Chris Hunt
 
You're correct. The process will continue even after the user has pushed the 'stop' button on their browser. If it's in an infinite loop, you'll have to kill the process yourself.
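One way to stop a runaway script from hogging the CPU for hours is to make it give up on its own. The sketch below wraps the work in Perl's built-in alarm with a $SIG{ALRM} handler, the usual eval/die timeout idiom. run_with_timeout is a hypothetical helper name and the time limits are arbitrary. Note that alarm measures wall-clock time rather than CPU time, and it only protects the code that is actually placed inside the wrapped sub.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run $code, but give up after $seconds of wall-clock time.
# Returns whatever $code returns, or undef if the time limit was hit.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;    # finished in time: cancel the pending alarm
        1;
    };
    alarm 0;        # also cancel it if $code died for some other reason
    return $result if $ok;
    die $@ unless $@ eq "timeout\n";    # re-throw errors that are not timeouts
    return undef;
}

# Demo: a deliberately slow loop (10 seconds) cut off after 1 second.
my $out = run_with_timeout(1, sub {
    my $end = time() + 10;
    1 while time() < $end;
    return "finished";
});
print defined $out ? "$out\n" : "gave up after 1 second\n";
```

In the CGI script, the fetch and the processing would go inside the sub passed to run_with_timeout, with a short error page printed when it returns undef.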
 