find command with awful performance

TSch (TechnicalUser)
2 May 12 1:53
Hello everybody,

I wrote a little script that's supposed to monitor our print server. It checks each print job in the print queue for several states (Delivery Problem, Failed, Aborted).

For every state I perform a separate find command and store the output in a variable.

e.g.

CODE

chk_1b=$(ssh spooler "find /data/SpoolIn -name job.history -exec grep -l FAIL {} \; | wc -l" | awk '{print $1}')
chk_1c=$(ssh spooler "find /data/SpoolIn -name job.history -exec grep -l ABOR {} \; | wc -l" | awk '{print $1}')

The SpoolIn directory contains a subdirectory for each print job, and every print job subdirectory contains one subdirectory for each step performed within the job. The job state is recorded in the job.history file.

So the find command has to search quite a large directory structure, which might contain more than 5,000 subdirectories plus several files within each subdirectory.

Now, as you might have guessed already, all those find commands take quite a long time to complete ...

Is there any way to improve the script to increase the performance and reduce the search time ?

Best Regards,
Thomas
 
feherke (Programmer)
2 May 12 5:12
Hi

Please post a job.history sample. Knowing your Awk implementation and which shell you use could also help in picking an optimal solution.

And after all, what is your ultimate goal ? Just 4 shell variables, each with the count of a status ?

One thing in the meantime : running a separate grep instance via -exec for each file found is a bad idea. Either use + instead of ; or pipe to xargs. Both methods run each grep instance with as many file names as possible :

CODE

ssh spooler "find /data/SpoolIn -name job.history -exec grep -l FAIL {} \+ | wc -l" | awk '{print $1}'

# or

ssh spooler "find /data/SpoolIn -name job.history | xargs grep -l FAIL | wc -l" | awk '{print $1}'
By the way, are you sure the Awk part is necessary there ? When counting the standard input my wc outputs only the number.
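
The difference between the two -exec terminators can be made visible on a small throwaway tree. The following is only a sketch: the directory layout, the file contents and the variable names are made up for the demonstration, and mktemp -d is assumed to be available on the system.

```shell
#!/bin/sh
# Build a small throwaway tree that loosely mimics the spool layout
# (hypothetical paths and contents, not the real /data/SpoolIn).
demo=$(mktemp -d)
for job in 1 2 3 4 5; do
    mkdir -p "$demo/job$job/step1"
    echo "PROCESSING -> DONE" > "$demo/job$job/step1/job.history"
done
echo "PROCESSING -> FAILED" > "$demo/job3/step1/job.history"

# With \; find starts one child process per file found.
per_file=$(find "$demo" -name job.history -exec sh -c 'echo started' \; | wc -l | tr -d '[:space:]')

# With + find hands as many file names as fit to each child process.
batched=$(find "$demo" -name job.history -exec sh -c 'echo started' sh {} + | wc -l | tr -d '[:space:]')

# The count of failed jobs, using the batched form.
failed=$(find "$demo" -name job.history -exec grep -l FAIL {} + | wc -l | tr -d '[:space:]')

echo "per-file runs: $per_file, batched runs: $batched, failed jobs: $failed"
rm -rf "$demo"
```

The tr call strips the padding that some wc implementations put in front of the number, which is the job the awk '{print $1}' does in the original script.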

Feherke.
http://feherke.github.com/

blarneyme (MIS)
2 May 12 7:56
I know this is the scripting forum, but Perl is an alternative, using File::Find as in this example of mine:

CODE

#!/usr/bin/perl

use File::Find;

sub findit {
        my @files = ();
    my @dirs = ();
    open DF,"df -k | grep -v ':/' | awk '{print $6}' | tail +2l|" or die;
    while(<DF>){
        ($fs,$blocks,$used,$avail,$pct,$mnt) = split;
        print "$fs $mnt\n";
            find(sub {
                       if ($_ =~ /\.rhosts$/) {
                               push(@files, [$File::Find::dir . "/$_", (stat($_))[7]]);
                       }
            }, "$mnt");
    }
    close(DF);
    map {
                print "$$_[0]\n";
                open(F, $$_[0]);
                while (<F>) {
                        print;
                }
                close(F);
                print "|\n";
        } sort {
                $$a[1] <=> $$b[1]
        } @files;
}
&findit;
exit(0);
feherke (Programmer)
2 May 12 8:03
Hi

And what does that code do ? There seems to be a serious contradiction :

Quote (blarneyme):

CODE

    open DF,"df -k | grep -v ':/' | awk '{print $6}' | tail +2l|" or die;
    while(<DF>){
        ($fs,$blocks,$used,$avail,$pct,$mnt) = split;

Feherke.
http://feherke.github.com/

SamBones (Programmer)
2 May 12 12:19
If the job.history file is at a consistent depth in the directory tree, something like this can be much faster than a find.

CODE

ssh spooler "ls -1 /data/SpoolIn/*/*/job.history 2>/dev/null | xargs grep -l FAIL | wc -l"

 
TSch (TechnicalUser)
4 May 12 2:04
Hi everyone,

here's an example:

CODE

<br><br> 07:25:09.637 18.04.12 <br> CREATE NEW -&gt; PROCESSING job: 14033481<br><br> 07:26:25.439 18.04.12 <br> PROCESSING -&gt; FAILED job: 14033481<br><br> 08:33:23.478 18.04.12 <br> FAILED -&gt; REPRINTED job: 14034875

I'm using ksh under AIX.

The directory depth is not always the same, because a subdirectory is created for each step as well as another subdirectory for each substep of the print job, and each directory contains a job.history ...

The idea is to script an ASCII monitor that scans for the states mentioned above and gives me a "red light" if there are failed/aborted jobs, a "yellow light" if there are delivery problems, or both lights. So far it works perfectly. The only problem is the performance.
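
The light logic described above could be sketched as a small shell function. This is only an illustration: the function name light_for, the argument order and the "green" fallback are assumptions and not part of the actual monitor.

```shell
#!/bin/sh
# light_for FAILED ABORTED DELIVERY
# Prints "red", "yellow", "red yellow" or "green" for the given counts.
# The name and the "green" fallback are illustrative assumptions.
light_for() {
    failed=$1
    aborted=$2
    delivery=$3
    lights=""
    # Failed or aborted jobs switch the red light on.
    if [ "$failed" -gt 0 ] || [ "$aborted" -gt 0 ]; then
        lights="red"
    fi
    # Delivery problems switch the yellow light on, possibly in addition.
    if [ "$delivery" -gt 0 ]; then
        lights="${lights:+$lights }yellow"
    fi
    echo "${lights:-green}"
}

light_for 3 0 0
light_for 0 0 2
light_for 1 1 4
light_for 0 0 0
```

The three arguments would be the counts collected from the job.history files, however they end up being gathered.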

The ultimate goal is a screen full of monitoring windows showing the states of several systems, to get an overview of the landscape's condition ...

Best Regards
Thomas
feherke (Programmer)
4 May 12 5:31
Hi
  • How many job.history files are there usually ?
  • Do they change over time ? If yes, how frequently ?
  • Are you running this check periodically ? If yes, how frequently ?
I ask these because I am thinking about telling find to consider only files newer than a certain age. Not sure if this can bring any improvement, just thinking.
  • Is that text you quoted above a single line ?
I ask because a big simplification would be to use a single find and grep for all of "FAIL\|ABOR\|whatever" at once. The -o switch would output only the matched word, but sadly, if there are multiple matches in a line, it outputs them all, even if -m is specified. Or at least GNU grep does. If your grep behaves differently ( I mean echo 'foo foo' | grep -o -m 1 'foo', with or without the -m option, outputs a single "foo" ) then tell us. But this would not be an issue if there were no more than one matching word per line, so this would work :

CODE

find /data/SpoolIn -name job.history | xargs grep -o -m 1 -h 'FAIL\|ABOR' | sort | uniq -c
This will output a list like :

CODE --> output

      2 ABOR
      3 FAIL
A similar thing could be achieved using Sed too :

CODE

find /data/SpoolIn -name job.history | xargs sed -n '/FAIL\|ABOR/s/.*\(FAIL\|ABOR\).*/\1/p' | sort | uniq -c
  • Quote (Feherke):

    And after all, what is your ultimate goal ? Just 4 shell variables, each with the count of a status ?
I asked that because, as you can see, a simple list can be output more easily, but if you definitely need separate shell variables, Awk would work better :

CODE

eval $(find /data/SpoolIn -name job.history | xargs awk 'FNR==1{f=0}!f&&match($0,/FAIL|ABOR/){s[substr($0,RSTART,RLENGTH)]++;f=1}END{for(i in s)printf"chk_1%s=%d\n",i,s[i]}')
This will create shell variables like chk_1ABOR=2 and chk_1FAIL=3.
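
The idea mentioned earlier in this post, letting find consider only recently changed files, could look roughly like the sketch below. It uses a marker file and find's -newer test. The marker file name, the throwaway tree and the backdated timestamps are assumptions made for the demonstration, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Throwaway tree standing in for /data/SpoolIn (hypothetical layout).
spool=$(mktemp -d)
stamp="$spool/.last_check"        # hypothetical marker file

mkdir -p "$spool/job1/step1" "$spool/job2/step1"

# An old history file, backdated so the demo needs no sleep.
echo "PROCESSING -> FAILED" > "$spool/job1/step1/job.history"
touch -t 200001010000 "$spool/job1/step1/job.history"

# Pretend the previous check ran after that file was written.
touch -t 200101010000 "$stamp"

# A history file written after the previous check.
echo "PROCESSING -> ABORTED" > "$spool/job2/step1/job.history"

# Only files modified after the marker are handed on for inspection.
changed=$(find "$spool" -name job.history -newer "$stamp" | wc -l | tr -d '[:space:]')
all=$(find "$spool" -name job.history | wc -l | tr -d '[:space:]')

# Move the marker forward for the next run.
touch "$stamp"

echo "$changed of $all job.history files changed since the last check"
rm -rf "$spool"
```

Whether this helps depends on the answers to the questions above. find still has to walk the whole tree, so only the grep or awk work shrinks, and the monitor would have to remember the states of jobs that have not changed since the previous run.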
 

Feherke.
http://feherke.github.com/
