
Running commands simultaneously, please help

Status
Not open for further replies.

h3nd (Programmer), AU, Jul 1, 2006
Hi guys,

I really need a favour here.
Currently I have 4 CPUs in my server.

I need to load 144 data files; if I load them sequentially it takes about 2.5 hours, which is not good.

So I need to load three files at a time, one assigned to each CPU.

Here's roughly my script at the moment:
Code:
for file in `ls`
do
    [load command] $file > log$file
done

what I want is roughly like this
Code:
for file in `ls`
do
  [load command] $file > log$file & [load command] $file > log$file & [load command] $file > log$file
done

And I need those 3 loads working independently, so whenever one load finishes it doesn't have to wait for the others but continues with the next file.

Can you guys shed some light here ?

Any input will be much appreciated.
 
How about writing a script that does the download, and passing the filename as an option to the script?
You can call the script several times, maybe inside another script, with each file you want to download.
:)
 
Did you mean like this (i.e. on separate lines)?

Code:
for file in `ls`
do
  [load command] $file > log$file & 
  [load command] $file > log$file & 
  [load command] $file > log$file
done

Annihilannic.
 
Sorry, it should be like this:

Code:
for file in `ls`
do
  [load command] $file > log$file &
done  

wait       # waits for all loads to complete

Annihilannic.
 
As I understand h3nd's issue, it's about wanting to run 3 loads simultaneously out of a list of unknown length. I can see a script that looks something like
Code:
#!/bin/ksh

set -A arr * #load the filenames into an array arr
COUNT=0
while [ $COUNT -lt ${#arr[*]} ] # while there are files to process
do
  #Run command 1
  [load command] ${arr[$COUNT]} > log${arr[$COUNT]} &
  (( COUNT += 1 ))
  #Run command 2
  [load command] ${arr[$COUNT]} > log${arr[$COUNT]} &
  (( COUNT += 1 ))
  #Run command 3
  [load command] ${arr[$COUNT]} > log${arr[$COUNT]} &
  (( COUNT += 1 ))
  wait
done
but this is not tested. Furthermore, it doesn't address the perfect answer, which is to have the next job start as soon as the first job ends. It might be possible to do something with
Code:
set -A jobarr $(jobs -p) #set array to the process ids of outstanding jobs
[[ ${#jobarr[*]} -lt 3 ]] && run_next_job
which would test the number of background tasks running. Put this in a sleep loop, i.e.
Code:
while jobs_to_do
do
  set -A jobarr $(jobs -p) #set array to the process ids of outstanding jobs
  [[ ${#jobarr[*]} -lt 3 ]] && run_next_job
  sleep 5
done
and now the issue is to write the routines 'jobs_to_do' and 'run_next_job'. Again I would iterate through an array, so we get
Code:
#!/bin/ksh

set -A files * #Create an array of filenames
COUNT=0
while [ $COUNT -lt ${#files[*]} ]
do
  set -A jobarr $(jobs -p)
  [[ ${#jobarr[*]} -lt 3 ]] &&
    {
    [load command] ${files[$COUNT]} &
    (( COUNT += 1 ))
    }
  sleep 5
done
Again this is untested!!!
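[Editor's note: the counting idea above can be tried in isolation. This small demo is not from the thread; it assumes bash or ksh job control, starts three background sleeps, and counts them with `jobs -p`.]

```shell
#!/bin/bash
# Start three background jobs, then count how many are outstanding
# by counting the lines of `jobs -p` (one pid per line).
sleep 2 &
sleep 2 &
sleep 2 &
RUNNING=$(jobs -p | wc -l | tr -d ' ')
echo "background jobs: $RUNNING"
wait   # block until all three finish
echo "all done"
```

The same `jobs -p | wc -l` count is what the `[[ ... -lt 3 ]]` test in the script above compares against the limit of 3.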


Ceci n'est pas une signature
Columb Healy
 
Ah, I see. A script I wrote that does a similar job is parallelem:

Code:
#!/bin/ksh
#
# Ver   Date            Changes
# 1.0                   Original
# 1.1   22/12/2004      Added 'eval' so it can handle shell redirections, etc.
# 1.2   19/10/2005      Added grep -v to allow comments in command file.
# 1.3   15/09/2006      Optionally change sleeptime.
# 1.4   22/09/2006      Cosmetic change to log output.

SLEEPTIME=${4:-15}

if [[ $# -lt 3 || $# -gt 4 ]]
then
        echo
        echo "usage: $0 <threads> <searchstring> <commandfile> [ <sleeptime> ]"
        echo
        echo "This script is designed to run a batch of like processes in parallel"
        echo "while ensuring that only a certain number are running at any one time."
        echo
        echo " <threads> is the number of processes to run in parallel."
        echo " <searchstring> is the command name to search for in 'ps -eo comm' output"
        echo " <commandfile> contains the list of commands to run.  Use '-' to read from"
        echo " standard input."
        echo " <sleeptime> is the time to sleep between counting threads.  Default is 15,"
        echo " however it may need increasing for commands that are slow to start."
        echo
        exit 1
fi

PPROCS="$1"
SEARCHSTRING="$2"
COMMANDFILE="$3"
DATECMD="date +%Z-%Y-%m-%d-%H:%M:%S"

echo "$(${DATECMD}): $0 started."
echo "$(${DATECMD}): Using ${PPROCS} threads, search string \"${SEARCHSTRING}\"."
echo "$(${DATECMD}): Reading commands from \"${COMMANDFILE}\"."

# cat required here to support '-' for reading of stdin
cat ${COMMANDFILE} | grep -v ^# | while read COMMAND
do
        until [[ $( ps -eo comm | grep "${SEARCHSTRING}" | wc -l ) -lt ${PPROCS} ]]
        do
                sleep ${SLEEPTIME}
        done
        echo "$(${DATECMD}): Running: \"${COMMAND}\"."
        ( eval ${COMMAND} ; echo "$(${DATECMD}): Completed: \"${COMMAND}\"." ) &
        sleep ${SLEEPTIME}
done

wait
echo "$(${DATECMD}): All commands processed."

This could be invoked using something like:

[tt]ls | awk '{print "loadcommand " $1 " > log" $1}' | parallelem 3 loadcommand -[/tt]

Annihilannic.
 
Nice script, Annihilannic.

I can see pros and cons between using your ps with a grep and my jobs -p, but it certainly solves h3nd's problem.

Ceci n'est pas une signature
Columb Healy
 
Thank you very much, guys,

I'll test it tomorrow morning and let you know how it goes...

 
Guys, before I test it:

Will the three jobs that run at a time actually run on 3 different CPUs?

How do I check that?

Thanks, guys
 
Anni,

Could you explain this to me:
SLEEPTIME=${4:-15}

Thanks man,
 
It means that the SLEEPTIME variable will be set to the value of $4 (the fourth parameter) if it is defined, or to 15 if it is not defined.
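[Editor's note: `${n:-default}` is standard POSIX parameter expansion. A quick illustration using a hypothetical helper function, not part of Annihilannic's script:]

```shell
#!/bin/sh
# ${1:-15} expands to $1 if it is set and non-empty, otherwise to 15.
show_sleeptime() {
    SLEEPTIME=${1:-15}
    echo "sleeptime=$SLEEPTIME"
}
show_sleeptime       # no argument   -> sleeptime=15
show_sleeptime 5     # argument "5"  -> sleeptime=5
```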

Annihilannic.
 
Does your kernel handle multi-processors? If so, the load should be split automatically, no?

"That time in Seattle... was a nightmare. I came out of it dead broke, without a house, without anything except a girlfriend and a knowledge of UNIX."
"Well, that's something," Avi says. "Normally those two are mutually exclusive."
-- Neal Stephenson, "Cryptonomicon"
 
It doesn't have to be complicated. Using your original script example, this will keep three running at all times...
Code:
#!/bin/ksh

for file in `ls`
do
    [load command] $file > log$file 2>&1 &

    while (( $(jobs -p|wc -l) )) >= 3 ))
    do
        wait 5
    done
done
There's no guarantee that they will each be locked to a processor. If you want that, on Solaris you can define a processor group, and then assign a process to only run on that processor or processor group. But that makes the script a lot more complicated.

If they are very CPU intensive, they will generally stick to a single processor, but may be pushed off by other processes on the machine.

But, it should still do what you want which is keep the machine loaded, but not overloaded.
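[Editor's note on h3nd's earlier "how do I check it?" question: both Linux ps and Solaris /usr/bin/ps can report the processor a process is currently on via the psr output column. A small sketch, not from the thread:]

```shell
#!/bin/bash
# Start two background jobs, then show which processor (psr column)
# each one is currently assigned to.
sleep 1 &
P1=$!
sleep 1 &
P2=$!
PSOUT=$(ps -o pid,psr,comm -p "$P1,$P2")
echo "$PSOUT"
wait
```

On Solaris, binding would then be done with pbind or processor sets (psrset), as SamBones describes.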
 
hey SamBones,

Your solution is working great!!!
Thanks a lot, man.

But I need to understand this better:
Code:
2>&1

And "wait 5": does it mean that if more than 3 load jobs are running, it will wait 5 seconds?

Some files are big and take about 15 minutes to finish; does that matter?

Could you explain, please? Or anyone else?

Anni,

I really like your code; it looks professional. But I need to understand it better before I apply it to mine.

In this part
Code:
until [[ $( ps -eo comm | grep "${SEARCHSTRING}" | wc -l ) -lt ${PPROCS} ]]
        do
                sleep ${SLEEPTIME}
        done

shouldn't it be -gt instead?
Could you explain your code in English? Thanks, man.

columb,

Thanks a lot for your explanation :)

Cheers,
 
I think SamBones meant to use sleep 5 there instead of wait 5, a typo I imagine.

2>&1 means that standard error is redirected to the same place as standard out. More specifically, it means that file descriptor 2 (which is standard error) is copied from file descriptor 1 (which is standard output).

No, -lt is correct; in English that reads:

[tt]until the number of running commands is less than PPROCS
wait for a while[/tt]

In other words, using your example, we want to sit and wait until there are fewer than 3 loads running before starting another one.
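[Editor's note: the ps-based counting test can also be tried on its own. A small sketch, not from the thread, assuming a Linux-style ps:]

```shell
#!/bin/bash
# The counting test from parallelem, in isolation: count processes
# whose command name is 'sleep' via `ps -eo comm`.
sleep 2 &
sleep 2 &
sleep 2 &
COUNT=$(ps -eo comm | grep -c '^sleep')
echo "running sleeps: $COUNT"
wait
```

In parallelem the search string is the load command's name rather than 'sleep', and the count is compared against the thread limit.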

Annihilannic.
 
Thx Anni,

Can I ask more?
What do I need the "2>&1" for?
What happens if I don't use it?

Thanks man


 
By convention Unix and Linux commands send their output to two places. Normal output goes to "standard output" (file descriptor 1) and error messages go to "standard error" (file descriptor 2). This is an example of the difference:

[tt]$ touch exist
$ ls exist notexist
ls: notexist: No such file or directory
exist
$ ls exist notexist > /dev/null
ls: notexist: No such file or directory
$ ls exist notexist > /dev/null 2>&1
$[/tt]

Notice how the "ls: notexist: No such file or directory" is still displayed when standard output is redirected to /dev/null? That is because standard error has not been redirected as well.

If you like you can do something like this to send standard output and standard error to different output files:

[tt]$ ls exist notexist > /tmp/ls.out 2> /tmp/ls.err
$ cat /tmp/ls.out
exist
$ cat /tmp/ls.err
ls: notexist: No such file or directory
$[/tt]

Does that make sense?



Annihilannic.
 
Oops, yes, I meant "sleep" and not "wait". That's what I get for typing without testing. [bigsmile]

My code annotated is...
Code:
#!/bin/ksh

# for each file in this directory, do what's in the loop
for file in `ls`
do
    # run your command ([load command] $file)
    # send normal output to the log (> log$file)
    # also send error msgs to same place (2>&1)
    # do the whole thing as a background job (&)
    [load command] $file > log$file 2>&1 &

    # while there are three or more background jobs
    # running, wait 5 seconds and check again
    while (( $(jobs -p|wc -l) )) >= 3 ))
    do
        sleep 5
    done

    # when one of the three jobs finishes, the previous
    # while loop will exit and the outer "for" loop will
    # start the next file provided by the "ls"
done

# if there is no "next file" to load, this "wait" will
# wait for the last two jobs to finish
wait
 
Need more coffee!!!

Make that while loop read...
Code:
while (( $(jobs -p|wc -l) >= 3 ))
...
I had too many parens!
 
