Using an external script in awk

theo67 · Jul 25, 2012

Hello to all,

I cannot find a solution to this:

I have a file, file1.txt, that looks like this:
4567;9803298ß;38840 (over thousands and thousands of lines)

And an external script "dosomething".
Now I need to do the following steps:
1. Open the file.
2. Cut the first 4 digits.
3. Use them as a parameter to call an external script, like: dosomething xxxx
4. Use the result of "dosomething xxxx" to replace the first 4 digits of this line.
5. Print the line to the output.

I am working with awk and I cannot find a way to use this external script in it...

Can someone PLEASE help?

Thanks!

PHV · Jul 25, 2012

Have a look at the getline function in your awk's documentation or man page:
command | getline var
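
A minimal sketch of that form, not anyone's production code: the function name run_once is made up for illustration, and any command string can stand in for the external script. getline returns a positive value when a line was read, and close() releases the pipe afterwards.

```shell
# run_once CMD: run CMD from inside awk and print its first output line.
# The function name is hypothetical; CMD is any shell command string.
run_once() {
    awk -v cmd="$1" 'BEGIN {
        # "cmd | getline line" starts cmd and reads one line of its output
        if ((cmd | getline line) > 0)
            print "result: " line
        close(cmd)   # close the pipe once we are done with it
    }'
}
```

For example, run_once 'echo 9878' prints "result: 9878".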

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

feherke · Jul 25, 2012

Hi

Sorry, this is an off-topic answer, but my curiosity has reached its limit.

Why do you need an Awk solution for this ?

Do not take it personally, this is not strictly related to you or your question. Over the years I have seen people come and ask for Awk solutions in cases where I can not imagine why.

For example, I would solve your problem like this, definitely without involving Awk :

Code:

while IFS=';' read -r begin end; do
  echo "$( dosomething "$begin" );$end"
done < file1.txt

The above works in Bash, Dash, MKsh.
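
To see how the loop splits a line, here is a small sketch with a stand-in for dosomething. The real one is an external program; the fake_dosomething function and its "N" prefix are assumptions for illustration only. read puts the first field into begin and everything after the first separator, further semicolons included, into end.

```shell
# Stand-in for the real external program; it just prefixes its argument.
fake_dosomething() { echo "N$1"; }

# replace_first_field: read lines from stdin and swap the first field
# for whatever the stand-in prints.
replace_first_field() {
    while IFS=';' read -r begin end; do
        echo "$(fake_dosomething "$begin");$end"
    done
}
```

Feeding it one line with printf '4567;9803298;38840\n' | replace_first_field prints N4567;9803298;38840.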

Feherke.
http://feherke.github.com/

theo67 · Jul 26, 2012

Hi feherke,

I think you are right... I am a newbie, so I have a lot to learn!!
And I am glad every time I get an idea from people with experience like you!!

Thank you very much for the answer. I used it and it is exactly what I need...

Theo

theo67 · Jul 26, 2012

Hmm, I realise now that it is very slow...
My file is 80 MB and the script has been running for 3 hours....

I tested it with a small file and it worked fine, but I did not realise that it would take so long with my original file...

feherke · Jul 26, 2012

Hi

I am afraid there is not much to optimize in that code. But some strategies may help.

Maybe caching ? Previously you wrote :

Theo said:
4567;9803298ß;38840 (over thousands and thousands of lines)

(...)

2. Cut the first 4 digits

Given the huge number of lines and the shortness of the codes, is it possible that the 4-digit codes are not unique ? In that case we could run dosomething only once for a given code and save its output, then later reuse that saved output without running dosomething again.
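
One quick way to check whether caching can pay off is to compare the number of lines with the number of distinct codes. This is only a sketch; the function name count_codes is made up here, and it assumes the code is always the first semicolon-separated field.

```shell
# count_codes FILE: print the number of lines and of distinct first fields.
count_codes() {
    awk -F';' '
        !seen[$1]++ { distinct++ }    # true only the first time a code shows up
        END { print NR " lines, " (distinct + 0) " distinct codes" }
    ' "$1"
}
```

If count_codes file1.txt reports far fewer distinct codes than lines, most calls to dosomething are repeats.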

Maybe parallelising ? Some versions of xargs and make are able to execute tasks in parallel. This is especially useful if dosomething has idle times during its run or you have a multicore processor. But even if not, running multiple dosomething processes at the same time should help. Of course, if the order of the output matters, this becomes a bit more complicated, but bearable.
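
A rough sketch of the xargs route, assuming an xargs that supports -P ( GNU and BSD xargs do, POSIX does not require it ). The function name build_table and its parameters are invented for this example. It prints one "code;result" line per distinct code, in no guaranteed order, so the result is a lookup table rather than the final file.

```shell
# build_table FILE CMD [JOBS]: print "code;result" once per distinct code,
# running up to JOBS copies of CMD at a time (output order is not fixed).
build_table() {
    cut -d';' -f1 "$1" | sort -u |
    # Inside sh -c: $1 is CMD, $2 is the code appended by xargs.
    # $1 is left unquoted on purpose so CMD may carry its own arguments.
    xargs -P "${3:-4}" -n 1 sh -c 'printf "%s;%s\n" "$2" "$($1 "$2")"' _ "$2"
}
```

Something like build_table file1.txt dosomething 4 > table.txt would then give a table to look the codes up in.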

So give us some details on those codes and dosomething's activity.

Feherke.

theo67 · Jul 26, 2012

Hi Feherke and thank you so much for your help!

The first field (the first 4 digits) is not unique.
"dosomething" is a binary; it takes this number and calculates a new one. The new number always depends only on the input. That means, e.g., "dosomething 4567" always gives 9878 as output.

Is this what you needed to know?

PHV · Jul 26, 2012

What about this ?

Code:

awk -F';' '{
 if(!d[$1])"dosomething "$1 | getline d[$1]
 print d[$1] substr($0,5)
}' file1.txt

Hope This Helps, PH.

feherke · Jul 26, 2012

Hi

The simplest version :

Code:

cache='/tmp/dosomething.cache'
mkdir -p "$cache"  # the cache is a directory holding one file per code

while IFS=';' read -r begin end; do
  [[ -f "$cache/$begin" ]] || dosomething "$begin" > "$cache/$begin"
  echo "$(< "$cache/$begin" );$end"
done < file1.txt

This creates a separate file for each code. Fast to write, fast to read, may be slow to search, but this probably depends on the filesystem used too.

Regarding that search slowness, I would just start the script, wait until there are a few thousand files in the cache directory, then do a [[ -f '/tmp/dosomething.cache/4567' ]] ( or any other code ) from the command line and see whether it takes whole seconds. If yes, tell us. Then we will look for other storage tricks ( for example separate subdirectories based on the first character ) or alternatives ( for example an SQLite database ).

One thing to note :

man bash said:
BUGS

It's too big and too slow.

If you have Ksh, use that instead. ( On Linux you will probably find the public domain ( pdksh ) or MirOS ( mksh ) implementation. They are also faster. )

If you have Dash, use that instead. But Dash has only what POSIX specifies, so the above code will need a minor rewrite.
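
A sketch of what such a rewrite might look like: [[ ]] becomes [ ] and $(< file) becomes $(cat file). The function name cached_replace and its three parameters are assumptions added so the loop can be tried with any input file, command and cache directory.

```shell
# cached_replace FILE CMD CACHE_DIR: POSIX sh variant of the cached loop.
cached_replace() {
    mkdir -p "$3" || return 1
    while IFS=';' read -r begin end; do
        # run CMD only if no cache file exists for this code yet
        [ -f "$3/$begin" ] || "$2" "$begin" > "$3/$begin"
        printf '%s;%s\n' "$(cat "$3/$begin")" "$end"
    done < "$1"
}
```

It would be called as cached_replace file1.txt dosomething /tmp/dosomething.cache.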

Feherke.

feherke · Jul 26, 2012

Hi

Thinking again, my concern was exaggerated. Even if there are thousands of lines, there will be no more than 10000 code/result pairs. So search speed can not be an issue.

Even more, storage can not be an issue either. I mean, now that actually running dosomething is reduced to a minimum, PHV's Awk code should also be fast. ( With one minor glitch : a close() after the getline would avoid running out of available file handles. )

Feherke.

PHV · Jul 26, 2012

Good catch, Feherke.

Code:

awk -F';' '{
 if(!d[$1]){cmd="dosomething "$1;cmd | getline d[$1];close(cmd)
 print d[$1] substr($0,5)
}' file1.txt

Hope This Helps, PH.

PHV · Jul 26, 2012

Oops, sorry for the typo:

Code:

awk -F';' '{
 if(!d[$1]){cmd="dosomething "$1;cmd | getline d[$1];close(cmd)}
 print d[$1] substr($0,5)
}' file1.txt

Hope This Helps, PH.

theo67 · Jul 26, 2012

@PHV
As output I get my input file without the first "column".

theo67 · Jul 26, 2012

Oooh, sorry... I should refresh the page before posting.

theo67 · Jul 26, 2012

Hi PHV,
Using your code, I stopped the script after a few minutes and opened the output file. I see the whole line but without the first "column"... That is the position where the 4 digits should be...

theo67 · Jul 26, 2012

Hi feherke,

I ran your 2nd version for a few seconds and stopped it. Should I now type only this on the command line:

[[ -f '/tmp/dosomething.cache/4567' ]]

???

feherke · Jul 26, 2012

Hi

Theo said:
Should I now type only this on the command line:

[[ -f '/tmp/dosomething.cache/4567' ]]

Yes. You will see no output, only the exit code will be set. ( echo $? to see the exit code of the previous command. But it is irrelevant now. ) The key point was to see whether a simple check for the file is affected by the huge number of filesystem entries in that directory.

But as I mentioned in the next post, given that the cache directory will never have more than 10000 files, my concern was exaggerated.

Feherke.

PHV · Jul 26, 2012

And this ?

Code:

awk -F';' '{
 if(d[$1]==""){cmd="dosomething "$1;cmd | getline d[$1];close(cmd)}
 print d[$1] substr($0,5)
}' file1.txt

Hope This Helps, PH.

theo67 · Jul 26, 2012

@PHV WOW, 12 seconds!!!!! And the file was ready!!

A big THANKS to all for your help!!!!!!! Those are the moments when I realise all the things I can NOT do ;-)

theo67 · Jul 26, 2012

PHV, is it possible to explain your code to me a little bit?
I am not sure about it...
awk -F';' The field separator is ; (OK up to here)

But for the rest I can only suppose what it "could" mean...
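
For readers with the same question, here is an annotated variant of the last awk version from this thread. It is a sketch and not PHV's own explanation: the wrapper function replace_codes and the prog parameter are additions made here so the code can be tried with any command in place of dosomething; the awk logic itself follows the posted one-liner.

```shell
# replace_codes FILE CMD: annotated variant of the final awk version above.
# CMD stands in for "dosomething"; it is a parameter only in this sketch.
replace_codes() {
    awk -F';' -v prog="$2" '{
        # -F";" splits each line on semicolons, so $1 is the 4-digit code.
        # d[] is an associative array used as a cache: code -> command output.
        if (d[$1] == "") {          # nothing cached for this code yet
            cmd = prog " " $1       # build the command line, e.g. "dosomething 4567"
            cmd | getline d[$1]     # run it, read its first output line into d[$1]
            close(cmd)              # release the pipe so file handles do not run out
        }
        # substr($0, 5) is the line from its 5th character on,
        # that is, the whole line without the 4 leading digits
        print d[$1] substr($0, 5)
    }' "$1"
}
```

With the real program this would be replace_codes file1.txt dosomething. Each distinct code runs the external command once; every later line with the same code is served from the array, which is where the speed comes from.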
