Variable number of input arguments and why did I need {/strt/,/end/}

Status
Not open for further replies.

FedoEx

Technical User
Oct 7, 2008
49
US
I have advanced a lot in awk, mainly thanks to this forum.
I am posting my latest creation here, along with how I arrived at it. I am asking you gurus for alternative ways to do this, or for possible improvements and suggestions.

Here is the example infile

Code:
$cat  infile
#S 1   line one    contents                          
#AAAA                                                
11 13 100                                            
12 14 100                                            
13 23 111                                            
end                                                  
#S 2   line two    contents                          
# a n c                                              
21 11 222                                            
22 24 222                                            
23 45 222
end
blah blah
#S 3   line three contents
x y z
31 42 333
32 26 333
33 56 333
end
#S 4   line four   contents
more text
41 15 444
42 24 444
43 12 444
end
a z characters between the blocks
#S 5   line five    contents
# some comments here
51 32 555
52 34 555
53 12 555
end block five
i j k
#S 6   line six     contents
61 34 666
62 23 666
63 56 666
end block six
and so fourth  another thousand of these

Simple code to extract the single line that matches the #S regex:

Code:
$cat ARG.awk
BEGIN{
ARGC--
     }
$0~("#S " ARGV[2] " "){ print $0}
If I want to see the header line for block 5
Code:
$awk -f ARG.awk infile 5
#S 5   line five    contents
For an arbitrary number of input arguments, my code evolves to:
Code:
BEGIN {
ARGCin=ARGC
while (ARGC >2) ARGC-- 
}
  {
  for (i = 2; i < ARGCin; i++)     if($0~("#S " ARGV[i] " ")){print $0}
  }

My final goal is actually to print not the header lines but the entire numeric part of a given block.
That is why I started the thread "Another way to print between two regex", and here is the code:

Code:
BEGIN {
ARGCin=ARGC
while (ARGC >2) ARGC-- 
}
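# Print the three columns, skipping comments, blank lines and text lines.
function formprint()
{
    if ($0 !~ /[f-zA-Z]|#|^$/) print $1, " ", $2, " ", $3
}
{
    # Only the first if belongs to the for loop; the rest runs once per line.
    for (i = 2; i < ARGCin; i++)
        if ($0 ~ ("#S " ARGV[i] "  ")) { s=1; next }
    if ($1 ~ /end/) s=0
    if (s) formprint()
}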
So if I now run

Code:
awk -f ARG.awk infile 2 3 5
21   11   222                               
22   24   222                               
23   45   222                               
31   42   333
32   26   333
33   56   333
51   32   555
52   34   555
53   12   555

However, sometimes I need to extract a range of blocks, so the user input looks like this:
Code:
awk -f ARG.awk infile  93  - 104

And here is the entire code that takes into account the range option.

Code:
#!/bin/awk -f 
BEGIN {
# print "in "ARGC
ARGCin=ARGC
while (ARGC >2) ARGC-- 
}

function formprint()
{if($0!~/[a-z]|#|^$/)  printf("%1.5f %1.8f %1.5e \n",$1,$2,$3)}

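ARGCin>2&&ARGV[3]~/-/{
    # From the header of the first block up to the header of the last block ...
    if ($0 ~ ("#S " ARGV[2] "  ")) { s=1; next }
    if ($0 ~ ("#S " ARGV[4] "  ")) s=0
    if (s) formprint()
    # ... then the last block itself, up to its end line.
    if ($0 ~ ("#S " ARGV[4] "  ")) { i=1; next }
    if ($1 ~ /end/) i=0
    if (i) formprint()
}

ARGV[3]!~/-/  {
    # Only the first if belongs to the for loop; the rest runs once per line.
    for (i = 2; i < ARGCin; i++)
        if ($0 ~ ("#S " ARGV[i] "  ")) { s=1; next }
    if ($1 ~ /end/) s=0
    if (s) formprint()
}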
Thank you feherke, PHV, Annihilannic, LKBrwnDBA and the rest of you for the help.
 
I am stuck again.
Consider this code being used with the same infile
Code:
cat getnum.awk 
#!/bin/awk -f
BEGIN {
ARGCin=ARGC
while (ARGC >2) ARGC-- 
}

$0~("#S " ARGV[2] " "),/end/ {
 if($1!~/[f-zA-Z]|#|^$/)  printf("%1.5f %1.8f %1.5e \n",$1,$2,$3) >"data_"ARGV[2]
}
$0~("#S " ARGV[3] " "),/end/ {
 if($1!~/[f-zA-Z]|#|^$/) printf("%1.5f %1.8f %1.5e \n",$1,$2,$3)  >"data_"ARGV[3]
}
$0~("#S " ARGV[4] " "),/end/ {
 if($1!~/[f-zA-Z]|#|^$/) printf("%1.5f %1.8f %1.5e \n",$1,$2,$3)  >"data_"ARGV[4]
}
$0~("#S " ARGV[5] " "),/end/ {
 if($1!~/[f-zA-Z]|#|^$/) printf("%1.5f %1.8f %1.5e \n",$1,$2,$3)  >"data_"ARGV[5]
}
$0~("#S " ARGV[6] " "),/end/ {
 if($1!~/[f-zA-Z]|#|^$/) printf("%1.5f %1.8f %1.5e \n",$1,$2,$3)  >"data_"ARGV[6]
}

If I run it like this
Code:
./getnum.awk infile 2 3 5
I get the numeric part of data blocks 2, 3 and 5 written to separate files:
data_2 data_3 data_5
That code works for me and gives me the outputs I need. For most of my practical needs I don't need to extract data for more than four blocks.
I've been trying to rewrite that code in a neater way using
Code:
for(i=2;i<ARGCin;i++)
loop and redirecting the output to
Code:
>"data_"ARGV[i]
So far without success.
Can you help?
 
You need to remember that the way awk works is with an implicit "for every line in the input file(s) do ... " around all of your expressions (excluding the BEGIN clause of course), so you can't just put a for loop around them as the input is consumed on the first iteration.
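To see that implicit loop at work, here is a minimal sketch. The three-line input and the shell variable out are made up for illustration. The for loop inside the action does not wrap the file; it simply runs from start to finish again for every input line.

```shell
# Feed three made-up lines to awk; the action (including its for loop)
# is executed once per input line.
out=$(printf 'a\nb\nc\n' | awk '{ for (i = 1; i <= 2; i++) print NR, i, $0 }')
printf '%s\n' "$out"
```

Each line shows up twice, once per loop iteration, before awk moves on to the next line. So a loop over the requested block numbers has to sit inside the per-line logic (or be replaced by a lookup), not around it.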

I would do something like this:

Code:
#!/bin/awk -f

BEGIN {
        ARGCin=ARGC
        while (ARGC>2) {
                ARGC--
                wanted[ARGV[ARGC]]
        }
        section=0
}

/end/ { section=0 }

/#S [0-9]+ / && ($2 in wanted) { section=$2 }

section && !/[f-zA-Z]|#|^$/ {
        # parentheses make the file name parse as one concatenation in every awk
        printf("%1.5f %1.8f %1.5e \n",$1,$2,$3) > ("data_" section)
}

The "wanted" sections are initially assigned to an array. The section variable is set to 0 unless we are currently processing a section that is in the wanted array, so it's being used both as a flag (whether to print the line or not) and to hold the current section number for inclusion in the output filename.
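If you want to watch section doing both jobs, the following is a rough sketch rather than a finished tool. It assumes a cut-down sample written to a temporary file (the names tmp and out are hypothetical), and it prints the target file name in front of each selected line instead of redirecting the output.

```shell
# Hypothetical cut-down sample: three blocks, two of them requested.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#S 1   line one    contents
11 13 100
end
#S 2   line two    contents
# a n c
21 11 222
22 24 222
end
blah blah
#S 3   line three contents
31 42 333
end
EOF

out=$(awk '
BEGIN {
    # Take the trailing arguments off the file list and keep them as array keys.
    while (ARGC > 2) { ARGC--; wanted[ARGV[ARGC]] }
}
/end/ { section = 0 }                                # flag off at the end of a block
/#S [0-9]+ / && ($2 in wanted) { section = $2 }      # flag on, and remember the number
section && !/[f-zA-Z]|#|^$/ {
    # Show which file the line would be written to.
    print "data_" section ": " $0
}' "$tmp" 2 3)
rm -f "$tmp"
printf '%s\n' "$out"
```

Block 1 is skipped because 1 is not a key of wanted, and "blah blah" is skipped because section was reset to 0 by the preceding end line.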

Annihilannic.
 
Thanks.
That is great.
I will try to rewrite my original code in a similar fashion.
 
One thing I didn't explain very well... I didn't exactly assign the wanted sections to the wanted[] array... I used them as indexes to the array. I could have assigned an arbitrary value, e.g. wanted[ARGV[ARGC]]=1 for example, however we would never actually have used the value assigned to it, just the index for the ($2 in wanted) check. So I've found that if you just mention the array index without assigning anything to it, it's enough to create it. Kinda weird!
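A small sketch of that behaviour, with made-up keys 5 and 7 and a hypothetical shell variable out: a bare reference creates the element, the in test does not, and merely reading an element creates it as well.

```shell
out=$(awk 'BEGIN {
    wanted["5"]                               # bare reference, nothing assigned
    if ("5" in wanted)    print "5 is a key"
    if (!("7" in wanted)) print "7 is not a key"
    n = 0; for (k in wanted) n++
    print "keys after the in tests:", n       # the in tests added nothing
    x = wanted["7"]                           # reading an element creates it
    n = 0; for (k in wanted) n++
    print "keys after reading wanted[7]:", n
}')
printf '%s\n' "$out"
```

This is why ($2 in wanted) is the safe way to test membership: comparing wanted[$2] against something would quietly add every block number it sees to the array.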

Annihilannic.
 
Can you explain this line?
Code:
/#S [0-9]+ / && ($2 in wanted) { section=$2 }
I know that it turns on the section switch and assigns the value to section that is later used in the name of the output file.
What does the plus after the bracket expression mean?
Is it possible to use something like this instead?
Code:
("#S " ($2 in wanted))  { section=$2 }
 
+ is just part of the regular expression language. It means "the preceding expression, one or more times", so in this case, one or more digits.

/#S [0-9]+ / on its own is exactly equivalent to $0 ~ /#S [0-9]+ /. A bare regular expression used as a pattern is matched against the entire input line.

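You can't insert ($2 in wanted) like that because it returns a boolean (1 for true, 0 for false), which is not what you want to match the input string against. Concatenated with "#S " it just produces the string "#S 1" or "#S 0", and a non-empty string used as a pattern is always true, so the action would run on every line.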
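Here is a quick sketch comparing the two patterns. It assumes three made-up input lines and a single wanted key, 12; the variable names out and p are only for illustration.

```shell
out=$(printf '#S 12   header\n#S x   header\n12 34 56\n' | awk '
BEGIN { wanted["12"] }
# One or more digits between "#S " and the following space.
/#S [0-9]+ / { print NR ": regex matches" }
# The concatenation is just a non-empty string, so this pattern is always true.
("#S " ($2 in wanted)) {
    p = "#S " ($2 in wanted)
    print NR ": string pattern " p " is true"
}')
printf '%s\n' "$out"
```

The regex only fires on the first line, while the concatenated pattern fires on all three, including the plain data line.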

Annihilannic.
 