Print key-value pairs

Status
Not open for further replies.

Zahier

MIS
Oct 3, 2002
97
ZA
Hello techies

I am trying to print key-value pairs specifically:
the date and time and the values for HOST and USER

This is the sample input text.

<txt>20-JUL-2015 07:58:22 * (CONNECT_DATA=(SID=ORADB3)(CID=(PROGRAM=perl)(HOST=winserver5)(USER=oem))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.10)(PORT=12345)) * establish * ORADB3 * 0
<txt>20-JUL-2015 07:58:38 * (CONNECT_DATA=(SID=ORADB4)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.10)(PORT=12345)) * establish * ORADB4 * 0
<txt>20-JUL-2015 08:01:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ZAHIER))(SERVICE_NAME=ORADB6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ZAHIER))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.10)(PORT=12345)) * establish * ORADB6 * 0

I am working on Solaris 10 and tried using sed & awk (see below) to remove the text that I DO NOT want - but this is clearly inefficient.

Code:
grep HOST log.xml | sed 's/\*/ /g' | sed 's/(/ /g' | sed 's/)/ /g' | sed 's/<txt>/ /g' | sed 's/CONNECT_DATA= SERVICE_NAME=/ /g' | sed 's/CONNECT_DATA= SID=/ /g'  | sed 's/CONNECT_DATA=/ /g' | sed 's/CID=/ /g' | sed 's/ADDRESS=/ /g' | sed 's/PROTOCOL=tcp/ /g' | sed 's/PORT=.*/ /g' | tr -s ' '

How do I extract just the fields that I need?
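
One way to keep only the wanted fields, rather than deleting everything else, is to let awk split each line on the parentheses so that every KEY=value lands in a field of its own. The following is a minimal sketch under assumptions: the function name print_kv_fields and the AWK variable are made up for illustration, a POSIX awk is assumed (on Solaris 10 that would probably be /usr/xpg4/bin/awk or nawk rather than /usr/bin/awk), and every line is assumed to start with <txt> followed by the date and time, as in the sample.

```shell
# print_kv_fields: for each input line, print the timestamp followed by every
# HOST= and USER= pair. Hypothetical helper; AWK is an assumed override
# variable for choosing the awk binary.
print_kv_fields() {
  "${AWK:-awk}" -F'[()]' '
    {
      # $1 is the text before the first "(", e.g. "<txt>20-JUL-2015 07:58:22 * "
      out = $1
      sub(/^<txt>/, "", out)     # drop the leading tag
      sub(/ *\* *$/, "", out)    # drop the trailing " * " separator
      # Splitting on parentheses leaves each KEY=value in a field of its own
      for (i = 2; i <= NF; i++)
        if ($i ~ /^(HOST|USER)=/)
          out = out ", " $i
      print out
    }
  '
}
```

With the sample lines in log.xml, `grep HOST log.xml | print_kv_fields` should print lines such as `20-JUL-2015 07:58:22, HOST=winserver5, USER=oem, HOST=10.10.10.10`. The second HOST is the one from the ADDRESS part; PORT is dropped because only HOST and USER are selected.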
 
Zahier said:
How do I extract just the fields that I need?
You can use the function match() like this:
zahier.awk
Code:
{
  # chomp every line
  chomp()
  # match TIME
  match($0, /([0-9]+:[0-9]+:[0-9]+)/, time)
  printf "TIME = %s", time[1]
  #
  str = $0
  # match HOST-USER pairs
  while (match(str, /\(HOST=([^)]*)\)\(USER=([^)]*)\)/, hu) > 0) {
    printf ", HOST='%s', USER = '%s'", hu[1], hu[2]
    str = substr(str, RSTART + RLENGTH)
  }
  # match HOST-PORT pairs
  while (match(str, /\(HOST=([^)]*)\)\(PORT=([^)]*)\)/, hp) > 0) {
    printf ", HOST='%s', PORT = '%s'", hp[1], hp[2]
    str = substr(str, RSTART + RLENGTH)
  }
  printf "\n"
}

#
function chomp() {
  # strip out the carriage return or line feed at the end of current line
  # the function modifies global variable $0 (current line)
  sub(/\r$/, "", $0)
  sub(/\n$/, "", $0)
}

When I run the script on the given file I get this output:
Code:
$ gawk -f zahier.awk zahier.txt

TIME = 07:58:22, HOST='winserver5', USER = 'oem', HOST='10.10.10.10', PORT = '12345'
TIME = 07:58:38, HOST='__jdbc__', USER = '', HOST='10.10.10.10', PORT = '12345'
TIME = 08:01:09, HOST='__jdbc__', USER = 'ZAHIER', HOST='__jdbc__', USER = 'ZAHIER', HOST='10.10.10.10', PORT = '12345'
 
Hello mikrom

With awk on Solaris I get...
awk: syntax error near line 3
awk: illegal statement near line 3

and if I use /usr/xpg4/bin/awk I get...
line 5 (NR=1): wrong number of arguments to function ""

Unfortunately I do not have gawk. Solaris has awk and nawk, neither of which seems to have the match function.
BUT this is still a great script for when I do port to Linux!!



 
I have tried it with GNU Awk 3.1.7 which comes with MinGW/MSYS on Windows.
... maybe it would be possible to install gawk on Solaris too...
 
It seems that there is a big difference between the various awk versions.
I tried my script on IBM iSeries, where I have two awk versions, and got these errors:

1. from the native awk (probably an old version, since it doesn't have the --version switch)
Code:
> awk -f zahier.awk zahier.txt                                                   
   Syntax Error The source line is 9.                                            
   The error context is                                                          
                    match($0, >>>  /([0-9]+:[0-9]+:[0-9]+)/, <<<                 
   awk: 0602-502 The statement cannot be correctly parsed. The source line is 9. 
   Syntax Error The source line is 14.                                           
          awk: 0602-543 There are 2 extra ) characters.

2. from GNU Awk 3.0.3
Code:
> gawk -f zahier.awk zahier.txt                              
  gawk: zahier.awk:9: fatal: match() cannot have 3 arguments

... then I redesigned the script so that it calls match() with only 2 arguments instead of 3.

Now the script works on my IBM iSeries too:
Code:
> awk -f zahier.awk zahier.txt                                                                       
  TIME=07:58:22, HOST=winserver5, USER=oem, HOST=10.10.10.10, PORT=12345                             
  TIME=07:58:38, HOST=__jdbc__, USER=, HOST=10.10.10.10, PORT=12345                                  
  TIME=08:01:09, HOST=__jdbc__, USER=ZAHIER, HOST=__jdbc__, USER=ZAHIER, HOST=10.10.10.10, PORT=12345

Maybe your awk problem is similar.
Here is the modified script - you can try it:
Code:
{
  # chomp every line
  chomp()
  str = $0
  # match TIME
  if (match(str, /[0-9]+:[0-9]+:[0-9]+/)) {
    line = "TIME=" substr(str, RSTART, RLENGTH)
    str = substr(str, RSTART + RLENGTH)
  }
  # match HOST-USER pairs
  while (match(str, /\(HOST=[^)]*\)\(USER=[^)]*\)/) > 0) {
    hu = substr(str, RSTART, RLENGTH)
    line = line hu
    str = substr(str, RSTART + RLENGTH)
  }
  # match HOST-PORT pairs
  while (match(str, /\(HOST=([^)]*)\)\(PORT=([^)]*)\)/) > 0) {
    hp = substr(str, RSTART, RLENGTH)
    line = line hp
    str = substr(str, RSTART + RLENGTH)
  }
  # replace parentheses with commas
  gsub(/\(/, ", ", line)
  gsub(/\)/, "", line)
  # print resulting line
  printf "%s\n", line
}


#
function chomp() {
  # strip out the carriage return or line feed at the end of current line
  # the function modifies global variable $0 (current line)
  sub(/\r$/, "", $0)
  sub(/\n$/, "", $0)
}
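
With the 2-argument match() there is no array of captured groups, so the script above collects whole matches and strips the parentheses afterwards. If only a single value is needed, the same RSTART and RLENGTH variables can be used to cut it out with substr(). A minimal sketch, assuming a POSIX awk with user-defined functions; the names extract_host_user, value and the AWK variable are invented here, and it reports only the first HOST and the first USER on each line:

```shell
# extract_host_user: print TIME, HOST and USER for each log line on stdin.
# Hypothetical helper; AWK is an assumed override variable, e.g.
# AWK=/usr/xpg4/bin/awk on Solaris.
extract_host_user() {
  "${AWK:-awk}" '
    # value(s, key): return the text between "(key=" and the next ")".
    # The 2-argument match() only sets RSTART and RLENGTH, so the value is
    # cut out with substr() by skipping the known "(key=" prefix and the
    # closing parenthesis.
    function value(s, key,    pre) {
      pre = "(" key "="
      if (!match(s, "\\(" key "=[^)]*\\)")) return ""
      return substr(s, RSTART + length(pre), RLENGTH - length(pre) - 1)
    }
    {
      time = ""
      if (match($0, /[0-9]+:[0-9]+:[0-9]+/))
        time = substr($0, RSTART, RLENGTH)
      printf "TIME=%s, HOST=%s, USER=%s\n", time, value($0, "HOST"), value($0, "USER")
    }
  '
}
```

For the first sample line this should give `TIME=07:58:22, HOST=winserver5, USER=oem`. An empty value such as (USER=) comes back as an empty string, which is also what a missing key returns, so the two cases are not distinguished in this sketch.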
 
Hi

mikrom said:
It seems that there is a big difference between the various awk versions
They warned you.

man gawk said:
GNU EXTENSIONS

(...)

The following features of gawk are not available in POSIX awk.

(...)

· The optional third argument to the match() function.


Feherke.
feherke.ga
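
Before relying on the third argument, a script can probe the awk it is about to use. A minimal sketch; awk_has_match3 is a hypothetical helper name, and the probe simply treats any failure (parse error, fatal error, missing binary) as "no":

```shell
# awk_has_match3: report whether the given awk accepts the gawk-style
# 3-argument match(). Hypothetical helper; defaults to plain "awk".
awk_has_match3() {
  # An awk without the extension fails to parse or run the program and
  # exits non-zero; all output is discarded to keep the probe quiet.
  if "${1:-awk}" 'BEGIN { exit !(match("HOST=x", /HOST=(.*)/, m) && m[1] == "x") }' </dev/null >/dev/null 2>&1
  then
    echo yes
  else
    echo no
  fi
}
```

Calling `awk_has_match3 /usr/xpg4/bin/awk` or `awk_has_match3 gawk` prints yes or no, so a wrapper script can choose between a 3-argument and a 2-argument version of the awk program.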
 
mikrom
Great! It works with...

/usr/xpg4/bin/awk

So feherke, in other words, you're saying these are variants of the same command that comply with different standards.

Thanks again, I really appreciate the help.
 
feherke said:
They warned you.

(man gawk) said:
GNU EXTENSIONS
(...)
The following features of gawk are not available in POSIX awk.
(...)
· The optional third argument to the match() function.
feherke, thanks for the explanation.
Now I know that in gawk 3.0.3 the function match() accepted only 2 arguments, while version 3.1.7 also allows the third argument.

Zahier said:
Great! It works with...
OK, but while I was putting that example together, I kept meaning to ask you: don't you have Perl on your Solaris machine?
 
I do have perl, but I have not dabbled in perl scripting... yet.

perl, v5.8.4 built for sun4-solaris-64int
 
IMO ver. 5.8.x should be OK.
Maybe next time it would be worth a try.
 