Hi there
I have a file with many records. The records are structured with tags, and one special tag, <txt>, holds the information I need. From it I am trying to extract the first field and some information from the second and third fields, using sed. The fields are separated by an "*".
log_xx.xml
<msg time='2016-11-03T05:52:36.591-05:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='tgestion1'
host_addr='xx.xx.xx.xx'>
<txt>03-NOV-2016 05:52:36 * (CONNECT_DATA=(SID=catrman2)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.xx.xx)(PORT=57902)) * establish * catrman2 * 12505
</txt>
</msg>
The first field contains the date.
The second field contains information such as SERVICE_NAME, PROGRAM, HOST and USER.
The third field contains PROTOCOL, HOST and PORT.
I am currently using sed and awk together, and it works fine. I guess it could be improved, but I cannot get it to work using sed alone.
file.sed
#!/bin/sed -f
/<txt*/,/<\/txt>/ {
s/<txt>//g ; s/<\/txt>//g
p
}
sed -n -f "${DIR}file.sed" log_778.xml log_779.xml log_780.xml log_781.xml log_782.xml log_783.xml log_784.xml log_785.xml log.xml | gawk -F"*" '
# Only six-field records; skip lines containing TIMESTAMP.
NF==6 && $0!~/TIMESTAMP/ {
    sub("^ +","",$1)
    fecha=substr($1,1,14)          # date plus hour, e.g. "03-NOV-2016 05"
    data[1]=$2
    data[2]=$3
    delete aa                      # do not carry values over from the previous record
    # Second field: PROGRAM, HOST, USER, SERVICE_NAME
    k=split(data[1],ar,"[()]")
    for ( i=1;i<=k;i++ ){
        if( ar[i]~/PROGRAM=/ && !(ar[i] in d) ){
            d[ar[i]]
            sub(/PROGRAM=/, "" , ar[i])
            aa[1]=ar[i]
        }
        if( ar[i]~/HOST=/ && !(ar[i] in d) ){
            d[ar[i]];sub(/HOST=/, "" , ar[i])
            aa[2]=ar[i]
        }
        if( ar[i]~/USER=/ && !(ar[i] in d) ){
            d[ar[i]];sub(/USER=/, "" , ar[i])
            aa[3]=ar[i]
        }
        if( ar[i]~/SERVICE_NAME=/ && !(ar[i] in d) ){
            d[ar[i]];sub(/SERVICE_NAME=/, "" , ar[i])
            aa[4]=ar[i]
        }
    }
    for (l in d) delete d[l]
    # Third field: PROTOCOL, HOST, PORT
    k=split(data[2],ar,"[()]")
    for ( i=1;i<=k;i++ ){
        if( ar[i]~/PROTOCOL=/ && !(ar[i] in d) ){
            d[ar[i]];sub(/PROTOCOL=/, "" , ar[i])
            aa[5]=ar[i]
        }
        if( ar[i]~/HOST=/ && !(ar[i] in d) ){
            d[ar[i]];sub(/HOST=/, "" , ar[i])
            aa[6]=ar[i]
        }
        if( ar[i]~/PORT=/ && !(ar[i] in d) ){
            d[ar[i]];sub(/PORT=/, "" , ar[i])
            aa[7]=ar[i]
        }
    }
    for (l in d) delete d[l]
    print fecha";"aa[1]";"aa[2]";"aa[3]";"aa[4]";"aa[5]";"aa[6]";"aa[7]
}' | cut -d";" -f1,3,7 | sort | uniq -c
output
------
count date Hour host IP
86 01-NOV-2016 00;__jdbc__;xx.xx.xx.xx
322 01-NOV-2016 00;__jdbc__;xx.xx.xx.xx
222 01-NOV-2016 00;__jdbc__;xx.xx.xx.xx
2 01-NOV-2016 00;__jdbc__;xx.xx.xx.xx
68 01-NOV-2016 00;xxxxxxxx;xx.xx.xx.xx
12 01-NOV-2016 01;xxxxxxxx;xx.xx.xx.xx
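The same count can probably be produced with a shorter awk program. This is only a sketch, assuming each record sits on the same line as its <txt> tag; the names summarize and value are hypothetical and do not appear in the original scripts.

```shell
#!/bin/sh
# summarize: hypothetical helper; reads log files given as arguments (or stdin)
# and prints "count date hour;client host;client IP".
summarize() {
  awk -F'*' '
    # value(s, key): text between "(key=" and the next ")", or "" if absent.
    function value(s, key,    p, rest) {
      p = index(s, "(" key "=")
      if (p == 0) return ""
      rest = substr(s, p + length(key) + 2)
      return substr(rest, 1, index(rest, ")") - 1)
    }
    NF == 6 && $0 !~ /TIMESTAMP/ {
      sub(/^.*<txt>/, "", $1)      # drop the opening tag and anything before it
      sub(/^ +/, "", $1)
      # First 14 characters of field 1 are the date plus the hour.
      print substr($1, 1, 14) ";" value($2, "HOST") ";" value($3, "HOST")
    }
  ' "$@" | sort | uniq -c
}

# Example call on the files from the pipeline above:
# summarize log_778.xml log_779.xml log.xml
```

Looking each key up by name avoids the split loop and the cut step, and a missing key simply yields an empty field.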
I am trying to do the same using only sed, but I have not been able to.
I am having problems with the H, G and x commands (the hold space). In the end I want to print the information found for each record on one line. If a piece of information is not found, an empty field (;;) must be printed.
Initial program
myprogramm.sed
/<txt>/,/<\/txt>/{
s/^.*<txt>//g ; s/<\/txt>//g
/.* \* .* \* .* \* .* \* .* \* .*/ {
#/([^(|^)]*)/{
h
s/\(.*\) \* .* \* .* \* .* \* .* \* .*/\1/p
G
s/.* \* .*\((SERVICE_NAME[^(|^)]*)\).* \* .* \* .* \* .* \* .*/\1/p
G
s/.* \* .*\((PROGRAM[^(|^)]*)\).* \* .* \* .* \* .* \* .*/\1/p
G
s/.*\((HOST[^(|^)]*)\).* \* .* \* .* \* .* \* .*/\1/p
G
s/.*\((USER[^(|^)]*)\).* \* .* \* .* \* .* \* .*/\1/p
G
s/.* \* .* \* .*\((PROTOCOL[^(|^)]*)\).* \* .* \* .* \* .*/\1/p
G
s/.* \* .* \* .*\((HOST[^(|^)]*)\).* \* .* \* .* \* .*/\1/p
G
s/.* \* .* \* .*\((PORT[^(|^)]*)\).* \* .* \* .* \* .*/\1/p
G
#}
}
}
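A small sketch of what h and G do to the buffers may help when reading the program above. It assumes GNU sed; hold_demo is a hypothetical name and the input line is made up for illustration. The l command prints the pattern space unambiguously, with an embedded newline shown as \n and the end shown as $.

```shell
#!/bin/sh
# hold_demo: hypothetical demo of the hold-space commands.
#   h  copies the pattern space into the hold space
#   G  appends a newline plus the hold space to the pattern space
hold_demo() {
  printf '%s\n' 'date * data' | sed -n 'h; s/ \* .*//; G; l'
}

hold_demo
# prints: date\ndate * data$
```

After G the pattern space holds the extracted text, a newline, and the saved record, so every later s command has to take that leading text into account. Also note that each s///p prints its result on a line of its own, so the pieces never end up on one line.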
I would like this output, using sed
==================================
03-NOV-2016 14:22:10 ;(PROGRAM=) ;(HOST=__jdbc__); (USER=oracle) ; (PROTOCOL=tcp) ;(HOST=10.81.203.19) ; (PORT=44390)
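One possible sed-only approach is sketched below. It is not a definitive solution: it assumes GNU sed, one record per line on the same line as <txt>, and no "*" inside a field. The name extract_fields is hypothetical. The output is built after a newline at the end of the pattern space; each key first gets its ";" and then, only if the key exists in the right field, its value.

```shell
#!/bin/sh
# extract_fields: hypothetical helper; reads log lines on stdin.
extract_fields() {
  sed '
    # Keep only lines that carry a record and strip the tags.
    /<txt>/!d
    s/.*<txt>//
    s/<\/txt>.*//
    # Require exactly six fields separated by "*".
    /^[^*]*\*[^*]*\*[^*]*\*[^*]*\*[^*]*\*[^*]*$/!d
    # The hold space is never written, so G appends only a newline.
    G
    # Move the date (field 1) behind the newline.
    s/^ *\([^*]*[^* ]\) *\*\(.*\)$/\2\1/
    # Keys of the connect data, now the first remaining field.
    s/$/;/
    s/^\([^*]*\)\((PROGRAM=[^()]*)\)\(.*\)$/\1\2\3\2/
    s/$/;/
    s/^\([^*]*\)\((HOST=[^()]*)\)\(.*\)$/\1\2\3\2/
    s/$/;/
    s/^\([^*]*\)\((USER=[^()]*)\)\(.*\)$/\1\2\3\2/
    # Keys of the address, now the second remaining field.
    s/$/;/
    s/^\([^*]*\*[^*]*\)\((PROTOCOL=[^()]*)\)\(.*\)$/\1\2\3\2/
    s/$/;/
    s/^\([^*]*\*[^*]*\)\((HOST=[^()]*)\)\(.*\)$/\1\2\3\2/
    s/$/;/
    s/^\([^*]*\*[^*]*\)\((PORT=[^()]*)\)\(.*\)$/\1\2\3\2/
    # Drop the record and keep only the assembled output line.
    s/^.*\n//
  '
}

printf '%s\n' '<txt>03-NOV-2016 05:52:36 * (CONNECT_DATA=(SID=catrman2)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.xx.xx)(PORT=57902)) * establish * catrman2 * 12505' | extract_fields
# prints: 03-NOV-2016 05:52:36;(PROGRAM=);(HOST=__jdbc__);(USER=oracle);(PROTOCOL=tcp);(HOST=xx.xx.xx.xx);(PORT=57902)
```

Because the ";" is appended unconditionally, a missing key leaves an empty field (;;) and no t branching is needed. The separators come out without the surrounding spaces shown in the wanted output.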
Thanks a lot for your comments
Malpa