regex in AWK !

Status
Not open for further replies.

dickiebird

Programmer
Feb 14, 2002
758
GB
Hi Guys
Can't get to grips with regex in AWK !
I'm sure someone will enlighten me.
I want to print only Uppercase or numeric strings
from my output (4 chars min, up to 6 chars max) .
I'm doing awk '/[A-Z][0-9]/ {print}'
but that allows through lowercase and non-printables too.
What should it be ?
TIA

Dickie Bird
Honi soit qui mal y pense
 
dickie,
as always - examples pls!

Here's what I have:
nawk -f upperString.awk upperString.txt
Code:
#--------- upperString.awk
/[A-Z]|[0-9]/

#--------- upperString.txt
237846 2384
abc defgh kl
XYZ FOO BAR
vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Hi Vlad
This is what I get from strings command.
I don't want lowercase or punctuation chars :

testnnie: /data > strings -6 filenam | head
c80250
55353
03760
192856
05265
c05282
+73967
1403783
181292
162073

testnnie: /data > strings -6 filenam|awk '/[A-Z]|[0-9]/ {print} ' | head
c80250
55353
03760
192856
05265
c05282
+73967
1403783
181292
162073

AWK seems to have no effect.
Any thoughts ?

Dickie Bird
Honi soit qui mal y pense
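
A note on why the filter above appears to do nothing: /[A-Z]|[0-9]/ is unanchored, so it succeeds whenever a capital letter or a digit occurs anywhere in the line, and every line in that sample contains a digit. The test has to cover the whole string. Below is a minimal sketch of that idea in C with POSIX regcomp/regexec. The helper name is_upper_or_digit is hypothetical, and reading the requirement as "4 to 6 characters, each an uppercase letter or a digit" is an assumption.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s consists solely of 4 to 6 uppercase letters or digits,
   0 otherwise (also 0 if the pattern fails to compile).
   Hypothetical helper, not code from the thread. */
int is_upper_or_digit(const char *s)
{
    regex_t re;
    int ok;

    /* ^ and $ anchor the match to the whole string, so a stray
       lowercase letter or '+' anywhere makes the match fail. */
    if (regcomp(&re, "^[A-Z0-9]{4,6}$", REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this, "192856" and "UB4000" pass while "c80250", "+73967" and the 7-character "1403783" are rejected. The same anchoring applies in awk; older awks may not support the {4,6} interval, in which case something like $0 ~ /^[A-Z0-9]+$/ && length($0) >= 4 && length($0) <= 6 should express the same test.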
 
here's my "thought" based on your input ;)
not sure if a "signed" integer is considered an "integer"

BEGIN {
pat="^([A-Z]+|[-+]*[0-9]+)$"
}

$0 ~ pat
vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Hi Vlad & BigOldBulldog.
Many thanks for your efforts to date.
Perhaps I should give you the whole picture.
I'm trying to obtain blocks of 16 chars from a huge and corrupt binary index file.
There are several blocks of various lengths, of unprintable stuff that I want to ignore.
(Possibly index control data)
Running od on existing file (this is one contiguous chunk of data, with no record delimiter.) gives :-
ÿÿÿ......é.....ÿÿÿÿéééééééÿÿÿééÿÿÿÿ....~..be1 .xUB4000....ÿÿÿÿÿÿÿÿÿÿÿÿéééééééééé....ááááááá~..ééá .+432110....é.....ééééééééééééééé..ÿÿééééá etc. etc..
The data blocks I need to trap are highlighted above.
The first 10 bytes of each required block are any ascii character.
The last 6 chars in the required block are alphanumeric or spaces.
Originally I wrote a c program to attempt it - but with no joy.
Jamisar suggested using strings -6 filename which nearly works.
I thought maybe to use strings -td -6 to get the start position of the string, then subtract 10
in order to get to the start of the block of mixed printable/non-printable characters.

strings -td -6 xah |tail -100
1881670 403280
1881693 1403486
1881717 'LG56
1881741 1KM03
1881765 cUL08
1881789 1UK68
1881813 cQQ45
1881837 1402538
1881862 402537
1881885 c402539
1881909 1402537
1881933 1QH07
1881957 +403280
1881982 403280
1882005 YQH07
1882030 UP01
1882053 cUH81
1882077 1KM03
2268178 92569
2268202 85911
2268225 175175
2268249 119950
2268273 195977

Unless anyone can think of a neater way ?
TIA ;-)
Dickie Bird
Honi soit qui mal y pense
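
The "offset minus 10" idea can be sketched as a small C helper that takes one line of strings -td output and returns where the 16-byte block would begin. The names block_start and PREFIX_LEN are hypothetical. The sketch also assumes the printable run reported by strings begins exactly at the 6-character field, which may not always hold: some hits in the listing above are 7 characters long, so the run can start a byte early.

```c
#include <stdlib.h>

#define PREFIX_LEN 10   /* bytes of the block that precede the string */

/* Parses one line of `strings -td` output ("<decimal offset> <text>")
   and returns the offset where the 16-byte block would start,
   or -1 if the line has no usable offset.
   Hypothetical helper, not code from the thread. */
long block_start(const char *line)
{
    char *end;
    long pos = strtol(line, &end, 10);  /* skips leading blanks itself */

    if (end == line)        /* no digits at the start of the line */
        return -1;
    if (pos < PREFIX_LEN)   /* block would start before the file does */
        return -1;
    return pos - PREFIX_LEN;
}
```

For the first line of the listing, block_start("1881670 403280") gives 1881660.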
 
dbird
you are still talking about the 'index recreation'
from the c-forum.
i don't believe you can solve this with awk,
your file is too big for it.
perl and c can do it.
tomorrow (i'm not on my unix box right now) i'll
post a 'c' version.

PS: perl came too late for me :(
-----------
when they don't ask you anymore, where they are come from, and they don't tell you anymore, where they go ... you'r getting older !
 

#include <stdio.h>
#include <stdlib.h> /* exit */

/***REM put here the list of startpos reported by strings ***/
long all[]={ 123
, 345
, 678
, 0 /* terminator: the loop stops at the first 0 */
};

int main(int argc, char **argv)
{
    long pos;
    int nnn;

    /* binary mode, the file is not text */
    FILE *fd = fopen("filename","rb");
    FILE *out = fopen("output","wb");

    for(pos = 0; all[pos]; ++pos){
        /* fd is a FILE *, so this has to be fseek(), not lseek() */
        fseek(fd,all[pos]-10,SEEK_SET);
        /* copy the 16-byte block: 10 bytes before the string + 6 */
        for(nnn = 0; 16 >nnn; ++nnn) putc(getc(fd),out);
    }
    fclose(fd);
    fclose(out);
    exit(0);
}
-----------
when they don't ask you anymore, where they are come from, and they don't tell you anymore, where they go ... you'r getting older !
 
Thanks Jamisar - still have a problem in that the
array 'all[]' would need about 5 million start positions.
I'll have to load it from the output of 'strings -td -6'
(which I'd have to read in small blocks?)
I'm off work at the moment ( a head cold ) - I'll try it on Monday!
Danke
Dickie Bird
Honi soit qui mal y pense
 
dbird
no problem, use files: look at man 'atol',
it should be able to convert ' 123 abc' to 123.
strings -opts filename | your c-program
----------------------------
#include <stdio.h>
#include <stdlib.h> /* atol, exit */
#define MaxBuff 64
int main(int argc, char **argv)
{
    long pos;
    int nnn;
    char buff[MaxBuff+1];

    /* binary mode, the file is not text */
    FILE *fd = fopen("filename","rb");
    FILE *out = fopen("output","wb");

    /* each stdin line is "<offset> <string>" as printed by strings -td */
    while(fgets(buff,MaxBuff,stdin)){
        if(10 >(pos = atol(buff))) continue; /* or break */
        /* fd is a FILE *, so this has to be fseek(), not lseek() */
        fseek(fd,pos-10,SEEK_SET);
        for(nnn = 0; 16 >nnn; ++nnn) putc(getc(fd),out);
    }
    fclose(fd);
    fclose(out);
    exit(0);
}
-----------
when they don't ask you anymore, where they are come from, and they don't tell you anymore, where they go ... you'r getting older !
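
One thing to watch in the copy loop: when getc() hits end of file it returns EOF, and putc() then writes that as a 0xFF byte, which looks just like real data. A hedged sketch of reading the block in one go and reporting a short read instead. read_block and BLOCK_LEN are hypothetical names, and the 16-byte size is taken from the description earlier in the thread.

```c
#include <stdio.h>

#define BLOCK_LEN 16

/* Reads BLOCK_LEN bytes starting at byte offset pos of an open file.
   Returns 1 on success, 0 if the seek fails or fewer than BLOCK_LEN
   bytes are available. Hypothetical helper, not code from the thread. */
int read_block(FILE *in, long pos, unsigned char block[BLOCK_LEN])
{
    if (fseek(in, pos, SEEK_SET) != 0)
        return 0;
    return fread(block, 1, BLOCK_LEN, in) == BLOCK_LEN;
}
```

The caller would write the block with fwrite(block, 1, BLOCK_LEN, out) only when read_block() returns 1, so a truncated block at the end of the file is skipped rather than padded with 0xFF.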
 
Hi Jamisar.
Your efforts have resulted in success - Thanks !
But now I'm being difficult.
There are sometimes high values (FF or 255) in the 1st 4 bytes that I don't want.
I have broken your putc(getc(fd),out) to :
int cha;
.
.
cha=getc(fd);
if( ! (nnn<4 && cha=='ÿ')) putc(cha,out);

The example is giving me :-
ksh: 11542 Memory fault
What should I have coded ?
TIA ;-)

Dickie Bird
Honi soit qui mal y pense
 
hi dbird
i don't see a problem in this fragment, i don't think
it causes the mem-fault.
post more.
ps:
personally i prefer:
for(....){
if(nnn< 4 && cha == 255) continue;
putc(cha,out);
}
-----------
when they don't ask you anymore, where they are come from, and they don't tell you anymore, where they go ... you'r getting older !
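
On the 255 versus 'ÿ' comparison: getc() returns each byte as a value from 0 to 255, whereas a character literal such as 'ÿ' can come out negative on compilers where plain char is signed (and may not even be a single byte if the source file is UTF-8), so comparing against the number is the safer form. A minimal sketch of the filter working on a buffer rather than on the stream. drop_leading_ff, HIGH_VALUE and GUARD_LEN are hypothetical names; the rule "drop 0xFF only within the first 4 bytes" is taken from the post above.

```c
#include <stddef.h>

#define HIGH_VALUE 0xFF  /* the unwanted byte value (255) */
#define GUARD_LEN  4     /* only the first 4 positions are filtered */

/* Copies len bytes from in to out, skipping 0xFF bytes that occur in
   the first GUARD_LEN positions. Returns the number of bytes kept.
   out must have room for len bytes. Hypothetical helper. */
size_t drop_leading_ff(const unsigned char *in, size_t len,
                       unsigned char *out)
{
    size_t i, kept = 0;

    for (i = 0; i < len; ++i) {
        if (i < GUARD_LEN && in[i] == HIGH_VALUE)
            continue;               /* unwanted high value, skip it */
        out[kept++] = in[i];
    }
    return kept;
}
```

Note that the output blocks then vary in length (12 to 16 bytes), which matters if whatever reads them later expects fixed-size records.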
 
Rats, the problem was one of permissions.
On Thursday I ran the job as root.
Today I didn't.
The test directory was rwxr-xr-x !!!!
Why don't I ever learn ?
Thanks for the guidance ;-)

Dickie Bird
Honi soit qui mal y pense
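
That would fit the symptom: the programs posted earlier use whatever fopen() returns without checking it, so if "output" cannot be created in the directory, out is NULL and the first putc() is likely to fault. A small sketch of a checked open that reports the reason instead. open_checked is a hypothetical helper name.

```c
#include <stdio.h>

/* Opens path with the given mode. On failure prints the reason
   (for example "Permission denied") to stderr and returns NULL.
   Hypothetical helper, not code from the thread. */
FILE *open_checked(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        perror(path);   /* prints "<path>: <system error text>" */
    return fp;
}
```

In main() the two fopen() calls would become open_checked("filename", "rb") and open_checked("output", "wb"), with an early exit(1) if either returns NULL.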
 