ASCII File Data Validation

Status
Not open for further replies.

gixmono

Technical User
Jan 16, 2006
6
MX
Good morning, gurus.

My name is Raul and I'm trying to write a simple script to validate the data
within a huge ASCII file.

The data within the file looks like this:


Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 2160206.00 7843.0000 4915.0000
Function160 599644.00 2160206.00 8000.0000 4951.0000
#
Function240 600242.00 2159405.00 0.0000 1534.0000
Function240 600242.00 2159405.00 132.0000 1653.0000
Function240 600242.00 2159405.00 190.0000 1646.0000

I'm trying to write a regular expression to do the validation. I'm using egrep; I'd
prefer to use awk, but I'm a newbie and don't know how.

Anyway, this is what I've got:

egrep '[A-Za-z][0-9]\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

And it is "almost" working. But the first part, the one that should match Function120 (it might be Function2367 too), is not
working.

Any help and/or suggestion would be appreciated.

Thank you very much in advance !!!
 
You may try to replace this:
\{12\}
with this:
\{11,\}

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Thanks PVH but it doesn't work.

I've tried:
egrep '[A-Za-z][0-9] *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z].[0-9]. *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]+[0-9]+ *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]*[0-9]* *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

egrep '[A-Za-z][0-9]\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z].[0-9].\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]+[0-9]+\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]*[0-9]*\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

egrep '[A-Za-z][0-9]\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z].[0-9].\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]+[0-9]+\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]*[0-9]*\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

And nothing.

Again, Thank you very much !!
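
A minimal sketch of why the first field never matches in the attempts above, assuming a grep that accepts the \{n\} interval syntax: '[A-Za-z][0-9]\{12\}' asks for exactly one letter followed by twelve digits, while "Function160" is eight letters followed by three digits. Giving each bracket expression its own repeat count is one way around it. The variable names are only illustrative.

```shell
#!/bin/sh
# One sample record from the thread.
line='Function160 599644.00 2160206.00 7759.0000 4897.0000'

# Original first-field pattern: ONE letter followed by exactly twelve digits.
# grep -c prints the number of matching lines; "|| true" keeps the script
# going because grep exits non-zero when nothing matches.
old_count=$(printf '%s\n' "$line" | grep -c '^[A-Za-z][0-9]\{12\} ' || true)

# Each class gets its own repeat: one or more letters, then one or more digits.
new_count=$(printf '%s\n' "$line" | grep -c '^[A-Za-z]\{1,\}[0-9]\{1,\} ' || true)

echo "old pattern: $old_count line(s), new pattern: $new_count line(s)"
```

The same reasoning applies to the \{11,\} variant: the repeat count still attaches only to [0-9], so it demands at least eleven digits after a single letter.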
 
And this?
awk 'NF==5' file

Hope This Helps, PH.
 
As I need to load this data into a database, I need to be sure that every field of every record has a valid value.

For example:
This would be OK:
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 2160206.00 7843.0000 4915.0000

But not this:
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 00 14 4915.0000
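
Since awk was the preferred tool, here is a rough sketch of a per-field check, not a definitive implementation. The bad record above still has five fields, so a field count alone would let it through. The field widths (6.2, 7.2, 1-4.4, 4.4) and the "Function" prefix are assumptions taken from the patterns discussed in this thread; ok() is a hypothetical helper, and skipping "#" lines assumes they are legitimate group separators.

```shell
#!/bin/sh
# Sample data: line 2 is the bad record from the post above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 00 14 4915.0000
Function240 600242.00 2159405.00 132.0000 1653.0000
EOF

# Print "line number: record" for every record that fails any check.
# Digit counts are checked with length() instead of {n} intervals,
# because some older awk versions do not support intervals.
bad=$(awk '
# Hypothetical helper: s must be digits "." digits, with lo..hi digits
# before the point and exactly dec digits after it.
function ok(s, lo, hi, dec,   p) {
    if (s !~ /^[0-9]+\.[0-9]+$/) return 0
    split(s, p, ".")
    return length(p[1]) >= lo && length(p[1]) <= hi && length(p[2]) == dec
}
$0 == "#" { next }    # assumption: "#" lines are group separators
NF != 5 || $1 !~ /^Function[0-9]+$/ ||
    !ok($2, 6, 6, 2) || !ok($3, 7, 7, 2) ||
    !ok($4, 1, 4, 4) || !ok($5, 4, 4, 4) { print NR ": " $0 }
' "$sample")

echo "$bad"
rm -f "$sample"
```

Because awk splits on any run of spaces or tabs by default, this version does not care which whitespace separates the fields.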
 
Try this...
Code:
grep '^[A-Za-z0-9]* *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}$' file

Hope this helps.
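
For validation it is usually the lines that do not match that matter. A small sketch using the pattern above with grep -v (invert the match) and -n (show line numbers); the sample file and the treatment of "#" as a harmless separator are assumptions for illustration.

```shell
#!/bin/sh
sample=$(mktemp)
cat > "$sample" <<'EOF'
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 00 14 4915.0000
#
Function240 600242.00 2159405.00 1320.0000 1653.0000
EOF

# The anchored pattern from the post above, kept in a variable for readability.
pattern='^[A-Za-z0-9]* *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}$'

# First grep: list every line that does NOT match, with its line number.
# Second grep: drop the "#" separator lines from that report.
rejects=$(grep -vn "$pattern" "$sample" | grep -v '^[0-9]*:#$' || true)

echo "$rejects"
rm -f "$sample"
```

Note that this pattern demands exactly four digits before the decimal point in field 4, so values from the original data such as 0.0000 or 132.0000 would also be reported as rejects.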
 
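Are you sure the whitespace between the fields is only spaces? If not, use '[<space><tab>]*' instead of ' *' between the individual field REs (type a literal space and a literal tab inside the brackets).

Other ways to write your character patterns, using POSIX character classes:
[<space><tab>]* ==> [[:space:]]* (or [[:blank:]]* to match only space and tab)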
[0-9]* ==> [[:digit:]]*
[A-Z]* ==> [[:upper:]]*
[a-z]* ==> [[:lower:]]*
[a-zA-Z]* ==> [[:alpha:]]*
[A-Za-z0-9]* ==> [[:alnum:]]*

So:

grep '^[[:space:]]*[[:upper:]][[:lower:]]*[[:digit:]]*[[:space:]]*[[:digit:]]\{6\}\.[[:digit:]]\{2\}[[:space:]]*[[:digit:]]\{7\}\.[[:digit:]]\{2\}[[:space:]]*[[:digit:]]\{4\}\.[[:digit:]]\{4\}[[:space:]]*[[:digit:]]\{4\}\.[[:digit:]]\{4\}[[:space:]]*$' file

(no newlines or spaces typed - just one big string)
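
To illustrate the whitespace point, here is a short sketch with a record whose fields are separated by a tab; only the first two fields are checked to keep the patterns readable. It assumes a grep that supports POSIX character classes.

```shell
#!/bin/sh
# A record whose first two fields are separated by a TAB (built with printf).
line=$(printf 'Function160\t599644.00')

# ' *' only allows spaces between the fields, so the tab breaks the match.
spaces_only=$(printf '%s\n' "$line" |
    grep -c '^Function[0-9]* *[0-9]\{6\}\.[0-9]\{2\}$' || true)

# [[:space:]]* accepts spaces, tabs and other whitespace characters.
with_class=$(printf '%s\n' "$line" |
    grep -c '^Function[[:digit:]]*[[:space:]]*[[:digit:]]\{6\}\.[[:digit:]]\{2\}$' || true)

echo "spaces only: $spaces_only match(es), character class: $with_class match(es)"
```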

Granted, it doesn't get any shorter, but I've also allowed for whitespace at the beginning and end of the line; I'm not sure whether you allow it, so that's your call. Also, if "FunctionNNN" is always "Function" followed by a number, you can use

grep '^Function[[:digit:]]*...'

or

grep '^[[:space:]]*Function[[:digit:]]*...'

Also, I'm not sure whether your grep supports these character classes; perhaps your egrep does?


HTH,

p5wizard
 
gixmono,

As your goal is to load this data into a database,
may I suggest that you just try to load the data and check the rejected lines?
Then correct the rejected lines, delete the loaded data, and try again.
That way the database does all the checking for you, and IMHO this will be much easier.
At least this is the way I would go with Oracle's SQL*Loader. You did not tell us which database you are using, but I presume similar loading features are available for it as well.

regards
 
Thank you very much guys, I finally got it:

Here it is:
grep '^ *Function[0-9]\{1,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{1,4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\} *$' file

Now my challenge is finding a way to validate that fields 4 and 5 are always increasing.
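
One possible awk sketch for that check, under stated assumptions: "increasing" means strictly greater than the previous record, and a new group starts whenever field 1 changes or a "#" line appears. The sample is the data from the first post; the variable names are illustrative only.

```shell
#!/bin/sh
sample=$(mktemp)
cat > "$sample" <<'EOF'
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 2160206.00 7843.0000 4915.0000
Function160 599644.00 2160206.00 8000.0000 4951.0000
#
Function240 600242.00 2159405.00 0.0000 1534.0000
Function240 600242.00 2159405.00 132.0000 1653.0000
Function240 600242.00 2159405.00 190.0000 1646.0000
EOF

report=$(awk '
/^#/ { prev1 = ""; next }                            # separator: new group
$1 != prev1 { prev1 = $1; p4 = $4; p5 = $5; next }   # first record of a group
{
    # Adding 0 forces a numeric comparison instead of a string comparison.
    if ($4 + 0 <= p4 + 0) print NR ": field 4 not increasing: " $0
    if ($5 + 0 <= p5 + 0) print NR ": field 5 not increasing: " $0
    p4 = $4; p5 = $5
}
' "$sample")

echo "$report"
rm -f "$sample"
```

On this sample the check flags line 7, where field 5 drops from 1653.0000 to 1646.0000. If equal consecutive values should be allowed, the <= comparisons would become <.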

I really appreciate your help.

P.S. Hoinz, thank you very much for your suggestion, but the data is loaded through a third-party application and I can't use SQL to do this, even though the database is Oracle.
 