ASCII File Data Validation 2

gixmono · Jan 16, 2006

Good morning gurus

My name is Raul and I'm trying to write a simple script to validate the data
within a huge ASCII file.

The data within the file looks like this:

Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 2160206.00 7843.0000 4915.0000
Function160 599644.00 2160206.00 8000.0000 4951.0000
#
Function240 600242.00 2159405.00 0.0000 1534.0000
Function240 600242.00 2159405.00 132.0000 1653.0000
Function240 600242.00 2159405.00 190.0000 1646.0000

I'm trying to write a regular expression to do the validation. I'm using egrep, but I'd
prefer to use awk; I'm a newbie, though, and don't know how.

Anyway, this is what I've got:

egrep '[A-Za-z][0-9]\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

And it is "almost" working. But the first part, the one that should match Function120 (it might be Function2367 too), is not
working.

Any help and/or suggestion would be appreciated.

Thank you very much in advance !!!

PHV · Jan 16, 2006

You may try to replace this:
\{12\}
with this:
\{11,\}

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886

gixmono · Jan 16, 2006

Thanks PHV, but it doesn't work.

I've tried:
egrep '[A-Za-z][0-9] *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z].[0-9]. *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]+[0-9]+ *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]*[0-9]* *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

egrep '[A-Za-z][0-9]\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z].[0-9].\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]+[0-9]+\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]*[0-9]*\{12\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

egrep '[A-Za-z][0-9]\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z].[0-9].\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]+[0-9]+\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file
egrep '[A-Za-z]*[0-9]*\{11,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}' file

And nothing.

Again, Thank you very much !!
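
A possible reason none of these variants match: [A-Za-z][0-9]\{12\} asks for exactly one letter followed by twelve digits, while Function160 is eight letters followed by three digits. In addition, egrep uses extended regular expressions, where an interval is written {12} without backslashes; \{12\} is the basic (grep) spelling, and some egrep implementations treat the escaped braces as literal characters. A minimal sketch in ERE syntax, assuming the sample from the first post is saved under the hypothetical name sample.txt and that fields are separated by one or more spaces:

```shell
# Sample records copied from the first post ("sample.txt" is a made-up name).
cat > sample.txt <<'EOF'
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 2160206.00 7843.0000 4915.0000
Function160 599644.00 2160206.00 8000.0000 4951.0000
#
Function240 600242.00 2159405.00 0.0000 1534.0000
Function240 600242.00 2159405.00 132.0000 1653.0000
Function240 600242.00 2159405.00 190.0000 1646.0000
EOF

# ERE syntax: {n} is unescaped and + means "one or more".
# {1,4} on field 4 is an assumption so that values like 0.0000 are accepted.
ere='^[A-Za-z]+[0-9]+ +[0-9]{6}\.[0-9]{2} +[0-9]{7}\.[0-9]{2} +[0-9]{1,4}\.[0-9]{4} +[0-9]{4}\.[0-9]{4}$'

# Count the lines that match; "|| true" keeps the script going if none do.
matched=$(grep -Ec "$ere" sample.txt || true)
echo "$matched of 7 lines match"
```

The only line that should fail here is the "#" separator.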

PHV · Jan 16, 2006

And this ?
awk 'NF==5' file

Hope This Helps, PH.

gixmono · Jan 16, 2006

As I need to load this data into a database, I need to be sure that every field of every record has a valid value.

For Example:
This would be ok:
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 2160206.00 7843.0000 4915.0000

But not this:
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 00 14 4915.0000
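
The bad record above still has five whitespace-separated fields, which is why a field-count test alone cannot reject it. A rough awk sketch of the difference, assuming the two records are saved under the hypothetical name two_records.txt; the per-field test here only checks for a "digits.digits" shape, which is looser than the exact digit counts used in the grep patterns:

```shell
# The two records from the post above ("two_records.txt" is a made-up name).
cat > two_records.txt <<'EOF'
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 00 14 4915.0000
EOF

# Field-count check only: both records have five fields, so both are counted.
nf_pass=$(awk 'NF == 5 { n++ } END { print n + 0 }' two_records.txt)

# Per-field check: field 1 is letters then digits, fields 2-5 are digits.digits.
strict_pass=$(awk '
    NF == 5 && $1 ~ /^[A-Za-z]+[0-9]+$/ {
        ok = 1
        for (i = 2; i <= 5; i++)
            if ($i !~ /^[0-9]+\.[0-9]+$/) ok = 0
        if (ok) n++
    }
    END { print n + 0 }' two_records.txt)

echo "NF==5 accepts $nf_pass, per-field check accepts $strict_pass"
```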

SamBones · Jan 16, 2006

Try this...

Code:

grep '^[A-Za-z0-9]* *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}$' file

Hope this helps.
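
For validation it may be more useful to see the lines that fail than the ones that pass. One way to do that with the pattern above is to invert the match with -v and add -n for line numbers. This is only a sketch; mixed.txt is a hypothetical file holding one good and one bad record from the earlier post:

```shell
# One valid and one invalid record ("mixed.txt" is a made-up name).
cat > mixed.txt <<'EOF'
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 00 14 4915.0000
EOF

# Same basic regular expression as in the post above.
pattern='^[A-Za-z0-9]* *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\}$'

# -v prints non-matching lines, -n prefixes each with its line number.
rejects=$(grep -vn "$pattern" mixed.txt || true)
echo "$rejects"
```

On the real file the "#" separator lines would be listed as rejects too, so they would need to be filtered out or allowed for in the pattern.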

p5wizard · Jan 16, 2006

Are you sure the whitespace between the fields is only spaces? If not, use '[ ]*' instead of ' *' between the different field REs (type [<space><tab>])

other possibilities to classify your char patterns:
[<space><tab>]* ==> [[:space:]]*
[0-9]* ==> [[:digit:]]*
[A-Z]* ==> [[:upper:]]*
[a-z]* ==> [[:lower:]]*
[a-zA-Z]* ==> [[:alpha:]]*
[A-Za-z0-9]* ==> [[:alnum:]]*

So:

grep '^[[:space:]]*[[:upper:]][[:lower:]]*[[:digit:]]*[[:space:]]*[[:digit:]]\{6\}\.[[:digit:]]\{2\}[[:space:]]*[[:digit:]]\{7\}\.[[:digit:]]\{2\}[[:space:]]*[[:digit:]]\{4\}\.[[:digit:]]\{4\}[[:space:]]*[[:digit:]]\{4\}\.[[:digit:]]\{4\}[[:space:]]*$' file

(no newlines or spaces typed - just one big string)

Granted, it doesn't get any shorter, but I've also allowed for whitespace at the beginning and end of the line - not sure if you allow it or not - your call. Also, if "FunctionNNN" is always "Function" followed by a number, you can go

grep '^Function[[:digit:]]*...'

or

grep '^[[:space:]]*Function[[:digit:]]*...'

Also, not sure if your grep allows these char classes or not, perhaps your egrep does?

HTH,

p5wizard
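
To illustrate the point about tabs, here is a small sketch that runs a spaces-only pattern and a [[:space:]] pattern against one tab-separated record. The file name tabbed.txt and the helper variables are made up for the example, and it assumes your grep supports POSIX character classes:

```shell
# One tab-separated record (hypothetical test data; \t becomes a tab).
printf 'Function160\t599644.00\t2160206.00\t7759.0000\t4897.0000\n' > tabbed.txt

# Field patterns in basic-regex syntax, as used in the thread.
f2='[0-9]\{6\}\.[0-9]\{2\}'
f3='[0-9]\{7\}\.[0-9]\{2\}'
f45='[0-9]\{4\}\.[0-9]\{4\}'

sp=' *'               # separator: spaces only
ws='[[:space:]]*'     # separator: any whitespace, tabs included

# grep -c prints the number of matching lines; "|| true" tolerates zero matches.
spaces_only=$(grep -c "^Function[0-9]*$sp$f2$sp$f3$sp$f45$sp$f45\$" tabbed.txt || true)
any_space=$(grep -c "^Function[0-9]*$ws$f2$ws$f3$ws$f45$ws$f45\$" tabbed.txt || true)

echo "spaces only: $spaces_only match, [[:space:]]: $any_space match"
```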

hoinz · Jan 16, 2006

gixmono,

as your goal is to load this data into a database:
May I suggest that you just try to load the data, and check the rejected lines?
Then correct the rejected lines, delete the loaded data, and try once again.
So you may let the database do all the checking for you, and imho this will be much easier for you.
At least this is the way I would go with Oracle's SQL*Loader. You did not tell us which database you are using, but I presume similar loading features should be available for you as well.

regards

gixmono · Jan 17, 2006

Thank you very much guys, I finally got it:

Here it is:
grep '^ *Function[0-9]\{1,\} *[0-9]\{6\}\.[0-9]\{2\} *[0-9]\{7\}\.[0-9]\{2\} *[0-9]\{1,4\}\.[0-9]\{4\} *[0-9]\{4\}\.[0-9]\{4\} *$' file

Now my challenge is finding a way to validate that the fields 4 and 5 are always increasing.

I really appreciate your help.

P.S. Hoinz, thank you very much for your suggestion, but the data is loaded through a third-party application and I can't use SQL to do this, even though the database is Oracle.
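
The remaining question, checking that fields 4 and 5 keep increasing, could be approached with awk by remembering the previous record's values. The sketch below makes several assumptions that the thread does not confirm: "increasing" means strictly greater than the previous record, the comparison restarts at every "#" line and whenever the name in field 1 changes, and the sample from the first post is saved as functions.txt (a made-up name):

```shell
# Sample records copied from the first post ("functions.txt" is a made-up name).
cat > functions.txt <<'EOF'
Function160 599644.00 2160206.00 7759.0000 4897.0000
Function160 599644.00 2160206.00 7843.0000 4915.0000
Function160 599644.00 2160206.00 8000.0000 4951.0000
#
Function240 600242.00 2159405.00 0.0000 1534.0000
Function240 600242.00 2159405.00 132.0000 1653.0000
Function240 600242.00 2159405.00 190.0000 1646.0000
EOF

report=$(awk '
    $1 == "#" { name = ""; next }     # separator line: start a new block
    NF != 5   { next }                # malformed lines are left to the grep check
    $1 != name {                      # first record of a block: nothing to compare yet
        name = $1; prev4 = $4; prev5 = $5; next
    }
    {
        # Adding 0 forces a numeric comparison instead of a string comparison.
        if ($4 + 0 <= prev4 + 0)
            printf "line %d: field 4 not increasing (%s after %s)\n", NR, $4, prev4
        if ($5 + 0 <= prev5 + 0)
            printf "line %d: field 5 not increasing (%s after %s)\n", NR, $5, prev5
        prev4 = $4; prev5 = $5
    }' functions.txt)

echo "$report"
```

On the sample data this flags the last Function240 record, where field 5 drops from 1653.0000 to 1646.0000.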
