Best way to filter and save data contained in a text file

toddmanqa · Aug 20, 2008

I have scanned in a book of definitions with their explanations.

I'd like to use perl to read each line in the file, filter the line, and save the various elements to a database.

For example, there are 52 lines in one file, and here is one:

1AB \'a-'be\ n (1927) : the one of the four ABO blood groups characterized by the presence of antigens designated by the letters A and B and by the absence of antibodies against these antigens

I'd want to save it like this:

data field1: AB
data field2: 'a-'be
data field3: the one of the four...

What would be the best approach for doing this? Should I place each line in an array, then strip off the first character of the array (over and over) and check each character for spaces, backslashes, and colons until I get to the end of the array?

Any help is appreciated.

toddmanqa · Aug 20, 2008

BTW, I OCRed it.

prex1 · Aug 20, 2008

You don't need to read the file into an array, though that is a common solution for small to medium-sized files like yours: everything can be done while reading the file line by line.
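As a minimal sketch of the line-by-line approach (the helper name [tt]each_line[/tt] and its callback interface are made up for illustration, not something from this thread), the file can be processed without ever holding all of it in memory:

```perl
use strict;
use warnings;

# Hypothetical helper: calls $callback once for every non-blank line of
# the file at $path and returns the number of lines it handed over.
sub each_line {
    my ($path, $callback) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;                 # drop the trailing newline
        next unless $line =~ /\S/;   # skip blank lines
        $callback->($line);
        $count++;
    }
    close $fh;
    return $count;
}
```

It could be called as [tt]each_line('definitions.txt', sub { print "$_[0]\n" });[/tt], where the file name is only an assumption; the callback is the place to filter each line and save its fields.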
As for separating the fields, it all depends on which format details you can count on to recognize each field.
For example, if the text part is always separated from the rest by a colon, you can do [tt]($firstpart,$secondpart)=split/:/,$line,2;[/tt] (the limit of 2 keeps any later colons inside the text part).
To recognize the first two fields, you need to specify how the first part of the line is structured. E.g.:
-field1 is everything coming after a number of digits (may be zero? is there a maximum?) up to the first intervening space (is it composed of letters only? what other restrictions apply?)
-field2 is everything contained between two backslashes
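The two rules above can be sketched as a single regex, under assumptions that still need checking against the real data: the leading number is optional, field1 contains no spaces, the pronunciation contains no backslash, and the first colon on the line is the one that starts the definition. The function name [tt]parse_entry[/tt] is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser for one dictionary line. Returns the list
# (word, pronunciation, definition), or an empty list when the line
# does not fit the assumed format.
sub parse_entry {
    my ($line) = @_;

    # Split on the first colon only, so colons inside the definition survive.
    my ($head, $definition) = split /\s*:\s*/, $line, 2;
    return unless defined $definition;

    # Optional leading digits, then field1 up to the first space,
    # then field2 between two backslashes.
    return unless $head =~ /^\d*(\S+)\s+\\([^\\]*)\\/;

    return ($1, $2, $definition);
}
```

For the sample record in the question this yields [tt]AB[/tt], [tt]'a-'be[/tt] and the definition text. Lines that return an empty list are worth logging, since OCR errors will probably break the pattern on some records.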

Please come back with your own specifications for the fields, your own code as a starting point, and more example records (shorten the text portion, whose length doesn't matter), so we can judge what is common to different lines and what is not.

Franco

http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads

toddmanqa · Aug 21, 2008

Thanks. I will play with split and begin constructing some rules for the filtering.

stevexff · Aug 21, 2008

We can always help you with writing a regex, but it's up to you to determine what the pattern is...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)[/small]
