
Best way to filter and save data contained in a text file

Status
Not open for further replies.

toddmanqa

Technical User
Aug 11, 2008
I have scanned in a book of definitions with their explanations.

I'd like to use Perl to read each line in the file, split the line into its parts, and save the various elements to a database.

For example, there are 52 lines in one file, and here is one:

1AB \'a-'be\ n (1927) : the one of the four ABO blood groups characterized by the presence of antigens designated by the letters A and B and by the absence of antibodies against these antigens

I'd want to save it like this:

data field1: AB
data field2: 'a-'be
data field3: the one of the four...

What would be the best approach for doing this? Should I place each line in an array, then strip off the first character in the array (over and over) and analyze each character for spaces, backslashes, and colons until I get to the end of the array?

Any help is appreciated.
 
You don't need to read the file into an array, though this is a common solution for small to medium-sized files like yours: everything can be done while reading the file line by line.
As for separating the fields, it all depends on which format details you can count on to recognize each field.
For example, if the textual part is always separated from the rest by a colon, you can do [tt]($firstpart,$secondpart)=split/:/,$line;[/tt].
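
As a rough sketch of that split, using the sample record from the question: the limit of 2 and the whitespace trimming around the colon are my additions, on the assumption that the definition text may itself contain colons. A real script would take [tt]$line[/tt] from a [tt]while (my $line = <$fh>)[/tt] loop instead of a hard-coded string.

```perl
use strict;
use warnings;

# Sample record copied from the question (definition shortened).
my $line = q{1AB \'a-'be\ n (1927) : the one of the four ABO blood groups};

# Split on the first colon only; the limit of 2 keeps any later
# colons inside the definition text instead of cutting it short.
my ($firstpart, $secondpart) = split /\s*:\s*/, $line, 2;

print "head: $firstpart\n";
print "text: $secondpart\n";
```

Without the limit, a definition containing a second colon would lose everything after it, because only the first two pieces are assigned.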
To recognize the first two fields you need to specify how the first part of the line is structured. E.g.:
-field1 is everything coming after a number of digits (may be zero? is there a maximum?) up to the first intervening space (is it composed of letters only? what other restrictions apply?)
-field2 is everything contained between two backslashes
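
If those two rules turn out to hold, a single regex could pull out all three fields. This is only a sketch under guessed assumptions, checked against the one sample record: the headword follows zero or more digits and contains no spaces, the pronunciation contains no backslash, and the first colon after the pronunciation starts the definition. Records with multi-word headwords or a missing pronunciation would fall through to the warning branch.

```perl
use strict;
use warnings;

# Sample record copied from the question (definition shortened).
my $line = q{1AB \'a-'be\ n (1927) : the one of the four ABO blood groups};

my ($field1, $field2, $field3);
if ($line =~ m{
        ^ \d*            # optional leading digits (homograph number?)
        (\S+) \s+        # field1: headword, up to the first space
        \\ ([^\\]*) \\   # field2: text between two backslashes
        [^:]* : \s*      # skip part of speech and date, up to the colon
        (.*) $           # field3: the definition text
    }x) {
    ($field1, $field2, $field3) = ($1, $2, $3);
    print "data field1: $field1\n";
    print "data field2: $field2\n";
    print "data field3: $field3\n";
} else {
    # Report lines that do not fit the assumed pattern instead of guessing.
    warn "unparsed line: $line\n";
}
```

Logging the unparsed lines is a cheap way to find out which of the assumptions break on the other records in the file.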

Please come back with your own specifications/descriptions for the fields, your own code to start with, and other examples of records (shorten the textual portion, whose length doesn't matter), so that we can judge what is common to different lines and what is not.

Franco
 
Thanks. I will play with split and begin constructing some rules for the filtering.
 
We can always help you with writing a regex, but it's up to you to determine what the pattern is...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 