Perl colon-delimited text file reading & parsing

Argonath · Mar 14, 2005

I have a text file (a Microsoft Access locked database (.LDB) file) that I am able to read and display as a single stream using the following short script:

-----------------------------------------------------------
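#!/usr/local/bin/perl

# Open the input file for reading and a scratch output file
open(INFILE,'TEST.LDB') || die("Could not open file: $!");
open(OUTFILE,"+>outfile") || die("Could not open outfile: $!");

# Read the file in 2048-byte chunks and print each chunk
while ($len=sysread(INFILE,$buf,2048))
{
    print $buf,"\n";
}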

close(INFILE);
close(OUTFILE);
-----------------------------------------------------------

The .LDB file contains records in the form of "Field:Element". The first four fields are headers that are to be deleted or separated from the rest of the file. The variable $buf above contains a single stream of these "Field:Element" combinations.

How can I format and parse this file out in a form like:

Hdr1: Data
Hdr2: Data
Hdr3: Data
Hdr4: Data

Field1: Data
Field2: Data
Field3: Data
Field4: Data
Field5: Data

Field1: Data
Field2: Data
Field3: Data
Field4: Data
Field5: Data

etc. (Basically, how can I convert this single output stream into a readable format?)

Thanks for your help.

MikeLacey · Mar 14, 2005

Argonath,

That's not enough info about the format of the data you're getting from the .ldb. Could you give us a bit more, please, perhaps with some example data?

Mike


chazoid · Mar 15, 2005

ldb files are in the format:

Code:

43 4F 4D 50 55 54 45 52 4E 41 4D 45 30 31 00 20 ; COMPUTERNAME01. 
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 ;                 
55 53 45 52 4E 41 4D 45 20 20 20 30 31 00 00 00 ; USERNAME   01...
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................

then it continues with the next computer name.

I think you could read the entire file into a string, then split on \x00+ to get each computername/user pair into an array, then split each element of the array on a single \x00 to divide the computername/username. I'd post an example, but I don't have time at the moment. Hopefully that will get you started.
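
A rough sketch of one way to do this, assuming the layout in the dump above holds for the whole file: 64-byte records, each a 32-byte NUL-terminated computer name followed by a 32-byte NUL-terminated user name. Rather than splitting on \x00+, it walks the buffer in fixed-size chunks with unpack, since the padding between names is a mix of spaces and NULs. The helper names (make_record, parse_ldb) and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Build one hypothetical 64-byte record shaped like the dump above:
# computer name + NUL + space padding, then user name + NUL padding.
sub make_record {
    my ($computer, $user) = @_;
    return pack('a32 a32', "$computer\0" . ' ' x 32, "$user\0");
}

# Split a raw buffer into [computer, user] pairs.
# Assumption: every record is exactly 64 bytes (32 + 32).
sub parse_ldb {
    my ($data) = @_;
    my @pairs;
    for (my $off = 0; $off + 64 <= length $data; $off += 64) {
        # Z32 takes 32 bytes and keeps everything before the first NUL
        my ($computer, $user) = unpack('Z32 Z32', substr($data, $off, 64));
        push @pairs, [$computer, $user];
    }
    return @pairs;
}

# In-memory sample standing in for the contents of a real .ldb file
my $data = make_record('COMPUTERNAME01', 'USERNAME   01')
         . make_record('COMPUTERNAME02', 'USERNAME   02');

printf "%s -> %s\n", @$_ for parse_ldb($data);
```

For a real file, $data would be the buffer read in binary mode. If the record size turns out to be different, only the 64 and the two 32s need to change.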

Argonath · Mar 16, 2005

chazoid,

do you mean something like this -

-----------------------------------------------------------

#!/usr/local/bin/perl -w

# Open the .ldb file read-only, in binary mode
open(odbfile_in,"<ANYFILE.LDB") || die("Could not open file: $!");
binmode(odbfile_in);

# Properties of the open file; only $size is used below
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks)=stat(odbfile_in);

print "\nfile size (bytes) =",$size,"\n";

# Read the whole file into $buf as a single stream
my $buf;
sysread(odbfile_in,$buf,$size);

# Split on runs of NUL bytes (note the backslash in \x00)
my @first_array=split(/\x00+/,$buf);

# split takes one string at a time, so apply it to each element
my @second_array=map { split(/\x00/,$_) } @first_array;
print join("\n",@second_array),"\n";

close(odbfile_in);
-----------------------------------------------------------

Notice that I made the buffer size equal to the $size variable to make sure that the entire file contents are a single stream.
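
Another way to get the whole file as one stream, sketched here on a throwaway temp file with made-up contents, is to undefine the input record separator and read once. The helper name slurp_binary is hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into one scalar, byte for byte.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open $path: $!";
    binmode($fh);    # no newline translation on Windows
    local $/;        # undef record separator: one read returns everything
    my $buf = <$fh>;
    close($fh);
    return $buf;
}

# Demo on a temporary file standing in for the real .LDB file
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "Version:1.0\0Count:0\n";
close($tmp_fh);

my $buf = slurp_binary($tmp_path);
print "read ", length($buf), " bytes\n";
```

This avoids the stat call and does not depend on a single sysread returning the full requested length.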

MikeLacey,

As to example data (output), if I were to print $buf immediately after reading the file in (after the sysread command, just doing a print $buf), it would look like this (X=integer;white spaces occur exactly as indicated in the example and the field names are in boldface - the first three field names are file headers and are output only one time):

Version:X.X-XXXSource File: Filename >Date/Time: MM-DD-YY at HH:MM:SS NextField: TextCount: Integer NextField: TextNextField: Text AddressField: 2-byte hexadecimal valueCount: Integer Type:Integer ... (repeats in sequence minus the Version, File, Date/Time headers)

Note that the field names all end with a colon, i.e., the file is colon-delimited and of the form fieldname:value

I hope this clarifies my post a little bit. The particular .ldb file that I am working on was extracted from an object file (.odb) created by Ada.

mikevh · Mar 16, 2005

Is ANYFILE.LDB just a text file? (You make reference to a "text file" in the subject of your original post.) What does it look like if you open it up in an editor? If you can do that and it looks "human readable," paste a representative sample inside [code] [/code] tags. (If you don't know about [code] [/code] tags, click on Process TGML below the box where you type your posts for info.)

Argonath · Mar 16, 2005

Opening the .LDB file in notepad gives a header, whitespace, field, whitespace, data, whitespace, as follows:

HeaderVersion:X.X-XXXSource File: Filename >Date/Time: MM-DD-YY at HH:MM:SSNextField:THIS_IS_WHAT_IT_LOOKS_LIKE TextCount:0NextField: 123456NextField: THIS_IS_WHAT_IT_LOOKS_LIKE AddressField: ABCDECount:0 Type:123456

The Version, Source File, and Date/Time fields are headers (never repeated); the remaining fields are the repeated fieldname:value pairs.

Sorry, this is the best that I can do as far as an example goes - I am not able to post an example of this particular file and don't have time to completely recreate an example data file that would be of greater use.

Thanks for the help anyway. It's basically a matter of writing the correct format string or using a DBI module that will already do the formatting. There is no fixed length for the field names, except perhaps a maximum length, and no set number of whitespace characters. There are also multiple data types (hex, integer, character), which makes composing a single format string (e.g., printf("%20c%5s%40f\n",$file_contents)) very tedious at best.
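
One way to avoid a format string altogether is to split the stream on the field names themselves with a regular expression. A minimal sketch, assuming the field names are known in advance (the list below is taken from the sample in this thread), that the first three pairs are headers, and that Type is the last field of each record. The sample values and the helper names parse_stream and format_pairs are made up for illustration:

```perl
use strict;
use warnings;

# Assumed field names, copied from the sample posted in this thread
my @names = ('Version', 'Source File', 'Date/Time', 'NextField',
             'AddressField', 'Count', 'Type');
my $name_re = join '|', map { quotemeta } @names;

# Turn the single stream into a list of [name, value] pairs.
sub parse_stream {
    my ($buf) = @_;
    my @pairs;
    # A known name and its colon, then the shortest value that runs up to
    # the next known name followed by a colon (or the end of the stream).
    while ($buf =~ /($name_re):\s*(.*?)\s*(?=(?:$name_re):|\z)/gs) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Print one "Name: value" per line, with a blank line after the header
# block and after the last field of each record.
sub format_pairs {
    my ($n_headers, $last_field, @pairs) = @_;
    my $out = '';
    my $i = 0;
    for my $p (@pairs) {
        $out .= "$p->[0]: $p->[1]\n";
        $i++;
        $out .= "\n" if $i == $n_headers || $p->[0] eq $last_field;
    }
    return $out;
}

# Hypothetical stream shaped like the posted example
my $sample = 'Version:1.0-123Source File: Filename >'
           . 'Date/Time: 03-16-05 at 10:15:00'
           . 'NextField:THIS_IS_WHAT_IT_LOOKS_LIKE Count:0'
           . 'NextField: 123456AddressField: ABCDECount:0 Type:123456';

print format_pairs(3, 'Type', parse_stream($sample));
```

Because the values are treated as plain text, it does not matter whether a field holds hex, an integer, or characters. The approach does break down if a value can itself contain a field name followed by a colon.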
