Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Field Delimited Flat File Processing

Status
Not open for further replies.

SmileeTiger

Programmer
Mar 13, 2000
200
US
I have a flat file of the following format:
field1: data1.1 field2: data2.1 field4: data4.1
field1: data1.2 field3: data3.2 field5: data5.2

That I want to convert to a delimited file so that I can import it into a spreadsheet with the following format:
field1 field2 field3 field4 field5
data1.1 data2.1 data4.1
data1.2 data3.2 data5.2

(Where there are tabs between columns)
I know the column headers that are present in the flat file ahead of time.

Does anyone have a good way to do this in perl?
 
post some of the real data in the file you wish to convert. It can be done, but your sample data looks too contrived to give more than a generic answer.
 
Here is a sample from my real data file:
title: Harry Potter and the Sorcerer's Stone (Book 1) author: J.K. Rowling pub-date: 1998 format: hardcover
title: Harry Potter and the Sorcerer's Stone (Book 1) author: J.K. Rowling pub-date: 1998 format: paperback publisher: Arthur A. Levine Books

The order of the header fields that I want would be:
title author pub-date format publisher

If a field is missing from the data file then it should be skipped.
 
Are all the field names a single word (a string containing no white space) immediately followed by a colon?
 
This should give you a starting point:

Code:
#!/usr/bin/perl
use strict;

my @fields = ("title","author","pub-date","format","publisher");
my %fields_hash = ("title",1,"author",1,"pub-date",1,"format",1,"publisher");

while (<>) {
	my $string = $_;
	my (@pairs) = split(/:/, $string);
	my @sequence = ();
	for (my $i=0; $i <= $#pairs; $i++) {
		if (($i==0) || ($i==$#pairs)) {
			push(@sequence,$pairs[$i]);
			next;
		}
		my (@twos) = $pairs[$i] =~ m/(.*) (.*)/;
		if ($fields_hash{"$twos[1]"} == 1) {
			push(@sequence,(@twos));
		} else {
			$#sequence += 1;
			$sequence[$#sequence] = $pairs[$i];
		}
		
	}

	for (@sequence) {
		if ($fields_hash{"$_"} == 1) {
			print "\n$_: ";
		} else {
			print "$_";
		}
	}
}

data I tested for is:
Code:
title: Harry Potter and the Sorcerer's Stone (Book 1) author1: J.K. Rowling pub-date: 1998 format: hardcover
title: Harry Potter and the Sorcerer's Stone (Book 1) author: J.K. Rowling pub-date: 1998 format: paperback publisher: Arthur A. Levine Books

Output:
Code:
title:  Harry Potter and the Sorcerer's Stone (Book 1) author1 J.K. Rowling
pub-date:  1998
format:  hardcover
 
title:  Harry Potter and the Sorcerer's Stone (Book 1)
author:  J.K. Rowling
pub-date:  1998
format:  paperback publisher Arthur A. Levine Books

You can do printing as your wish.

VaRaKal
 
Just noticed I missed publisher.
Make the following change:

Code:
my %fields_hash = ("title",1,"author",1,"pub-date",1,"format",1,"publisher",1);
 
Here, see how this works for you. What is the output of this going to be used for? This code delimites the fields with pipe symbols, it's easy to change it to a tab or whatever characters you want.

If you want everything to be in nicely spaced columns, it will take a bit more work.

Code:
my @headers = qw/title author pub-date format publisher/;
my $delim = '|';
print join($delim, @headers), "\n";

while (<DATA>) {
    # Assumes book records always have a title
    if (index($_, 'title:') > $[-1) {
        my %hash;
        while (/(\S+):\s*(.+?)(?=(\s*\S+:)|(\s*$))/g) {
            $hash{$1} = $2;
        }
        print join($delim, map {$hash{$_} || ""} @headers), "\n";
    }
}
 
Thanks so much for the help that does exactly what I need.


I am using the script to parse a dump from an old book data storage system that I have and I am importing it into a spreadsheet so I can work with the data a little more easily. The old application is pretty clunky :(.

My plan is to eventually replace the spreadsheet with a database and writing a small application around the data.

Thanks again!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top