extra carriage returns within a CSV file

Status
Not open for further replies.

rshandy

Technical User
Dec 26, 2003
91
US
Hi,

I'm trying to read in a comma-separated text file. Unfortunately, some fields contain carriage returns before the newline that ends the record, and Perl treats each one as the end of a record.

How can I differentiate between carriage returns within the record and an "end of line" after each record?


Here's sample data:

110934,David,Johnson, 12 - hammers
1 - screw drivers

5 - box of 8d nails,december,RMA complete, not received, do not restock
110935,Ralph,Jones, 1 - CD drive,december,RMA sent,received, restock
110936,Angela,Smith, 500 socks,
1 - towel,october,RMA not sent, not received, restock

The number of fields is fixed, but the content is not.
I read the file in like this:

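open(my $fh, '<', $ccbookdb) or die "Cannot open ccbook database ($ccbookdb): $!\n";

# Read one physical line at a time; this is where embedded line breaks split a record.
while (my $in = <$fh>) {
    chomp $in;
    last unless $in;    # an empty line is treated as the end of the database
    my ($rma_number, $firstname, $lastname, $return_items,
        $date, $rma_log, $invlog, $restock_notify) = split(',', $in);
    # ... process the record here ...
}
close $fh;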


I guess I could count the commas as I read in the data, ignoring carriage returns inside fields until I reach the fixed number of commas per record and only then accepting the newline, but I'm not sure how to approach that.

Any help would be much appreciated.

Rich

 
What OS are you using? And what OS was used to create the file?

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
Scratch that. The server that runs our Perl scripts is running Windows 2000. The database is built from browser submissions on WinXP machines.
 
There doesn't seem to be anything to indicate which line breaks are mid-field and which mark the end of a record (i.e. you don't have quotes around any of the fields that might indicate where they end). How do you decide this?
 
As well as an embedded CRLF, the 'Angela Smith' record has a comma inside the $return_items column ('500 socks, 1 - towel'). Without some kind of quoting scheme, this will be almost impossible to parse.
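
To illustrate, here is a minimal sketch of the comma-counting idea from the original question. It is not a recommended fix. It assumes every record has exactly 8 fields (7 separating commas), that the last field never continues onto another line, and that embedded line breaks can be replaced with a single space. The name read_records and the in-memory sample are made up for the demonstration.

```perl
use strict;
use warnings;

# Assumption: every record has exactly 8 fields, i.e. 7 separating commas.
my $FIELDS = 8;

# Join physical lines until the buffer holds at least $FIELDS - 1 commas,
# then split the buffer into fields. Embedded line breaks become one space.
sub read_records {
    my ($fh) = @_;
    my (@records, $buffer);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF
        next unless $line =~ /\S/;         # skip blank lines inside a field
        $buffer = defined $buffer ? "$buffer $line" : $line;
        my $commas = () = $buffer =~ /,/g; # count commas seen so far
        next if $commas < $FIELDS - 1;     # record not complete yet
        push @records, [ split /,/, $buffer, -1 ];
        undef $buffer;
    }
    return \@records;
}

# Sample data from the question, read through an in-memory filehandle.
my $data = <<'END';
110934,David,Johnson, 12 - hammers
1 - screw drivers

5 - box of 8d nails,december,RMA complete, not received, do not restock
110935,Ralph,Jones, 1 - CD drive,december,RMA sent,received, restock
110936,Angela,Smith, 500 socks,
1 - towel,october,RMA not sent, not received, restock
END

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $records = read_records($fh);
close $fh;

# Report how many fields each record ended up with.
print "$_->[0]: ", scalar(@$_), " fields\n" for @$records;
```

The first two records come out with 8 fields, but the 'Angela Smith' record comes out with 9 because of the comma inside its item list, so every column after $return_items is shifted by one. Counting commas cannot tell a separator from a comma that is part of the data.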

Do you have any control over the data collection process? Changing it to output in 'normal' CSV format would make your life easier.
Code:
110936,"Angela","Smith"," 500 socks, 1 - towel","october","RMA not sent","not received","restock"<CRLF>
would be a much simpler prospect.

Text::CSV has some good utility methods for creating and parsing (properly formatted) CSV files.
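
As a rough sketch of what that could look like once the data is written with quotes: the example below assumes Text::CSV is installed from CPAN (it is not part of core Perl) and uses a made-up in-memory record in place of the real file. The binary option is what lets a quoted field contain a line break.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, assumed to be installed

# binary => 1 allows line breaks (and other non-printable bytes) inside quoted fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical record in the quoted format; a real script would open $ccbookdb instead.
my $data = qq{110936,"Angela","Smith"," 500 socks,\r\n1 - towel",}
         . qq{"october","RMA not sent","not received","restock"\r\n};
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

# getline reads one logical record, even if it spans several physical lines.
my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;
}
close $fh;

for my $row (@rows) {
    my ($rma_number, $firstname, $lastname, $return_items,
        $date, $rma_log, $invlog, $restock_notify) = @$row;
    print "RMA $rma_number has ", scalar(@$row), " fields\n";
}
```

Here the embedded comma and CRLF both stay inside $return_items, because the quotes tell the parser where the field ends.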

Steve

 