Tek-Tips is the largest IT community on the Internet today!

CSV files parsing error

Status
Not open for further replies.

stones1030

Programmer
May 30, 2006
US
I was using the Text::CSV module to parse a CSV file with 1,000-plus records, and 4 of them failed. The failing rows are:
<code>
011-00-HSCM,Hospital Services – Component Manufacturing,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA
011-00-HSCR,Hospital Services – Component Release,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA
011-00-HSQC,Hospital Services – Quality Control Lab / Apheresis Manufacturing,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA
090-03-09,ETS -  Operations,DEFAULT,090-03,8111 Gatehouse Rd,Falls Church,VA,22042,USA
</code>
 
What do you mean by failed? Did you get an error? If so, what was it? What happened that shouldn't have? Or what didn't happen that should have?
 
Sorry, I accidentally clicked 'submit post' instead of 'preview post'.
...continued from previous post

If I delete the – character and replace it with a plain hyphen, the file parses without errors. The error messages I get are:
Failed to parse line: 011-00-HSCM,Hospital Services û Component Manufacturing,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA

Failed to parse line: 011-00-HSCR,Hospital Services û Component Release,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA

Failed to parse line: 011-00-HSQC,Hospital Services û Quality Control Lab / Apheresis Manufacturing,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA

Failed to parse line: 090-03-09,ETS -á Operations,DEFAULT,090-03,8111 Gatehouse Rd,Falls Church,VA,22042,USA

Can someone see why the original – character does not work? It looks like the CSV module is getting some special characters in place of the original – character.

Thanks,
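The û and á in those error messages are a clue to the file's encoding. A minimal sketch, assuming (the thread does not say this) that the file was saved as Windows-1252 and the messages were printed in a Windows console using code page 437: decoding the same byte both ways shows that 0x96 is an en dash in Windows-1252 but displays as û in CP437, and 0xA0 is a no-break space that displays as á. It uses only the core Encode module; `as_code_points` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the CSV file is Windows-1252 and the error messages were
# printed in a console that uses code page 437 (common on Windows).
# Returns the Unicode code point a byte stands for in each encoding.
sub as_code_points {
    my ($byte) = @_;
    my $meant = decode('cp1252', $byte);   # what the file's author typed
    my $shown = decode('cp437',  $byte);   # what the console displayed
    return (ord($meant), ord($shown));
}

# 0x96 and 0xA0 are the bytes suspected in the failing lines.
for my $byte ("\x96", "\xA0") {
    my ($meant, $shown) = as_code_points($byte);
    printf "0x%02X: cp1252 U+%04X, cp437 U+%04X\n", ord($byte), $meant, $shown;
}
# prints:
# 0x96: cp1252 U+2013, cp437 U+00FB
# 0xA0: cp1252 U+00A0, cp437 U+00E1
```

U+2013 is the en dash and U+00FB is û; U+00A0 is the no-break space and U+00E1 is á, which matches the error output for all four lines.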
 
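That dash is not an ordinary ASCII hyphen, and the (rather limited) Text::CSV module only allows tab and printable ASCII (0x20 to 0x7E) inside a CSV field, so parse() fails on those lines. Judging by the û and á in your error messages (which is how a code page 437 console shows bytes 0x96 and 0xA0), the file appears to be Windows-1252, where 0x96 is an en dash and 0xA0 is a non-breaking space; the fourth line fails on the non-breaking space after the hyphen, not on the hyphen itself. I usually recommend the Text::CSV_XS module instead; with its binary option turned on, it accepts these bytes. This works: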
Code:
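#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV_XS;

# binary => 1 allows bytes outside printable ASCII (such as the
# dash and non-breaking space in these records) inside fields
my $csv = Text::CSV_XS->new( { binary => 1 } );

while (my $line = <DATA>) {
   chomp $line;
   if ($csv->parse($line)) {
      print join(' ', $csv->fields()), "\n";
   }
   else {
      warn "Failed to parse line: ", $csv->error_input(), "\n";
   }
}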

__DATA__
011-00-HSCM,Hospital Services – Component Manufacturing,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA
011-00-HSCR,Hospital Services – Component Release,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA
011-00-HSQC,Hospital Services – Quality Control Lab / Apheresis Manufacturing,DEFAULT,011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA
090-03-09,ETS -  Operations,DEFAULT,090-03,8111 Gatehouse Rd,Falls Church,VA,22042,USA
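Switching to Text::CSV_XS with binary on is the fix recommended here. If you also want to find which lines would trip the pure-Perl Text::CSV, a minimal core-Perl sketch can scan each line for bytes outside the set that module accepts in a field (tab plus printable ASCII, 0x20 to 0x7E). It can also optionally map the two offending bytes to ASCII, automating the hand fix from the question. `bad_bytes` and `to_ascii` are hypothetical helper names, and the 0x96/0xA0 mapping assumes the file is Windows-1252.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Text::CSV only accepts tab and printable ASCII (0x20-0x7E) in a field.
# List every other byte in a line, with its 0-based offset.
sub bad_bytes {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # the line ending is not field data
    my @found;
    while ($line =~ /([^\t\x20-\x7E])/g) {
        push @found, sprintf('0x%02X at offset %d', ord($1), pos($line) - 1);
    }
    return @found;
}

# Hypothetical helper automating the hand fix from the question, assuming
# Windows-1252 input: en dash (0x96) -> '-', no-break space (0xA0) -> ' '.
sub to_ascii {
    my ($line) = @_;
    (my $clean = $line) =~ tr/\x96\xA0/\- /;
    return $clean;
}

# Sample lines written with escapes so this script stays plain ASCII.
my @sample = (
    "011-00-HSCR,Hospital Services \x96 Component Release,DEFAULT\n",
    "090-03-09,ETS -\xA0 Operations,DEFAULT,090-03\n",
    "011-00,4050 Lindell Blvd,St. Louis,MO,63108,USA\n",
);

for my $line (@sample) {
    my @bad = bad_bytes($line) or next;    # skip lines that are clean
    print "bad bytes: @bad\n";
    print "cleaned:   ", to_ascii($line);
}
```

For the two sample lines it reports 0x96 at offset 30 and 0xA0 at offset 15, and the third, pure-ASCII line is skipped. Rewriting the data is lossy, so it is only worth doing if the module cannot be changed.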
 