
splitting with an embedded delimiter

Status
Not open for further replies.

maurella

Technical User
Mar 12, 2002
I am trying to parse a CSV file where the fields are separated by commas and each field is enclosed in double quotes, like this:

"f1","f2","f3","f4"

This works most of the time:
@line = split(/,/, $_);

Now I've come across a file that has an embedded comma in one of the fields, like this:

"f1","f2","f3,a","f4"
            ^^^

which throws this routine into a snit, because split also breaks the line at the comma inside the quoted field. While I keep looking elsewhere, I'm asking here: can anyone post a routine that handles this?
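To see exactly where it goes wrong, here is a minimal sketch that runs the same split on both sample lines. The sub name naive_split is just an illustrative wrapper, not part of the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The approach from the question: split the line on every comma.
sub naive_split {
    my ($line) = @_;
    return split /,/, $line;
}

my @simple   = naive_split('"f1","f2","f3","f4"');
my @embedded = naive_split('"f1","f2","f3,a","f4"');

# Print the field count followed by the fields themselves.
print scalar(@simple),   ": ", join(" | ", @simple),   "\n";
print scalar(@embedded), ": ", join(" | ", @embedded), "\n";
```

The first line comes back as four fields that still carry their surrounding quotes (4: "f1" | "f2" | "f3" | "f4"); the second comes back as five, because split treats the comma inside "f3,a" like any other delimiter (5: "f1" | "f2" | "f3 | a" | "f4").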

Thanks,

miker
 
This is one of those tasks where you always seem to be one step away from writing your own solution: it looks so simple that common sense says it can't be hard to grow your own.

If you are in any doubt about the wisdom of Isnid's advice, have a look at the source of Text::CSV to see how hard it is to get right.
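For comparison, here is a minimal sketch of letting a module do the quote handling instead of a home-grown regex. It swaps in Text::ParseWords, which ships with Perl, rather than Text::CSV, which is a separate CPAN install. The helper name parse_fields is hypothetical, and the samples assume embedded quotes are backslash-escaped, as elsewhere in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: parse_line(DELIMITER, KEEP, LINE) with KEEP = 0
# strips the enclosing quotes and unescapes backslash escapes such as \".
sub parse_fields {
    my ($line) = @_;
    chomp $line;    # otherwise a trailing newline ends up in the last field
    return parse_line(',', 0, $line);
}

my @samples = (
    '"f1","f2","f3","f4"',
    '"f1","f2","f3,a","f4"',
    '"f1","f2,a","f3\"b"',    # single quotes keep the backslash in \"
);

for my $line (@samples) {
    print join(" | ", parse_fields($line)), "\n";
}
```

This prints f1 | f2 | f3 | f4, then f1 | f2 | f3,a | f4, then f1 | f2,a | f3"b. It is not a full CSV parser, though: parse_line glues adjacent quoted pieces together the way a shell does, so a field written with the usual CSV doubled quote ("f3""b") comes back as f3b. That convention is one of the details a dedicated CSV module such as Text::CSV is there to handle.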

Yours,

fish

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
--Maur
 
The Text::CSV module did the trick.
Thanks
 
Code:
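#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
  # Capture the text between each pair of double quotes, so a comma
  # inside a quoted field stays part of that field.
  # Caveat: [^"]+ needs at least one character, so an empty field ("")
  # is not handled correctly.
  my @line = m/"([^"]+)"/g;
  print join(" | ", @line), "\n";
}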

__DATA__
"f1","f2","f3","f4"
"f1","f2","f3,a","f4"

Kind Regards
Duncan
 
Dunc, that's absolutely fine until you get an embedded quote, such as the third field in this line:

"f1","f2,a","f3\"b"
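A quick way to reproduce the problem: the sketch below wraps the regex from Duncan's post in a hypothetical helper, regex_fields, and feeds it that line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the regex from Duncan's post:
# capture whatever sits between each pair of double quotes.
sub regex_fields {
    my ($line) = @_;
    return $line =~ m/"([^"]+)"/g;
}

my @fields = regex_fields('"f1","f2,a","f3\"b"');
print scalar(@fields), ": ", join(" | ", @fields), "\n";
```

It returns three fields, f1, f2,a and f3\ (with a stray backslash): the regex stops at the escaped quote, and the b after it is silently lost.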

There's always a "solve the problem presented by this data" fix, but there's always going to be new data. If you want your app to run forever without problems, you need to solve them all in advance. With Text::CSV there's a fair chance that has already been done for us but, unless a home-grown solution matches the complexity of the existing module, it's always going to be a time bomb, waiting to annoy you when some new data comes in.

I like the way you've solved the problem as presented, but bear in mind that the problem presented is not always the best problem to solve. Although the OP is asking "how do I cope with embedded delimiters in CSVs?", we ought to be answering "how do I cope with arbitrary CSVs?".

Yours,

fish

 
Hi fish,

What about something like this?

Code:
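#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
  # Hide escaped quotes (\") behind a placeholder (\x00, assumed never to
  # appear in the data) so the regex below doesn't stop at them.
  s/\\"/\x00/g;
  my @line = m/"([^"]+)"/g;
  # Turn the placeholder back into a literal quote in each field.
  s/\x00/"/g foreach @line;
  print join(" | ", @line), "\n";
}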

__DATA__
"f1","f2","f3","f4"
"f1","f2","f3,a","f4"
"f1","f2,a","f3\"b"

outputs:-

Code:
f1 | f2 | f3 | f4
f1 | f2 | f3,a | f4
f1 | f2,a | f3"b

Kind Regards
Duncan
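Duncan's placeholder version copes with backslash-escaped quotes, but fish's point about new data still stands: the usual CSV convention (the one written down in RFC 4180) escapes an embedded quote by doubling it, so f2"b is stored as "f2""b". A minimal sketch that wraps the same approach in a hypothetical helper, placeholder_fields, with \x00 as an assumed placeholder character:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper following Duncan's second version: hide \" behind
# a placeholder, extract the quoted fields, then restore the quotes.
sub placeholder_fields {
    my ($line) = @_;
    $line =~ s/\\"/\x00/g;
    my @fields = $line =~ m/"([^"]+)"/g;
    s/\x00/"/g for @fields;
    return @fields;
}

# A doubled quote, as most CSV writers produce it: the second field is f2"b.
my @fields = placeholder_fields('"f1","f2""b"');
print scalar(@fields), " fields: ", join(" | ", @fields), "\n";
```

It reports three fields, f1, f2 and b, for a line that really holds two, which is exactly the sort of time bomb fish describes.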
 