
preserving quoted strings

Status
Not open for further replies.

komodo168

Technical User
Feb 16, 2005
US
I have a file with comma-separated fields, but some fields contain embedded commas, e.g.:
Jim,"20 main street",555-1313,60
Tom,"10 washington, Madison, NY-100010",555-1212,26

I want to examine the contents of each field and act on them, but I don't want awk to split on the embedded commas.

cat file | awk -F"," '{print $2}'

I get
"20 main street"
"10 washington

What I really want is this (i.e. with the embedded commas ignored):
"20 main street"
"10 washington, Madison, NY-100010"

Thanks
 
You may try this:
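awk -F"," '
# If field 2 starts with a quote, append the following fields (restoring
# the commas) until it also ends with a quote. The i<=NF test stops an
# endless loop when the closing quote is missing.
$2~/^"/{i=3;while($2!~/"$/ && i<=NF)$2=$2","$(i++)}
{print $2}
' file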

Hope This Helps, PH.
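The snippet above rebuilds only $2. The same idea can be applied to every field: split on commas, then keep gluing the next piece back on (with its comma) while the field so far contains an odd number of double quotes, since an odd count means its closing quote lies in a later piece. Doubled quotes ("") add two to the count, so they do not upset the test. This is a minimal sketch assuming one record per line; the shell function name `split_csv_fields` is only for illustration. The quotes are kept, as in the output the question asks for.

```shell
# Hypothetical helper: print each record's fields as <field>, gluing back
# pieces that were split on commas inside double quotes.
# Assumes each record is on a single line.
split_csv_fields() {
    awk -F"," '
    {
        n = 0
        for (i = 1; i <= NF; i++) {
            f = $i
            # gsub() returns how many quotes it replaced (with themselves);
            # an odd count means the quoted field continues after this comma
            while (gsub(/"/, "\"", f) % 2 && i < NF)
                f = f "," $(++i)
            out[++n] = f
        }
        sep = ""
        for (j = 1; j <= n; j++) {
            printf "%s<%s>", sep, out[j]
            sep = " "
        }
        print ""
    }' "$@"
}

split_csv_fields <<'EOF'
Jim,"20 main street",555-1313,60
Tom,"10 washington, Madison, NY-100010",555-1212,26
EOF
```

With the two sample lines from the question this prints:

<Jim> <"20 main street"> <555-1313> <60>
<Tom> <"10 washington, Madison, NY-100010"> <555-1212> <26>

An unterminated quote simply swallows the rest of the line instead of looping forever.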
 
You really can't do this in the most general case, but...
here are a couple of useful links to threads on a similar subject:

Code:
http://www.google.com/groups?selm=8bpou1$6ae$1@nnrp1.deja.com
http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&safe=off&th=b4130c62957f01bd&seekm=bc4va0$p7r$1@newsg4.svr.pol.co.uk#link20
http://groups.google.com/groups?q=CSV+group:comp.lang.awk&hl=en&lr=&ie=UTF-8&selm=7iatj8$11l$1@nnrp03.primenet.com&rnum=2

Hope you'll find it useful.

P.S. Sorry, the local version of TGML does not like loooong URLs (within the [link] directive).


vlad
 
Looks like a CSV file. Here's a complete CSV parser I wrote in awk recently:
Code:
BEGIN \
{
  while ( get_rec( rec ) )
  {
    # print CSV_STR
    printf "["
    sep = ""
    for (i=1;i in rec; i++)
    { printf "%s<%s>", sep, rec[i]
      sep = ", "
    }
    print "]" 
  }
}

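# Split one CSV record (str) into array[1], array[2], ...
function parse_csv( str, array,    field,i )
{ split( "", array )          # empty the output array
  str = str ","               # terminate the last field with a comma
  # Each field: optional blanks, then either a quoted string (with ""
  # escapes) or a run of non-commas, then optional blanks and a comma.
  while ( match(str,
    /[ \t]*("[^"]*(""[^"]*)*"|[^,]*)[ \t]*,/) )
  { field = substr( str, 1, RLENGTH )
    gsub( /^[ \t]*"?|"?[ \t]*,$/, "", field )  # strip blanks, quotes, comma
    gsub( /""/, "\"", field )                  # un-double embedded quotes
    array[++i] = field
    str = substr( str, RLENGTH + 1 )           # drop the consumed field
  }
}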

# Handles records that contain linefeeds.
function get_rec( rec, file   , result,line,str)
{ do
  { if (file)
      result = getline line <file
    else
      result = getline line
    if ( result < 1 )
    { if ( length(str) )
      { print "The csv file is malformed." >"/dev/stderr"
        exit 1
      }
      else
        return 0
    }
    str = str line "\n"
  } # Loop until the number of quotes is even (gsub() returns the count).
  while ( gsub( /"/, "\"", str ) % 2 )
  CSV_STR = substr( str, 1, length(str) - 1)
  parse_csv( CSV_STR, rec )
  return 1
}
With this input

John,Doe,120 jefferson st.,Riverside, NJ , 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside,NJ ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298,
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123

the output is

[<John>, <Doe>, <120 jefferson st.>, <Riverside>, <NJ>, <08075>]
[<Jack>, <McGinnis>, <220 hobo Av.>, <Phila>, <PA>, <09119>]
[<John "Da Man">, <Repici>, <120 Jefferson St.>, <Riverside>, <NJ>, <08075>]
[<Stephen>, <Tyler>, <7452 Terrace "At the Plaza" road>, <SomeTown>, <SD>, <91234>]
[<>, <Blankman>, <>, <SomeTown>, <SD>, <00298>, <>]
[<Joan "the bone", Anne>, <Jet>, <9th, at Terrace plc>, <Desert City>, <CO>, <00123>]
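To tie the parser back to the original question, here is a small sketch that reuses `parse_csv()` unchanged in an ordinary per-line rule and prints one chosen field. The wrapper name `csv_field` and its field-number argument are made up for illustration. Because it bypasses `get_rec()`, it assumes no field contains a newline. Note that `parse_csv()` strips the surrounding quotes and un-doubles `""`, so the output is the field's value rather than the quoted form shown in the question.

```shell
# Hypothetical wrapper: csv_field N [file...] prints field N of each line.
# Reuses parse_csv() from the parser above; one physical line = one record.
csv_field() {
    col=$1
    shift
    awk -v n="$col" '
    function parse_csv( str, array,    field,i )
    { split( "", array )
      str = str ","
      while ( match(str,
        /[ \t]*("[^"]*(""[^"]*)*"|[^,]*)[ \t]*,/) )
      { field = substr( str, 1, RLENGTH )
        gsub( /^[ \t]*"?|"?[ \t]*,$/, "", field )
        gsub( /""/, "\"", field )
        array[++i] = field
        str = substr( str, RLENGTH + 1 )
      }
    }
    # Parse the current line into rec[] and print the requested field
    { parse_csv( $0, rec ); print rec[n] }
    ' "$@"
}

csv_field 2 <<'EOF'
Jim,"20 main street",555-1313,60
Tom,"10 washington, Madison, NY-100010",555-1212,26
EOF
```

For the question's sample this prints `20 main street` and `10 washington, Madison, NY-100010`, one per line.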
 
This works with your example.
Code:
gawk '{match($0, /("[^"]*")/, a);print a[1];}' file
 
I should note that the code above is actually gawk. (Accidentally lopped off first char when cut/pasting.)

Checking the man page, I see the optional 3rd param to match is a GNU extension not supported by all awks. If your awk doesn't support this:
Code:
awk '{
    # match() sets RSTART and RLENGTH; with no match this prints an empty line
    match($0, /"[^"]*"/)
    print substr($0, RSTART, RLENGTH)
}' file
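Both versions above stop at the first quoted string on the line. If every quoted string is wanted, one portable way is to call `match()` repeatedly, each time on the part of the line after the previous hit. A minimal sketch follows; the function name `all_quoted` is hypothetical, and like the snippets above it does not understand doubled ("") quotes.

```shell
# Hypothetical helper: print every double-quoted string on each input line,
# one per output line, using only POSIX match() and substr().
all_quoted() {
    awk '{
        rest = $0
        # After each hit, keep scanning the text that follows it
        while (match(rest, /"[^"]*"/)) {
            print substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
    }' "$@"
}

printf '%s\n' 'Tom,"10 washington, Madison, NY-100010",555-1212,26' | all_quoted
```

Lines without any quoted string produce no output, rather than an empty line.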
 
I'm afraid that Mike's snippet won't handle

"John ""Da Man""",Repici,120 Jefferson St.,Riverside,NJ ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123

which is a perfectly standard csv file.

Quotes are allowed within fields; they simply have to be doubled.
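The doubling rule works in both directions. As a sketch of the writing side, the hypothetical `csv_quote()` below doubles every `"` in a value and wraps the value in quotes, but only when it contains a comma or a quote. Reading tab-separated values through the made-up wrapper `to_csv` is just an assumption for the example.

```shell
# Hypothetical converter: tab-separated input lines -> CSV output lines.
to_csv() {
    awk '
    BEGIN { FS = "\t" }
    # Quote a value only if it contains a comma or a double quote
    function csv_quote(s) {
        if (s ~ /[",]/) {
            gsub(/"/, "\"\"", s)   # every " inside the value becomes ""
            s = "\"" s "\""        # then wrap the whole value in quotes
        }
        return s
    }
    {
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            out = out sep csv_quote($i)
            sep = ","
        }
        print out
    }' "$@"
}

printf 'Joan "the bone", Anne\tJet\n9th, at Terrace plc\tDesert City\n' | to_csv
```

This prints `"Joan ""the bone"", Anne",Jet` and `"9th, at Terrace plc",Desert City`, the same quoting used in the sample input above, so lines written this way should read back correctly with the parser earlier in the thread (as long as no value contains a newline or tab).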
 
That's true, it won't. I only claimed it would work with the OP's example. I never said it was a general solution for CSV files.

 
Mike, I knew that you were aware of its limitations. I wanted to make sure that the o.p. did. It's puzzling, though, that you posted after seeing my parser. Have you, God forbid, become another phv?
 
> Mike, I knew that you were aware of its limitations. I wanted to make sure that the o.p. did.
OK.
> It's puzzling, though, that you posted after seeing my parser.
1. I didn't see your parser before I posted. If I had, I might still have posted anyway, because
2. If the OP's data was really as simple as that posted in the example, maybe he'd have been happy with a shorter, simpler solution.
> Have you, God forbid, become another phv?
Sorry, I don't understand that.


 
I'm referring to a very prolific and sloppy poster who sometimes puts up a crude, incomplete, quick and dirty solution after others have already provided superior ones.
 