Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

-- Need some help in generalizing the script --

Status
Not open for further replies.

r9ronaldo

Programmer
Jun 19, 2003
28
US
All,

I have an issue, Any input from your experience is highly regarded.

I have a .csv file ( in which some data inside the fields is also seperated by a comma ), I need to get a pipedelimited falt file,

So If I am doing a , It's substituting all comma "," with a delimter "|"
BEGIN{}
{
if(NR != 1)
{
gsub(",","|")
}
}
END {}

So on a UNIX box when you see the .csv file, I need to replace only the comma (",") between the fields, not the one inside the double quotes.

Name, Age, Location,Phone
RK, 29, "LAS VEGAS,Nevada",410-xxx-xxxxx^M
KB, 31, "PHeonix, Arizona",602-xxx-xxxx^M

As of Now I have designed an awk script which read this csv file and spits out what I need, the output looks like this,

Name| Age| Location|Phone
RK| 29| "LAS VEGAS,Nevada"|410-xxx-xxxxx^M
KB| 31| "PHeonix, Arizona"|602-xxx-xxxx^M

But this script has a limitation, in the case it works on only those csv files which can have ony one field that can have comma seperated data, In other words, if the csv file looks like this, It will not work

Name, Age, Location,Phone
"RK,Rivaldo", 29, "LAS VEGAS,Nevada",410-xxx-xxxxx^M
I am looking to generalize the script, nomatter what it should replace only those commas "," that seperate the fields, no the ones inside the double quotes

The script I am using now is

BEGIN{}
{
gsub(",","|")
gsub("^M","")
if(NR != 1){
value = index($0,"\"")
if(value > 0){
before_val = substr($0,1,value)
after_val = substr($0,value+1)
value2 = index(after_val,"\"")
subs_val = substr(after_val,1,value2)
subs_val_2 = substr(after_val,value2+1)
gsub(/\|/,",",subs_val)
#sub(/\|/,",",after_val)
total_val = before_val subs_val subs_val_2
gsub(/\"/," ",total_val)
print total_val > output_file
}
else {
gsub(/\"/," ")
print $0>>output_file
}
}
}
END{}



Any input on this Highly apprecitaed,

Thank you for all your time

R9ronaldo







 
Try this
Code:
#!/usr/bin/awk -f
BEGIN {
  # any char which does not appear in your csv files
  # you can make it printable (eg "!") for testing
  protector = "\1"
}

{
  line = $0
  print line    # just for testing, delete for use

  # this guards all commas within quotes
  # find quoted strings containing commas
  while ( match(line,/\"[^\",]*,[^\",]*\"/) ) {
    a = substr(line,0,RSTART-1)
    b = substr(line,RSTART,RLENGTH)
    c = substr(line,RSTART+RLENGTH)
    sub(",",protector,b) # 'hide' the comma for now...
    line = a b c
  }

  # replace all the other commas
  gsub(",","|",line)

  # restore the commas between quotes
  gsub(protector,",",line)
  
  # print the new line
  print line
}

I get these (before and after) results
Code:
RK, 29, "LAS VEGAS,Nevada",410-xxx-xxxxx
RK| 29| "LAS VEGAS,Nevada"|410-xxx-xxxxx
KB, 31, "PHeonix, Arizona",602-xxx-xxxx
KB| 31| "PHeonix, Arizona"|602-xxx-xxxx
"RK,Rivaldo", 29, "LAS VEGAS,Nevada",410-xxx-xxxxx
"RK,Rivaldo"| 29| "LAS VEGAS,Nevada"|410-xxx-xxxxx

--
 
Or perhaps
{
iqot = 0
a = ""
for (i=1;i<=length;i++) {
c = substr($0,i,1)
if (c=="\"") iqot = 1-iqot
else if (c=="," && !iqot) c="|"
a = a c
}
print a
}

CaKiwi
 
Thanks a Lot Salem && Cakiwi,

I tried both of your approaches, Initially Both seem to work, I was just trying test it out completely, Please don't think other wise, However when I tried to work on another test csv file, I got wrong set of results for Salem's Code,

Name,Age,Location,Phone^M
Name|Age|Location|Phone^M
"RK,Rivaldo", 29, "LAS VEGAS,Nevada",410-xxx-xxxxx^M
"RK,Rivaldo"| 29| "LAS VEGAS,Nevada"|410-xxx-xxxxx^M
"RK,B,ATK,WUYTH","29,32,78","MExico,Braxil,Gautemala","222-222-2222,234-897-6799"^M
"RK|B|ATK|WUYTH","29|32|78","MExico|Braxil|Gautemala","222-222-2222,234-897-6799"^M
"UU,YY.KKa,UU,IIDF,OPGHSFSTS,TUBAGAJAJAJAJAJA","23,22,21,25,26,27","USA,CANDAA,MEXICO,UTOPIA,UGANDS","222-222-2222,675-098-7866,987-098-7654,098-908-7676"^M
"UU|YY.KKa|UU|IIDF|OPGHSFSTS|TUBAGAJAJAJAJAJA","23|22|21|25|26|27","USA|CANDAA|MEXICO|UTOPIA|UGANDS","222-222-2222|675-098-7866|987-098-7654|098-908-7676"^M

The first two sets seem right, but the next two doen't seem right.

I once again would like to thank you for sharing your knowledge,

Thank you

R9ronaldo

Forza Inter Forza Nerazzuri
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top