Remove commas when inside quotes

Status
Not open for further replies.

adcm

MIS
Oct 21, 1998
US
I need to remove the commas that fall between quotes. The following
comma-separated file

"ADSCB","","","Sean C. Berry","Shyster, Shylock, Shlomb & Shyew","New York,
New York","","","","","267-408-7883","267-408-1566","233-478-2000","233 478
2466",26,6,0,1,0,1,"10/19/1998","08:19"

should look like this:

"ADSCB","","","Sean C. Berry","Shyster Shylock Shlomb & Shyew","New York New
York","","","","","267-408-7883","267-408-1566","233-478-2000","233 478
2466",26,6,0,1,0,1,"10/19/1998","08:19"

I know I should be using sed, but I can't seem to get it going.

Carlo Mauro
cmauro@wlrk.com
WLRK
 
Try the following; it will work in the Korn and C shells. Create a file, say comma_remove.csh, and add this line of text:

cat temp1 | sed -e "s/,/./g" > temp2

where temp1 is the input filename and temp2 the output filename. Don't forget to make the file executable (for example, chmod 755 comma_remove.csh). Then run ./comma_remove.csh from the command line and the commas will be replaced by full stops. It does work, as I have just tried it out. To replace the commas with something else, change the /./ to whatever you want: use / / for spaces, or // for nothing.
 
I don't wish to remove all commas, just the ones that fall between quotes.

That is, "New York, NY" should be interpreted as only one field.

The really hard part is that not all fields are enclosed in quotes!
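
To pin down the requirement above, here is a minimal Python sketch (not from the thread) that walks each line character by character and tracks whether it is inside a quoted field. The function name `strip_quoted_commas` and the shortened sample record are made up for illustration, and the sketch assumes a record never spans more than one line.

```python
def strip_quoted_commas(line: str) -> str:
    """Drop commas that fall between double quotes; keep separator commas."""
    out = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # entering or leaving a quoted field
            out.append(ch)
        elif ch == ',' and in_quotes:
            continue  # comma inside a quoted field: drop it
        else:
            out.append(ch)  # everything else, including separator commas
    return ''.join(out)


# Hypothetical shortened record modeled on the sample in the question.
sample = '"ADSCB","","Shyster, Shylock, Shlomb & Shyew","New York, New York",26,6,0,"08:19"'
print(strip_quoted_commas(sample))
```

Because the state only flips on a quote character, unquoted fields such as 26,6,0 pass through untouched, which is the case the question calls the hard part.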
 
sed 's/\"\(.*\),\(.*\)\"/"\1 \2"/' /tmp/in_file > /tmp/out_file

The above will work only if there is one comma within the quotes and only
one quoted field on the line, because .* is greedy and matches from the
first quote to the last. You can handle more commas by extending the pattern.
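
For comparison, a hedged Python sketch of the same regex idea that does not depend on the number of commas: it matches one quoted field at a time with `"[^"]*"` rather than a greedy `.*`, and deletes the commas inside each match. The names `QUOTED_FIELD` and `remove_commas_in_quotes` are illustrative, and the sketch assumes fields contain no embedded double quotes.

```python
import re

# One quoted field: an opening quote, any non-quote characters, a closing quote.
QUOTED_FIELD = re.compile(r'"[^"]*"')


def remove_commas_in_quotes(line: str) -> str:
    """Delete every comma found inside a double-quoted field."""
    return QUOTED_FIELD.sub(lambda m: m.group(0).replace(',', ''), line)


print(remove_commas_in_quotes('"a, b, c","x",1,2'))
```

Matches are found left to right and never overlap, so the `","` between two fields is not mistaken for a field of its own.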
 
Here is a solution for 'sed'. You will need
to create a sed-script file, "clean.sed", which contains:

#n
# add a trailing , to make parsing easier
/[^,]$/s/$/,/

# unless /..."aaa",/ convert /...aaa,/ to /...aaa#/
# repeat until done

:z
s/^([^",]*),/\1#/
tz

# convert /..."aaa,bbb",/ to /..."aaabbb",/
# repeat until done

:a
s/^([^",]*)("[^",]*),([^"]*"),/\1\2\3,/
ta

# convert /..."aaa",/ to /...@aaa@#/

s/^([^",]*)"([^"]*)",/\1@\2@#/

# repeat until no more /,/ or /"/ remain
/[,"]/tz

# get rid of trailing # [ was added at the start ]
s/.$//

# convert @ back to ", and # back to ,
s/@/"/g
s/#/,/g

p

You will have to invoke sed as follows:

sed -E -f clean.sed sourcefile > destfile

I do wonder about the removal of commas within a quoted field. A (name) field with a value like "Mouse, Mickey" would be converted to "Mouse Mickey", which might not be what was intended. If the purpose of removing the commas within quoted fields is to allow one to parse the file, I would recommend you look at the Perl Cookbook for how to parse "Comma Separated (CSV) Files". It shows how to split a line into fields on the 'correct' commas.

-- Derek
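
As an illustration of splitting on the 'correct' commas without altering the data, here is a small Python sketch using the standard library's csv module in place of the Perl Cookbook recipe (a swapped-in technique, not the one the post refers to). The helper name `split_record` is made up for this example.

```python
import csv


def split_record(line: str) -> list:
    """Split one CSV record into fields, honoring double-quoted fields."""
    # csv.reader accepts any iterable of lines; wrap the single line in a list.
    return next(csv.reader([line]))


fields = split_record('"Mouse, Mickey","New York, NY",26,6,"08:19"')
for index, value in enumerate(fields):
    print(index, value)
```

The commas inside "Mouse, Mickey" survive as part of the field value, so nothing has to be removed before parsing. All values come back as strings, including the unquoted numbers.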
 
Forgot to include this in my previous reply...

The sed script converts , to # and " to @, working from left to right. # and @ need to be characters that do not appear elsewhere in the file. If I were using this for 'real data', I would probably use <CTRL-C> and <CTRL-Q> instead of # and @.

-- D
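
One way to check that requirement before running a placeholder-based script is to scan the data for control characters that are not in use. The sketch below is only an illustration; the helper `pick_placeholders` and its choice of candidate characters (ASCII 1 to 31, skipping tab, newline, and carriage return) are assumptions, not part of the sed solution.

```python
from typing import List


def pick_placeholders(text: str, count: int = 2) -> List[str]:
    """Return `count` control characters that do not occur in `text`."""
    candidates = [chr(c) for c in range(1, 32) if chr(c) not in '\t\n\r']
    unused = [c for c in candidates if c not in text]
    if len(unused) < count:
        raise ValueError('not enough unused control characters')
    return unused[:count]


comma_mark, quote_mark = pick_placeholders('"New York, NY",26,6')
print(repr(comma_mark), repr(quote_mark))
```

If the data already contains a candidate, it is skipped, so the chosen pair can stand in for # and @ without colliding with real content.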
 
Here is a short nawk/gawk program that will do what you want for the sample data. It replaces every comma followed by a space with a single space, which removes the commas inside the double quotes as long as separator commas are never followed by a space:

Usage: thisfile input output

Where: thisfile = this program saved in a file and made executable ( chmod +x thisfile )
       input    = input file
       output   = output file

nawk 'BEGIN{FS=""}

{
    gsub(/, /, " ")
    print

}' "$1" > "$2"

flogrr
 
This does the trick:

cat temp1 | sed -e "/\"[A-Z].*\,.*\"s/,/ /g"

Ged Jones
gedejones@hotmail.com
 
Whoops, sorry, forget the last message; it was a load of rubbish.
This will work. First create a command file temp.sed with the following instructions:

s/\",\"/XXX/g;
:a
s/,/ /g
ta
s/XXX/\",\"/g;

This converts "," to XXX (you can use whatever string you like), then replaces any remaining commas with spaces, then converts the substitute string back to ",". It assumes every field is quoted; commas next to unquoted fields, such as the 26,6,0 run in the sample, are replaced as well.
The command to run this is:

sed -f temp.sed infile > outfile

Ged Jones
gedejones@hotmail.com
 