awk fields 3

Status
Not open for further replies.

mrn

MIS
Apr 27, 2001
3,993
GB
Hi,

I have a text file in the following format:

field1,field2,"fie,ld,3",field4,"fie,ld,5"

I need to grab numerous fields from the file, but I am struggling to find a way to protect the field separator when it appears inside quotes.

E.g.

awk -F, '{print $3}'

I need "fie,ld,3"

and not

fie

Any ideas?

Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 
Oh, and the fields are variable length.

Mike
 
Hi

Personally, I think this is beyond AWK's scope; I would use a Perl or Ruby script instead and take advantage of one of their CSV modules.
Code:
master # ruby -rcsv -ne 'puts CSV.parse($_)[0][2]' < mrn-sample.csv
fie,ld,3

Feherke.
 
A starting point:
sed 's!"\([^,"]*\),\([^,"]*\),\([^"]*\)"!"\1;\2;\3"!g' /path/to/input | awk -F, '{x=$3;gsub(/;/,",",x);print x}'

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
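
The sed command above handles quoted fields with a fixed number of commas. A minimal sketch of the same placeholder idea in plain awk, for any number of commas inside quotes, might look like the following. The function name csv_field is hypothetical, SUBSEP is borrowed as the placeholder on the assumption that it never occurs in the data, and quoted fields that span several lines are not handled.

```shell
# Hypothetical helper: print field number $1 of each CSV line read from stdin.
csv_field() {
    awk -v want="$1" '
    {
        line = ""
        inq = 0
        # Walk the line one character at a time, tracking quote state.
        for (k = 1; k <= length($0); k++) {
            c = substr($0, k, 1)
            if (c == "\"") inq = !inq
            # Hide commas that sit inside quotes behind a placeholder.
            if (c == "," && inq) c = SUBSEP
            line = line c
        }
        n = split(line, f, ",")
        out = (want <= n) ? f[want] : ""
        gsub(SUBSEP, ",", out)   # restore the protected commas
        print out
    }'
}

# prints "fie,ld,3" (quotes included)
printf '%s\n' 'field1,field2,"fie,ld,3",field4,"fie,ld,5"' | csv_field 3
```

Called as `csv_field 3 < inputfile`, this should print the third field of every line with its quotes intact.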
 
Interesting problem... here are two methods I came up with, basically the same algorithm implemented two different ways:


Code:
awk -F'"' '
        {
                print NR,$0
                # Splitting on double quotes: odd-numbered fields lie outside
                # quotes, even-numbered fields are the quoted strings.
                if ($0 ~ /^"/) { i=2 } else { i=1 }
                if ($0 ~ /"$/) NF--
                for(;i<=NF;i++) {
                        if (i%2) {
                                sub(",$","",$i)
                                sub("^,","",$i)
                                n=split($i,a,/,/)
                                for (j=1;j<=n;j++) print a[j]
                        } else {
                                print $i
                        }
                }
        }
' inputfile


Code:
awk '
        {
                print NR,$0
                remain=$0
                # Peel off one quoted string at a time.
                while (match(remain,/\"[^"]*\"/)) {
                        if (RSTART>1) {
                                # Plain fields that precede the quoted string.
                                pre=substr(remain,1,RSTART-1)
                                sub("^,","",pre)
                                sub(",$","",pre)
                                n=split(pre,a,/,/)
                                for (i=1;i<=n;i++) { print a[i] }
                        }
                        print substr(remain,RSTART,RLENGTH)
                        remain=substr(remain,RSTART+RLENGTH)
                }
                # Whatever is left after the last quoted string.
                if (remain) {
                        sub("^,","",remain)
                        n=split(remain,a,/,/)
                        for (i=1;i<=n;i++) { print a[i] }
                }
        }
' inputfile

Currently they just print out the individual fields, but you could of course just assign them to an array instead for later processing. I'd be curious to see shorter, more efficient versions if anyone has ideas!

Annihilannic.
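
A rough sketch of the "assign them to an array" idea, built on the match() loop above, might look like this. The names csv_fields and splitcsv are made up for illustration, and the code assumes quoted fields contain no escaped quotes and never span more than one line.

```shell
# Hypothetical helper: list every field of each CSV line as "line.field: value".
csv_fields() {
    awk '
    # splitcsv() is an illustrative name; it fills arr[1..n] and returns n.
    function splitcsv(s, arr,    n, m, k, pre, parts, rest) {
        n = 0
        while (match(s, /"[^"]*"/)) {
            if (RSTART > 1) {
                # Plain fields ahead of the quoted one, minus the joining comma.
                pre = substr(s, 1, RSTART - 2)
                m = split(pre, parts, ",")
                if (m == 0) arr[++n] = ""
                for (k = 1; k <= m; k++) arr[++n] = parts[k]
            }
            arr[++n] = substr(s, RSTART, RLENGTH)
            rest = substr(s, RSTART + RLENGTH)
            if (rest == "") return n
            s = substr(rest, 2)      # skip the comma after the closing quote
        }
        # Trailing plain fields after the last quoted one.
        m = split(s, parts, ",")
        if (m == 0) arr[++n] = ""
        for (k = 1; k <= m; k++) arr[++n] = parts[k]
        return n
    }
    {
        n = splitcsv($0, f)
        for (k = 1; k <= n; k++) print NR "." k ": " f[k]
    }'
}

printf '%s\n' 'field1,field2,"fie,ld,3",field4,"fie,ld,5"' | csv_fields
```

Once the fields sit in f[], anything that would normally use $3 can use f[3] instead.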
 
What about this?
Code:
awk -F, '{
for(i=j=1;i<=NF;++j){
  a[j]=$(i++)
  if(a[j]~/^".*[^"]$/)
    do a[j]=a[j]","$i
    while($(i++)!~/"$/)
}
for(i=1;i<j;++i)print i,a[i]
}' inputfile

Hope This Helps, PH.
 
Thanks for the star from the Frozen Depths of Hell !
 
Thanks People,

I'll spend some time dissecting the examples. Very helpful (as usual).


Mike
 