Simple split?

Status
Not open for further replies.

madasafish

I have a file similar to this...

field1:wxyz field2:"aa bb ccc" field3:xxxx ...etc
field1:zyx field2:"aa bbb cc dd" field3:yyy ...etc
field1:pq field2:"aaa b" field3:zz ...etc

I want to display

field2:"aa bb ccc"
field2:"aa bbb cc dd"
field2:"aaa b"

I have tried a lot of permutations of the split command but cannot get the syntax right to give me the desired output.

Any help appreciated.
Thanks in advance,
Madasafish

 
Hi

A better solution, but it requires GNU Awk 4.0.0 or newer:
Code:
awk '{patsplit($0,a,/field[[:digit:]]+:/,s);print a[2]s[2]}' /input/file
By changing the indexes of the a and s arrays in the print statement, you can access the other fields too.
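
For readers whose awk lacks patsplit, the same idea can be approximated with a match() loop. This is only a sketch under assumptions: the function name nth_pair is made up for illustration, [0-9] stands in for [[:digit:]] so that older mawk builds accept it, and the trailing blank is trimmed, which the one-liner above does not do.

```shell
# Hypothetical helper: print the N-th "fieldK:value" pair of each input line.
# Portable awk only (match, RSTART, RLENGTH); no patsplit needed.
nth_pair() {
    # $1 = which pair to print (1-based); lines are read from stdin
    awk -v want="$1" '{
        rest = $0; n = 0; key = ""; out = ""
        while (match(rest, /field[0-9]+:/)) {
            # text before this match is the value of the previous key
            if (n == want) out = key substr(rest, 1, RSTART - 1)
            n++
            key  = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (n == want) out = key rest      # value of the last key on the line
        sub(/ +$/, "", out)                # trim trailing blanks
        print out
    }'
}

printf '%s\n' 'field1:wxyz field2:"aa bb ccc" field3:xxxx' | nth_pair 2
```

Changing the argument (nth_pair 1, nth_pair 3) selects the other pairs, much like changing the array indexes in the patsplit version.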

Feherke.
 
A legacy awk way:
Code:
awk '{sub(/^[^ ]* /,"");sub(/"[^"]*$/,"\"")}1' /path/to/input
A simple sed way:
Code:
sed 's!^[^ ]* \([^"]*"[^"]*"\).*!\1!' /path/to/input

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Thank you feherke and PHV,

I oversimplified the input lines in my earlier post. I should have made it clearer.

A closer representation of the input lines is as follows...

TIME:01:07:2004-10:00:00 TITLE:slota0~rev1 RGC: FP:blah
TIME:14:06:2004-09:00:01 TITLE:"XYZ Test Bridge_dynamic~rev8" BROADG:XYZ FP:blah
TIME:14:06:2004-09:00:01 TITLE:"Test Bridge_dynamic~rev8" BROADG:XYZ FP:blah
TIME:01:07:2004-10:00:02 TITLE:slota2~rev1 RGC:
where "blah" is a variable-length text string.

The closest I can get to the correct result is:

gawk -F":" '{print $7}'

The question is: is there something like split($7,a," "); print a(*-1), i.e. a way to print everything except the last field?

Thank you again,

Madasafish
 
Hi

As you use gawk, your idea of printing all but the last piece of the 7th field can be continued like this:
Code:
awk -F":" 'NF{FS=" ";$0=$7;NF--;print;FS=":"}' /input/file
Tested with gawk and mawk.
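
An alternative sketch that avoids changing FS and NF: copy the 7th colon-separated field and strip its last blank-separated word with sub(). The function name title_value is hypothetical. It assumes, as in the sample lines, that the title contains no colon and that another NAME: follows it; if nothing follows the title, the value is printed unchanged.

```shell
# Hypothetical helper: print the 7th colon-separated field minus its last word.
title_value() {
    awk -F: 'NF {
        v = $7
        sub(/ +[^ ]*$/, "", v)   # drop the final blank-separated word (the next name)
        print v
    }'
}

printf '%s\n' 'TIME:14:06:2004-09:00:01 TITLE:"XYZ Test Bridge_dynamic~rev8" BROADG:XYZ FP:blah' | title_value
```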

Feherke.
 
Thanks Feherke,

All your solutions worked on my very basic example of the file. The file I am using consists of "name-value pairs" (NVP), with a colon ":" as the separator and capital letters before the colon. An NVP can sit in totally different places on the line; in my case, anywhere from (NVP) field 4 to 8.

Your suggestion of "patsplit" works brilliantly.

For the benefit of other readers, the code that finally worked for me on my original file after some time playing around was...

gawk '{
    patsplit($0,a,/[[:upper:]]+:/,s)
    {
        for (i=4; i <= 8; i++)
            if (a[i] == "DURATION:")
                print a[i]s[i]
    }
}' schedule.ini

Please note "DURATION:" text was not supplied on the earlier examples given above.

As an aside (belt and braces approach), if I wanted to search for the string "DURATION:" across the whole length of the line,

would it be
for (i=1; i <= NF; i++) ?
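
A note on that loop bound: NF counts whitespace-separated fields, which is not the same as the number of NAME: keys once quoted values contain blanks. With gawk, patsplit returns the number of elements it stored in the array, so n = patsplit($0,a,/[[:upper:]]+:/,s) followed by for (i=1; i <= n; i++) is the tighter bound. The sketch below only illustrates the difference between the two counts in portable awk; the function name count_fields is made up, and [A-Z] stands in for [[:upper:]].

```shell
# Hypothetical helper: compare NF with the number of NAME: keys on each line.
count_fields() {
    awk '{
        tmp = $0
        keys = gsub(/[A-Z]+:/, "&", tmp)   # gsub returns how many matches it replaced
        print "NF=" NF, "keys=" keys
    }'
}

printf '%s\n' 'TIME:14:06:2004-09:00:01 TITLE:"XYZ Test Bridge_dynamic~rev8" BROADG:XYZ FP:blah' | count_fields
# prints: NF=6 keys=4
```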


Thanks again, Feherke,

Madasafish

 
Code:
gawk 'BEGIN {}

        {

        patsplit($0,a,/[[:upper:]]+:/,s)

        i=1
        
        while (i <= NF)

        {

        if (a[i] == "TIME:") {
                 
                startdate=s[i]
                split(s[i],d,":");split(d[3],t,"-")
                stime=t[1]" "d[2]" "d[1]" "t[2]" "d[4]" "d[5]
                }

        if (a[i] == "TITLE:") { title=s[i]  }           
        if (a[i] == "BROADG:") { broadg=s[i]  }         
        if (a[i] == "GMTYPE:") { gmtype=s[i]  }         
        if (a[i] == "RPX:") { rp=s[i]  }                
        if (a[i] == "DURATION:") { endsec=s[i] }
        if (a[i] == "CHAN:") { chan=s[i] }
        if (a[i] == "PROMPTKEY:") { promptkey=s[i] }
        if (a[i] == "GMNAME:") { gmname=s[i] }
        if (a[i] == "FLAGS:") { flags=s[i] }
        if (a[i] == "CONTENT:") { content=s[i] }
        if (a[i] == "GDF:") { gdf=s[i] }
        if (a[i] == "DYNAMICROOT:") { dynamicroot=s[i] }
        if (a[i] == "LINKTO:") { linkto=s[i] }
        if (a[i] == "PROMPTIMG:") { promptimg=s[i] }
        if (a[i] == "PROMPTPOS:") { promptpos=s[i] }
        if (a[i] == "GMMODE:") { gmmode=s[i] }
        if (a[i] == "XPLCOLLECT:") { xpl_collect=s[i] }
        if (a[i] == "EPIDESC:") { epidesc=s[i] }
        if (a[i] == "CLICKTHRUCOUNTER:") { clickthru=s[i] } 
        if (a[i] == "LINKTYPE:") { linktype=s[i] }
        if (a[i] == "NORT:") { no_rt=s[i] }
        if (a[i] == "ANNOUNCEDELAY:") { announcedelay=s[i] }
        if (a[i] == "AUTOTERMINATE:") { autoterminate=s[i] }
        if (a[i] == "DESCID:") { descid=s[i] }
        if (a[i] == "PROMPTDURATION:") { promptduration=s[i] }
        if (a[i] == "EVENTID:") { eventid=s[i] }
        if (a[i] == "STARTURL:") { starturl=s[i] }
        if (a[i] == "CID:") { cid=s[i] }
                

        startsec=mktime(stime)
        endtime=startsec+endsec
        enddate=strftime("%d:%m:%Y-%H:%M:%S",endtime)
        timenow=systime()

        if (timenow > endtime) status="Expired"; 
        else if (timenow < startsec) status="Scheduled"; 
        else status="Live"; 

        i++     

        }

print startdate","enddate","title","chan","broadg","gmtype","rp","promptkey","gmname","flags","dynamicroot","content ","gdf","linkto","promptimg","promptpos","gmmode","xpl_collect","epidesc","linktype","no_rt","announcedelay","autoterminate ","descid","promptduration","eventid","status","clickthru","cid","starturl","i



}' x > x.csv

Code:
 Source
$ cat x
TIME:18:05:2002-01:00:01 TITLE:"ABC Bridge~rev8" BROADG:ntl GMTYPE:scheduled RPX:auto DURATION:410227200 CHAN:"ABC ONE" PROMPTKEY:2048 GMNAME:"ABC Bridge" CONTENT:http://dcabwww.mh.abc.co.uk/etv/games/dummyInt248days.td GDF:http://dcabwww.mh.abc.co.uk/bridge/games/langley/bridgecomponent.gdf DYNAMICROOT:test.txt LINKTO:ETVS10 PROMPTIMG:mcast://des.is.dtv/abcshared8/img/pressred.gif PROMPTPOS:481|32 GMMODE:1 EPIDESC:rev8 CLICKTHRUCOUNTER:@!ABCONEBridge@ LINKTYPE:ETV1 NORT:true ANNOUNCEDELAY:5 AUTOTERMINATE:false DESCID:2FCAE256-B24A-4f81-81D1-1420BA633857 PROMPTDURATION:30000 EVENTID:18:05:2002-01:00:01-ntl-ABCONE
TIME:08:10:2010-15:01:00 TITLE:"IPV1 HTV PROMPT APP~DOI 2011" BROADG:ntl RPX:auto GMTYPE:scheduled CHAN:HTV DURATION:10171560 GMNAME:"IPV1 HTV PROMPT APP" FLAGS:H CONTENT:http://teleweb1.mywaytv.co.uk/ipv/doi/vote/1.00/config/empty.td GDF:http://teleweb1.mywaytv.co.uk/ipv/doi/vote/1.00/config/vote-slave.gdf PROMPTIMG:mcast://des.is.dtv/ipv1/etvprompt/1.00/etv-prompt.html PROMPTPOS:503|15|159|87 XPLCOLLECT:0 GMMODE:1 EPIDESC:"DOI 2011" NORT:TRUE CLICKTHRUCOUNTER:@!ipv1ism@ ANNOUNCEDELAY:30 DESCID:77C52865-FC1F-4f5e-B07A-3B8B2029DF19 PROMPTDURATION:99999999 EVENTID:08:10:2010-15:01:00-ntl-HTV

Code:
 Result
18:05:2002-01:00:01 ,18:05:2015-01:00:01,"ABC Bridge~rev8" ,"ABC ONE" ,ntl ,scheduled ,auto ,2048 ,"ABC Bridge" ,,test.txt ,http://dcabwww.mh.abc.co.uk/etv/games/dummyInt248days.td ,http://dcabwww.mh.abc.co.uk/bridge/games/langley/bridgecomponent.gdf ,ETVS10 ,mcast://des.is.dtv/abcshared8/img/pressred.gif ,481|32 ,1 ,,rev8 ,ETV1 ,true ,5 ,false ,2FCAE256-B24A-4f81-81D1-1420BA633857 ,30000 ,18:05:2002-01:00:01-ntl-ABCONE,Live,@!ABCONEBridge@ ,,,29
08:10:2010-15:01:00 ,03:02:2011-07:27:00,"IPV1 HTV PROMPT APP~DOI 2011" ,HTV ,ntl ,scheduled ,auto ,2048 ,"IPV1 HTV PROMPT APP" ,H ,test.txt ,http://teleweb1.mywaytv.co.uk/ipv/doi/vote/1.00/config/empty.td ,http://teleweb1.mywaytv.co.uk/ipv/doi/vote/1.00/config/vote-slave.gdf ,ETVS10 ,mcast://des.is.dtv/ipv1/etvprompt/1.00/etv-prompt.html ,503|15|159|87 ,1 ,0 ,"DOI 2011" ,ETV1 ,TRUE ,30 ,false ,77C52865-FC1F-4f5e-B07A-3B8B2029DF19 ,99999999 ,08:10:2010-15:01:00-ntl-HTV,Expired,@!ipv1ism@ ,,,31


The source text is only 2 lines out of hundreds of lines of Named Value Pairs (NVP). The code works except in one circumstance: line 2 of the source example has no PROMPTKEY NVP, yet the result for that line includes the PROMPTKEY value from the previous line. Can someone please advise on how to correct this behavior?


I have included "i" at the end to report how many fields are in the line. Why does it show more fields than are actually in the line?

As always,

Thanks in advance for any assistance with this.

Madasafish.
 
Hi

Just set it to an empty string at the beginning of the block:
Code:
promptkey=""
But I suppose the same could happen to the other fields too, so it is better to set them all to an empty string:
Code:
title=broadg=gmtype=rp=endsec=chan=promptkey=gmname=flags=content=gdf=\
dynamicroot=linkto=promptimg=promptpos=gmmode=xpl_collect=epidesc=clickthru=\
linktype=no_rt=announcedelay=autoterminate=descid=promptduration=eventid=\
starturl=cid=""
But as you use GNU Awk, you can reduce it to a single delete if you use an array. That way you can skip that huge block of assignments too:
Code:
BEGIN {
  OFS=","
}

{
  delete f

  patsplit($0,a,/[[:upper:]]+:/,s)

  for (i=1;i<=NF;i++) {
    f[tolower(gensub(/:$/,"","",a[i]))]=s[i]

    startdate=f["time"]
    split(startdate,d,":"); split(d[3],t,"-")
    stime=t[1]" "d[2]" "d[1]" "t[2]" "d[4]" "d[5]

    startsec=mktime(stime)
    endtime=startsec+f["duration"]
    enddate=strftime("%d:%m:%Y-%H:%M:%S",endtime)
    timenow=systime()

    if (timenow > endtime) status="Expired";
    else if (timenow < startsec) status="Scheduled";
    else status="Live";

  }

  print startdate,enddate,f["title"],f["chan"],f["broadg"],f["gmtype"],f["rpx"],
    f["promptkey"],f["gmname"],f["flags"],f["dynamicroot"],f["content"],f["gdf"],
    f["linkto"],f["promptimg"],f["promptpos"],f["gmmode"],f["xplcollect"],
    f["epidesc"],f["linktype"],f["nort"],f["announcedelay"],f["autoterminate"],
    f["descid"],f["promptduration"],f["eventid"],status,f["clickthrucounter"],
    f["cid"],f["starturl"],NF+1
}
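
To see the per-record reset in isolation, here is a reduced sketch that works in any POSIX awk. It is not the script above: the function name nvp_to_csv is invented, only three columns are printed, keys are kept in upper case, values are trimmed of surrounding blanks, and split("", f) is used as the portable way to empty the array.

```shell
# Hypothetical helper: turn NAME:value lines into CSV, one fresh array per record.
nvp_to_csv() {
    awk 'BEGIN { OFS = "," }
    {
        split("", f)                      # empty f so nothing leaks from the previous line
        rest = $0; key = ""
        while (match(rest, /[A-Z]+:/)) {
            # text before this match is the value of the previous key
            if (key != "") f[key] = trim(substr(rest, 1, RSTART - 1))
            key  = substr(rest, RSTART, RLENGTH - 1)   # key without the colon
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (key != "") f[key] = trim(rest)
        print f["TITLE"], f["CHAN"], f["PROMPTKEY"]    # a missing key prints as empty
    }
    function trim(x) { sub(/^ +/, "", x); sub(/ +$/, "", x); return x }'
}

printf '%s\n' \
    'TITLE:"ABC Bridge~rev8" CHAN:"ABC ONE" PROMPTKEY:2048' \
    'TITLE:"IPV1 HTV PROMPT APP~DOI 2011" CHAN:HTV GMNAME:x' | nvp_to_csv
```

The second input line has no PROMPTKEY, so its last column comes out empty instead of repeating 2048 from the first line.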


Feherke.
 
As they say in England...

The Mutt's Nuts!

Thank you, Feherke
 