Print out first 3 strings in a specified column 1

jdespres · Dec 24, 2015

Separating columns via multiple spaces...

awk -F '[[:space:]][[:space:]]+' '{ print $1, $7"/"$6, $12 }'

Each column of data can hold several values separated by single spaces.

Not sure how to print out only the first 3 items in $12....

Any ideas?

Thanks....

Joe Despres

feherke · Dec 24, 2015

Hi

So the fields are separated by two or more whitespace characters and the sub-fields are separated by single whitespace characters.

The big question here is whether you need to preserve the original whitespace. I mean, to keep a space in the output where there was a space in the input and a tab where there was a tab.

If you do not need to preserve the whitespace as it was, it is simple :

Code:

awk -F '[[:space:]][[:space:]]+' '{ split($12, m, /[[:space:]]/); print $1, $7"/"$6, m[1], m[2], m[3] }'
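A minimal sketch of the split() approach on made-up input, so it can be tried without the real report. The sample line, the function name `first_three` and the use of $2 in place of $12 are assumptions for illustration only.

```shell
# Hypothetical helper: fields are separated by 2+ whitespace characters,
# sub-fields inside a field by a single whitespace character.
first_three() {
  awk -F '[[:space:]][[:space:]]+' '{
    split($2, m, /[[:space:]]/)      # break field 2 into its sub-fields
    print $1, m[1], m[2], m[3]       # keep only the first three of them
  }'
}

# Made-up sample line: "id1", then a four-word field, then "tail".
printf 'id1  alpha beta gamma delta  tail\n' | first_three
```

Because print joins its arguments with OFS, whatever separated the sub-fields in the input comes out as a single space.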

If you need to preserve the whitespace, it gets a bit complicated, but is still reasonably simple if it only has to work in GNU Awk :

Code:

awk -F '\\s\\s+' '{ print $1, $7"/"$6, gensub(/(\S+\s\S+\s\S+).*/, "\\1", 1, $12) }'

# or

awk -F '\\s\\s+' '{ match($12, /\S+\s\S+\s\S+/, m); print $1, $7"/"$6, m[0] }'

If you need to preserve the whitespace and be portable ( or at least work with something other than GNU Awk ) :

Code:

awk -F '[[:space:]][[:space:]]+' '{ match($12, /[^[:space:]]+[[:space:]][^[:space:]]+[[:space:]][^[:space:]]+/); print $1, $7"/"$6, substr($12, RSTART, RLENGTH) }'
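A small sketch of the portable match()/substr() variant, again on made-up input. The function name `first_three_keep_ws`, the sample lines and the use of $2 instead of $12 are assumptions; the fallback branch for fields with fewer than three sub-fields is an addition, not part of the one-liner above.

```shell
# Hypothetical helper: keep the first three sub-fields of field 2 together
# with whatever single whitespace characters separated them in the input.
first_three_keep_ws() {
  awk -F '[[:space:]][[:space:]]+' '{
    if (match($2, /[^[:space:]]+[[:space:]][^[:space:]]+[[:space:]][^[:space:]]+/))
      print $1, substr($2, RSTART, RLENGTH)   # the matched slice, untouched
    else
      print $1, $2                            # fewer than three sub-fields
  }'
}

# The tab between "alpha" and "beta" should survive in the output.
printf 'id1  alpha\tbeta gamma delta  tail\n' | first_three_keep_ws
```

Without the guard, a field with fewer than three sub-fields leaves RSTART at 0 and RLENGTH at -1, so substr() returns an empty string.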

( As you can see, in GNU Awk you can use \s for [[:space:]] and \S for [^[:space:]]. That also works in original-awk ( available on Ubuntu, not sure about its origin ), but not in Mawk. There the closest alternative would be [ \t] for [[:space:]] and [^ \t] for [^[:space:]]. )

Feherke.
feherke.ga

jdespres · Dec 24, 2015

didn't work....

I do like the ::---> '\\s\\s+'

Thanks!

Joe Despres

feherke · Dec 24, 2015

Hi

Joe said:
didn't work....

Sorry to hear that. Could you post some sample input and the expected output ? And specify which Awk implementation / version you are using.

Feherke.
feherke.ga

jdespres · Dec 24, 2015

awk -W version
GNU Awk 3.1.8

Using awk on an Avamar system

#### Here's the raw output from the mccli command ::--->

Code:

9145091880251509 Completed w/Exception(s) 10010      2015-12-23 20:00 EST 00h:59m:07s 2015-12-23 20:59 EST Scheduled Backup   6.2 TB         0.1%      yyy.com /xxxx Windows Server 2008 R2 Enterprise Server Edition (No Service Pack) 64-bit 7.0.102-47     2015-12-23 20:00 EST 2015-12-24 08:00 EST 00h:00m:36s  /xxxx/Windows 2008                Windows File System Retention_xxxx   D         xxxx Windows /xxxx/Windows_2008                       xxxx Windows-Windows 2008-1450918802270                        Avamar N/A
9145091880251709 Completed w/Exception(s) 10010      2015-12-23 20:59 EST 00h:05m:19s 2015-12-23 21:05 EST Scheduled Backup   42.8 GB        0.8%      yyyy.com /xxxx  Windows Server 2008 R2 Enterprise Server Edition (No Service Pack) 64-bit 7.0.102-47     2015-12-23 20:00 EST 2015-12-24 08:00 EST 00h:59m:45s  /xxxx/Windows 2008                Windows VSS         Retention_xxxx   D         xxxx Windows /xxxx/Windows_2008                       xxxx Windows-Windows 2008-1450918802270                        Avamar N/A
9145083240268209 Completed w/Exception(s) 10010      2015-12-22 22:11 EST 00h:48m:34s 2015-12-22 23:00 EST Scheduled Backup   6.2 TB         0.1%      yyyy.com /xxxx  Windows Server 2008 R2 Enterprise Server Edition (No Service Pack) 64-bit 7.0.102-47     2015-12-22 20:00 EST 2015-12-23 08:00 EST 02h:11m:46s  /xxxx/Windows 2008                Windows File System Retention_xxxx   D         xxxx Windows /xxxx/Windows_2008                       xxxx Windows-Windows 2008-1450832402416                        Avamar N/A

#### Output desired ::--->
9145091880251509 Completed w/Exception(s) /xxxx/yyy.com Windows File System
9145091880251709 Completed w/Exception(s) /xxxx/yyy.com Windows VSS
9145083240268209 Completed w/Exception(s) /xxxx/yyy.com Windows File System

Basically I want to check for exceptions in yesterday's backup results... I will apply the same approach to the failures as well..

Thanks....

Joe Despres

feherke · Dec 24, 2015

Hi

Then the field separator theory is not good enough : in the first sample line a single space follows /xxxx, while the other two lines have two spaces there.

Code:

... yyy.com /xxxx Windows Server 2008 ...
... yyyy.com /xxxx  Windows Server 2008 ...
... yyyy.com /xxxx  Windows Server 2008 ...
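To see why this matters, here is a rough sketch that only counts the fields awk finds with the two-or-more-whitespace separator. The shortened sample lines and the function name `count_fields` are made up for illustration.

```shell
# Hypothetical helper: report how many fields the separator yields per line.
count_fields() {
  awk -F '[[:space:]][[:space:]]+' '{ print NF }'
}

# One space after /xxxx: the domain and the OS name collapse into one field.
printf 'yyy.com /xxxx Windows Server 2008  7.0.102-47\n' | count_fields
# Two spaces after /xxxx: the OS name becomes a field of its own.
printf 'yyyy.com /xxxx  Windows Server 2008  7.0.102-47\n' | count_fields
```

The first line gives one field fewer than the second, so every field number after that point shifts, $12 included.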

As you have GNU Awk, I would say we had better use the match() function to collect the needed pieces. ( match()'s third parameter is a GNU extension. )

But with only limited information about the input ( I assume those "xxxx" are placeholders for sensitive data ), the regular expression would be quite long to put together. So I would suggest an off-topic solution : Perl, because its regular expressions support non-greedy quantifiers.

Perl:

perl -ne 'print "$1 $3/$2 $4\n" if /^(.+?)\s+\d+\s+\d{4}-\d{2}-\d{2}.+?\s(\w+\.\w+)\s+(\/\w+).+\s{2,}\/\w+\/.+?\s{2,}(.+)\s+Retention/'

Actually the accent is on non-greedy modifiers, so any tool/language with PCRE would do it.

Feherke.
feherke.ga

jdespres · Dec 24, 2015

Hey Feherke.....

That didn't work

Thanks! You shouldn't work on this any more...

Joe Despres

feherke · Dec 24, 2015

Hi

Well, it works for the sample input... I suppose the issue is with those "xxxx", which I try to match with \w+. If they contain non-word characters, those will break the matching.

Feherke.
feherke.ga

jdespres · Dec 25, 2015

Yeah, xxxx is just alphabet characters

Thanks

Joe Despres

jdespres · Dec 28, 2015

I totally forgot! mccli command can output xml!

Code:

    <Row>
      <ID>9145117800006709</ID>
      <Status>Completed</Status>
      <ErrorCode>0</ErrorCode>
      <StartTime>2015-12-26 20:11 EST</StartTime>
      <Elapsed>00h:07m:05s</Elapsed>
      <EndTime>2015-12-26 20:18 EST</EndTime>
      <Type>Scheduled Backup</Type>
      <ProgressBytes>22.7 GB</ProgressBytes>
      <NewBytes>0.9%</NewBytes>
      <Client>mickey.mouse.com</Client>
      <Domain>/Unrestrictive/Infrastructure</Domain>
      <OS>Windows Server 2008 R2 Enterprise Server Edition Service Pack 1 64-bit</OS>
      <ClientRelease>7.1.101-145</ClientRelease>
      <Sched.StartTime>2015-12-26 20:00 EST</Sched.StartTime>
      <Sched.EndTime>2015-12-27 08:00 EST</Sched.EndTime>
      <ElapsedWait>00h:11m:21s</ElapsedWait>
      <Group>/Infrastructure-ServerFile-S20-RD30</Group>
      <Plug-In>Windows VSS</Plug-In>
      <RetentionPolicy>RD30</RetentionPolicy>
      <Retention>D</Retention>
      <Schedule>S20</Schedule>
      <Dataset>/ServerFile</Dataset>
      <WID>S20-Infrastructure-ServerFile-S20-RD30-1451178000029</WID>
      <Server>Avamar</Server>
      <Container>N/A</Container>
    </Row>

Each backup generates one set of this...

All I really need is to strip out all the tags and put the data on one line separated by a comma

Joe Despres

feherke · Dec 29, 2015

Hi

Joe said:
All I really need is to strip out all the tags and put the data on one line separated by a comma

May I suggest another off-topic solution for that ? XMLStarlet :

Code:

xmlstarlet sel -t -m //Row -v ID -o , -v Status -o , -v ErrorCode -o , -v Domain -o / -v Client -o , -v Plug-In -n

( Although not sure where the commas will come in the picture as until now the separators were spaces. )

Feherke.
feherke.ga

jdespres · Dec 29, 2015

Bummer...... I don't have "xmlstarlet" installed

#### This seems to work ::--->

Code:

raw-quickc () {
    # Paths are specific to this Avamar host.
    export MCCLI=/usr/local/avamar/bin/mccli
    export BIN=/home/admin/bin
    echo "ID,Status,ErrorCode,StartTime,Elapsed,EndTime,Type,ProgressBytes,NewBytes,Client,Domain,OS,ClientRelease,Sched.StartTime,Sched.EndTime,ElapsedWait,Group,Plug-In,RetentionPolicy,Retention,Schedule,Dataset,WID,Server,Container"
    "$MCCLI" activity show --completed=true --verbose --xml |
        sed -n '/<Row/,/<\/Row/p' |       # keep only the <Row> blocks
        sed 's/<\/\?[^>]\+>//g' |         # strip the tags ( GNU sed syntax )
        awk '{$1=$1}1' |                  # trim and squeeze whitespace
        awk -f "$BIN/ONE_Line.awk" |      # join each block into one CSV line
        sed 's/\&amp\;lt\;//g'            # drop double-escaped "<" entities
}

ugly enough to back a buzzard off a gut wagon!

#### ONE_Line.awk ::--->

Code:

BEGIN { RS = ""; FS = "\n"; ORS = "" }
{
        x=1
        while ( x<NF ) {
                print $x ","
                x++
        }
        print $NF "\n"
}
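For comparison, the sed and awk stages could probably be collapsed into a single awk pass. This is only a sketch under assumptions: the function name `rows_to_csv` is made up, each element is assumed to sit on its own line as in the sample above, and values are assumed not to contain commas. It does not decode XML entities.

```shell
# Hypothetical helper: turn each <Row>...</Row> block into one comma-separated line.
rows_to_csv() {
  awk '
    /<Row>/   { in_row = 1; n = 0; next }         # a record starts
    /<\/Row>/ {                                    # a record ends: print it
      line = ""
      for (i = 1; i <= n; i++) line = line (i > 1 ? "," : "") val[i]
      print line
      in_row = 0
      next
    }
    in_row {
      gsub(/<[^>]*>/, "")                          # drop opening and closing tags
      gsub(/^[[:space:]]+|[[:space:]]+$/, "")      # trim the indentation
      val[++n] = $0                                # remember the bare value
    }
  '
}

# Made-up, shortened row for a quick check.
rows_to_csv <<'EOF'
<Row>
  <ID>1</ID>
  <Status>Completed</Status>
  <Plug-In>Windows VSS</Plug-In>
</Row>
EOF
```

In the real pipeline this would read the output of the mccli command instead of the here-document.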

My next goal is to grep out part of a column

Thanks....

Joe Despres
