Print out first 3 strings in a specified column 1

Status
Not open for further replies.

jdespres

MIS
Aug 4, 1999
230
US
Separating columns via multiple spaces...

awk -F '[[:space:]][[:space:]]+' '{ print $1, $7"/"$6, $12 }'

Each column of data can itself contain several items separated by single spaces..

Not sure how to print out only the first 3 items in $12....

Any ideas?

Thanks....

Joe Despres
 
Hi

So the fields are separated by two or more whitespace characters and sub-fields are separated by single whitespace characters.

The big question here is whether you need to preserve the original whitespace. I mean, to keep a space in the output where there was a space in the input and a tab where there was a tab.

If you don't need to preserve the whitespace as it was, it is simple :
Code:
awk -F '[[:space:]][[:space:]]+' '{ split($12, m, /[[:space:]]/); print $1, $7"/"$6, m[1], m[2], m[3] }'
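
To see the split() approach on something small, here is a minimal sketch on made-up data. It assumes a three-column line with the multi-item data in $3 rather than $12; the function name first_three and the sample text are invented for illustration and are not taken from the real mccli output.

```shell
# Hypothetical helper: print column 1 plus the first three
# space-separated items of column 3 (columns split on 2+ whitespace).
first_three() {
    awk -F '[[:space:]][[:space:]]+' '{
        split($3, m, /[[:space:]]/)   # break the sub-items apart
        print $1, m[1], m[2], m[3]    # re-joined with OFS (a single space)
    }'
}

# Made-up sample line: three columns, two spaces between them.
printf 'id42  skipped  Windows File System Retention\n' | first_three
# prints: id42 Windows File System
```

Because the items are re-joined with OFS, any tab inside the column would come out as a space.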

If you need to preserve the whitespace, it gets a bit complicated, but it is still reasonably simple if it only has to work in GNU Awk :
Code:
awk -F '\\s\\s+' '{ print $1, $7"/"$6, gensub(/(\S+\s\S+\s\S+).*/, "\\1", "", $12) }'

# or

awk -F '\\s\\s+' '{ match($12, /\S+\s\S+\s\S+/, m); print $1, $7"/"$6, m[0] }'

If you need to preserve the whitespace and be portable ( or at least work with something other than GNU Awk ) :
Code:
awk -F '[[:space:]][[:space:]]+' '{ match($12, /[^[:space:]]+[[:space:]][^[:space:]]+[[:space:]][^[:space:]]+/); print $1, $7"/"$6, substr($12, RSTART, RLENGTH) }'

( As you can see, in GNU Awk you can use \s for [[:space:]] and \S for [^[:space:]]. That also works in original-awk ( available on Ubuntu, not sure about its origin ), but not in Mawk. There the closest alternative would be [ \t] for [[:space:]] and [^ \t] for [^[:space:]]. )
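
The difference between the portable match() / substr() form and plain split() only shows up when the column contains something other than a single space. A small sketch, again on made-up data with the interesting column assumed to be $3 (the function name first_three_verbatim is hypothetical):

```shell
# Hypothetical helper: keep the first three items of column 3 exactly
# as they appear in the input, including any tab between them.
first_three_verbatim() {
    awk -F '[[:space:]][[:space:]]+' '{
        # match() sets RSTART and RLENGTH in any POSIX awk
        if (match($3, /[^[:space:]]+[[:space:]][^[:space:]]+[[:space:]][^[:space:]]+/))
            print $1, substr($3, RSTART, RLENGTH)
    }'
}

# Made-up sample: a tab sits between "Windows" and "File".
printf 'id42  skipped  Windows\tFile System Retention\n' | first_three_verbatim
```

The tab between "Windows" and "File" should survive in the output, since substr() copies the matched text rather than rebuilding it with OFS. Lines where the column holds fewer than three items are skipped by the if.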


Feherke.
feherke.ga
 
didn't work....

I do like the ::---> '\\s\\s+'

Thanks!

Joe Despres
 
Hi

Joe said:
didn't work....
Sorry to hear that. Could you post some sample input and the expected output ? And specify which Awk implementation / version you are using.

Feherke.
feherke.ga
 
awk -W version
GNU Awk 3.1.8

Using awk on an Avamar system :)

#### Here's the raw output from the mccli command ::--->
Code:
9145091880251509 Completed w/Exception(s) 10010      2015-12-23 20:00 EST 00h:59m:07s 2015-12-23 20:59 EST Scheduled Backup   6.2 TB         0.1%      yyy.com /xxxx Windows Server 2008 R2 Enterprise Server Edition (No Service Pack) 64-bit 7.0.102-47     2015-12-23 20:00 EST 2015-12-24 08:00 EST 00h:00m:36s  /xxxx/Windows 2008                Windows File System Retention_xxxx   D         xxxx Windows /xxxx/Windows_2008                       xxxx Windows-Windows 2008-1450918802270                        Avamar N/A
9145091880251709 Completed w/Exception(s) 10010      2015-12-23 20:59 EST 00h:05m:19s 2015-12-23 21:05 EST Scheduled Backup   42.8 GB        0.8%      yyyy.com /xxxx  Windows Server 2008 R2 Enterprise Server Edition (No Service Pack) 64-bit 7.0.102-47     2015-12-23 20:00 EST 2015-12-24 08:00 EST 00h:59m:45s  /xxxx/Windows 2008                Windows VSS         Retention_xxxx   D         xxxx Windows /xxxx/Windows_2008                       xxxx Windows-Windows 2008-1450918802270                        Avamar N/A
9145083240268209 Completed w/Exception(s) 10010      2015-12-22 22:11 EST 00h:48m:34s 2015-12-22 23:00 EST Scheduled Backup   6.2 TB         0.1%      yyyy.com /xxxx  Windows Server 2008 R2 Enterprise Server Edition (No Service Pack) 64-bit 7.0.102-47     2015-12-22 20:00 EST 2015-12-23 08:00 EST 02h:11m:46s  /xxxx/Windows 2008                Windows File System Retention_xxxx   D         xxxx Windows /xxxx/Windows_2008                       xxxx Windows-Windows 2008-1450832402416                        Avamar N/A

#### Output desired ::--->
9145091880251509 Completed w/Exception(s) /xxxx/yyy.com Windows File System
9145091880251709 Completed w/Exception(s) /xxxx/yyy.com Windows VSS
9145083240268209 Completed w/Exception(s) /xxxx/yyy.com Windows File System

Basically I want to check for exceptions from yesterday's backup results... Will apply this same info to the failures as well..

Thanks....

Joe Despres
 
Hi

Then the field separator theory does not hold up, because in the first line only one space follows /xxxx :
Code:
... yyy.com /xxxx Windows Server 2008 ...      ( one space after /xxxx )
... yyyy.com /xxxx  Windows Server 2008 ...    ( two spaces )
... yyyy.com /xxxx  Windows Server 2008 ...    ( two spaces )
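
A quick way to confirm the effect is to count the columns awk sees. This is only a sketch on shortened, made-up fragments of the lines above; count_columns is a hypothetical helper name.

```shell
# Count the columns awk sees when splitting on 2+ whitespace characters.
count_columns() {
    awk -F '[[:space:]][[:space:]]+' '{ print NF }'
}

# Made-up fragments: one space vs two spaces after /xxxx.
printf 'yyy.com /xxxx Windows Server 2008  7.0\n'   | count_columns   # prints: 2
printf 'yyyy.com /xxxx  Windows Server 2008  7.0\n' | count_columns   # prints: 3
```

With one space missing, the domain and the OS name merge into a single column, so every later column number shifts by one.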

As you have GNU Awk, I would say we had better use the match() function to collect the needed pieces. ( match()'s 3rd parameter is a GNU extension. )

But with only limited information about the input ( I assume those "xxxx" are placeholders for sensitive data ), the regular expression would get quite long. So I would suggest an off-topic solution : Perl, because its regular expressions support non-greedy quantifiers.
Perl:
perl -ne 'print "$1 $3/$2 $4\n" if /^(.+?)\s+\d+\s+\d{4}-\d{2}-\d{2}.+?\s(\w+\.\w+)\s+(\/\w+).+\s{2,}\/\w+\/.+?\s{2,}(.+)\s+Retention/'

Actually the key point is the non-greedy quantifiers, so any tool/language with PCRE would do it.

Feherke.
feherke.ga
 
Hey Feherke.....

That didn't work :(

Thanks! You shouldn't work on this any more...

Joe Despres
 
Hi

Well, it works for the sample input... I suppose the issue is with those "xxxx", which I try to match with \w+. If they contain non-word characters, the match will fail.


Feherke.
feherke.ga
 
Yeah, xxxx is just alphabet characters

Thanks

Joe Despres
 
I totally forgot! mccli command can output xml!

Code:
    <Row>
      <ID>9145117800006709</ID>
      <Status>Completed</Status>
      <ErrorCode>0</ErrorCode>
      <StartTime>2015-12-26 20:11 EST</StartTime>
      <Elapsed>00h:07m:05s</Elapsed>
      <EndTime>2015-12-26 20:18 EST</EndTime>
      <Type>Scheduled Backup</Type>
      <ProgressBytes>22.7 GB</ProgressBytes>
      <NewBytes>0.9%</NewBytes>
      <Client>mickey.mouse.com</Client>
      <Domain>/Unrestrictive/Infrastructure</Domain>
      <OS>Windows Server 2008 R2 Enterprise Server Edition Service Pack 1 64-bit</OS>
      <ClientRelease>7.1.101-145</ClientRelease>
      <Sched.StartTime>2015-12-26 20:00 EST</Sched.StartTime>
      <Sched.EndTime>2015-12-27 08:00 EST</Sched.EndTime>
      <ElapsedWait>00h:11m:21s</ElapsedWait>
      <Group>/Infrastructure-ServerFile-S20-RD30</Group>
      <Plug-In>Windows VSS</Plug-In>
      <RetentionPolicy>RD30</RetentionPolicy>
      <Retention>D</Retention>
      <Schedule>S20</Schedule>
      <Dataset>/ServerFile</Dataset>
      <WID>S20-Infrastructure-ServerFile-S20-RD30-1451178000029</WID>
      <Server>Avamar</Server>
      <Container>N/A</Container>
    </Row>

Each backup generates one set of this...

All I really need is to strip out all the tags and put the data on one line separated by a comma

Joe Despres
 
Hi

Joe said:
All I really need is to strip out all the tags and put the data on one line separated by a comma
May I suggest another off-topic solution for that ? XMLStarlet :
Code:
xmlstarlet sel -t -m //Row -v ID -o , -v Status -o , -v ErrorCode -o , -v Domain -o / -v Client -o , -v Plug-In -n
( Although I am not sure where the commas come into the picture, as until now the separators were spaces. )

Feherke.
feherke.ga
 
Bummer...... I don't have "xmlstarlet" installed :(

#### This seems to work ::--->
Code:
raw-quickc () {
export MCCLI=/usr/local/avamar/bin/mccli
export BIN=/home/admin/bin
echo "ID,Status,ErrorCode,StartTime,Elapsed,EndTime,Type,ProgressBytes,NewBytes,Client,Domain,OS,ClientRelease,Sched.StartTime,Sched.EndTime,ElapsedWait,Group,Plug-In,RetentionPolicy,Retention,Schedule,Dataset,WID,Server,Container"
$MCCLI activity show --completed=true --verbose --xml | sed -n '/<Row/,/<\/Row/p'| sed 's/<\/\?[^>]\+>//g'|awk '{$1=$1}1'|awk -f $BIN/ONE_Line.awk|sed 's/\&amp\;lt\;//g'
}

ugly enough to back a buzzard off a gut wagon!

#### ONE_Line.awk ::--->
Code:
BEGIN { RS = ""; FS = "\n"; ORS = "" }
{
        x=1
        while ( x<NF ) {
                print $x ","
                x++
        }
        print $NF "\n"
}
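
For what it's worth, the sed / awk chain and ONE_Line.awk could probably be collapsed into a single awk program. The following is only a sketch under assumptions: it expects one <Tag>value</Tag> element per line between plain <Row> and </Row> lines, as in the sample above, assumes no value contains a comma, and does not handle the &amp;lt; clean-up. The name rows_to_csv is made up.

```shell
# Hypothetical helper: turn each <Row>...</Row> block into one CSV line.
rows_to_csv() {
    awk '
        /<Row>/   { line = ""; sep = ""; in_row = 1; next }   # start a new record
        /<\/Row>/ { print line; in_row = 0; next }            # emit the finished record
        in_row {
            sub(/^[ \t]*<[^>]+>/, "")      # drop the opening tag and indentation
            sub(/<\/[^>]+>[ \t]*$/, "")    # drop the closing tag
            line = line sep $0
            sep = ","
        }
    '
}

# Made-up three-element row for a dry run.
printf '<Row>\n  <ID>1</ID>\n  <Status>Completed</Status>\n  <Client>a.example.com</Client>\n</Row>\n' | rows_to_csv
# prints: 1,Completed,a.example.com
```

In the function above it would take the place of everything after the mccli call, e.g. $MCCLI activity show --completed=true --verbose --xml | rows_to_csv.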

My next goal is to grep out part of a column :)

Thanks....

Joe Despres
 