group lines in file by common field

westwood01 · Jun 22, 2006

First off, this site is such an amazing resource. Thanks to you all.

Suppose I have an input.file that looks like:

homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA

Using awk or sed how can I get the file to look like:

homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server1:ALL
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
homer: server2:ALL

If the value of field $2 (server1, for example) appears on more than one line, a line reading "homer: server1:ALL" should be added below the last line of that group.

`awk -F":" '{print $2}' input.file` gives me a listing of all the server names, but I am unsure how to check for dups and add a line if dups exist.
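As a first step toward the duplicate check, here is a minimal sketch that counts how many lines share each server name and lists only the names seen more than once. The function name count_dups is hypothetical, and the sketch assumes the colon-separated layout shown above.

```shell
#!/bin/sh
# count_dups is a hypothetical helper name; it reads the file named in $1
# and prints "<count> <server>" for every server listed more than once.
count_dups() {
    awk -F: '
        { count[$2]++ }                      # tally lines per server name
        END {
            for (name in count)
                if (count[name] > 1)
                    print count[name] name   # name keeps its leading space
        }
    ' "$1" | sort -k2                        # for-in order is unspecified, so sort by name
}

# Example call, assuming input.file exists in the current directory:
# count_dups input.file
```

This only reports the duplicates; it does not yet add the extra line to the file.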

risby · Jun 22, 2006

I don't understand what you mean by "if only field $2 is duplicated". Could you explain that differently?

westwood01 · Jun 22, 2006

Sure. The lines of my input.file are currently sorted by server name. The server name is in the second field ($2) when the fields are separated by ":". As in the example:

$1 $2 $3
homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
homer: server3:/var/legato/rman/bin/ebuarch.EP88QA
homer: server4:/var/legato/rman/bin/ebuarch.EP88QA

I want to add a line to this input.file if duplicate server names (more than one) are found. For example:

homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server1:ALL
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
homer: server2:ALL
homer: server3:/var/legato/rman/bin/ebuarch.EP88QA
homer: server4:/var/legato/rman/bin/ebuarch.EP88QA

So, since server1 and server2 are each listed more than once, I have added the lines "homer: server1:ALL" and "homer: server2:ALL".

Notice that server3 and server4 do not get the extra line added since they are only listed once in the original input.file, as opposed to server1 and server2 which are listed three and two times respectively.

So what do you think?

PHV · Jun 22, 2006

A starting point:
awk -F: '
$2!=s{if(n>1)print "homer:"s":ALL";n=0}
{print;s=$2;++n}
END{if(n>1)print "homer:"s":ALL"}
' input.file

Hope This Helps, PH.
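To see the starting point above in action, here is a minimal sketch that runs the same logic on the sample data from the thread. Two things are assumptions made for illustration and are not part of the original answer: the wrapper name add_all_lines, and taking the prefix from $1 of the previous line instead of hardcoding "homer". Like the original, it assumes the input is already sorted by server name.

```shell
#!/bin/sh
# add_all_lines is a hypothetical wrapper name; it reads stdin and expects
# lines that are already grouped by the second colon-separated field.
add_all_lines() {
    awk -F: '
        # Key changed: close the previous group, adding ALL if it had 2 or more lines.
        NR > 1 && $2 != s { if (n > 1) print p ":" s ":ALL"; n = 0 }
        # Print every line, remember its prefix and key, and count the group.
        { print; p = $1; s = $2; ++n }
        # The last group is only closed at end of input.
        END { if (n > 1) print p ":" s ":ALL" }
    '
}

add_all_lines <<'EOF'
homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
homer: server3:/var/legato/rman/bin/ebuarch.EP88QA
homer: server4:/var/legato/rman/bin/ebuarch.EP88QA
EOF
```

Because $2 keeps its leading space under -F:, the added line comes out as "homer: server1:ALL" without any extra formatting.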

Annihilannic · Jun 22, 2006

How about this? Seems pretty long for a fairly simple requirement, but...

Code:

awk '
        BEGIN { FS=OFS=":" }
        $2 == last { dupes++ }
        $2 != last && dupes {
                dupes=0
                split(lastline,a,":")
                print a[1],a[2],"ALL"
        }
        { last=$2; lastline=$0; print }
        END {
                if (dupes) {
                        split(lastline,a,":")
                        print a[1],a[2],"ALL"
                }
        }
' input.file

Annihilannic.

westwood01 · Jun 23, 2006

thank you guys

mrn · Jun 23, 2006

Or with sed

Code:

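# Skip over the block of consecutive lines matching server1, then insert
# the new text before the first line that follows the block.
/server1/{
	:match
	n
	/server1/b match
	i\
new text here
}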

Mike
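For reference, here is a minimal sketch of that sed approach run against sample lines. The wrapper name mark_server1 and the inserted text "homer: server1:ALL" are assumptions chosen to match the thread's example. Note the limitations: the server name is hardcoded, the line is added even when server1 appears only once, and if the server1 block is the last thing in the file, n reaches end of input and sed quits before the i\ command runs.

```shell
#!/bin/sh
# mark_server1 is a hypothetical wrapper name; it reads stdin.
mark_server1() {
    sed '/server1/{
:match
n
/server1/b match
i\
homer: server1:ALL
}'
}

mark_server1 <<'EOF'
homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
EOF
```

The awk versions above avoid these limitations because they track the group size and handle the final group in an END block.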

westwood01 · Jun 28, 2006

This is a follow-up to my original post. The awk from PHV worked well, and it now looks like this:

awk -F":" '
$4!=s{if(n>1)print " server:"s": All";n=0}
{print;s=$4;++n}
END{if(n>1)print " server:"s": All"}
' tmp.server > tmp2.server

My question is, how can I modify the above awk to also get the timestamp displayed in the new line?

For example, if I have two lines that look like:

Jun 27 22:30:17 server: sbkeppdb1-ebu:/var/legato/rman/bin/ebuarch.PRTLPROD
Jun 27 22:30:17 server: sbkeppdb1-ebu:/var/legato/rman/bin/ebuarch.PRTLPRDS

The awk reads them and adds a line like this:
server: sbkeppdb1-ebu: All

Instead, I would like the new line added to look like:
Jun 27 22:30:17 server: sbkeppdb1-ebu: All

Basically looking to take the timestamp from the lines above and display it in the new line.

Annihilannic · Jun 28, 2006

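At every point in the script where the value of s is set, also set another variable, say t, to the timestamp, e.g. t=$1" "$2" "$3 if the fields are split on whitespace. With -F":" the timestamp is split at its own colons, so adjust the field references to match. Then add t to the print statements.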

Note that since the value will be overwritten each time the variable is set you will only get the last timestamp for that server.

Annihilannic.
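A minimal sketch of that suggestion, assuming the log layout shown in the follow-up post and keeping -F":" as the separator. With that separator, "Jun 27 22:30:17 server" is spread over $1 to $3, so t is rebuilt by joining them with colons, and the print uses t in place of the literal " server" text. The wrapper name add_all_with_timestamp is hypothetical.

```shell
#!/bin/sh
# add_all_with_timestamp is a hypothetical wrapper name; it reads stdin.
add_all_with_timestamp() {
    awk -F: '
        # With ":" as separator, "Jun 27 22:30:17 server" spans $1..$3
        # and the server name is $4 (with a leading space).
        NR > 1 && $4 != s { if (n > 1) print t ":" s ": All"; n = 0 }
        # t is overwritten on every line, so it holds the last timestamp seen.
        { print; s = $4; t = $1 ":" $2 ":" $3; ++n }
        END { if (n > 1) print t ":" s ": All" }
    '
}

add_all_with_timestamp <<'EOF'
Jun 27 22:30:17 server: sbkeppdb1-ebu:/var/legato/rman/bin/ebuarch.PRTLPROD
Jun 27 22:30:17 server: sbkeppdb1-ebu:/var/legato/rman/bin/ebuarch.PRTLPRDS
EOF
```

If the script were switched to default whitespace splitting instead, the timestamp would be $1" "$2" "$3 as suggested, but the server name would then need to be picked out of a different field.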

westwood01 · Jun 28, 2006

Thanks Annihilannic - worked perfectly.
