group lines in file by common field

Status
Not open for further replies.

westwood01

Technical User
Dec 28, 2003
41
US
First off, this site is such an amazing resource. Thanks to you all.

Suppose I have an input.file that looks like:

homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA

Using awk or sed, how can I get the file to look like:

homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server1:ALL
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
homer: server2:ALL

If only field $2 (server1, for example) is duplicated, a line reading "homer: server1:ALL" should be added below the last line for that server.

`awk -F":" '{print $2}' input.file` gives me a listing of all the server names, but I am unsure how to check for dups and add a line if dups exist.
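One way to approach the duplicate check is to count how often each server name occurs and report only the names seen more than once. This is a minimal sketch, not a full answer: the function name list_dup_servers is made up for illustration, and it assumes the colon-separated layout of the sample above.

```shell
# list_dup_servers FILE: print each server name ($2) that occurs on more than one line.
# Hypothetical helper name; the field layout is assumed from the sample input.
list_dup_servers() {
    awk -F":" '
        # Tally the lines seen for each server name
        { count[$2]++ }
        END {
            # Array iteration order is unspecified in awk, hence the sort below
            for (name in count)
                if (count[name] > 1)
                    print name
        }
    ' "$1" | sort
}
```

Called as `list_dup_servers input.file`, this should name server1 and server2 for the sample above. Each name keeps the leading blank that follows the first colon in the input.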
 
I don't understand what you mean by "if only field $2 is duplicated". Could you explain that differently?
 
Sure. The lines of my input.file are currently sorted by server name. The server name is the second field ($2) when the fields are separated by ":". As in the example:

$1 $2 $3
homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
homer: server3:/var/legato/rman/bin/ebuarch.EP88QA
homer: server4:/var/legato/rman/bin/ebuarch.EP88QA

I want to add a line to this input.file whenever a server name appears more than once. For example:

homer: server1:/var/legato/rman/bin/ebuarch.PRTLDEV
homer: server1:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server1:/var/legato/rman/bin/ebuarch.PRTLQA
homer: server1:ALL
homer: server2:/var/legato/rman/bin/ebuarch.PRTLPROD
homer: server2:/var/legato/rman/bin/ebuarch.EP88QA
homer: server2:ALL
homer: server3:/var/legato/rman/bin/ebuarch.EP88QA
homer: server4:/var/legato/rman/bin/ebuarch.EP88QA

So, since server1 and server2 are each listed more than once, I have added the lines "homer: server1:ALL" and "homer: server2:ALL".

Notice that server3 and server4 do not get the extra line added since they are only listed once in the original input.file, as opposed to server1 and server2 which are listed three and two times respectively.

So what do you think?
 
A starting point:
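awk -F: '
# Server name changed: close the previous group if it had more than one line
$2!=s{if(n>1)print "homer:"s":ALL";n=0}
# Print every line, remember its server name and count it
{print;s=$2;++n}
# Close the final group at end of input
END{if(n>1)print "homer:"s":ALL"}
' input.file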

Hope This Helps, PH.
 
How about this? Seems pretty long for a fairly simple requirement, but...

Code:
awk '
        BEGIN { FS=OFS=":" }
        $2 == last { dupes++ }
        $2 != last && dupes {
                dupes=0
                split(lastline,a,":")
                print a[1],a[2],"ALL"
        }
        { last=$2; lastline=$0; print }
        END {
                if (dupes) {
                        split(lastline,a,":")
                        print a[1],a[2],"ALL"
                }
        }
' inputfile

Annihilannic.
 
Or with sed

Code:
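/server1/{
	:match
	# fetch the next line; keep looping while it still matches
	n
	/server1/b match
	# first non-matching line reached: insert the text before it
	i\
new text here
}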

Mike

 
This is a follow-up to my original post. The awk from PHV worked well, and it now looks like this:

awk -F":" '
$4!=s{if(n>1)print " server:"s": All";n=0}
{print;s=$4;++n}
END{if(n>1)print "'date' server:"s": All"}
' tmp.server > tmp2.server

My question is, how can I modify the above awk to also get the timestamp displayed in the new line?

For example, if I have two lines that look like:

Jun 27 22:30:17 server: sbkeppdb1-ebu:/var/legato/rman/bin/ebuarch.PRTLPROD
Jun 27 22:30:17 server: sbkeppdb1-ebu:/var/legato/rman/bin/ebuarch.PRTLPRDS

The awk reads them and adds a line like this:
server: sbkeppdb1-ebu: All

Instead, I would like the new line added to look like:
Jun 27 22:30:17 server: sbkeppdb1-ebu: All

Basically looking to take the timestamp from the lines above and display it in the new line.

 
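At every point where the script sets the value of s, also set another variable, say t, to the timestamp. Because the script splits on ":", the colons inside the timestamp spread it across the first three fields, so use e.g. t=$1":"$2":"$3 (which already ends in " server"). Then use t in the print statements in place of the literal " server".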

Note that, since the value is overwritten each time the variable is set, you will only get the last timestamp for that server.
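The suggestion might be sketched as follows. It assumes the -F":" split used in the script above: the colons inside the timestamp spread it across $1 to $3 (with $3 ending in " server") and leave the server name in $4. The function name add_all_lines is made up for illustration.

```shell
# add_all_lines FILE: after each run of two or more lines for the same server ($4),
# print a summary line that reuses the timestamp of the last line in that run.
# Hypothetical helper name; field positions assume -F":" and lines such as
#   Jun 27 22:30:17 server: sbkeppdb1-ebu:/var/legato/rman/bin/ebuarch.PRTLPROD
add_all_lines() {
    awk -F":" '
        # Server changed: close the previous group if it had more than one line
        $4 != s { if (n > 1) print t ":" s ": All"; n = 0 }
        # Print the line; remember its server and timestamp prefix, and count it
        { print; s = $4; t = $1 ":" $2 ":" $3; ++n }
        # Close the last group at end of input
        END { if (n > 1) print t ":" s ": All" }
    ' "$1"
}
```

Used as `add_all_lines tmp.server > tmp2.server`, the two sample lines above should be followed by "Jun 27 22:30:17 server: sbkeppdb1-ebu: All". The summary line is printed before s and t are updated for the next server, so it carries the timestamp of the last line in the finished group.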

Annihilannic.
 