Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Merge using AWK 2

Status
Not open for further replies.

stu78

Programmer
May 29, 2002
121
0
0
GB
Hi,

I'm not an expert in AWK, but I've been advised awk could help with this problem...

I have the following file I'd been trying to merge with SQL within a database, basically, I've had to merge AD users and group information, which in SQL is not an issue. However, it's producing me output (for many users, that looks like this;


| AccountDisabled | CanonicalNameUser | City | Comment | CommonName | Company | Country | CreationDate | Department | Description | DisplayName | DistinguishedName | GroupName |

| FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 | CN=user1,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G1 |
| FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 | CN=user1,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G2 |
| FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 | CN=user1,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G3 |
| FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 | CN=user1,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G4 |
| FALSE | client.local/client/Accounts/user2 | Town | | user2 | company name | | 02/12/2009 17:02 | Department1 | | User 2 | CN=user2,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G7 |
| FALSE | client.local/client/Accounts/user3 | Town | | user3 | company name | | 02/12/2009 17:02 | Department1 | | User 3 | CN=user3,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G7 |
| FALSE | client.local/client/Accounts/user3 | Town | | user2 | company name | | 02/12/2009 17:02 | Department1 | | User 2 | CN=user2,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G8 |

So you can see, that it's listing me all all groups, with the users also.

What I wanted to do in SQL was produce output like;

FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 | CN=user1,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G1,Grp_G2,Grp_G3,Grp_G4 |
| FALSE | client.local/client/Accounts/user2 | Town | | user2 | company name | | 02/12/2009 17:02 | Department1 | | User 2 | CN=user2,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G7 |
| FALSE | client.local/client/Accounts/user3 | Town | | user3 | company name | | 02/12/2009 17:02 | Department1 | | User 3 | CN=user3,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G7,Grp_G8 |

So basically, I need to scan the file for the column DistinguishedName e.g. CN=user1,OU=Accounts,OU=Client,DC=client,DC=local
If it appears in the subsequent rows, merge the attribute contained in GroupName into the first Groupname Value
If it doesn't appear in the subsequent rows, then just continue, but also put this into the output
 
I assumed that the last line in your input data should have been for user3 and that lines to be merged always appear consecutively in the file
Code:
BEGIN {FS="|"}
NR<3{print;next}
{
  if (cn!=$13) {
    if (NR>3) print a,b,"|"
    cn=$13
    b=$14
    a=$0
    sub(/[^\|]*\|$/,"",a)
  } else {
    sub(/ *$/,"",b)
    sub(/^ */,"",$14)
    b=b "," $14
  }
}
END {
  print a,b,"|"
}

CaKiwi
 
Hi CaKiwi,

This is very helpful, however, produces the following output;

| FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 | CN=user1,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G1 |
| FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 |CN=user1,OU=Accounts,OU=Client
,DC=client,DC=local | Grp_G2 |
| FALSE | client.local/client/Accounts/user1 | Town | | user1 | company name | | 02/12/2009 17:02 | Department1 | | User 1 | CN=user1,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G3,Grp_G4 |
| FALSE | client.local/client/Accounts/user2 | Town | | user2 | company name | | 02/12/2009 17:02 | Department1 | | User 2 | CN=user2,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G7 |
| FALSE | client.local/client/Accounts/user3 | Town | | user3 | company name | | 02/12/2009 17:02 | Department1 | | User 3 |CN=user3,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G7 |
| FALSE | client.local/client/Accounts/user3 | Town | | user2 | company name | | 02/12/2009 17:02 | Department1 | | User 2 | CN=user2,OU=Accounts,OU=Client,DC=client,DC=local | Grp_G8 |

Meaning it doesn't merge all columns... e.g. user 2 and user 1. Also, you're right, the data won't appear in consecutive lines; more likely it won't :)

How would you fix these?

You're correct in your assumption about user 3 :)
 
Try this
merge.awk
Code:
[COLOR=#6a5acd]BEGIN[/color] {
  [COLOR=#6a5acd]FS[/color]=[COLOR=#ff00ff]"|"[/color][COLOR=#6a5acd];[/color]
}

[COLOR=#6a5acd]NR[/color] > [COLOR=#ff00ff]1[/color] {
  [COLOR=#0000ff]# strip out all spaces[/color]
  [COLOR=#008080]gsub[/color]([COLOR=#ff00ff]/[[/color][COLOR=#6a5acd]" "[/color][COLOR=#ff00ff]][/color][COLOR=#6a5acd]+[/color][COLOR=#ff00ff]/[/color][COLOR=#6a5acd],[/color] [COLOR=#ff00ff]""[/color][COLOR=#6a5acd],[/color] [COLOR=#6a5acd]$0[/color])
  user = [COLOR=#6a5acd]$6[/color][COLOR=#6a5acd];[/color]
  [COLOR=#804040][b]if[/b][/color] (user_array[[COLOR=#6a5acd]user[/color]]) {
    [COLOR=#804040][b]if[/b][/color] ([COLOR=#008080]index[/color](user_array[[COLOR=#6a5acd]user[/color]][COLOR=#6a5acd],[/color] [COLOR=#6a5acd]$14[/color]) == [COLOR=#ff00ff]0[/color]) {
      user_array[[COLOR=#6a5acd]user[/color]] = user_array[[COLOR=#6a5acd]user[/color]] [COLOR=#ff00ff]","[/color] [COLOR=#6a5acd]$14[/color]
    }
  }
  [COLOR=#804040][b]else[/b][/color] {
    user_array[[COLOR=#6a5acd]user[/color]] = [COLOR=#6a5acd]$1[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$2[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$3[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$4[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$5[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$6[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$7[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]\[/color]
                       [COLOR=#6a5acd]$8[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$9[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$10[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$11[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$12[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$13[/color] [COLOR=#ff00ff]"|"[/color] [COLOR=#6a5acd]$14[/color]
  }
}

[COLOR=#6a5acd]END[/color] {
  [COLOR=#804040][b]for[/b][/color] (user in user_array) {
    [COLOR=#804040][b]printf[/b][/color]([COLOR=#ff00ff]"[/color][COLOR=#6a5acd]%s[/color][COLOR=#ff00ff]|[/color][COLOR=#6a5acd]\n[/color][COLOR=#ff00ff]"[/color][COLOR=#6a5acd],[/color] user_array[[COLOR=#6a5acd]user[/color]])
  }
}
Running it with
Code:
awk -f merge.awk select.dat > select_merge.dat
where select.dat is your file produces select_merge.dat, i.e. what you want
Code:
|FALSE|client.local/client/Accounts/user1|Town||user1|companyname||02/12/200917:02|Department1||User1|CN=user1,OU=Accounts,OU=Client,DC=client,DC=local|Grp_G1,Grp_G2,Grp_G3,Grp_G4|
|FALSE|client.local/client/Accounts/user2|Town||user2|companyname||02/12/200917:02|Department1||User2|CN=user2,OU=Accounts,OU=Client,DC=client,DC=local|Grp_G7,Grp_G8|
|FALSE|client.local/client/Accounts/user3|Town||user3|companyname||02/12/200917:02|Department1||User3|CN=user3,OU=Accounts,OU=Client,DC=client,DC=local|Grp_G7|
 
Thanks guys - that's fantastic!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top