Looking for help with sed/regex

Status
Not open for further replies.

rhoover

Programmer
Jun 13, 2003
6
US
I am new to scripting, and I have an LDIF file that I am cleaning up in order to import it into Oracle OID, which is Oracle's LDAP. It has to be in a specific format in order to load. The problem I am stuck on is that in some instances more than one mail address is listed for an employee. Does anyone know how I can use sed/regex to delete all except the first occurrence? There are about 100,000 employees.

example:
dn: cn=0123456,cn=users,dc=company,dc=com
telephonenumber: 123-246-7890
mail: employee@company.com
uid: 0123456
cn: New Employee
givenname: New
sn: Employee
orclisVisible: True
objectclass: person
objectclass: organizationalperson
objectclass: inetorgperson
objectclass: orcluserv2

dn: cn=0123abc,cn=users,dc=company,dc=com
telephonenumber: 123-246-7890
mail: robert_smith@company.com
mail: bs@company.com
mail: bob@company.com
mail: rsmith@company.com
uid: 0123abc
cn: Robert Smith
givenname: Robert
sn: Smith
orclisVisible: True
objectclass: person
objectclass: organizationalperson
objectclass: inetorgperson
objectclass: orcluserv2

In the example of Robert Smith, I want to end up with:

dn: cn=0123abc,cn=users,dc=company,dc=com
telephonenumber: 123-246-7890
mail: robert_smith@company.com
uid: 0123abc
cn: Robert Smith
givenname: Robert
sn: Smith
orclisVisible: True
objectclass: person
objectclass: organizationalperson
objectclass: inetorgperson
objectclass: orcluserv2

One thing I came across that looks close is:
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'

Any help would be greatly appreciated.
 
An awk way:
awk '$1=="mail:" && $1==p{next}{p=$1;print}' /path/to/input > output
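
The one-liner above remembers the first field of the previous line in p and skips a mail: line whenever the line before it was also a mail: line, so it relies on the extra addresses being consecutive. If the mail: lines of one entry might be separated by other attributes, a variant along these lines could be used instead. It is only a sketch: the function name first_mail_only is made up for illustration, and it assumes entries are separated by blank lines and that mail: values are not folded onto continuation lines.

```shell
# Hypothetical helper: reads LDIF on stdin, writes the cleaned LDIF to stdout.
# Assumes a blank line separates entries (as in the example above).
first_mail_only() {
  awk '
    /^$/     { seen = 0 }            # a blank line ends the entry, so reset the counter
    /^mail:/ { if (seen++) next }    # keep the first mail: line of the entry, skip the rest
             { print }               # everything else passes through unchanged
  '
}
```

It would be called as: first_mail_only < /path/to/input > output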

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 