I am new to scripting. I have an LDIF file that I am cleaning up so I can import it into Oracle OID (Oracle's LDAP directory), and it has to be in a specific format in order to load. Where I am getting stuck is that some entries list more than one mail attribute for an employee. Does anyone know how I can use sed/regex to delete every mail: line in an entry except the first one? There are about 100,000 employees.
Example:
dn: cn=0123456,cn=users,dc=company,dc=com
telephonenumber: 123-246-7890
mail: employee@company.com
uid: 0123456
cn: New Employee
givenname: New
sn: Employee
orclisVisible: True
objectclass: person
objectclass: organizationalperson
objectclass: inetorgperson
objectclass: orcluserv2
dn: cn=0123abc,cn=users,dc=company,dc=com
telephonenumber: 123-246-7890
mail: robert_smith@company.com
mail: bs@company.com
mail: bob@company.com
mail: rsmith@company.com
uid: 0123abc
cn: Robert Smith
givenname: Robert
sn: Smith
orclisVisible: True
objectclass: person
objectclass: organizationalperson
objectclass: inetorgperson
objectclass: orcluserv2
In the example of Robert Smith, I want to end up with:
dn: cn=0123abc,cn=users,dc=company,dc=com
telephonenumber: 123-246-7890
mail: robert_smith@company.com
uid: 0123abc
cn: Robert Smith
givenname: Robert
sn: Smith
orclisVisible: True
objectclass: person
objectclass: organizationalperson
objectclass: inetorgperson
objectclass: orcluserv2
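A minimal sketch of the transformation shown above, using awk instead of sed (a swapped-in tool, because awk makes per-entry state easy to track). It assumes every entry begins with a `dn:` line, keeps only the first `mail:` line of each entry, and also drops any folded continuation lines (lines starting with a single space, per the LDIF line-folding rule) that belong to a discarded `mail:` value. The function name `keep_first_mail` is hypothetical, and it only matches the lowercase attribute name `mail:` as it appears in the sample.

```shell
#!/bin/sh
# keep_first_mail: hypothetical helper name.
# Reads LDIF on stdin and writes it to stdout, keeping only the first
# mail: line of each entry. Assumes each entry starts with "dn:".
keep_first_mail() {
  awk '
    /^dn:/ { seen_mail = 0 }                        # new entry: reset the counter
    /^ /   { if (!dropping) print; next }           # folded continuation line
    { dropping = 0 }                                # any other line ends a dropped value
    /^mail:/ && seen_mail++ { dropping = 1; next }  # second and later mail: lines
    { print }
  '
}
```

Usage: `keep_first_mail < users.ldif > users_clean.ldif` (the file names are placeholders). awk streams the file line by line and keeps only a couple of counters, so 100,000 entries should not be a problem for memory.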
One thing I came across that looks close is:
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
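That one-liner only removes a line when it is byte-for-byte identical to the previous one (the `\n\1` back-reference), and the mail: lines all have different values, so it would leave them all in place. A sketch of an adaptation, assuming all of an entry's mail: lines are consecutive as in the sample and a sed that accepts `\n` in a regex (GNU sed does): instead of comparing whole lines, it checks only whether the next line also starts with `mail:`, and if so deletes that next line, so the first address survives. The function name `keep_first_mail_sed` is hypothetical.

```shell
#!/bin/sh
# keep_first_mail_sed: hypothetical helper name.
# :a        loop label
# $!N       append the next line unless we are on the last line
# s/.../    if the buffer starts with mail: and the appended line is also
#           mail:, delete the appended line (keeps the first address)
# ta        after a deletion, go back and check the following line too
# P; D      print the first buffered line, drop it, restart with the rest
keep_first_mail_sed() {
  sed -e ':a' -e '$!N' -e '/^mail:/s/\nmail:.*//' -e 'ta' -e 'P' -e 'D'
}
```

Because it only looks at adjacent lines, this version would miss a second mail: line that is separated from the first by some other attribute; the awk approach that resets on `dn:` does not have that limitation.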
Any help would be greatly appreciated.