Shell Script: find and replace, ad infinitum

ml5003 · Nov 9, 2005

Dear TekTips,

I am interested in a shell script for the Darwin UNIX implementation on my Mac (running OS X 10.2.8) that will perform multiple (possibly hundreds of) find-and-replace operations within a document.

As an example, if I had a document that contained, among other things, the names of teams in the National Football League, and I wanted to replace these with the corresponding team names from Major League Baseball (e.g. NY Giants -> NY Yankees, New England Patriots -> Boston Red Sox, Philadelphia Eagles -> Philadelphia Phillies and so forth), is there a script that could make all of these changes at once? (Or maybe a variation on, or modification to, sed?)

The script I envision would include all of the find (football team names) and replace (baseball team names) pairs. I would be using it over and over on many documents with similar content.

Thanks,

ml5003

Annihilannic · Nov 9, 2005

You could just construct a replaces file containing:

s/NY Giants/NY Yankees/g
s/New England Patriots/Boston Red Sox/g
...

And then sed -f /path/to/replaces infile > outfile.

If you're using GNU sed and want to edit in place (rather than redirect to a new file each time), there is an option to do that (-i); check the man page.
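A minimal sketch of that workflow, assuming a POSIX shell. The file names (replaces.sed, infile.txt, outfile.txt), the scratch directory and the sample sentences are made up for illustration:

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten (hypothetical paths).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/replaces.XXXXXX")

# The replaces file: one s/find/replace/g command per line.
cat > "$workdir/replaces.sed" <<'EOF'
s/NY Giants/NY Yankees/g
s/New England Patriots/Boston Red Sox/g
s/Philadelphia Eagles/Philadelphia Phillies/g
EOF

# A small sample document to run it against (invented text).
cat > "$workdir/infile.txt" <<'EOF'
The NY Giants host the Philadelphia Eagles.
The New England Patriots play next week.
EOF

# Apply every substitution in one pass and write the result to a new file.
sed -f "$workdir/replaces.sed" "$workdir/infile.txt" > "$workdir/outfile.txt"
cat "$workdir/outfile.txt"
```

The same replaces file can then be reused on any number of documents by changing only the input and output names.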

Annihilannic.

ml5003 · Dec 2, 2005

Dear Annihilannic,

Thanks for the excellent advice. This works quite well.

Follow up question:

How can I use this replaces script in cases where the character "/" is part of either the string to be found or replaced (or both)? e.g., using the sports teams example again:

s/http://www.NYGiants.com/http://www.NYYankees.com/g

s/http://www.PhiladelphiaEagles.com/http://www.PhiladelphiaPhillies.com/g

Thanks,

ml5003

coffeysm · Dec 4, 2005

/ is the delimiter in the s command, so you will need to escape each / in your strings with a preceding \. It would be something like this:

Code:

s/http:\/\/www.NYGiants.com/http:\/\/www.NYYankees.com/g

That should work for ya.

Ogzilal · Dec 4, 2005

Hi,

You can use whatever character you want as the delimiter.

Samples with the delimiters /, ! and ? respectively:

Code:

s/old string/new string/
s!old/string!new/string!
s?another/old one!?another/new one!?
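To compare the two suggestions side by side, here is a small sketch assuming a POSIX shell. The sample sentence and the variable names are made up for illustration. Both commands should produce the same result; the second simply avoids the backslashes by using ! as the delimiter:

```shell
#!/bin/sh
# Sample input line (hypothetical).
line='Visit http://www.NYGiants.com for tickets.'

# Form 1: keep / as the delimiter and escape every / in the pattern and replacement.
escaped=$(printf '%s\n' "$line" |
  sed 's/http:\/\/www\.NYGiants\.com/http:\/\/www.NYYankees.com/g')

# Form 2: use ! as the delimiter so the slashes need no escaping.
alternate=$(printf '%s\n' "$line" |
  sed 's!http://www\.NYGiants\.com!http://www.NYYankees.com!g')

printf '%s\n%s\n' "$escaped" "$alternate"
```

The dots in the pattern are escaped here because an unescaped . matches any single character. Whichever delimiter you choose must itself be escaped if it appears inside the pattern or the replacement.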

ml5003 · Dec 19, 2005

Thanks to all of you again for the excellent advice.

Next issue in the saga:

Can I use a wildcard such as * in these constructions?

In other words, take the same football-to-baseball web site example, but now the web sites are part of an XML file, encapsulated in elements called "site", i.e. between <site> and </site>. So the previous replaces code example becomes

s/<site>http:\/\/www.NYGiants.com<\/site>/<site>http:\/\/www.NYYankees.com<\/site>/g

s/<site>http:\/\/www.PhiladelphiaEagles.com<\/site>/<site>http:\/\/www.PhiladelphiaPhillies.com<\/site>/g

I want to get rid of all web sites and their associated tags, <site> and </site>, from the file except the above two, related to NY and Phila. So how can I do the following?

s/<site>http:\/\/www.NYGiants.com<\/site>/<site>http:\/\/www.NYYankees.com<\/site>/g

s/<site>http:\/\/www.PhiladelphiaEagles.com<\/site>/<site>http:\/\/www.PhiladelphiaPhillies.com<\/site>/g

s/<site>*<\/site>/ /g

Please advise and thanks,

ml5003

Annihilannic · Dec 19, 2005

Try this? I'm no sed wizard... but what it does is execute the substitutions only on lines that match the address string at the beginning, print the output, then delete the pattern space and start the next cycle. If neither address matches, it falls through to the last line, which simply deletes any remaining matches. Note that you need to use .* in regular expressions to match any number of any character. In your example, >* would match any number of ">" characters.

Code:

/NYGiants/{s/<site>http:\/\/www.NYGiants.com<\/site>/<site>http:\/\/www.NYYankees.com<\/site>/g;p;d;}
/PhiladelphiaEagles/{s/<site>http:\/\/www.PhiladelphiaEagles.com<\/site>/<site>http:\/\/www.PhiladelphiaPhillies.com<\/site>/g;p;d;}
s/<site>.*<\/site>//g
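For anyone who wants to try this on test data first, here is a sketch of the same script, assuming a POSIX shell. It is written in multi-line form with | as the s delimiter, which is only a stylistic choice. The file names, the <team> wrapper element and the DallasCowboys URL are invented for illustration:

```shell
#!/bin/sh
workdir=$(mktemp -d "${TMPDIR:-/tmp}/sites.XXXXXX")

# Same logic as the script above: rewrite the two sites to keep,
# print and end the cycle, and blank out every other <site> element.
cat > "$workdir/sites.sed" <<'EOF'
/NYGiants/{
  s|<site>http://www\.NYGiants\.com</site>|<site>http://www.NYYankees.com</site>|g
  p
  d
}
/PhiladelphiaEagles/{
  s|<site>http://www\.PhiladelphiaEagles\.com</site>|<site>http://www.PhiladelphiaPhillies.com</site>|g
  p
  d
}
s|<site>.*</site>||g
EOF

# Invented sample input: one element per line.
cat > "$workdir/teams.xml" <<'EOF'
<team><site>http://www.NYGiants.com</site></team>
<team><site>http://www.DallasCowboys.com</site></team>
<team><site>http://www.PhiladelphiaEagles.com</site></team>
EOF

result=$(sed -f "$workdir/sites.sed" "$workdir/teams.xml")
printf '%s\n' "$result"
```

One caveat: .* is greedy, so if a single line holds more than one <site> element, the last command removes everything from the first <site> to the last </site> on that line. If the element content never contains a < character, a tighter pattern such as <site>[^<]*</site> may be safer.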

Annihilannic.

ml5003 · Jan 4, 2006

Dear Annihilannic,

s/<site>.*<\/site>/ /g worked perfectly for most of the instances where I needed a wildcard: individual strings and short phrases. Would you know how to invoke a wildcard in the s/<site>.*<\/site>/ /g construction to deal with entire paragraphs?

For instance, in the same example, suppose each team website had a lengthy description:

<siteDescription>

whole paragraph describing baseball team site here,

including

line breaks

</siteDescription>

So in s/<siteDescription>.*<\/siteDescription>/ /g, what would replace .* here?

Thanks for all of the excellent advice.

ml5003

Annihilannic · Jan 5, 2006

Could you post some sample data? If the <siteDescription> starts and ends on separate lines it should be easy... but if it can start part way through a line and finish with further data on the same line it's a little more complicated.

Annihilannic.

ml5003 · Jan 5, 2006

Annihilannic,

Here is a sample data segment. Note that the actual XML tag pair, <d104> and </d104>, appears in place of the example <siteDescription>.

SAMPLE DATA

<d104>Preface to the First Edition


Preface to the Second Edition


Preface to the Third Edition


Preface to the Third Edition Revised


Preface to the Fourth Edition


Acknowledgements.


Notation and Definitions^M^M
Introduction to Reliability Engineering^M^M
Reliability Mathematics^M^M
Probability Plotting^M^M
Load-strength Interference^M^M
Statistical Experiments^M^M
Reliability Prediction and Modelling^M^M
Reliability in Design^M^M
Reliability of Mechanical Components and Systems^M^M
Electronic Systems Reliability^M^M
Appendix 1. The Standard Cumulative Normal Distribution Function


Appendix 2. Values of y = exp (-x</I&gt

Appendix 3. Percentiles of the X2 Distribution.


Appendix 4. Values of the F-distribution


Appendix 5. Kolmogorov-Smirnov Tables.


Appendix 6. Rank Tables (Median, 5%, 95%).


Appendix 7. matrix Algebra Revision


Appendix 8. Failure Reporting, Analysis and Corrective Action System (FRACAS).


Appendix 9. Reliability. Maintainability (and Safety) Plan Example


Index^M^M
Index^M^M
^M^M
</d104>

ml5003

cndcadams · Jan 5, 2006

ml5003;

The only input I have here is Annihilannic deserves a star don't you think!!!

ca

Annihilannic · Jan 6, 2006

Oh, that should be fairly easy then, since you can safely junk the lines on which the tags appear. Try:

Code:

/<siteDescription>/,/\/siteDescription/d
/NYGiants/{s/<site>http:\/\/www.NYGiants.com<\/site>/<site>http:\/\/www.NYYankees.com<\/site>/g;p;d;}
/PhiladelphiaEagles/{s/<site>http:\/\/www.PhiladelphiaEagles.com<\/site>/<site>http:\/\/www.PhiladelphiaPhillies.com<\/site>/g;p;d;}
s/<site>.*<\/site>//g

I've just added the first line to delete lines between those two tags.
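A small sketch of the range delete on its own, assuming a POSIX shell and using the <d104> tag from the sample data in place of <siteDescription>. The <d103> and <d105> lines and the file name are invented for illustration:

```shell
#!/bin/sh
workdir=$(mktemp -d "${TMPDIR:-/tmp}/range.XXXXXX")

# Invented sample: a multi-line <d104> element between two lines to keep.
cat > "$workdir/input.xml" <<'EOF'
<d103>kept before</d103>
<d104>Preface to the First Edition

Notation and Definitions
Index
</d104>
<d105>kept after</d105>
EOF

# Delete every line from the one containing <d104> through the one containing </d104>.
result=$(sed '/<d104>/,/<\/d104>/d' "$workdir/input.xml")
printf '%s\n' "$result"
```

This deletes whole lines, so anything else sharing a line with either tag is lost too. Also, sed only looks for the closing pattern starting on the line after the opening match, so an element that opens and closes on the same line would extend the deletion to the next closing tag (or to the end of the file).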

Annihilannic.
