Shell Script: find and replace, ad infinitum

Status
Not open for further replies.

ml5003

Technical User
Jun 22, 2005
7
US
Dear TekTips,

I am interested in a shell script to use in the Darwin UNIX implementation on my Mac (running OS X 10.2.8) that will perform multiple [possibly hundreds of] find and replace operations within a document.

As an example, if I had a document that contained, among other things, the names of teams in the National Football League, and I wanted to replace these with the corresponding team names from Major League Baseball (e.g. NY Giants -> NY Yankees, New England Patriots -> Boston Red Sox, Philadelphia Eagles -> Philadelphia Phillies and so forth), is there a script that could make all of these changes at once? (Or maybe a variation on/modification to sed?)

The script I envision would include all of the find (football team names) and replace (baseball team names) pairs. I would be using it over and over on many documents with similar content.

Thanks,

ml5003
 
You could just construct a replaces file containing:

Code:
s/NY Giants/NY Yankees/g
s/New England Patriots/Boston Red Sox/g
...

And then sed -f /path/to/replaces infile > outfile.

If you're using GNU sed and want to edit in place (rather than redirecting to a new file each time), the -i option does that; check the man page for details.
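As a rough sketch of the replaces-file approach described above: the file names (replaces.sed, infile.txt, outfile.txt) and the sample sentences are made up for illustration, and the third substitution is simply taken from the pairs listed in the question.

```shell
#!/bin/sh
# Sketch only: all file names and the sample text are hypothetical.
workdir=$(mktemp -d)

# One substitution per line; add as many find/replace pairs as needed.
cat > "$workdir/replaces.sed" <<'EOF'
s/NY Giants/NY Yankees/g
s/New England Patriots/Boston Red Sox/g
s/Philadelphia Eagles/Philadelphia Phillies/g
EOF

# A small sample document to run the substitutions against.
cat > "$workdir/infile.txt" <<'EOF'
The NY Giants beat the Philadelphia Eagles.
The New England Patriots had the week off.
EOF

# Apply every substitution in one pass and write the result to a new file.
sed -f "$workdir/replaces.sed" "$workdir/infile.txt" > "$workdir/outfile.txt"
cat "$workdir/outfile.txt"
```

The same replaces file can be reused on any number of documents by changing only the input and output names.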

Annihilannic.
 
/ is a special character here, so you will need to escape it with a preceding \. It would look something like this:

Code:
s/http:\/\/www.NYGiants.com/http:\/\/www.NYYankees.com/g

That should work for ya.
 
Hi,

You can use any character you like as the delimiter.

Samples with the delimiters /, ! and ? respectively:

Code:
s/old string/new string/
s!old/string!new/string!
s?another/old one!?another/new one!?
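To make the difference concrete, here is a small sketch that runs the same URL substitution twice, once with / as the delimiter (slashes escaped) and once with ! (no escaping needed). The sample line and the variable names are invented for illustration.

```shell
#!/bin/sh
# Sketch only: the sample line is hypothetical.
line='See http://www.NYGiants.com for tickets.'

# With / as the delimiter, every / inside the pattern and replacement must be escaped.
escaped=$(printf '%s\n' "$line" |
  sed 's/http:\/\/www\.NYGiants\.com/http:\/\/www.NYYankees.com/g')

# With ! as the delimiter, the slashes are ordinary characters.
plain=$(printf '%s\n' "$line" |
  sed 's!http://www\.NYGiants\.com!http://www.NYYankees.com!g')

printf '%s\n%s\n' "$escaped" "$plain"
```

Both commands should produce the same line; the second is just easier to read when the text being replaced is full of slashes.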
 
Thanks to all of you again for the excellent advice.

Next issue in the saga:

Can I use a wildcard such as * in these constructions?


In other words, using this same football to baseball web site example, but now the web sites are part of an XML file, encapsulated in elements called "site", i.e. <site> and </site>. So the previous replaces code example in XML becomes


s/<site>http:\/\/
s/<site>http:\/\/

I want to get rid of all web sites and their associated tags, <site> and </site>, from the file except the above two, related to NY and Phila. So how can I do the following?

s/<site>http:\/\/s/<site>http:\/\/s/<site>*<\/site>/ /g


Please advise and thanks,

ml5003
 
Try this. I'm no sed wizard, but what it does is execute the substitutions only on lines that match the address string at the beginning, print the output, then delete the pattern space and start the next cycle. If neither address matches, the line falls through to the last command, which simply deletes any remaining matches. Note that you need to use .* in regular expressions to match any number of any character. In your example, >* would match any number of ">" characters.

Code:
/NYGiants/{s/<site>http:\/\/www.NYGiants.com<\/site>/<site>http:\/\/www.NYYankees.com<\/site>/g;p;d;}
/PhiladelphiaEagles/{s/<site>http:\/\/www.PhiladelphiaEagles.com<\/site>/<site>http:\/\/www.PhiladelphiaPhillies.com<\/site>/g;p;d;}
s/<site>.*<\/site>//g
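
A minimal way to try these commands, assuming a made-up input file: the names sites.sed and sites.xml, the DallasCowboys entry and the <name> element are all invented here to show one line that should be blanked and one that should be left alone.

```shell
#!/bin/sh
# Sketch only: file names and sample data are hypothetical.
workdir=$(mktemp -d)

# The three sed commands from the post, saved as a script file.
cat > "$workdir/sites.sed" <<'EOF'
/NYGiants/{s/<site>http:\/\/www.NYGiants.com<\/site>/<site>http:\/\/www.NYYankees.com<\/site>/g;p;d;}
/PhiladelphiaEagles/{s/<site>http:\/\/www.PhiladelphiaEagles.com<\/site>/<site>http:\/\/www.PhiladelphiaPhillies.com<\/site>/g;p;d;}
s/<site>.*<\/site>//g
EOF

# Two sites to rewrite, one site to remove, and one unrelated element.
cat > "$workdir/sites.xml" <<'EOF'
<site>http://www.NYGiants.com</site>
<site>http://www.DallasCowboys.com</site>
<site>http://www.PhiladelphiaEagles.com</site>
<name>Some other element</name>
EOF

result=$(sed -f "$workdir/sites.sed" "$workdir/sites.xml")
printf '%s\n' "$result"
```

The removed element leaves an empty line behind, because only the matched text is deleted and not the line itself. Also note that .* is greedy: if two site elements ever share a line, the single match runs from the first <site> to the last </site> and takes everything in between with it.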

Annihilannic.
 
Dear Annihilannic,

s/<site>.*<\/site>/ /g worked perfectly for most of the instances where I needed a wildcard: individual strings and short phrases. Would you know how to invoke a wildcard in the s/<site>.*<\/site>/ /g construction to deal with entire paragraphs?

Therefore in the same example, if each team website had a lengthy description,

<siteDescription>

whole paragraph describing baseball team site here,

including

line breaks

</siteDescription>

Therefore, s/<siteDescription>.*<\/siteDescription>/ /g

what would replace .* here?

Thanks for all of the excellent advice.

ml5003
 
Could you post some sample data? If the <siteDescription> starts and ends on separate lines it should be easy... but if it can start part way through a line and finish with further data on the same line it's a little more complicated.

Annihilannic.
 
Annihilannic,

Here is a sample data segment. Note that the actual XML tag pair, <d104> and </d104>, appears in place of the example "<siteDescription>".

SAMPLE DATA


<d104>Preface to the First Edition

&lt;p&gt;
Preface to the Second Edition

&lt;p&gt;
Preface to the Third Edition

&lt;p&gt;
Preface to the Third Edition Revised

&lt;p&gt;
Preface to the Fourth Edition

&lt;p&gt;
Acknowledgements.

&lt;p&gt;
&lt;P&gt;Notation and Definitions^M^M
&lt;P&gt;Introduction to Reliability Engineering^M^M
&lt;P&gt;Reliability Mathematics^M^M
&lt;P&gt;Probability Plotting^M^M
&lt;P&gt;Load-strength Interference^M^M
&lt;P&gt;Statistical Experiments^M^M
&lt;P&gt;Reliability Prediction and Modelling^M^M
&lt;P&gt;Reliability in Design^M^M
&lt;P&gt;Reliability of Mechanical Components and Systems^M^M
&lt;P&gt;Electronic Systems Reliability^M^M
&lt;P&gt;Appendix 1. The Standard Cumulative Normal Distribution Function

&lt;p&gt;
Appendix 2. Values of &lt;I&gt;y&lt;/I&gt; = exp (-&lt;I&gt;x&lt;/I&gt;)

&lt;p&gt;
Appendix 3. Percentiles of the &lt;I&gt;X&lt;/I&gt;&lt;sup&gt;2&lt;/sup&gt; Distribution.

&lt;p&gt;
Appendix 4. Values of the &lt;I&gt;F&lt;/I&gt;-distribution

&lt;p&gt;
Appendix 5. Kolmogorov-Smirnov Tables.

&lt;p&gt;
Appendix 6. Rank Tables (Median, 5%, 95%).

&lt;p&gt;
Appendix 7. matrix Algebra Revision

&lt;p&gt;
Appendix 8. Failure Reporting, Analysis and Corrective Action System (FRACAS).

&lt;p&gt;
Appendix 9. Reliability. Maintainability (and Safety) Plan Example

&lt;p&gt;
Index^M^M
&lt;P&gt;Index^M^M
^M^M
</d104>


ml5003
 
ml5003;

The only input I have here is Annihilannic deserves a star don't you think!!!

ca
 
Oh, that should be fairly easy then, since you can safely junk the lines on which the tags appear. Try:

Code:
/<siteDescription>/,/\/siteDescription/d
/NYGiants/{s/<site>http:\/\/www.NYGiants.com<\/site>/<site>http:\/\/www.NYYankees.com<\/site>/g;p;d;}
/PhiladelphiaEagles/{s/<site>http:\/\/www.PhiladelphiaEagles.com<\/site>/<site>http:\/\/www.PhiladelphiaPhillies.com<\/site>/g;p;d;}
s/<site>.*<\/site>//g

I've just added the first line to delete lines between those two tags.
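
For the posted sample data, the same range-delete idea would presumably use the real tag name, d104, in place of siteDescription. Here is a small sketch under that assumption; the file name sample.xml, the <title> line and the shortened d104 body are invented for illustration.

```shell
#!/bin/sh
# Sketch only: the file name and the abbreviated sample data are hypothetical.
workdir=$(mktemp -d)

cat > "$workdir/sample.xml" <<'EOF'
<title>Sample title</title>
<d104>Preface to the First Edition

&lt;p&gt;
Index
</d104>
<site>http://www.NYGiants.com</site>
EOF

# Delete every line from the one containing <d104> through the one containing </d104>.
result=$(sed '/<d104>/,/<\/d104>/d' "$workdir/sample.xml")
printf '%s\n' "$result"
```

Because the range works on whole lines, anything else sharing a line with the opening or closing tag is deleted too, which is fine only when the tags sit on lines that can be discarded.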

Annihilannic.
 