dbeez

Technical User
Aug 15, 2005
71
KR
I'm looking for a simple command that will run down a directory structure and print out the name of any file containing a certain string.

I've tried using find, but it doesn't want to work and keeps giving me errors.

What is the easiest way to do this ??

Thanks
 
it doesn't want to work and keeps giving me errors
Any chance you could post the error messages and the code you tried ?

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Hi ph,

I think I've got that bit now, but I have a new problem (of course!). My code this time looks like this
Code:
if [ (grep string `find . -print`) != 0 ]
then <search for email address>
<print email address to file>
fi
... so what I want to do basically is run down the directory structure and find pages with string in them, then to write the email addresses contained in those pages to a file.

... statement at the top is obviously messed up, but I don't know what kinda return statement to expect from a grep that is unsuccessful. I'm guessing that it would be 'exit 0'

any help here is gratefully accepted :)
 
You may try something like this:
find . -type f | while read f
do grep -q 'string' "$f" && grep '@' "$f"
done > /path/to/output
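
On the earlier question about what grep returns: as far as I know, grep reports its result through its exit status rather than a printed value, with 0 for a match, 1 for no match and something greater than 1 for an error. A minimal sketch of testing that directly in an if, using a throwaway sample file (the file and its content are made up for illustration):

```shell
# Throwaway sample file so the sketch is self-contained (hypothetical content).
sample=$(mktemp)
printf 'a line with string in it\n' > "$sample"

# grep -q prints nothing; only its exit status is of interest here.
# Each status starts at 0 and is overwritten only when grep exits non-zero.
found_status=0
grep -q 'string' "$sample" || found_status=$?

missing_status=0
grep -q 'no-such-text' "$sample" || missing_status=$?

echo "match: $found_status, no match: $missing_status"

# "if" tests the exit status of the command itself, so no brackets
# or comparison against 0 are needed.
if grep -q 'string' "$sample"
then
    echo "the sample file contains the string"
fi

rm -f "$sample"
```

The same idea is what makes the `grep -q ... && ...` form in the loop above work: the second command only runs when the first one exits with 0.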

Hope This Helps, PH.
 
thanks phv,

that helps enormously with scrolling through the files. I'm still trying to pick out the addresses though.
So far I've got
Code:
find . -type f | while read f
do grep -q 'string' "$f" && sed 's/.*mailto://;s/".*//' "$f" | grep '@'
done > file.email
... this is about as close as I've gotten. Unfortunately though it still gives me lines with @ and no email address.

... so what I'm doing here is looking for the email address wrappers (eg. mailto:) and then excluding them. What I really want to do is look for the email address and then exclude everything else.

Can't seem to be able to do that though. Frankly I'm finding man sed a little confusing at the moment.

thanks again ...
 
Code:
find . -type f -print0 | xargs -0 grep -i 'mailto:.*string' | sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/i'

Or, if you have a grep that supports -r for recursive searches:

Code:
grep -ri 'mailto:.*string' . | sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/i'

Breaking down the sed command:

s/ search and replace

\([^:]*\) all the non-colon characters before the first colon (i.e. the filename)

.*mailto: the line up to and including the mailto: component (to be discarded)

\([^" >]*\) the email address up to the first space, speech mark or tag terminator

.* the rest of the line (to be discarded)

/

\1: \2 replace it with the filename \1 and the matching email address \2. \1 and \2 are derived from the bracketed components of the search string.

/i and ignore case.
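
To see the breakdown in action, here is a small sketch that feeds one sample line through the same expression. The file name and address are invented for illustration, and the trailing i flag is left off because not every sed accepts it:

```shell
# Sample line in the "filename:matching line" form that a recursive grep prints
# (file name and address are made up for illustration).
line='./page.html:<a href="mailto:someone@example.com">string</a>'

# \1 captures everything before the first colon (the file name),
# \2 captures the address after mailto: up to a quote, space or >.
result=$(printf '%s\n' "$line" |
    sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/')

echo "$result"
```

Because `.*mailto:` is greedy, a line holding several mailto: links would only yield the last address on that line.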

Annihilannic.
 
X-cellent ... thanks feherke

I had most of this script done a couple of weeks ago but lost it when I reinstalled and forgot to save it.

I'll read over the reference in the morning ... I may still have a few questions later on though :)

thanks again guys ...
 
thanks anni*

I can't seem to get your code to work unfortunately, it just spits out
Code:
root@ubuntu:/home/babo/Desktop/spider_proj # grep -ri 'mailto:.*string' . | sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/i'
./spider: \([^
./spider: \([^
... also, I'm looking to parse a file that matches my string, as opposed to a line that matches my string.

... thanks for explanation though, even though I do have one question. About the
Code:
\1: \2 replace it with the filename \1 and the matching email address \2.  \1 and \2 are derived from the bracketed components of the search string.
... I realize that it is derived from the brackets ... but what exactly is it meant to do ???

... also what exactly does the -e switch do ... I realize that it includes a script file in the command, but why do people use it even when they aren't using a script file ???

I realize that these are probably really dumb questions - so sorry in advance
 
Ok cool, I think I've found a good text on sed ...

Feherke I think your suggestion was missing quite a bit of the sed command as well. The best one I've come up with is probably


... hopefully I'll be able to take it myself from here ... thanks guys
 
dbeez,

I think it worked, however it found a matching string in ./spider, which of course was part of the sed command, not an email address.

The output of my commands should be something like:

./filename: email@address.one
./subdir/filename2: email@address.two

\1, \2, \... are used for pulling chunks out of the matched
string and putting them into the replacement. Some more examples:

s/[^0-9]*\([0-9][0-9]*\).*/\1/ would pull the first number out of a line.

s/\([a-z]*\) \([a-z]*\)/\2 \1/ would swap around two lower-case strings, e.g. "apples oranges" would become "oranges apples".

These work in vi too of course.
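
A quick way to check how \1 and \2 behave is to pipe a sample line through sed. The sketch below uses two illustrative expressions with the repetition placed inside the brackets, so each group captures a whole word or number rather than a single character (the sample strings are made up):

```shell
# Swap two lower-case words: \1 is the first word, \2 the second.
swapped=$(echo 'apples oranges' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
echo "$swapped"

# Keep only the first run of digits: skip the leading non-digits,
# capture the digits as \1, discard the rest of the line.
first_number=$(echo 'abc 123 def 456' | sed 's/[^0-9]*\([0-9][0-9]*\).*/\1/')
echo "$first_number"
```

Whatever the bracketed part matched is what gets pasted wherever \1 or \2 appears in the replacement.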

Regarding -e, maybe this chunk of man page (from Solaris) will clarify:

-e script
    script is an edit command for sed. See USAGE below for
    more information on the format of script. If there is
    just one -e option and no -f options, the flag -e may
    be omitted.
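
In other words, -e just marks the next argument as an edit command, and it only becomes necessary once there is more than one. A small sketch with made-up sample text:

```shell
# With a single edit command, -e is optional: these two do the same thing.
without_e=$(echo 'hello world' | sed 's/hello/goodbye/')
with_e=$(echo 'hello world' | sed -e 's/hello/goodbye/')

# With several edit commands, each one gets its own -e and they are
# applied to every line in the order given.
both=$(echo 'hello world' | sed -e 's/hello/goodbye/' -e 's/world/moon/')

echo "$without_e"
echo "$with_e"
echo "$both"
```

People often type -e out of habit even for a single command; it does no harm.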

I'll leave adapting my solutions to your requirements as an exercise for you since you seem to be well on the way there. :)
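
For anyone who wants a starting point, one possible adaptation to the "whole file matches, not just the line" requirement is sketched below. It is only a sketch under assumptions: the function name extract_addresses and its two arguments are hypothetical, addresses are assumed to sit behind mailto:, and file names are assumed not to contain newlines.

```shell
# Hypothetical helper: $1 = directory to search, $2 = string the file must contain.
extract_addresses() {
    find "$1" -type f | while IFS= read -r f
    do
        # Only look for addresses in files that contain the string somewhere.
        if grep -q -- "$2" "$f"
        then
            # -n plus the p flag prints only lines where the substitution
            # happened, i.e. lines that held a mailto: address.
            # The greedy .* means only the last mailto: on a line is kept.
            sed -n 's/.*mailto:\([^" >]*\).*/\1/p' "$f" |
            while IFS= read -r addr
            do
                printf '%s: %s\n' "$f" "$addr"
            done
        fi
    done
}
```

It could be called as `extract_addresses . string > file.email`, which would give one "filename: address" line per address found.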

Annihilannic.
 