Text search ..

dbeez · Nov 4, 2005

I'm looking for a simple command that will run down a directory structure and print out the name of any file containing a certain string.

I've tried using find, but it doesn't want to work and keeps giving me errors.

What is the easiest way to do this ??

Thanks

PHV · Nov 4, 2005

dbeez said:
it doesn't want to work and keeps giving me errors
Any chance you could post the error messages and the code you tried ?

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886

dbeez · Nov 4, 2005

Hi ph,

I think I've got that bit now, but I have a new problem (of course!). My code this time looks like this

Code:

if [ (grep string `find . -print`) != 0 ]
then <search for email address>
<print email address to file>
fi

... so what I want to do basically is run down the directory structure and find pages with string in them, then to write the email addresses contained in those pages to a file.

... statement at the top is obviously messed up, but I don't know what kinda return statement to expect from a grep that is unsuccessful. I'm guessing that it would be 'exit 0'

any help here is gratefully accepted

PHV · Nov 4, 2005

You may try something like this:

Code:

find . -type f | while IFS= read -r f
do grep -q 'string' "$f" && grep '@' "$f"
done > /path/to/output
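
On the earlier question of what an unsuccessful grep returns: grep reports its result through its exit status, not through printed output. The status is 0 when at least one line matched, 1 when nothing matched, and greater than 1 when an error occurred (an unreadable file, for instance). The sketch below shows this in a throwaway directory; the file names hit.html and miss.html are invented purely for illustration.

```shell
#!/bin/sh
# Throwaway sandbox; file names and contents are made up for the demo.
tmp=$(mktemp -d)
printf 'contains string here\n' > "$tmp/hit.html"
printf 'nothing relevant\n' > "$tmp/miss.html"

hit_status=0
grep -q 'string' "$tmp/hit.html" || hit_status=$?     # stays 0: a line matched

miss_status=0
grep -q 'string' "$tmp/miss.html" || miss_status=$?   # 1: no line matched

err_status=0
grep -q 'string' "$tmp/no-such-file" 2>/dev/null || err_status=$?   # >1: error

# "if" tests the exit status directly; no [ ] or back-ticks are needed.
if grep -q 'string' "$tmp/hit.html"; then
    verdict="match"
else
    verdict="no match"
fi

echo "hit=$hit_status miss=$miss_status err=$err_status verdict=$verdict"
rm -rf "$tmp"
```

This is why the && in the loop above works: the second grep only runs when the first one exits with status 0.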

Hope This Helps, PH.

dbeez · Nov 6, 2005

thanks phv,

that helps enormously with scrolling through the files. I'm still trying to pick out the addresses though.
So far I've got

Code:

find . -type f | while read f
do grep -q 'string' "$f" && sed 's/.*mailto://;s/".*//' "$f" | grep '@'
done > file.email

... this is about as close as I've gotten. Unfortunately though it still gives me lines with @ and no email address.

... so what I'm doing here is looking for the email address wrappers (e.g. mailto:) and then excluding them. What I really want to do is look for the email address and then exclude everything else.

Can't seem to be able to do that though. Frankly I'm finding man sed a little confusing at the moment.

thanks again ...

feherke · Nov 6, 2005

Hi

dbeez said:
Frankly I'm finding man sed a little confusing at the moment.

Right. And incomplete too. At least my GNU [tt]sed[/tt]'s man page does not mention all features. The online manual is better documentation:

http://www.gnu.org/software/sed/manual/html_mono/sed.html

Feherke.

http://rootshell.be/~feherke/

Annihilannic · Nov 7, 2005

Code:

find . -type f -print0 | xargs -0 grep -i 'mailto:.*string' | sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/i'

Or, if you have a grep that supports -r for recursive searches:

Code:

grep -ri 'mailto:.*string' . | sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/i'

Breaking down the sed command:

s/ search and replace

\([^:]*\) all the non-colon characters before the first colon (i.e. the filename)

.*mailto: the line up to and including the mailto: component (to be discarded)

\([^" >]*\) the email address up to the first space, speech mark or tag terminator

.* the rest of the line (to be discarded)

/

\1: \2 replace it with the filename \1 and the matching email address \2. \1 and \2 are derived from the bracketed components of the search string.

/i and ignore case.
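
To see the pieces working together, here is a small sketch that feeds one line through the substitution. The sample line is invented; it only imitates the "filename:matched line" shape that a recursive grep prints. The trailing i flag is left off because it is an extension that not every sed accepts.

```shell
#!/bin/sh
# A made-up line in the "filename:matched line" shape printed by grep -r.
line='./page.html:<a href="mailto:someone@example.com">contact string</a>'

# Group 1 captures the filename, group 2 the address after mailto:.
result=$(printf '%s\n' "$line" |
    sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/')

echo "$result"
```

Because .* is greedy, a line holding several mailto: links would yield only the last address on that line.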

Annihilannic.

dbeez · Nov 7, 2005

X-cellent ... thanks feherke

I had most of this script done a couple of weeks ago but lost it when I reinstalled and forgot to save it.

I'll read over the reference in the morning ... I may still have a few questions later on though

thanks again guys ...

dbeez · Nov 7, 2005

thanks anni*

I can't seem to get your code to work unfortunately, it just spits out

Code:

root@ubuntu:/home/babo/Desktop/spider_proj # grep -ri 'mailto:.*string' . | sed 's/\([^:]*\):.*mailto:\([^" >]*\).*/\1: \2/i'
./spider: \([^
./spider: \([^

... also, I'm looking to parse a file that matches my string, as opposed to a line that matches my string.

... thanks for explanation though, even though I do have one question. About the

Code:

\1: \2 replace it with the filename \1 and the matching email address \2.  \1 and \2 are derived from the bracketed components of the search string.

... I realize that it is derived from the brackets ... but what exactly is it meant to do ???

... also what exactly does the -e switch do ... I realize that it includes a script file in the command, but why do people use it even when they aren't using a script file ???

I realize that these are probably really dumb questions - so sorry in advance

dbeez · Nov 7, 2005

Ok cool, I think I've found a good text on sed ...

Feherke, I think your suggestion was missing quite a bit of the sed command as well. The best one I've come up with is probably

http://www.grymoire.com/Unix/Sed.html

... hopefully I'll be able to take it myself from here ... thanks guys

Annihilannic · Nov 8, 2005

dbeez,

I think it worked, however it found a matching string in ./spider, which of course was part of the sed command, not an email address.

The output of my commands should be something like:

[tt]./filename: email@address.one
./subdir/filename2: email@address.two[/tt]

\1, \2, ... are used for pulling chunks out of the matched string and putting them into the replacement. Some more examples:

s/[^0-9]*\([0-9][0-9]*\).*/\1/ would pull the first occurrence of a number out of a line.

s/\([a-z]*\) \([a-z]*\)/\2 \1/ would swap around two lower-case strings, e.g. "apples oranges" would become "oranges apples".

These work in vi too of course.
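
A quick way to get a feel for back-references is to run them on short test strings. The sketch below uses invented input text; each \( \) pair in the pattern is numbered from the left, and \1 or \2 in the replacement stands for whatever that pair matched.

```shell
#!/bin/sh
# \1 is the text matched by the first \( \) pair: the first run of digits.
first_number=$(echo 'order 42 shipped in 7 days' |
    sed 's/[^0-9]*\([0-9][0-9]*\).*/\1/')

# \1 and \2 may be emitted in any order, which swaps the two words.
swapped=$(echo 'apples oranges' |
    sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

echo "$first_number"
echo "$swapped"
```

Note the leading [^0-9]* in the first command: starting with .* instead would swallow the digits, since .* matches as much as it can.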

Regarding -e, maybe this chunk of man page (from Solaris) will clarify:

[tt] -e script
script is an edit command for sed. See USAGE below for
more information on the format of script. If there is
just one -e option and no -f options, the flag -e may
be omitted.[/tt]
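
In other words, -e marks the next argument as a sed command given on the command line, whereas -f is the option that reads commands from a file. A minimal sketch with made-up input, showing that -e is optional for a single command and useful for chaining several:

```shell
#!/bin/sh
# With a single script, -e is optional: these two commands are equivalent.
implicit=$(echo 'hello world' | sed 's/hello/goodbye/')
explicit=$(echo 'hello world' | sed -e 's/hello/goodbye/')

# Several -e options are applied in order to each input line.
chained=$(echo 'hello world' | sed -e 's/hello/goodbye/' -e 's/world/moon/')

echo "$implicit"
echo "$explicit"
echo "$chained"
```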

I'll leave adapting my solutions to your requirements as an exercise for you since you seem to be well on the way there.
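
For anyone following along, one possible adaptation to the "whole file matches" requirement is sketched below. It is only one way to do it, not necessarily what was intended above: grep -l lists the files that contain the string anywhere, and sed -n with the p flag prints only the lines where the mailto: substitution actually succeeded. The sandbox directory, file names and addresses are all invented for the demonstration.

```shell
#!/bin/sh
# Throwaway sandbox with invented pages; only a.html contains "string".
tmp=$(mktemp -d)
mkdir "$tmp/sub"
printf '%s\n' 'some string here' \
    '<a href="mailto:one@example.com">one</a>' > "$tmp/a.html"
printf '%s\n' 'no keyword' \
    '<a href="mailto:two@example.com">two</a>' > "$tmp/sub/b.html"

# grep -l names each file that contains the string; sed -n ... p then
# prints only the lines where the substitution found a mailto: address.
emails=$(find "$tmp" -type f -exec grep -l 'string' {} + |
    while IFS= read -r f
    do
        sed -n 's/.*mailto:\([^" >]*\).*/\1/p' "$f"
    done)

echo "$emails"
rm -rf "$tmp"
```

The -n and p combination is what stops unrelated lines containing @ from slipping through. The sketch assumes file names without embedded newlines and picks up at most one address per line.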

Annihilannic.
