
Trying to understand the grep statement


786snow

Programmer
Nov 12, 2006
I am new to Unix, and I am trying to understand this grep statement.

What does -name mean here?
Also, what is xargs?

And "topic"?


find . -name "*" | xargs grep -i "topic"
 
As sleipnir214 pointed out, you should look at the man pages. But... with that being said, I have found that the man pages don't always explain a concept well, so...


Code:
find . -name "*" | xargs grep -i "topic"

1.) find . -name "*"
... looks for all files in the current directory and all subdirs under the current directory
2.) xargs grep -i "topic"
... xargs takes any output generated by the previous command (find in this case) and passes it to grep
... grep then checks each item to see if the word "topic" is found; if found, it displays it.
... the -i in the grep command tells it to ignore case.
 
thanks a lot for explaining so well.

One question I have: you said

1.) find . -name "*"
... looks for all files in the current directory and all subdirs under the current directory

What is the advantage of doing that when you have *?
And when sending that output to grep, does it make it faster?
 
In this case the advantage of

find . -name "*"

is that you will find all files that may have "topic" as part of the file name. The grep is used to toss out all non matching entries.
 
xargs is used to break a large input string into smaller pieces, which are then passed on to the command after it.

If I had a folder with around 30k files in it and I did a rm -rf *, the rm command is going to barf because the list of names for 30k files is too big/long to be processed. This list of 30k file names can however be passed into xargs and broken down. xargs will then call rm 30000 times, each time with a different name.

--== Anything can go wrong. It's just a matter of how far wrong it will go till people think its right. ==--
 
@sbrew:
grep will not look for filenames which somehow match 'topic' (for that you would use
Code:
find . -iname "*topic*"
), but will grep for "topic" in the files provided by find.

Since "*" will match every file, specifying it is superfluous.
Code:
find . | xargs grep -i "topic"
will do the same, and xargs isn't needed either; the -exec switch would do it too:
Code:
find . -exec grep -i "topic" {} \;

don't visit my homepage:
 
Stefanwagner - you are absolutely correct... someone must've whacked me upside the head. [banghead]

to fix my explanation from above:

2.) xargs grep -i "topic"
... xargs takes any output generated by the previous command (find in this case) and passes it to grep
... grep then proceeds to check each file to see if the word "topic" is found in it. If found, display the file name and the matching line from the file.
... the -i in the grep command tells it to ignore case.
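A harmless way to see this corrected behavior is to run the pipeline against a throwaway directory (the directory and file name below are made up for illustration):

```shell
# Build a scratch directory with one matching file, then run the pipeline.
dir=$(mktemp -d)
printf 'On Topic\nnothing here\n' > "$dir/notes.txt"
# grep will warn on the directory entry itself (harmless); for the file,
# it prints the file name and the matching line.
find "$dir" -name "*" | xargs grep -i "topic"
```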
 
zeland

xargs is used to break a large input strings into smaller pieces which is then passed on to the command after it.

Yes and no. It takes the output of anything on stdin and makes it parameters to the command given to it. There is no breaking.

Common usage in scripts is when you need to find files that have certain attributes and do stuff to them when you find them. For instance, to remove orphaned packages in Debian:

deborphan | xargs apt-get remove --purge

If I had a folder with around 30k files in it and I did a rm -rf *, the rm command is going to barf because the list of names for 30k files is too big/long to be processed. This list of 30k file names can however be passed into xargs and broken down. xargs will then call rm 30000 times, each time with a different name.

I call BS. The rm -rf * will work to erase any number of files without any issue. The results of running:

find . | xargs rm -f

are the same as running:
rm -rf *

which will do exactly what it looks like: remove, with recursive force, all files and directories from the current directory which you have write permission to and that aren't currently locked by file pointers (actively being accessed; simply having the file open in an editor isn't enough, it must be in the process of saving or opening).

In the parent post, he uses find for recursion, not out of fear of the directory's size, because he didn't know about the -R option of grep. See above in the thread.

 
@jstreich:
Well - I sometimes had an issue with a command:
CMD *
because the shell interprets the asterisk and hands the results to the command. But in
CMD .
the . is not interpreted, and that worked.
I can't remember which command it was, but I was operating on the free CD database, which stores millions of files in flat directories - I never reached those bounds before or later.

don't visit my homepage:
 
jstreich:
It takes the output of anything on stdin and makes it parameters to the command given to it. There is no breaking.

I'm not sure if "breaking" was the right word to use, but from my understanding the code below:
Code:
echo "00000001 00000002 00000003" | xargs rm -f
would result in:
Code:
rm -f 00000001
rm -f 00000002
rm -f 00000003
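One harmless way to check which of these actually happens is to put echo in front of rm, so the command line is printed instead of executed (a sketch; the file names are made up):

```shell
# Default behavior: xargs packs all arguments into a single invocation.
echo "00000001 00000002 00000003" | xargs echo rm -f
# rm -f 00000001 00000002 00000003

# With -n 1, xargs really does run the command once per argument.
echo "00000001 00000002 00000003" | xargs -n 1 echo rm -f
# rm -f 00000001
# rm -f 00000002
# rm -f 00000003
```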


I call BS. The rm -rf * will work to erase any number of files without any issue.

This could be a problem on all of my FC4, FC5 & FC6 boxes. Test this out on your distro of choice to see if you get the same error:
Code:
[root@localhost test]$ for i in $(seq 1 40000); do j=$(printf %08d $i); touch $j; done
[root@localhost test]$ rm -rf *
bash: /bin/rm: Argument list too long

--== Anything can go wrong. It's just a matter of how far wrong it will go till people think its right. ==--
 
Zeland:
Code:
echo "00000001 00000002 00000003" | xargs rm -f
results in:
Code:
rm -f  00000001 00000002 00000003

Example:
Code:
deborphan | xargs apt-get remove --purge
It feeds the output of deborphan to apt-get, and stops for just a single confirmation if one is required, even if there are multiple orphaned libraries.
I would show the output, but my system is doing an "apt-get upgrade" that forced my computer to restart networking so I'm stuck at work without a connection to my main box.

 
jstreich:

Being unsatisfied with the fact that find . | xargs rm -f is the same as rm -f * (which definitely doesn't work on any of my computers with a flat folder of over 30k files), I did some additional poking around.

I went back to my example above and created a folder with 80k files of 8 characters each. I did a straight rm -f * just to ensure I would not be making a fool of myself later, and was pleased that it failed as expected. I then wrote a shell script that simply displays the first parameter passed to it.


showCall.sh
Code:
#!/bin/bash
echo "* - $1"

Now for the test. find . | xargs ../showCall.sh gives me the results below:

* - .
* - ./00011569
* - ./00023138
* - ./00034707
* - ./00046276
* - ./00057845
* - ./00069414

This shows me that my shell script was called 7 times in total, and that the number of parameters in each call is 11568. This also tells me that this could be a buffer-size issue. Calculating the batch size from the difference between two consecutive results:

(filename(8 Char) + ./(2 Char) + newline(1 Char)) * 11568 = 127248 bytes

I tried this on a folder of 80k files of 12 characters each.

* - .
* - ./000000008484
* - ./000000016967
* - ./000000025450
* - ./000000033933
* - ./000000042416
* - ./000000050899
* - ./000000059382
* - ./000000067865
* - ./000000076348

Parameter length (15) * number of parameters per call (8482) = 127230 bytes, which is close enough to the first test.

From this test I'm inclined to conclude that xargs breaks the input from stdin into approximately 128 KB blocks, which are then passed on to the command for execution.
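The same batching can be reproduced without creating a single file: pipe a long stream of tokens through xargs echo and count the output lines, since each line corresponds to one invocation of echo (the token count below is arbitrary; the exact batch size varies by system and xargs version):

```shell
# ~1.2 MB of input exceeds the command-line buffer, so echo is
# invoked several times -- each invocation produces one output line.
seq 1 200000 | xargs echo | wc -l
```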

--== Anything can go wrong. It's just a matter of how far wrong it will go till people think its right. ==--
 
@sbrew:
Code:
find . -name "*" | xargs grep -i "topic"
does not work on filenames containing spaces.

Code:
find . -name "*" -exec grep -i "topic" {} \;
does.
and
Code:
grep -iR "topic" .
too.

No - I'm not a fan of spaces in filenames. ;)
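For completeness: with GNU find and xargs, the usual fix for spaces (and even newlines) in filenames is to delimit the names with NUL bytes, using -print0 on the find side and -0 on the xargs side:

```shell
# NUL-delimited names survive spaces, tabs and newlines in filenames.
find . -type f -print0 | xargs -0 grep -i "topic"
```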

don't visit my homepage:
 
If you run this command,
Code:
grep ARG_MAX /usr/include/linux/limits.h
you'll see a line that defines the constant ARG_MAX. The Linux kernel is compiled with a similar header file that also defines that constant, which is 131072 for me. It's accompanied by the comment:
Code:
# bytes of args + environ for exec()

ARG_MAX defines the amount of memory the kernel reserves for an exec call. This amount can't be changed during runtime (not on a stock Linux kernel, anyway; I think some other Unixes can change this value at runtime).

I believe this limit applies to the combined size of the arguments, and it also includes environment "arguments" (e.g. "HOME=/home/chip" and "TERM=xterm").


A command like
Code:
rm -rf *
when run in a directory containing a large number of files, can expand to a command longer than ARG_MAX. The shell can request more memory to hold all that, but that's useless if the kernel can't handle a line that long.

So such a command will indeed fail. On my Debian Etch:
Code:
chip@horus:~$ ls /* /*/* /*/*/* /*/*/*/*
bash: /bin/ls: Argument list too long
Notice that bash itself gives that error message about /bin/ls; it couldn't exec the ls process (or more likely it didn't even try) because the argument list was too long.


So when necessary, xargs does split up its arguments into multiple invocations that are small enough to exec.
 
chipperMDW: "If you run this command, ..."
-Me?

If I do a
Code:
ls  *
in a directory containing 40000 files, I get an argument-list-too-long (while sharing the 131072 value with you, found in /usr/src/linux/... too).

The -exec part of find has no problems with ARG_MAX, since it executes the command on every single file it finds.
Code:
find ./ -exec ls  {} \; | wc
has no problems finding 140 000 files.
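There is also a middle ground between the two: POSIX find can terminate -exec with + instead of \;, which makes find itself batch file names the way xargs does, so the command runs once per batch rather than once per file:

```shell
# One ls invocation per batch of files instead of one per file;
# the output is the same list of names.
find ./ -type f -exec ls {} + | wc -l
```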

don't visit my homepage:
 
Correct. Your command runs ls individually on each file found, so the command lines are tiny. No problem with that.


I was responding to this:
jstreich said:
I call BS. The rm -rf * will work to erase any number of files without any issue. The results of running:

find . | xargs rm -f

are the same as running:
rm -rf *
This is incorrect. As others had mentioned, there is a difference. I was confirming that there is a difference and explaining the reason for that difference.
 