too many pipes, can you streamline this?

Status
Not open for further replies.
Feb 12, 2002
Hi,

I have never got to grips with using awk for anything other than one-liners; when people show me more complicated awk scripts, they never seem to make much sense to me.
But now I have created a daft "one-liner" and want to see if anyone can
a) streamline it so that it's not just a series of awk commands piped into one another
b) write it so that it makes some sort of sense

I have a whole set of Unix file listings that were generated using "ls -ls", giving the full file paths across multiple directories. Basically, I am trying to get a list of the first part of each file name listed, excluding some entries.

The following does pretty much what I want; it just takes a long time to run, and it bugs me that it's so stupid!
Code:
 egrep -vf exclude.list */*content* | awk '{print $11}' | awk -F"/" '{print $NF}' | awk -F"." '{print $1}' | awk -F"_" '{print $1}' | sort -u
Explanation:
Code:
egrep -vf exclude.list */*content*
... exclude every line matching a pattern in the 'exclude.list' file
Code:
awk '{print $11}'
... take the 11th field (the full path)
Code:
awk -F"/" '{print $NF}'
... delim by slash and take the last field (the filename)
Code:
awk -F"." '{print $1}'
... strip off the file extension
Code:
awk -F"_" '{print $1}'
... some BUT NOT ALL file names have an underscore part before the extension, so only take the first part of the file name
Code:
sort -u
... sort it into a unique listing
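For what it's worth, here is one way the four awk stages could collapse into a single awk program. It's a sketch, not a drop-in: it assumes the path is always field 11 (as in the sample listing below), keeps the exclude step as grep (`grep -E -v -f` is the standard spelling of `egrep -vf`), and `first_part` is just a made-up name for the helper.

```shell
# first_part is a made-up helper name. It reads "ls -ls" style lines (from stdin
# or from the files named as arguments) and prints the leading part of each filename.
first_part() {
    awk '{
        n = split($11, parts, "/")   # field 11 = full path, as in the original pipeline
        name = parts[n]              # last "/" component = the filename
        sub(/[._].*/, "", name)      # drop everything from the first "." or "_" onwards
        print name
    }' "$@"
}

# Usage, with exclude.list and */*content* as in the question:
# grep -E -v -f exclude.list */*content* | first_part | sort -u
```

Cutting at the first "." or "_" in one `sub()` gives the same result as the separate `-F"."` and `-F"_"` stages, since together they just keep whatever comes before the first occurrence of either character.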

So, erm - yeah, it *seems* to work. But I'm sure you'll agree, it's a crazy one liner!!

Anyone willing to make it better??


Example input file:
Code:
266965  136 -rw-------   1 sam      samadm     130082 Apr 28 13:54 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/ebcdic_headers/C00093313.ebcdic
266966   48 -rw-------   1 sam      samadm      49029 Apr 28 13:54 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/ebcdic_headers/C00093314.ebcdic
266944    1 drwxrwxr-x   2 sam      samadm       1024 Apr 28 13:54 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/ebcdic_headers
248343 113672 -rw-------   1 sam      samadm   116331600 Apr 28 13:54 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/C00093313_tl5240.sgy
248344 42640 -rw-------   1 sam      samadm   43626600 Apr 28 13:54 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/C00093314_tl5240.sgy
248320    1 drwxrwxr-x   4 sam      samadm       1024 Apr 28 13:54 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060
279381    1 -rw-------   1 sam      samadm         31 Apr 28 13:59 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.STKZER_060/stats/C00093345.stats
279382    1 -rw-------   1 sam      samadm         30 Apr 28 13:59 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.STKZER_060/stats/C00093346.stats
279360    1 drwxrwxr-x   2 sam      samadm       1024 Apr 28 13:59 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.STKZER_060/stats
285569  136 -rw-------   1 sam      samadm     130080 Apr 28 13:55 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.STKZER_060/ebcdic_headers/C00093325.ebcdic
 
Amendment:

It would help if the output had the OPTION of showing the part I want (the first part of the filename, without the underscore part or "." extension) AS WELL AS the full file path, as a QC method.

I notice that my output has some dodgy results, and to check them I need to know which original file path each dodgy result came from.

So output something like this:
Code:
C00093313 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/ebcdic_headers/C00093313.ebcdic
C00093314 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/ebcdic_headers/C00093314.ebcdic
C00093313 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/C00093313_tl5240.sgy
C00093314 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/C00093314_tl5240.sgy
C00093345 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.STKZER_060/stats/C00093345.stats
C00093346 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.STKZER_060/stats/C00093346.stats
C00093325 /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.STKZER_060/ebcdic_headers/C00093325.ebcdic
ebcdic /scratch2/no_archive/nlthoa/PB0001/L-148/L-148.3D.MIGZER_060/ebcdic_headers

(My exclude list removes directory entries as well as some others)
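A possible sketch of the QC option, building on a single awk: passing `-p` to the hypothetical `first_part_qc` function makes it print the full path next to each name, in the format shown above. The `-p` flag and the function name are invented for this example, and field 11 is again assumed to be the path.

```shell
# first_part_qc is a made-up helper name; the -p flag is invented for this sketch.
# Without -p it prints just the name; with -p it prints "name path".
first_part_qc() {
    show_path=0
    if [ "${1-}" = "-p" ]; then
        show_path=1
        shift
    fi
    awk -v show_path="$show_path" '{
        n = split($11, parts, "/")   # field 11 = full path
        name = parts[n]              # last "/" component = the filename
        sub(/[._].*/, "", name)      # keep only what comes before the first "." or "_"
        if (show_path == 1)
            print name, $11          # name plus full path, for QC
        else
            print name
    }' "$@"
}

# Usage:
# grep -E -v -f exclude.list */*content* | first_part_qc -p | sort
```

Because the name comes first on each line, a plain `sort` groups the dodgy names together with the paths they came from. `sort -u` still works here, but it only drops lines that are identical including the path.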
 
Hmm, there are contradictory requirements here.
On the one hand, you want to simplify your solution.
On the other hand, in your second post, you want more flexibility.
I don't think both can be done at once, sorry.

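For the first requirement, I suggest leaving out the awk '{print $11}'. Since the path is the last thing on each line, splitting the whole line on "/" and taking $NF already gives the filename.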
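As a sketch of that suggestion (the function name `first_part_nf` is made up), the pipeline minus the `$11` stage could look like this. It relies on no earlier column of an `ls -ls` line containing a "/", which holds for the sample above but would not for, say, symlink entries ending in `-> target`, where $NF would come from the target path instead of the link name.

```shell
# first_part_nf is a made-up helper name: the question's pipeline, minus awk '{print $11}'.
first_part_nf() {
    # Last "/"-separated field of the whole line = the filename
    # (assumes no earlier column contains a "/").
    awk -F"/" '{print $NF}' "$@" |
        awk -F"." '{print $1}' |   # strip the extension
        awk -F"_" '{print $1}'     # strip any "_..." part
}

# Usage:
# egrep -vf exclude.list */*content* | first_part_nf | sort -u
```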
For the second requirement: sure, this can be done with awk. I would rather write a few lines of shell script, though, using other tools as well. Others may prefer awk for it.
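Here's one shell-script sketch along those lines, assuming a POSIX-style shell: `read` puts the first ten columns into throwaway variables so the path (the rest of the line) lands in one variable, and parameter expansion does the trimming without starting an extra process per line. `first_part_sh` is a made-up name. On big listings, a single awk is likely to be faster than a shell `while read` loop.

```shell
# first_part_sh is a made-up helper name. It reads "ls -ls" lines on stdin
# and prints "name path" for each one.
first_part_sh() {
    # The first ten columns go into throwaway variables; read puts the rest
    # of the line (the path, even if it contains spaces) into "path".
    while read -r c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 path; do
        file=${path##*/}   # drop everything up to the last "/" (the basename)
        name=${file%%.*}   # drop everything from the first "." onwards
        name=${name%%_*}   # drop everything from the first "_" onwards
        printf '%s %s\n' "$name" "$path"
    done
}

# Usage (exclude.list and */*content* as in the question):
# egrep -vf exclude.list */*content* | first_part_sh | sort
```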
 