grep question 1

Status
Not open for further replies.

kasparov

Programmer
Feb 13, 2002
203
GB
I have a file with this format:

any_ascii_chars_except_comma,any_ascii_chars_except_comma,xxx

xxx is a 3-digit numeric string (examples are 001, 024, 057). What I want to do is split my original file into a series of files called (for example) 001.txt, 024.txt, 057.txt, where each record goes into the file named by the value of its 3rd field. How can I do this? I've tried:

grep ^[\ -\+\--\~],[\ -\+\--\~],001 <my_file> > 001.txt
grep ^[\ -\+\--\~],[\ -\+\--\~],024 <my_file> > 024.txt
grep ^[\ -\+\--\~],[\ -\+\--\~],057 <my_file> > 057.txt

and so on (there's a space after the first backslash within the brackets). My thinking was that the comma is between the plus sign (+) and the hyphen (-) in the ASCII sequence so I thought I could grep for all chars either side of the comma with this syntax. But it doesn't work. I've tried removing & adding backslash chars with no joy.

Hope this makes sense to you all - can anyone help? I bet the answer is something much simpler than this :)

TIA Chris
 
Sorry - error in my first post - the command I was trying was this:

grep ^[\ -\+\--\~]*,[\ -\+\--\~]*,001 <my_file> > 001.txt
grep ^[\ -\+\--\~]*,[\ -\+\--\~]*,024 <my_file> > 024.txt
grep ^[\ -\+\--\~]*,[\ -\+\--\~]*,057 <my_file> > 057.txt

the first 2 fields are strings, not single characters. (I'd missed the * in my first post)
 
Strangely enough, I needed to do something similar this morning. Try...

awk -F, '{print > $3 ".txt"}' my_file

 
> awk -F, '{print > $3 ".txt"}' my_file
I think each 'xxx' file remains open until the end of the file. If there are a large number of different 'xxx' in the input file, this could fail when it runs out of file handles.

Definitely worth trying though :)
 
Thanks Ygor & Salem

Unfortunately it did run out of file handles. Nearly worked though (I didn't realise awk would do this).

Perhaps I should sort the file by the 3rd field & split it before I run it through awk.

Any more suggestions welcome though.
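
For what it's worth, here is a minimal sketch of the sort-first idea, assuming the 3rd field is never empty. The sample records and the temporary directory are made up for illustration. Once the input is sorted on field 3, all records for one key are adjacent, so awk only needs one output file open at a time and can close the previous one when the key changes.

```shell
# Hypothetical sample data standing in for my_file
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'alpha,one,024' 'beta,two,001' 'gamma,three,024' 'delta,four,057' > my_file

# Sort on the 3rd comma-separated field so equal keys are adjacent,
# then close the previous output file whenever the key changes.
sort -t, -k3,3 my_file |
awk -F, '
    $3 != prev {                        # key changed: done with the old file
        if (prev != "") close(prev ".txt")
        prev = $3
    }
    { print > ($3 ".txt") }             # parentheses keep the target unambiguous
'
```

Because each output file is opened exactly once, the plain ">" redirection is enough here; ">>" would only be needed if a file could be reopened later.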
 
awk -F, '{out= $3 ".txt"; print >> out; close(out)}' my_file

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
In the end I split the file & then ran Ygor's command but appended to $3 instead of creating & writing to it. This worked fine. (vlad's solution also ran out of file handles.)
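
Roughly what that looked like, as a sketch rather than the exact commands: the sample data, the 2-line chunk size, and the chunk_ prefix are all made up here, and a real run would use a much larger chunk size.

```shell
# Hypothetical sample data standing in for my_file
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'alpha,one,024' 'beta,two,001' 'gamma,three,024' 'delta,four,057' > my_file

# Cut the input into pieces of 2 lines each (chunk_aa, chunk_ab, ...)
split -l 2 my_file chunk_

# One awk run per piece; ">>" appends, so records for the same key
# coming from different pieces end up in the same output file.
for piece in chunk_*
do
    awk -F, '{ print >> ($3 ".txt") }' "$piece"
done
```

Each awk run only holds open the files for keys present in its own piece. Since ">>" appends, any xxx.txt files left over from an earlier run need removing first.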

Thanks very much folks.

I'm still curious why I can't get grep to work though. If anyone can explain I'd be interested to know ...
 
You did say that the file is:-
any_ascii_chars_except_comma,any_ascii_chars_except_comma,xxx

Which is ^[^,]*,[^,]*,xxx$

So this should work...

for i in $(cut -d, -f 3 my_file|sort -u)
do
grep "^[^,]*,[^,]*,$i\$" my_file > "$i.txt"
done
 
Thanks Ygor - must improve my lateral thinking skills
 
Hi,
Since the original post said any character but comma, do you need the extra brackets?


for i in $(cut -d, -f 3 my_file|sort -u)
do
egrep ".*,.*,$i" my_file > $i.txt
done
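
A quick way to see where the two patterns differ, using made-up lines (the second one has a fourth field, which the stated file format rules out). This is only a sketch; it uses grep -E as the equivalent of egrep.

```shell
# Hypothetical test lines, not taken from the real file
sample=$(printf '%s\n' 'a,b,001' 'a,b,001,extra' 'a,001,b')

# Unanchored, with "." free to match commas as well
loose=$(printf '%s\n' "$sample" | grep -E '.*,.*,001')

# Anchored at both ends, with fields that may not contain commas
strict=$(printf '%s\n' "$sample" | grep '^[^,]*,[^,]*,001$')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

On a file that really has exactly three fields per line and a 3-digit last field, both patterns select the same records; the bracketed, anchored form only matters if a line strays from that format.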



 