grep question 1

kasparov · Jun 25, 2003

I have a file with this format:

any_ascii_chars_except_comma,any_ascii_chars_except_comma,xxx

xxx is a 3-digit numeric string (examples are 001, 024, 057). What I want to do is split my original file into a series of files called (for example) 001.txt, 024.txt, 057.txt, where each record goes into the file named by the value of its 3rd field. How can I do this? I've tried:

grep ^[\ -\+\--\~],[\ -\+\--\~],001 <my_file> > 001.txt
grep ^[\ -\+\--\~],[\ -\+\--\~],024 <my_file> > 024.txt
grep ^[\ -\+\--\~],[\ -\+\--\~],057 <my_file> > 057.txt

and so on (there's a space after the first backslash within the brackets). My thinking was that the comma is between the plus sign (+) and the hyphen (-) in the ASCII sequence so I thought I could grep for all chars either side of the comma with this syntax. But it doesn't work. I've tried removing & adding backslash chars with no joy.

Hope this makes sense to you all. Can anyone help? I bet the answer is something much simpler than this.

TIA Chris

kasparov · Jun 25, 2003

Sorry - error in my first post - the command I was trying was this:

grep ^[\ -\+\--\~]*,[\ -\+\--\~]*,001 <my_file> > 001.txt
grep ^[\ -\+\--\~]*,[\ -\+\--\~]*,024 <my_file> > 024.txt
grep ^[\ -\+\--\~]*,[\ -\+\--\~]*,057 <my_file> > 057.txt

the first 2 fields are strings, not single characters. (I'd missed the * in my first post)

Ygor · Jun 25, 2003

Strangely enough, I needed to do something similar this morning. Try...

awk -F, '{print > $3 ".txt"}' my_file

Salem · Jun 25, 2003

> awk -F, '{print > $3 ".txt"}' my_file
I think each 'xxx' output file remains open until awk reaches the end of the input. If there are a large number of different 'xxx' values in the input file, this could fail when it runs out of file handles.

Definitely worth trying though.
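
One way to gauge the risk before running the one-liner is to count the distinct values in the 3rd field and compare that with the shell's open-file limit. This is only a rough sketch: the sample records are made up for illustration, and some awk implementations may impose their own, smaller limit on open output files than the one `ulimit -n` reports.

```shell
# Run in an empty scratch directory. The records below are hypothetical
# sample data standing in for the real my_file.
printf '%s\n' 'alpha,one,001' 'beta,two,024' 'gamma,three,001' 'delta,four,057' > my_file

# Number of distinct 3rd-field values = number of output files awk would open.
# tr strips the padding that some wc implementations add.
distinct=$(cut -d, -f3 my_file | sort -u | wc -l | tr -d ' ')

# Per-process open-file limit of the current shell (may print "unlimited").
limit=$(ulimit -n)

echo "distinct output files: $distinct, open-file limit: $limit"
```

If the first number is anywhere near the second, keeping every output file open at once is likely to fail.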

kasparov · Jun 25, 2003

Thanks Ygor & Salem

Unfortunately it did run out of file handles. Nearly worked though (I didn't realise awk would do this).

Perhaps I should sort the file by the 3rd field & split it before I run it through awk.

Any more suggestions welcome though.
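
The sort-first idea can be sketched as below. Once the input is sorted on the 3rd field, all records for one output file are adjacent, so awk only needs one output file open at a time and can close it when the key changes. The sample records are hypothetical, and the variable name `prev` is just an illustration.

```shell
# Run in an empty scratch directory. The records below are hypothetical
# sample data standing in for the real my_file.
printf '%s\n' 'alpha,one,001' 'beta,two,024' 'gamma,three,001' 'delta,four,057' > my_file

# Sort on the 3rd comma-separated field, then split with a single open file.
sort -t, -k3,3 my_file |
awk -F, '
    $3 != prev {                          # key changed: previous file is complete
        if (prev != "") close(prev ".txt")
        prev = $3
    }
    { print > ($3 ".txt") }               # parentheses keep the target unambiguous
'
```

Because each key is contiguous after sorting, each NNN.txt is opened exactly once, so `>` does not overwrite earlier records for the same key.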

vgersh99 · Jun 25, 2003

awk -F, '{out= $3 ".txt"; print >> out; close(out)}' my_file

vlad

kasparov · Jun 25, 2003

In the end I split the file & then ran Ygor's command but appended to $3 instead of creating & writing to it. This worked fine. (vlad's solution also ran out of file handles.)
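
For anyone wanting to reproduce this, a minimal sketch of the split-then-append approach might look like the following. The piece size of 2 lines, the `chunk_` prefix, and the sample records are assumptions for illustration, not the values actually used. Each awk run only sees one piece, so it opens at most as many files as that piece has distinct 3rd-field values.

```shell
# Run in an empty scratch directory. The records below are hypothetical
# sample data standing in for the real my_file.
printf '%s\n' 'alpha,one,001' 'beta,two,024' 'gamma,three,001' 'delta,four,057' > my_file

# Appending keeps whatever is already there, so remove output from earlier runs.
rm -f [0-9][0-9][0-9].txt

# Cut the input into pieces of 2 lines each (chunk_aa, chunk_ab, ...).
split -l 2 my_file chunk_

# One awk process per piece; >> appends, so later pieces add to the same files.
for piece in chunk_*
do
    awk -F, '{ print >> ($3 ".txt") }' "$piece"
done

# Clean up the temporary pieces.
rm -f chunk_*
```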

Thanks very much folks.

I'm still curious why I can't get grep to work though. If anyone can explain I'd be interested to know ...

Ygor · Jun 25, 2003

You did say that the file is:-
any_ascii_chars_except_comma,any_ascii_chars_except_comma,xxx

Which is ^[^,]*,[^,]*,xxx$

So this should work...

for i in $(cut -d, -f 3 my_file | sort -u)
do
    # Quote the pattern so the shell does not try to expand [^,]* as a filename glob
    grep "^[^,]*,[^,]*,$i\$" my_file > "$i.txt"
done

kasparov · Jun 25, 2003

Thanks Ygor - must improve my lateral thinking skills

tdatgod · Jun 25, 2003

Hi,
Since the original post said any character but comma, do you need the extra brackets?

for i in $(cut -d, -f 3 my_file|sort -u)
do
egrep ".*,.*,$i" my_file > $i.txt
done
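
Whether the brackets matter depends on how strictly the input follows the stated format. The sketch below compares the two patterns on a few made-up lines, one of which has a 4th field that the original format rules out. The file name `sample_file` and its contents are hypothetical.

```shell
# Hypothetical lines; the second has an extra field, which the stated
# three-field format does not allow.
printf '%s\n' 'alpha,one,001' 'beta,two,001,extra' 'gamma,001,024' > sample_file

# Anchored pattern with comma-free fields: the 3rd field must be 001 and must be last.
strict=$(grep -c '^[^,]*,[^,]*,001$' sample_file)

# Unanchored .* can run across commas and does not care what follows the match.
loose=$(grep -E -c '.*,.*,001' sample_file)

echo "strict=$strict loose=$loose"
```

On strictly three-field input the two patterns should select the same lines; they only diverge when a record has extra fields or trailing characters after the 3rd field.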
