breaking a file based on the Key column

Status
Not open for further replies.
Oct 22, 2001
215
US
Hi
What would be the best way to split a file into pieces based on the first column (say 8 characters, fixed length)? Something like this: the text file file1 looks like this:

ABCD001 asfafggfdggfgfg
ABCD001 hjhjhjh
ABCD001 hghgh
ABCD002 hjhjhj
ABCD002 jhhj

So I want: file2
ABCD001 asfafggfdggfgfg
ABCD001 hjhjhjh
ABCD001 hghgh

and file2:
ABCD002 hjhjhj
ABCD002 jhhj
TIA
 
Using nawk: save the script below as myAwk.awk and run it as:

nawk -f myAwk.awk myTextFile.txt

#----------------------------- myAwk.awk--------------------

{
pos=match($1, "[0-9][0-9]*");
outFile= (!pos) ? "file0" : "file" substr($1, RSTART, RLENGTH);

print >> outFile;
}

#------------------------------------------------ vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
 
Use the following awk script
Code:
{
  if (substr($0,1,8) != hldstr) {
    if (NR>1) close(fn)
    fnix++
    fn = "file" fnix
    hldstr = substr($0,1,8)
  }
  print > fn
}
Put it in a file, say split.awk, and enter
Code:
awk -f split.awk inputfile
CaKiwi
 
ooops, sorry - that's better

#----------------------------- myAwk.awk--------------------

{
pos=match($1, "[1-9][0-9]*");
outFile= (!pos) ? "file0" : "file" substr($1, RSTART, RLENGTH);

print >> outFile;
}

#------------------------------------------------ vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
 
CaKiwi,
not sure if your implementation works when a key reappears later in the file:

ABCD001 asfafggfdggfgfg
ABCD001 hjhjhjh
ABCD001 hghgh
ABCD002 hjhjhj
ABCD003 vlad
ABCD002 jhhj

The file number is encoded in the value of the first column.

vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
 
Vlad,

I took it to mean that when the first 8 characters of a record changed, the data should be written to a new sequentially numbered file, but your interpretation seems more likely.
CaKiwi
 
whatever makes the customer happy ;) vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
 
Thanks a lot... this is exactly what I am trying to do. CaKiwi, I want to ask you if I can change the output file names from file1, file2 etc. to the first column itself, that is: file1 should be named ABCD001, file2 should be named ABCD002?
Thanks again...
 
Yes, like this:
Code:
{
  if (substr($0,1,8) != hldstr) {
    if (NR>1) close(fn)
    fn = $1
    hldstr = substr($0,1,8)
  }
  print > fn
}
CaKiwi
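
One thing to watch with the script above: it writes with `>` after a `close(fn)`, so if a key shows up again later in the file (as in vlad's sample with ABCD003 in the middle), reopening the file would start it over and the earlier lines would be lost. A possible variant, only a sketch, appends with `>>` instead. The `split_demo` directory and the `dir` and `prev` variables are assumptions made up for this example, and it keys on `$1` rather than `substr($0,1,8)`.

```shell
# Sample input where ABCD002 reappears after ABCD003 (vlad's counterexample).
mkdir -p split_demo
cat > split_demo/input.txt <<'EOF'
ABCD001 asfafggfdggfgfg
ABCD001 hjhjhjh
ABCD001 hghgh
ABCD002 hjhjhj
ABCD003 vlad
ABCD002 jhhj
EOF

# The script appends, so clear output left over from earlier runs.
rm -f split_demo/ABCD*

awk -v dir=split_demo '{
  fn = dir "/" $1            # output file is named after the first column
  if (fn != prev) {          # key changed: close the previous output file
    if (prev != "") close(prev)
    prev = fn
  }
  print >> fn                # append, so a reopened file keeps its earlier lines
}' split_demo/input.txt
```

Closing the previous file on each key change keeps only one output file open at a time, which matters when there are many distinct keys. The cost is that the output files have to be removed before a rerun, otherwise the new lines are appended to the old ones.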
 
CaKiwi,

you're a better mind-reader ;)

vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
 