Separating a Large File 2

Status
Not open for further replies.

RulMorf

Technical User
Jul 19, 2013
14
MX
Hello guys, please help me. I'm an awk beginner with this problem:

I have a large file like this:

DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: DevSurvey Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 349.6000
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: Nuevos_Dir_Nov04 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 0.0001 0.0000 0.0000
DSV7 51.0000 0.1200 170.4900
DSV7 81.0000 0.0900 188.6600
DSV7 111.0000 0.0700 128.3600
DSV7 141.0000 0.4300 263.2100
DSV7 171.0000 0.6200 102.3500
DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV3 Survey Name: RMA_21.84_13102010 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 348.4000


which I need to separate into several files. Each file name is taken
from the DSV2 record, but if a name is repeated, the later file should
get a different name. Each block starts at DSV1 and ends with the last DSV7 line (DSV7 repeats).

Desired output:
Filename: K1292.dat
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: DevSurvey Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 349.6000

Filename: K1292_v2.dat
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: Nuevos_Dir_Nov04 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 0.0001 0.0000 0.0000
DSV7 51.0000 0.1200 170.4900
DSV7 81.0000 0.0900 188.6600
DSV7 111.0000 0.0700 128.3600
DSV7 141.0000 0.4300 263.2100
DSV7 171.0000 0.6200 102.3500

Filename: K1296.dat
DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV3 Survey Name: RMA_21.84_13102010 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 348.4000

I'm starting with:

awk '/DSV1/,/DSV7/{
if ($1 ~ /DSV2/ )

but I don't know how to handle repeated values of DSV7.



Thanks in advance Guys.
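
A side note on why the range pattern /DSV1/,/DSV7/ in the attempt above gets stuck on the repeated DSV7 lines: an awk range ends at the first line that matches its second pattern, so only the first DSV7 line falls inside it. A minimal sketch, assuming a shortened sample record (the shell variable name range_out is only for illustration):

```shell
# Shortened sample record; only the DSV1/DSV2/DSV7 tags matter here.
range_out=$(printf '%s\n' \
  'DSV1 UWI: 3060000134 Depth: Conf. Factor:' \
  'DSV2 Common: K1292' \
  'DSV7 0.0000 0.0000 180.0000' \
  'DSV7 30.0000 0.5000 196.7200' \
  'DSV7 60.0000 0.5000 349.6000' |
awk '/DSV1/,/DSV7/ { print }')

# The range closes on the first DSV7 line, so the last two lines are not printed.
printf '%s\n' "$range_out"
```

This is why the replies below key on the DSV1 line as the start of each record instead of using a range.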
 
Something like this
Code:
# Run: awk -f split_file.awk split_file.txt

# A DSV1 line starts a new record: write out the record buffered so far.
/DSV1/ {
  if (filename) {
    # output the lines
    print "Writing file: " filename
    printf "%s", lines > filename   # lines already ends with a newline
    close(filename)
    lines = ""
  }
}
# A DSV2 line carries the name; a repeated name gets a _v2, _v3, ... suffix.
/DSV2/ {
  filename = $3
  filenames[filename] += 1
  if (filenames[filename] > 1)  {
    filename = filename "_v" filenames[filename]
  }
}
# Buffer every line of the current record.
{
  lines = lines $0 "\n"
}

END {
  if (filename) {
    # output the last record
    print "Writing file: " filename
    printf "%s", lines > filename
    close(filename)
  }
}

Output:
Code:
$ awk -f split_file.awk split_file.txt
Writing file: K1292
Writing file: K1292_v2
Writing file: K1296
It writes these files:

K1292
Code:
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: DevSurvey Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 349.6000

K1292_v2
Code:
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: Nuevos_Dir_Nov04 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 0.0001 0.0000 0.0000
DSV7 51.0000 0.1200 170.4900
DSV7 81.0000 0.0900 188.6600
DSV7 111.0000 0.0700 128.3600
DSV7 141.0000 0.4300 263.2100
DSV7 171.0000 0.6200 102.3500

K1296
Code:
DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV3 Survey Name: RMA_21.84_13102010 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 348.4000

 
What about this ?
awk '$1=="DSV1"{x=$0;next}$1=="DSV2"{++n[$3];out=$3(n[$3]==1?"":"_v"n[$3])".dat";print x>out}{print>out}' input
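
For anyone who wants to read the one-liner above step by step, here is an expanded sketch of the same idea. The variable names header and count, the close() call, the guard on the last rule, and the shortened sample input are additions for illustration, not part of the original one-liner. close() is there on the assumption that a large input could otherwise leave many output files open at once.

```shell
# Work in a scratch directory so the sample files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample input: three records, the first two share the name K1292.
cat > input <<'EOF'
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV7 0.0000 0.0000 0.0000
DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV7 0.0000 0.0000 0.0000
EOF

awk '
$1 == "DSV1" {              # start of a record: hold the line until the name is known
    header = $0
    next
}
$1 == "DSV2" {              # $3 is the name that follows "Common:"
    if (out != "") close(out)          # finished with the previous output file
    count[$3]++
    out = $3 (count[$3] == 1 ? "" : "_v" count[$3]) ".dat"
    print header > out                 # the held DSV1 line goes first
}
out != "" { print > out }   # every other line, including DSV2, goes to the current file
' input

ls
```

With this sample, the directory should end up holding K1292.dat, K1292_v2.dat and K1296.dat next to the input file.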

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
I overlooked that you want the output files to have the .dat extension.
Here is the corrected source:
Code:
# Run: awk -f split_file.awk split_file.dat

# A DSV1 line starts a new record: write out the record buffered so far.
/DSV1/ {
  if (filename) {
    # output the lines
    print "Writing file: " filename
    printf "%s", lines > filename   # lines already ends with a newline
    close(filename)
    lines = ""
  }
}
# A DSV2 line carries the name; a repeated name gets a _v2, _v3, ... suffix.
/DSV2/ {
  filename = $3
  filenames[filename] += 1
  if (filenames[filename] > 1)  {
    filename = filename "_v" filenames[filename]
  }
  filename = filename ".dat"
}
# Buffer every line of the current record.
{
  lines = lines $0 "\n"
}

END {
  if (filename) {
    # output the last record
    print "Writing file: " filename
    printf "%s", lines > filename
    close(filename)
  }
}
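
One detail worth knowing when a record is buffered in a string like lines: print always appends the output record separator (a newline by default), so a buffer that already ends in "\n" picks up one extra blank line when written with print, while printf "%s" writes it unchanged. A small sketch with a made-up two-line buffer (the helper count_lines and the variable names are hypothetical):

```shell
# Helper: count lines on stdin and strip the padding some wc versions add.
count_lines() { wc -l | tr -d ' '; }

with_print=$(awk 'BEGIN {
    lines = "DSV7 0.0000 0.0000 0.0000\n" "DSV7 30.0000 0.5000 196.7200\n"
    print lines             # print adds one more newline after the buffer
}' | count_lines)

with_printf=$(awk 'BEGIN {
    lines = "DSV7 0.0000 0.0000 0.0000\n" "DSV7 30.0000 0.5000 196.7200\n"
    printf "%s", lines      # the buffer is written exactly as stored
}' | count_lines)

echo "print: $with_print lines, printf: $with_printf lines"
# prints "print: 3 lines, printf: 2 lines"
```

Whether the trailing blank line matters depends on whatever reads the .dat files afterwards.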
 
Wow guys, you are good! I have a lot to learn about awk.

Both scripts work well.

I'll try to understand what they do.

Thank you very much.



Greetings.


Raul
 