Separating a Large File 2

RulMorf · Jul 19, 2013

Hello Guys, please help me. I'm an awk beginner with this problem:

I have a large file like this:

DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: DevSurvey Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 349.6000
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: Nuevos_Dir_Nov04 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 0.0001 0.0000 0.0000
DSV7 51.0000 0.1200 170.4900
DSV7 81.0000 0.0900 188.6600
DSV7 111.0000 0.0700 128.3600
DSV7 141.0000 0.4300 263.2100
DSV7 171.0000 0.6200 102.3500
DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV3 Survey Name: RMA_21.84_13102010 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 348.4000

which I need to separate into several files. Each file name will be taken
from the DSV2 record, but if a name is repeated then the new file should
get a different name. The delimiters are DSV1 and DSV7 (which repeats).

Desired output:
Filename: K1292.dat
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: DevSurvey Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 349.6000

Filename: K1292_v2.dat
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: Nuevos_Dir_Nov04 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 0.0001 0.0000 0.0000
DSV7 51.0000 0.1200 170.4900
DSV7 81.0000 0.0900 188.6600
DSV7 111.0000 0.0700 128.3600
DSV7 141.0000 0.4300 263.2100
DSV7 171.0000 0.6200 102.3500

Filename: K1296.dat
DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV3 Survey Name: RMA_21.84_13102010 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 348.4000

I'm starting with:

awk '/DSV1/,/DSV7/{
if ($1 ~ /DSV2/ )

but I don't know how to handle repeated values of DSV7.

Thanks in advance Guys.
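Editor's note: the trouble with the range pattern above is that `/DSV1/,/DSV7/` closes on the first line that matches DSV7, so any further DSV7 lines in the same record fall outside the range. A minimal sketch of that behaviour, using a made-up three-line sample (the words "header", "first" and "second" are placeholders, not real data):

```shell
# Hypothetical sample: one DSV1 line followed by two DSV7 lines.
sample=$(printf 'DSV1 header\nDSV7 first\nDSV7 second\n')

# The range opens on DSV1 and closes on the FIRST DSV7 line,
# so "DSV7 second" is never selected.
printed=$(printf '%s\n' "$sample" | awk '/DSV1/,/DSV7/')
printf '%s\n' "$printed"
```

This is why the replies below start a new record on each DSV1 line instead of trying to detect where the DSV7 lines end.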

mikrom · Jul 19, 2013

Something like this

Code:

# Run: awk -f split_file.awk split_file.txt
BEGIN {
}

/DSV1/ {
  if (filename) {
    # output the lines collected for the previous record
    print "Writing file: " filename
    # lines already ends with a newline, so printf avoids a trailing blank line
    printf "%s", lines > filename
    lines = ""
  }
}
/DSV2/ {
  filename = $3
  filenames[filename] += 1
  if (filenames[filename] > 1) {
    filename = filename "_v" filenames[filename]
  }
}
{
  lines = lines $0 "\n"
}

END {
  if (filename) {
    # output the lines of the last record
    print "Writing file: " filename
    printf "%s", lines > filename
  }
}

Output:

Code:

$ awk -f split_file.awk split_file.txt
Writing file: K1292
Writing file: K1292_v2
Writing file: K1296

writes files:

K1292

Code:

DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: DevSurvey Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 349.6000

K1292_v2

Code:

DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV3 Survey Name: Nuevos_Dir_Nov04 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 0.0001 0.0000 0.0000
DSV7 51.0000 0.1200 170.4900
DSV7 81.0000 0.0900 188.6600
DSV7 111.0000 0.0700 128.3600
DSV7 141.0000 0.4300 263.2100
DSV7 171.0000 0.6200 102.3500

K1296

Code:

DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV3 Survey Name: RMA_21.84_13102010 Method:
DSV4 Company: UNKNOWN Date:
DSV5 Remarks:
DSV6 Measr. Depth Deviation Direction
DSV7 0.0000 0.0000 0.0000
DSV7 30.0000 0.5000 196.7200
DSV7 60.0000 0.5000 348.4000

PHV · Jul 19, 2013

What about this?
awk '$1=="DSV1"{x=$0;next}$1=="DSV2"{++n[$3];out=$3(n[$3]==1?"":"_v"n[$3])".dat";print x>out}{print>out}' input

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
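Editor's note: for readers who are new to awk, the one-liner above can be unfolded as follows. This is only a sketch of the same idea, not PHV's own code: the variable names `header`, `count`, `suffix` and `out` are mine, the `close(out)` call is an addition intended to keep the number of open files low on a large input, and the sample is a trimmed copy of the data from the question written to a scratch directory.

```shell
# Work in a scratch directory so the generated .dat files are easy to find.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Trimmed sample based on the question: two K1292 records and one K1296 record.
cat > input.txt <<'EOF'
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV7 0.0000 0.0000 180.0000
DSV7 30.0000 0.5000 196.7200
DSV1 UWI: 3060000134 Depth: Conf. Factor:
DSV2 Common: K1292
DSV7 0.0000 0.0000 0.0000
DSV1 UWI: 3060000988 Depth: Conf. Factor:
DSV2 Common: K1296
DSV7 0.0000 0.0000 0.0000
EOF

awk '
$1 == "DSV1" {                 # a new record starts; the file name is not known yet
    header = $0                # so hold this line back
    next
}
$1 == "DSV2" {                 # field 3 carries the name used for the file
    if (out != "") close(out)  # release the previous output file (added here)
    count[$3]++                # how many times this name has been seen
    suffix = (count[$3] == 1) ? "" : "_v" count[$3]
    out = $3 suffix ".dat"
    print header > out         # write the held DSV1 line first
}
{ print > out }                # every other line, DSV2 included, goes to the current file
' input.txt

ls *.dat
```

Because a new file is opened on every DSV2 line, the number of DSV7 lines in a record does not matter; they simply keep going to the current file until the next DSV1 arrives.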

mikrom · Jul 19, 2013

I overlooked that you want the output files to have the extension .dat,
so this is the corrected source:

Code:

# Run: awk -f split_file.awk split_file.dat
/DSV1/ {
  if (filename) {
    # output the lines collected for the previous record
    print "Writing file: " filename
    # lines already ends with a newline, so printf avoids a trailing blank line
    printf "%s", lines > filename
    lines = ""
  }
}
/DSV2/ {
  filename = $3
  filenames[filename] += 1
  if (filenames[filename] > 1) {
    filename = filename "_v" filenames[filename]
  }
  filename = filename ".dat"
}
{
  lines = lines $0 "\n"
}

END {
  if (filename) {
    # output the lines of the last record
    print "Writing file: " filename
    printf "%s", lines > filename
  }
}

RulMorf · Jul 19, 2013

WOW Guys, you are Good !!!. I have a lot to learn about awk.

Both codes work well.

I'll try to understand what they do.

Thank You very much.

Greetings.

Raul

mikrom · Jul 19, 2013

I have a lot to learn from PHV too.
