How to break a file into pieces?

Status
Not open for further replies.

whatisthis987

IS-IT--Management
Apr 24, 2007
34
US
Hi,
I have a file that looks like this:

Par1
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd 1243
END1abcd
Par2
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd

I would like to write a Tcl script to split what's between Par and END into separate files. For example:

::File Par1.txt::
Par1
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd 1243
END1abcd

::File Par2.txt::
Par2
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd

Could anyone please help?
 
Hi whatisthis987,

Given this file, which I named merged_file.txt:
Code:
Par1
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd  1243
END1abcd
Par2
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd
This script, named files.tcl, does the work: it splits the above file into pieces as you described.
Code:
set filename "merged_file.txt"
set input_file [open $filename "r"]

while { [gets $input_file line] != -1 } {
  # write line to screen
  puts $line

  # Match file name
  set result [regexp {(^Par\d+)} $line match fname]
  if {$result == 1} {
    #puts "This was matched: '$match'"
    #puts "Extracted \$fname: '$fname'"
    set new_fname "$fname.txt"
    puts "Opening file '$new_fname' for writing..."
    set output_file [open $new_fname "w"]
  }

  # write line to the output file
  puts $output_file $line

  # Match end of file
  set result [regexp {(^END\d+).*} $line match end]
  if {$result == 1} {
    #puts "This was matched: '$match'"
    #puts "Extracted \$end: '$end'"
    puts "Closing file '$new_fname'..."
    close $output_file
  }

}
close $input_file

Running the script:
Code:
C:\Users\Roman\Work>tclsh85 files.tcl
Par1
Opening file 'Par1.txt' for writing...
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd  1243
END1abcd
Closing file 'Par1.txt'...
Par2
Opening file 'Par2.txt' for writing...
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd
Closing file 'Par2.txt'...

As a result you will get 2 files:
Par1.txt which contains
Code:
Par1
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd  1243
END1abcd
and Par2.txt which contains
Code:
Par2
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd

To match the beginning and the end of each section I used regular expressions. For a better understanding, you can uncomment some of the puts lines in the source to see the matching results.

Roman
 
Thanks, but there is a problem if merged_file.txt does not begin with "Par". It errors out with: can't read "output_file": no such variable.
 
I assumed that the input file always begins with Par :)
If that's not the case, you can build in a switch which indicates whether the output file is open or not, something like this:
Code:
set filename "merged_file.txt"
set input_file [open $filename "r"]
# set output_opened to False
set output_opened 0

while { [gets $input_file line] != -1 } {
  # write line to screen
  puts $line

  # Match file name
  set result [regexp {(^Par\d+)} $line match fname]
  if {$result == 1} {
    #puts "This was matched: '$match'"
    #puts "Extracted \$fname: '$fname'"
    set new_fname "$fname.txt"
    puts "Opening file '$new_fname' for writing..."
    set output_file [open $new_fname "w"]
    # set output_opened to True
    set output_opened 1
  }

  # write line to the output file only if opened
  if {$output_opened} {
    puts $output_file $line
  }

  # Match end of file
  set result [regexp {(^END\d+).*} $line match end]
  if {$result == 1} {
    #puts "This was matched: '$match'"
    #puts "Extracted \$end: '$end'"
    puts "Closing file '$new_fname'..."
    close $output_file
    # set output_opened to False
    set output_opened 0
  }

}
close $input_file

Then for example if I have this file
Code:
Something on the beginning...
Blah blah blah
Par1
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd  1243
END1abcd
Something other in the middle ....
Blah blah blah
Par2
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd
Something on the end...
Blah blah blah

Then running the above script
Code:
C:\Users\Roman\Work>tclsh85 files.tcl
Something on the beginning...
Blah blah blah
Par1
Opening file 'Par1.txt' for writing...
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd  1243
END1abcd
Closing file 'Par1.txt'...
Something other in the middle ....
Blah blah blah
Par2
Opening file 'Par2.txt' for writing...
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd
Closing file 'Par2.txt'...
Something on the end...
Blah blah blah
creates 2 files Par1.txt and Par2.txt:
Code:
Par1
sdfsdff c 568768
dfdfdfdty6v 655
dfdsdfd  1243
END1abcd
Code:
Par2
tyrtyrfghgfh
rtytrfg gffg
gfhjkk dffg
END2abcd
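
A side note on the flag approach: the channel variable itself can serve as the switch if it is reset to an empty string whenever no output file is open. What follows is only a sketch under that assumption, not a replacement for the script above. The proc name split_sections and the variables in, out and written are made up for illustration, and unlike the script above it does not start a new file for a Par line that shows up while a section is still open.

```tcl
# Split the Par...END sections of $filename into Par<N>.txt files.
# Returns the list of file names that were written.
proc split_sections {filename} {
    set in [open $filename r]
    set out ""       ;# empty string means "no output file is open"
    set written {}
    while {[gets $in line] >= 0} {
        # A Par line opens a new output file, but only if none is open.
        if {$out eq "" && [regexp {^Par\d+} $line name]} {
            set out [open "$name.txt" w]
            lappend written "$name.txt"
        }
        if {$out ne ""} {
            puts $out $line
            # An END line closes the current output file.
            if {[regexp {^END\d+} $line]} {
                close $out
                set out ""
            }
        }
    }
    # Close a section that never reached its END line.
    if {$out ne ""} {
        close $out
    }
    close $in
    return $written
}
```

Calling split_sections merged_file.txt on the file with the extra text at the beginning, middle and end should write the same Par1.txt and Par2.txt as shown above and return their names.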
 
By the way, assuming some of the lines ($line) contain special characters like "} % #..", what can be done to ignore those special characters in $line?
 
Hi whatisthis987,

Sorry, I don't quite understand your last question. How do you want to ignore the special characters } % #..?
1. Ignore the whole line and don't write it to the file when one of the above characters occurs
or
2. Write all characters except the special chars to the file? For example, from the line
aaa}bbb%ccc#ddd
only this should be written:
aaa bbb ccc ddd or this aaabbbcccddd (without spaces)?

Could you describe your problem in more detail?

 
Actually, I intend to filter out those Parxxx lines that contain more than one field (llength > 1).

set length [llength $line]
if {$length==1} {
set result [regexp {(^Par\d+)} $line match fname]
if {$result==1} {
#open output_file
#set file_opened to 1
...
...
..

So if the paragraph starts with "Par123 sas dasda asdas", I would like to filter it out.


 
Construct a regular expression so that it matches exactly what you want.

Look at this example code
Code:
set line "Par123 sas dasda asdas"
puts "\$line='$line'\n"

set result [regexp {^(Par\d+)[^\d]+$} $line match fname]
puts "Matching Result=$result"
if {$result} {
  puts "This was matched: '$match'"
  puts "Extracted \$fname: '$fname'"
}

puts "\n"

set result [regexp {^(Par\d+)$} $line match fname]
puts "Matching Result=$result"
if {$result} {
  puts "This was matched: '$match'"
  puts "Extracted \$fname: '$fname'"
}

puts "\n-------------------------------------------------"
set line "Par123"
puts "\$line='$line'\n"

set result [regexp {^(Par\d+)[^\d]+$} $line match fname]
puts "Matching Result=$result"
if {$result} {
  puts "This was matched: '$match'"
  puts "Extracted \$fname: '$fname'"
}

puts "\n"

set result [regexp {^(Par\d+)$} $line match fname]
puts "Matching Result=$result"
if {$result} {
  puts "This was matched: '$match'"
  puts "Extracted \$fname: '$fname'"
}

The result is
Code:
D:\>tclsh85 regexp3.tcl
$line='Par123 sas dasda asdas'

Matching Result=1
This was matched: 'Par123 sas dasda asdas'
Extracted $fname: 'Par123'


Matching Result=0

-------------------------------------------------
$line='Par123'

Matching Result=0


Matching Result=1
This was matched: 'Par123'
Extracted $fname: 'Par123'

I use 2 regular expressions.
The first,
Code:
^(Par\d+)[^\d]+$
matches, from the beginning of the line (^) to the end of the line ($), a pattern which consists of Par followed by at least one digit (\d+) and then at least one non-digit character ([^\d]+). The subpattern enclosed in ( and ), i.e. Par\d+, is extracted into fname.
So, as you see, this regex matches
$line='Par123 sas dasda asdas'
but it doesn't match
$line='Par123'

The second regex is
Code:
^(Par\d+)$
It matches, from the beginning of the line (^) to the end of the line ($), only the pattern which consists of Par followed by at least one digit (\d+). Again, the subpattern enclosed in ( and ), i.e. Par\d+, is extracted into fname.
So this regex doesn't match
$line='Par123 sas dasda asdas'
but it matches only
$line='Par123'
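
To use the second regex as the filter in the splitting script, the test could be wrapped in a small helper. This is just a sketch: the proc name par_name is my own invention, and the -> is only a throwaway variable name for the full match.

```tcl
# Return the Par name if the whole line is exactly "Par" plus digits,
# otherwise return an empty string.
proc par_name {line} {
    if {[regexp {^(Par\d+)$} $line -> name]} {
        return $name
    }
    return ""
}

set plain  [par_name "Par123"]
set longer [par_name "Par123 sas dasda asdas"]
puts "plain='$plain' longer='$longer'"
```

In the main loop the output file would then be opened only when par_name returns a non-empty string, so a line such as Par123 sas dasda asdas is skipped as a file name.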


 
Thanks again for the solution and the detailed explanation. Perhaps the previous test case wasn't a good one. Here's what I really want.

=== Input file ===
123 45676 677
par123.123
sfsf sf dsfsd sfd s
wefrewr23 23432 53 65 5
dgasgd.4354ssd
sdfs{#$%$#%$%!!!@ et g trhrt
werd wewete 24356
655b56 fg d gdgf.werrew t34

== Output files ===

::par123.123.txt:: <--file #1
sfsf sf dsfsd sfd s
wefrewr23 23432 53 65 5

::dgasgd.4354ssd.txt:: <--file #2
sdfs{#$%$#%$%!!!@ et g trhrt
werd wewete 24356
655b56 fg d gdgf.werrew t34

========================

So, instead of looking for a line that begins with par, I would now like to look for a line that has a length of 1 (a single field), copy the lines below it until the next single-field line is hit, and then create another file for that one. I tried using llength $line, but those special characters are causing a syntax error.
 
Here you don't need regular expressions. You only have to split each line into a list, then if the list contains only one element it's your file name.
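
As far as I know, the reason llength $line can fail is that llength parses its argument as a Tcl list, so a field that starts with an unbalanced { or " is not a valid list element, while split treats the line as plain text. A small sketch to show the difference; the sample line is made up for the demonstration and is not taken from your file:

```tcl
# Hypothetical sample line: the second field starts with an unbalanced brace.
set line "abc \{def ghi"

# llength has to parse the string as a list, which raises an error here.
set failed [catch {llength $line} msg]
puts "llength failed: $failed ($msg)"

# split works on the raw characters and always returns a valid list.
set fields [split $line " "]
puts "split gives [llength $fields] fields"
```

Counting the elements of the list returned by split is therefore safe for any line content.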

Example:
File merged_file.txt
Code:
123 45676 677
par123.123
sfsf sf dsfsd sfd s
wefrewr23 23432 53 65  5
dgasgd.4354ssd
sdfs{#$%$#%$%!!!@  et g trhrt
werd wewete 24356
655b56 fg d gdgf.werrew t34

Script files.tcl
Code:
set filename "merged_file.txt"
set input_file [open $filename "r"]
# set output_opened to False
set output_opened 0

while { [gets $input_file line] != -1 } {
  # write line to screen
  puts $line

  # split line into list on spaces
  set line_list [split $line " "]

  if {[llength $line_list] == 1} {
    # file name
    set new_fname "$line.txt"
    puts "->File name found: '$new_fname'"
    # first close previous opened output file
    if {$output_opened} {
      close $output_file
      # set output_opened to False
      set output_opened 0
    }
    # open output file
    set output_file [open $new_fname "w"]
    # set output_opened to True
    set output_opened 1
    # don't write the line, which is identical with filename
    # jump to the while-loop begin and process next line
    continue
  }

  # write line to the output file only if opened
  if {$output_opened} {
    puts $output_file $line
  }

}
close $input_file
# close output if opened
if {$output_opened} {
  close $output_file
}

Running
Code:
C:\Users\Roman\Work>tclsh85 files.tcl
123 45676 677
par123.123
->File name found: 'par123.123.txt'
sfsf sf dsfsd sfd s
wefrewr23 23432 53 65  5
dgasgd.4354ssd
->File name found: 'dgasgd.4354ssd.txt'
sdfs{#$%$#%$%!!!@  et g trhrt
werd wewete 24356
655b56 fg d gdgf.werrew t34

Result
File par123.123.txt
Code:
sfsf sf dsfsd sfd s
wefrewr23 23432 53 65  5

dgasgd.4354ssd.txt
Code:
sdfs{#$%$#%$%!!!@  et g trhrt
werd wewete 24356
655b56 fg d gdgf.werrew t34
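
One thing to watch with split $line " ": every single space is a separator, so a file-name line with a trailing space gives 2 elements (the name and an empty string) and would not be recognized as a file name. If that can happen in your data, the fields could be collected with a regular expression instead. This is a sketch under that assumption; the proc names fields and is_name_line are invented for the example.

```tcl
# Return the whitespace-separated fields of a line as a proper list.
# Runs of spaces or tabs produce no empty fields.
proc fields {line} {
    return [regexp -all -inline {\S+} $line]
}

# A line is treated as a file name when it has exactly one field.
proc is_name_line {line} {
    return [expr {[llength [fields $line]] == 1}]
}
```

In the script above, the test [llength $line_list] == 1 would become [is_name_line $line], and the file name would be built from [lindex [fields $line] 0] rather than from $line, so that the trailing space does not end up in the name.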
 