Splitting up a file

Status
Not open for further replies.

Hi all,

The idea is to split up a file containing, let's say, 17 words (one per line) into smaller files, each containing 3 words (only the last file is allowed to have fewer). Example:

testfile.txt:

"aabb
"bbcc
"ccdd
"ddee
"eeff
"ffgg
"gghh
"hhii
"iijj
"jjkk
"kkll
"llmm
"mmnn
"nnoo
"oopp
"ppqq
"qqrr

This file should be split up into:

testfile_1:

"aabb "bbcc "ccdd

testfile_2:

"ccdd "ddee "eeff

testfile_3:

"eeff "ffgg "gghh

testfile_4:

"gghh "hhii "iijj

etc.

Remarks:

- the name of each new file should start with the name of
the original file (leaving out the extension, though),
followed by an underscore and its number (as you can see in
the example);
- the second output file should start with the last word of
the first output file, the third output file should start
with the last word of the second output file, etc.;
- in the output files, words should be placed after one
another, with only one white space between them and a
newline after the last word.

(optional:
- it would be nice if the size of the output files (in this
case three words) could be changed on the command line,
and if more than one large file could be given on the
command line.)

Can someone help me with this? I would be very grateful :)

Thanks,

kredit99
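
For checking results later on, the number of output files follows from the rules above: the first file takes k words and every later file adds k-1 new ones, so n words give ceil((n-1)/(k-1)) files. A minimal sketch of that arithmetic, assuming k is at least 2 (expected_files is a hypothetical helper name, not something from this thread):

```shell
#!/bin/sh
# Sketch: number of output files for n words in overlapping sets of k.
# expected_files is a hypothetical helper, not part of the thread.
expected_files() {
    n="$1"   # total number of words (lines) in the input
    k="$2"   # words per output file; assumed to be at least 2
    if [ "$n" -eq 0 ]; then
        echo 0
    elif [ "$n" -le "$k" ]; then
        echo 1
    else
        # the first file takes k words, each later file adds k-1 new ones:
        # ceil((n - 1) / (k - 1)) written with integer division
        echo $(( (n - 1 + k - 2) / (k - 1) ))
    fi
}

expected_files 17 3   # the testfile.txt example: prints 8
```

So testfile.txt should end up as testfile_1 through testfile_8, with testfile_8 holding the last three words.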

 
I think this was solved here already a few days ago. This should work:

#! /bin/sh

LEN="$1"
shift

awk -v len="$LEN" '
BEGIN {
    ct=1
    fn="file_" ct
}
{
    if (NR%len == 0) {
        printf("%s\n", $0) >> fn
        close(fn)
        ct++
        fn="file_" ct
    }
    else {
        printf("%s ", $0) >> fn
    }
}' "$@"
# END OF SCRIPT


Cheers,
ND
 
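My apologies, you probably want to add this END section at the end of the script above, so that a partly filled last file still gets its newline. It writes only the newline, because a bare print would write the last word a second time.

END {
    if (NR%len != 0) printf("\n") >> fn
}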

Regards,
ND
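
For reference, here is roughly how the two pieces fit together. This is only a sketch: the group_lines wrapper name and the scratch-directory demo are made up for illustration, '>' is used instead of '>>' so that a rerun overwrites old output, and the END block writes a bare newline instead of printing the last record again. Note that it groups the words without any overlap, and that a partly filled last file keeps a trailing space.

```shell
#!/bin/sh
# Sketch: the first script with the END block folded in.
# group_lines is a hypothetical wrapper name; output goes to file_1,
# file_2, ... in the current directory, as in the original.
group_lines() {
    len="$1"
    shift
    awk -v len="$len" '
    BEGIN { ct = 1; fn = "file_" ct }
    {
        if (NR % len == 0) {
            # last word of the set: end the line, move on to the next file
            printf("%s\n", $0) > fn
            close(fn)
            ct++
            fn = "file_" ct
        } else {
            printf("%s ", $0) > fn
        }
    }
    END {
        # a partly filled last set still needs its newline
        if (NR % len != 0) printf("\n") > fn
    }' "$@"
}

# Demo in a scratch directory with seven words.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    printf '%s\n' a b c d e f g > words.txt
    group_lines 3 words.txt
    cat file_1 file_2 file_3
)
```

With seven words and a set size of 3 this gives "a b c", "d e f" and a last file holding just "g".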
 
Hi ND,

The output is not entirely the way I want it to be:

- each smaller file (except for the first file, of course)
should begin with the last word of the previous file, as
you can see in the example above (first message).
- the name of the file should begin with the name of the
large file(s), indicated on the command line.

for instance:

gawk -f test.awk -v len=36 window.txt door.txt etc...

These files should be split up into:

window_1 door_1
window_2 door_2
window_3 door_3
etc. etc.

Possible to do this with gawk? Thanks.

Kredit99



 
This seems to be what you want.

Good luck,
ND

#! /bin/sh

LEN="$1"
shift

awk -v len="$LEN" '
{
    if(NR == 1){
        fn = FILENAME "_1"
        ct=1
        xt=1
    }
    if( xt == 1 && line != ""){
        printf("%s ", line) > fn
        xt++
    }
    if( xt == len ){
        line=$0
        printf("%s\n", $0) > fn
        xt=1
        close(fn)
        fn=FILENAME "_" ++ct
    }
    else {
        printf("%s ", $0) > fn
        xt++
    }
}
END {
    if( xt < len ) print > fn
}' "$@"
 
ND,

When I run your script on the inputfile(s), Awk notices an error on the 20th line. I quote: "20:fatal:expression for '>' redirection has null string value". Could you have another look at the script below, which is basically the same as the one you proposed, except for the fact that some variables are renamed? And while doing that, could you add some '#' comments to the script so that I know how the script should be interpreted?

Thanks ND!

----script looks like this-------:

BEGIN {
    if (NR == 1) {
        fname = FILENAME "_1"
        fnumber = 1
        fline = 1
    }
    if (fline == 1 && line != "") {
        printf ("%s", line) > fname
        fline++
    }
    if (fline == len) {
        line = $0
        printf ("%s\n", $0) > fname
        fline = 1
        close (fname)
        fn = FILENAME "_" ++fnumber
    }
    else {
        printf ("%s", $0) > fname
        fline++
    }
}
END {
    if (fline < len) print > fname
}
 
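I see two problems in your script above. First, the BEGIN keyword must go or the block fails: a BEGIN block runs before any input is read, so NR is still 0 and FILENAME is empty there, fname never gets a value, and the '>' redirection has nothing to write to. Second, the line

fn = FILENAME "_" ++fnumber

must have fn changed to fname, giving

fname = FILENAME "_" ++fnumber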

...
ND
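
To see the BEGIN problem in isolation, a small sketch like the following prints what awk knows at each stage. The window.txt sample, the scratch directory and the begin_report variable are made up for the demo. In gawk, FILENAME is empty inside BEGIN; NR is 0 there in any awk.

```shell
#!/bin/sh
# Sketch: compare FILENAME and NR in BEGIN with the main block.
# The sample file and variable names are only for illustration.
demo_dir=$(mktemp -d)
printf '%s\n' one two > "$demo_dir/window.txt"

begin_report=$(cd "$demo_dir" && awk '
BEGIN    { printf("BEGIN: FILENAME=[%s] NR=%d\n", FILENAME, NR) }
FNR == 1 { printf("main: FILENAME=[%s] NR=%d\n", FILENAME, NR) }
' window.txt)

printf '%s\n' "$begin_report"
```

Because the NR == 1 test can never be true inside BEGIN, fname stays unset and the first printf with '> fname' triggers the "null string value" error quoted above.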
 
Here is a good version that works (for me) and takes multiple files as input. My fix is that the set of output files for each input file now starts with '_1'.

Comments are included too.

{
    # currFName starts out empty and is updated at the start of each
    # input file; this condition detects the switch to the next file
    if (FILENAME != currFName) {
        # terminate a partial set left over from the previous input file
        if (currFName != "" && xt > 1) {
            printf("\n") > fname
            close(fname)
        }
        fname = FILENAME "_1"   # (re)set the output file name
        fnumber = 1             # (re)set the output file number
        currFName = FILENAME    # remember the current input file
        xt = 1                  # position within the current set
        line = ""               # no carried-over line at the start of a file
    }
    # carry the last line of the previous set over as the first of this one
    if (xt == 1 && line != "") {
        printf("%s ", line) > fname   # trailing space: more lines follow
        xt++                          # advance the set index
    }
    # the set index has reached its maximum: this line completes the set
    if (xt == len) {
        line = $0                     # save the line to apply to the next file
        printf("%s\n", $0) > fname    # terminate the set with the last line
        xt = 1                        # reset the set index for the next file
        close(fname)                  # close the finished file
        fname = FILENAME "_" ++fnumber   # name of the next file to write
    }
    # otherwise just append the line to the current set
    else {
        printf("%s ", $0) > fname
        xt++                          # advance the set index
    }
}
END {
    # if the last set is only partly filled, terminate it with a newline
    # (a bare print would write the last input line a second time)
    if (xt > 1) printf("\n") > fname
}
 
Two final questions ND,

Your script is working fine. But the smaller files are not allowed to carry the .txt extension. So a file like 'door.txt' should be split up into the files door_1, door_2, door_3 etc., and not into door.txt_1, door.txt_2, door.txt_3 etc., as is the case with the script you proposed. Would you be so kind as to have another look at that?

And also, I do not understand why 'line' is set to "" in the first condition. Could you explain that to me?

Thanks,

kredit99
 
The problem with the .txt extension is solved; I used the match function. But that still leaves me with the question of why 'line' is set to "" in the first condition.

Greetings,

kredit99
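
On the open question: 'line' holds the word carried over from one output file to the next. If it were not reset at the start of each input file, the last word saved from window.txt would be written as the first word of door_1. The sketch below shows both points together. It is an assumption-laden rewrite, not the script from this thread: split_overlap and the variable names are hypothetical, the extension is dropped with sub() instead of the match function mentioned above, and the set size is assumed to be at least 2.

```shell
#!/bin/sh
# Sketch: overlapping sets, extension stripped, carry-over reset per file.
# split_overlap is a hypothetical name; usage: split_overlap LEN FILE...
split_overlap() {
    len="$1"
    shift
    if [ "$len" -lt 2 ]; then
        echo "split_overlap: set size must be at least 2" >&2
        return 1
    fi
    awk -v len="$len" '
    FNR == 1 {
        # new input file: finish a partial set from the previous one
        if (count > 0) { printf("\n") > out; close(out) }
        base = FILENAME
        sub("\\.[^./]*$", "", base)   # door.txt becomes door
        num = 1
        out = base "_" num
        count = 0
        last = ""    # nothing to carry over from the previous input file
    }
    {
        # start a follow-up set with the last word of the previous set
        if (count == 0 && last != "") {
            printf("%s", last) > out
            count = 1
        }
        # single space between words, none before the first
        printf("%s%s", (count > 0 ? " " : ""), $0) > out
        count++
        if (count == len) {
            printf("\n") > out
            close(out)
            num++
            out = base "_" num
            last = $0
            count = 0
        }
    }
    END {
        if (count > 0) { printf("\n") > out; close(out) }
    }' "$@"
}

# Demo in a scratch directory with two small input files.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    printf '%s\n' a b c > window.txt
    printf '%s\n' x y z w > door.txt
    split_overlap 3 window.txt door.txt
    for f in window_1 door_1 door_2; do
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
)
```

Here door_1 starts with x, the first word of door.txt, because last was cleared when the input file changed; door_2 then starts with z, the last word of door_1.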
 