Working with data chunks 2

jriggs420 · Nov 30, 2005

I have an app which requires 4 separate chunks of input data, with each dataset separated by a series of special characters, e.g. 111 or xxx or some such.
I know how to split a string into separate variables, but since the datasets here will be considerably larger (10,000 chars), I have the feeling I need something a little better suited to that amount of data. I am currently pondering:

Code:

@data = split(/111/, $infile);
$datachunk1 = $data[0];
$datachunk2 = $data[1];  # etc...

The only problem is, I would like for each 'chunk' to be in array format. Any suggestions? TIA

Joe

A clever person solves a problem.
A wise person avoids it.

-- Einstein

KevinADC · Nov 30, 2005

I can't think of any particular problem associated with the length of the data. 10,000 characters is not much. As long as the delimiters are not part of the strings you should be OK.
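
For what it's worth, a minimal sketch of the split approach that also gets each chunk into "array format" could look like the following. It assumes the whole input is already in one string, that 111 never occurs inside the data itself, and that each chunk should become an array of its non-empty lines. The name split_chunks and the sample contents of $infile are illustrative only, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one big string into chunks on a delimiter,
# then split each chunk into an array of its non-empty lines.
sub split_chunks {
    my ($text, $delim) = @_;
    my @chunks;
    for my $chunk (split /\Q$delim\E/, $text) {
        my @lines = grep { length } split /\n/, $chunk;
        push @chunks, \@lines if @lines;   # skip chunks with no data
    }
    return @chunks;                        # list of array references
}

# Made-up sample input standing in for the real 10,000-character data.
my $infile = "fff\naaa\n111\neee\nrrr\n111\nbbb\n";
my @data   = split_chunks($infile, '111');
print scalar(@data), " chunks; first chunk: @{ $data[0] }\n";
```

Each element of @data is a reference to an array of lines, so @{ $data[0] } is the first dataset as a plain list.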

ishnid · Dec 1, 2005

You could set the input record separator (the special $/ variable). By default, the input record separator is the newline character, so when you read one "line" from a file, Perl reads everything up to and including the next newline.

If you set it to '111', Perl will instead read up to the next occurrence of that string, which is how you're separating your records. Here's a simple example:

Code:

#!/usr/bin/perl
use strict;
use warnings;

local $/ = '111';

my $items = 0;
while(<DATA>) {
   chomp;
   print "ITEM $items: $_\n";
   print '#' x 30, "\n";
   $items++;
}

__DATA__
hfjasvkbfhdsvkjfhsadfjbchdfkjlashnfxajkhn111hfdjskaf bhdsjkfb
hdsafvhsadlfjhacsnfjdhnasxjk111nhfdjkasnhfdjkvsanvhfasdjflbhajkfsbchakljfbhacsfjksdhbfcskjfhbcskfhbasj111bchfdsjaklfhbcsadkjfbchscfa#sdfchcsdfcsa
#dcffbshdfjkclashfbsjkdfchbsal111cbfhdasklfchbdjsafchbas
cfbsfa

fcdsachfbdsajklfchablk111ghfdafsdffasdfas111last record!
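
The same idea can be applied to any filehandle, not just DATA. The sketch below is only an illustration: read_records is a made-up name, and it confines the change to $/ to a single subroutine with local so that the rest of the program still reads ordinary lines.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from an open filehandle, using
# $sep as the record separator instead of the newline.
sub read_records {
    my ($fh, $sep) = @_;
    local $/ = $sep;          # restored automatically when the sub returns
    my @records = <$fh>;      # list context: one element per record
    chomp @records;           # chomp removes the current value of $/
    return @records;
}

# Demonstration on an in-memory filehandle; a handle opened on a real
# file would be read the same way.
my $sample = "first\nrecord111second record111last record!";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my @records = read_records($fh, '111');
close $fh;
print "ITEM $_: $records[$_]\n" for 0 .. $#records;
```

Any newlines inside a record are left untouched; only the 111 separator is stripped by chomp.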

jriggs420 · Dec 1, 2005

Thanks guys. What form does the data take? Will newline characters be preserved as well? Is this basically the same thing as reading from an input file? Can I set up operations the same way? Is it possible to assign (insert) each dataset to an array or maybe a filehandle? I'm particularly interested in being able to use a while loop to navigate through each dataset. Does that make sense?
Thanks again..

KevinADC · Dec 1, 2005

Don't use "while" if you already have the data in an array; loop over the array with "foreach" instead. Use "while" if the data is being read from a file. You can answer most of your questions just by experimenting with the commands and seeing what you get. Newlines are preserved if you do it the way ishnid showed you.
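
On the question of using while to walk through one dataset: one option, sketched here under the assumption that a chunk is held in an ordinary string, is to open a filehandle on that string (Perl 5.8 and later accept a reference to a scalar in place of a filename). The name lines_of_chunk is illustrative only, and the sketch assumes $/ still has its default value of newline at the point where it is called.

```perl
use strict;
use warnings;

# Hypothetical helper: walk through one chunk line by line with while,
# as if it were a file, and collect the non-blank lines into an array.
sub lines_of_chunk {
    my ($chunk) = @_;
    open my $fh, '<', \$chunk or die "Cannot open in-memory handle: $!";
    my @lines;
    while (my $line = <$fh>) {   # assumes $/ is the default newline
        chomp $line;
        next if $line eq '';     # ignore blank lines
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# Once the lines are in an array, foreach is the natural loop.
my @lines = lines_of_chunk("fffffff\naaaaaaa\n\nqqqqqqq\n");
foreach my $line (@lines) {
    print "line: $line\n";
}
```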

jriggs420 · Dec 1, 2005

I must not have my Perl hat on today because I'm still not getting it. Right now my program is seeking input from 3 separate files. I want to combine them into 1 input file, with each dataset separated by special characters. If I can figure out how to put the datasets into 3 different arrays, I will be good to go. I can use whatever for the delimiters, and they can each be unique if that's important.
__DATA__
111
fffffff
aaaaaaa
qqqqqqq
111
http://wwwwwww
eeeeeee
rrrrrrr
111
bbbbbbb
ggggggg
ppppppp
111
With this data, fffffff/aaaaaaa/qqqqqqq should be stored in the first array, and so on, with 111 being the delimiter. I just need to know whether this is possible, because I haven't seen it documented or demonstrated elsewhere. Sorry for being so dense.

KevinADC · Dec 1, 2005

The problem is not your density, at least I don't think so; it just takes time to learn how to do what you are trying. The sample data you posted is not what I, and I assume ishnid, was expecting based on your original post. Post the code you have been trying to use to separate your data into separate bits and someone can look at it and guide you along.

jriggs420 · Dec 1, 2005

Here is the code. Please excuse any typos, I am hand typing it:

Code:

my @array;
my $start = 1;
my $end   = 2;
while (<DATA>) {
    if (@array and $_ =~ $end) {
        $end++;
        s/$end//;        # don't want that
        print @array;    # test - will write to @array when working
        @array = ();
    }
    elsif (@array) {
        push @array, $_;
    }
    elsif ($_ =~ $start) {
        s/$start//;      # don't want that
        $start++;
        push @array, $_; # blank line, that's OK
    }
    else {
        # input=fubar
    }
}
__DATA__
1
qqq
qqq
2
aaa
aaa
aaa
3
zzz
zzz
4

A couple of known problems: it only iterates through the first dataset, and it limits the allowable delimiters (I suppose I could use 1111, 1112). I don't know if it is possible to change @array's variable name (@array1, @array2, ...) with each full iteration. I was originally trying to use split, but couldn't get anything close to what I want. Since I am a Perl noob, I always doubt my script, thinking there's got to be a better way. Ideas?

rharsh · Dec 1, 2005

See if this helps get you started:

Code:

my (%records, $i);
local $/ = '111';
while (<DATA>) {
    chomp;
    if ($_ !~ /^\s*$/) {
        my @temp = grep {$_ ne ''} split("\n", $_);
        push @{$records{"list".++$i}}, @temp;
    }
}

foreach my $list (sort keys %records){
    print "$list:\n";
    foreach my $list_item (@{$records{$list}}) {
        print "\t$list_item\n";
    }
}

__DATA__
111
fffffff
aaaaaaa
qqqqqqq
111
http://wwwwwww
eeeeeee
rrrrrrr
111
bbbbbbb
ggggggg
ppppppp

KevinADC · Dec 1, 2005

To me it looks like you just want what's between the digits to be separated into arrays. I would maybe use an array of arrays if that's the case:

Code:

my @AoA = ();
my $i = -1;
while (<DATA>) {
   chomp;
   /^\d+$/ ? ($i++,next) : push(@{$AoA[$i]},$_);
}
print "@{$_}\n" for @AoA;
__DATA__
1
qqq
qqq
2
aaa
aaa
aaa
3
zzz
zzz
4
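
For anyone who finds the one-line ternary hard to follow, here is roughly the same logic written out long-hand. This is a sketch rather than the code as posted: group_lines is an invented name, it takes the lines as a list instead of reading DATA, and it additionally ignores any lines that appear before the first delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into an array of arrays, moving on to
# a new group every time a line consisting only of digits is seen.
sub group_lines {
    my @input = @_;
    my @AoA;
    my $i = -1;                        # index of the group being filled
    foreach my $line (@input) {
        chomp $line;
        if ($line =~ /^\d+$/) {        # a line of digits is a delimiter
            $i++;                      # move on to the next group
            next;
        }
        next if $i < 0;                # ignore data before the first delimiter
        push @{ $AoA[$i] }, $line;     # autovivifies $AoA[$i] as an array ref
    }
    return @AoA;
}

my @AoA = group_lines("1\n", "qqq\n", "qqq\n", "2\n", "aaa\n", "3\n", "zzz\n", "4\n");
print "@{$_}\n" for @AoA;
```

The trailing delimiter only bumps the counter; since nothing is pushed after it, no extra group is created.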

jriggs420 · Dec 2, 2005

Thanks rharsh and Kevin,

Both examples worked as posted. I am currently tinkering with them to see how they work. Looks like 2 separate approaches to a similar problem. I can't tell which is best suited for my needs just yet. I just wanted to let you guys know I appreciate the help.

-Joe

jriggs420 · Dec 2, 2005

Kevin, props for such short code that works great. Do you know why, if I try print @AoA[1]; instead of your print, I get a recommendation to rename that variable, and the output looks like it might be hex: ARRAY(0x8105d98)? I think I have the idea of your script: create an array for each dataset, incremented as @AoA[0], @AoA[1], etc. I've tried countless variations on this theme with little success. Thanks again,

KevinADC · Dec 2, 2005

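@AoA[1] is a one-element array slice; Perl warns that it is better written as $AoA[1], which is what you should use to get at a single element. Here each element is not a string but a reference to an array, which is why printing it gives ARRAY(0x8105d98).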

I used an array of arrays, which is to say an array of array references. So to print the data, and not the reference (the hex address), you dereference the reference: @{$_}

If you look at rharsh's code you see he is also using references:

push @{$records{"list".++$i}}, @temp;

and later:

foreach my $list_item (@{$records{$list}}) {

He used a hash of arrays; I used an array of arrays.
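
To make the reference/dereference distinction concrete, here is a small sketch with made-up data (the variable names are illustrative). It shows that an element of an array of arrays is itself a reference, and a few common ways to get at the data behind it.

```perl
use strict;
use warnings;

# An array of arrays is an array whose elements are array references.
my @AoA = (
    [ 'qqq', 'qqq' ],
    [ 'aaa', 'aaa', 'aaa' ],
);

my $ref = $AoA[1];                  # a reference, not the data itself
print "reference:   $ref\n";        # prints something like ARRAY(0x...)
print "whole array: @{$ref}\n";     # dereference to get the whole list
print "one element: $AoA[1][0]\n";  # short for $AoA[1]->[0]
print "element count: ", scalar @{ $AoA[1] }, "\n";
```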

rharsh · Dec 2, 2005

After rereading the rest of the posts on this thread, I'm not sure why I thought the OP wanted the lists named. Since that's not the case, don't bother with the hash of arrays, use Kevin's approach - array of arrays - it makes more sense.

jriggs420 · Dec 2, 2005

Whoa, wait a minute... did you just say 'dereference the reference'? I knew I was a little over my head here, but you just blew me out of the water with that one. I've been reading up on AoAs, and I have figured out how to use print $array[0][0]; which prints out the first line, 'fffffffff', as expected, but I can't figure out why print $array[0]; gives a reference. Perhaps I'm using these forums too much as a crutch; I think I need to invest some time in CPAN. As a side note, with all due respect, Kevin, your code, while technically proficient, eloquent, and concise, does little to help others (namely me) gain a grasp on a new language. I don't mind doing things the long way as long as I learn something along the way. No offense intended. Thx

rharsh · Dec 2, 2005

You might want to take a look at the perldsc and perlreftut perldocs - they both have good intros to references and data structures.

KevinADC · Dec 2, 2005

References are nothing to be intimidated by. They are just another way to save and get data. You can see how using references made what you wanted to do very easy. They are a powerful and flexible, yet very easy to learn, part of Perl. Saying "dereference the reference" might be akin to saying "decode the code", which is something you are probably used to hearing and doesn't cause any confusion. You can learn the basics of references in less than an hour:

http://perldoc.perl.org/perlreftut.html

jriggs420 · Dec 5, 2005

Thanks for the references, I think I have demystified the subject, at least well enough to suit my needs... One other question: if a routine in the script is using a module, should it be declared below the shebang, or within the actual routine itself?

KevinADC · Dec 5, 2005

It's easy to load all modules somewhere at the beginning of the script, but it's not necessary; you can load them in the subroutines if that's what you prefer. It does depend on whether you are using "use" or "require": "use" is processed at compile time no matter where it appears in the file, while "require" loads the module at run time, when that statement is actually reached.
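
As a rough illustration of the use/require difference, here is a sketch that uses two core modules, List::Util and POSIX, purely as examples; the sub names total and round_down are made up.

```perl
use strict;
use warnings;

# 'use' runs at compile time, wherever it is written in the file, so
# putting it near the top simply makes the dependency easy to see.
use List::Util qw(sum);

sub total {
    return sum(@_);
}

sub round_down {
    # 'require' runs at run time, so POSIX is only loaded the first
    # time this sub is actually called.
    require POSIX;
    return POSIX::floor($_[0]);
}

print "total: ", total(1, 2, 3), "\n";
print "POSIX loaded yet? ", (exists $INC{'POSIX.pm'} ? "yes" : "no"), "\n";
print "rounded: ", round_down(2.7), "\n";
```

Loading inside the sub can save start-up time for a module that is rarely needed; for modules that are always used, a use line near the top is the more common convention.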
