Working with data chunks 2

Status
Not open for further replies.

jriggs420

Programmer
Sep 30, 2005
116
US
I have an app which requires 4 separate chunks of input data, with each dataset separated from the next by a series of special characters, e.g. 111 or xxx or some such.
I know how to split a string into separate variables, but since the datasets here will be considerably larger (10,000 chars), I have the feeling I need something a little better suited to that amount of data. I am currently pondering:
Code:
@data = split(/111/, $infile);
$datachunk1 = $data[0];
$datachunk2 = $data[1];  # etc...
The only problem is, I would like for each 'chunk' to be in array format. Any suggestions? TIA

Joe

A clever person solves a problem.
A wise person avoids it.

-- Einstein
 
I can't think of any particular problem associated with the length of the data. 10,000 characters is not much. As long as the delimiters are not part of the strings you should be OK.
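
As a rough illustration of the split idea from the original post, here is a minimal sketch that also turns each chunk into an array of lines. The contents of $infile are made up for the example, and the sketch assumes the marker '111' never occurs inside the data itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: three datasets separated by the marker '111'.
my $infile = "aaa\nbbb\n111ccc\nddd\n111eee\n";

# Split on the marker first, then split each chunk on newlines,
# giving one array reference per dataset.
my @chunks = map { [ split /\n/, $_ ] } split /111/, $infile;

for my $n (0 .. $#chunks) {
    print "chunk $n: @{$chunks[$n]}\n";
}
```

Each entry of @chunks is a reference to an array holding the lines of one dataset, so the size of the input only affects how much memory the script uses.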
 
You could set the input record separator (the special $/ variable). By default, when you try to read one "line" from a file, the input record separator is the newline character, so perl will read everything up until the next newline character.

If you set it to '111', it will read up until the next occurrence of that string, which is how you're separating your records. Here's a simple example:
Code:
#!/usr/bin/perl
use strict;
use warnings;

local $/ = '111';

my $items = 0;
while(<DATA>) {
   chomp;
   print "ITEM $items: $_\n";
   print '#' x 30, "\n";
   $items++;
}

__DATA__
hfjasvkbfhdsvkjfhsadfjbchdfkjlashnfxajkhn111hfdjskaf bhdsjkfb
hdsafvhsadlfjhacsnfjdhnasxjk111nhfdjkasnhfdjkvsanvhfasdjflbhajkfsbchakljfbhacsfjksdhbfcskjfhbcskfhbasj111bchfdsjaklfhbcsadkjfbchscfa#sdfchcsdfcsa
#dcffbshdfjkclashfbsjkdfchbsal111cbfhdasklfchbdjsafchbas
cfbsfa

fcdsachfbdsajklfchablk111ghfdafsdffasdfas111last record!
 
Thanks guys. What form does the data take? Will newline characters be preserved as well? Is this basically the same thing as reading from an input file? Can I set up operations the same way? Is it possible to assign (insert) each dataset to an array or maybe a file handle? I'm particularly interested in being able to use a while loop to navigate through each dataset. Does that make sense?
Thanks again..

 
Don't use "while" if you already have the data in an array. Use "while" if the data is being read from a file. You can answer your questions just by experimenting with the commands and seeing what you get. Newlines are preserved if you do it the way ishnid showed you.
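
On the question of treating each dataset like a file: one possible approach, sketched here under assumptions, is to open a filehandle on a string (an in-memory file, available since perl 5.8). The input text and the variable names below are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical combined input; '111' separates the datasets.
my $input = "line a1\nline a2\n111line b1\nline b2\n";

my @datasets;
{
    # Read from the string as if it were a file, one '111'-terminated record at a time.
    open my $in, '<', \$input or die "Cannot open in-memory handle: $!";
    local $/ = '111';
    while (my $record = <$in>) {
        chomp $record;    # removes the trailing '111', keeps the newlines
        push @datasets, $record;
    }
    close $in;
}

# Each dataset can now be walked line by line through its own handle.
my @seen;
for my $n (0 .. $#datasets) {
    open my $fh, '<', \$datasets[$n] or die "Cannot open dataset $n: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @seen, "dataset $n: $line";
    }
    close $fh;
}
print "$_\n" for @seen;
```

Because the change to $/ is local to the first block, the second loop reads ordinary newline-terminated lines again.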
 
I must not have my Perl hat on today because I'm still not getting it. Right now my program is seeking input from 3 separate files. I want to combine them into 1 input file, with each dataset separated by special characters. If I can figure out how to put the datasets into 3 different arrays, I will be good to go. I can use whatever for the delimiters, and they can each be unique if that's important.
___DATA_____
111
fffffff
aaaaaaa
qqqqqqq
111
wwwwwww
eeeeeee
rrrrrrr
111
bbbbbbb
ggggggg
ppppppp
111
With this data, fffffff/aaaaaaa/qqqqqqq should be stored in the first array, and so on, with 111 being the delimiter. I just need to know whether this is possible, because I haven't seen it documented or demonstrated elsewhere. Sorry for being so dense.

 
The problem is not your density, at least I don't think so; it just takes time to learn how to do what you are trying to do. The sample data you posted is not what I, and I assume ishnid, were expecting based on your original post. Post the code you have been using to try to separate your data into separate bits and someone can look at it and guide you along.
 
Here is the code. Please excuse any typos, I am hand typing it:
Code:
my @array;
my $start = 1;
my $end   = 2;
while (<DATA>) {
    if (@array and $_ =~ $end) {
        $end++;
        s/$end//;        # don't want that
        print @array;    # test - will write to @array when working
        @array = ();
    }
    elsif (@array) {
        push @array, $_;
    }
    elsif ($_ =~ $start) {
        s/$start//;      # don't want that
        $start++;
        push @array, $_; # blank line, that's OK
    }
    else {
        # input=fubar
    }
}
__DATA__
1
qqq
qqq
2
aaa
aaa
aaa
3
zzz
zzz
4
A couple of known problems: it only iterates through the first dataset, and it limits the allowable delimiters (I suppose I could use 1111, 1112). I don't know if it is possible to change @array's variable name (@array1, @array2, ...) with each full iteration. I was originally trying to use split, but couldn't get anything close to what I want. Since I am a Perl noob, I always doubt my script, thinking there's got to be a better way. Ideas?

 
See if this helps get you started:
Code:
my (%records, $i);
local $/ = '111';
while (<DATA>) {
    chomp;
    if ($_ !~ /^\s*$/) {
        my @temp = grep {$_ ne ''} split("\n", $_);
        push @{$records{"list".++$i}}, @temp;
    }
}

foreach my $list (sort keys %records){
    print "$list:\n";
    foreach my $list_item (@{$records{$list}}) {
        print "\t$list_item\n";
    }
}

__DATA__
111
fffffff
aaaaaaa
qqqqqqq
111
wwwwwww
eeeeeee
rrrrrrr
111
bbbbbbb
ggggggg
ppppppp
 
To me it looks like you just want what's between the digits to be separated into arrays. I would maybe use an array of arrays if that's the case:

Code:
my @AoA = ();
my $i = -1;
while (<DATA>) {
   chomp;
   /^\d+$/ ? ($i++,next) : push(@{$AoA[$i]},$_);
}
print "@{$_}\n" for @AoA;
__DATA__
1
qqq
qqq
2
aaa
aaa
aaa
3
zzz
zzz
4
 
Thanks rharsh and Kevin,

Both examples worked as posted. I am currently tinkering with them to see how they work. Looks like 2 separate approaches to a similar problem. I can't tell which is best suited for my needs just yet. I just wanted to let you guys know I appreciate the help.

-Joe

 
Kevin, props for such short code that works great. Do you know why, if I try print @AoA[1]; instead of your print, I get a recommendation to rename that variable, and the output looks like it might be hex: ARRAY(0x8105d98)? I think I have the idea of your script: create an array for each dataset, incremented as @AoA[0], @AoA[1], etc. I've tried countless variations on this theme with little success. Thanks again,

 
@AoA[1] is a one-element array slice; perl warns that it is better written as $AoA[1], which is the form to use when you want a single array element.

I used an array of arrays, which is to say an array of array references. So to print the data, and not the reference (the hex address), you dereference the reference: @{$_}

If you look at rharsh's code you see he is also using references:

push @{$records{"list".++$i}}, @temp;

and later:

foreach my $list_item (@{$records{$list}}) {

he used a hash of arrays, I used an array of arrays.
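
To make the difference between a reference and the data behind it concrete, here is a small sketch. The array contents mirror the sample data in the thread; the variable $ref is only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small array of arrays, shaped like the one built by the loop above.
my @AoA = ( [ 'qqq', 'qqq' ], [ 'aaa', 'aaa', 'aaa' ], [ 'zzz', 'zzz' ] );

my $ref = $AoA[1];    # a reference to the second inner array

print "reference:  $ref\n";          # prints something like ARRAY(0x...)
print "whole list: @{$ref}\n";       # dereferenced: the elements themselves
print "one item:   $AoA[1][0]\n";    # same as $AoA[1]->[0]
print "item count: ", scalar @{ $AoA[1] }, "\n";
```

The first print shows the address-like string from the question; the other three reach through the reference to the actual data.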
 
After rereading the rest of the posts on this thread, I'm not sure why I thought the OP wanted the lists named. Since that's not the case, don't bother with the hash of arrays, use Kevin's approach - array of arrays - it makes more sense.
 
Whoa, wait a minute... did you just say 'dereference the reference'? I knew I was a little over my head here, but you just blew me out of the water with that one. I've been reading up on AoAs, and I have figured out how to use print $array[0][0]; which prints out the first line, 'fffffff', as expected, but I can't figure out why print $array[0]; gives a reference. Perhaps I'm using these forums too much as a crutch; I think I need to invest some time in CPAN. As a side note, with all due respect, Kevin, your code, while technically proficient, eloquent, and concise, does little to help others (namely me) gain a grasp of a new language. I don't mind doing things the long way as long as I learn something along the way. No offense intended. Thx

 
You might want to take a look at the perldsc and perlreftut perldocs - they both have good intros to references and data structures.
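
For anyone who finds the one-line ternary hard to follow, here is a long-form sketch of the same array-of-arrays loop. It reads from an in-memory string instead of DATA, the variable names are made up, and the guard for lines before the first delimiter is an addition that is not in the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input, shaped like the sample data posted earlier in the thread.
my $input = "1\nqqq\nqqq\n2\naaa\naaa\naaa\n3\nzzz\nzzz\n4\n";
open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";

my @AoA;          # one inner array per dataset
my $index = -1;   # becomes 0 when the first delimiter line is seen

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^\d+$/) {
        # A line made only of digits is a delimiter: move on to the next dataset.
        $index++;
        next;
    }
    next if $index < 0;    # added guard: skip anything before the first delimiter
    # Any other line belongs to the current dataset.
    push @{ $AoA[$index] }, $line;
}
close $fh;

for my $dataset (@AoA) {
    print "@{$dataset}\n";
}
```

The push line is where perl quietly creates each inner array the first time it is needed, which perlreftut describes as autovivification.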
 
references are nothing to be intimidated by. They are just another way to save and get data. You can see how using references made trying to do what you wanted very easy. They are a powerful and flexible yet very easy to learn part of perl. Saying: dereference the reference, might be akin to saying: decode the code, which is something you are probably used to hearing and doesn't cause any confusion. You can learn the basics of references in less than an hour:

 
Thanks for the references, I think I have demystified the subject, at least well enough to suit my needs... One other question: if a routine in the script is using a module, should it be declared below the shebang, or within the actual routine itself?

 
It's easy to load all modules somewhere at the beginning of the script, but it's not necessary; you can load them in the subroutines if that's what you prefer. There might be a bit of a difference between doing it one way or the other, but I'm not sure what that might be. It may also depend on whether you are using "use" or "require" to load the modules.
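
As a hedged illustration of the "use" versus "require" point: "use" is processed at compile time wherever it appears, while "require" loads the module only when that line is reached at run time. The sketch below uses two core modules, List::Util and Data::Dumper; the subroutine names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'use' runs at compile time, so the module is loaded before any of the
# script's ordinary statements run, wherever the 'use' line sits.
sub largest {
    use List::Util qw(max);    # loaded even if largest() is never called
    return max(@_);
}

# 'require' runs at run time, so the module is loaded only when this
# line is actually reached (and only once per program).
sub dump_structure {
    require Data::Dumper;
    return Data::Dumper::Dumper(@_);
}

# %INC records which module files have been loaded so far.
my $before = exists $INC{'Data/Dumper.pm'} ? 'yes' : 'no';
my $text   = dump_structure([ 1, 2, 3 ]);
my $after  = exists $INC{'Data/Dumper.pm'} ? 'yes' : 'no';

print "largest: ", largest(3, 9, 4), "\n";
print "Data::Dumper loaded before call: $before, after call: $after\n";
```

Unless something else in the environment has already loaded Data::Dumper, the last line should report that the module was absent before the call and present after it. Putting "use" inside a subroutine therefore saves nothing at start-up, whereas "require" can defer the cost of a rarely used module.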
 