Need help with tabular data

calomel · Aug 14, 2006

Hello all. I'm a research scientist with some familiarity with perl but am trying to do some work with tabular data and have no idea where to go from here. I can read in the data fine into several variables, mess with individual rows and export the data out fine. However, I'm stuck as to how to approach the main part of a new program. Here's the basic idea:

I need to find every place where a specific variable has the same value for three consecutive rows, then sum the analyzer results (another column) for those three rows. I know how to read data in using foreach but have no idea how to manipulate or look at the data in earlier rows, or how to store the analyzer results across different rows.

I've done a lot of googling and dug through my perl books to no avail. I would appreciate any input to get me going in the right direction. Thanks

rharsh · Aug 14, 2006

Can you supply some sample data? This might be easier if we are able to see what you're working with.

How large are the files you're working on? A hundred Kb, a few Mb, a couple Gigs?

Are you only going to be looking for one keyword/string across three lines, or are there multiple keywords?

calomel · Aug 15, 2006

The datasets I'm dealing with are fairly small, maybe a few hundred KB at the most, so the code doesn't have to be very efficient; it just has to work. Here's an example of a fake dataset:

Code:

Date   Flag    Result
6-5     0        1.5
6-6     0        1.7
6-7     0        1.8
6-8     1        0.4
6-9     1        0.1
6-10    1        0.02
6-11    2        0.2
6-12    2        4.6
6-13    2        1.1
6-14    3        17.8
6-15    3        3.3
6-16    3        0.6
6-17    1        0.2
6-18    1        0.04
6-19    0        1.3
6-20    0        1.8
repeat ad nauseam for several pages

So here's the gist of what I'm trying to do (in order of increasing difficulty).
1) I want to sum up the result column when I have three consecutive rows with a "3" value for the flag variable (i.e., 17.8 + 3.3 + 0.6 = 21.7). However, I have no idea how to ask perl if it's seen 3 consecutive rows with those parameters. Here's the ideal resulting output for those rows:

Code:

6-14    3        0
6-15    3        0
6-16    3        21.7

2) Same thing for the "2" flags, except I want to ignore the first row with a 2 in it. So in this case I want 4.6 + 1.1 (5.7) and want to ignore the preceding 0.2 and replace it with a 0. Here's the ideal resulting output for those rows:

Code:

6-11    2        0
6-12    2        5.7
6-13    2        0

3) I need to calculate a blank by summing the last 1 flag before the 2 flags start (i.e., 6-10, 0.02) with the last 1 flag after the 3 flags go by (i.e., 6-18, 0.04). So in this case the blank would be 0.06 (0.02 + 0.04). Now here's the really tricky part. I need to subtract the blank from the 2 and 3 flag sums I created in steps 1 and 2.

As you can see, this gets a little funky and I really have no idea how to tell perl how to look through all the rows for these kinds of structures. The only thing going for me is that the structure of the flags is always the same (12 "0" flags followed by 3 "1" flags followed by 3 "2" flags followed by 3 "3" flags followed by 2 "1" flags). If anyone has any advice or avenues to look into, I'd greatly appreciate it.
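To make the three steps concrete, here is a small sketch of the arithmetic for the single sample block shown above. The array names are made up for illustration, and the values are simply copied from the fake dataset; it shows what the numbers should come out to, not how to find the rows in a real file.

```perl
use strict;
use warnings;

# Result values copied by hand from the sample block (names are hypothetical).
my @ones_before = (0.4, 0.1, 0.02);    # three 1-flag rows before the 2s
my @twos        = (0.2, 4.6, 1.1);     # three 2-flag rows
my @threes      = (17.8, 3.3, 0.6);    # three 3-flag rows
my @ones_after  = (0.2, 0.04);         # two 1-flag rows after the 3s

# Step 3: blank = last 1 before the 2s + last 1 after the 3s
my $blank = $ones_before[-1] + $ones_after[-1];

# Step 2: ignore the first 2-flag row, sum the other two
my $sum_twos = $twos[1] + $twos[2];

# Step 1: sum all three 3-flag rows
my $sum_three = $threes[0] + $threes[1] + $threes[2];

printf "blank      = %.2f\n", $blank;                 # 0.06
printf "2s - blank = %.2f\n", $sum_twos - $blank;     # 5.64
printf "3s - blank = %.2f\n", $sum_three - $blank;    # 21.64
```

So for this block the corrected values would be 5.64 on the 6-12 row and 21.64 on the 6-16 row, with the other 2 and 3 rows set to 0.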

rharsh · Aug 15, 2006

Quote (calomel):

the structure of the flags is always the same (12 "0" flags followed by 3 "1" flags followed by 3 "2" flags followed by 3 "3" flags followed by 2 "1" flags).

That makes this problem a lot easier. Take a look at this and see if it helps:

Code:

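use strict;
use warnings;

my @data;
my $headers = <DATA>;    # Read the header line first so it stays out of @data

while (<DATA>) {
    next if /^\s*$/;                 # Skip blank lines
    push @data, [split ' ', $_];     # One [date, flag, result] row per line
}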

for (my $i = 0; $i <= $#data; $i++) {
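    # Skip the 0-flag rows; the first non-zero row starts a block
    next if $data[$i][1] == 0;

    # Stop if the file ends partway through a block (rows $i .. $i+10)
    last if $i + 10 > $#data;

    # Calculate 'blank': last 1 before the 2s plus last 1 after the 3s
    my $blank = $data[$i+2][2] + $data[$i+10][2];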
    
    # Sum 2's
    $data[$i+3][2] = 0;    # Set first row to 0
    my $twos = $data[$i+4][2] + $data[$i+5][2];
    $data[$i+4][2] = $twos - $blank;
    $data[$i+5][2] = 0;

    # Sum 3's
    my $threes = $data[$i+6][2] + $data[$i+7][2] + $data[$i+8][2];
    $data[$i+6][2] = $data[$i+7][2] = 0;
    $data[$i+8][2] = $threes - $blank;

    # Jump to the last row of this block; the loop's $i++ then moves
    # on to the first row of the next set of 0's
    $i += 10;
}

foreach (@data) {
    print join("\t", @{$_}), "\n";
}



__DATA__
Date   Flag    Result
6-5     0        1.5
6-6     0        1.7
6-7     0        1.8
6-8     1        0.4
6-9     1        0.1
6-10    1        0.02
6-11    2        0.2
6-12    2        4.6
6-13    2        1.1
6-14    3        17.8
6-15    3        3.3
6-16    3        0.6
6-17    1        0.2
6-18    1        0.04
6-19    0        1.3
6-20    0        1.8

calomel · Aug 15, 2006

Thanks a million! I'll have to spend a while reading over your code to figure out all the commands. The only thing I'm worried about is that it seems like you had to tell it which lines the consecutive rows are on. For instance:

Code:

my $twos = $data[$i+4][2] + $data[$i+5][2];

makes it look like we've told the program exactly where the rows with the consecutive 2 flags are. I need it to read through pages of data, find all the sets of consecutive 2 flags, and apply the same formula to each of them. I'll try to search around a little more on this. Thanks!
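If the block layout were ever not fixed, one alternative to hard-coded offsets would be to group the rows into runs of equal consecutive flags first and then act on each run. The sketch below is only an illustration of that idea: `find_runs` is a made-up helper name, and the rows are a few lines copied from the sample data rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: group rows into runs of consecutive equal flags.
# Each row is [date, flag, result]; each run is { flag => ..., rows => [indices] }.
sub find_runs {
    my @rows = @_;
    my @runs;
    for my $i (0 .. $#rows) {
        my $flag = $rows[$i][1];
        if (@runs && $runs[-1]{flag} == $flag) {
            push @{ $runs[-1]{rows} }, $i;                 # same flag: extend the current run
        }
        else {
            push @runs, { flag => $flag, rows => [$i] };   # flag changed: start a new run
        }
    }
    return @runs;
}

# A few rows copied from the sample dataset
my @data = (
    ['6-13', 2, 1.1],
    ['6-14', 3, 17.8], ['6-15', 3, 3.3], ['6-16', 3, 0.6],
    ['6-17', 1, 0.2],
);

for my $run (find_runs(@data)) {
    # Only act on runs of exactly three 3-flag rows
    next unless $run->{flag} == 3 && @{ $run->{rows} } == 3;
    my $sum = 0;
    $sum += $data[$_][2] for @{ $run->{rows} };
    printf "flag 3 run ending %s: sum = %.1f\n", $data[ $run->{rows}[-1] ][0], $sum;
}
# prints: flag 3 run ending 6-16: sum = 21.7
```

Because every run carries the indices of its rows, the same loop could zero out rows or subtract a blank without assuming where in the file a block starts.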

calomel · Aug 15, 2006

Oops, I spoke too soon. It works absolutely perfectly. I just need to spend some time looking at the code a little harder. I'm not familiar with some of the commands, so I'll need to play around with it to get a better handle on it. I definitely owe you a beer!

rharsh · Aug 15, 2006

I'm glad the code worked so well for you. Did you get it all figured out or are you having problems with some of it?

calomel · Aug 17, 2006

Code:

for (my $i = 0; $i <= $#data; $i++) {

This is the only line that I don't understand. What was the reason for using "my" instead of just setting $i=0? I've never used my before and have only seen it used in subroutines. Also, "$i <= $#data" is confusing me. I'm not familiar with "$#". Why not use @data instead? It seems to me that this line is responsible for going through the data line by line, given the increment at the end ($i++). I think if I can understand this line I have the rest of it figured out. Thanks again.

keid · Aug 17, 2006

my is used only for creating the private variable $i; without it some warnings will appear. $#data stands for the index value of the last element of the array @data, i.e. the array length.
Note the form of a for loop: (initial value; test condition; increment).
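A small sketch of what the `my` in the loop header does, using three rows borrowed from the sample data. The `@seen` array and the outer `$i` are only there for illustration; the point is that the loop's `$i` is private to the loop and leaves any outer `$i` alone.

```perl
use strict;
use warnings;

my @data = ( ['6-5', 0, 1.5], ['6-6', 0, 1.7], ['6-7', 0, 1.8] );

# 'my $i' declares a counter that exists only for the duration of this loop.
my @seen;
for (my $i = 0; $i <= $#data; $i++) {
    push @seen, "$i=$data[$i][0]";    # index and date of each row
}
print join(' ', @seen), "\n";         # 0=6-5 1=6-6 2=6-7

# An outer variable with the same name is not touched by the loop's own $i.
my $i = 'outer';
for (my $i = 0; $i < 2; $i++) { }
print "$i\n";                         # outer
```

Under `use strict`, leaving out the `my` (with no `$i` declared elsewhere) would stop the script at compile time, which is one reason it is common to declare the counter right in the loop header.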

hope this helps from a beginner like me

stevexff · Aug 17, 2006

$#data does represent the index value of the last element in the array. But it isn't the array length, as arrays are indexed from 0. A 3 element array will have $#array == 2 and @array (in a scalar context) == 3. Normally you see

Code:

for (my $i = 0; $i < @array; $i++) {...}

to achieve the same effect.
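As a quick illustration of the difference, here is a sketch with a made-up three-element array; the `@by_last` and `@by_count` names are just for the demonstration.

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

my $last_index = $#array;          # 2: index of the last element
my $count      = scalar @array;    # 3: number of elements

print "last index = $last_index, count = $count\n";

# Both loop headers visit indices 0, 1 and 2.
my (@by_last, @by_count);
for (my $i = 0; $i <= $#array; $i++) { push @by_last,  $i }
for (my $i = 0; $i <  @array;  $i++) { push @by_count, $i }
print "@by_last | @by_count\n";    # 0 1 2 | 0 1 2

# A range-based loop over the indices is another common spelling.
for my $i (0 .. $#array) {
    print "$i: $array[$i]\n";
}
```

In the `$i < @array` comparison the array is evaluated in scalar context, so it yields the element count without needing an explicit `scalar`.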

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)
