Problem with division 2

Status
Not open for further replies.

brianwustl

Programmer
Feb 14, 2005
11
US
I would like to open a text file that contains hundreds of lines of tab delimited numerical values (hundreds of values per line) between 0 and 255. I would like to take the mean average of all these values (so I guess the program would add up all the values and then divide them by the number of values per file). So the result for each text file would be a single value between 0 and 255.

I would like to perform this on a series of files and then create a single text file that holds all of these mean values, each on a new line:

i.e.,
230
198
194
6
201
57
115
...etc.

Also, this would need to be run with the perl included with Mac OS X.

Here is what I have so far:

Code:
my @files;
my (@l_count, $l_count);
my (@f_count, $f_count);
my @all;

grep{ -f and push @files, $_ }glob '*';

for( @files ){
    open FH, $_ or die $!;
    while( <FH> ){
        push @l_count, $_ for split '\s+', $_;
        for( @l_count ){
           $l_count += $_ for @l_count
        }
        @l_count = ();
        push @f_count, ( $l_count / $#l_count );
        $l_count = 0;
    }
    close FH;
    $f_count += $_ for @f_count;
    @f_count = ();
    push @all, ( $f_count / $#f_count );
    $f_count = 0;
}

open FH, '>output/end_res.log' or die $!;
print FH $_, $/ for @all;
close FH;

It seems to work perfectly, except for the fact that the resulting mean value seems to be (sum * count) instead of (sum / count). I tried fiddling with the code to get it to output the ratio instead of the product, but to no avail.

For example, a file with the following space delimited values:

1 1 1

outputs:

9

((1+1+1) * 3)

Any ideas?

Thanks,
Brian
 
I don't know anything about Perl for Macs, but this should work for Perl on *nix and Windows. Maybe it will work on a Mac as is, or with some modification:

Code:
#!perl
use strict;
use warnings;

my @all = ();

my @files = grep{ -f $_ }glob '*.num';

foreach ( @files ){
    my @f_count = ();
    my ($l_count,$f_count) = (0,0);

    open FH, $_ or die $!;
    my $line = do { local $/; <FH> }; # read entire file into one string
    close FH;

    my @l_count = split(/\s+/,$line);
    next unless @l_count; # skip blank files if any 
    $l_count += $_ for @l_count;
    push @f_count, ( sprintf '%.2f', ($l_count / int(@l_count)) );# round to 2 decimal places

    $f_count += $_ for @f_count;
    push @all, ( sprintf '%.2f', ($f_count / int(@f_count)) );
}

open FH, '>end_res.log' or die $!;
print FH "$_\n" for @all;
close FH;
 
edit, change this line:

my @files = grep{ -f $_ }glob '*.num';

to:

my @files = grep{ -f $_ }glob '*';

I was using *.num to do a quick test on my computer.
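One thing that may be worth checking with the split in the script above: if the data ever starts with whitespace, split(/\s+/, ...) returns an empty leading field, which would be counted as an extra value of 0 (with a warning under use warnings). The special pattern ' ' skips leading whitespace. A minimal sketch of the difference; the helper names fields_regex and fields_awk are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
sub fields_regex { my ($text) = @_; return split /\s+/, $text; }   # keeps an empty leading field
sub fields_awk   { my ($text) = @_; return split ' ',   $text; }   # skips leading whitespace

my $text = "\t1\t2\t3\n";   # note the leading tab

my @with_regex = fields_regex($text);
my @with_awk   = fields_awk($text);

print scalar(@with_regex), " fields with /\\s+/\n";   # 4 fields: '', 1, 2, 3
print scalar(@with_awk),   " fields with ' '\n";      # 3 fields: 1, 2, 3
```

If the files never begin with a tab or space, both forms give the same result.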
 
I also don't know anything about Perl on Macs. I'm no mathematician. (What is a mean average? Aren't mean and average synonymous?)
Code:
grep{ -f and push @files, $_ }glob '*';
An unusual idiom. How about
my @files = grep {-f} <*>, or
while (<*>) {
next unless -f;
...

Code:
for( @l_count ) {
    $l_count += $_ for @l_count
}
I think you are looping @l_count * @l_count times here. Either get rid of the outer for loop or the for modifier. You only need one.
Code:
push @f_count, ( $l_count / $#l_count );
I think you are trying to average the values on the current line here. Your description of the program's objective didn't mention this, I think. However, $#l_count is not the number of items in @l_count. It's the highest index in @l_count, which is one less than the number of items in @l_count, since array indexing starts at 0. Using an array name in scalar context returns the number of elements in the array, so the number of elements in @l_count is @l_count. Also, why do you need parentheses around the division? Better to use them around the function arguments.
Code:
push @all, ( $f_count / $#f_count );
You have the same problem here with confusing the highest index ($#f_count) with the number of elements (@f_count).
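A small sketch of the two points above, using the 1 1 1 example from the question. As far as I can trace the original script, the nested loop turns the sum 3 into 9, and because the array is emptied before the division, $#l_count is -1 at that point; dividing by -1 at the line level and again at the file level leaves 9. The variable names below are only for illustration:

```perl
use strict;
use warnings;

my @values = (1, 1, 1);

# Highest index versus element count.
my $last_index = $#values;          # 2
my $count      = scalar @values;    # 3

# Nested loop: the statement modifier already walks the whole array,
# so wrapping it in another loop adds the full sum once per element.
my $nested_sum = 0;
for (@values) {
    $nested_sum += $_ for @values;
}

# Single loop: each value is added exactly once.
my $sum = 0;
$sum += $_ for @values;

# An empty array has no highest index, so $# gives -1.
my @empty = ();
my $empty_last_index = $#empty;

print "last index $last_index, count $count\n";
print "nested sum $nested_sum, plain sum $sum\n";
print "mean ", $sum / $count, "\n";
print "last index of empty array $empty_last_index\n";
```

So the division has to happen before the array is cleared, and it should use the element count rather than $#.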

That's the best I can do without a better understanding of what you're trying to accomplish.

HTH






 
(What is a mean average? Aren't mean and average synonymous?)

That would be the average of all the averages, the mean average, which is about as average as you could possibly get unless there is a mean average of the mean averages! ;-)
 
Perl on OS X, IIRC, is the same as that found on FreeBSD.

type
Code:
which perl
and put that in your shebang line
#!/usr/bin/perl

--Paul

cigless ...
 
I always thought the mean average is each item added up and DIVIDED by the total number of items...i.e. ((1 + 1 + 1)/3) which would give you 1...

If I'm assuming correctly and all your files are in the same directory I'd do the following:
Code:
$dir = 'c:/tmp/folder/files/';
opendir(DIR, $dir);
@files = readdir(DIR);
closedir(DIR);

$output = 'c:/tmp/folder/output.txt';
open (OUTPUT, ">>".$output) || die "ERROR: Unable to open output file: ".$output;
foreach $file(@files)  {
   $total = 0;
   $element_count = 0;
   open (FILE,$file) || die "ERROR: Can't open file: ".$file;
   @lines = <FILE>;
   close FILE;
   foreach $line(@lines)  {
      @elements = split ' ', $line;   # split on whitespace (tabs or spaces)
      foreach $element(@elements)  {
         $total = $total + $element;
         $element_count++;
      }
   }
   $mean_average = $total / $element_count;
   print OUTPUT $mean_average."\n";
}
close OUTPUT;
Please bear in mind this isn't tested, so you may have to make some adjustments...

Rob Waite
 
That looks like what the OP intended from his description (save that you're not skipping directories), but I was confused by the term mean average and the code which seemed to be averaging the values on each line.
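The distinction matters because averaging each line and then averaging those results only matches the overall mean when every line holds the same number of values. A quick sketch with made-up data (the two lines and the mean helper are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical data: two lines with different numbers of values.
my @lines = ("10 20", "30 40 50 60");

# Plain arithmetic mean; returns 0 for an empty list.
sub mean {
    my @v = @_;
    return 0 unless @v;
    my $sum = 0;
    $sum += $_ for @v;
    return $sum / @v;
}

# Overall mean: pool every value, then divide once.
my @all_values = map { split ' ', $_ } @lines;
my $overall    = mean(@all_values);

# Mean of means: average each line, then average those averages.
my @line_means    = map { mean(split ' ', $_) } @lines;
my $mean_of_means = mean(@line_means);

print "overall mean:  $overall\n";        # 35
print "mean of means: $mean_of_means\n";  # 30
```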

As he's declined to enlighten us further about his intentions, who knows what it all really means? (Yuk, yuk.)
 
oh that's just mean ;-), hnaar, hnaar
--Paul

cigless ...
 
Math time! A bit on averages (using the dataset {1 2 2 5 9 11 15}):

Mean Average is the normal average you're used to. Add all the elements up and divide by the number of elements. So, with the numbers above, the mean average is roughly 6.4.

Median Average is the 'middle' of your dataset. With an even number of elements, you split the difference between the middle two. So, with the numbers given above, the median average is 5.

Mode Average is the most common element in the dataset. I don't remember what you're supposed to do in the event of a tie. But, in the dataset above, the mode average is 2.
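For anyone who wants to check those three numbers, here is a minimal sketch in Perl. The sub names are my own, and returning every tied value from modes is just one possible way to handle a tie, not necessarily the textbook rule. It assumes a non-empty list of numbers.

```perl
use strict;
use warnings;

# Sum divided by the number of elements.
sub mean {
    my @v = @_;
    my $sum = 0;
    $sum += $_ for @v;
    return $sum / @v;
}

# Middle element of the sorted list, or the midpoint of the middle two.
sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Every value tied for the highest count, in ascending order.
sub modes {
    my %count;
    $count{$_}++ for @_;
    my $max = 0;
    for my $c (values %count) { $max = $c if $c > $max; }
    return sort { $a <=> $b } grep { $count{$_} == $max } keys %count;
}

my @data = (1, 2, 2, 5, 9, 11, 15);
printf "mean %.1f, median %s, mode %s\n",
    mean(@data), median(@data), join(',', modes(@data));
# prints: mean 6.4, median 5, mode 2
```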

 
Hmm, rharsh, twarn't the way I learned it back when I was a young'un. We called the first one a mean or an average (synonyms), the second a median plain and simple, and the third a mode, likewise.

'Course I'm from the States so maybe it's different over in some a them furren places, and I'm gettin' a wee bit long in the tooth, so maybe they teach young'uns different names for them things nowadays.

And then again, my math never was no better'n just average.


 
The young'uns indeed are learning differently. I heard one kid say that the probability of the circumference of a circle being π x radius was one.

--Scary


cigless ...
 
Don't get me started. I'll have everybody thinking I'm a mean ol' man with outmoded ideas.
[soapbox]
 
Not used a Mac, but I've read that you can save a perl script as a droplet (whatever that is) and execute it by dragging and dropping file(s) onto it. In the absence of a command line, I guess this invokes the script with the file names as command line arguments?

In which case,
Code:
use strict;
use warnings;

my @averages;

while (<>) {
    my ($n, $total);
    while (/(\d+)\s*/g) {
        $total += $1;
        $n++;
    }
    if (eof) {
        my $mean = $n ? $total / $n : 0; # avoid possible divide-by-zero
        push @averages, $mean;
        $total = $n = 0;
    }
}
print join("\n", @averages), "\n";
ought to do it.

Anyway Mike, I wouldn't call you mean or outmoded. You just suffer from the same set of standard deviations as the rest of us...
 
Er, no, it only averages the last line of each file. That'll teach me to generate random test data from another script[blush]
 
Moved $n and $total outside the while loop to fix it
Code:
use strict;
use warnings;

my @averages;
my ($n, $total);

while (<>) {
    while (/(\d+)\s*/g) {
        $total += $1;
        $n++;
    }
    if (eof) {
        my $mean = $n ? $total / $n : 0; # avoid possible divide-by-zero
        push @averages, $mean;
        $total = $n = 0;
    }
}
print join("\n", @averages), "\n";
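In case it helps to see why this works: <> reads each file named in @ARGV in turn, and eof written without parentheses is true at the end of each of those files (eof() with empty parentheses would only be true at the end of the very last one). A sketch of the same idea wrapped in a sub so it can be called with any list of files; file_means is a made-up name, and splitting on whitespace instead of matching \d+ is my own substitution:

```perl
use strict;
use warnings;

# Hypothetical helper: returns one mean per non-empty file in @paths.
sub file_means {
    my @paths = @_;
    return () unless @paths;   # with an empty @ARGV, <> would read STDIN instead
    local @ARGV = @paths;      # <> reads the files listed in @ARGV, one after another
    my @means;
    my ($n, $total) = (0, 0);
    while (my $line = <>) {
        for my $value (split ' ', $line) {
            $total += $value;
            $n++;
        }
        if (eof) {             # no parentheses: true at the end of each file
            push @means, $n ? $total / $n : 0;   # avoid divide-by-zero
            ($n, $total) = (0, 0);               # reset for the next file
        }
    }
    return @means;
}

# Usage: perl means.pl file1.txt file2.txt ...
print "$_\n" for file_means(@ARGV);
```

Note that a completely empty file never yields a line, so it produces no entry at all in the result.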
 