Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Read data from CSV file in PERL

Status
Not open for further replies.

Courtney1306

Programmer
May 30, 2007
14
US
I am writing a program which reads a file that has three columns in excel. When a user inputs a specific number from column 1, I need to find each instance it occurs and the information in columns 2 and 3 on those same lines. How can I do this?
 
Simple parsing of CSV files can be done without the bulk of the CPAN module:

Code:
#!/usr/bin/perl
#use strict;    # turn on compiler restrictions
$|++;    # do not buffer output

my $file            = "";
my @field           = ();
my $line            = 1;
my $count           = 0;
my $specific_number = 2;

open( INFILE, "parse.csv" )
  or die("Can not open input file: $!");

while ( $file = <INFILE> ) {
    @field = parse_csv($file);
    chomp(@field);

    if ( $field[0] eq $specific_number ) {
        print "Found $specific_number on line $line:";
        print "Column 2=$field[1], Column 3=$field[2]\n";
        $count++;
    }
    $line++;
}

close(INFILE);

print "Found $specific_number $count times\n";

exit;

sub parse_csv {
    my $text = shift;
    my @new  = ();
    push( @new, $+ ) while $text =~ m{
       "([^\"\\]*(?:\\.[^\"\\]*)*)",?
           |  ([^,]+),?
           | ,
       }gx;
    push( @new, undef ) if substr( $text, -1, 1 ) eq ',';
    return @new;
}

If I use a CSV file with the following data:
[tt]
3,2,7
2,4,3
67,89,23
2,456,23
2,56,8
56,2,90
[/tt]

Then the output of the program execution is:

[tt]
Found 2 on line 2:Column 2=4, Column 3=3
Found 2 on line 4:Column 2=456, Column 3=23
Found 2 on line 5:Column 2=56, Column 3=8
Found 2 3 times
[/tt]
 
Hemo said:
Simple parsing of CSV files can be done without the bulk of the CPAN module

Yes, you can roll your own solution. However, it's silly to do so.

While most csv files are extremely simple, it's good practice to get used to using the standard modules. Text::CSV_XS is very efficient, and handles all the special cases that are so easy to mess up when rolling your own solution.

- Miller
 
I have to agree with Miller. There really isn't much "bulk" when using the modules and all the hard work has already been taken care of for you.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
Hemo

I'm with MillerH on this one. While men can still make fire by banging rocks together, since the invention of the safety match most people choose not to...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
Agree with the above. And if you need it, Text::CSV is a pure Perl module so there's no problem installing it anywhere.
 
The only problem with the parse seems to be that it is only telling me where the number is found. I actually need to then retreve the rest of the data in the other columns that are on the same line. There would be two other numbers I would get. I then need to put the two other numbers in their respective colums. For example:

25 .0024 1.842
30 1.001 1.573
25 .6780 .9976

Say I'm searching for 25. .0024 and .6780 would go into a new column, column C. 1.842 and .9976 would go into new column D.

 
When you are developing across platforms for a client that insists 'modules' are not allowed, and the perl program must be totally self contained... you find the reasoning for the way I explained the solution. Hardly a 'roll your own' as this method is described early on in most of the Perl books and guides.

In my work, I cannot afford to 'assume' a customer will allow us to install code that is not written or supported by ourselves.

Courtney asked how this _could_ be done, and rather than simply pointing them to 'RTFM', I offered a possible solution.

Now, if you want to use Text:CSV, and offer a solution, go for it.

This isn't a homework assignment by chance?
 
No it isn't a homework assignment.

I am having problems with the parse_csv part of the code, though. The command prompt is saying it is an undefined subroutine. I was trying to run and manipulate that code so I completely understood it. Don't I still need to used Text::CSV to use that function?
 
Hemo,

The link to the pod documentation for Text::CSV_XS contains more than enough examples for someone to get started. And yes, it's up to the student to know that.

I believe that most of us have been in situations where a client is distrustful of cpan. However, this is normally from a state of ignorance, not one of genuine informed fear. It's up to the project lead to at least try to convince them of the benefits. Especially for something like Text::CSV or Text::CSV_XS, which are one step away from bring included as part of the perl package itself.

YMMV.

- Miller
 
Of course, there is one thing in the Text::CSV_XS examples that I don't much care for. Their error conditions use the block if form instead of a simple "or die". But that's pretty much my only complaint with it.

Code:
[url=http://perldoc.perl.org/functions/use.html][black][b]use[/b][/black][/url] [green]Text::CSV_XS[/green][red];[/red]

[black][b]use[/b][/black] [green]strict[/green][red];[/red]

[url=http://perldoc.perl.org/functions/my.html][black][b]my[/b][/black][/url] [blue]$csv[/blue] = [maroon]Text::CSV[/maroon][maroon]:[/maroon]XS->[maroon]new[/maroon][red]([/red][red])[/red][red];[/red]

[olive][b]while[/b][/olive] [red]([/red]<DATA>[red])[/red] [red]{[/red]
	[blue]$csv[/blue]->[maroon]parse[/maroon][red]([/red][blue]$_[/blue][red])[/red] or [url=http://perldoc.perl.org/functions/die.html][black][b]die[/b][/black][/url] [red]"[/red][purple]Can't parse(): [/purple][red]"[/red] . [blue]$csv[/blue]->[maroon]error_input[/maroon][red]([/red][red])[/red][red];[/red]
	[black][b]my[/b][/black] [blue]@fields[/blue] = [blue]$csv[/blue]->[maroon]fields[/maroon][red];[/red]
	
	[olive][b]if[/b][/olive] [red]([/red][blue]$fields[/blue][red][[/red][fuchsia]0[/fuchsia][red]][/red] == [fuchsia]25[/fuchsia][red])[/red] [red]{[/red]
		[url=http://perldoc.perl.org/functions/print.html][black][b]print[/b][/black][/url] [red]"[/red][purple][blue]@fields[/blue][purple][b]\n[/b][/purple][/purple][red]"[/red][red];[/red]
	[red]}[/red]
[red]}[/red]

[teal]__DATA__[/teal]
[teal]25,.0024,1.842[/teal]
[teal]30,1.001,1.573[/teal]
[teal]25,.6780,.9976[/teal]
[tt]------------------------------------------------------------
Pragmas (perl 5.8.8) used :
[ul]
[li]strict - Perl pragma to restrict unsafe constructs[/li]
[/ul]
Other Modules used :
[ul]
[li]Text::CSV_XS[/li]
[/ul]
[/tt]

- Miller
 
Is there something completely wrong with this? I tried modifying it tod o something similar to what I want, but the program exits after I enter the two numbers in the command prompt. Element ID is the first number i would choose, say 37. Reference Element ID would be a second, 29 for example. I need to keep track of these separately.

open (INFILE, "RETURNCAL.csv") || die ("Could not open file!");

print "Element ID? ";
$ID = <STDIN>;

print "Reference Element ID? ";
$REF = <STDIN>;

my $file = "";
my @field = ();
my $line = 1;

while ( $file = <INFILE> ) {
@field = parse_csv($file);
chomp(@field);

if ( $field[0] eq $ID ) {
print "Found element id $ID on line $line";
}
if ( $field[0] eq $REF ) {
print "Found reference element id $REF on line $line";
}
$line++;
}

close(INFILE);


sub parse_csv {
my $text = shift;
my @new = ();
push( @new, $+ ) while $text =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",?
| ([^,]+),?
| ,
}gx;
push( @new, undef ) if substr( $text, -1, 1 ) eq ',';
return @new;
}

Here is an example of the file I am trying to extract data from.

37,0.00027,5.7697
25,0.10678,4.3029
21,0.00025,2.6893
29,0.00027,1.9125
33,0.00024,4.0646
37,0.00027,3.1804
 
Courtney,

This will be the last time that I'll mention it, but don't use the roll your own solution. Besides, the the reasons that have already been mentioned, there are two other reasons why this is better. First, core libraries often have proper error checking with good error reporting mechanisms. Secondly, if you use the public code that experts are familiar with, then it will take less time to debug your code and you'll be more likely to get assistance.

Finally, don't forget to "chomp" your STDIN input.

- Miller
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top