read files from directory

Status
Not open for further replies.

lillyth

Programmer
Aug 11, 2008
17
0
0
DE
Hi!

I need to read text files from a directory and do some operations on all the files at once, in order to get frequency counts for words over all files. Any guesses as to why this code is not working?
The error message is: "Cannot open 'C:\Doc...\*.txt' Invalid argument at line 12"

While reading from each file, I would also like to remove punctuation marks, such as ". , ; : ? !" etc., and write the output to the same file that I read from. Any ideas on how to do that?

Best,
lillyth.


#!/usr/local/bin/perl -w

use strict;
use lib 'C:\Documents and Settings\Usr1\Desktop\L';
use Lingua::EN::Tagger;
open (OUTFILE, '>>terms.txt'); # output file
@ARGV = 'C:\Documents and Settings\Usr1\Desktop\L\textFiles\*.txt';

my $tagged_text = '';
my $p = new Lingua::EN::Tagger;

while (<>) {
    while(<$_>){
        my $temp = $p->add_tags( $_ );
        $tagged_text = $tagged_text. $temp ;
    }
}
my @word_list = $p->get_words( $tagged_text );
foreach my $word_list (@word_list) {
    print OUTFILE "$word_list \n";
}

close (OUTFILE);
 
You'll probably want to look at the glob function and/or the File::Glob module.
Code:
my @files = glob "C:/Documents and Settings/Usr1/Desktop/L/textFiles/*.txt";
 
Yeah, you can't put a wildcard (*) into the @ARGV array and have Perl expand it when it opens the files. Wildcards are expanded by glob() and by the <*.txt> glob operator, but not by @ARGV or $ARGV, which is what you are using.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Hi again.

Thanks! I changed to the proposed
my @files = glob "C:/... ".
Now it is complaining at the second while loop. I would like to do the following:
for each file i
    for each row j in i
        do something.

Am I opening the file correctly with while(<$_>)?
Best,
-nina.
 
Could you post a sample of a file before you run the script, and also the result file that you want to get?
That way we might be able to be of more help.


``The wise man doesn't give the right answers,
he poses the right questions.''
TIMTOWTDI
 
The files are two text files with the following:
computer science ( or computing science ) is the study and the science of the theoretical

and:
foundations of information .


The resulting file should be:
computer
1
science
1
information
1
people
1

Was this what you wanted to know?

-lillyth
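
The counting described in the first post (strip punctuation, then tally words across all files) could be sketched roughly as follows. This is only an illustration in plain Perl, not the Lingua::EN::Tagger approach from the original script. The helper name count_words is hypothetical, and lowercasing the words and splitting on whitespace are assumptions about what "word" should mean here.

```perl
use strict;
use warnings;

# Hypothetical helper: tally word frequencies over a list of text files.
# Returns a reference to a hash mapping each word to its count.
sub count_words {
    my (@files) = @_;
    my %count;
    for my $file (@files) {
        open my $in, '<', $file or die "Cannot open $file for read: $!";
        while (my $line = <$in>) {
            chomp $line;
            # Drop punctuation such as . , ; : ? ! ( )
            # Note: this also removes apostrophes and hyphens.
            $line =~ s/[[:punct:]]//g;
            # split ' ' splits on runs of whitespace and skips leading blanks.
            $count{ lc $_ }++ for split ' ', $line;
        }
        close $in;
    }
    return \%count;
}
```

To print the result in the layout shown above (word on one line, count on the next), something like `print "$_\n$count->{$_}\n" for sort keys %$count;` would do, where $count is the hash reference returned by count_words.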
 
Maybe my request was not clear enough... anyway,
in order to do an in-depth directory search for files
you can use
Code:
use strict;
use File::Find;
my $dir = 'C:\Documents and Settings\Usr1\Desktop\L\textFiles\';
find(\&do_what_ever_I_want, $dir);
# do_what_ever_I_want will run for every file in $dir.
sub do_what_ever_I_want {
    my $file = $_;
    print $file;
}
Give it a try. File::Find is very handy, and you can narrow the results; take a look at the File::Find documentation.
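
Narrowing the results might look like the sketch below, which collects only .txt files. It assumes the core File::Find module; find_txt_files is a hypothetical name, and the directory is passed in as a parameter rather than hard-coded. Keep in mind that find descends into every subdirectory.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect the full paths of all .txt files under $dir,
# including those in subdirectories.
sub find_txt_files {
    my ($dir) = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the bare file name and
        # $File::Find::name is the path including the directory.
        push @found, $File::Find::name if -f $_ && /\.txt\z/i;
    }, $dir);
    return @found;
}
```

It could be called with a forward-slash path, for example `my @files = find_txt_files('C:/Documents and Settings/Usr1/Desktop/L/textFiles');`.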


``The wise man doesn't give the right answers,
he poses the right questions.''
TIMTOWTDI
 
How large are the files you're accessing? If they're not too huge, the easiest way is probably to read them into memory then overwrite the file.

Code:
my @files = glob "C:/some/dir/*.txt";

foreach my $file (@files) {
  open IN, "< $file" or die "Cannot open $file for read\n$!";
  my ($line, @temp);
  while ($line = <IN>) {
    chomp $line;  # chomp the line just read (a bare chomp would act on $_)
    # Process each line
    push @temp, $line;
  }
  close IN;

  open OUT, "> $file" or die "Cannot open $file for write\n$!";
  print OUT "$_\n" for @temp;
  close OUT;
}
I didn't get a chance to test this, but it should give you a start.
 
A small, and I am sure inadvertent, error in perluserpengo's code, in this line:

Code:
my $dir = 'C:\Documents and Settings\Usr1\Desktop\L\textFiles\';

The last backslash is actually escaping the closing single quote, so perl will report a syntax error. Change to:

Code:
my $dir = 'C:/Documents and Settings/Usr1/Desktop/L/textFiles';

Windows fully supports forward slashes in directory paths, and using them avoids the whole problem of backslash escaping in Perl strings.

Also, File::Find is going to drill down through all subdirectories looking for all the files that match the search criteria. This may not be what you want, although if it is, File::Find is the ticket. If you just want to look through the one directory, use glob().

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Yeah, Kevin is right about Windows paths in single quotes... I just copied the line from lillyth's code. The point was not really the path, but to show the use of File::Find.
There are many ways of getting a list of the files in the directory.
If all the files that you need are in that one directory, you can open the directory and get the filenames you need like this:
Code:
opendir (DIR, 'C:/Documents and Settings/Usr1/Desktop/L/textFiles') || die "Can't open dir: $! \n";
my @files = grep { /\.txt$/ } readdir(DIR);
closedir DIR;

foreach my $file ( @files ) {
  # do whatever I want with 'em
}
There's more than one way to do it, really... it depends on your style.
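
One detail worth noting with the opendir approach: readdir returns bare file names without the directory part, so the directory has to be prepended before open will find the files (unless the script is run from inside that directory). A small sketch of that, where txt_files_in is a hypothetical name and a lexical directory handle is used instead of DIR:

```perl
use strict;
use warnings;

# Hypothetical helper: return the paths of the .txt files directly inside
# $dir (no recursion), with the directory prepended to each name.
sub txt_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open dir $dir: $!";
    my @files = map { "$dir/$_" } grep { /\.txt\z/i } readdir $dh;
    closedir $dh;
    return @files;
}
```

Each returned element can then be handed straight to open, for example `open my $in, '<', $file or die ...` inside the foreach loop.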



``The wise man doesn't give the right answers,
he poses the right questions.''
TIMTOWTDI
 
Hi again!

Now I have pretty much tried every suggestion posted here.
In the particular version shown below, I get the following error:

Cannot open C:./Documents for read
No such file or directory at ... line 12

Most versions seem to have the same problem: when I want to open the current file, i.e. at the line while($line = <IN>), to perform some task on each line, it complains.

Any further suggestions?
-lillyth


#!/usr/local/bin/perl -w

#use strict;
use lib 'C:\Documents and Settings\Usr1\Desktop\L\Curvature';
use Lingua::EN::Tagger;
open (OUTFILE, '>>termsFromDir.txt'); # output file
my @files = glob 'C:\Documents and Settings\Usr1\Desktop\L\Curvature\textFiles\*.txt';
my $tagged_text = '';
my $p = new Lingua::EN::Tagger;

foreach my $file (@files){
    open IN, "< $file" or die "Cannot open $file for read\n$!";
    my $line;
    while($line = <IN>){
        chomp;
        my $temp = $p->add_tags( $_ );
        print $temp;
        $tagged_text = $tagged_text. $temp ;
    }
    close IN;
}
my @word_list = $p->get_words( $tagged_text );
foreach my $word_list (@word_list) {
    print OUTFILE "$word_list \n";
}
close (OUTFILE);
 
For the sake of troubleshooting, try printing @files and see what data is actually there.

Also, try using forward-slashes (/) in your paths instead of back-slashes.
 
I cannot print out what is in @files because it is simply not reading the files... It claims it cannot open C:./Documents...
And I have tried with " " instead of ' ', and with forward slashes as well as backslashes and double slashes in case it is interpreting what is inside the quotes.

When I remove glob and use one specific document, the program works. Am I doing anything wrong with the glob-function or is there a way to get around it?
Btw, I tried out the find function and that doesn't seem to be working either.

Best,
lillyth
 
Starting at the line with the glob, try this:
Code:
my @files = glob 'C:/Documents and Settings/Usr1/Desktop/L/Curvature/textFiles/*.txt';
print "$_\n" for @files;
exit;
See if the file paths that are printed look correct (and you're sure there's no typos in the path, yeah?)
 
@rharsh:

I take the pathname from my XP-window so there should be no typos... Using your code I get empty lines when I run the program, which means that the problems come in:
Code:
open IN, "< $file" or die "Cannot open $file for read\n$!";

when trying to open $file and the response is:
Cannot open C:./Documents for read
Any further ideas?

Btw, thank you all for your help!
-lillyth, now getting a bit tired of Perl.
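
A possible explanation for the C:./Documents error, which the thread itself never confirms: Perl's built-in glob splits its argument on whitespace and treats each piece as a separate pattern. A path containing "Documents and Settings" would therefore be broken up, and the first piece would be C:/Documents, which matches the name in the error message. A minimal sketch of one workaround, assuming the core File::Glob module and its bsd_glob function, which takes the whole string as a single pattern; list_txt_files is a hypothetical helper name:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: return every .txt file directly inside $dir.
# bsd_glob treats the whole string as ONE pattern, so spaces in $dir survive.
# Use forward slashes in $dir: bsd_glob can treat a backslash as an escape.
# Characters such as { } [ ] * ? in $dir would still be read as wildcards.
sub list_txt_files {
    my ($dir) = @_;
    return bsd_glob("$dir/*.txt");
}
```

With the path from the script above, the call would be `my @files = list_txt_files('C:/Documents and Settings/Usr1/Desktop/L/Curvature/textFiles');`. An alternative that keeps the built-in glob is to wrap the pattern in an extra pair of double quotes inside the string, as in `glob '"C:/Documents and Settings/.../*.txt"'`, which protects the spaces.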
 