Help with scanning a huge text file 3

jackz15 · Aug 25, 2006

Hi, I have to write a Perl program which takes in a word from the user and then returns all the possible anagrams of it (anagrams are just the words formed by scrambling up the letters of the current word). It has to check that the anagrams actually exist in a 5 MB text file, or the web page containing it. Then it prints out the new words. I'm thinking of using LWP::Simple to read right off the web page, but the problem is performance: wouldn't it be extremely laggy to read through the page every time the user enters a new word? Is there a way to increase the performance?
thanks in advance

Rieekan · Aug 25, 2006

You could set up a cron job to read the web page periodically to pick up any new words available. That way, the user doesn't have to wait each time for the request to hit the webpage and return data.

- George

KevinADC · Aug 25, 2006

Does the 5 MB text file change often? Is it a 5 MB web page? You definitely want to store the text file locally if possible.

TrojanWarBlade · Aug 26, 2006

Also, you might like to consider setting up the local file as a btree or hash for faster lookups.

Trojan.
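A minimal sketch of the in-memory hash idea, assuming the word list has one word per line (as the poster confirms further down). `load_words` is a hypothetical helper name, and the demo reads an in-memory string rather than the real file; for the real list you would open the local copy (e.g. a placeholder `words.txt`) instead. Each `exists` check is an average constant-time hash lookup rather than a scan of the whole file.

```perl
use strict;
use warnings;

# Hypothetical helper: read a one-word-per-line list into a hash.
# Keys are lowercased words; values are just 1 (only membership matters).
sub load_words {
    my ($fh) = @_;
    my %dict;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;          # tolerate DOS line endings
        next unless length $line;
        $dict{lc $line} = 1;
    }
    return \%dict;
}

# Demo with an in-memory filehandle instead of the real 5 MB file.
my $sample = "stop\npots\ntops\nperl\n";
open(my $fh, '<', \$sample) or die "Can't open sample: $!";
my $dict = load_words($fh);
close $fh;

for my $w (qw(tops spot)) {
    printf "%s: %s\n", $w, exists $dict->{$w} ? 'yes' : 'no';
}
```

The hash is built once per run; after that, every lookup is cheap no matter how many words the user enters.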

jackz15 · Aug 26, 2006

The file does not change and there's only one word per line, so there's no need to set up a cron job. There are about 450,000 words in it, so wouldn't setting up a hash for faster lookup take a long time?
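Building the hash is a one-time cost per run. If even that turns out to be too slow, one option (an assumption, not something proposed in this thread) is to build it once and save it with the core Storable module, then `retrieve` it on later runs. The sub `load_dict` and the cache file name are hypothetical; the demo writes a tiny word list into a temporary directory.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Hypothetical helper: return a hashref of words, reusing a Storable cache
# file if it exists and is at least as new as the word list.
sub load_dict {
    my ($list_file, $cache_file) = @_;
    if (-e $cache_file && -M $cache_file <= -M $list_file) {
        return retrieve($cache_file);      # fast path: reload the saved hash
    }
    my %dict;
    open(my $in, '<', $list_file) or die "Can't read $list_file: $!";
    while (my $w = <$in>) {
        chomp $w;
        $dict{lc $w} = 1 if length $w;
    }
    close $in;
    nstore(\%dict, $cache_file);           # portable binary format
    return \%dict;
}

# Demo in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $list = "$dir/words.txt";
open(my $out, '>', $list) or die "Can't write $list: $!";
print $out "$_\n" for qw(stop pots tops);
close $out;

my $first  = load_dict($list, "$dir/words.sto");   # builds and saves the cache
my $second = load_dict($list, "$dir/words.sto");   # loads from the cache
print scalar(keys %$second), " words loaded\n";
```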

jackz15 · Aug 26, 2006

I have decided to read the text file directly. The first time, it will look through the file for the words; while it's looking, if it sees a word that doesn't fit, it will remember it by inserting it into a hash. So in the future the program first checks the hash, and then the file. Now this all sounds nice, but I'm having trouble making all the anagrams of the word... help?
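A different technique from the caching plan described here, offered as a sketch: instead of generating permutations, index the dictionary by each word's letters in sorted order. Two words are anagrams exactly when their sorted letters are equal, so one pass over the word list builds a hash from this "signature" to the words that share it, and each query is a single hash lookup with no n! blow-up. The names `signature`, `anagrams_of` and `%by_sig` are hypothetical, and the word list is an in-memory sample.

```perl
use strict;
use warnings;

# Sorted-letter "signature": all anagrams of a word share it.
sub signature {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort @letters;
}

# Build signature => [words] (in practice, read the one-word-per-line file).
my @wordlist = qw(stop pots tops spot opts post perl);
my %by_sig;
for my $w (@wordlist) {
    push @{ $by_sig{ signature($w) } }, $w;
}

# All anagrams of a query with one hash access, excluding the query itself.
sub anagrams_of {
    my ($query) = @_;
    my $list = $by_sig{ signature($query) } || [];
    return grep { $_ ne lc $query } @$list;
}

print join(' ', anagrams_of('tops')), "\n";
```

Building `%by_sig` costs one pass over the file; after that, even long words are answered instantly because nothing is permuted.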

KevinADC · Aug 26, 2006

Let's see the code you have been using to try to make the anagrams.

jackz15 · Aug 26, 2006

I just discovered NDBM_File.pm; it seems that, combined with tie, it will be able to put everything in a hash for me, much faster than what I could script. But I can't seem to install that module. CPAN gives me the entire Perl distribution, saying that the module comes with the latest version of Perl; I use ActivePerl and it doesn't have that module. Can someone give me a link to download the module or the source code?

jackz15 · Aug 26, 2006

Never mind, I found the source, and so I manually created the module, but now it comes out with these errors:
Can't locate loadable object for module NDBM_File in @INC (@INC contains: C:/Perl/site/lib C:/Perl/lib .) at E:\anagrams.pl line 4
Compilation failed in require at E:\anagrams.pl line 4.
BEGIN failed--compilation aborted at E:\anagrams.pl line 4.
There isn't much code in the module, so here it is:
package NDBM_File;

use strict;
use warnings;

require Tie::Hash;
use XSLoader ();

our @ISA = qw(Tie::Hash);
our $VERSION = "1.06";

XSLoader::load 'NDBM_File', $VERSION;

1;

__END__
what is the problem here?
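The error means Perl found `NDBM_File.pm` but not its compiled C half: the `XSLoader::load` line looks for a platform-specific shared library, so copying the `.pm` by hand can't work, and this ActivePerl evidently wasn't built with it. Note too that tying to `words.txt` doesn't read the text file; a DBM file has its own format and must be filled first. One workaround, assuming a plain on-disk hash is all that's needed, is SDBM_File, which ships with Perl. A minimal sketch; the file name `words_db` is hypothetical, and SDBM limits each key plus value to roughly 1 KB, which is plenty for single words.

```perl
use strict;
use warnings;
use Fcntl;            # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/words_db";      # SDBM creates words_db.pag and words_db.dir

# Build the on-disk hash once (in practice, loop over the word list).
tie(my %w, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Couldn't tie SDBM file '$file': $!";
$w{$_} = 1 for qw(stop pots tops);
untie %w;

# Later runs: reopen read-only and look words up without rereading the text.
tie(my %lookup, 'SDBM_File', $file, O_RDONLY, 0666)
    or die "Couldn't reopen SDBM file '$file': $!";
print exists $lookup{pots} ? "pots found\n" : "pots missing\n";
```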

jackz15 · Aug 26, 2006

Here is my code too; this is probably where the error is:
#!\usr\bin\perl -w
use strict;
use Fcntl; # For O_RDWR:read and write
use NDBM_File;
tie(%w, 'NDBM_File', 'words.txt', O_RDWR|O_CREAT, 0666)
or die "Couldn't tie NDBM file 'filename': $!; aborting";

while(<>){
foreach my $e(/\w+/g){
my @a=split(//,$e);
sort @a;
my %s=map(@a => $_)%w
print "$_ $match\n" if $match;

}

}
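Separately from the module problem, a few lines in this loop won't do what's intended: `%w` is never declared with `my`, which `use strict` rejects; line 81 is not valid Perl; `$match` is never assigned; and `sort @a;` returns a sorted list while leaving `@a` unchanged, so calling it on its own line does nothing. A small sketch of the `sort` behaviour; the variable names are only for illustration.

```perl
use strict;
use warnings;

my @a = split //, 'tops';
my @unchanged = @a;
{
    no warnings 'void';       # silence "Useless use of sort in void context"
    sort @a;                  # result thrown away; @a is not modified
}
my $key = join '', sort @a;   # capture the sorted letters instead

print "@a\n";
print "$key\n";
```

The first `print` still shows the original letter order; only the captured `$key` holds the sorted letters, which is what a hash lookup by sorted letters would need.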

KevinADC · Aug 26, 2006

First, to get the anagrams of a word, use List::Permutor; it makes it very simple to get all the permutations of a list. You will need to install the module.

Code:

use strict;
use warnings;
use List::Permutor;

my $word = 'perl';
my $perm = List::Permutor->new(split //, $word);
my @anagrams = ();
{
   local $" = '';              # interpolate "@set" with no separator between letters
   while (my @set = $perm->next) {
      push @anagrams, "@set";
   }
}
print "$_\n" for @anagrams;

But keep in mind, even a modest 6-letter word has 720 permutations, and not all will be unique if there are repeated letters. Add one more letter to the word and there are a whopping 5,040 permutations.
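The arithmetic behind those figures: a word of n letters has n! orderings, and when letters repeat, the number of distinct orderings is n! divided by the factorial of each letter's count. For example, "letter" (two e's, two t's) has 6! = 720 orderings but only 720 / (2! x 2!) = 180 distinct ones. A small sketch that computes both; the sub names are hypothetical.

```perl
use strict;
use warnings;

sub factorial {
    my ($n) = @_;
    my $f = 1;
    $f *= $_ for 2 .. $n;
    return $f;
}

# n! / (c1! * c2! * ...), where each c is how often a distinct letter occurs.
sub distinct_permutations {
    my ($word) = @_;
    my %count;
    $count{$_}++ for split //, lc $word;
    my $total = factorial(length $word);
    $total /= factorial($_) for values %count;
    return $total;
}

for my $w (qw(perl letter anagram)) {
    printf "%-8s %6d orderings, %6d distinct\n",
        $w, factorial(length $w), distinct_permutations($w);
}
```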

k1ng0fn3rd · Aug 26, 2006

Or, though I don't dare try it myself, you could try archiving the file every time it is accessed, then simply find the differences and work with those.

Kirsle · Aug 26, 2006

If you want to get everything into a hash, you can just make hash keys and use Data::Dumper (a standard module) to write it into a file for easy reloading later.

Code:

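use strict;
use warnings;
use Data::Dumper;

my %words = ();

open(my $in, '<', './wordlist.txt') or die "Can't read wordlist.txt: $!";
while (<$in>) {
   chomp;
   $words{$_} = 1;
}
close($in);

$Data::Dumper::Terse = 1;    # emit a bare expression instead of "$VAR1 = ..."
open(my $out, '>', './outfile.txt') or die "Can't write outfile.txt: $!";
print $out Dumper(\%words);  # dump a reference so the hash is one structure
close($out);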

And when you want to load the hash again later:

Code:

my %hash = %{ do './outfile.txt' };   # do returns the dumped hash reference

jackz15 · Aug 27, 2006

Thanks guys, now I've got all the permutations and the ability to store them in hashes. But I have trouble comparing them with the user's input: it seems that every time the comparison fails or succeeds, the loop starts over again and finds the same anagram. I cannot figure out how to get around that cleanly; there are always errors...

KevinADC · Aug 27, 2006

Let's see your code.

jackz15 · Aug 28, 2006

#!\usr\bin\perl -w
use strict;
use List::Permutor;
my $word;
my @split;
my %h;
my @tot;
my $num='0';
my $anum='0';
my $n='0';
ENTRY:while(<>){
chomp;
$word = $_;

@split=split "", $word;

last;
}
my $perm = new List::Permutor split(//,$word);
open (FILE, "E://words.txt");
my @anagrams = ();
{
local $" = '';
while (my @set = $perm->next) {

while(defined $set[$n]){
push @anagrams,$set[$n];

$n++;
}
}

}

while(<FILE>){

for my $words(/\b[@split]{1,}?\b/ig){

if(length($word) eq length($words)){
push @tot, $words;

}
}
}

while(defined $tot[$num]){

if($tot[$num] eq $anagrams[$anum]){
print $tot[$num] . "matches" . $anagrams[$anum] . "\n";
$anum='0';
$num++;
}else{
if(defined $anagrams[$anum++]){
}else{
print "undef";
$num++;
$anum='0';
}
}

}

jackz15 · Aug 28, 2006

So basically my code first takes the words in the dictionary that are the same length as the entered word and contain one or more of its letters. This just narrows the choices down a bit. Then it pushes everything into an array (@tot). After that it loops, comparing each item in that array with the anagram results, which were pushed into another array (@anagrams). It compares the first item, and if it matches, it prints it and goes on to the next word; otherwise it goes to the next item in @anagrams. When @anagrams reaches its end, it moves on to the next item in @tot (the array of words).
It comes out with:
...
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
undefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundef
why?
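As far as I can trace it: `$n` is set to '0' once, outside the permutation loop, so only the first permutation's letters get pushed, one letter per element, and single letters can never equal a same-length word. Then `defined $anagrams[$anum++]` increments `$anum` as a side effect, so after the last element is checked `$anum` points past the end and the next `eq` compares against `undef`, which produces the warning and the "undef" line for every word. One way to sidestep the index bookkeeping (a different approach, offered as a sketch): put the dictionary words in a hash, join each permutation into a string, and test it with `exists`. A `%seen` hash skips repeated permutations so each anagram is reported once. The hand-rolled `permute` sub stands in for List::Permutor so the sketch runs without extra modules; all names are hypothetical.

```perl
use strict;
use warnings;

# Stand-in for List::Permutor: collect every full-length ordering of a list.
sub permute {
    my ($done, $rest, $out) = @_;
    if (!@$rest) { push @$out, join '', @$done; return }
    for my $i (0 .. $#$rest) {
        my @remaining = @$rest;
        my ($pick) = splice @remaining, $i, 1;
        permute([@$done, $pick], \@remaining, $out);
    }
}

my %dict = map { $_ => 1 } qw(stop pots tops spot opts post perl);
my $word = 'tops';

my @perms;
permute([], [split //, $word], \@perms);

my (%seen, @found);
for my $p (@perms) {
    next if $seen{$p}++;                  # each distinct ordering once
    push @found, $p if exists $dict{$p};  # one hash lookup, no inner loop
}
print "$_\n" for @found;
```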

jackz15 · Aug 28, 2006

#!\usr\bin\perl -w
use strict;
use List::Permutor;
my $word;
my @split;
my %h;
my @tot;
my $num='0';
my $anum='0';
my $n='0';
ENTRY:while(<>){
chomp;
$word = $_;

@split=split "", $word;

last;
}
my $perm = new List::Permutor split(//,$word);
open (FILE, "E://words.txt");
my @anagrams = ();
{
local $" = '';
while (my @set = $perm->next) {
my $n='0';
my $co;
while(defined $set[$n]){
$co=$co. $set[$n];
$n++;
}
push @anagrams,$co;

}

}

while(<FILE>){

for my $words(/\b[@split]{1,}?\b/ig){

if(length($word) eq length($words)){
push @tot, $words;

}
}
}

while(defined $tot[$num]){

if($tot[$num] =~ /$anagrams[$anum]/i){

print "Anagram of " . $anagrams[$anum] . ": ". $tot[$num] . "\n";
$anum='0';

$num++;
}else{
$anum++;
if(exists $anagrams[$anum]){
}else{

$num++;
$anum='0';

}
}

}

Hee hee, figured it out! List::Permutor returns the letters one by one, so I have to join those letters first or else I can't compare with them! Do you still see any errors in this code though?
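Since `next` returns a list, the letters do need gluing back together, and Perl's `join` does that in one step. The manual loop also starts from an undefined `$co`, so `$co = $co . $set[$n]` warns under `-w` on the first pass. And because `@tot` is already filtered to the same length, `eq` (with `lc` for case) states the exact comparison more directly than the regex match `/$anagrams[$anum]/i`. A tiny sketch; the sample letters and candidate word are illustrative only.

```perl
use strict;
use warnings;

my @set = qw(s t o p);          # the kind of list a permutor's next() returns
my $joined = join '', @set;     # "stop", no manual concatenation loop needed

# Exact, case-insensitive comparison instead of a regex match.
my $candidate = 'Stop';
print lc($candidate) eq $joined ? "match\n" : "no match\n";
```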

jackz15 · Aug 28, 2006

It's quite slow though; is there a way to speed it up?

jackz15 · Aug 28, 2006

Is there a way for List::Permutor to only find permutations that are the same length as the word entered? It goes in a loop starting off with one-letter permutations, then two, then three...
How do I change that?

Tek-Tips is the largest IT community on the Internet today!