Help with scanning a huge text file

Status
Not open for further replies.

jackz15

Programmer
Jun 28, 2006
103
US
Hi, I have to write a Perl program that takes a word from the user and returns all of its possible anagrams (anagrams are just the words formed by scrambling the letters of the current word). It has to check that the anagrams actually exist in a 5 MB text file, or in the web page containing it, and then print out the new words. I'm thinking of using LWP::Simple to read straight off the web page, but the problem is performance: wouldn't it be extremely laggy to read through the page every time the user enters a new word? Is there a way to improve the performance?
Thanks in advance.
 
You could set up a cron job to read the web page periodically to pick up any new words available. That way, the user doesn't have to wait each time for the request to hit the webpage and return data.

- George
 
Does the 5 MB text file change often? Is it a 5 MB web page? You definitely want to store the text file locally if possible.
 
Also, you might like to consider setting up the local file as a btree or hash for faster lookups.


Trojan.
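
As a rough illustration of the hash suggestion above: a minimal sketch, assuming the word list holds one word per line. The load_words helper and the in-memory sample list are made up for the example; a real script would open the local copy of the file instead.

```perl
use strict;
use warnings;

# Build a lookup hash from a filehandle that yields one word per line.
sub load_words {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;      # skip blank lines
        $words{ lc $line } = 1;        # store lowercased so lookups ignore case
    }
    return \%words;
}

# Demo with an in-memory "file" standing in for the real word list.
my $sample = "pearl\nperl\nrepl\n";
open my $fh, '<', \$sample or die "Cannot open in-memory word list: $!";
my $words = load_words($fh);
close $fh;

print exists $words->{perl} ? "found\n" : "missing\n";
```

Once the hash is built, each lookup is a single hash access rather than a scan of the file.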
 
The file does not change and there's only one word per line, so there's no need to set up a cron job. There are about 450,000 words in it, so wouldn't setting up a hash for faster lookups take a long time?
 
I have decided to read the text file. The first time, the program will look through the file for the words; while it's looking, if it sees a word that doesn't fit, it will remember it by inserting it into a hash. So in the future the program first checks the hash, and then the file. Now this all sounds nice, but I'm having trouble making all the anagrams of the word... help?
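
One possible reading of the check-the-hash-first idea above, as a hedged sketch: remember the result of every lookup so a repeated word never touches the file again. The %seen cache, the in_word_list helper and the temporary word list are all hypothetical and only serve the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %seen;   # word => 1 (present in the file) or 0 (absent)

# Return true if $word is a line in the file at $path, consulting the cache first.
sub in_word_list {
    my ($word, $path) = @_;
    return $seen{$word} if exists $seen{$word};

    my $found = 0;
    open my $fh, '<', $path or die "Can't open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq $word) { $found = 1; last; }
    }
    close $fh;
    return $seen{$word} = $found;   # cache hits and misses alike
}

# Demo against a small temporary word list.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "pots\nstop\ntops\n";
close $tmp;

for my $try (qw(stop spto stop)) {
    printf "%s: %s\n", $try, in_word_list($try, $path) ? 'yes' : 'no';
}
```

The first lookup of each word still scans the file, so this only pays off when the same strings are checked repeatedly.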
 
Let's see the code you have been using to try and make the anagrams.
 
I just discovered NDBM_File.pm. It seems that, combined with tie, it will be able to put everything in a hash for me, much faster than what I could script. But I can't seem to install that module. CPAN gives me the entire Perl distribution, saying that the module comes with the latest version of Perl; I use ActivePerl and it doesn't have that module. Can someone give me a link to download the module or the source code?
 
Never mind, I found the source and created the module manually, but now it fails with these errors:
Can't locate loadable object for module NDBM_File in @INC (@INC contains: C:/Perl/site/lib C:/Perl/lib .) at E:\anagrams.pl line 4
Compilation failed in require at E:\anagrams.pl line 4.
BEGIN failed--compilation aborted at E:\anagrams.pl line 4.
There isn't much code in the module, so here it is:
package NDBM_File;

use strict;
use warnings;

require Tie::Hash;
use XSLoader ();

our @ISA = qw(Tie::Hash);
our $VERSION = "1.06";

XSLoader::load 'NDBM_File', $VERSION;

1;

__END__
What is the problem here?
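
For what it's worth, the message says "loadable object", not "module": the .pm file was found, but the compiled library that XSLoader::load tries to load for NDBM_File was not, and copying the .pm by hand does not supply it. A sketch of one possible workaround, assuming SDBM_File (a DBM module bundled with the core Perl distribution) is an acceptable substitute. The temporary directory and the base name 'words' are only for the demo.

```perl
use strict;
use warnings;
use Fcntl;                       # provides O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

# SDBM stores its data in two files: <base>.pag and <base>.dir.
my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'words');

tie(my %w, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Couldn't tie SDBM file '$base': $!; aborting";

# Writes go straight to disk through the tied hash.
$w{$_} = 1 for qw(perl pearl repl);
print defined $w{perl} ? "perl is stored\n" : "perl is missing\n";

untie %w;
```

The tied hash is used like any other hash; the difference is that its contents live on disk and survive between runs.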
 
Here is my code too; this is probably where the error is:
#!\usr\bin\perl -w
use strict;
use Fcntl; # For O_RDWR:read and write
use NDBM_File;
tie(%w, 'NDBM_File', 'words.txt', O_RDWR|O_CREAT, 0666)
or die "Couldn't tie NDBM file 'filename': $!; aborting";


while(<>){
foreach my $e(/\w+/g){
my @a=split(//,$e);
sort @a;
my %s=map(@a => $_)%w
print "$_ $match\n" if $match;





}

}
 
First, to get the anagrams of a word, use List::Permutor; it makes it very simple to get all the permutations of a list. You will need to install the module.


Code:
use strict;
use warnings;
use List::Permutor;

my $word = 'perl';
my $perm = List::Permutor->new(split(//, $word));
my @anagrams = ();
{
   local $" = '';   # join list elements with no separator when interpolated
   while (my @set = $perm->next) {
      push @anagrams, "@set";
   }
}
print "$_\n" for @anagrams;

But keep in mind, even a modest 6-letter word has 720 permutations (6!), and not all of them will be unique if there are repeated letters. Add one more letter to the word and there are a whopping 5,040 permutations (7!).
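
To put numbers on the point about repeated letters: the count of distinct arrangements is n! divided by the factorial of each letter's repeat count. A small sketch (the helper names are illustrative, not from any module):

```perl
use strict;
use warnings;

# n! for a non-negative integer n.
sub factorial {
    my ($n) = @_;
    my $f = 1;
    $f *= $_ for 2 .. $n;
    return $f;
}

# Number of distinct orderings of the letters in $word.
sub distinct_permutations {
    my ($word) = @_;
    my %count;
    $count{$_}++ for split //, $word;
    my $total = factorial(length $word);
    $total /= factorial($_) for values %count;   # divide out repeated letters
    return $total;
}

printf "%s: %d\n", $_, distinct_permutations($_) for qw(perl letter);
```

For 'perl' this gives 24 (4!), and for 'letter' it gives 180 rather than 720, because both 'e' and 't' appear twice.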
 
Or, though I don't dare try it myself, you could try archiving the file every time it is accessed, then simply find the differences and work with those.

--------------------------
The best answer to your question will definitely be RTFM.
 
If you want to get everything into a hash, you can just make hash keys and use Data::Dumper (a standard module) to write it into a file for easy reloading later.

Code:
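use strict;
use warnings;
use Data::Dumper;

my %words;

# Read one word per line and use each word as a hash key.
open(my $in, '<', './wordlist.txt') or die "Can't read wordlist.txt: $!";
while (<$in>) {
   chomp;
   $words{$_} = 1;
}
close($in);

# Dump a reference so the file holds a single $VAR1 = { ... }; structure.
open(my $out, '>', './outfile.txt') or die "Can't write outfile.txt: $!";
print $out Dumper(\%words);
close($out);

And when you want to load the hash again later:

Code:
# do() returns the last value evaluated in the file, here the hash reference.
my %hash = %{ do './outfile.txt' };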
 
Thanks guys, now I've got all the permutations and the ability to store them in hashes. But I have trouble comparing them with the user's input: it seems that every time the comparison fails or succeeds, the loop starts over again and finds the same anagram. I can't figure out how to get around that cleanly; there are always errors...
 
#!\usr\bin\perl -w
use strict;
use List::permutor;
my $word;
my @split;
my %h;
my @tot;
my $num='0';
my $anum='0';
my $n='0';
ENTRY:while(<>){
chomp;
$word = $_;

@split=split "", $word;

last;
}
my $perm = new List::permutor split(//,$word);
open (FILE, "E://words.txt");
my @anagrams = ();
{
local $" = '';
while (my @set = $perm->next) {

while(defined $set[$n]){
push @anagrams,$set[$n];

$n++;
}
}

}

while(<FILE>){

for my $words(/\b[@split]{1,}?\b/ig){

if(length($word) eq length($words)){
push @tot, $words;

}
}
}

while(defined $tot[$num]){

if($tot[$num] eq $anagrams[$anum]){
print $tot[$num] . "matches" . $anagrams[$anum] . "\n";
$anum='0';
$num++;
}else{
if(defined $anagrams[$anum++]){
}else{
print "undef";
$num++;
$anum='0';
}
}

}
 
So basically my code first takes in the words from the dictionary that are the same length as the word entered and contain one or more of the same letters. This just narrows the choices down a bit. It pushes all of those into an array. After that it loops, comparing each array item with the anagram results, which were also pushed into another array: it compares the first item, and if it matches, it prints it and goes on to the next word; otherwise it goes to the next item in the anagrams array. When the anagrams array reaches its end, it goes on to the next item in @tot (the array of words).
It comes out with:
...
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
Use of uninitialized value in string eq at E:\permutor.pl line 48, <FILE> line 479625.
undefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefundefunde
fundef
why?
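
The warning is consistent with $anagrams[$anum] being read one index past the end of @anagrams: the post-increment in defined $anagrams[$anum++] tests the old index, and the next pass through the loop compares against the new one. The comparison step described above can also be sketched without the two hand-advanced counters, by putting the joined permutations in a hash and keeping the candidates that appear in it. This is a sketch of that idea, not a fix of the posted script; matching_words and both sample lists are hypothetical.

```perl
use strict;
use warnings;

# Return the candidates that also appear in the list of joined permutations.
sub matching_words {
    my ($anagrams, $candidates) = @_;
    my %is_anagram = map { lc($_) => 1 } @$anagrams;
    return grep { $is_anagram{ lc $_ } } @$candidates;
}

my @anagrams   = qw(opst opts post pots spot stop tops);   # illustrative subset
my @candidates = qw(spot stop soup tops);                  # words kept from the file

print "$_\n" for matching_words(\@anagrams, \@candidates);
```

With no array indexes to advance by hand, there is nothing that can run off the end of either list.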
 
#!/usr/bin/perl -w
use strict;
use List::Permutor;

my @tot;
my $num  = 0;
my $anum = 0;

# Read one word from the user.
chomp(my $word = <>);
my @split = split //, $word;

# Build every permutation, fusing the letters of each one into a single string.
my $perm = List::Permutor->new(split(//, $word));
my @anagrams = ();
while (my @set = $perm->next) {
    push @anagrams, join('', @set);
}

# Keep dictionary words of the same length that are made up of the entered letters.
open(FILE, "E://words.txt") or die "Can't open E://words.txt: $!";
while (<FILE>) {
    for my $words (/\b[@split]{1,}?\b/ig) {
        if (length($word) == length($words)) {
            push @tot, $words;
        }
    }
}
close(FILE);

# Compare each candidate word against the permutations in turn.
while (defined $tot[$num]) {
    if ($tot[$num] =~ /$anagrams[$anum]/i) {
        print "Anagram of " . $anagrams[$anum] . ": " . $tot[$num] . "\n";
        $anum = 0;
        $num++;
    } else {
        $anum++;
        if (!exists $anagrams[$anum]) {
            # Ran out of permutations: move on to the next candidate.
            $num++;
            $anum = 0;
        }
    }
}

Hee hee, figured it out! List::Permutor returns every letter one by one, so I have to fuse those letters first or else I can't compare with them! Do you still see any errors in this code, though?
 
It's quite slow, though. Is there a way to speed it up?
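
One common way to speed this up, which is a different technique from the permutation loop in the script above: two words are anagrams exactly when their sorted letters are equal, so each dictionary word needs only one sort and one string comparison, and no permutations are generated at all. A minimal sketch, assuming one word per line; signature, find_anagrams and the in-memory word list are illustrative names, not part of any module.

```perl
use strict;
use warnings;

# Canonical key for a word: its letters, lowercased and sorted.
sub signature {
    my ($word) = @_;
    my @letters = split //, lc $word;
    return join '', sort @letters;
}

# Return the lines read from $fh that are anagrams of $word.
sub find_anagrams {
    my ($word, $fh) = @_;
    my $want = signature($word);
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        next if length($line) != length($word);   # cheap filter before sorting
        push @found, $line if signature($line) eq $want;
    }
    return @found;
}

# Demo with an in-memory word list standing in for the real file.
my $dictionary = "pots\nsoup\nstop\ntops\nstoop\n";
open my $fh, '<', \$dictionary or die "Cannot open in-memory word list: $!";
print "$_\n" for find_anagrams('post', $fh);
close $fh;
```

Note that the entered word itself is reported too if it appears in the list, so a caller may want to filter it out.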
 
Is there a way for List::Permutor to only find permutations that are the same length as the word entered? Because it goes in a loop, starting off with one-letter permutations, then two, then three...
How do I change that?
 