very slow performance with hash

Status
Not open for further replies.

hr

Hi all,
I'm putting 800 000 lines into a hash table, and this takes about 7 hours.

Does somebody know a workaround to do this faster?

I'm using Perl v5.6.1 (ActiveState build 630) on a W2K box.

Thanks
hr
 
Can you provide some code for us to look at? The following only takes 11 seconds on my NT system.
Code:
$date = localtime();
print "$date \n";
for ($i=0; $i<800000; $i++) {
    $hash{$i} = "here is a line $i";
}
$date = localtime();
print "$date \n";
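
The loop above prints two timestamps and leaves the subtraction to the reader. A minimal sketch of the same test that reports elapsed seconds directly might look like this. The sub name `fill_hash` is made up for illustration, and `time` only has one-second resolution, so treat the result as a rough figure.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): fill a hash with $count
# entries shaped like the ones in the test loop and return a reference.
sub fill_hash {
    my ($count) = @_;
    my %hash;
    $hash{$_} = "here is a line $_" for 0 .. $count - 1;
    return \%hash;
}

my $start = time;                  # whole seconds since the epoch
my $hash  = fill_hash(800_000);    # same entry count as in the question
my $secs  = time - $start;

print scalar(keys %$hash), " entries in about $secs second(s)\n";
```

If this finishes in seconds while the real script takes hours, the hash insert itself is probably not the bottleneck.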
 
hello raider2001

here is my script

############################################################
#################### store part of tmp file in a hash table
# open tmp file for reading
$tmp_txt2 = "c:/t/tmp.txt";
open TMP_TXT2, "<$tmp_txt2" or die "Cannot open $tmp_txt2 for reading: $!";

while (<TMP_TXT2>) {

    (@key = /(.{15})/ig);    # extract first 15 bytes, this is the key

    (@value = /(.{46})/ig);  # extract first 46 bytes, this is the value for two lucky numbers

    $hash_table{"$key[0]"} = "$value[0]";  # build hash, key is key, value is value

}

# closing file handle
close (TMP_TXT2);


and here is what is inside the tmp.txt

00000000007838711486A0100000000007838729599A0100000000007838
709103A0100000000007838717508A01
00000000228354916169A0100000000228354964129A0100000000228354
967206A0100000000228354900717A0100000000228354977369A0100000
000228354995919A0100000000228354932916A010000000022835495783
0A0100000000228354949460A0100000000228354905654A01
00000000420344591459A01|23 extra Zeichen 23|
00000000574791079642A0100000000574791099334A01
00000000721166035566A0100000000721166069230A0100000000721166
077675A0100000000721166084234A0100000000721166032711A01
00000000848697295355A0100000000848697266948A0100000000848697
273774A01
00000000998869659216A0100000000998869634565A0100000000998869
602580A0100000000998869621223A0100000000998869614807A0100000
000998869680438A0100000000998869619618A010000000099886962876
2A0100000000998869666375A0100000000998869663152A010000000099
8869679170A0100000000998869673302A01
00000001180649477202A0100000001180649450256A01
00000001325566557594A01|23 extra Zeichen 23|
00000001486209368574A0100000001486209329855A0100000001486209
399142A01
00000001643347563217A01|23 extra Zeichen 23|
00000001787969809630A0100000001787969804286A0100000001787969
872685A0100000001787969898808A0100000001787969845119A0100000
001787969860500A0100000001787969844197A010000000178796987311
1A0100000001787969891745A0100000001787969895545A010000000178
7969862839A0100000001787969881823A01
00000001920812015195A0100000001920812045959A0100000001920812
048705A0100000001920812078840A0100000001920812015877A0100000
001920812016847A0100000001920812047459A010000000192081205692
7A0100000001920812015221A0100000001920812055943A010000000192
0812054381A0100000001920812066355A0100000001920812077532A010
0000001920812030094A0100000001920812061951A01000000019208120
39487A0100000001920812048281A0100000001920812019594A01000000
01920812068520A0100000001920812045893A0100000001920812076168
A0100000001920812036867A0100000001920812060858A0100000001920
812043515A0100000001920812012514A0100000001920812016384A0100
000001920812005823A0100000001920812083012A010000000192081200
8775A0100000001920812040484A0100000001920812038568A010000000
1920812026809A0100000001920812078449A0100000001920812014051A
0100000001920812006140A0100000001920812042428A01000000019208
12088681A0100000001920812088328A0100000001920812067726A01000
00001920812081278A0100000001920812074290A0100000001920812024
909A01
00000002102170874093A0100000002102170861590A0100000002102170
816167A01
00000002243288427310A0100000002243288481650A0100000002243288
435387A0100000002243288467932A0100000002243288468977A0100000
002243288491288A0100000002243288484423A010000000224328846325
2A0100000002243288477312A0100000002243288410281A010000000224
3288408921A01
00000002419572386307A0100000002419572304080A0100000002419572
309258A0100000002419572337945A0100000002419572350391A01
00000002594110057838A01|23 extra Zeichen 23|
00000002773759438457A01|23 extra Zeichen 23|

I just noticed that there are some duplicates inside the file.
 
Well I built a file with 576,911 lines and the code was much faster on my machine. I suspect that you are running low on memory. By tweaking the pattern match you can make it faster though. Here are my results:

(@key = /(.{15})/ig);
(@value = /(.{46})/ig);
$hash_table{"$key[0]"} = "$value[0]";
original code: 24 seconds

(@key = /(.{15})/);
(@value = /(.{46})/);
$hash_table{"$key[0]"} = "$value[0]";
remove unnecessary 'ig' in pattern matches: 8 seconds

$_ =~ m/^(.{15})(.{31})/;
$hash_table{$1} = $1 . $2;
only do 1 pattern match: 6 seconds

Good Luck
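
For anyone who wants to repeat this comparison on their own machine, here is a rough sketch using the core Benchmark module. The sample records are generated, not taken from the real tmp.txt, and the sub names are invented for the example. The single-match version also skips lines where the match fails, because otherwise `$1` and `$2` would still hold the captures from an earlier line.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Made-up sample data: 1000 lines, each three 23-character records long,
# with a distinct 15-character prefix per line.
my @lines = map { sprintf("%015d%05dA01", $_, $_ % 100_000) x 3 } 1 .. 1000;

# Two pattern matches per line, without the unnecessary 'ig' modifiers.
sub load_two_regexes {
    my %h;
    for (@lines) {
        my @key   = /(.{15})/;
        my @value = /(.{46})/;
        $h{ $key[0] } = $value[0];
    }
    return \%h;
}

# One anchored pattern match per line, guarded against failed matches.
sub load_one_regex {
    my %h;
    for (@lines) {
        next unless /^(.{15})(.{31})/;
        $h{$1} = $1 . $2;
    }
    return \%h;
}

# No pattern match at all, only substr.
sub load_substr {
    my %h;
    for (@lines) {
        next if length($_) < 46;
        $h{ substr($_, 0, 15) } = substr($_, 0, 46);
    }
    return \%h;
}

# Run each loader 50 times and print the timings.
timethese(50, {
    two_regexes => \&load_two_regexes,
    one_regex   => \&load_one_regex,
    substr      => \&load_substr,
});
```

The absolute numbers will differ from the ones quoted above, since they depend on the machine, the Perl build and the data.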
 
:-Q hello raider2001 :-Q

Now I'm really fast! Only 211 seconds.

This is my new solution:
while (<TMP_TXT2>) {

    $hash_key[0] = substr ($_, 0, 15);

    $hash_value[0] = substr ($_, 0, 46);

    $hash_table{"$hash_key[0]"} = "$hash_value[0]";  # build hash, hash_key is key, hash_value is value

}

thanks for your advice
hr
 
Just wondering what type of system you have. My system is an 800MHz Pentium 3 with 256 MB.
 
My system is a Compaq Armada E500 with 512 MB and a 1000 MHz CPU. In fact, my script takes 27 seconds now; 211 seconds is the time for the whole script that the hash part sits inside.
 
didn't you mean:

$hash_key[0] = substr ($_, 0, 15);
$hash_value[1] = substr ($_, 0, 46);
$hash_table{"$hash_key[0]"} = "$hash_value[1]";

and wouldn't

$hash_table{substr ($_, 0, 15)} = substr ($_, 0, 46);

be faster?
Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
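
Put into a complete loop, the one-line substr version could look roughly like the sketch below. The sub name `load_table` and the duplicate counter are additions for illustration only; the counter is there because the file was said to contain duplicate keys, and with a plain hash assignment the later line silently replaces the earlier one.

```perl
use strict;
use warnings;

# Hypothetical helper: read $path and map the first 15 characters of each
# line to its first 46 characters. Returns the hash reference and the
# number of times a key was seen again.
sub load_table {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path for reading: $!";
    my %table;
    my $duplicates = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if length($line) < 15;            # too short to hold a key
        my $key = substr($line, 0, 15);
        $duplicates++ if exists $table{$key};
        $table{$key} = substr($line, 0, 46);   # later lines overwrite earlier ones
    }
    close($fh);
    return (\%table, $duplicates);
}

# Path taken from the thread; the call is skipped if the file is absent.
my $path = 'c:/t/tmp.txt';
if (-e $path) {
    my ($table, $dups) = load_table($path);
    print scalar(keys %$table), " keys, $dups duplicate key(s)\n";
}
```

Whether overwriting duplicates is the right behaviour depends on what the rest of the script expects, so that part is an assumption.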
 
Hi Mike,

with your idea, my script is 4 seconds faster.

Thanks!

hr
 