How can I make this script faster?

Kirsle · Oct 16, 2006

I found this source code that had subroutines for calculating the distance between two points on the globe. To go with this I have a flat text database of every US zip code and its coordinates.

The database file looks like this:

Code:

"48428","+42.937629","-083.154685","DRYDEN","MI","LAPEER","STANDARD"
"48429","+42.894936","-084.025960","DURAND","MI","SHIAWASSEE","STANDARD"
"48430","+42.833330","-083.763433","FENTON","MI","GENESEE","STANDARD"
"48432","+43.952572","-082.973268","FILION","MI","HURON","STANDARD"
"48433","+42.978274","-083.808006","FLUSHING","MI","GENESEE","STANDARD"
"48434","+43.663206","-082.613319","FORESTVILLE","MI","SANILAC","PO BOX ONLY"
"48435","+43.277734","-083.391990","FOSTORIA","MI","TUSCOLA","STANDARD"
"48436","+42.872466","-083.858445","GAINES","MI","GENESEE","STANDARD"
"48437","+43.018423","-083.691666","GENESEE","MI","GENESEE","PO BOX ONLY"
"48438","+42.918840","-083.512490","GOODRICH","MI","GENESEE","STANDARD"
"48439","+42.922700","-083.673760","GRAND BLANC","MI","GENESEE","STANDARD"
"48440","+42.930620","-083.360013","HADLEY","MI","LAPEER","PO BOX ONLY"
"48441","+43.867295","-082.800293","HARBOR BEACH","MI","HURON","STANDARD"

I ran a test script to see how time efficient it was for it to read through this database for two zip codes to find their coordinates.

Here's the code from my test script:

Code:

#!/usr/bin/perl -w

use strict;
use warnings;
use threads;
use threads::shared;
our $pi = atan2(1,1) * 4;

our $close : shared;
our $counter : shared;
$counter = 0;
$close = 0;

my $cnt = threads->create (sub {
	while (1) {
		if ($close == 1) {
			print "Processing took $counter seconds.\n";
			sleep 999;
		}
		elsif ($close == 2) {
			select (undef,undef,undef,0.001);
			$counter += 0.001;
		}
	}
});

print "From> ";
chomp (my $from = <STDIN>);
print "To>   ";
chomp (my $to = <STDIN>);

# Begin counting.
$close = 2;

# Open the database.
open (DB, "./ZIP_CODES.txt");
my @db = <DB>;
close (DB);
chomp @db;

my $start = {};
my $end   = {};

foreach my $line (@db) {
	$line =~ s~"~~g;
	my ($zip,$lat,$long,$city,$state,$county,$type) = split(/\,/, $line, 7);

	if ($zip == $from) {
		$start->{long} = $long;
		$start->{lat}  = $lat;
		$start->{area} = "$city, $state";
	}
	elsif ($zip == $to) {
		$end->{long} = $long;
		$end->{lat}  = $lat;
		$end->{area} = "$city, $state";
	}
}

print "Start: $start->{lat}, $start->{long}\n";
print "  End: $end->{lat}, $end->{long}\n\n";

my $bench1 = $counter;
print "Database reading took $counter seconds.\n";

# Calculate their distance.
my $miles = &distance ($start->{lat}, $start->{long}, $end->{lat}, $end->{long}, 'M');

my $bench2 = ($counter - $bench1);
print "Calculation took $bench2 seconds ($counter running total)\n\n";

print "$from ($start->{area}) is " . int($miles) . " miles away from $to ($end->{area})\n";

$close = 1;
while (1) {
	sleep 1;
}

#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#:::                                                                         :::
#:::  This routine calculates the distance between two points (given the     :::
#:::  latitude/longitude of those points). It is being used to calculate     :::
#:::  the distance between two ZIP Codes or Postal Codes using our           :::
#:::  ZIPCodeWorld(TM) and PostalCodeWorld(TM) products.                     :::
#:::                                                                         :::
#:::  Definitions:                                                           :::
#:::    South latitudes are negative, east longitudes are positive           :::
#:::                                                                         :::
#:::  Passed to function:                                                    :::
#:::    lat1, lon1 = Latitude and Longitude of point 1 (in decimal degrees)  :::
#:::    lat2, lon2 = Latitude and Longitude of point 2 (in decimal degrees)  :::
#:::    unit = the unit you desire for results                               :::
#:::           where: 'M' is statute miles                                   :::
#:::                  'K' is kilometers (default)                            :::
#:::                  'N' is nautical miles                                  :::
#:::                                                                         :::
#:::  United States ZIP Code/ Canadian Postal Code databases with latitude   :::
#:::  & longitude are available at [URL unfurl="true"]http://www.zipcodeworld.com[/URL]               :::
#:::                                                                         :::
#:::  For enquiries, please contact sales@zipcodeworld.com                   :::
#:::                                                                         :::
#:::  Official Web site: [URL unfurl="true"]http://www.zipcodeworld.com[/URL]                         :::
#:::                                                                         :::
#:::  Hexa Software Development Center © All Rights Reserved 2005            :::
#:::                                                                         :::
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

sub distance {
	my ($lat1, $lon1, $lat2, $lon2, $unit) = @_;
	my $theta = $lon1 - $lon2;
	my $dist = sin(deg2rad($lat1)) * sin(deg2rad($lat2)) + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * cos(deg2rad($theta));
  $dist  = acos($dist);
  $dist = rad2deg($dist);
  $dist = $dist * 60 * 1.1515;
  if ($unit eq "K") {
  	$dist = $dist * 1.609344;
  } elsif ($unit eq "N") {
  	$dist = $dist * 0.8684;
		}
	return ($dist);
}

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#:::  This function get the arccos function using arctan function   :::
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sub acos {
	my ($rad) = @_;
	my $ret = atan2(sqrt(1 - $rad**2), $rad);
	return $ret;
}

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#:::  This function converts decimal degrees to radians             :::
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sub deg2rad {
	my ($deg) = @_;
	return ($deg * $pi / 180);
}

#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#:::  This function converts radians to decimal degrees             :::
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sub rad2deg {
	my ($rad) = @_;
	return ($rad * 180 / $pi);
}

# print distance(32.9697, -96.80322, 29.46786, -98.53506, "M") . " Miles\n";
# print distance(32.9697, -96.80322, 29.46786, -98.53506, "K") . " Kilometers\n";
# print distance(32.9697, -96.80322, 29.46786, -98.53506, "N") . " Nautical Miles\n";

Here's some outputs:

Code:

From> 48433
To>   90210
Start: +42.978274, -083.808006
  End: +33.786594, -118.298662

Database reading took 0.146 seconds.
Calculation took 0 seconds (0.146 running total)

48433 (FLUSHING, MI) is 1956 miles away from 90210 (BEVERLY HILLS, CA)
Processing took 0.147 seconds.

-------------------

From> 48433
To>   90210
Start: +42.978274, -083.808006
  End: +33.786594, -118.298662

Database reading took 0.176 seconds.
Calculation took 0 seconds (0.176 running total)

48433 (FLUSHING, MI) is 1956 miles away from 90210 (BEVERLY HILLS, CA)
Processing took 0.177 seconds.

Is there a way to make this faster (without all the overhaul of making an actual SQL database server or any of that)?

-------------
Kirsle.net | Kirsle's Programs and Projects

BSOD2600 · Oct 16, 2006

Look into implementing something like Dijkstra shortest path. It seems to already be implemented in

http://search.cpan.org/~dburdick/Boost-Graph-1.2/Graph.pm#dijkstra_shortest_path

stevexff · Oct 17, 2006

The problem seems to be the database lookup, not the calculation. Perhaps if you checked the start of each line with a regex to see if it matched the zip codes, and only do the rest of the parsing and splitting if it matched?

Alternatively, cache the 'database' in a hash, so that repeated lookups would use the (fast) hash key lookup rather than real i/o scanning a file?

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object:erlDesignPatterns)[/small]

rharsh · Oct 17, 2006

I agree with Steve, the script is doing a whole lot of work (a substitution, a split, etc.) before it determins whether the line contains one of the target zip codes. If you do a regex to grab the zip code, then compare it against your targets, you may get some improvement. Also, a couple of flags to track whether you've found your zip codes (something like $from_found and $to_found) could be helpful. When they're both true, you can stop looping through @db (last if $from_found && $to_found

Another thought would be to use a database rather than a flat text file. If that's not an option, you might get some speed improvement by implementing DBD::CSV.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

How can I make this script faster?

Kirsle

Programmer

BSOD2600

Technical User

stevexff

Programmer

rharsh

Technical User

Similar threads

Part and Inventory Search

Sponsor