Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Levenshtein String Distance

Status
Not open for further replies.

demoniac

Programmer
Jun 14, 2001
63
US
Hi all,

Rather off the wall question...but has anybody ever written a levenshtein string distance (also known as edit distance) function in asm? I thought I'd at least ask before potentially reinvinting the wheel. :)

Thanks,
rich
 
Hi Tessa,

The best definition I've found is this webpage here:


That defines exactly what the Levenshtein String Distance algorithm is and also gives examples of it in three different languages. At the bottom of the page are links to examples in even more languages too.

We've currently got the function running in C and it's optimized about as good as we can get it. It's hit very hardcore though and any improvements in throughput, no matter how miniscule, would have a huge impact for us.

So, the only thing left I could think of is to rewrite it in asm. Can't get much faster than that [assuming it's written properly]. ;-)
 
Ok, I'f seen it and now want to know how your c optimized code looks like, to get an id. (in dutch idee, meens a part
or all of what you have been doing).
I think it can be done faster in ASM but if your
optimization is as perfect as some c to asm translations
is, than only reposition of the assembly lines will give
some speed advantage.

A verry interrested Tessa
 
Hi Tessa,

Sure thing. Here is the C function as it stands now. I'm open to any suggestions you have. This is the best version of the function we've gotten so far. Runs much much faster the original implementation we had that used a dynamically sized matrix from the vector class in c++.

len1 = pmap.mvalue.GetLength();
len2 = pmap.xvalue.GetLength();
if (len1 > 0 && len2 > 0){
for (z=1;z<len1;z++){
for (x=1;x<len2;x++){
if (pmap.mvalue[z-1] == pmap.xvalue[x-1]){
cost = 0;
}
else cost = 1;


if (column[z].row[x-1]+1 < column[z-1].row[x]+1){
if (column[z].row[x-1]+1 < column[z-1].row[x-1]+cost){
k = column[z].row[x-1]+1;
}
else k = column[z-1].row[x-1]+cost;

}
else{
if (column[z-1].row[x]+1 < column[z-1].row[x-1]+cost){
k = column[z-1].row[x]+1;
}
else k = column[z-1].row[x-1]+cost;
}
column[z].row[x] = k;
}
}
mcount = column[len1-1].row[len2-1];
}
else{
if (len1 == 0) mcount = len2;
if (len2 == 0) mcount = len1;
}


I'm really eager to see what you make of it or any suggestions you may have. :)

Thanks,
Rich
 
Please use the [tt][ignore]
Code:
[/ignore][/tt]
tags when posting code.

Any chance you could wrap a simple main() around that function, along with a few example tests (and the answers you expect).
Expecially since you don't declare 'column' or 'i' in any way.

It's all very well trying out improvements, but if it produces the wrong answer, it's useless.

Also, exactly which processor / code model are you targeting (ARM/Thumb, x86 protected mode, etc etc).
Which optimisation flag(s) have you tried with your C++ compiler?
Which C++ compiler for that matter.

Have you considered the porting problems you generate by going to assembler?

I assume you used a profiler to tell you that this is the most expensive function in your code?

--
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top