Anyone create text and similarity functions in vfp

TinyNinja

Programmer
Oct 1, 2018
99
US
Hey vfp community,

I have been using Jaro–Winkler, Levenshtein, and Max Similarity in Excel to clean up data and find matching pairs.

Has anyone created VFP functions for these three text-similarity measures?

Any help is appreciated!
 
This is new to me. I now know what Jaro–Winkler is, having looked it up on Google, but I have never needed it.

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.
 
Yes, I wrote a Levenshtein function in VFP. I published it in FoxPro Advisor. But unfortunately that was many years ago. I no longer have the article or the code, and it is no longer available online (as far as I know).

The function turned out to be quite slow, mainly because it relied on recursion.

In addition, while Levenshtein is good for comparing two strings and evaluating their proximity, it is not suitable for searching a large table in VFP. That's because you can't index a string on its Levenshtein value. To do that, you would need to know what string you want to compare it with, which of course you don't know in advance.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
By the way, do you know that VFP has a DIFFERENCE() function, which is supposed to evaluate the similarity between strings? If I understand it right, it works on a Soundex principle, so it might be useful for searching for names that you might have misheard over the phone, but less useful for finding typing errors, for example.

Mike

 
Not come across difference() either.
One of those weekends I guess

Regards

Griff
 
I'm wondering if there is something like it in the spell checker I put into VFP apps

** update **
The spell checker I'm using is FoxSpell, and it uses soundex()


Regards

Griff
 
This is also a good case for taking C++ code, building a DLL from it, and calling that from VFP.

There are some minor hurdles, and ideally you'd go one step further and build an FLL, but a DLL is good enough. A .NET assembly would work too, used through West Wind's wwDotnetBridge, as discussed at length in thread184-1803903.

The only downside is the extra dependencies. A C++ DLL brings in a C++ runtime such as msvcr120.dll (which one depends on the VS version you build with); an assembly brings in a .NET Framework.
With VS .NET 2003 you target msvcr71.dll, the C++ runtime VFP9 itself needs anyway, so that adds no new dependency. Some code will also need msvcp71.dll or a newer version; you can find out with Dependency Walker (depends.exe). The advantage of C++ DLL or FLL solutions is that they don't need registering; you just ship the DLL or DLLs.

And then you can draw on many more implementations, even simple ones such as the Levenshtein function on Rosetta Code.
One of the minor hurdles is that VFP doesn't support every C++ type, for example the std::string type used there. The fix is simple: change the parameter type to char*, which is what VFP passes when you specify STRING as the data type in a DECLARE, and inside the function body create local std::string variables from it. Conversions from simpler to more complex data types are not hard to find. In this case you pass in char* c1 and simply declare the variable as std::string s1 = c1; because std::string has a constructor that converts a char*. This is also a reason to take the C++ source and compile the DLL project yourself: you can make such modifications to allow easier use from VFP.
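As a rough illustration of that char* fix, here is a minimal sketch. It is not the Rosetta Code listing; the export name LevDistance and the SKETCH_API macro are made up for this example (in the Visual Studio template the generated YOURDLL_API macro plays that role). The distance is computed iteratively with two rows, which also avoids the recursion that made the VFP version mentioned earlier slow.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical export macro for this sketch; a VS "DLL with exports"
// project would use its generated YOURDLL_API macro instead.
#ifdef _WIN32
#define SKETCH_API __declspec(dllexport)
#else
#define SKETCH_API
#endif

// Iterative Levenshtein distance, keeping only two rows of the table.
static int levenshtein(const std::string& s1, const std::string& s2) {
    const std::size_t n = s1.size(), m = s2.size();
    std::vector<int> prev(m + 1), cur(m + 1);
    for (std::size_t j = 0; j <= m; ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (s1[i - 1] == s2[j - 1]) ? 0 : 1;
            // deletion, insertion, substitution
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, cur);
    }
    return prev[m];
}

// Entry point for VFP: plain char* parameters, converted to std::string
// inside the body. extern "C" keeps the exported name unmangled.
extern "C" SKETCH_API int LevDistance(const char* c1, const char* c2) {
    if (c1 == nullptr || c2 == nullptr) return -1;  // assumption: -1 signals bad input
    std::string s1 = c1;  // std::string has a converting constructor for char*
    std::string s2 = c2;
    return levenshtein(s1, s2);
}
```

Assuming the DLL is built as YourDll.dll, VFP could then declare it with DECLARE INTEGER LevDistance IN YourDll.dll STRING, STRING and call ? LevDistance("kitten", "sitting"), which should print 3.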

Bye, Olaf.

Olaf Doschke Software Engineering
 
Another hurdle I just needed to fix to try this:

Create a new VS solution with the project template "DLL (Dynamic Link Library) with exports". The difference from the normal DLL project template is that "with exports" creates a header file "YourDLL.h" that includes the declaration of a sample function for external usage. The actual implementation of that sample function, and something else, is in turn in a separate "YourDLL.cpp" code file.

You can almost go straight to building this solution as a test and get a DLL that works for VFP, too. You need to change the target platform to x86, not x64, otherwise there is no chance of using it from VFP. You may also want to change from Debug to Release, but that's not as important. What's far less obvious, and thus most important to mention here, is that with the declarations written the way they are in the sample header file, VFP won't find the entry point of a function, for example the sample DLL function fnYourDll() of the project template.

There's just a slight fix necessary. Change the declaration in the header file here
Code:
// VS "DLL with exports" template header file
YOURDLL_API int fnYourDll(void);
and prefix it:
Code:
// fixed for DLL entry point visibility to VFP
extern "C" YOURDLL_API int fnYourDll(void);

If you now build you can use the DLL in VFP with
Code:
*CD into the Visual Studio output path to Debug or Release folder with YourDll.DLL
DECLARE INTEGER fnYourDll in YourDll.dll
? fnYourDll() && prints 0
Watch out: function names in DECLARE are case-sensitive in general, not just for VS C++ DLLs.

To extend this, start with such a declaration in the header. A light bulb icon will appear as VS detects that this declaration has no definition yet, and it offers to copy it over into the cpp file to match the declaration. Why have both at all? Ask the C++ inventors. I guess all these declarations in a header file give a short overview of what functions are available; nowadays you could just collapse a code file to show only the head of each definition and get that same overview. Anyway, that's the C++ world, and you will also find it in code that implements string function X or encryption function Y or whatever else you'd like to have in VFP. It's also an extensibility vector of VFP.

Bye, Olaf.

 
Thanks for all the responses!

I never knew VFP had a difference() or soundex() function. I just played with the functions but I feel they aren't what I need.

Koen, thanks for the link. I have tested those examples. I am surprised the results are whole numbers and not percentages like they show up in Excel. The whole numbers are throwing me off, since I was expecting a number between 0 and 1 (a percentage).

Nigel, I really like the link you gave, thank you for that. Unfortunately the Jaro-Winkler function is written for MariaDB; it would have been great if it had been written in both VFP and MariaDB.

Thank you Olaf for the detailed info. I might have to give that a try in the future if I can't find exactly what I am looking for. That looks like a big jump for me but I will tackle that if nothing pans out.

I am more interested in the Jaro–Winkler function.

The work where these functions really helped was with names. I had an HR file with employees' proper names spelled out. They needed to attend a training and would not always write their name down the proper way; usually it was a nickname, shorthand, missing words, etc. The Jaro-Winkler and Levenshtein functions helped figure out who attended the training even with their names all messed up. I noticed during this exercise that the Jaro-Winkler function returned more accurate results, and we ended up using that more often to find our matches. I want to use these functions in my future projects.
 
Levenshtein in its base definition is not a number between 0 and 1; it's an edit distance. You can get to a 0-1 value with a small calculation: since the maximum distance is the length of the longer string, you can divide by that length (and subtract the result from 1 if you want identical strings to score 1). Damerau-Levenshtein does so, too, besides other small differences from the original algorithm.

Sure, using another IDE and a language you are not at all used to, and merely copying a function declaration and definition into a template on trust that it works, is a big jump. But you likewise rely on third-party VFP code without looking at all of it, going only by the description of how to use it. It's obviously easier to deal with problems in VFP code, though, should there be errors.

It's something usable for many more cases than string functions, so it pays to dive into the Visual Studio IDE; the Community edition will be sufficient for such DLLs.

But talking of Jaro-Winkler, it's also not that much more C++ code:

See? And there are many more resources besides Rosetta Code, especially for C++: all kinds of GitHub repositories and other open source project platforms.

Bye, Olaf.

 
TinyNinja,

scroll down further... there's a vfp class to download.

n
 
Hey Nigel, thanks for that call out, I found it. I see it has the prg that matches the fox.wikis site, but it is nice that a testing prg came with it.

Olaf, you are right, and I will dabble in the Visual Studio IDE and see what I can create with your recommendations.

Thank you all for the help!
 