
soundex for arabic language

Status
Not open for further replies.

yahiadal

Programmer
Sep 19, 2006
US
Hi all,
I need a Soundex algorithm that supports the Arabic language. All I have found is a PHP class, but I have no experience with PHP, so I can't translate the class into VFP. Any help will be appreciated. Following is the PHP class:

<?php
// ----------------------------------------------------------------------
// Copyright (C) 2006 by Khaled Al-Shamaa.
// // ----------------------------------------------------------------------
// LICENSE

// This program is open source product; you can redistribute it and/or
// modify it under the terms of the GNU General Public License (GPL)
// as published by the Free Software Foundation; either version 2
// of the License, or (at your option) any later version.

// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.

// To read the license please visit
// ----------------------------------------------------------------------
// Class Name: Arabic Soundex
// Filename: ASoundex.class.php
// Original Author(s): Khaled Al-Sham'aa <khaled.alshamaa@gmail.com>
// Purpose: Arabic soundex algorithm takes Arabic word as an input
// and produces a character string which identifies a set words
// that are (roughly) phonetically alike.
// ----------------------------------------------------------------------

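class ASoundex {
    // Soundex mapping: every letter matched by the pattern at index N is
    // replaced by the digit N. Group 0 plays the role of the vowels in
    // English Soundex and is removed at the end.
    var $asoundexCode = array('/ا|و|ي|ع|ح|ه/',
        '/ب|ف/',
        '/خ|ج|ز|س|ص|ظ|ق|ك|غ|ش/',
        '/ت|ث|د|ذ|ض|ط|ة/',
        '/ل/',
        '/م|ن/',
        '/ر/'
    );

    // Phonix mapping: same idea, but ف (7) and ز|س (8) get their own digits.
    var $aphonixCode = array('/ا|و|ي|ع|ح|ه/',
        '/ب/',
        '/خ|ج|ص|ظ|ق|ك|غ|ش/',
        '/ت|ث|د|ذ|ض|ط|ة/',
        '/ل/',
        '/م|ن/',
        '/ر/',
        '/ف/',
        '/ز|س/'
    );

    // Latin letter used for the first character of the word when $lang == 'en'.
    var $transliteration = array('ا' => 'A',
        'ب' => 'B',
        'ت' => 'T',
        'ث' => 'T',
        'ج' => 'J',
        'ح' => 'H',
        'خ' => 'K',
        'د' => 'D',
        'ذ' => 'Z',
        'ر' => 'R',
        'ز' => 'Z',
        'س' => 'S',
        'ش' => 'S',
        'ص' => 'S',
        'ض' => 'D',
        'ط' => 'T',
        'ظ' => 'Z',
        'ع' => 'A',
        'غ' => 'G',
        'ف' => 'F',
        'ق' => 'Q',
        'ك' => 'K',
        'ل' => 'L',
        'م' => 'M',
        'ن' => 'N',
        'ه' => 'H',
        'و' => 'W',
        'ي' => 'Y'
    );
    var $len;   // length of the returned code
    var $lang;  // 'en': transliterate the first letter; anything else: keep it Arabic
    var $code;  // 'soundex' or 'phonix'

    function __construct($len=4, $lang='en', $code='soundex'){
        $this->len = $len;
        $this->lang = $lang;
        $this->code = $code;
    }

    /**
     * @return String : the calculated soundex/phonix numeric code
     * @param String : the word to encode
     * [soundex|phonix] : mapping used for the conversion (taken from $this->code)
     * @desc mapCode : method to create the soundex/phonix numeric code for a given word
     * @author Khaled Al-Shamaa
     */
    function mapCode($word){
        $encodedWord = $word;

        if($this->code == 'phonix'){ $map = $this->aphonixCode; }else{ $map = $this->asoundexCode; }

        // Replace every letter with the index of the group it belongs to.
        foreach($map as $code=>$condition){
            $encodedWord = preg_replace($condition, $code, $encodedWord);
        }
        // Anything that is still not a digit (a letter in no group) becomes 0.
        $encodedWord = preg_replace('/\D/', '0', $encodedWord);

        return $encodedWord;
    }

    // Collapses runs of the same character, e.g. '1123' -> '123'.
    function trimRep($word){
        $chars = preg_split('//',$word);
        $lastChar = null;
        $cleanWord = '';

        foreach($chars as $char){
            if($char != $lastChar){ $cleanWord .= $char; }
            $lastChar = $char;
        }

        return $cleanWord;
    }

    function soundex($word){
        // preg_split('//') with a limit of 3 yields ['', first character, rest of word].
        // Without the /u modifier it splits byte by byte, so it assumes a
        // single-byte encoding such as Windows-1256, not UTF-8.
        list($dump, $soundex, $rest) = preg_split('//',$word,3);

        if($this->lang == 'en'){ $soundex = $this->transliteration[$soundex]; }

        $encodedRest = $this->mapCode($rest);
        $cleanEncodedRest = $this->trimRep($encodedRest);

        $soundex .= $cleanEncodedRest;

        // Drop the group-0 letters only now, after repeated digits were collapsed.
        $soundex = preg_replace('/0/', '', $soundex);

        // Truncate, or right-pad with zeros, to exactly $this->len characters.
        $totalLen = strlen($soundex);
        if($totalLen > $this->len){
            $soundex = substr($soundex, 0, $this->len);
        }else{
            $soundex .= str_repeat('0', $this->len - $totalLen);
        }

        return $soundex;
    }
}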

thank you
yahya
 
It would be asking a lot of members of this forum to translate the above PHP function into VFP for you. It might be better if you try to understand how Soundex works (and it's really not complicated), then try to write your own function in VFP.

That said, have you tried using VFP's built-in SOUNDEX() function with Arabic text? It probably won't work, but it should be the first thing to try.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi Mr. Lewis,
I have used SOUNDEX() with English names with no problem, but with no success in Arabic. I tried to build my own program for Arabic Soundex based on some reading about the subject, but got results that are not logical in some cases. The above PHP program is said to be more precise, but its logic is not documented well enough for me to build a new one in VFP. That's why I am asking for it to be translated to VFP, since I have no knowledge of PHP.
thank you

yahya
 
Yahiadal,
This may be much more difficult than you expect. I note that the code alone contains both double-byte and single-byte values. Unless your VFP application is written to use a double-byte character set, getting VFP to display the values after creation is incredibly complicated. I just tried pasting one of the arrays into a VFP code window, and I get only "?" = 'S', for example. All the Arabic characters are lost. So this is very complicated.

Best Regards,
Scott
ATS, CDCE, CTIA, CTDC

"Everything should be made as simple as possible, and no simpler."
 
You are right, Mr. Scott. I'll try to replace the ?s with the corresponding Arabic letters in VFP and repost the PHP code.
Thank you
yahya
 
Yahya,
It's unlikely that will be successful. The issue is far more complicated, and is related to the fact that VFP does not handle Unicode the same way (if I recall correctly).

Best Regards,
Scott
 
I am very sorry. The attached file contains some letters that were not translated well. I will revise it and repost.
yahya
 
The attached file contains some letters that were not translated well. I will revise it and repost.

That's not the point. The point that Scott was making is that VFP does not handle double-byte characters in the same way as in your PHP code. He mentioned the problem of seeing question marks instead of Arabic characters just as an example of the sort of complication he is warning against. Fixing that in the file that you posted won't change the underlying problem.

That said, I think it should be possible to do this, but it's likely to be more difficult than simply converting PHP syntax to VFP.

Have you tried looking for an external control, such as an ActiveX control or a web service, that can handle Arabic Soundex?

Mike

 
This doesn't work out, as Western versions of Windows don't have the Arabic codepage 1256 installed. You have to understand that such support is practically impossible for non-Arabic developers.

I can only give you one major idea implemented here: the $transliteration array. In VFP you can do character translations like these in a single CHRTRAN() call:
Code:
lcAccentsText = 'Café'
lcLatinOnly = CHRTRAN(lcAccentsText,'áàéèíìó','aaeeiio')
? lcLatinOnly && will print Cafe

If you do this, the accents are "removed", and you can do a similar thing in your case with single Arabic letters or syllables (sorry, I don't know what to call these glyphs), so that they translate to Latin letters. I have no idea which character VFP will take for which when you mix right-to-left and left-to-right text, but CHRTRAN() replaces each character of the second parameter that it finds in the first parameter with the character at the same position in the third parameter. It sounds more complicated than it is: CHRTRAN() is like a series of single-letter STRTRAN() replacements done in one call, with the second and third parameters being the original and replacement character sets.
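For comparison, Python's `str.maketrans()` / `str.translate()` pair performs the same one-pass character mapping as CHRTRAN(). The helper below is a hypothetical sketch (`chrtran` is not a Python built-in). It follows CHRTRAN()'s rule that characters in the search set with no counterpart in the replacement set are deleted. The Arabic and Latin strings are copied from the `$transliteration` array in the PHP class.

```python
# Rough Python analogue of VFP's CHRTRAN(); the name chrtran is illustrative.
def chrtran(text: str, search: str, replace: str) -> str:
    """Replace each character of `search` found in `text` with the character
    at the same position in `replace`; search characters with no counterpart
    in `replace` are deleted, as CHRTRAN() does."""
    n = min(len(search), len(replace))
    table = str.maketrans(search[:n], replace[:n], search[n:])
    return text.translate(table)


print(chrtran("Café", "áàéèíìó", "aaeeiio"))  # Cafe

# The same idea with the letters of the PHP $transliteration array.
ARABIC = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
LATIN = "ABTTJHKDZRZSSSDTZAGFQKLMNHWY"
print(chrtran("محمد", ARABIC, LATIN))  # MHMD
```

Letters that are not in the table, such as ة or the hamza forms, pass through unchanged. A real mapping table would have to decide what to do with them.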

Once you have Latin letters (I assume A-Z have the same ASC() codes in codepage 1256), you might use that as the representation or take the SOUNDEX() of those Latin letters, though what you get from the transliteration is not really English.

Bye, Olaf.
 
In fact, this PHP code is the only one I could find. I have translated the Arabic letters so that VFP can display them correctly. If I can translate the logic from PHP to VFP, then maybe it will work.
yahya
 
Thank you, Mr. Olaf. I understand the complexity of the topic. I will try to study PHP syntax so I can understand the PHP program's logic.

yahya
 
>I had translated the arabic letters so vfp can see them correctly.

Well, this is surely what you see, but as I said, non-Arabic Windows versions don't have the ANSI character sets VFP needs to display that as Arabic letters. We have Unicode and UTF-8 and can see the Arabic letters here in HTML, but that does not translate into VFP's ANSI character sets, except on your (most probably Arabic) Windows version. If you modify the PRG, the command window will surely echo "...php_soundex.prg as 1256", but in European or American Windows this just causes the error "Codepage number is invalid". When we edit it as 1252 (for example), it will not look Arabic. So there is no real way for us to help you. Installing such a codepage is not a single thing you can do; you would need to install the whole Arabic language, and if I switched to it, I couldn't operate Windows anymore.
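The codepage problem shows up clearly at the byte level. The short Python sketch below encodes one Arabic word three ways. In this sketch, Python's codecs stand in for Windows codepages; it does not model VFP's own CPCURRENT()/CPCONVERT() handling. In codepage 1256 each letter is one byte. In UTF-8 (as on this web page) each letter is two bytes. Codepage 1252 has no Arabic letters at all, so they can only be replaced by "?".

```python
# Illustrates the codepage issue with Python's codecs (not VFP itself).
word = "شمس"  # a three-letter Arabic word

cp1256 = word.encode("cp1256")  # Windows Arabic codepage: one byte per letter
utf8 = word.encode("utf-8")     # UTF-8: two bytes per Arabic letter
western = word.encode("cp1252", errors="replace")  # Western codepage: no Arabic

print(len(cp1256), len(utf8), western)  # 3 6 b'???'

# The 1256 bytes only read back as Arabic with the same codepage;
# decoded as 1252 they come out as accented Latin characters instead.
print(cp1256.decode("cp1256") == word)  # True
print(cp1256.decode("cp1252"))
```

This is why pasting the PHP arrays into VFP on a Western Windows yields question marks. It is also why a single-byte Windows-1256 version of the text only helps on a system where that codepage is available.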

Bye, Olaf.
 
You are absolutely right, Mr. Olaf.
I found a program written in C# with the same language encoding problem, but I was able to translate it to VFP by rewriting the Arabic letters in the code in place of the ?s that appeared. The program worked fine, but as its author said, it is a beta and needs much refinement.
As I said, if I can understand the PHP program's logic, then the translation could be done by manually rewriting the Arabic letters in it.
thank you
yahya
 
As I said, if I can understand the PHP program's logic, then the translation could be done by manually rewriting the Arabic letters in it.

I'm not convinced that's the correct approach, but if you want to try it, it won't be too difficult for you to learn enough PHP to do the translation yourself. The syntax is not so very different from many other languages, including VFP.

Keep in mind these points:

1. Variable names start with $.

2. = is used to assign a value to a variable, while == is used in conditions to test for equality.

3. .= (dot equals) appends to a string variable. So $a .= 'x' is the same as $a = $a . 'x' (in VFP, a = a + 'x').

4. { and } are used to delimit blocks of code in control structures such as if and for.

5. A single dot is used to concatenate strings.

6. Conditions (following if, for, etc.) are enclosed in parentheses.

7. All statements are terminated by a semicolon.

Obviously there's a lot more to it than that, but the above should give you a start in translating from PHP to VFP. There are plenty of references and tutorials available on line if you get stuck.
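The class's default soundex path can be read as the short Python port below. It is a hypothetical sketch: the names `asoundex`, `map_code` and `trim_rep` are illustrative and do not come from the class. It covers only the 'soundex' mapping, not 'phonix'. It also iterates over Unicode characters, whereas the PHP splits by bytes and so assumes Windows-1256 input. The steps follow the PHP order. The first letter is kept, and transliterated to Latin when `lang` is 'en'. The rest is mapped to group digits, repeated digits are collapsed, the group-0 zeros are dropped, and the result is cut or zero-padded to `length`.

```python
# Hypothetical Python sketch of the ASoundex class's default 'soundex' mode.
# Works on Unicode str values, not Windows-1256 bytes as the PHP does.

# Index in this list = digit emitted (mirrors $asoundexCode).
SOUNDEX_GROUPS = [
    "اويعحه",      # 0: vowel-like letters, dropped at the end
    "بف",          # 1
    "خجزسصظقكغش",  # 2
    "تثدذضطة",     # 3
    "ل",           # 4
    "من",          # 5
    "ر",           # 6
]

# First-letter transliteration (mirrors $transliteration, 28 letters).
TRANSLITERATION = dict(zip("ابتثجحخدذرزسشصضطظعغفقكلمنهوي",
                           "ABTTJHKDZRZSSSDTZAGFQKLMNHWY"))


def map_code(word: str) -> str:
    """Replace each letter with the digit of the group it belongs to."""
    out = []
    for ch in word:
        for digit, letters in enumerate(SOUNDEX_GROUPS):
            if ch in letters:
                out.append(str(digit))
                break
        else:  # letter in no group -> '0', as the PHP's '/\D/' replace does
            out.append("0")
    return "".join(out)


def trim_rep(code: str) -> str:
    """Collapse runs of the same character, e.g. '1123' -> '123'."""
    out = []
    for ch in code:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def asoundex(word: str, length: int = 4, lang: str = "en") -> str:
    first, rest = word[:1], word[1:]
    if lang == "en":
        # Letters missing from the table give '', like PHP's undefined index.
        first = TRANSLITERATION.get(first, "")
    code = first + trim_rep(map_code(rest))
    code = code.replace("0", "")             # drop group-0 letters after collapsing
    return code[:length].ljust(length, "0")  # truncate or pad with zeros


if __name__ == "__main__":
    for name in ["محمد", "خالد", "سمير"]:
        print(name, asoundex(name))  # M530, K430, S560
```

Each step maps onto a few lines of VFP: a loop over SUBSTR() characters, AT() or `$` tests against the group strings, and PADR() with "0" for the final padding. The Arabic literals would still have to be entered in a codepage 1256 environment.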

Mike

 
Thank you, Mr. Lewis. It's not bad to learn PHP, and I will do it and let you know of any progress on this subject.

yahya
 
Hi
This version of Soundex for Arabic names is inspired by the arsoundex PHP version.
It translates the Arabic string into a phonetically equivalent English string using a mapping table.
It then applies VFP's SOUNDEX() function to the resulting string to get the result.
Any remarks or improvements are appreciated.
To use this program, you must be able to run VFP in Windows with Arabic support.
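The two-step approach described here (transliterate, then apply a standard Soundex) can be sketched in Python. This is a sketch under assumptions. The actual mapping table inside ar_soundex.zip is not shown in the thread, so the one below reuses the `$transliteration` array from the PHP class. The standard American Soundex rules stand in for VFP's SOUNDEX(), and exact agreement with VFP on edge cases is not verified.

```python
# Sketch: transliterate Arabic to Latin, then apply American Soundex.
# Assumption: the table is the PHP $transliteration array; the zip's may differ.
TO_LATIN = str.maketrans("ابتثجحخدذرزسشصضطظعغفقكلمنهوي",
                         "ABTTJHKDZRZSSSDTZAGFQKLMNHWY")

# Standard American Soundex digit groups; vowels, H, W and Y have no digit.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """First letter plus three digits, using the American Soundex rules."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first, prev, digits = letters[0], CODES.get(letters[0], ""), []
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "HW":  # H and W do not separate equal codes; vowels do
            prev = code
    return (first + "".join(digits) + "000")[:4]


def arabic_soundex(word: str) -> str:
    # Letters missing from the table (e.g. ة, hamza forms) stay Arabic and are
    # then ignored by soundex(), which keeps only A-Z.
    return soundex(word.translate(TO_LATIN))


print(arabic_soundex("خالد"))  # K430
print(arabic_soundex("محمد"))  # M300
```

The two approaches can disagree. This sketch gives M300 for محمد because the transliterated H carries no digit in English Soundex. Following the PHP class's logic, ح falls into group 0 while the second م still codes as 5, giving M530. Which behavior suits Arabic names better would need testing on real data.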

Yahya
 
 http://files.engineering.com/getfile.aspx?folder=59461917-a010-4813-9e59-f0d12c881438&file=ar_soundex.zip
