
Advice on developing a MultiLingual Site 2

Status
Not open for further replies.

Itshim

Programmer
Apr 6, 2004
277
US
I am currently developing my first multilingual site (what a pain in the A$$).

I have done research on the gettext extension and I've read that every time a new translation is made the server will need to be rebooted so the config file can be loaded again. (If this is not true, please let me know.)

Anyway, because of this I am forced to come up with my own translation code. What I decided on was storing each language in its own file, then using parse_ini_file() to import the text into my scripts.
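
For readers following along, here is a minimal sketch of the ini-file approach described above. The keys and strings are made up for illustration, and the file is written to a temporary location only so the snippet runs on its own; a real site would keep something like one ini file per language.

```php
<?php
// Hypothetical language file contents (keys and strings are assumptions).
$ini_text = "greeting = \"Welcome\"\nfarewell = \"Goodbye\"\n";

// Write the sample to a temporary file so the example is self-contained.
$ini_path = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($ini_path, $ini_text);

// parse_ini_file() returns an associative array of key => string.
$text = parse_ini_file($ini_path);
unlink($ini_path);

print $text['greeting'] . "\n";
```

One thing to watch with this approach: the PHP manual lists a few reserved words (null, yes, no, true, false, on, off, none) that must not be used as ini keys, so translation keys need to avoid them.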

This option has been working great, but as I get deeper into the project each language file seems to be growing very quickly. I'm figuring each file (when finished) will be around 1 MB or larger, so I was wondering if it would be more efficient to store the text in a database and import it with a select statement.

The website uses a MySQL back-end, and I would estimate that over two thirds of the pages connect to the database. More often than not the db connection already exists, so we don't have to factor in connection overhead, but I would also like to factor maintenance time into this equation.

If anyone has had previous experience with a similar situation, or can make an educated guess, any advice would be appreciated.

Thanks,
Itshim
 
i've used both solutions with no problems. i don't find the processing overhead too troublesome with either solution but i'd be very surprised to see a 1mb language file.

some things to bear in mind in choosing:

1. ISPs often limit the database to a much lower size than the filesystem. My own, for example, allows me 2gb of hosting and 100mb of database. this leads me to store images and cached thumbnails etc in the filesystem rather than the database.

2. when you do a bulk language insert (a new translation) it will be cumbersome to load 1mb of language strings into a database via a webpage. thus you might consider allowing at least the creation of the language file through an ini-type text file (it can be done offline, for example). and once you've done this, why bother shoving it into a database?

3. isn't it possible to break down your language files a wee bit so that the entire language file need not be processed for each page? does your website have a sectioned metaphor that might allow you to have a number of sets of language files that would give you a better tradeoff between ease of maintenance and parse-time?
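
Point 3 could be sketched roughly as follows. The directory layout (one folder per language, one ini file per site section), the function name load_language_sections() and the sample strings are all assumptions for illustration, not something from the original site.

```php
<?php
// Load only the strings a page needs: a shared file plus per-section files.
// Assumed layout: <dir>/<language>/<section>.ini
function load_language_sections($dir, $language, array $sections)
{
    $text = array();
    foreach ($sections as $section) {
        $path = $dir . '/' . $language . '/' . $section . '.ini';
        if (is_file($path)) {
            $parsed = parse_ini_file($path);
            if ($parsed !== false) {
                // Later sections override earlier ones on duplicate keys.
                $text = array_merge($text, $parsed);
            }
        }
    }
    return $text;
}

// Build a tiny sample tree in the temp directory so the sketch runs as-is.
$dir = sys_get_temp_dir() . '/lang_demo_' . uniqid();
mkdir($dir . '/en', 0777, true);
file_put_contents($dir . '/en/common.ini', "greeting = \"Welcome\"\n");
file_put_contents($dir . '/en/checkout.ini', "pay_now = \"Pay now\"\n");
file_put_contents($dir . '/en/account.ini', "sign_out = \"Sign out\"\n");

// A checkout page parses two small files instead of one large one.
$text = load_language_sections($dir, 'en', array('common', 'checkout'));
print $text['pay_now'] . "\n";

// Remove the sample files again.
array_map('unlink', glob($dir . '/en/*.ini'));
rmdir($dir . '/en');
rmdir($dir);
```

Whether this actually beats a single file per language depends on how large the files get, so it is worth timing both before committing to a layout.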

Justin
 
I've never had to write an internationalized site from scratch, but I have worked with one.

I've set up the OSCommerce online sales package several times and customized it for customers.

OSCommerce doesn't use gettext. Instead, all text that is to be displayed is set up in constants using define(). All the defines for a particular language are in the same file. For example, the file en.inc might read:

Code:
<?php
// The constant name must be a quoted string.
define('GREETING', 'Welcome');
?>

and the file de.inc might read:

Code:
<?php
// Same constant name, German text.
define('GREETING', 'Willkommen');
?>

Once the script determines the language to use from the $_SERVER['HTTP_ACCEPT_LANGUAGE'] variable, it simply includes the appropriate language file. All print statements from that point on reference the constants. For example, to print the greeting:

Code:
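<?php
// Take the first (most preferred) entry from the browser's Accept-Language header.
$accept = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : 'en';
$language_array = explode(',', $accept, 2);

// The header is user-supplied, so only allow plain tags such as "en" or "en-us" in the path.
$language = preg_match('/^[A-Za-z0-9-]+$/', $language_array[0]) ? $language_array[0] : 'en';
$language_file = 'languages/' . $language . '.inc';

if (file_exists ($language_file))
{
	include ($language_file);
}
else
{
	// Fall back to English when no matching language file exists.
	include ('languages/en.inc');
}

print '<html><body>';

print GREETING;

print '</body></html>';
?>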

Want the best answers? Ask the best questions!

TANSTAAFL!!
 
Justin
i'd be very surprised to see a 1mb language file
You are probably correct that I estimated on the high side. It just seems that every time I print anything to the screen I have to add two lines to every translation file.

2. when you do a bulk language insert (a new translation)...
Very good point.

3. isn't it possible to break down your language files a wee bit...
Yes, when I originally started I had all languages in one file, and saw that it was going to get messy fast, hence I separated each language out into its own file. Now that I have reached this point I was debating the db option, thinking it would be easier to keep a db clean and organized than to maintain multiple files for each language.

sleipnir
I like this method best, specifically for ease of use and maintenance. I was originally going to use this setup, but I must also use Smarty, and I thought that since parse_ini_file() returns an associative array I could grab the translations and assign them to Smarty variables all in one shot. (Smarty variable name = key in translation array).

Since seeing your suggestion I did a little more research and found you can access PHP constants directly from within Smarty, using: {$smarty.const._CONSTANT_NAME}, meaning there is no need for the $smarty->assign() call at all. Now that is sweet, nothing like bumming lines you thought were mandatory.

So at this moment I feel my best option is to use files instead of the db, and to use constants instead of ini files. Then I will decide on a solid organizational structure for breaking up each language into multiple files, and finally run some tests to determine whether breaking up the one-to-one setup (1 language = 1 file) shows any real performance gains.

Thank you both for the suggestions.
Itshim
 
Sorry one last question...

I am going to use this code:
Code:
<?php
$language_array = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE'], 2);

$language_file = 'languages/' . $language_array[0] . '.inc';
?>
posted by sleipnir. I noticed that $_SERVER['HTTP_ACCEPT_LANGUAGE'] is obviously set by the browser. Does anyone know of a browser that sends a different string than the standard?

For example: Firefox sends 'en-us' for english, and XXBrowser sends 'english'.

Kinda like $_FILES['type'] where IE sends 'image/pjpeg' and Firefox sends 'image/jpeg'
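
For context on the question above: the HTTP specification defines Accept-Language as a comma-separated list of language tags, each with an optional quality weight (for example 'en-us,en;q=0.5'), so a script can parse it defensively rather than rely on the first entry. The following is only a sketch; the function name pick_language(), the list of supported languages and the 'en' default are assumptions.

```php
<?php
// Pick the best supported language from an Accept-Language header value.
function pick_language($header, array $supported, $default = 'en')
{
    $candidates = array();
    foreach (explode(',', $header) as $position => $item) {
        $parts = explode(';', trim($item));
        $tag = strtolower(trim($parts[0]));
        if ($tag === '') {
            continue;
        }
        // A missing q parameter means a weight of 1.
        $q = 1.0;
        if (isset($parts[1]) && preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $parts[1], $m)) {
            $q = (float) $m[1];
        }
        if ($q <= 0) {
            continue; // q=0 marks a language as not acceptable
        }
        $candidates[] = array($tag, $q, $position);
    }

    // Highest weight first; ties keep the order the browser sent.
    usort($candidates, function ($a, $b) {
        if ($a[1] != $b[1]) {
            return ($a[1] > $b[1]) ? -1 : 1;
        }
        return $a[2] - $b[2];
    });

    foreach ($candidates as $candidate) {
        $tag = $candidate[0];
        if (in_array($tag, $supported, true)) {
            return $tag;
        }
        // Fall back from a regional tag such as "en-us" to "en".
        $pieces = explode('-', $tag, 2);
        if (in_array($pieces[0], $supported, true)) {
            return $pieces[0];
        }
    }
    return $default;
}

print pick_language('de-DE,de;q=0.9,en;q=0.8', array('en', 'de')) . "\n";
```

Because the result is always either an entry from the supported list or the default, an unexpected value such as 'english' simply falls through to the default, and the header text never ends up in a file path.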

Thanks Again
 
Out of curiosity guys could you tell me what the language files do?
My first thought was that the textual content would be created for each language supported and you would just load the page content written in that language.
It sounds however as if you use some form of word matching to present a translation of the page in the specified language?

Just curious. I am not likely to have to create a multi-lingual site anytime soon but I like to know the why/what/how of things as it always seems to come in handy later on.
At the moment I am working on other forms of accessibility such as physical or visual impairment.

Thanks.


Stamp out, eliminate and abolish redundancy!
 
The method I've specified looks at the user's language preferences, as reported by the browser. It then provides all text in that language (if possible) through the use of constants that are defined as needed.

 
That's what I am not clear on. Are the constants defined by stating that this-english-word = this-other-language-word so that one word is replaced by its corresponding match in the required language?

Does this work well in translation or is it a bit jumpy, like when I use Babelfish to read something from another language site?


 
I'm sorry. My example code was vague on this point.

It's not a word-by-word translation. It's an utterance by utterance translation. If an internationalized e-commerce site has 11 paragraphs of uninterrupted text on its return policies page, then that text will be referenced as a single defined constant. And the script will automatically define the string to which that constant expands through file inclusions.

It may be necessary elsewhere to break text up into single words. But what is contained in each internationalized text block is usually determined entirely by how you need to break up your text.
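
To make the utterance-by-utterance idea concrete, here is a small sketch. The constant CART_SUMMARY and both strings are invented for illustration. Numbered sprintf() placeholders let each translation put the variable parts wherever its grammar needs them, instead of gluing single words together.

```php
<?php
// Hypothetical contents of en.inc: whole utterances, with numbered
// placeholders for the parts that vary at run time.
define('GREETING', 'Welcome');
define('CART_SUMMARY', 'You have %1$d items in your cart, %2$s.');

// A German file would define the same constant names. Its template is held
// in a variable here only because a constant cannot be defined twice.
$german_cart_summary = '%2$s, Sie haben %1$d Artikel im Warenkorb.';

// The calling code is identical whichever language file was included.
$message = sprintf(CART_SUMMARY, 3, 'Anna');
$german_message = sprintf($german_cart_summary, 3, 'Anna');

print $message . "\n";
print $german_message . "\n";
```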

 