Localizing your site
J. Scott Johnson on 2002 April 06
    

One of the topics that everyone likes to ignore is localization. By localization, I mean the process of making a website or software product usable by users in other nations (or simply with a different language than yours). Everyone ignores this until they absolutely have to do it -- and then it's always something like this:





    Horribly difficult 
    Very expensive 
    Time consuming 
    Just plain awful 



In my career, I've seen this repeatedly. I could tell you tales about getting software into German, the arcane nature of sorting and word breaks in Polish or the (far worse) issues with Japanese. But, we don't want to get lost in the details. Since we're starting at the beginning, we're going to do just that -- hit the basics first.

The Basics of Localization
There are really three aspects to localization of a website:



    User Interface 
    Data Handling 
    Sorting, strangely referred to as "Collation Sequences" 

This article will only cover the user interface. This is the only aspect of localization that can really be considered easy and it's also what most people think of when this topic comes up.

What does a "Localized" User Interface mean?
When I think of a localized user interface, it's really simple when you come right down to it -- I want it to speak my language, not English. If I'm Spanish and I come to a website, I want to see "Inicio", not "Home", when I'm looking for the link to the home page. I may not see the content of the website in my language but I ought to at least be able to use the website easily.

Get Rid of those Strings! Or Out Damned String!
At its core, localization basically boils down to eliminating every bit of "literal" text from every single web page -- that is, text that is used as part of the user interface for your site. I bet this sounds, well, scary. Yes. It is. The longer you take to do it, the scarier it is. It's a lot better to start at the beginning of the project rather than after you're done.

Overview
If you think about it, here's what we have to do to get rid of strings:



    1. Store the strings somewhere that lets them be translated. 
    2. Replace the strings in our web pages with something that represents them. 
    3. Figure out what language the user wants and save that for the future. 
    4. Dynamically insert the translated strings into our web pages. 





Step 1: Store the Strings
If we're going to store the strings, we need somewhere to store them. There are several ways to do this but we're going to use MySQL. As you've probably learned, MySQL is a wonderful database for use with PHP and it makes database applications fast and easy. That's good.

We need to start by thinking about what a "string" means. It could be:



    Text of a link 
    Text on a command button 
    Field labels 
    Instructional text 
    Something else 

Hmm... As we think about this, it's pretty clear that there are different "types" of strings that we need to name. By naming them consistently, we make our application more maintainable and easier to develop. We're going to use these types of names:



    linktext - Text of a link 
    buttontext - Text on a command button 
    fieldlabel - Field labels 
    introtext - Instructional text 



What I'm doing is prefixing (or "putting before") the actual name with how it's used.

NOTE: Naming is one of those skills that the more you program, the better you get at it. Personally I tend toward long or "verbose" names. This is a preference but the more descriptive you are, the easier it can be to fix later on.

If you think about your website, there are probably lots of different pages but only a few different "types" of pages. For example you might have:

Add Pages -- submit an article
Comment Pages -- comment on an article
View Pages -- view an article
The reason the type of page is important is that your strings have what software people call a "context". This just means "where the string is used". We want to store both the string name or what we'll call the "key" in our database as well as the context. We will, of course, also need the language as well as the string itself.



This little exercise has led us to the structure of the database table that will hold our strings. Here is the SQL code for it:




    create table faq_strings ( 
      string_id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 
      language CHAR(2) NOT NULL,
      stringkey CHAR(25) NOT NULL,
      context CHAR(10),
      date DATETIME,
      string CHAR(254) NOT NULL 
    );




If you already know SQL then this makes sense to you already and you can feel free to move on to the next section. If it doesn't then this explanation will help.

A relational database like MySQL can be simplified if you think about it as "Many spreadsheets, each with a different number of columns". It's a simple yet powerful idea. A "spreadsheet" is called a table. An entry in the database, like a spreadsheet, is called a row. Columns are, like spreadsheets, just columns. Every value in the database is stored in a column (and usually only one column although that's a different article). SQL or "Structured Query Language" is a tool for querying the database, manipulating the database, creating tables and more. Here's the above SQL line by line (with each line annotated with PHP style comments):





    // create the table.  this string table was built 
    // for an FAQ application so it's named faq_strings
    create table faq_strings ( 
    // assign an "id" or unique # to each string
    // not essential but good practice
    // This says: An integer, that isn't null, it's unique 
    // or "primary key" and automatically is increased by
    // the database as new entries are added
      string_id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 
    // the language for the string, it's 2 characters long
    // and is required
      language CHAR(2) NOT NULL,
    // the unique "name" for the string, again required,
    // 25 characters wide
      stringkey CHAR(25) NOT NULL,
    // where the string is used
      context CHAR(10),
    // when the string was added to the db.  
    // very useful if you let your users help with the
    // translation -- you can sort the db by new entries
      date DATETIME,
    // The string itself
      string CHAR(254) NOT NULL 
    )



Since I don't know how your copy of MySQL is controlled, I can't help you create the table. Use your standard MySQL tools for this and DO NOT use the version with the comments -- it won't work. Use the version above.



Step 2: Add Some Strings to the Database
To test this approach to localization, we'll need at least a few strings in our database. Here are the SQL insert commands for that:



    insert into faq_strings (language, context, stringkey, string) VALUES
    ('EN', 'addpage', 'fieldlabel-question', 'Question: ');

    insert into faq_strings (language, context, stringkey, string) VALUES
    ('EN', 'addpage', 'buttontext-save', 'Save !');




Step 3: Replace the Strings in our Web Pages with Code
Now that we've created where we're going to store our strings and added some sample strings, we can insert the PHP code for them into our web pages. For example, we're going to pretend that we have an HTML table with 2 columns for adding a database entry. Before we add the PHP, the page contains two pieces of literal text, which correspond to these string keys:




    1. fieldlabel-question 
    2. buttontext-save 



This makes our new web page with PHP code inserted the following:




<TABLE>
<TR>
<TD><?php print $strings['fieldlabel-question']; ?>
</TD>
<TD><INPUT TYPE="text" NAME="question" SIZE="25">
</TD>
</TR>
<TR>
<TD COLSPAN="2"><INPUT TYPE="submit" VALUE="<?php print $strings['buttontext-save']; ?>">
</TD>
</TR>
</TABLE>




So, what does this do? The <?php and ?> tags tell PHP to treat what's between them as code to execute. The print statements output the value that we tell them to, but what exactly are we telling them? A very basic programming tool is called an array. An array is a way of storing data that you want to use in a program. There are many different types of arrays but the one that we are using is called an "associative array". With an associative array, a value such as "Question: " is paired with a "key" or unique identifier used to access that value. The [ and ] tell PHP which key of the $strings array to use (and that $strings is an array).
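To make the idea concrete, here is a minimal sketch of an associative array holding the two sample strings from Step 2. The array is typed in by hand purely for illustration; in the approach described here it is filled from the database instead.

```php
<?php
// Hand-built for illustration only: in the article's approach this
// array is loaded from the faq_strings table, not typed in.
$strings = array(
    'fieldlabel-question' => 'Question: ',
    'buttontext-save'     => 'Save !',
);

// The key inside [ and ] selects which value comes back.
print $strings['fieldlabel-question'];   // outputs the field label text
print "\n";
print $strings['buttontext-save'];       // outputs the button text
print "\n";
```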



At this point, I bet you're thinking "He thinks he's so smart, he forgot about the string context". Actually I didn't. Since we're only handling a page at a time, we only have to have an array that contains all strings for that context -- not every single string.

Step 4: Get the User's Language
If we're going to display a localized interface, we have to know what language to use. Somehow, we have to prompt the user for this. What I'd do is probably have a popup window when the user gets to my site for the first time asking them what their preferred language is and then store that in a cookie. Here's the (approximate) code:


<?php
//  insert in home page 
if (empty($_COOKIE['ck_language'])) { 
    //  display some kind of page where the language is captured
    //  it's a simple html form so it's not shown here. The 
    //  input object should be named "language" and contain the
    //  2 character internet language code
    //  At the end of that form, call the setcookie routine like this
    //  (before the page sends any output, since cookies go in the headers;
    //  this assumes the form is submitted with POST):
    if (isset($_POST['language'])) {
        setcookie("ck_language", $_POST['language']);
        //  And the cookie will be set!
    }
}
else { 
    // display your home page, calling the localization routines
    //   of course!
}

?>
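Because the language code comes from the user and later ends up in a database query, it may be worth checking it before it goes into the cookie. The following is a minimal sketch of that idea, not part of the original code: the helper name pick_language, the list of supported languages and the default of 'EN' are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of the original article: returns a
// language code that is safe to store in the cookie and use in a query.
function pick_language($requested, $supported, $default)
{
    // Normalize to the upper-case form used in the sample INSERTs ('EN').
    $code = strtoupper(trim((string) $requested));

    // Accept only codes we actually have translations for.
    if (in_array($code, $supported, true)) {
        return $code;
    }
    return $default;
}

// Assumed list of languages that have been translated.
$supported = array('EN', 'ES', 'DE');
```

With something like this in place, the setcookie call could pass pick_language($_POST['language'], $supported, 'EN') instead of the raw form value, so an unexpected code falls back to English.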


Step 5: Insert the Translated Strings or "A Stupid Localization Trick with PHP":
Here's where we bring it all together. These are the steps:



    Set the language and context variables 
    Load the strings from the database 
    Insert them into the page 

Here's the code:





<?php

//load our include file which sets our 4 required
//db variables: dbhost, dbuser, dbpassword, db
include "zcommon.php";

// Note: the mysql_* functions need PHP 5 or older;
// PHP 7 and later offer mysqli and PDO instead.

// Connecting, selecting database
$link = mysql_connect($dbhost, $dbuser, $dbpassword)
    or die(
        "System level error:
        Could not connect to database:
        at load strings routine: email $sysadminemail"
    );

mysql_select_db($db)
    or die(
        "System level error:
        Could not select database:
        at load strings routine: email $sysadminemail"
    );

// Set our variables
// (the cookie comes from the browser, so escape it before it goes into SQL)
$language = mysql_real_escape_string($_COOKIE['ck_language']);
$context = "addpage";

$query = "
    SELECT
        stringkey, string 
    FROM 
        faq_strings 
    WHERE 
        context='$context' and 
        language='$language'
    ";

$result = mysql_query($query)
    or die(
        "System level error:
        Could not run query:
        at load strings routine: email $sysadminemail"
    );

// IMPORTANT!!!
// HERE'S THE MAGIC LITTLE BIT: 
$strings = array();
while ($row = mysql_fetch_array($result))
{
    $stringkey = $row["stringkey"];
    $string = $row["string"];
    $strings[$stringkey] = $string;
}

?>




If you've done any database programming in PHP at all, the above code made sense. It's actually pretty simple. First we include some variables from a common file with include. Then we do our basic setup and connect to our database. Next we define two variables that we'll use in our SQL query. One of these we pulled from our cookie and the other we set on a page by page basis. Another way to handle this is to define an additional array in the include file which maps page names to contexts (that's a bit advanced but makes it easier). After this is the database query which loads only the correct strings. The real magic starts after this.
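The page-to-context map mentioned above could look something like the following sketch. The file names, the extra context names and the helper context_for_page are all hypothetical; only 'addpage' appears in the sample data. Note that the context column is only 10 characters wide, so the names have to stay short.

```php
<?php
// Hypothetical page-to-context map; the file names are made up.
// Context names must fit the 10-character context column.
$page_contexts = array(
    'add.php'     => 'addpage',
    'comment.php' => 'commentpg',
    'view.php'    => 'viewpage',
);

// Hypothetical helper: look up the context for a script path,
// falling back to a default when the page is not in the map.
function context_for_page($script, $page_contexts, $fallback = 'addpage')
{
    $page = basename($script);
    return isset($page_contexts[$page]) ? $page_contexts[$page] : $fallback;
}
```

Each page could then set its context with $context = context_for_page($_SERVER['PHP_SELF'], $page_contexts); rather than hard-coding the name.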

A while loop is a programming construct which keeps executing as long as a condition remains true. In this case, it will loop as long as there is data that came back from the database. What happens within the loop is the following:



    1. Create a variable named stringkey that holds the unique key for the string 
    2. Create a variable named string that has the text for the string 
    3. Add to an array named strings the "name-value" pair for the string. 

A name-value pair is just a simple programming idea that a name can represent a value, so together they are a "name-value" pair. After the while loop finishes, the contents of the array can be inserted into your page with the print statement illustrated above.



That's it! You now have localization.
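For readers on a current PHP version, the same load-the-strings idea can be sketched with PDO and a prepared statement. This is an illustration under stated assumptions rather than the code used above: an in-memory SQLite database stands in for MySQL so the example runs without a server (it needs the pdo_sqlite extension), the table is a simplified copy of faq_strings, and the helper name load_strings is made up.

```php
<?php
// Sketch only: PDO with an in-memory SQLite database stands in for the
// article's MySQL connection so the example runs without a server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Simplified copy of faq_strings with the two sample rows from Step 2.
$db->exec("CREATE TABLE faq_strings (
    string_id INTEGER PRIMARY KEY AUTOINCREMENT,
    language  CHAR(2)   NOT NULL,
    stringkey CHAR(25)  NOT NULL,
    context   CHAR(10),
    string    CHAR(254) NOT NULL
)");
$db->exec("INSERT INTO faq_strings (language, context, stringkey, string)
           VALUES ('EN', 'addpage', 'fieldlabel-question', 'Question: ')");
$db->exec("INSERT INTO faq_strings (language, context, stringkey, string)
           VALUES ('EN', 'addpage', 'buttontext-save', 'Save !')");

// Hypothetical helper: load every string for one language and context.
function load_strings(PDO $db, $language, $context)
{
    // Placeholders keep the cookie value out of the SQL text itself.
    $stmt = $db->prepare(
        'SELECT stringkey, string FROM faq_strings
         WHERE context = :context AND language = :language'
    );
    $stmt->execute(array(':context' => $context, ':language' => $language));

    // Same loop as before: build the key => text array row by row.
    $strings = array();
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        $strings[$row['stringkey']] = $row['string'];
    }
    return $strings;
}

$strings = load_strings($db, 'EN', 'addpage');
print $strings['buttontext-save'];
```

The prepared statement is a design choice worth noting: since the language code arrives in a cookie, binding it as a parameter avoids having to escape it by hand.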

What About Formatting and Editing?
One of the remaining questions that you probably have is how to handle the formatting, things like bold facing field labels and such. The real question is whether that should go into the database or not. My opinion is that it should not. I'd isolate formatting from the content so that you can upgrade the look and feel of your website without having to muck with the text strings. Once you have strings for more than one language, you'll see the value of this. Clearly, adding and editing strings for more than one language requires some kind of tool. If there's interest, I can go into how to build such a tool and even how to allow your users to translate for you! Email sjohnson@fuzzygroup.com if you're interested.
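On the formatting point, one way to keep markup out of the database is to wrap the stored text at output time. The sketch below assumes a made-up helper named format_label and a hand-built $strings array standing in for the one loaded from the database.

```php
<?php
// Hypothetical helper: the markup lives here, the text stays in the database.
function format_label($text)
{
    // Escape the stored text so a translation cannot inject markup.
    return '<B>' . htmlspecialchars($text) . '</B>';
}

// Stand-in for the array loaded from the database.
$strings = array('fieldlabel-question' => 'Question: ');
print format_label($strings['fieldlabel-question']);
```

Changing the look of every field label then means editing one function, with no translated strings touched.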

Performance
This approach could be criticized for introducing a database query for every page view. That's correct -- it does. There are certainly other ways to handle localization (such as pre-building pages). However, these other techniques can still use this approach. What you've learned is a good general technique that can be easily modified as your site scales up. Given that every site starts with little to no traffic, I prefer fast development and then addressing performance as needed.
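If the per-page query does become a problem, one possible modification is a small file cache in front of it. The following is only a sketch under assumptions: the helper name cached_strings, the five-minute lifetime and the use of JSON files in the system temp directory are all choices made for illustration, and $loader stands for whatever function runs the database query.

```php
<?php
// Hypothetical helper: return cached strings if a fresh cache file exists,
// otherwise call $loader (which would run the database query) and cache it.
function cached_strings($language, $context, $loader, $max_age = 300)
{
    // Hash the pair so the file name stays safe whatever the input is.
    $file = sys_get_temp_dir() . '/strings_' . md5($language . '|' . $context) . '.json';

    // Use the cache file only while it is younger than $max_age seconds.
    if (is_file($file) && (time() - filemtime($file)) < $max_age) {
        $cached = json_decode(file_get_contents($file), true);
        if (is_array($cached)) {
            return $cached;
        }
    }

    // Cache miss: load from the database and remember the result.
    $strings = call_user_func($loader, $language, $context);
    file_put_contents($file, json_encode($strings), LOCK_EX);
    return $strings;
}
```

The trade-off is that an edited translation can take up to $max_age seconds to show up on the site.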

Conclusion and Disclaimer
If all you want to do for your localization is handle the user interface, and that's always the first step, you've just learned a valuable approach. I should end by stating that localization is a pretty complex topic. I've grossly oversimplified it in places but this is the core of what you need to know.

About the Author
Scott Johnson is a high tech veteran, having founded his first software company at 19 (in 1987, long before the "dot com"). That company, NTERGAID, made and shipped hypertext tools before the web was even conceived. After running NTERGAID successfully, Scott sold the company to Dataware, where he led Enterprise Knowledge Management products. After Dataware, Scott moved to Mascot Network, where he was in charge of Product Management and Product Marketing. He's now available for consulting work.