Being focused on different code pages (that was what I had read here), I decided to make a separate transliteration table for each country. So my collection now contains albania, belarus, bulgarije, cyprus-turkish, czech-republic, estonia, greece, hungary, kaliningrad, kazachstan, kosovo, kyrgyzstan, latvia, lithuania, macedonia, moldavie, mongolia, poland, romania, russia, serbia, slowakia, tajikstan, transnistria, turkey, turkmenistan, ukraina, uzbekistan. Work on Thailand and China is in progress.
Now the problem is to pick the right transliteration table at the right moment: that is, given a lat,lon, find out which country it lies in and then use that country's table.
It happened that I had just programmed an algorithm to find out whether a given lat,lon lies inside an area, provided you have, for example, a .gpx file with a track along the border of that area. So there were no big problems implementing it all; it just had to be done. I needed .gpx files for a lot of countries. As I did not know where to get them (I now know that Geofabrik has them for a lot of countries, but also that they are not always precise), the solution was to first write a boundary2track program that, given the id of a boundary relation, downloads all data for that relation and makes a gpx file out of the border/boundary.
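As an illustration only (this is not the actual translit code), the "is this lat,lon inside the area" test can be sketched in Python with the classic ray casting method. The function name point_in_area and the representation of the border as a list of (lat, lon) tuples are assumptions made for this example. The sketch treats lat,lon as flat coordinates and does not handle areas that cross the 180° meridian.

```python
def point_in_area(lat, lon, border):
    """Return True if (lat, lon) lies inside the closed track `border`.

    `border` is a list of (lat, lon) points along the boundary; the last
    point is implicitly connected back to the first one.
    """
    inside = False
    n = len(border)
    for i in range(n):
        lat1, lon1 = border[i]
        lat2, lon2 = border[(i + 1) % n]
        # Only edges that straddle the latitude of the point can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            # Count crossings on one side of the point; odd means inside.
            if lon < cross_lon:
                inside = not inside
    return inside


# Hypothetical square "country" used only to try the function out.
square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
print(point_in_area(5.0, 5.0, square))   # True
print(point_in_area(15.0, 5.0, square))  # False
```

A real border track has many thousands of points, but the test stays a single pass over the track.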
The gpx files for the areas/countries mentioned above were made with boundary2track. The transliteration tables were made in the meantime, character by character, by hand. Every country has two files. For instance:
russia.frontier.gpx
russia.transliterationtable.txt
All such files are placed in an Areas directory. At startup the transliteration manager module of translit looks in that directory and creates an areatransliterator instance for every pair of files.
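The pairing of files at startup could look roughly like the Python sketch below. It is a guess at the idea, not the real transliteration manager: the function name find_area_pairs is hypothetical, and the two file suffixes are assumed from the example file names above.

```python
from pathlib import Path

# Assumed suffixes, taken from the example file names for russia.
FRONTIER_SUFFIX = ".frontier.gpx"
TABLE_SUFFIX = ".transliterationtable.txt"


def find_area_pairs(areas_dir):
    """Map area name -> (gpx path, table path) for every complete pair."""
    areas_dir = Path(areas_dir)
    pairs = {}
    for gpx in sorted(areas_dir.glob("*" + FRONTIER_SUFFIX)):
        # "russia.frontier.gpx" -> "russia"
        name = gpx.name[: -len(FRONTIER_SUFFIX)]
        table = areas_dir / (name + TABLE_SUFFIX)
        # An area without a transliteration table is skipped.
        if table.is_file():
            pairs[name] = (gpx, table)
    return pairs
```

With this approach, adding a country means dropping its two files into the directory; nothing else has to be configured.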
When translit reads the osm data and sees a node (at the moment only nodes are transliterated), it extracts the lat,lon values and asks the transliteration manager which areas the node is in. If it is not in any area, translit is done with that node and outputs it unchanged. (It also does not inspect nodes that consist of only one line.) Otherwise it checks whether there is a place tag and a name tag, and no int_name or name:en tag yet. If a transliteration is needed, it invokes the right areatransliterator. Depending on the result, a tag is added and the changed node is written to the output.
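The decision steps for a single node can be sketched in Python as follows. This is a simplified illustration with several assumptions: the node's tags are a plain dict, the table is a character-to-string dict, the first matching area is used, and the result is written to an int_name tag (the text above does not say which tag is added). The function names are hypothetical.

```python
def transliterate(name, table):
    """Replace each character via `table`; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in name)


def process_node(tags, areas_containing_node, tables):
    """Return the tags to write out for one node.

    tags: dict of the node's tags (empty for a node without tags)
    areas_containing_node: names of the areas the node's lat,lon lies in
    tables: area name -> character mapping dict
    """
    # Nodes without tags, or outside every area, are output unchanged.
    if not tags or not areas_containing_node:
        return tags
    # Only named places are transliterated.
    if "place" not in tags or "name" not in tags:
        return tags
    # An existing int_name or name:en is left alone.
    if "int_name" in tags or "name:en" in tags:
        return tags
    table = tables[areas_containing_node[0]]  # assumption: first area wins
    result = transliterate(tags["name"], table)
    if result == tags["name"]:
        return tags  # nothing changed, so no tag is added
    return {**tags, "int_name": result}  # assumption: int_name is the tag


# Tiny hypothetical table, just enough for one name.
tables = {"russia": {"М": "M", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a"}}
node = {"place": "city", "name": "Москва"}
print(process_node(node, ["russia"], tables))
```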
At first I feared that adding more countries (by adding their respective files to the Areas directory) would affect the processing time. But if it does, the effect is very minor. Four or twenty-five countries: it does not matter.