Yesterday I quickly drew a rough 'frontier' line for the Russian Federation. That left 44,666 bboxes of size 0.25 to be retrieved. After every call and inspection of the result the program pauses for a second. That alone accounts for more than twelve hours. This morning about one third was done.
Result: 7131 places already have an English translation; 14787 still need to be translated/transliterated. (Two thirds still have to be done.)
I now save the node id in the to-do list. On the next run I can use it to retrieve the node directly.
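As a quick check of that estimate, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration, and it only counts the one-second pauses, not the time the calls themselves take.

```python
def pause_hours(n_calls, pause_seconds=1.0):
    """Total time spent only on the pauses between calls, in hours."""
    return n_calls * pause_seconds / 3600.0

# 44,666 bboxes with a one-second pause after each call
print(round(pause_hours(44666), 1))  # 12.4
```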
I’ve consulted the IRC channel on this issue. They told me that most renderers internally make the same conversion that you want to do for Mkgmap, but without uploading the results to the database. So the recommended solution is to fix this in Mkgmap instead of in the OSM database.
From how I read that thread: it’s difficult to determine automatically which transliteration table you should use.
So ultimately I wish we could automatically transliterate from Japanese, Indian, Malayan, Greek, Russian, Chinese, etc. to English, so that a readable map for the whole world would be possible… Dunno if this is possible at all; my searches all lead to transliteration from a specified language to English rather than to language detection…
I found this page, which seems to do a pretty solid job and uses Java, although I cannot find any source code. Perhaps a good starting point for Mkgmap? Or this Python solution.
Google Translate does a fair job. I entered “น้ำไหลลงเขา” and told it to detect the language and translate to English. It came up with “Water flows downhill.” Google detected Thai and translated correctly.
My program to add a missing name:en or name:trans:en or whatever tag works. As a test I let it handle fewer than ten nodes in one changeset. All worked OK.
Now, to be sure that only places inside a selected country are handled, I need a borderline around the area.
Hmmm… that can be found in OSM data. So I wrote a module that collects the ways representing the border: starting with a relation id, it recursively retrieves other relations, ways and nodes, and then builds a .gpx file. That took quite a while on 60189 (Russian Federation), so I tested on smaller countries like 102879 (Austria) and 161033 (Mongolia).
For Austria I got a lot of ways, and they ‘lie on the frontier’. OK so far. But the ways are in random order: the next way does not start where the previous one stopped. To make things worse, the direction of the ways is not consistent. Most run from east to west (for the Germany/Austria frontier line). I need a closed curve around a country to determine whether a “lat,lon” is inside that country. So I am now working on sorting the ways and reversing them where needed.
Until now I have not looked at mkgmap because all this takes time. It is nice to see that others react. I will study all the links later.
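For illustration, the sorting and the inside test could look roughly like the Python sketch below. This is not the actual module. It assumes each way is already a list of (lat, lon) points, that the border forms one single closed ring (no exclaves or islands), and that it does not cross the 180° meridian, which is not true for the Russian Federation. The function names are made up.

```python
def stitch_ways(ways):
    """Join ways end to end into one closed ring.

    A way is reversed when its last point, rather than its first,
    matches the current end of the ring.
    """
    remaining = [list(w) for w in ways]
    ring = remaining.pop(0)
    while remaining:
        for i, way in enumerate(remaining):
            if way[0] == ring[-1]:
                ring.extend(way[1:])              # already in the right direction
            elif way[-1] == ring[-1]:
                ring.extend(reversed(way[:-1]))   # wrong direction: reverse it
            else:
                continue
            remaining.pop(i)
            break
        else:
            raise ValueError("gap in border after %r" % (ring[-1],))
    if ring[0] != ring[-1]:
        raise ValueError("border is not closed")
    return ring


def point_in_ring(lat, lon, ring):
    """Ray-casting test: count how many edges a ray from the point crosses."""
    inside = False
    for (lat1, lon1), (lat2, lon2) in zip(ring, ring[1:]):
        if (lat1 > lat) != (lat2 > lat):          # edge spans the point's latitude
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


# Four ways of a square, in mixed order and direction
ways = [
    [(0, 0), (0, 2)],
    [(2, 2), (0, 2)],   # reversed relative to the first way
    [(2, 2), (2, 0)],
    [(0, 0), (2, 0)],   # reversed as well
]
ring = stitch_ways(ways)
print(point_in_ring(1, 1, ring))  # True
print(point_in_ring(3, 1, ring))  # False
```

The search for the next way is a simple linear scan, which is fine for a few hundred ways. For a very long border, an index from end point to way would be faster.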
There is another solution if mkgmap cannot handle the transliteration. From what I read, mkgmap works on raw OSM data, i.e. the data files in XML format. My program (well, another version of it) could add the missing tags to those XML files.
I posted a question about this on the Mkgmap mailing list but have not had any response so far.
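A minimal sketch of that preprocessing idea in Python, using only the standard library. The `transliterate` argument is a placeholder for whatever conversion is used, and the tag key is a parameter because the choice between name:en and name:trans:en is still open. The function name is made up, and a real planet file would be far too large to parse in one go like this.

```python
import xml.etree.ElementTree as ET


def add_missing_names(osm_xml, transliterate, key="name:en"):
    """Add a `key` tag to every node that has a name but no `key` tag yet.

    Returns the modified XML as a string and the number of tags added.
    """
    root = ET.fromstring(osm_xml)
    added = 0
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if "name" in tags and key not in tags:
            ET.SubElement(node, "tag", k=key, v=transliterate(tags["name"]))
            added += 1
    return ET.tostring(root, encoding="unicode"), added


sample = (
    '<osm version="0.6">'
    '<node id="1" lat="55.75" lon="37.62"><tag k="name" v="Москва"/></node>'
    '<node id="2" lat="48.21" lon="16.37">'
    '<tag k="name" v="Wien"/><tag k="name:en" v="Vienna"/></node>'
    '</osm>'
)
# Stand-in transliteration, just for the demonstration
new_xml, count = add_missing_names(sample, lambda name: "Moskva")
print(count)  # 1
```

Node 2 is left alone because it already carries a name:en tag, so existing translations are never overwritten.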
So it indeed seems like there are two options: upload the transliterated tags with name:trans:en or preprocess the data each time a new update is performed. But I’m afraid that preprocessing the entire planet file on each update will take a very long time.
Besides that, you are only transliterating Cyrillic languages; there are so many more languages that need transliteration. Maybe this needs to be discussed on the main OpenStreetMap mailing list…
I just noticed the date changed to 14/10/09 so I tried a download again. All the roads I added are still not showing. Maybe I added the tags after 14/10/09.
The easiest way to find out is to see if the changes are visible in Potlatch…if they are, then there’s a problem on Lambertus’ end. If they’re not, then the problem is at Skywoolf/reinholdM’s end.
Yes, the first report of missing ways could be the result of bad timing, but those ways should definitely be in the map by now. So if they aren’t, then there’s a problem somewhere…
It may be that you got Steve wrong. I think he means that you can do a transliteration from e.g. Cyrillic to Latin without knowing the language, but to do a thorough job, you would transliterate a bit differently depending on whether you have e.g. Russian or Bulgarian (both languages using Cyrillic script). The links you provided don’t seem to try to figure out the language; they would just detect whether something is Cyrillic.
I think it would be a start to complete mkgmap’s transliteration tables with the Perl script Avar provided and see if the resulting maps work for people (even if not perfect). I guess an awkward transliteration is still way, way better than just seeing question marks…
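To make the distinction concrete, here is a small Python sketch. The script can be detected from the characters alone, while the language has to be supplied from somewhere else. The table is a simplified illustration, not mkgmap's table and not a complete standard. The two conventions it relies on are that щ is usually romanized "shch" for Russian and "sht" for Bulgarian, and that ъ is dropped for Russian but written "a" for Bulgarian.

```python
import unicodedata

# Simplified, illustrative base table (roughly Russian-style romanization)
BASE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}
# Letters that are romanized differently when the language is known
OVERRIDES = {"bg": {"щ": "sht", "ъ": "a", "х": "h"}}


def uses_cyrillic(text):
    """Script detection only: true if any character is a Cyrillic letter."""
    return any("CYRILLIC" in unicodedata.name(ch, "") for ch in text)


def transliterate(text, lang=None):
    """Transliterate Cyrillic letters; `lang` selects optional overrides."""
    table = dict(BASE, **OVERRIDES.get(lang, {}))
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)                      # not in the table: keep as-is
        else:
            out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


print(uses_cyrillic("Борщ"))             # True
print(transliterate("Борщ"))             # Borshch
print(transliterate("Борщ", lang="bg"))  # Borsht
```

Without a language hint the result is still readable, which is the point about an awkward transliteration beating question marks.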
Pretty much nothing that I’ve done since the beginning of September shows up in the map that I downloaded from Lambertus’ site today. It’s mostly in these areas: