OpenStreetMap Forum

The Free Wiki World Map

#1 2009-11-27 12:59:41

greencaps
Member
Registered: 2009-10-05
Posts: 423

Frontend transliterator: translit. A battle against the ??????'s

This summer I drove through Russia with OSM maps in my Garmin (60Cx), which showed many ??????'s. It was not nice, though the maps were still very usable. Home again, I decided to contribute to OSM in such a way that next year the maps will contain transliterated place names and road names. This is not only my problem: there are several posts here where people complain about the ????'s. My discussion started here: http://forum.openstreetmap.org/viewtopi … =2625&p=13

A summary: the URL http://api.openstreetmap.org/api/0.6/node/245888277 delivers the following data.

<node id="245888277" lat="57.400239" lon="83.9420288" version="3" changeset="1541349" user="KekcuHa" uid="30590" visible="true" timestamp="2009-06-17T04:24:33Z">
    <tag k="name" v="Жуково"/>
    <tag k="place" v="village"/>
    <tag k="cladr:name" v="Жуково"/>
    <tag k="created_by" v="Potlatch 0.7"/>
    <tag k="addr:postcode" v="636306"/>
    <tag k="addr:district" v="Кривошеинский район"/>
    <tag k="cladr:code" v="7001000000800"/>
    <tag k="addr:country" v="RU"/>
    <tag k="addr:region" v="Томская область"/>
    <tag k="is_in" v="Кривошеинский район Томской области"/>
    <tag k="name:trans" v="{Жуково}"/>
    <tag k="cladr:suffix" v="Село"/>
  </node>

If this Cyrillic data is presented to mkgmap with a name tag list like --name-tag-list=name:en,int_name,name, the result is ??????'s on the 60Cx. I have read here that you also have to specify a codepage for mkgmap. But even a Cyrillic codepage would not be the solution, as on that Garmin you would then see all kinds of funny symbols.
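To make the effect of the name tag list concrete, here is a minimal sketch of "use the first tag from the list that is present", which is how I understand the option to work. The function name pick_name is made up for this illustration and is not part of mkgmap.

```python
def pick_name(tags, name_tag_list):
    """Return the value of the first tag in name_tag_list that the element has."""
    for key in name_tag_list:
        value = tags.get(key)
        if value:
            return value
    return None  # no usable name tag at all


order = "name:en,int_name,name".split(",")

# The village as it is in OSM today: only the Cyrillic name is available.
tags = {"name": "Жуково", "place": "village"}
print(pick_name(tags, order) == tags["name"])

# With an added name:en tag, the Latin spelling is picked first.
tags["name:en"] = "Zukovo"
print(pick_name(tags, order))
```

So as long as no name:en or int_name exists, the Cyrillic name is the only candidate, and that is what ends up as question marks on the device.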

So a solution would be transliteration. (See my first link for what transliteration is.)

If mkgmap got the following data (the same node with a name:en tag added), the troubles would be over:
<node id="245888277" lat="57.400239" lon="83.9420288" version="4" changeset="44444444444" user="greencaps" uid="77777777777 ">
    <tag k="name" v="Жуково"/>
    <tag k="place" v="village"/>
    <tag k="cladr:name" v="Жуково"/>
    <tag k="created_by" v="Potlatch 0.7"/>
    <tag k="addr:postcode" v="636306"/>
    <tag k="addr:district" v="Кривошеинский район"/>
    <tag k="cladr:code" v="7001000000800"/>
    <tag k="addr:country" v="RU"/>
    <tag k="addr:region" v="Томская область"/>
    <tag k="is_in" v="Кривошеинский район Томской области"/>
    <tag k="cladr:suffix" v="Село"/>   
    <tag k="name:en" v="Zukovo"/>
  </node>

So we decided to make a program (which now has the name translit) that could be part of the toolchain Lambertus uses to generate routable maps of the whole world every week. Translit would add such tags on the fly where necessary.

First tests have been done by Lambertus. I'm confident that it will work on his system too. Time to tell about translit.

To be continued..


#2 2009-11-27 13:23:14

Lambertus
Administrator
From: Apeldoorn (NL)
Registered: 2007-03-17
Posts: 3,269

Re: Frontend transliterator: translit. A battle against the ??????'s

Yes, the first update including translit results is being uploaded at this moment.


Mapping tools: Garmin GPSmap 60CSx, Giant Terrago 2002


#3 2009-11-27 13:50:11

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

Being focused on different codepages (that was what I had read here), I decided to make different transliteration tables for different countries. So my collection now contains albania, belarus, bulgarije, cyprus-turkish, czech-republic, estonia, greece, hungary, kaliningrad, kazachstan, kosovo, kyrgyzstan, latvia, lithuania, macedonia, moldavie, mongolia, poland, romania, russia, serbia, slowakia, tajikstan, transnistria, turkey, turkmenistan, ukraina, uzbekistan. Work on thailand and china is in progress.

Now the problem was to pick the right transliteration table at the right moment: that is, given a lat,lon, find out which country it is in and then take that country's table.

It happened that I had just programmed an algorithm to find out whether a given lat,lon lies in an area, given for example a .gpx file with a track along the border of that area. So there were no big problems implementing it all; it just had to be done. I needed .gpx files for a lot of countries. As I did not know where to get them (I know now that Geofabrik has them for a lot of countries, but I also know now that they are not always precise), the solution was to first make a boundary2track program that, given the id of a boundary relation, downloads all data for that relation and makes a gpx file out of the border/boundary.
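The post does not say which algorithm is used for the "is this lat,lon inside the border track" test. A common choice is ray casting, so here is a minimal sketch of that technique, assuming the border is a closed ring of (lat, lon) track points and treating coordinates as a flat plane. The function name point_in_polygon and the sample square are made up for illustration.

```python
def point_in_polygon(lat, lon, border):
    """Ray-casting test: is (lat, lon) inside the ring given by border?

    border is a list of (lat, lon) points; the last point is taken to
    connect back to the first one.
    """
    inside = False
    n = len(border)
    for i in range(n):
        lat1, lon1 = border[i]
        lat2, lon2 = border[(i + 1) % n]
        # Does this edge cross the line of constant latitude through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that line.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            # Count only crossings on one side of the point.
            if lon < cross_lon:
                inside = not inside
    return inside


square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
print(point_in_polygon(5.0, 5.0, square))   # point in the middle
print(point_in_polygon(15.0, 5.0, square))  # point north of the square
```

A flat-plane test like this ignores borders that cross the 180° meridian, which matters for a country like Russia, so a real implementation would need extra care there.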

Gpx files were made in this way for the areas/countries mentioned above. The transliteration tables were made in the meantime, character by character, by hand. For every country there are two files. For instance:
russia.frontier.gpx
russia.transliterationtable.txt

All such files are placed in an Areas directory. At startup the transliteration manager module of translit looks in that directory and creates an areatransliterator instance for every pair of files.

When translit reads the osm data and sees a <node> (at the moment only nodes are transliterated, not <way>s), it extracts the lat,lon values and asks the transliteration manager which areas the node is in. If it is not in any area, translit is done with that node and outputs it unchanged. (It also does not inspect nodes which consist of only one line.) Otherwise it checks whether there is a place tag and a name tag, and not already an int_name or name:en tag. If a transliteration is needed, it invokes the right areatransliterator. Depending on the result, a tag is added and the changed node is written to the output.
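The decision steps for a node can be sketched roughly as follows. This is not the translit source: find_area and transliterate are hypothetical callbacks standing in for the transliteration manager and the areatransliterator, and adding the tag only when the result differs from the name is my own reading of "depending on the result".

```python
def needs_transliteration(tags):
    """A node qualifies if it has place and name but no Latin name tag yet."""
    has_place_and_name = "place" in tags and "name" in tags
    already_latin = "int_name" in tags or "name:en" in tags
    return has_place_and_name and not already_latin


def process_node(lat, lon, tags, find_area, transliterate):
    """Return the tags to write out for one node.

    find_area(lat, lon) returns an area name or None (hypothetical).
    transliterate(area, name) returns the Latin spelling (hypothetical).
    """
    area = find_area(lat, lon)
    if area is None or not needs_transliteration(tags):
        return tags  # output the node unchanged
    result = transliterate(area, tags["name"])
    if result and result != tags["name"]:
        return {**tags, "name:en": result}  # copy with the extra tag
    return tags


node_tags = {"name": "Жуково", "place": "village"}
out = process_node(57.400239, 83.9420288, node_tags,
                   find_area=lambda lat, lon: "russia",
                   transliterate=lambda area, name: "Zukovo")
print(out["name:en"])
```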

My fear at first was that adding more countries (by adding their respective files to the Areas directory) would influence the processing time. But if it does, the effect is very minor. Four or twenty-five countries: it does not matter.

Last edited by greencaps (2009-11-27 13:55:34)


#4 2009-11-29 12:08:38

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

Now that <node>s were being transliterated, the next step was the <way>s. I copied the code of my node handler, changed "<node" to "<way" and "place" to "highway", and let it run. Well, that did not work out. I had forgotten that ways do not contain a lat,lon.

  <way id="27950733" visible="true" timestamp="2008-10-24T13:48:37Z" version="1" changeset="555361" user="lordsa" uid="61290">
    <nd ref="306993060"/>
    <nd ref="306993062"/>
    <nd ref="306993065"/>
    <nd ref="306993067"/>
    <nd ref="306993069"/>
    <nd ref="306993071"/>
    <nd ref="306993074"/>
    <tag k="name" v="Bažnyčios g."/>
    <tag k="highway" v="residential"/>
  </way>

The <nd ref="Id"/> elements refer to nodes. Did I have to inspect these nodes? Translit is fed XML osm data, and that contains first all the nodes and then the ways. So by the time a way is inspected, the info for the related nodes has already passed by.
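One conceivable way around this (not what translit ended up doing) would be to remember node coordinates while they stream past, so that a way can later look up the position of its first node. A minimal sketch with Python's standard XML parser, using made-up sample ids; for a whole planet file such a dictionary would of course get very large.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample: nodes first, then ways, as in an OSM XML file.
OSM = b"""<osm version="0.6">
  <node id="1" lat="55.0" lon="24.0"/>
  <node id="2" lat="55.1" lon="24.1"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def way_positions(stream):
    """Yield (way_id, (lat, lon)) using the first referenced node of each way."""
    coords = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "node":
            coords[elem.get("id")] = (float(elem.get("lat")),
                                      float(elem.get("lon")))
        elif elem.tag == "way":
            first = elem.find("nd")
            if first is not None and first.get("ref") in coords:
                yield elem.get("id"), coords[first.get("ref")]
            elem.clear()  # free the children of a finished way


positions = dict(way_positions(io.BytesIO(OSM)))
print(positions)
```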

At this time I already had my doubts whether my approach of different transliteration tables for different countries was the way to go. What I had seen while making transliteration tables for russia, romania, greece and even thailand and china was that a UTF-8 character used for the cyrillic alphabet would not be used for greek or romanian or for any other.

I did not know much about character sets, but in the old sets, where every character is represented by the value of one byte (8 bits), you need to know which character set is used, as only 256 values are possible with a byte. So the value 198 is a different character in cyrillic than in ours.
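The byte value 198 makes a nice small demonstration. The post does not name a particular Cyrillic code page, so Windows-1251 is an assumption here, with Latin-1 standing in for "ours":

```python
raw = bytes([198])  # one byte, value 198 (0xC6)

as_cyrillic = raw.decode("cp1251")   # assumed Cyrillic code page
as_western = raw.decode("latin-1")   # Western European code page

# Same byte, two different characters depending on the code page.
print(as_cyrillic == "Ж", as_western == "Æ")
```

Without knowing the code page, the byte alone does not tell you which character was meant.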

But UTF-8 takes one to six bytes to represent a character (that is the range RFC 2279 allows; the later RFC 3629 limits it to four). Our characters can be represented by one byte. I found that cyrillic characters always take two bytes (I found only one exception where three were needed). Greek takes two bytes too. Thai takes three, and the kind of chinese (What kind is that? Could someone tell me the name?) that is used in osm takes three too.
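These byte counts are easy to check. A small sketch that encodes one sample character per script (the sample characters are my own pick) and reports how many UTF-8 bytes each one takes:

```python
def utf8_len(ch):
    """Number of bytes the character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One sample character per script, picked only for illustration.
samples = {"latin": "a", "cyrillic": "Ж", "greek": "φ", "thai": "ก", "chinese": "乌"}

for script, ch in samples.items():
    print(f"{script}: U+{ord(ch):04X} takes {utf8_len(ch)} byte(s)")
```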

So UTF-8 is a character set in itself: it covers all these scripts at once, so no separate code page per country is needed. If it is UTF-8 you are ready. (Do not laugh if you already knew: I had to find out the hard way. http://www.ietf.org/rfc/rfc2279.txt is my friend.)

Only the minor problem that the Garmin does not know UTF-8 forces us to do something. And the way is not to make, for instance, one single-byte cyrillic character out of the two UTF-8 bytes, because the Garmin will not handle that either. The way is also not to do it in two steps: make a one-byte cyrillic character of the two and then replace that with a transliteration. No, you can do away with all the old codepages. Just make one transliteration table straight from UTF-8 to characters the Garmin can use.

This idea could nicely be applied to the transliteration of ways. First I combined all the transliteration tables I had at that moment into one world.transliterationtable.txt.
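The idea of one table for everything can be sketched in a few lines. The format of world.transliterationtable.txt is not shown in this thread, so the dictionary below is a hypothetical stand-in holding only the handful of characters needed for two names from this topic; passing unmapped characters through unchanged is also an assumption.

```python
# Hypothetical excerpt of a single "world" table: character -> replacement.
WORLD_TABLE = {
    "Ж": "Z", "у": "u", "к": "k", "о": "o", "в": "v",  # Cyrillic, as in the node example
    "ž": "z", "č": "c",                                # Lithuanian letters with carons
}


def transliterate(text, table=WORLD_TABLE):
    """Replace every character found in the table; keep all others as they are."""
    return "".join(table.get(ch, ch) for ch in text)


def is_plain_ascii(text):
    """True if every character is 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in text)


print(transliterate("Жуково"))
print(transliterate("Bažnyčios g."))
print(is_plain_ascii(transliterate("Bažnyčios g.")))
```

Because a Cyrillic character and a Lithuanian one never share a code point, both can live in the same table without any lookup of the country first.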

to be continued..


#5 2009-11-29 12:30:45

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

With all tables combined (except for the thai and chinese ones), the world transliteration table contains 339 entries.

It was time to try it on the <way>s. To my joy all went like I thought it would. Translit, working on osm data that contained parts of russia, ukraina, lithuania, romania and greece, transliterated everything as if it had separate tables for every country.

For instance, the way in lithuania shown above (http://api.openstreetmap.org/api/0.6/way/27950733) would come out as:

  <way id="27950733" visible="true" timestamp="2008-10-24T13:48:37Z" version="1" changeset="555361" user="lordsa" uid="61290">
    <nd ref="306993060"/>
    <nd ref="306993062"/>
    <nd ref="306993065"/>
    <nd ref="306993067"/>
    <nd ref="306993069"/>
    <nd ref="306993071"/>
    <nd ref="306993074"/>
    <tag k="name" v="Bažnyčios g."/>
    <tag k="highway" v="residential"/>
    <tag k="name:engels" v="Baznycios g."/>
  </way>


When I saw this I realised that the algorithm used for the nodes, which picks a transliteration table by checking which country borders a lat,lon lies within, was superfluous.


#6 2009-11-29 12:49:00

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

One size fits all

A nice demonstration of the potential of one transliteration table is this way on the border of Russia and China:

http://api.openstreetmap.org/api/0.6/way/39159352
http://www.openstreetmap.org/browse/way/39159352

If you click the links you will see that your browser has no difficulty displaying names which consist of cyrillic characters in the first half and chinese in the second half. This is because it's utf-8.

<osm version="0.6" generator="OpenStreetMap server">
<way id="39159352" visible="true" timestamp="2009-08-16T08:33:43Z" version="5" changeset="2160956" user="katpatuka" uid="17497">
<nd ref="469076948"/>
...
<nd ref="469077186"/>
<tag k="boat" v="yes"/>
<tag k="int_name" v="Ussuri"/>
<tag k="name" v="Уссури / 乌苏里江"/>
<tag k="name:en" v="Ussuri River"/>
<tag k="name:ru" v="Уссури"/>
<tag k="name:zh" v="乌苏里江"/>
<tag k="waterway" v="river"/>
</way>
</osm>

  <way id="39159352" visible="true" timestamp="2009-08-16T08:33:43Z" version="5" changeset="2160956" user="katpatuka" uid="17497">
    <nd ref="469076948"/>
...
    <nd ref="469077186"/>
    <tag k="boat" v="yes"/>
    <tag k="int_name" v="Ussuri"/>
    <tag k="name" v="Уссури / 乌苏里江"/>
    <tag k="name:en" v="Ussuri River"/>
    <tag k="name:ru" v="Уссури"/>
    <tag k="name:zh" v="乌苏里江"/>
    <tag k="waterway" v="river"/>
  </way>

Above you see the same way twice. The first is copy/pasted from a browser, the second from text in WordPad. (Copy/pasting/displaying utf-8 in different programs is a story of its own...)

Translit does not mind the combination of cyrillic and chinese and transliterates it all and adds the missing tag:
<way id="39159352" visible="true" timestamp="2009-08-16T08:33:43Z" version="5" changeset="2160956" user="katpatuka" uid="17497">
    <nd ref="469076948"/>
...
    <nd ref="469077186"/>
    <tag k="boat" v="yes"/>
    <tag k="int_name" v="Ussuri"/>
    <tag k="name" v="Уссури / 乌苏里江"/>
    <tag k="name:en" v="Ussuri River"/>
    <tag k="name:ru" v="Уссури"/>
    <tag k="name:zh" v="乌苏里江"/>
    <tag k="waterway" v="river"/>
    <tag k="name:engels" v="Ussuri / WuSulijiang"/>
  </way>

Edit: well, in this case adding a tag was not needed as there is already a name:en. But I found it too beautiful not to tell...

Last edited by greencaps (2009-11-29 13:20:35)


#7 2009-11-29 13:15:24

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

http://api.openstreetmap.org/api/0.6/way/10930885
This way from Greece treated by translit:

  <way id="10930885" visible="true" timestamp="2008-08-21T16:58:46Z" version="7" changeset="359706" user="twincam" uid="57830">
    <nd ref="97468697"/>
    <nd ref="97468722"/>
    <nd ref="97468724"/>
    <nd ref="97468758"/>
    <tag k="created_by" v="Potlatch 0.10b"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Αφροδίτης"/>
    <tag k="name:engels" v="Afrodities"/> 
  </way>


#8 2009-12-01 13:49:27

chris66
Member
From: Germany
Registered: 2009-05-24
Posts: 9,584

Re: Frontend transliterator: translit. A battle against the ??????'s

Hi Greencaps,

Is your program available for download?

Chris


Mapper from the Münsterland/NRW. Not on fakebook.


#9 2009-12-01 18:26:01

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

No. Not yet. As you can read, the implementation keeps changing. It is at an early stage of development.

It is now being tested by Lambertus. The first results are not visible yet (I mean on http://garmin.na1400.info/routable.php). I have to spend more time on the transliteration table(s). I first want to see that it runs at Lambertus's end the way I want it to.

After that we will see.

Are you interested in a special country/language?

Last edited by greencaps (2009-12-01 18:26:59)


#10 2009-12-01 22:25:52

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

[Image: Ukraina 63240172.img]

Part of 63240172.img (Ukraina) displayed by GPSMapEdit. I hope that after next weekend's update the question marks are gone.


#11 2009-12-02 10:04:28

Lambertus
Administrator
From: Apeldoorn (NL)
Registered: 2007-03-17
Posts: 3,269

Re: Frontend transliterator: translit. A battle against the ??????'s

It's my fault that the transliterated names aren't showing up yet. I simply forgot to add 'name:engels' to the list of tags used for displaying the name. This is fixed now, but I'm running into a bug (not related to translit) that makes mkgmap crash on a lot of tiles. This has to be fixed before I run a new update (also, a new planet will be available tomorrow, which I want to use for the next update).

I am sure the transliteration will work fine in general, because adding the Chinese name:zh_py worked fine as well.



#12 2009-12-02 11:24:33

chris66
Member
From: Germany
Registered: 2009-05-24
Posts: 9,584

Re: Frontend transliterator: translit. A battle against the ??????'s

greencaps wrote:

Are you interested in a special country/language?

Europe. :)

So, is any European country missing?

Chris



#13 2009-12-02 11:54:04

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

Well you have seen the list. Only eastern Europe.

Even the ß will not be transliterated yet.


@Lambertus: the forum clock is one hour off. I am posting at 11:50.


Edit: In my profile checking "Daylight savings is in effect (advance times by 1 hour)." did it.

Last edited by greencaps (2009-12-05 18:38:30)


#14 2009-12-02 12:56:37

liosha
Member
From: Moscow
Registered: 2008-03-04
Posts: 8,447

Re: Frontend transliterator: translit. A battle against the ??????'s

I use this perl module for transliteration: http://search.cpan.org/~sburke/Text-Uni … idecode.pm
Maybe its tables will be useful :)


#15 2009-12-02 14:16:31

chris66
Member
From: Germany
Registered: 2009-05-24
Posts: 9,584

Re: Frontend transliterator: translit. A battle against the ??????'s

greencaps wrote:

Even the ß will not be transliterated yet.

But the German "ß" is part of the latin1 charset and doesn't need to be transcribed to "ss".



#16 2009-12-02 14:58:17

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

chris66 wrote:

But the German "ß" is part of the latin1 charset and doesn't need to be transcribed to "ss".

It does, because what counts is whether a Garmin device can display it.

I think a GPSmap 60Cx cannot. Well, it is difficult to find out. ß's in street names in osm show up as ss in the maps from Lambertus' site. In City Navigator it's only ss too. There will be a reason for that, I think.

If you have/know a small .img file with ß's please give me a link. I'm eager to try it out.


#17 2009-12-02 15:02:52

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

liosha wrote:

I use this perl module for transliteration: http://search.cpan.org/~sburke/Text-Uni … idecode.pm

Thank you. I see that all has been done before.

I will go through it, but at first glance it looks to be a transliteration from two-byte unicode (see the Bei Jing example on that page). But osm comes with utf8 (1 to 6 bytes). Do you do a conversion first from utf8 to two-byte unicode before using this function?


#18 2009-12-02 15:14:13

liosha
Member
From: Moscow
Registered: 2008-03-04
Posts: 8,447

Re: Frontend transliterator: translit. A battle against the ??????'s

It converts from perl's internal unicode representation.
So the code is something like this:

use Encode;
use Text::Unidecode;
.....
$transliterated_string = unidecode( decode( 'utf8', $utf8_string ) );
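The same two-step shape (first decode the UTF-8 bytes into characters, then map characters to ASCII) can be sketched in Python. This is not a port of Text::Unidecode: the helper strip_diacritics below is a made-up stand-in that only removes accents from Latin letters using the standard unicodedata module. It would not handle Cyrillic, Greek or ß.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks after decomposing, e.g. ž -> z, č -> c."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


utf8_bytes = "Bažnyčios g.".encode("utf-8")  # what arrives from the osm file
text = utf8_bytes.decode("utf-8")            # step 1: bytes -> characters

print(len(utf8_bytes), len(text))            # 14 bytes, 12 characters
print(strip_diacritics(text))                # step 2: characters -> ASCII
```

The point is that after the decode step the code works on whole characters, so how many bytes each one took in UTF-8 no longer matters.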


#19 2009-12-03 09:24:31

chris66
Member
From: Germany
Registered: 2009-05-24
Posts: 9,584

Re: Frontend transliterator: translit. A battle against the ??????'s

greencaps wrote:
chris66 wrote:

But the German "ß" is part of the latin1 charset and doesn't need to be transcribed to "ss".

It does, because what counts is whether a Garmin device can display it.

I think a GPSmap 60Cx cannot. Well, it is difficult to find out. ß's in street names in osm show up as ss in the maps from Lambertus' site. In City Navigator it's only ss too. There will be a reason for that, I think.

If you have/know a small .img file with ß's please give me a link. I'm eager to try it out.

Speaking for my Legend HCX:

In general the device is able to display all(?) latin1 characters:

[Image: Etrex1.png]

But: When compiling the map, mkgmap changes all street names to uppercase
(unless you use the --lower-case option). But there is no upper case
for the "ß", so it is converted to "SS".

The Garmin device converts back to lower case in the tooltips and in other fields.

If the --lower-case option is used, the street names are displayed
as A.........  in the map (only first letter is shown).
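Why the ß gets lost in an upper-case round trip can be shown with Python's own string methods, which follow the Unicode case rules. Whether mkgmap and the Garmin firmware apply exactly the same rules is an assumption; the street name is made up.

```python
name = "Hauptstraße"          # made-up example street name

upper = name.upper()          # ß has no single upper-case letter here: it becomes SS
back = upper.lower()          # lower-casing cannot know that SS was once ß

print(upper)
print(back)
print(back == name)
```

So once the name has been stored in upper case, the original ß cannot be recovered by lower-casing it again.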

Chris



#20 2009-12-03 10:14:37

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

Chris, you are talking about what mkgmap can do or does. But I want to know something about the Garmin. I asked if you had or knew of an .img file which contained a ß. It does not matter who put it in.

But your picture shows something very nice. Just above the Süntelstasse hint: ÄËäÜß.

Isn't that a ß at the end? Did you type it in for a waypoint?


#21 2009-12-03 12:00:45

chris66
Member
From: Germany
Registered: 2009-05-24
Posts: 9,584

Re: Frontend transliterator: translit. A battle against the ??????'s

greencaps wrote:

I asked if you had or knew of an .img file which contained a ß. It does not matter who put it in.
But your picture shows something very nice. Just above the Süntelstasse hint: ÄËäÜß.
Isn't that a ß at the end? Did you type it in for a waypoint?

Yes, that is a ß entered in a waypoint name.

Here is a gmapsupp.img generated with --lower-case, so you have a lot of ...straße

http://www.megaupload.com/?d=J0LT749R

Chris



#22 2009-12-03 13:33:12

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

Thank you.

That is a very small map. I could hardly find the bbox on my device.

This is how it shows in the 60Cx.
[Image: ringel s]

Well isn't this a strange device?
-It can show a ß and it cannot.
-It can show lowercase and it can not.

What happened at Garmin to make this possible?

So I know now that translit should make an ss of it too.


#23 2009-12-03 15:07:00

chris66
Member
From: Germany
Registered: 2009-05-24
Posts: 9,584

Re: Frontend transliterator: translit. A battle against the ??????'s

greencaps wrote:

Well isn't this a strange device?
-It can show a ß and it cannot.
-It can show lowercase and it can not.

It can not show rotated lower case characters.

But the question remains, why the device is not able to convert them to upper case. :)

Chris



#24 2009-12-03 15:21:48

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

chris66 wrote:

It can not show rotated lower case characters.

Ahhh now I see. Rotation is the problem.

Thank you for pointing that out.


#25 2009-12-03 18:03:46

greencaps
Member
Registered: 2009-10-05
Posts: 423

Re: Frontend transliterator: translit. A battle against the ??????'s

Now isn't this nice:

[Image: transliterated]

First results from translit published on http://garmin.na1400.info/routable.php

No question marks!

I have to add here that this will only be true for place names. The next step will be to do the road/street names too (highway=...).

At the moment there are translit results for belarus, bulgaria, kaliningrad, romania, russia and ukraina.

Please comment and express your wishes.
