Worldwide routable Garmin maps: URL REMOVED

ColinMarquardt · October 19, 2009, 10:25pm

mkgmap is kinda prepared to do transliteration for you. See this thread:
http://www.mkgmap.org.uk/pipermail/mkgmap-dev/2009q3/003808.html

Cheers
Colin

Lambertus · October 20, 2009, 8:33am

From how I read that thread: it’s difficult to determine automatically which transliteration table you should use.

So ultimately I wish we could automatically transliterate from Japanese, Indian, Malayan, Greek, Russian, Chinese, etc to English so a readable map for the whole world would be possible… Dunno if this is possible at all, my searches all lead to transliteration from a specified language to English instead of language detection…

I found this page which seems to do a pretty solid job and is using Java, although I cannot find any source code. Perhaps a good starting point for Mkgmap? Or this python solution.

Buadhai · October 20, 2009, 9:02am

Google translate does a fair job. I entered “น้ำไหลลงเขา” and told it to detect the language and translate to English. It came up with “Water flows downhill.” Google detected Thai and translated correctly.

So, the technology exists.

IrlJidel · October 21, 2009, 7:52pm

I’ve also noticed the last 3-4 updates have not reflected any of my changes in Ireland.

For example:
http://osm.org/go/es96wQgVz-

I added a number of bus stops and service roads in car park on Sept 19th.
http://www.openstreetmap.org/browse/changeset/2536911

Are Ireland tiles 63240087.img 63240088.img generating using old data somehow?

Thanks

greencaps · October 21, 2009, 10:08pm

My program to add a missing name:en or name:trans:en or whatever tag works. As a test I let it handle less then ten nodes in a changeset. All worked ok.

Now to be sure that only places which are in a selected country are handled I need a borderline around the area.

Hmmm… that can be found in OSM data. So I wrote a module that collects the ways representing the border. Starting with a relation id recursively retrieving other relations, ways and nodes. Then build a .gpx file. All took quite a while on 60189 (Russian Federation). So I tested on smaller countries like 102879 Austria, 161033 Mongolia.

For Austria got a lot of ways. And they ‘lay on the frontier’. Ok so far. But the ways are in random order. The next way does not start where the former stopped. To make things worse the direction of the ways is not consistent. Most are from east to west (for the frontierline Germany/Austria). I need a closed curve of a country to determin if a “lat,lon” is inside that country. So work is now on sorting the ways and revert them if needed.

Until now I had no look at mkgmap because all this takes time. It is nice to see that others react. I will study all links later.

There is another solution if mkgmap cannot handle the transliteration. Mkgmap works on raw osm data I read. Those are the data files in xml format. My program (well another version) could add the missing tags to those xml files.

Skywoolf · October 21, 2009, 11:31pm

I checked after the date changed to 7/10/2009 also but all the roads I added in the Philippines were not showing. Maybe next time.

Lambertus · October 22, 2009, 7:34am

No, all tiles are rendered using the full planet dump from the date that you see in the website below the map.

You can find country polygons on the Cloudmade website.

I posted a question about this on the Mkgmap mailinglist but did not get any response so far.

So it indeed seems like there are two options: upload the transliterated tags with name:trans:en or preprocess the data each time a new update is performed. But I’m afraid that preprocessing the entire planet file on each update will take a very long time.

Besides that, you are only transliterating cyrillic languages, there are so many more languages that need transliteration. Maybe this needs to be discussed on the main OpenStreetMap mailinglist…

Skywoolf · October 22, 2009, 8:22am

I just noticed the date changed to 14/10/09 so I tried a download again. All the roads I added are still not showing. Maybe I added the tags after 14/10/09.

reinholdM · October 22, 2009, 9:59am

I checked also the 14/10/09 map update for my area and I was wondering even changes from end of September does not show up.

Lambertus · October 22, 2009, 10:23am

Hmm, a pattern starts to emerge here. Maybe it’s a bug in my build chain… I’ll pay a closer attention to file versions etc in this weekends update.

csdf · October 22, 2009, 12:31pm

The easiest way to find out is to see if the changes are visible in Potlatch…if they are, then there’s a problem on Lambertus’ end. If they’re not, then the problem is at Skywoolf/reinholdM’s end.

Skywoolf · October 22, 2009, 12:39pm

Yes the changes are there in Potlatch and in JOSM but not in the downloaded map.

Lambertus · October 22, 2009, 1:41pm

Yes, the first time report of missing ways could be the result of bad timing, but those ways should be in the map definately by now. So if they aren’t then there’s a problem somewhere…

reinholdM · October 22, 2009, 2:22pm

Also the changes I talk about are visible for quite some time in potlatch, JOSM, mapnick and so on…

ColinMarquardt · October 22, 2009, 11:44pm

It may be that you got Steve wrong. I think he means that you can do a transliteration from e.g. Cyrillic to Latin without knowing the language, but to do a thorough job, you would transliterate a bit different depending on whether you got e.g. Russian or Bulgarian (both languages using Cyrillic script). The links you provided don’t seem to try to figure out the language, they would just detect if something is Cyrillic.

I think it would be a start to complete mkgmap’s transliteration tables with the perl script Avar provided and see if the resulting maps work for people (even if not perfect). I guess an awkward transliteration is still way way better that just seeing question marks…

Cheers
Colin

Buadhai · October 23, 2009, 3:08am

Pretty much nothing that I’ve done since the beginning of September shows up in the map that I downloaded from Lambertus’ site today. It’s mostly in these areas:

http://www.openstreetmap.org/?lat=14.9776&lon=102.0267&zoom=14&layers=B000FTF

http://www.openstreetmap.org/?lat=14.406&lon=101.8469&zoom=12&layers=B000FTF

This is Nakhon Ratchasima province in northeast Thailand.

Edit: I believe this is the first changeset that does not appear in the current map from Lambertus:

Changeset: 2420956

Created at: Wed, 09 Sep 2009 04:47:12 +0000
Closed at: Wed, 09 Sep 2009 04:47:33 +0000

http://www.openstreetmap.org/browse/changeset/2420956

Lambertus · October 23, 2009, 8:20am

Yes probably, I haven’t eaten much cheese about transliteration (that’s a Dutch proverb )

I hope the wheel is not being reinvented again with this script. My searches show that it requires a lot of knowledge to transliterate all non-roman languages, so we should definitely use the efforts of existing projects to do so.

Btw, the basic transliteration function in Mkgmap that seem to work in ASCII mode is currently being copied to the Latin1 mode as well (which I’m using for my maps): http://www.mkgmap.org.uk/pipermail/mkgmap-dev/2009q4/004912.html

So, still, it looks like going for preprocessing is the quickest path to transliteration. If this doesn’t double or triple the processing time then I’m open to add such a script into my toolchain. But let me be clear: this is not going to be a new project for me, I’m not going to develop such a transliteration script.

greencaps · October 25, 2009, 2:11pm

With amazing I looked at file russian_federation.poly from that site. Googling for a .poly extension was resultless. It looks like this:

russian_federation
1
1.4110078E+002 5.337188E+001
1.4110078E+002 5.344619E+001
1.4132091E+002 5.344619E+001
1.4132091E+002 5.337188E+001
1.4110078E+002 5.337188E+001
END
2
1.4119539E+002 4.621428E+001
1.4119539E+002 4.628667E+001
1.4124376E+002 4.628667E+001
1.4124376E+002 4.621428E+001
1.4119539E+002 4.621428E+001
END

and so on. I had seen dat already some days ago wondering why there were blocks of five and today I added code to my program to show it on the map. Aha 1617 rectangles to see. But using them is not precise as the rectangles overlap with bordering countries. So I use a real frontierline (extracted from osm data). Result for Mongolia: 115 places that have alreay int_name or name:en. 47 places who miss them. Adding a transliteration would be a piece of cake now.

But I ‘discovered’ something else. Well I saw it. For those 115 that had already an int_name or name:en tag the name sometimes contained an international or english name too. 528068123 Gachuurt, 528064318 Terelj.

Now what is the purpose of the name tag in osm data/maps? What should be in the name tag? The name as used in the country? I think so. It is not difficult to detect this automatically and produce a list of node id’s for later treatment.

I think so too. And is not that file 160 GB? But wasn’t it split first? How big are the splits? And aren’t they compressed when offered to mkgmap? Could you give me an indication of those filesizes?

Lambertus · October 25, 2009, 5:46pm

greencaps:

Lambertus:

You can find country polygons on the Cloudmade website.

With amazing I looked at file russian_federation.poly from that site. Googling for a .poly extension was resultless. It looks like this:

russian_federation
1
1.4110078E+002 5.337188E+001
1.4110078E+002 5.344619E+001
1.4132091E+002 5.344619E+001
1.4132091E+002 5.337188E+001
1.4110078E+002 5.337188E+001
END
2
1.4119539E+002 4.621428E+001
1.4119539E+002 4.628667E+001
1.4124376E+002 4.628667E+001
1.4124376E+002 4.621428E+001
1.4119539E+002 4.621428E+001
END

and so on. I had seen dat already some days ago wondering why there were blocks of five and today I added code to my program to show it on the map. Aha 1617 rectangles to see. But using them is not precise as the rectangles overlap with bordering countries. So I use a real frontierline (extracted from osm data). Result for Mongolia: 115 places that have alreay int_name or name:en. 47 places who miss them. Adding a transliteration would be a piece of cake now.

That is indeed amazing. I have no clue why they define a country’s polygon like that. It’s not very useful it seems.

I think the name tag should be the official local name.

The planet file is about 7.3 GB compressed which makes it about 80 GB (or so) uncompressed (but noone should use an uncompressed planet really). I normally split the planet in two sections using Osmosis, but that is because Splitter would need too much memory otherwise. These extracts are then split using Splitter and then rendered with Mkgmap.

What I can envision is that you application loops through the compressed planet once extracting all the nodes and ways and relations with names. Then determine in which country the name is so you know which source language you’d need to transliterate. Then transliterate the name, update the name value and output the changes in a new compressed planet file. That new planet file could then be used for Mkgmap.

BTW. I just saw an Mkgmap commit in which the transliteration code that already existed for the ASCII code-page is also made available for Latin1. I could do a new run after this weeks update to see what the results look like.

greencaps · October 25, 2009, 9:35pm

And what is what Splitter produces? A compressed file? And in what sizes?

I can handle uncompressed files up to 2GB. Or was the limit four GB? I have to check. To handle larger files I have to use 64 bit filepointers which I never did. But I see no problem implementing this as I saw code for it sometimes. But working on compressed files ? I have no idea how to do that. I did not even know that it was possible to work on a piece of a compressed file. I thought that a compressed file first had to be decompressed before use. If not that is fine but for me no clue where to start.

The whole point with mkgmap and converting from one code tabel to the other is that it places a questionmark ‘?’ if it cannot find a match. If it would just place one byte from the two bytes that it had to convert than it would do much better. Or just the byte if the character was one byte. Then you would not have seen me here as then I could transliterate afterwards the .img files downloaded from your side. Well atleast I think so.

So please find the piece of code where mkgmap places the ‘?’. (I did not deep in it as I did not finish with my tagupdater yet).

Today I extracted the borderline of White Russia. Then made a run for placenames. Found 22518 places which needed an int_name (or name:en). There were 36 places which had already a translation. 36!