Worldwide routable Garmin maps: URL REMOVED

Yes the changes are there in Potlatch and in JOSM but not in the downloaded map.

Yes, the first report of missing ways could be the result of bad timing, but those ways should definitely be in the map by now. So if they aren’t, then there’s a problem somewhere…

Also, the changes I’m talking about have been visible for quite some time in Potlatch, JOSM, Mapnik and so on…

It may be that you misunderstood Steve. I think he means that you can do a transliteration from e.g. Cyrillic to Latin without knowing the language, but to do a thorough job you would transliterate a bit differently depending on whether you have e.g. Russian or Bulgarian (both languages use the Cyrillic script). The links you provided don’t seem to try to figure out the language; they would just detect whether something is Cyrillic.

I think it would be a start to complete mkgmap’s transliteration tables with the Perl script Avar provided and see if the resulting maps work for people (even if they are not perfect). I guess an awkward transliteration is still way better than just seeing question marks…
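For reference, a per-character table lookup of the kind such transliteration tables do can be sketched like this. The mapping below is an illustrative, roughly BGN/PCGN-style Russian subset, not mkgmap's actual table:

```python
# Per-character Russian Cyrillic -> Latin transliteration sketch.
# The mapping is an illustrative subset, NOT mkgmap's actual table.
TRANSLIT = {
    'а': 'a', 'б': 'b', 'в': 'v', 'г': 'g', 'д': 'd', 'е': 'e',
    'ё': 'yo', 'ж': 'zh', 'з': 'z', 'и': 'i', 'й': 'y', 'к': 'k',
    'л': 'l', 'м': 'm', 'н': 'n', 'о': 'o', 'п': 'p', 'р': 'r',
    'с': 's', 'т': 't', 'у': 'u', 'ф': 'f', 'х': 'kh', 'ц': 'ts',
    'ч': 'ch', 'ш': 'sh', 'щ': 'shch', 'ъ': '', 'ы': 'y', 'ь': '',
    'э': 'e', 'ю': 'yu', 'я': 'ya',
}

def transliterate(name):
    out = []
    for ch in name:
        low = ch.lower()
        if low in TRANSLIT:
            latin = TRANSLIT[low]
            # keep the capitalisation of the original letter
            out.append(latin.capitalize() if ch.isupper() else latin)
        else:
            out.append(ch)  # Latin letters, digits, spaces pass through
    return ''.join(out)
```

As Steve's point implies, a correct transliterator would pick a different table per source language; this sketch only covers the language-agnostic part.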

Cheers
Colin

Pretty much nothing that I’ve done since the beginning of September shows up in the map that I downloaded from Lambertus’ site today. It’s mostly in these areas:

http://www.openstreetmap.org/?lat=14.9776&lon=102.0267&zoom=14&layers=B000FTF

http://www.openstreetmap.org/?lat=14.406&lon=101.8469&zoom=12&layers=B000FTF

This is Nakhon Ratchasima province in northeast Thailand.

Edit: I believe this is the first changeset that does not appear in the current map from Lambertus:

Changeset: 2420956

Created at: Wed, 09 Sep 2009 04:47:12 +0000
Closed at: Wed, 09 Sep 2009 04:47:33 +0000

http://www.openstreetmap.org/browse/changeset/2420956

Yes, probably. I haven’t eaten much cheese about transliteration (that’s a Dutch proverb meaning I know little about it :wink: )

I hope the wheel is not being reinvented with this script. My searches show that it requires a lot of knowledge to transliterate all non-Roman languages, so we should definitely reuse the efforts of existing projects.

Btw, the basic transliteration function in Mkgmap that seems to work in ASCII mode is currently being copied to the Latin1 mode as well (which I’m using for my maps): http://www.mkgmap.org.uk/pipermail/mkgmap-dev/2009q4/004912.html

So it still looks like preprocessing is the quickest path to transliteration. If it doesn’t double or triple the processing time, I’m open to adding such a script to my toolchain. But let me be clear: this is not going to be a new project for me; I’m not going to develop such a transliteration script myself.

With amazement I looked at the file russian_federation.poly from that site. Googling for the .poly extension yielded no results. It looks like this:

russian_federation
1
1.4110078E+002 5.337188E+001
1.4110078E+002 5.344619E+001
1.4132091E+002 5.344619E+001
1.4132091E+002 5.337188E+001
1.4110078E+002 5.337188E+001
END
2
1.4119539E+002 4.621428E+001
1.4119539E+002 4.628667E+001
1.4124376E+002 4.628667E+001
1.4124376E+002 4.621428E+001
1.4119539E+002 4.621428E+001
END

and so on. I had already seen that some days ago, wondering why there were blocks of five, and today I added code to my program to show it on the map. Aha, 1617 rectangles to see. But using them is not precise, as the rectangles overlap with bordering countries. So I use a real frontier line (extracted from OSM data). Result for Mongolia: 115 places that already have int_name or name:en, and 47 places that lack them. Adding a transliteration would be a piece of cake now.
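Reading the sample above, a parser for this layout can be sketched as follows. This is my reading of the format from the sample; real Osmosis-style .poly files may also contain hole sections whose id starts with '!', which this sketch treats like ordinary rings:

```python
def read_poly(text):
    """Parse an Osmosis-style .poly file into a list of rings.

    Layout, as seen in russian_federation.poly: a name line, then
    repeated sections of <section-id> / "lon lat" lines / END.
    Hole sections (id starting with '!') are not treated specially.
    """
    lines = iter(text.splitlines())
    next(lines)                      # polygon name, e.g. russian_federation
    rings = []
    for line in lines:
        token = line.strip()
        if not token or token == 'END':
            continue                 # blank line or a file-closing END
        ring = []                    # token was a section id; read coords
        for coord in lines:
            coord = coord.strip()
            if coord == 'END':
                break
            lon, lat = map(float, coord.split())
            ring.append((lon, lat))
        rings.append(ring)
    return rings
```

Each ring in the sample is a closed five-point rectangle (the first and last vertices are identical), which explains the "blocks of five".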

But I ‘discovered’ something else. Well, I saw it. For those 115 places that already had an int_name or name:en tag, the name tag itself sometimes contained an international or English name too: 528068123 Gachuurt, 528064318 Terelj.

Now what is the purpose of the name tag in OSM data/maps? What should be in the name tag? The name as used in the country? I think so. It is not difficult to detect this automatically and produce a list of node ids for later treatment.
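Detecting such mixed names automatically is indeed straightforward; a sketch (the (id, name) pairs in the node list are a made-up illustration):

```python
import re

CYRILLIC = re.compile(r'[\u0400-\u04FF]')
LATIN = re.compile(r'[A-Za-z]')

def has_mixed_scripts(name):
    """True if a name contains both Cyrillic and Latin letters,
    e.g. a local name with an English name appended."""
    return bool(CYRILLIC.search(name)) and bool(LATIN.search(name))

# Produce a list of node ids for later treatment.
# (These (id, name) pairs are invented for illustration only.)
nodes = [(528068123, 'Гачуурт Gachuurt'),
         (528064318, 'Тэрэлж Terelj'),
         (1, 'Москва')]
suspects = [node_id for node_id, name in nodes if has_mixed_scripts(name)]
```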

I think so too. And isn’t that file 160 GB? But wasn’t it split first? How big are the splits? And aren’t they compressed when offered to Mkgmap? Could you give me an indication of those file sizes?

That is indeed amazing. I have no clue why they define a country’s polygon like that. It’s not very useful it seems.

I think the name tag should be the official local name.

The planet file is about 7.3 GB compressed, which makes it about 80 GB (or so) uncompressed (but no one should really use an uncompressed planet). I normally split the planet into two sections using Osmosis, because Splitter would need too much memory otherwise. These extracts are then split using Splitter and rendered with Mkgmap.

What I can envision is that your application loops through the compressed planet once, extracting all the nodes, ways and relations with names. It then determines in which country each object is, so you know from which source language you’d need to transliterate. Then it transliterates the name, updates the name value and outputs the changes into a new compressed planet file. That new planet file could then be used by Mkgmap.
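The envisioned single pass could look roughly like this. It is only a sketch: it relies on the planet dump's one-XML-element-per-line layout, leaves out the per-country language lookup described above, and a real tool should use a proper XML or PBF parser:

```python
import gzip
import re

# Matches the value of a name tag on a single line of planet XML.
NAME_TAG = re.compile(r'(<tag k="name" v=")([^"]*)(")')

def rewrite_planet(src_path, dst_path, transliterate):
    """One streaming pass over a gzipped planet: rewrite the value of
    every name tag.  The country-lookup step from the post is omitted;
    'transliterate' is any str -> str function.  Note: a transliterated
    value containing XML special characters would still need escaping."""
    with gzip.open(src_path, 'rt', encoding='utf-8') as src, \
         gzip.open(dst_path, 'wt', encoding='utf-8') as dst:
        for line in src:
            dst.write(NAME_TAG.sub(
                lambda m: m.group(1) + transliterate(m.group(2)) + m.group(3),
                line))
```

Because both files stay gzipped and only one line is held in memory at a time, disk usage stays close to the compressed planet size.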

BTW, I just saw a Mkgmap commit in which the transliteration code that already existed for the ASCII code-page is also made available for Latin1. I could do a new run after this week’s update to see what the results look like.

And what does Splitter produce? A compressed file? And in what sizes?

I can handle uncompressed files up to 2 GB. Or was the limit 4 GB? I have to check. To handle larger files I would have to use 64-bit file pointers, which I have never done, but I see no problem implementing that as I have seen code for it before. But working on compressed files? I have no idea how to do that. I did not even know it was possible to work on a piece of a compressed file; I thought a compressed file first had to be decompressed before use. If that’s not needed, fine, but I have no clue where to start.

The whole point with mkgmap converting from one code table to the other is that it places a question mark ‘?’ when it cannot find a match. If it would just keep one byte of the two bytes it had to convert, it would do much better. Or just keep the byte if the character was a single byte. Then you would not have seen me here, as I could then transliterate the .img files downloaded from your site afterwards. Well, at least I think so.

So please find the piece of code where mkgmap places the ‘?’. (I have not dug into it yet, as I have not finished my tag updater.)
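For illustration, the ‘?’ fallback being described, plus the milder keep-what-you-can behaviour being asked for, can be sketched in Python (this is not mkgmap's actual Java code):

```python
import unicodedata

def to_latin1(text):
    """Convert a name for a Latin-1 code page, keeping what we can:
    characters that fit are passed through, accented characters that
    don't fit are reduced to their base letter, and only the rest
    becomes '?' (the mkgmap-style last resort)."""
    out = []
    for ch in text:
        try:
            ch.encode('latin-1')
            out.append(ch)                  # fits Latin-1 as-is (e.g. é)
            continue
        except UnicodeEncodeError:
            pass
        # strip combining marks after decomposition: ź -> z
        base = ''.join(c for c in unicodedata.normalize('NFKD', ch)
                       if not unicodedata.combining(c))
        try:
            base.encode('latin-1')
            out.append(base)
        except UnicodeEncodeError:
            out.append('?')                 # e.g. Cyrillic with no mapping
    return ''.join(out)
```

A Cyrillic name still comes out as all question marks under this scheme, which is why a real transliteration step before the code-page conversion is needed.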

Today I extracted the border line of White Russia (Belarus). Then I made a run for place names. I found 22518 places which need an int_name (or name:en). There were 36 places which already had a translation. 36!

Splitter produces tiles that are almost always less than 30 MB in gz (gzip) format. It would be useful if your code were able to read and write the gz format. Alternatively, if it can read from stdin and write to stdout, then pipes can be used in combination with the zcat and gzip tools, which is fine too.
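Both options can be combined in a few lines; a sketch assuming the tiles are plain-text OSM XML inside the gz wrapper:

```python
import gzip
import sys

def open_input(path=None):
    """Open a Splitter tile for reading: a gzipped file if a path is
    given, otherwise plain text from stdin, so a pipeline like
    'zcat tile.osm.gz | tool' works as well."""
    if path is None:
        return sys.stdin
    return gzip.open(path, 'rt', encoding='utf-8')

def open_output(path=None):
    """Mirror of open_input: gzipped file, or stdout for piping
    into the gzip tool."""
    if path is None:
        return sys.stdout
    return gzip.open(path, 'wt', encoding='utf-8')
```

gzip is a stream format, so reading it this way decompresses on the fly without ever materialising the uncompressed file on disk, which answers the "work on a piece of a compressed file" question above.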

Working with uncompressed OSM data stored on disk is only really doable if a) you have lots and lots of disk space b) you’re working on a small area. Neither is what I’m doing.

Perhaps Mkgmap could do that, but then you will be able to make a good map, but all your work has to be duplicated by others to be able to do the same. So I’m in favor of a solution that can be used by everyone.

Maybe a lot of smaller places don’t have a well-known English or international translation. But, indeed, it’s not much. Are you planning on transliterating road names and POIs as well?

I’ve made very sure that I have used the latest planet dump and other intermediate files for the new map update. Also, I’ve found no irregularities in the toolchain. So I’m very sure that the data used for the maps is up to date.

Edit: I’ve checked the planet file and it contains way 42003279 so it should be in the Garmin map too.

I can verify that my changes in Minnesota from early October are now in the tile I just downloaded. They were missing for one or two cycles, but it looks good now.

Thank you

Will investigate this when time comes…

That would indeed be the next step, with only a minor code change/addition needed. At least I’m interested to see how ways are stored in OSM. I could test this soon for Ukraine. I just downloaded all bboxes for Ukraine and saved them too: 445 MB. That is a nice file for testing on ways. The download took two and a half hours. It is also a nice object for first seeing how much time is needed for extracting the place nodes, transliterating them and adding an extra tag.

By the way: in Ukraine there are 30741 place names to do and 347 done.

Many thanks, Lambertus.

Everything looks good now.

Progress report:
Made a version which can handle a big data.osm file (< 2 GB or < 4 GB, I still don’t know which).
Use: put transliteration-table files and matching boundary .gpx files in an ‘Areas’ directory and drop the file.

As I had a 445 MB file with data of Ukraine and of places just outside its border, I put in not only an area for Ukraine but also areas for the neighbouring countries. That way I could test whether all nodes would be handled properly, i.e. with the transliteration table of their country.

It all works.

These were the Areas:
bulgarije
hungary
kazachstan
moldavie
poland
romania
russia
serbia
slowakia
transnistria
ukraina
witrusland

To my surprise, about 700 of the 271237 nodes fell outside all areas. Did I miss a neighbouring country? On the next run I made a .gpx file of those nodes and discovered that all points (well, except two) were from this island:
http://www.openstreetmap.org/?mlat=46.401759380715&mlon=31.7477416992188&zoom=10&layers=B000FTF

I bet it belongs to Ukraine.

One of the two other points outside all areas was:
http://www.openstreetmap.org/?mlat=45.4438073&mlon=35.5707632
http://api.openstreetmap.org/api/0.6/node/311431487

So a correct decision. (Even if it had fallen inside Ukraine’s border I would not have transliterated it, as there is no place tag.)
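The inside/outside decision used throughout these runs boils down to a point-in-polygon test; a minimal ray-casting version (a generic sketch, not the actual code behind these results):

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the closed ring of
    (lon, lat) vertices?  Good enough for assigning nodes to a
    country area; points exactly on the border are undefined."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # does edge (j, i) straddle the horizontal line y = lat?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:          # crossing lies to the east
                inside = not inside
        j = i
    return inside
```

A node is then assigned to the first country area whose boundary ring contains it; nodes contained by no ring end up in the "outside all areas" bucket described above.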

How much time would you give me?

Hi greencaps,

I don’t recommend uploading transliterated names to the OSM database.
The reasons are:

  • Wikipedia lists about a dozen transliteration schemes for the Russian language. Which one should be used? Why?
  • Bing Maps, Google Maps, Yahoo Maps use 3 different transliteration schemes.
  • How do you keep the transliterated name in sync when the Cyrillic name is changed?
  • Some places may have an established transliterated name; how do you differentiate those manually edited names from automatic ones?
  • Some place names in Russia have been imported from vmap0, which is English-only, and reverse English->Russian transliteration has been used; as a result there are many errors in the Cyrillic names.
  • There is an ongoing project to match all place names from vmap0 against Russian address databases and other sources to spot all errors in the Cyrillic names and fix them.
  • IMHO it’s better to populate the OSM database with translated names, or at least names matching the vmap0, GNS and Geonames(?) databases.
  • Transliteration depends on the source language. Russian->English (for example Kiev) is different from Ukrainian->English (Kyiv). Transliterating regions around the border can be quite tricky.
  • Transliteration also depends on the target language, Russian->German is different from Russian->English.
  • Belarus now has Russian names in the ‘name’ tag and Belarusian names in the ‘name:be’ tag. I don’t know the reason for that (it might be due to import from data sources available only in Russian), but I can imagine they can switch the tags one day to have Belarusian language in the ‘name’ tag and Russian in the ‘name:ru’ tag. Persisted transliterated names will become incorrect…
  • Transliterated Russian maps are already available: http://gpsmapsearch.com/osm/mp/__russian_federation.translit.7z as well as some other countries.
  • There are many requests for Russian maps suitable for foreign visitors on many GPS/sat-nav related forums. This problem is well understood, and one day OSM will improve on that to include many tourist attractions, sightseeing places, etc., at least in English.

All of the above suggest it’s better to do transliteration on the fly rather than persist one possible version out of many.
This way you can easily change transliteration schemes, change target language, etc…

Regarding ‘name:trans’/‘name:latin’ tags:

In OSM we use the ‘name:<language code>’ scheme.

There is a standard scheme for specifying language codes (including regions, scripts and other details): see http://www.w3.org/International/articles/language-tags/ and RFC 5646.

According to this Russian written in latin script would be ‘name:ru-Latn’.
If you don’t want to specify the language but still want to specify the script, then maybe a ‘name:<script>’ tag, i.e. name:Latn?
All script codes are 4 letters (the list of all possible values is available at http://unicode.org/iso15924/iso15924-codes.html), so there should be no confusion with language codes, which are 2 or 3 letters.
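That length-based distinction makes the tag suffixes mechanically distinguishable; a small sketch:

```python
import re

def classify_name_suffix(suffix):
    """Classify the part after 'name:' using the length rule above:
    language subtags are 2-3 lowercase letters, script subtags are
    4 letters in title case (ISO 15924, as in BCP 47 / RFC 5646)."""
    if re.fullmatch(r'[a-z]{2,3}', suffix):
        return 'language'                   # e.g. name:ru, name:be
    if re.fullmatch(r'[a-z]{2,3}-[A-Z][a-z]{3}', suffix):
        return 'language+script'            # e.g. name:ru-Latn
    if re.fullmatch(r'[A-Z][a-z]{3}', suffix):
        return 'script'                     # e.g. name:Latn
    return 'unknown'
```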

Yuri, all valid arguments, I’m sure. But afaik greencaps has no intention of uploading the results of his script. His script can be used by Garmin map makers like me (but also others) to help with transliteration. I’m sure that some transliteration in the maps is better than having question marks (?) all over the place (I know, it’s especially a problem with my maps and Mkgmap).

Thank you for summarising the considerations for adding transliterations to the OSM database. I understand your concerns. At the moment my goal is an on-the-fly transliterator which can be put into Lambertus’ build chain. Thank you for the links too; I will study them.

But meanwhile I will also further develop my persistent updater.

That would be an option, but not enough on its own, as the language would still have to be specified.

Agreed. I’m not familiar with vmap0, GNS and Geonames but will investigate when time comes… (Anybody who can give me a clue how to use them over the internet is kindly invited to share a link or hint.)

I made a run for Belarus. There are 27956 nodes with place tags. 169 have name:be, 52 have name:en and 11 have int_name. It looks as if not too many people in Belarus are contributing to OSM. Why wouldn’t they write Belarusian if they added names by hand? By the way: looking for someone who can switch them? ;-)

Progress report.
Made a console version which reads from stdin and writes to stdout. As I have no program that would grab that output, it was displayed in the console window. And as I also had no program that would read the source and write it to stdout for me to read from stdin, I tested by reading from the file itself.

Where my non-console version takes ten minutes for 500 MB, the console version needed an hour. But I think this is because cmd.exe displays it all in its console window.

What is your build time, and how much of it would you give me?