Standardization of Thai words

Hello,

As some of you may have noticed, I’ve recently made a lot of changes and additions to the Thailand map. First of all, I’m quite an excited newbie, so I sometimes make mistakes. I’m sorry for that; please let me know if you find any mistakes I’ve made and I’ll undo them.

Now to the purpose of this topic. As most of you know, it is difficult to convert Thai words to the Latin alphabet, and I have noticed that some words are tagged in several different ways. For example, “island” is sometimes tagged as “Koh” and sometimes as “Ko”.

So I suggest standardizing the romanization of common Thai words, for example “Ko” instead of “Koh”, “Hat” instead of “Haad”, “Ban” instead of “Baan”, “Ao” instead of “Aow” etc. These are usually the default forms in sources such as Wikipedia, Lonely Planet and Rough Guides. I have also noticed that the Thai government uses “Ko”, “Hat” etc. when naming natural parks.

What do you guys think about that? Should I start this kind of standardization?

Thanks

Hi,

please don’t.

For name:en we agreed to use RTGS.
As you might have noticed, RTGS is not used consistently on signs. We had this discussion quite a while ago. To help people who actually use the map, we stick to the “on the ground” rule and enter names as they appear on the sign.

Please double-check the Thai-script names you enter; these are the real names. Different types of transliteration can be created automatically from them, for example to make searching easier.

Stephan

OK, but I still think the rendered map doesn’t look professional when one island is labelled “Ko” and another “Koh”. No other serious map out there does that.

Maybe a better approach would have been to standardize the name tag and put the non-standard name in the alt_name tag, for example “Hat Rin” in the name tag and “Haad Rin” in the alt_name tag. Well, that’s my 2 cents…

Hi,

The name tag is OK in most places I’ve seen. It contains the name of the place in the correct script (Thai script, as we’re in Thailand). Some places use abbreviations, which we don’t want in the OSM database; you can certainly fix those whenever you come across them.

For places like islands it would probably be OK to use the correct, official RTGS transliteration (Ko) for name:en, since there is no single official sign for them.

For street names it’s a different story. Here in Chiang Mai, for example, some sois are signed as “Lane”. To avoid confusing people, we used the name on the sign. If there are two signs with different transliterations, you can use “alt_name:*” style tags to keep the alternative forms in the database, so tools like the place finder using Nominatim can still find them.
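To illustrate how alternative spellings stay findable, here is a small search sketch. It assumes a hypothetical, incomplete variant table built from the spellings mentioned at the start of this thread (Koh/Ko, Haad/Hat, Baan/Ban, Aow/Ao) and compares a query against name, name:* and alt_name* values after folding those variants. This is not how Nominatim itself works; it only shows the idea of normalizing before comparing.

```python
# Hypothetical, incomplete table: variant spellings from this thread,
# mapped to the RTGS forms proposed in the first post.
VARIANTS = {
    "koh": "ko",
    "haad": "hat",
    "haat": "hat",
    "had": "hat",
    "baan": "ban",
    "aow": "ao",
}


def normalize(text: str) -> str:
    """Lower-case the text and replace known variant words with their RTGS form."""
    return " ".join(VARIANTS.get(word, word) for word in text.lower().split())


def matches(query: str, tags: dict) -> bool:
    """True if the folded query equals the folded value of any name-like tag."""
    folded_query = normalize(query)
    for key, value in tags.items():
        # Consider name, name:<lang> and alt_name / alt_name:<lang> keys only.
        if key == "name" or key.startswith(("name:", "alt_name")):
            if normalize(value) == folded_query:
                return True
    return False
```

With this, a search for “Haad Rin” finds an object tagged name:en=Hat Rin even without an alt_name tag, while alt_name still covers spellings the table does not know about.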

Restaurants or other places sometimes transliterate their names in unusual ways, but that’s their decision. The authoritative name is the Thai one.

So I’m still against mass retagging, but in specific cases it should be all right. Perhaps we should discuss each case before editing?
How were you planning to make the edits? A so-called mechanical edit can break things quite easily.

Stephan

For some provinces and districts I’ve been adding the transliterated Thai name under the name:th_latn tag. For example, Nonthaburi Province is tagged

name=จังหวัดนนทบุรี
name:en=Nonthaburi Province
name:th_latn=Changwat Nonthaburi

This allows us to keep the name:en tag understandable for English readers while still recording the romanized form of the Thai name. (I’ve done likewise for some roads, using “Road” for name:en and “Thanon” for name:th_latn.)

Would it be a good idea to use translated terms, e.g. “Island” and “Beach”, instead of “Ko”/“Koh” and “Hat”/“Haat”/“Had”/“Haad” for the name:en tag, and to move the romanized Thai terms to name:th_latn for other geographical features as well? (I chose “th_latn” since “latn” is the ISO 15924 code for Latin, and OSM tags generally use the underscore.)

A bit delayed, as I must have missed your reply:

In Japan they use name:ja_rm for the romanized form of the Japanese name. Korea uses name:ko_rm for the romanized form of the Hangul name.

So, based on that, a more established approach would be to use name:th_rm for the romanized form of names (written in the Latin alphabet) following RTGS.

Stephan

Serbian appears to use name:sr-Latn, which follows the IETF’s BCP 47 standard. th_latn was probably not a great choice, since it replaces BCP 47’s hyphen with an underscore and its upper-case L with a lower-case one. There seem to be two real alternatives: name:th_rm, to follow Korean and Japanese, or name:th-Latn, to follow Serbian. The Korean/Japanese tags are more OSM-style, but I think the Serbian way has the advantage of following an established standard, which would better allow for future expansion. Wouldn’t it be better to settle on th-Latn?
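One way to see the “future expansion” argument: a BCP 47 suffix can be split into language, script and region subtags by a generic parser, while a suffix like th_rm cannot. The sketch below is a simplified, hypothetical parser covering only the language[-Script][-REGION] part of BCP 47 in its conventional casing; it is not a full validator (variants, extensions and private-use subtags are ignored).

```python
import re

# Simplified pattern for the language[-Script][-REGION] part of a BCP 47 tag,
# using conventional casing (lower-case language, title-case script,
# upper-case region). Not a full BCP 47 validator.
LANGUAGE_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"(?:-(?P<region>[A-Z]{2}|[0-9]{3}))?$"
)


def parse_name_key(key: str):
    """Split a key such as 'name:th-Latn' into BCP 47 subtags, or return None."""
    prefix, separator, suffix = key.partition(":")
    if prefix != "name" or not separator:
        return None
    match = LANGUAGE_TAG.match(suffix)
    return match.groupdict() if match else None
```

With the script as its own subtag, a data consumer could, for example, select every name:*-Latn value without knowing each language’s romanization suffix; with the *_rm style it would need a hand-maintained list of such keys.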

Following BCP 47 sounds OK as well. From a technical point of view it doesn’t matter: Nominatim parses all name:* tags.

For map styles and other applications, support has to be coded either way, so it doesn’t matter whether it’s name:th_rm or name:th-Latn.
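As an illustration of why the key choice is just configuration for a renderer, here is a minimal sketch of label selection with a fallback chain. The key list and the function are hypothetical and not taken from any existing map style; supporting name:th_rm instead of name:th-Latn would only mean changing the configured list.

```python
from typing import Optional

# Hypothetical renderer setting: keys to try for a label, in order of preference.
LABEL_KEYS = ["name:th-Latn", "name:en", "name"]


def choose_label(tags: dict, keys: list = LABEL_KEYS) -> Optional[str]:
    """Return the value of the first key in `keys` that is present and non-empty."""
    for key in keys:
        value = tags.get(key)
        if value:
            return value
    return None
```

For the Nonthaburi example earlier in the thread (with the romanized name stored under name:th-Latn), the default list yields “Changwat Nonthaburi”, while a style configured with name:th_rm would fall back to the Thai name until that key is added.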

Another option is not to have the tag at all. In most cases the RTGS transliteration can be generated automatically from the Thai name.
I thought about trying this in a map style. A while ago I talked with Jochen Topf about possible ways to implement it; he received some funding from Wikimedia to implement a prototype for Wikipedia maps.
One of the most interesting approaches would be a library that can be loaded into Postgres.

Unfortunately I couldn’t find any free implementation of RTGS. I found some documentation and it doesn’t look overly difficult, but as my Thai skills are quite limited I’m hesitant to write my own implementation.

It might save time to create these tags only when the romanized form cannot easily be derived from the Thai script.

I’ve changed existing uses to name:th-Latn. I’ll add the suggestion to the Wiki if no one objects.
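Since a key change like this is exactly the kind of retagging that can break things when done mechanically, here is a minimal sketch of a cautious per-object migration. It assumes the legacy keys are name:th_latn and name:th_rm, works on a plain tag dictionary (not on any OSM editing API), and uses a hypothetical function name. A value is moved to name:th-Latn only if that key is absent or already identical; otherwise the object is reported for manual review instead of being overwritten.

```python
# Legacy keys seen in this thread and the agreed target key.
LEGACY_KEYS = ("name:th_latn", "name:th_rm")
TARGET_KEY = "name:th-Latn"


def migrate_romanized_key(tags: dict) -> tuple:
    """Return (updated copy of tags, list of legacy keys left for manual review)."""
    updated = dict(tags)  # never modify the caller's dictionary
    conflicts = []
    for old_key in LEGACY_KEYS:
        if old_key not in updated:
            continue
        value = updated[old_key]
        existing = updated.get(TARGET_KEY)
        if existing is None or existing == value:
            updated[TARGET_KEY] = value
            del updated[old_key]
        else:
            conflicts.append(old_key)  # differing values: a human should decide
    return updated, conflicts
```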

Automatic generation of romanised names sounds great, but from experience I think it’d be much more difficult than it seems. Although Thai is largely phonetic and native Thai words mostly lend themselves nicely to such systems, the sheer number of loanwords, especially from Pali and Sanskrit, greatly complicates things, as they are not spelled phonetically. Take Bangkok’s name, กรุงเทพมหานคร, for example: read purely phonetically it would come out as krung-the-phom-han-khon rather than Krung Thep Mahanakhon. A dictionary-based approach seems necessary for software to recognise these terms, and even then proper nouns will be difficult to handle.
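The dictionary-based idea can be sketched as greedy longest-match segmentation: split the Thai string into words found in a dictionary and join their stored romanizations. The dictionary below is a hypothetical stand-in containing only words from this thread, with the romanizations given here; a real one would need a very large word list plus rules for words it does not contain. Returning None whenever part of the input is unknown fits the suggestion above of adding a romanized name tag only where automatic romanization fails.

```python
from typing import Optional

# Hypothetical, tiny dictionary of whole Thai words -> romanization,
# using only examples from this thread.
RTGS_DICTIONARY = {
    "จังหวัด": "Changwat",
    "นนทบุรี": "Nonthaburi",
    "กรุงเทพ": "Krung Thep",
    "มหานคร": "Mahanakhon",
}


def romanize(thai: str, dictionary: dict = RTGS_DICTIONARY) -> Optional[str]:
    """Greedy longest-match segmentation against a word dictionary.

    Returns None if any part of the input is not covered by the dictionary,
    signalling that a manually entered romanized name is still needed.
    """
    max_len = max(len(word) for word in dictionary)
    parts = []
    i = 0
    while i < len(thai):
        # Try the longest possible dictionary word starting at position i first.
        for length in range(min(max_len, len(thai) - i), 0, -1):
            segment = thai[i:i + length]
            if segment in dictionary:
                parts.append(dictionary[segment])
                i += length
                break
        else:
            return None  # unknown word: no safe automatic romanization
    return " ".join(parts)
```

Greedy longest match is the simplest segmentation strategy and can choose the wrong split when a longer dictionary word happens to start a different segmentation, so it is only a starting point.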