street names in Israel have several fundamental problems

  1. Carmel and HaCarmel are actually different names. See for example:
    HaCarmel: http://goo.gl/maps/VwKi7
    Carmel: http://goo.gl/maps/Ni9Au
    We have to choose the correct one for name, name:he, name:en, etc.
    For name:en1, name:he1, etc. we can put the other one.

  2. Nominatim already removes apostrophes when you search - try searching for “Earls Court” and “Earl’s Court”, for each search you will get results with and without apostrophe. I think the name on the sign has to always be one of the stored names. Whether it should be name:en or name:en1, I’d have to think more.

IMHO, it’s a search engine’s job to recognize mistyped queries, and it’s very wrong to add every possible typing/transliteration variant. The name tag must hold the single correct value, and the search engine must know how to derive all other values. That’s not possible now, but it surely will be.
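To illustrate the idea of deriving variants at search time instead of storing them, here is a minimal Python sketch. It is not how Nominatim actually works internally; the function names and the list of apostrophe characters are my own assumptions.

```python
# Apostrophe-like characters that the search key ignores (an assumed list).
APOSTROPHES = "'\u2019\u2018`\u05f3"


def search_key(name: str) -> str:
    """Return a simplified key: apostrophes removed, case folded, spaces collapsed."""
    for ch in APOSTROPHES:
        name = name.replace(ch, "")
    return " ".join(name.casefold().split())


def matches(query: str, stored_names: list) -> bool:
    """True if the query equals any stored name once both are simplified."""
    key = search_key(query)
    return any(search_key(stored) == key for stored in stored_names)
```

With this kind of key, a search for “Earls Court” and a search for “Earl’s Court” both hit the same stored name, so only the spelling on the sign has to be stored.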

I have created a different kind of table that includes the first 5030 streets (sorted alphabetically by the name field).

https://docs.google.com/spreadsheet/ccc?key=0AjoRSMeOZcXDdGFaYjFZcXB6eS1RTVR1RGhmWjI3M2c

How is this table different from the one yrtimiD created?

  1. It includes all streets, whether or not something is missing (this makes it searchable, and you can most probably copy and paste the street names, with no need to translate).
  2. It marks all missing name:he and name:en fields with a different color. If somebody writes something in such a field, its background color changes to white.
  3. It also includes the update field, which changes color if somebody writes something in it.
  4. It includes the Russian and Arabic translations. As it is very easy to copy and paste things, this could help a lot in adding those translations to existing streets.

Why did I create this list only for the first 5030 entries?

Because Google Drive is not able to load all 21,000 entries in one table.
And because I think we should start with a small chunk and then extend it.
This way we will not cause any damage.

For everybody who would like to know how to create this list themselves, here is the command-line “code”.
Afterwards I loaded the result into Excel and used the sort function.


rem Convert ways and relations to single nodes and drop author information
osmconvert israel_and_palestine.osm.pbf --all-to-nodes --drop-author -o=temp\israel_nodes.o5m
rem Keep only named residential streets that are not tagged as being in a neighbouring region
osmfilter temp\israel_nodes.o5m --keep="highway=residential and name=* and is_in!=Egypt and is_in!=Jordan and is_in!=Syria and is_in!=Lebanon and is_in!=Gaza" -o=temp\israel_streets.o5m
rem Clip to the Israel polygon and write the name tags as CSV columns with a header line
osmconvert temp\israel_streets.o5m -B=poly\israel.poly --csv-headline --csv="@id @lon @lat is_in highway name name:he name:he1 name:he2 name:he3 name:ar name:ru name:en name:en1 name:en2 name:en3 name:en4 name:en5 name:en6" -o=israel_streets_withArRuEnHe.csv
pause
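
If you would rather not use Excel for the sorting step, something like the following Python sketch should do the same. It assumes the CSV is tab-separated (which I believe is the osmconvert default) and that the header line uses the column names given in the --csv option; the function names are made up for this example.

```python
import csv
import io


def load_streets(csv_text: str, delimiter: str = "\t") -> list:
    """Parse the CSV text into dicts and sort them by the name column."""
    # QUOTE_NONE: street names such as ד"ר contain quote characters
    # that must not be treated as CSV quoting.
    reader = csv.DictReader(
        io.StringIO(csv_text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    rows = list(reader)
    rows.sort(key=lambda row: row.get("name") or "")
    return rows


def missing_translations(rows: list) -> list:
    """Return the rows where name:he or name:en is empty."""
    return [row for row in rows if not row.get("name:he") or not row.get("name:en")]
```

Usage would be roughly `load_streets(open("israel_streets_withArRuEnHe.csv", encoding="utf-8").read())`, assuming the file is UTF-8. Note that the sort is by plain code point order, which may differ slightly from what a spreadsheet does.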

I don’t think that we will get there in the near future.
It is very easy to add more fields for software to search through, but it is a lot harder to create a working algorithm that takes care of the different transliteration types.
Especially for Israel, where we have such a “strange” right-to-left language :slight_smile:

@yrtimiD: Can we use my table or is there a problem with the ID or other stuff?

I have added many entries to yrtimiD’s table. I have also highlighted in yellow where I think there is probably a small mistake, but don’t want to fix it without looking at name:he1 or name:en1 first.

It would be better to work from Mr_Israel’s table (once the changes to yrtimiD’s table are imported). Currently though I find Mr_Israel’s table hard to work with, because the columns name:he1-4 are too wide. If that is changed, so that it is possible to see the most important names (name, name:he, name:en) at the same time, it will be better because it is more complete.

Hey Eric,

Of course you can change the size of the columns. This shouldn’t be an issue.
What you are describing is just a visualization problem.
We could make them very narrow and grey them out if they are empty anyway.
This would make it even easier to skip them…

I have changed the view of the table. More of it should fit in the browser window now.

@yrtimiD: Can you tell me if you can reimport my table (technically speaking)?
I personally don’t know how to do that, which is why I haven’t done anything on my table so far. But I could finish very quickly if you tell me that you are able to work with it.

I think yes, but to be sure - I’d need a version column.
Also, your ID column looks very strange to me.

If you tell me how to get your columns, then I will add them.
I’m using OSMCONVERT to create the table and the details are listed below.

yrtimiD - can you upload every entry in your table which has a number in the Update column, or for which all three entries (name, name:en, name:he) are present? After that is done, we can use Mr_Israel’s table (or a variation of it).

Mr_Israel: I must have a browser issue…

Here: https://www.dropbox.com/s/8w81rgmuxd0x7lk/israel_only.poly
is my Israel-only .poly file.
Because I don’t know the Palestinian areas well, I tried to include all settlements with Hebrew names.
Feel free to edit and publish.

I haven’t received much feedback about the problems I described.
So here again is a kind of summary… Please give me your opinion. Otherwise it doesn’t make sense to work through so much data and update OSM without talking about these problems right away.

**1. Street name translations not standardized**

Eric22: (?)
yrtimiD: (?)
Mr_Israel: I would try to standardize the street names in some way. I would recommend using one standard translation and not caring about the translation written on the street sign.

Current situation:


מנדלי מוכר ספרים        Mendele Mocher Sfarim
מנדלי מוכר הספרים        Mendele Mocher Sforim
מנדלי מוכר הספרים        Mendele Moher Sefarim
מנדלי מוכר ספרים        Mendele Moher Sefarim
מנדלי מוכר ספרים        Mendelei Mocher Sfarim
מנדלי מוכר ספרים        Mendeley Moher Sfarim
מנדלי מוכר ספרים        Mendeli Mokher Sfarmin

What I would use in all 7 cases:


name מנדלי מוכר הספרים
name:he מנדלי מוכר ספרים
name:he1 מנדלי מוכר ספרים
name:en Mendele Mocher Sfarim
name:en1 Mendele Mocher Sforim
name:en2 Mendele Moher Sefarim
name:en3 Mendelei Mocher Sfarim
name:en4 Mendeley Moher Sfarim
name:en5 Mendeli Mokher Sfarmin
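
As a rough Python sketch of how the different spellings above could be collapsed into numbered tags: the first spelling in the list becomes the main tag and each further distinct spelling gets the next number. The function name is made up, and which spelling goes first is a decision for us, not something the code can know.

```python
def numbered_tags(prefix: str, variants: list) -> dict:
    """Map the first variant to `prefix` and the rest to prefix1, prefix2, ...

    Empty strings and exact duplicates are dropped; the input order is kept.
    """
    cleaned = (variant.strip() for variant in variants)
    unique = list(dict.fromkeys(v for v in cleaned if v))
    return {
        (prefix if index == 0 else f"{prefix}{index}"): value
        for index, value in enumerate(unique)
    }
```

Calling `numbered_tags("name:en", [...])` with the seven English spellings from the current situation gives six tags, name:en to name:en5, because “Mendele Moher Sefarim” appears twice.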

**2. Ha (“ה”) or no Ha in front of the street name**

Eric22: (?)
yrtimiD: (?)
Mr_Israel:
Use the street name as written on the street sign for the “name” tag of the way and translate the words to English.
Please try as much as possible to use name:en1 for the root of the word, to make it easier to find.

Example:
name:en HaGalil
name:en1 Galil
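
A small Python sketch of that rule, assuming the English name is written in the HaXxx style (Ha directly followed by a capital letter). The function names are hypothetical, and it only looks at the start of the name, so something like “Derech HaShalom” is left alone.

```python
import re
from typing import Optional

# "Ha" at the start of the name, directly followed by a capital letter.
_HA_PREFIX = re.compile(r"^Ha(?=[A-Z])")


def root_name(name_en: str) -> Optional[str]:
    """Return the name without its leading Ha, or None if there is no such prefix."""
    stripped = _HA_PREFIX.sub("", name_en, count=1)
    return stripped if stripped != name_en else None


def ha_tags(name_en: str) -> dict:
    """Build name:en and, when a Ha prefix is present, name:en1 with the root."""
    tags = {"name:en": name_en}
    root = root_name(name_en)
    if root is not None:
        tags["name:en1"] = root
    return tags
```

Names like “Haifa” or “Hadar” are not touched, because the letter after “Ha” is lowercase there.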

**3. With an apostrophe after Ha (“ה”) or without in the English translation**

yrtimiD: (?)
Eric22:
Nominatim already removes apostrophes when you search - try searching for “Earls Court” and “Earl’s Court”, for each search you will get results with and without apostrophe. I think the name on the sign has to always be one of the stored names. Whether it should be name:en or name:en1, I’d have to think more.

Mr_Israel:
I would like to remove all apostrophes after the Ha prefix, no matter what is written on the street sign.
There is no reason to keep them and create even more variations of street names with this strange character.
It shouldn’t matter if the translation on the street sign is written with an apostrophe. We should remove all of them from the name:en tag. If you think the information should still be saved, we could save it as name:en1.
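
To make this proposal concrete, a Python sketch: the apostrophe after Ha is removed from name:en, and the original spelling is kept in the first free name:enN slot so that nothing is lost. The function name and the choice of “first free slot” are my assumptions, not an agreed rule.

```python
import re

# "Ha" at a word start, then a straight or curly apostrophe, then a capital letter.
_HA_APOSTROPHE = re.compile(r"\bHa['\u2019](?=[A-Z])")


def drop_ha_apostrophe(tags: dict) -> dict:
    """Return a copy of tags with the Ha apostrophe removed from name:en."""
    result = dict(tags)
    original = tags.get("name:en", "")
    cleaned = _HA_APOSTROPHE.sub("Ha", original)
    if cleaned != original:
        result["name:en"] = cleaned
        # Keep the spelling with the apostrophe in the first unused numbered tag.
        number = 1
        while f"name:en{number}" in result:
            number += 1
        result[f"name:en{number}"] = original
    return result
```

Other apostrophes (for example in “Earl's Court”) are left untouched, since the pattern only matches directly after “Ha”.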

I checked the osmconvert manual, and it looks like it is impossible to output the version of the object.
Also, I now know how to convert your ID values back.

I needed the version value to be sure I do not overwrite somebody else’s changes.

So, Mr_Israel, I think you can continue your fixes, and when you finish, I’ll apply all non-empty values from the changed rows back to the ways and upload them to OSM.
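
Just to show why the version matters, here is a Python sketch of the check. The data layout is invented for this example (it is not my real import tool): `rows` come from the spreadsheet, and `current` is what the server has right now.

```python
def plan_updates(rows: list, current: dict) -> tuple:
    """Decide which spreadsheet rows are safe to apply.

    rows:    dicts like {"id": 1, "version": 3, "tags": {"name:en": "..."}}
    current: way id -> {"version": int, "tags": {...}} as currently on the server
    Returns (updates, conflicts): the tags to change per way id, and the ids
    skipped because somebody else edited the way since the export.
    """
    updates = {}
    conflicts = []
    for row in rows:
        way = current.get(row["id"])
        if way is None or way["version"] != row["version"]:
            conflicts.append(row["id"])
            continue
        # Only non-empty values that differ from the server are applied.
        changed = {
            key: value
            for key, value in row["tags"].items()
            if value and way["tags"].get(key) != value
        }
        if changed:
            updates[row["id"]] = changed
    return updates, conflicts
```

Without a version column, the second condition cannot be checked, and an upload could silently overwrite a newer edit.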

I would actually prefer the Wikipedia variant of the translation in name:en (for example http://en.wikipedia.org/wiki/Mendele_Mocher_Sforim).
Also, I’d drop all 100% incorrect variants like “Mendeli Mokher Sfarmin”.

I’m really not sure we need to keep all those strange English variants. I’d keep only one, the most commonly used translation (if it’s equal to the Wikipedia variant).
For Hebrew names there must be only one value, as all the others are surely mistaken.

About Ha* and Ha’*: I really don’t know which is better or right, but we surely must normalize everything to one scheme.

So how would you like to decide which version is correct?
With or without “Ha”?

Example:

מנדלי מוכר ספרים
מנדלי מוכר **ה**ספרים

I don’t want to be the one to delete a correct value just because I think I’m more accurate than the person who did the editing.
We should try never to lose any data.

Besides that, we also have the issue of Jerusalem. Should we translate it as name:en=Jerusalem or name:en=Yerusalaim?

This will not be answered by Wikipedia.

And also I’m not sure with “Mokher” and “Moher” and “Mocher”.
iGo for example would use “Mokher”.

IMHO, the version “מנדלי מוכר הספרים” is wrong. But here we do have the wiki for an answer.
For other cases it’s quite difficult to know what is right, as we have all seen many road signs with awful typing errors.
I asked my friend (his mother studied linguistics), so tomorrow we’ll possibly get an answer about apostrophe usage after Ha.

For a start, let’s fix all typing errors and all common errors like:


א.ד. גורדיון (wrong Yud)
אהרונוביץ (no ' at the end)
אירוס ארגמן (for sure must be HaArgaman in hebrew)
אלוף הניצחון or אלוף הנצחון
אריה דולצין or אריה דולצ'ין
all variants of ארלוזורוב must be for sure ד"ר חיים ארלוזורוב
same for ביאליק must be full name and not shortcut
etc...

Ok.

After 2-3 hours of work, I have a table with a lot of fixes ready.
There is still a lot left to do in the table, but it will be very interesting to see whether the upload is really that easy.
https://docs.google.com/spreadsheet/ccc?key=0AjoRSMeOZcXDdGFaYjFZcXB6eS1RTVR1RGhmWjI3M2c

Following things have been changed so far:

  1. Tried to remove all “derech” / “roads” / “streets”.
  2. Copied from existing translations to missing translations.
  3. If there are several translation types for the same Hebrew spelling, I used name:en1, name:en2, etc. and updated all roads this way.
  4. If there were several roads of the same kind and there was an existing Arabic or Russian translation, I copied it into the other fields.
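
Points 2 and 4 above were done by hand with copy and paste, but the idea can be sketched in Python like this. The function name is made up, and it assumes that rows with the same name:he value really are the same street name. When several different translations exist for one Hebrew spelling, the first one seen wins here; the other spellings would still need name:en1, name:en2, etc. as in point 3.

```python
from collections import defaultdict


def fill_missing(rows: list, key: str = "name:he",
                 targets: tuple = ("name:en", "name:ar", "name:ru")) -> list:
    """Copy known translations to rows that share the same Hebrew name.

    Returns new row dicts; existing non-empty values are never overwritten.
    """
    known = defaultdict(dict)
    for row in rows:
        hebrew = row.get(key, "")
        if not hebrew:
            continue
        for tag in targets:
            value = row.get(tag, "")
            if value:
                # First translation seen for this Hebrew spelling wins.
                known[hebrew].setdefault(tag, value)

    filled = []
    for row in rows:
        new_row = dict(row)
        for tag, value in known.get(row.get(key, ""), {}).items():
            if not new_row.get(tag):
                new_row[tag] = value
        filled.append(new_row)
    return filled
```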

To Mr_Israel’s questions:

  1. Hebrew - as written on sign. English - For names (of people, places, etc.) - standard spelling according to Wikipedia or other resources, unless there is a note saying “I saw the road sign and the English spelling is ___” or similar.
  2. Same as Mr_Israel above
  3. I change my mind - I prefer without apostrophe, unless there is a note saying “I saw the road sign and there is an apostrophe” or similar.

(I assume that almost all English translations are based on assumptions or general knowledge, not having actually looked carefully at the spelling on the sign.)

Agreed

Not sure about this one. On Tel Aviv street signs there is no ד"ר and I don’t think the average person would think to look for this. And when I search for the address on the Israel post office website (that’s pretty official, no?) it just uses ארלוזורוב in Tel Aviv and Haifa (the only two I checked)! So I think maybe we shouldn’t add first names and titles when they aren’t already there.

If you pronounce “Yerushalayim” you’re probably already using Hebrew letters, and “Jerusalem” is universally known, and it avoids issues with all the Arabs and international people who also consider the city to be “theirs”, so I prefer “Jerusalem”. “Yerushalayim” should be a secondary name.

About name:en
This tag is for rendering the map for English-speaking people, so if a name has a well-known English translation, it must be used. And if we have such a well-known name, all transliterations like “Yerushalaim” are unneeded.

If somebody wants to review (and update) the he/en/ar/ru names of all cities and towns:
https://docs.google.com/spreadsheet/ccc?key=0AolLjmdDjvyydFUySlBiY1NwWUV1dm9wQVBrOVBCTEE