Automated edits for name tags

New algorithm:

  • If name exists, deduce its language. If name:lang doesn’t exist for that language, copy name to name:lang. (If it does exist, warn when the two names differ.)

  • If only one name:lang exists but name doesn’t, copy name:lang to name. (Or warn if the language of the name:lang value isn’t really lang.)

name:lang is currently handled only for he, ar, and en.
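The two rules above can be sketched roughly as follows. This is only an illustration, not the actual bot code: the function names `deduce_lang` and `fix_names` are made up here, the language is guessed purely from the Unicode script of the letters (so any Latin-script name is assumed to be English), and every key starting with `name:` is counted as a name:lang tag.

```python
import unicodedata

SUPPORTED = ("he", "ar", "en")  # languages the thread says are handled


def deduce_lang(text):
    """Guess he/ar/en from the script of the letters; None if mixed or unknown."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore spaces, digits, punctuation, vowel points
        uname = unicodedata.name(ch, "")
        if uname.startswith("HEBREW"):
            scripts.add("he")
        elif uname.startswith("ARABIC"):
            scripts.add("ar")
        elif uname.startswith("LATIN"):
            scripts.add("en")  # assumption: Latin script means English
        else:
            scripts.add("other")
    if len(scripts) == 1 and "other" not in scripts:
        return scripts.pop()
    return None


def fix_names(tags):
    """Return (new_tags, warnings); the input dict is not modified."""
    tags = dict(tags)
    warnings = []
    name = tags.get("name")
    if name:
        lang = deduce_lang(name)
        if lang:
            key = "name:" + lang
            if key not in tags:
                tags[key] = name  # rule 1: copy name to name:lang
            elif tags[key] != name:
                warnings.append(f"name and {key} differ")
    else:
        lang_keys = [k for k in tags if k.startswith("name:")]
        if len(lang_keys) == 1:  # rule 2 applies only to a single name:lang
            key = lang_keys[0]
            lang = key.split(":", 1)[1]
            if lang in SUPPORTED:
                if deduce_lang(tags[key]) == lang:
                    tags["name"] = tags[key]  # rule 2: copy name:lang to name
                else:
                    warnings.append(f"{key} does not look like '{lang}'")
    return tags, warnings
```

For example, `fix_names({"name:en": "Haifa"})` would add `name=Haifa`, while an object carrying both name:he and name:ru but no name is left alone, because the sketch cannot tell which one to copy.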

Offline test for Israel + PL (not uploaded)


Total modifications: 77440
Total name:en to name: 743
Total name to name:en: 26520
Total name:ar to name: 435**
Total name to name:ar: 3693
Total name:he to name: 1*
Total name to name:he: 46048

* Probably low because many objects have both name:he and (name:ar or name:en or name:ru), so
 the script doesn't know which one to copy and skips them. My previous run also contributes.

** Probably high because my previous run didn't touch most of PL.

Live test on a very small scale: https://www.openstreetmap.org/changeset/49171684

Places (villages, towns, and so on) in OSM sometimes have both a node and a boundary (way or relation).
It is rather important that every tag present on both objects has the same value on each,
so that different software gets the same tag value regardless of which object it reads.

Here is a list of such errors, for example a different name:en on the node and the boundary:
http://wowik.000space.com/places/il/err.htm#err17

P.S. There are also a lot of mismatched name (name:he) tags:
https://www.openstreetmap.org/node/278470039 https://www.openstreetmap.org/way/92001545
https://www.openstreetmap.org/node/1068786999 https://www.openstreetmap.org/way/92001538
https://www.openstreetmap.org/node/278474036 https://www.openstreetmap.org/way/92001543
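A check like this can be sketched in a few lines, assuming the tags of the node and of the boundary have already been fetched into plain dictionaries. The function name `conflicting_tags` is hypothetical and is not taken from any existing tool.

```python
def conflicting_tags(node_tags, boundary_tags):
    """Return {key: (node_value, boundary_value)} for shared keys that differ."""
    conflicts = {}
    # Only keys present on both objects are compared.
    for key in sorted(node_tags.keys() & boundary_tags.keys()):
        if node_tags[key] != boundary_tags[key]:
            conflicts[key] = (node_tags[key], boundary_tags[key])
    return conflicts
```

Keys that exist on only one of the two objects are ignored, since the concern above is about tags that exist on both at the same time.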

Sounds like a good idea, but I think it should be in a separate script.

Actually, I don’t think a script can do it. You’d need a human to manually tell which is the preferred name, right?

Edit: At least partial automation is possible: I could also obtain a list of the official names, and assign higher priority to them, when there’s a conflict.
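A rough sketch of that priority rule, assuming the official names are available as a simple set of strings (where that list would come from is not settled here, and `pick_preferred` is a made-up name):

```python
def pick_preferred(candidates, official_names):
    """Return the single candidate found in official_names, else None.

    None means the conflict still needs a human decision.
    """
    # dict.fromkeys removes duplicates while keeping the original order.
    unique = list(dict.fromkeys(candidates))
    official = [c for c in unique if c in official_names]
    return official[0] if len(official) == 1 else None
```

If neither name is official, or both are, the sketch returns None and leaves the decision to a person, so it only covers the partial automation described above.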

I just conducted a run on the Northern District. Some problems cannot be fixed automatically and need human help; they are listed in the changeset comment:

https://www.openstreetmap.org/changeset/49223571

Changes applied to Haifa District. I’m going to pause for a few days to see whether any issues arise before doing the remaining districts.

Haifa errors that need human help: https://www.openstreetmap.org/changeset/49225264

It appears we have two different space characters in names.


\u00A0 - No-Break Space (NBSP)
\u0020 - Space (SP)
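Finding the affected names is straightforward. The sketch below assumes that NBSP in a name should simply become a regular space, which has not been decided in this thread; the helper names are made up.

```python
NBSP = "\u00a0"  # no-break space
SP = "\u0020"    # regular space


def names_with_nbsp(tags):
    """Return the name* tags whose value contains a no-break space."""
    return {k: v for k, v in tags.items() if k.startswith("name") and NBSP in v}


def normalize_spaces(value):
    """Replace every no-break space with a regular space."""
    return value.replace(NBSP, SP)
```

Two names that differ only in the kind of space look identical on screen, so normalizing before comparing name and name:lang would avoid false mismatch warnings.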

@wowik Many places have node, landuse=residential, and also an administrative boundary from the MOIN import. So that’s 3 places per … place.

Please note that massive updates are causing delays in the updates of the Israel Hiking and Biking maps.
Since 2 June it has taken more than 36 hours to complete the map updates, compared to 1-4 hours previously.

Also see https://www.facebook.com/groups/994960670559126/permalink/1360244540697402/

Thanks for notifying me!

  1. 99%+ of the changes were name to name:he copies. This should be a one-time load.

  2. I updated 5 of the 7 Israel districts in one week. I could have spread this over several weeks had I considered the load. I will update the remaining 2 districts at intervals of one week or more, and I am pausing the updates for now until things propagate properly.

  3. I am glad you reported this now. I was about to run it for the remaining districts.

I halted my changes for now, but is IHM down? And is it related?

https://www.facebook.com/groups/994960670559126/permalink/1364330870288769/

Would this algorithm addition be acceptable?

  • If name and name:he mismatch (and they’re both Hebrew), see which one was the most recently updated, and update the other one accordingly.
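To make the proposal concrete, here is a rough sketch of how "most recently updated" could be decided. It assumes the object's history is available as a list of tag dictionaries, one per version, ordered oldest to newest; the function names are hypothetical and this is not the bot's actual code.

```python
def last_changed(history, key):
    """Index of the version in which `key` received its current value."""
    current = history[-1].get(key)
    idx = len(history) - 1
    # Walk backwards while the previous version already had the same value.
    while idx > 0 and history[idx - 1].get(key) == current:
        idx -= 1
    return idx


def newer_value(history, a="name", b="name:he"):
    """Return the more recently edited of the two values, or None."""
    latest = history[-1]
    if latest.get(a) == latest.get(b):
        return None  # no mismatch, nothing to do
    ia, ib = last_changed(history, a), last_changed(history, b)
    if ia == ib:
        return None  # both changed in the same version: needs a human
    return latest[a] if ia > ib else latest[b]
```

The sketch only says which value is newer; whether the newer value is also the correct one is exactly the open question.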

Strongly against it - I found many streets where the last edit was problematic. When you make an automatic edit on top of a problematic one, it becomes very hard to spot the error and harder to find its source.

I accidentally pushed an experiment. Will revert in a second.

Reverted. Further explanation: The autofix code has been ready for a while, but didn’t work due to a bug in the scripting plugin. Today that bug was fixed, so I went ahead and tested the code, and accidentally uploaded the result as well.

Now that it works, we need to decide if it’s desired.

I believe the benefits are greater than the drawbacks, but I am willing to stand corrected. Here is why:

  • Most people make good edits (I hope!). If so, the last edit should usually be correct. We need to check the diff to confirm this: #50233725 (which is now reverted)

  • Name/name:lang mismatches are very hard to detect manually, because sometimes it’s the good name that renders or shows up in editor descriptions. Because of that, these mistakes sometimes survive for months or even years. Example here. I argue it’s often easier to see the bad edit when it appears in both tags. Quite often, a name mismatch can only be detected by someone actively looking for mismatches.

  • When I find a mismatch, I often trace the history to figure out which name is newer. That’s mechanical and boring, and the bot can do it for me, faster.

  • I want the bot to “simulate” a single name tag. When you edit one tag, you are forced to edit the other whether it’s a good or a bad edit. I believe this makes life easier.

We could have a middle ground, where we manually review the autofixes (I can post them here whenever they’re made). It’s still much faster than manual fixes, and will also catch the bad edits.

How about a fixme=“Name fixed by bot. Please review.”?

I don’t think the “fixme” tagging is working. There are more than 30,000 nodes and 1,700 ways with a “fixme” tag in Israel, not to mention “fixme” in “note” tags…

Having a table with columns for “name”, “name:he”, “name1”, and “name:he1” would be more efficient.

See this post for how to update OSM tags using a CSV file.