Automated edits for name tags

The bot does not introduce additional fixing effort. In the case of “Herzl הרצל”, you had to fix both “name” and “name:he” regardless of the bot’s work. There were two errors (missing name:he, wrong name), and there remained two errors.

(In fact, I argue the bot makes this slightly easier, because you don’t have to type “name:he” into JOSM; you just edit the value.)

On the other hand the bot alleviates effort by fixing many cases (like “ksp מחשבים”).

Regarding autofixes (fixing cases where “name” and “name:he” exist but mismatch, by tracing history): yes, a minority of nodes that had 1 error would have 2, but most nodes that had 1 error would have 0. The net result is less manual work.

(autofixes are not enabled but the code works)

Here are a few ideas for safe corrections that a bot can do and save manual work:

  • Remove leading spaces, trailing spaces, and multiple spaces from the “name” tag and all “name:*” tags
  • Add an “outer” role to members of “type=multipolygon” and “type=boundary” relations (see the sketch after this list)
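
For the second item, here is a minimal sketch of what such a fix could look like, assuming the relation arrives in the Overpass/OSM JSON shape. It is deliberately naive: a role-less member is not necessarily an outer ring, so a real bot would also need geometry checks.

```python
# Sketch: fill in a missing role on members of multipolygon/boundary
# relations. Naive on purpose: it assumes role-less way members are
# outer rings, which a real bot would have to verify geometrically.
def fill_outer_roles(relation):
    if relation.get("tags", {}).get("type") not in ("multipolygon", "boundary"):
        return False
    changed = False
    for member in relation.get("members", []):
        if member["type"] == "way" and member.get("role", "") == "":
            member["role"] = "outer"
            changed = True
    return changed
```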

The bot supports the whitespace cleanup (and does it opportunistically); I’ll just need to fetch all of the bad names with an Overpass regex to do it for all nodes.
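
A minimal sketch of what that fetch-and-fix pass could look like, assuming the public Overpass endpoint and the requests library; the query, the area filter, and the regex are illustrative, not the bot’s actual code:

```python
import re
import requests

# Illustrative Overpass query: primitives in Israel whose "name" has a
# leading space, a trailing space, or two consecutive spaces.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"
QUERY = """
[out:json][timeout:120];
area["ISO3166-1"="IL"]->.il;
nwr["name"~"^ | $|  "](area.il);
out tags;
"""

def normalize_whitespace(value):
    # Collapse runs of spaces and strip both ends.
    return re.sub(r" {2,}", " ", value).strip()

elements = requests.post(OVERPASS_URL, data={"data": QUERY}).json()["elements"]
for element in elements:
    for key, value in element.get("tags", {}).items():
        if key == "name" or key.startswith("name:"):
            fixed = normalize_whitespace(value)
            if fixed != value:
                print(element["type"], element["id"], repr(value), "->", repr(fixed))
```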

Out of scope for this script, but that’s a good idea for a different script.

I’m all for saving manual work, and my point is that the “all characters in the name tag are either Hebrew, digits, English, space, or symbol characters, with at least 1 Hebrew character” rule and the “autofixes” have a net result of vast work savings for most nodes, despite copying an error in a minority of nodes.
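
As a rough illustration, that character rule can be checked with a single pair of regexes; the Hebrew block range and the exact symbol set below are my assumptions, not the bot’s actual definition:

```python
import re

# Sketch of the rule: every character must be Hebrew, a digit, English,
# a space, or a symbol, and at least one Hebrew character must appear.
HEBREW = "\u0590-\u05FF"  # assumed Hebrew block
ALLOWED = re.compile(f"[{HEBREW}0-9A-Za-z '.,()/&-]+")  # assumed symbol set
HAS_HEBREW = re.compile(f"[{HEBREW}]")

def name_is_hebrew_safe(name):
    return bool(ALLOWED.fullmatch(name)) and bool(HAS_HEBREW.search(name))

assert name_is_hebrew_safe("הרצל 12")
assert not name_is_hebrew_safe("Herzl")  # no Hebrew character
```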

Not just for nodes. Ways and relations too.

http://overpass-turbo.eu/s/qCx

It supports this for all primitives. I didn’t mean “nodes”. Sorry.

I’ve found that many fixes, done mainly by the Moveit team, are applied only to the “name” tag, leaving name:* unchanged. That leads to mismatches between English and Hebrew street names. Thanks to the old name:he tag, these problems are easily detectable on KeepRight, but if you run your bot, all such problems will be papered over…

Unrelated to this, I was thinking about how to detect name:he/name:en tag mismatches. Maybe a table can be created for streets that have the same name:he but different name:en in different cities or street segments?

That’s a good point. I’ll keep autofixes off.

I’ve been thinking of an algorithm that compares a Hebrew and an English string and decides whether they’re likely the same. No table needed. It’s not tested yet: “normalize” both the Hebrew and the English, then compare. Normalization is roughly as follows:

  • We start with a Hebrew and an English string and apply these steps:

  • Remove all vowels (u, a, ו, etc.)

  • Lowercase everything

  • Convert all Hebrew characters to English (א → A, ב → B, and so on)

  • Normalize problematic / phonetically similar characters (e.g. b, v, and ב all become b)

  • Normalize the remaining problem characters, like צ, which may transliterate to ts, tz, etc. (This requires real-world testing)

Now compare the strings. If they’re not identical, mark the node as suspicious. False positives will help me refine this.
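
Here is an untested sketch of that normalization; both tables are partial placeholders that real-world testing would have to grow, and the phonetic merges (e.g. folding z into ts so that צ matches both renderings) are my guesses:

```python
# Untested sketch of the normalize-and-compare idea. The tables are
# partial, assumed placeholders.
HEBREW_TO_LATIN = {
    "ב": "b", "ג": "g", "ד": "d", "ה": "h", "ז": "z", "ח": "h",
    "ט": "t", "כ": "k", "ך": "k", "ל": "l", "מ": "m", "ם": "m",
    "נ": "n", "ן": "n", "ס": "s", "פ": "p", "ף": "p", "צ": "ts",
    "ץ": "ts", "ק": "q", "ר": "r", "ש": "sh", "ת": "t",
}
VOWELS = set("aeiou") | set("אוי")  # Latin vowels plus matres lectionis

def normalize(s):
    s = s.lower()
    # Convert Hebrew letters to rough Latin equivalents (partial table).
    s = "".join(HEBREW_TO_LATIN.get(ch, ch) for ch in s)
    # Merge phonetically similar spellings (assumed merges).
    s = s.replace("v", "b").replace("w", "b").replace("tz", "ts").replace("z", "ts")
    # Drop vowels, then keep letters only (drops digits, spaces, symbols).
    s = "".join(ch for ch in s if ch not in VOWELS)
    return "".join(ch for ch in s if ch.isalpha())

def likely_same(hebrew_name, english_name):
    return normalize(hebrew_name) == normalize(english_name)

print(likely_same("הרצל", "Herzl"))  # True: both normalize to "hrtsl"
```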

The above will mark all non-transliterations, i.e. names that are translated rather than transliterated. False positives -.-

In theory, this can be done by scanning the history of all nodes: If name:he changes without name:en changing or vice-versa, mark the node as potentially bad. Optionally, unmark the nodes that seem to properly transliterate using the algorithm above to clear some noise.

The first run would be expensive, but later runs only need to inspect the deltas.
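
A sketch of the delta idea, assuming the node’s history is already fetched as a chronological list of tag dicts (the API call itself is omitted):

```python
# Mark a node whose name:he changed in a version where name:en did not,
# or vice versa. `versions` holds the tag dicts in chronological order.
def has_one_sided_name_edit(versions):
    for prev, curr in zip(versions, versions[1:]):
        he_changed = prev.get("name:he") != curr.get("name:he")
        en_changed = prev.get("name:en") != curr.get("name:en")
        if he_changed != en_changed:  # exactly one side was touched
            return True
    return False

versions = [
    {"name:he": "הרצל", "name:en": "Herzl"},
    {"name:he": "שדרות הרצל", "name:en": "Herzl"},  # only name:he edited
]
assert has_one_sided_name_edit(versions)
```

A pass with the likely_same check sketched above could then unmark the nodes that still transliterate cleanly.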

Regarding auto-fixing name & name:lang mismatches, I’ll implement this compromise instead:

Rather than auto-fixing, the bot will create a log of suggestions. A log entry will look roughly as follows:

id: <node id>, name: X, name:he: Y, It is suggested to change both names to <X or Y, depending on which is more recent>
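
A sketch of producing that entry; the he_newer flag (which of the two tags was edited more recently) is assumed to come from the same history tracing:

```python
# Log a suggestion instead of editing. `he_newer` tells whether name:he
# was edited more recently than name (derived from history tracing).
def suggestion_entry(osm_id, name, name_he, he_newer):
    suggested = name_he if he_newer else name
    return (f"id: {osm_id}, name: {name}, name:he: {name_he}, "
            f"It is suggested to change both names to {suggested}")

print(suggestion_entry(123456, "Herzl", "הרצל", he_newer=True))
```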

zstadler, I think we have fundamental/philosophical differences (and that’s ok). From your view, a bot should simply never make a bad edit.

From my view, a bot’s ultimate purpose is to reduce mechanical human work, and so, if the following criteria are met, it’s a good bot:

  • It edits with an extremely high ratio of “good edits” to “bad edits”.

  • The “bad edits” are always caused by trusting the judgment of a human who previously inserted bad data.

  • The problems with the “bad edits” are very small and repairing them is not significantly harder than before the bot’s edit.

For instance, auto-fixing “name” and “name:he” mismatches qualifies as ok under those criteria:

  • Almost all autofixes will be good, because almost all name/name:he mismatches happen when someone forgetfully updated only one tag.

  • Bad edits occur when someone introduces a bad “name”, and then the bot copies it to “name:he” or vice versa.

  • The damage of the bad edit is nearly zero. You already had a “name” tag which needed human attention, and it was copied to “name:he”. But if a human reviewer later corrects ONE of those tags, the bot will auto-modify the other. No extra work was added.

Those who are following the thread already know, but it’s worth noting that this autofix feature is disabled as of now. (There are other problems related to name:en, e.g. see Sanniu’s post.)

@SafwatHalaby, I agree that we have different approaches.

I think that the value of a potential bad action should be compared to the value of a good action. For example, compare not having a name:he tag with copying a non-Hebrew name to the tag. Adding errors to the OSM DB cannot be justified by doing many low-value-added edits.

I’m especially worried about automatically overriding tag values, as opposed to setting non-existent tags.
For example, when both “name” and “name:he” exist and are different, the mismatch is often the result of an edit by an inexperienced editor: the new editor changed the “name” tag whose value a more experienced editor had earlier entered into both “name” and “name:he”.

My approach to bulk edits has been to manually check and approve them before uploading to OSM. If you had done that, I assume changeset 49032230 would have happened before changeset 49031645.

@zstadler,

I think that using name mismatches as a sort of improvised new-user tripwire system is a very poor version of proper QA, because that tripwire only catches a subset of inexperienced-user mistakes. Other simple actions, like adding a new POI with a bad name, will evade the tripwire. So we have to do proper QA (e.g. Osmcha with the new-user filter) to catch all errors, and with proper QA software, the tripwire becomes redundant.

Moving something which is not Hebrew to “name:he” is something I’d consider a serious issue. It damages previously healthy primitives without any user intervention.

The changesets you linked to are a good point. I don’t do full reviews except for the smaller edits, though I always take a sample. But they’re also a good example of what I’d consider a non-serious bot error: a user added a bad value, which the bot relied on and copied. Unlike hypothetically copying Arabic to name:he, no serious damage was done, because the POIs were already damaged in the first place.

Barring the “tripwire” trick, are there any other reasons why overriding should be treated differently from name copying? Suppose a bad edit adds a new POI with name=“םשלםבז”, and the bot then copies that to name:he, adding extra “damage”.

Although this specific case is closed (I won’t do autofixes), I am interested in understanding your viewpoint, because I feel similar disagreements may arise later on for whatever future scripts there may be.

I think we should use any semi/automated way to try and catch potential map issues. A mismatch between name and all name:* tags is an indication of an editing error. This hint is gone once an override happens.

Since its start, Osmcha has been used just once to review an edit in Israel (your review from 8 days ago). This is not an effective means of improvement.

Personally, I prefer going over lists of potential map issues and fixing them one way or another to blindly reviewing changesets with Osmcha… When I fix an error made by a new editor (and if he/she is still actively editing), I can send a polite note that explains the mistake and suggests a way to avoid it in the future.

Ok. I suppose it’s all good as long as the bot isn’t hiding human errors then.

I’ve reviewed literally all Israel edits since shortly after Osmcha’s start (edits by experienced users only skimmed). I don’t bother clicking the “reviewed” button or even logging in; perhaps I should change that habit. My history has quite a few fixes and reverts that were found thanks to Osmcha.


Osmcha does not stop you from doing that, and I often do it. There’s a quick link to the changeset URL at OSM, allowing easy comments.

Interestingly, this week the OSMWeekly newsletter reported:

It seems that the author is also concerned with the effects of bad mechanical edits.

In a reply later in the thread the author says:

Fortunately or not, in Israel there isn’t a large community that can do the cleanup…

Different views regarding bots and automated edits are expressed in the thread.

P.S.,
Some automated edits are safe and very welcome. For example, removing extra whitespace in names (consecutive blanks, leading and trailing blanks), which I currently do semi-manually using this Overpass-Turbo query. I cannot think of a scenario where such an edit can introduce an error.

For the record: thread archive.

I don’t think the bot ever introduced anything that required later manual cleanup, though, and please do correct me if I’m wrong. It’s important to me to make the bot as non-intrusive as possible.

(Even the now-canceled autofixes wouldn’t have spawned cleanup jobs from scratch; bad edits that are already there would merely get copied to one more tag.)