Automated edits for name tags

Regarding auto-fixing name & name:lang mismatches, I’ll implement this compromise instead:

Rather than auto-fixing, the bot will create a log of suggestions. A log entry will look roughly as follows.


id: <node id>, name: X, name:he: Y, It is suggested to change both names to <X or Y, depending on which is more recent>
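For illustration only (this is not the bot's actual code), here is a minimal sketch of how such a suggestion entry could be produced. Deciding which value is more recent would require walking the node's edit history; the helper below does that via the public OSM API history endpoint and assumes versions are returned oldest first. Error handling is omitted.

```python
import requests
import xml.etree.ElementTree as ET

API = "https://api.openstreetmap.org/api/0.6"

def last_change_version(node_id, key):
    """Return the version number in which the value of `key` last changed."""
    xml = requests.get(f"{API}/node/{node_id}/history").text
    versions = ET.fromstring(xml).findall("node")
    last, prev = 0, None
    for v in versions:  # assumption: history is returned oldest first
        tags = {t.get("k"): t.get("v") for t in v.findall("tag")}
        value = tags.get(key)
        if value != prev:
            last = int(v.get("version"))
        prev = value
    return last

def suggestion(node_id, name, name_he):
    """Format one log entry, preferring the value that was edited more recently."""
    newer = (name
             if last_change_version(node_id, "name")
                >= last_change_version(node_id, "name:he")
             else name_he)
    return (f"id: {node_id}, name: {name}, name:he: {name_he}, "
            f"It is suggested to change both names to {newer}")
```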

zstadler, I think we have fundamental/philosophical differences (and that’s ok). From your view, a bot should simply never make a bad edit.

From my view, a bot’s ultimate purpose is to reduce mechanical human work, and so, if the following criteria are met, it’s a good bot:

  • It edits with an extremely high “good edits” to “bad edits” ratio.

  • The “bad edits” are always caused by trusting the judgment of a human who previously inserted bad data.

  • The problems with the “bad edits” are very small and repairing them is not significantly harder than before the bot’s edit.

For instance, auto-fixing “name” and “name:he” mismatches qualifies as ok under those criteria:

  • Almost all autofixes will be good, because almost all name/name:he mismatches happen when someone forgetfully updated only one of the tags.

  • Bad edits occur when someone introduces a bad “name”, and then the bot copies it to “name:he” or vice versa.

  • The damage from such a bad edit is nearly zero. You already had a “name” tag which needed human attention, and it was merely copied to “name:he”. If a human reviewer later corrects ONE of those tags, the bot will auto-modify the other, so no extra work was added.

Those who are following the thread already know, but it’s worth noting that this autofix feature is currently disabled. (There are other problems related to name:en; see, e.g., Sanniu’s post.)

@SafwatHalaby, I agree that we have different approaches.

I think that the value of a potential bad action should be weighed against the value of a good action. For example, compare not having a name:he tag with copying a non-Hebrew name to that tag. Adding errors to the OSM DB cannot be justified by doing many low-value-added edits.

I’m especially worried about automatically overriding tag values, as opposed to setting non-existent tags.
For example, when both “name” and “name:he” exist and are different, the mismatch is often the result of an edit by an inexperienced editor: the newcomer changed a “name” value that a more experienced editor had earlier entered into both “name” and “name:he”.

My approach to bulk edits has been to manually check and approve them before uploading to OSM. If you had done that, I assume changeset 49032230 would have happened before changeset 49031645.

@zstadler,

I think that using name mismatches as a sort of improvised new-user tripwire is a very poor version of proper QA, because that tripwire only catches a subset of inexperienced-user mistakes. Other simple actions, like adding a new POI with a bad name, will evade the tripwire. So we have to do proper QA (e.g. Osmcha with the new-user filter) to catch all errors, and with proper QA software the tripwire becomes redundant.

Moving something which is not Hebrew to “name:he” is something I’d consider a serious issue. It damages previously healthy primitives without any user intervention.

The changesets you linked to are a good point. I don’t do full reviews except for the smaller edits; I always take a sample, though. But they’re also a good example of what I’d consider a non-serious bot error: a user added a bad value, which the bot relied on and copied. Unlike hypothetically copying Arabic to name:he, no serious damage was done, because the POIs were already damaged in the first place.

Barring the “tripwire” trick, are there any other reasons why overriding should be treated differently from name copying? Suppose a bad edit adds a new POI with name=“םשלםבז”, and the bot then copies that to name:he, adding extra “damage”.

Although this specific case is closed (I won’t do autofixes), I am interested in understanding your viewpoint, because I feel similar disagreements may arise later on for whatever future scripts there may be.

I think we should use any semi-automated or automated means to try to catch potential map issues. A mismatch between name and the name:* tags is an indication of an editing error, and this hint is gone once an override happens.

Since its start, osmcha has been used just once to review an edit in Israel (your review from 8 days ago). This is not an effective means of improvement.

Personally, I would rather go over lists of potential map issues and fix them one way or another than blindly review changesets with osmcha… When I fix an error by a new editor (and if he/she is still actively editing), I can send a polite note that explains the mistake and suggests a way to avoid it in the future.

Ok. I suppose it’s all good as long as the bot isn’t hiding human errors then.

I’ve reviewed literally all Israel edits since shortly after Osmcha’s start (edits by experienced users only skimmed). I don’t bother clicking the “reviewed” button or even logging in; perhaps I should change that habit. My history has quite a few fixes and reverts that were found thanks to Osmcha.


Osmcha does not stop you from doing that, and I often do it. There’s a quick link to the changeset URL at OSM, allowing easy comments.

Interestingly, this week the OSMWeekly newsletter reported:

It seems that the author is also concerned with the effects of bad mechanical edits.

In a reply later in the thread the author says:

Fortunately or not, in Israel there isn’t a large community that can do the cleanup…

Different views regarding bots and automated edits are expressed in the thread.

P.S.,
Some automated edits are safe and are very welcome. For example, removing extra whitespace in names (consecutive blanks, leading and trailing blanks), which I currently do semi-manually using this Overpass-Turbo query. I cannot think of a scenario where such an edit could introduce an error.
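For illustration, a minimal sketch of that cleanup as a plain string transformation (this is not the Overpass-Turbo query itself, just the equivalent normalization one might apply before uploading):

```python
import re

def clean_name(value: str) -> str:
    """Collapse runs of whitespace to a single space and trim leading/trailing blanks."""
    return re.sub(r"\s+", " ", value).strip()

# Example: extra blanks are removed, the meaningful text is untouched.
assert clean_name("  Tel   Aviv ") == "Tel Aviv"
```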

For the record: thread archive.

I don’t think the bot ever introduced stuff that required later manual cleanup, though; please do correct me if I’m wrong. It’s important for me to make the bot as non-intrusive as possible.

(Even the now-canceled autofixes wouldn’t have spawned cleanup jobs from scratch; only if bad edits were there in the first place would they get copied to one more tag.)

I’ll add this to my to-do list. Perhaps I’ll even do this globally, if the wider community agrees.

As long as the target tags did not exist, I’m happy. I would not expect a bot to do semantic checks.

I would expect a bot to do reasonable structural checks, as the name-copying bot already does: for example, avoiding copying a name that has no Hebrew letters to name:he, and similarly for Arabic. It could be improved, however, to also avoid copying non-textual “name” values to “name:en”. Perhaps another bot could fix the 1000+ existing such duplicates.
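A minimal sketch of the kind of structural check described above, assuming only the basic Hebrew (U+0590–U+05FF) and Arabic (U+0600–U+06FF) Unicode blocks; the actual bot’s rules may well differ:

```python
import re

HEBREW = re.compile(r"[\u0590-\u05FF]")   # basic Hebrew block only (assumption)
ARABIC = re.compile(r"[\u0600-\u06FF]")   # basic Arabic block only (assumption)
LETTER = re.compile(r"[^\W\d_]")          # any letter in any script

def may_copy(name: str, target: str) -> bool:
    """Return True if `name` looks structurally acceptable for the target tag."""
    if target == "name:he":
        return bool(HEBREW.search(name))
    if target == "name:ar":
        return bool(ARABIC.search(name))
    if target == "name:en":
        # avoid copying non-textual values (e.g. "13" or "?") to name:en
        return bool(LETTER.search(name))
    return True
```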

I completely agree. I’ll fix this.

@zstadler: I’ve thought about it and I agree with your view regarding autofixes. Detecting new user errors is more valuable than copying a few dozen extra names.

I still think I need to print logs or add fixmes, so that a human is more likely to fix the mismatches.

name:lang is no longer copied to name, ever. Only name is copied to name:lang, and only when name:lang does not exist.
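In other words (a minimal sketch of that rule, not the bot’s actual code; the structural check from the earlier sketch is assumed to run separately):

```python
def propose_copy(tags: dict, lang: str) -> dict:
    """Return the tag change the bot would make (empty dict = no change).

    Only fills a missing name:<lang> from name; never overwrites an existing
    tag and never touches name itself.
    """
    key = f"name:{lang}"
    if "name" in tags and key not in tags:
        return {key: tags["name"]}
    return {}
```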

fixed

If you have the time, the following errors need human attention.

https://wiki.openstreetmap.org/wiki/User:SafwatHalaby/scripts/nameCopy/errors

I’ve just run this script (after quite a long absence). Hopefully I’ll be running it regularly from now on. Full automation is being investigated too.

Changeset: https://www.openstreetmap.org/changeset/85311424

Welcome back!
I will try to look at and fix the discrepancies found.
Are you going to update bus stops too?

I’ve executed the bus update script today (last executed in 2018!)

I am planning to make that fully automatic too.

Changesets:
https://www.openstreetmap.org/changeset/91500256
https://www.openstreetmap.org/changeset/91502029