OpenStreetMap Forum

The Free Wiki World Map

#26 2017-07-13 05:21:16

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

We could have a middle ground where we manually review the autofixes (I can post them here whenever they're made). It's still much faster than manual fixes, and the review will also catch the bad edits.

Offline

#27 2017-07-13 05:33:59

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

How about a fixme="Name fixed by bot. Please review." tag?

Offline

#28 2017-07-13 08:51:13

zstadler
Member
Registered: 2012-05-05
Posts: 269
Website

Re: Automated edits for name tags

I don't think the "fixme" tagging is working. There are more than 30,000 nodes and 1,700 ways with a "fixme" tag in Israel, not to mention "fixme" in "note" tags...

Having a table with columns for "name", "name:he", "name1", and "name:he1" would be more efficient.

See this post for how to update OSM tags using a CSV file.

Offline

#29 2017-07-13 08:59:45

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

I'd say fixmes pending review indefinitely are better than name mismatches pending review indefinitely.

By the way, almost all node fixmes are from the GTFS bus stop import. It may be wise to bulk-remove those, to shine the light on the more important fixmes.

Last edited by SwiftFast (2017-07-13 10:46:15)

Offline

#30 2017-07-13 10:32:13

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

Edit: Sorry, I was confusing this with something else and this comment is wrong regarding overpass. Please ignore the previous version of this comment.

I'll not apply auto-fixes for now. I'll look into manually updating them (via the csv method or some other way).

Last edited by SwiftFast (2017-07-13 10:45:10)

Offline

#31 2017-07-13 11:11:46

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

Nope, I still don't like this. It's wasted manpower. Even if I manually fix them all, the mismatches will accumulate over time, so the manual fix must be repeated periodically, and most mismatch "errors" will in fact be legitimate edits where someone changed one name tag and forgot the other. It's mechanical drudgery.

Wouldn't it be easier to just treat the two tags as a single tag and have the bot auto-synchronize them? We also don't even need a fixme for this. If a user mis-edits a tag, it's not the synchronization's fault. Mis-edits are normal in OSM, and name tag mis-edits should be handled like any other mis-edit, through monitoring tools and such.

I can understand the need for a fixme tag for the first edit only (because some autofixes will be grabbed from deeper history), but no need for a fixme when this is synchronized periodically. By the way, it'll only add an additional 372 fixmes to the 30k already present.

Last edited by SwiftFast (2017-07-13 11:13:23)

Offline

#32 2017-07-13 13:08:28

zstadler
Member
Registered: 2012-05-05
Posts: 269
Website

Re: Automated edits for name tags

For the record, http://overpass-turbo.eu/s/qmy is a query that finds elements in Israel that have a Hebrew "name" tag and a "name:he" tag that are different.

It has an option to output a CSV file by un-commenting the CSV output definition.

The current element count is: nodes: 62, ways: 430, relations: 11

Last edited by zstadler (2017-07-13 13:09:41)

Offline

#33 2017-07-13 13:35:33

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

There are 586 in total if we count Arabic, Hebrew, and English, 372 of which are auto-fixable by swiftfast_bot.

Last edited by SwiftFast (2017-07-13 13:35:48)

Offline

#34 2017-08-01 11:19:41

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

Does everyone agree that we should always have Hebrew "name" tags duplicated at "name:he"? I asked prior to running the scripts and I think everyone agreed, but I cannot find the post anymore. (Rationale: the language of the "name" tag varies in Israel, but name:he guarantees Hebrew.)

Last edited by SwiftFast (2017-08-01 11:19:59)

Offline

#35 2017-08-01 12:08:26

zstadler
Member
Registered: 2012-05-05
Posts: 269
Website

Re: Automated edits for name tags

I agree that

If an element does not have a name:he tag,
    and all characters in the name tag are either Hebrew, digits, space, or symbol characters
then its name tag should be copied to the name:he tag

The following cases should not be handled automatically:
- name tags with foreign language characters
- name tags that differ from the name:he tags
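
For illustration, the strict rule above could be sketched in Python roughly as follows. This is not the bot's actual code. The function names are hypothetical, "Hebrew" is assumed to mean the Unicode Hebrew block (U+0590 to U+05FF), and "symbol" is assumed to mean any Unicode punctuation or symbol character.

```python
import unicodedata


def is_hebrew(ch):
    # Assumption: "Hebrew character" means the Unicode Hebrew block.
    return "\u0590" <= ch <= "\u05ff"


def is_neutral(ch):
    # Digits, whitespace, and punctuation/symbol characters.
    if ch.isdigit() or ch.isspace():
        return True
    return unicodedata.category(ch)[0] in ("P", "S")


def should_copy_name_to_he(tags):
    """Return True if `name` may be copied to `name:he` under the strict rule."""
    if "name:he" in tags or "name" not in tags:
        return False  # never overwrite an existing name:he
    name = tags["name"]
    return bool(name) and all(is_hebrew(c) or is_neutral(c) for c in name)
```

Taken literally, the rule also accepts a name made only of digits or symbols. A real script might want to exclude that case.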

Offline

#36 2017-08-01 13:21:42

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

The current rules are similar, except for line 2. I think they are as follows.

and all characters in the name tag are either Hebrew, digits, English, space, or symbol characters, with at least 1 Hebrew character

This allows copying things like "KSP מחשבים". Do you think that's a bad idea?

I should publish the source code soon. (I wanted to fully automate it first, but that's not going to happen soon).

Offline

#37 2017-08-01 13:43:42

zstadler
Member
Registered: 2012-05-05
Posts: 269
Website

Re: Automated edits for name tags

SwiftFast wrote:

This allows copying things like "KSP מחשבים". Do you think that's a bad idea?

It's not good enough because it has no notion of the English contents and would also copy "Herzl הרצל", which uses a naming scheme we cleaned up in Jerusalem a while ago.
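
To make the disagreement concrete, here is a rough Python sketch of the relaxed rule (Hebrew, digits, English, space, or symbol characters, with at least one Hebrew character). The function names are hypothetical, and the character classes are assumptions: the Unicode Hebrew block for Hebrew, and ASCII letters for English.

```python
import unicodedata


def is_hebrew(ch):
    # Assumption: Unicode Hebrew block.
    return "\u0590" <= ch <= "\u05ff"


def is_english(ch):
    # Assumption: "English" means ASCII letters.
    return ch.isascii() and ch.isalpha()


def is_neutral(ch):
    # Digits, whitespace, and punctuation/symbol characters.
    if ch.isdigit() or ch.isspace():
        return True
    return unicodedata.category(ch)[0] in ("P", "S")


def relaxed_rule_allows_copy(name):
    """True if every character is allowed and at least one is Hebrew."""
    has_hebrew = any(is_hebrew(c) for c in name)
    all_allowed = all(
        is_hebrew(c) or is_english(c) or is_neutral(c) for c in name
    )
    return has_hebrew and all_allowed
```

Under this sketch both "KSP מחשבים" and "Herzl הרצל" pass, because the check looks only at character classes and not at what the English part means.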

Offline

#38 2017-08-01 15:04:28

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

It's not good enough because it has no notion of the English contents and would also copy "Herzl הרצל", which uses a naming scheme we cleaned up in Jerusalem a while ago.

Why is that a bad thing? It wouldn't introduce a new error, it would just keep an already existing error unfixed. This is similar to Sanniu's opposition to the autofixes.


I think everything would be much simpler if we treat the bot as a convenience copy-machine. It "binds" name and name:he. If the bot copies a faulty tag, it's not the bot's fault, and it doesn't really make things worse. A human would need to fix that anyway.

Offline

#39 2017-08-01 15:18:27

zstadler
Member
Registered: 2012-05-05
Posts: 269
Website

Re: Automated edits for name tags

In that scenario, there was no error in the name:he tag, and the bot created one.
On the other hand, the human work needed for a fix is doubled by a bot.
As a person who spends significant time manually fixing errors, I would like the bots to "do no evil", rather than spreading it.

Offline

#40 2017-08-01 15:26:01

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

The bot does not introduce additional fixing effort. In the case of "Herzl הרצל", you had to fix both "name" and "name:he" regardless of the bot's work. There were two errors (missing name:he, wrong name), and there remained two errors.

(In fact, I argue the bot makes this slightly easier, because you don't have to type in "name:he" in JOSM and you just edit the value)

On the other hand, the bot saves effort by fixing many cases (like "ksp מחשבים").

Last edited by SwiftFast (2017-08-01 15:33:39)

Offline

#41 2017-08-01 15:32:08

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

Regarding autofixes (fixing cases where name and name:he both exist but mismatch, by tracing history): yes, a minority of nodes that had 1 error would have 2. But most nodes that had 1 error would have 0. The net result is less manual work.

(Autofixes are not enabled, but the code works.)

Last edited by SwiftFast (2017-08-01 15:37:39)

Offline

#42 2017-08-01 15:34:38

zstadler
Member
Registered: 2012-05-05
Posts: 269
Website

Re: Automated edits for name tags

Here are a few ideas for safe corrections that a bot can do and save manual work:

- Remove leading spaces, trailing spaces, and multiple spaces from the "name" tag and all "name:*" tags
- Add "outer" role on members of "type=polygon" and "type=boundary" relations
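
As an illustration of the first idea, a minimal Python sketch of the whitespace cleanup could look like this. The function name is hypothetical, and it assumes tags are available as a plain dict. It also treats any run of whitespace (tabs included) as a single space, which is slightly broader than "multiple spaces".

```python
def clean_name_tags(tags):
    """Return only the name tags whose value changes after cleanup.

    Leading and trailing whitespace is removed and internal runs of
    whitespace are collapsed to a single space. Tags other than
    "name" and "name:*" are ignored.
    """
    changes = {}
    for key, value in tags.items():
        if key == "name" or key.startswith("name:"):
            cleaned = " ".join(value.split())
            if cleaned != value:
                changes[key] = cleaned
    return changes
```

Returning only the changed tags makes it easy to skip elements that need no edit, so the bot would not upload empty changes.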

Offline

#43 2017-08-01 15:43:58

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

Remove leading spaces, trailing spaces, and multiple spaces from the "name" tag and all "name:*" tags

The bot supports this (and does it opportunistically). I'll just need to fetch all of the bad names with an overpass regex to do it for all nodes.

Add "outer" role on members of "type=polygon" and "type=boundary" relations

Out of scope for this script, but that's a good idea for a different script.

I'm all for saving manual work, and my point is that the "all characters in the name tag are either Hebrew, digits, English, space, or symbol characters, with at least 1 Hebrew character" rule and the "autofixes" both save a great deal of work for most nodes, despite copying an error in a minority of them.

Offline

#44 2017-08-01 16:25:18

zstadler
Member
Registered: 2012-05-05
Posts: 269
Website

Re: Automated edits for name tags

SwiftFast wrote:

    Remove leading spaces, trailing spaces, and multiple spaces from the "name" tag and all "name:*" tags

The bot supports this (and does it opportunistically), I'll just need to fetch all of the bad names with an overpass regex to do it for all nodes.

Not just for nodes. Ways and relations too.

http://overpass-turbo.eu/s/qCx

Offline

#45 2017-08-01 16:28:16

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

It supports this for all primitives. I didn't mean "nodes". Sorry.

Offline

#46 2017-08-02 08:25:01

Sanniu
Member
Registered: 2017-04-19
Posts: 21

Re: Automated edits for name tags

I've found that many fixes, done mainly by the Moveit team, are applied only to the name tag, leaving name:* unchanged. That leads to mismatches between English and Hebrew street names. Thanks to the old name:he tag, these problems are easily detectable on KeepRight, but if you run your bot, all such problems will be papered over...

Unrelated to this, I was thinking about how to detect name:he/name:en tag mismatches. Maybe a table can be created for streets that have the same name:he but a different name:en in different cities or street parts?

Offline

#47 2017-08-02 08:58:08

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

I've found that many fixes, done mainly by the Moveit team, are applied only to the name tag, leaving name:* unchanged. That leads to mismatches between English and Hebrew street names. Thanks to the old name:he tag, these problems are easily detectable on KeepRight, but if you run your bot, all such problems will be papered over...

That's a good point. I'll keep autofixes off.

Unrelated to this, I was thinking about how to detect name:he/name:en tag mismatches. Maybe a table can be created for streets that have the same name:he but a different name:en in different cities or street parts?

I've been thinking of an algorithm that compares a Hebrew and an English string and decides if they're likely the same. No table needed. It's not tested yet: "Normalize" Hebrew and English, then compare. Normalization roughly as follows:

  • We start with a Hebrew and an English string and apply these:

  • Remove all vowels (u, a, ו, etc.)

  • Lowercase everything

  • Convert all Hebrew characters to English (א > A, ב > B, and so on).

  • Normalize problematic / phonetically similar characters (e.g. b, v, ב all become b)

  • Normalize the remaining problem characters like צ, which may translate to ts or tz etc. (This requires real-world testing)

Now compare the strings. If they're not identical, mark the node as suspicious. False positives will help me refine this.
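
A rough Python sketch of the normalization described above, for illustration only. All names are hypothetical, and the mapping tables are my own assumptions rather than a standard romanization. They would need the real-world tuning already mentioned.

```python
# Illustrative tables (assumptions, deliberately incomplete).
HEBREW_VOWEL_LETTERS = set("אהויע")
LATIN_VOWELS = set("aeiouy")
HEBREW_TO_LATIN = {
    "ב": "b", "ג": "g", "ד": "d", "ז": "z", "ח": "h", "ט": "t",
    "כ": "k", "ך": "k", "ל": "l", "מ": "m", "ם": "m", "נ": "n",
    "ן": "n", "ס": "s", "פ": "p", "ף": "p", "צ": "ts", "ץ": "ts",
    "ק": "k", "ר": "r", "ש": "s", "ת": "t",
}
# Phonetically similar Latin spellings folded into one form.
# Two-letter patterns come first so "ch" is handled before "c".
LATIN_EQUIVALENTS = [
    ("tz", "ts"), ("sh", "s"), ("ch", "h"), ("kh", "h"),
    ("v", "b"), ("f", "p"), ("c", "k"), ("q", "k"),
]


def normalize(text):
    """Reduce a Hebrew or English name to a rough consonant skeleton."""
    out = []
    for ch in text.lower():
        if ch in HEBREW_VOWEL_LETTERS or ch in LATIN_VOWELS:
            continue  # drop vowels in both scripts
        if ch in HEBREW_TO_LATIN:
            out.append(HEBREW_TO_LATIN[ch])
        elif ch.isascii() and ch.isalpha():
            out.append(ch)
        # anything else (spaces, digits, punctuation) is ignored
    skeleton = "".join(out)
    for old, new in LATIN_EQUIVALENTS:
        skeleton = skeleton.replace(old, new)
    return skeleton


def looks_like_same_name(hebrew, english):
    """True if both strings normalize to the same skeleton."""
    return normalize(hebrew) == normalize(english)
```

With these tables, pairs like "נגב" / "Negev" and "גולן" / "Golan" compare equal, while "הרצל" / "Herzl" does not: the initial ה is dropped as a vowel letter and צ is mapped to "ts" rather than "z". That is the kind of false positive that would drive refinement of the tables.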

Last edited by SwiftFast (2017-08-02 08:59:39)

Offline

#48 2017-08-02 09:31:15

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

The above will mark all non-transliterations. False positives -.-

Offline

#49 2017-08-05 08:13:18

SwiftFast
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

Unrelated to this, I was thinking about how to detect name:he/name:en tag mismatches. Maybe a table can be created for streets that have the same name:he but a different name:en in different cities or street parts?

In theory, this can be done by scanning the history of all nodes: If name:he changes without name:en changing or vice-versa, mark the node as potentially bad. Optionally, unmark the nodes that seem to properly transliterate using the algorithm above to clear some noise.

The first run would be expensive, but later runs only need to inspect the deltas.
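
A minimal Python sketch of that history check, assuming each element's history is already available as a list of tag dicts ordered from oldest to newest version (fetching the history is out of scope here). The function name is hypothetical.

```python
def is_potentially_bad(versions, key_a="name:he", key_b="name:en"):
    """Flag an element if one name tag changed while the other did not.

    `versions` is a list of tag dicts, oldest first. Only edits where
    both tags exist before and after are considered, so merely adding
    a missing translation is not flagged.
    """
    for old, new in zip(versions, versions[1:]):
        both_present = all(k in old and k in new for k in (key_a, key_b))
        if not both_present:
            continue
        a_changed = old[key_a] != new[key_a]
        b_changed = old[key_b] != new[key_b]
        if a_changed != b_changed:
            return True
    return False
```

This sketch flags an element even if a later version corrects the other tag, so it over-reports. Unmarking pairs that pass a transliteration check, as suggested above, would reduce that noise.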

Offline

#50 2017-10-10 11:22:05

SafwatHalaby
Member
Registered: 2017-04-10
Posts: 311
Website

Re: Automated edits for name tags

Regarding auto-fixing name & name:lang mismatches, I'll implement this compromise instead:

Rather than auto fixing, the bot will create a log of suggestions. A log entry will be roughly as follows.

id: <node id>, name: X, name:he: Y, It is suggested to change both names to <X or Y, depending on which is more recent>
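
As a sketch only, a log entry in that format could be produced like this in Python. The function and parameter names are hypothetical. In particular, `name_is_newer` stands in for whatever history lookup decides which tag was edited more recently.

```python
def suggestion_entry(node_id, name, name_he, name_is_newer):
    """Format one suggestion line; the more recently edited value wins."""
    suggested = name if name_is_newer else name_he
    return (
        f"id: {node_id}, name: {name}, name:he: {name_he}, "
        f"It is suggested to change both names to {suggested}"
    )
```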

Offline
