wikidata tag added to thousands of place nodes. Automated mass edit?

Setting aside the fact that “public domain” isn’t really a legal status in England and Wales (where OSM is based) and that it ignores database rights, the argument that’s been put forward is that using the wikidata location for matching clearly is using that data to contribute to OSM.

I have to say that where I have looked at local wikidata contributions near me (not from you), there’s no evidence of proximity-based matching (indeed, some of the matching errors suggest that the local matching I’m looking at is based on name only). However, you seem to be using a different mechanism (see https://github.com/mapbox/mapping/issues/242#issuecomment-261457939 ).
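
(As an aside, for anyone unfamiliar with the terminology, here is a rough, purely illustrative sketch of the difference between the two matching approaches. The function names, the wd_items structure and the thresholds are made up for the example and are not taken from any tool discussed in this thread.)

# Purely illustrative: the two matching strategies described above.
# "wd_items" is a hypothetical list of Wikidata entries, each with a label and a coordinate.

def match_by_name(osm_name, wd_items):
    """Name-only matching: only the label is compared; the Wikidata coordinate is never consulted."""
    return [item for item in wd_items if item["label"] == osm_name]

def match_by_proximity(osm_lat, osm_lon, wd_items, max_deg=0.25):
    """Proximity matching: the Wikidata coordinate is actively used to pick a match
    (a crude degree-box comparison here, just to show where the coordinate enters the process)."""
    return [item for item in wd_items
            if abs(item["lat"] - osm_lat) <= max_deg and abs(item["lon"] - osm_lon) <= max_deg]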

Whether it’s OK to use Google-derived data in this way is something that as a community we’d need to discuss. In the context of the UK it has been discussed on the “imports” list (see https://lists.openstreetmap.org/pipermail/imports/2016-March/thread.html#4342 ), and other communities may have had similar discussions that I’m unaware of. I’m not a lawyer and am unaware of any English and Welsh case law that could argue either for or against these concerns being valid, but do think that additions such as this should be discussed within the wider community - in the UK some communities were in favour of importing wikidata (which occurred) and some were against.

Of course, many people have added wikidata links implicitly (via iD) or explicitly with local knowledge - no-one’s complaining about that process because the wikidata location isn’t being used in the match; local knowledge is.

I’m going to play devil’s advocate:

Let’s say someone knows Berlin’s name and general location
So what we know is:
Name: Berlin
Location: Somewhere in east Germany, near Poland, kind of

If that user looked up “Berlin”, well, guess what: he might be using Google-derived data. Someone could have created the Wikipedia page and Wikidata entry based entirely on Google’s location and name service. So if your logic is applied to the name as well, you could never match any data from Wikidata or Wikipedia, as it might have been created with Google Maps/Bing Maps (lol) data, and visually confirming it on the screen is still using said derived data.

There’s a difference between using knowledge to look something up versus blatantly copying data to create something new. I will now take OSM’s example of looking stuff up:
What we know:
Name: Berlin
Location: 52.5170365, 13.3888599
So if we take the same steps as looking up the city, it comes down to the same thing as “using local knowledge”. OSM data has been created and shared by local mappers who are sharing said knowledge that Berlin is indeed located in East Germany.

To my knowledge Google cannot copyright GPS locations of places when the source of the GPS location is local knowledge (OSM editors). Again, no data is being derived from wikidata; only the link or ID to the offsite database is being added.

It would be as if you had to look up the website of, say, Berlin Cathedral Church. So you launch Google, look up “Berlin Cathedral Church”, find the website http://www.berlinerdom.de/index.php and decide to add it to OSM. At that point are you deriving data from Google? No, because it’s factual and verifiable information.

p.s. WHAT I DO NOT SUPPORT: using wikidata to create places/nodes in OSM, as those may be derived from copyrighted materials.

So just to confirm - you are just using “name and local knowledge” searching and you are not using “location and proximity” searching (which was described in https://github.com/mapbox/mapping/issues/242#issuecomment-261457939 )?

With regard to statements such as “Google cannot copyright gps locations” I suspect you need to read up a little bit about England and Wales law (as opposed to others). There’s lots of prior discussion from around the time of the licence change; wikipedia’s starter page on the EU Database Directive is https://en.wikipedia.org/wiki/Database_Directive .

I was not using proximity (geolocation) to look up IDs for this. Most (I say most, as some was from my geography lessons - would that count as copyrighted material?) of the information came from the name and the OSM node location (i.e. Berlin is in east Germany, in its own city-state, but most importantly in the middle-ish of Brandenburg state, just as I know London is in south-east UK and not a city near Toronto, Ontario, Canada).

I’ve been to Germany many times during my employment with my previous employer, so I know Germany well.

Thanks - good to know.

LogicalViolinist or DenisCarriere could you clarify how this tool is being used? Looking at the changesets, there are hundreds of changes being made per second, more than could possibly be done entirely manually in that time or even just reviewed. Are files somehow prepared with more thorough review and then just uploaded in rapid sequence?

There’s a real problem with uploading thousands and thousands of changes without explaining much about how you are doing it (I think I’ve read most of the explanations and am too thick to have actually figured out how I would do it so quickly myself) and then just saying that someone who wants to question the changes has to review them all. By the time they have done any review, you will have uploaded thousands more changes. This is part of the reason for the policy on automated edits. Not to make life difficult for people with good ideas, but to make sure that other people have a chance to understand what is being done prior to the work being carried out, to make sure that poor edits can be improved and so on.

The same is going on for thousands of relations:

e.g. https://www.openstreetmap.org/changeset/43751676

regards
walter

I read a post on talk-ca in which someone from Canada suggested adding wikidata tags in Canada, via a task manager, and I thought well that sounds like a measured approach. Seeing people adding automatically matched wikidata tags world-wide without prior discussion certainly is something else. Apparently I misread “Join for a more data rich Canada”… I’ll revert these mechanical edits but archive them so that if in the future a decision about mass-adding these tags should be reached, the work was not for nothing.

@woodpeck on what basis are you reverting this?

Last time you reverted Ottawa, you deleted buildings, places, and POIs. Your workflows are very destructive to OSM, and someone should be monitoring your reverting process because it’s very poor.

OSM Community, please validate @woodpeck’s reverts, since he will actually be deleting data instead of reverting.

We have many examples of @woodpeck’s poor revert workflow.

Thanks woodpeck for once again reverting and blocking me. This was manually done over the past couple of weeks (I was preparing data offline). For once I do stuff manually and you want to revert progress to the map, once again proving to the world how [edit by moderator] some people hate improvements made to the map.

P.s. are you ever going to fix your horrible revert in Ottawa?

[Edited by moderator: I changed the wording to “some people” as I got complaints about racism regarding the original post. Since I found the rest of the posting relevant, I decided to keep it.]

@woodpeck_repair You call this a mass edit?? http://www.openstreetmap.org/changeset/43673242 There’s one edit which was clearly done manually.

http://osm.mapki.com/history/way.php?id=211202482

You reverted 15 items???!! What’s wrong with you? You are actively trying to prevent people from editing OSM.

https://www.openstreetmap.org/changeset/43780386

I want the users following this thread to monitor & comment on your reverts.

https://www.openstreetmap.org/user/woodpeck_repair/history

@LogicalViolinist: If I understand the post of DenisCarriere here correctly, he claims to be the same person as LogicalViolinist … and just continues with those edits (yes, apparently at a lower speed): https://www.openstreetmap.org/user/DenisCarriere/history . Additionally, you were doing more edits under the user LogicalViolinist. Really? :frowning: Can’t you wait for the end of the discussion – and, if it has a positive result, continue then? (Although you never should have started without a discussion.)

It would be useful if you could answer the open questions. Did I miss your answer?

@all thank you!

Denis and I are two different people, thank you.

Walter, I just read the discussion, and it appears to be a very different concern. I simply convert existing Wikipedia links into the corresponding Wikidata IDs. I don’t do any kind of location lookup, nor do I use any kind of automated script - only JOSM with the Wikipedia plugin. My workflow:

  • download relations for an area of my personal interest to JOSM using this query:

[out:xml][timeout:50];
(
    // boundary relations that have a wikipedia tag but no wikidata tag yet
    relation["wikipedia"]["wikidata"!~".*"]["boundary"]({{bbox}});
);
out meta; >; out meta qt;  // the recurse (>) also fetches member ways/nodes so JOSM can load the relations

  • Use the “Fetch Wikidata IDs” command in the JOSM Wikipedia plugin
  • Resolve any Wikipedia links that were not found in Wikidata (most of these could be auto-fixed with this feature request).
  • upload

I have observed thousands of stale Wikipedia links, which is clearly a problem - articles are very often renamed (usually leaving a redirect page behind), or worse, become a disambiguation page covering multiple meanings of the title, or get deleted. Wikidata tags are much more permanent and reliable.
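
(As a side note for anyone unfamiliar with how a Wikipedia title maps to a Wikidata ID: the lookup can be done via the standard MediaWiki API, which also follows the redirects that renames leave behind. This is only a minimal sketch of such a lookup, not a description of how the JOSM Wikipedia plugin implements it.)

# Minimal sketch: resolve a Wikipedia article title to its Wikidata ID,
# following any redirect left behind by a rename.
import requests

def wikidata_id(title, lang="en"):
    r = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            "redirects": 1,        # follow redirect pages left by renames
            "titles": title,
            "format": "json",
        },
    )
    r.raise_for_status()
    page = next(iter(r.json()["query"]["pages"].values()))
    return page.get("pageprops", {}).get("wikibase_item")

print(wikidata_id("Derbyshire"))  # "Q23098", per the example below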

My edits simply lock the existing wikipedia links in place, preventing them from becoming stale. In some cases, @SomeoneElse discovered that two relations use the same WP link, e.g. rel 195384 and rel 88077 both link to Derbyshire (WP). Adding Wikidata ID Q23098 to both seems a sensible first step, but upon closer inspection it would be better to use Q11775003 for the administrative non-metropolitan county. Both are accurate in a way - Q23098 represents the Wikipedia article, so it does apply to both - but in terms of data purity, Q11775003 is more precise. In other words, the first edit makes the data better (preventing it from going stale), and editing it further to specify the more accurate Wikidata ID makes it better still.

Wikidata IDs are highly useful to the Wikipedia maps project, as they allow editors to draw objects by their ID, including directly from a Wikidata Query. See examples.

LogicalViolinist, let’s try to discuss the issue at hand without descending into a less civilized discussion. I assume good faith from all participants (yes, I do come from a Wikipedia background), so let’s figure out how we can all benefit from the good work we all try to do.

[color=#888]
Just a note from a moderator in one section:
I guess LogicalViolinist is upset with the discussion, but generalizing is not helping to make any point.[/color]

[color=#888]We’ve recently changed the team of moderators in the Russian section in order to cope with a flood of trolling. I’ve made a big post on what I consider offensive. I don’t keep concepts like racism or sexism in my vocabulary - they’re too vague - but I do watch for one thing: bad discussion starts when someone uses a generalization, distances himself from that group, and attaches a negative characteristic to it. Intentionally or not, this usually ignites more bad debate, which either makes some open-hearted contributors quit or escalates quickly.[/color]

I would not point blame at Frederik Ramm for poor reverts while leaving out the rest of the story. That import should have been better planned and documented. For example, buildings which I had edited were better than the imported ones but were replaced anyway. The logistics of the import were not ideal, which contributed to the loss of data in multiple ways. Not to mention that the reason for the revert in the first place was that people felt it was too hasty and needed more time for discussion. What has become quite obvious to me after having my manual edits deleted is the need for much better transparency, discussion, and documentation around scripting and bots. Both the import and the reverts were botched, and the results can still be seen. I love the idea of automation, since there is so much tedious work that could otherwise be streamlined, but there clearly needs to be improvement. It was quite upsetting to find the map missing hours of my time, but luckily a full history exists.

Deciding between labelling editing as automated or manual is really just another Sorites paradox. But one thing still remains very clear: anything which can rapidly modify data has the potential to cause rapid damage. Even a misguided manual editor editing constantly could be just as annoying as a problematic fully automated script (as I’ve encountered on Wikipedia). Luckily, manual editing is slower, with more human interaction, and because we are intelligent we can observe and learn as we go. Scripts, on the other hand, need to be at a much higher level of perfection right from the beginning. Thus the design of a script should be well discussed, proof-read, tested & documented before executing, versus the “jump-right-in” approach used in manual editing.

As for wikidata, there have been many people interested in this for many years now, including Mapbox. There are 4M+ place objects, and adding wikidata IDs to them manually would be brutally tedious. Because this is a global issue and will periodically need updating for new nodes, it should be discussed on a single global wiki page instead of everyone doing their own thing and re-inventing the wheel around the world. Here’s an example that started in 2013 but seems to have gone stale now. This latest attempt at importing was not even documented, since I don’t see any reference to it from wikidata. I feel the best method right now would be to make a script that is largely agreed upon. Even if it ends up being so sensitive that only half the tags get done, that’s still a great start and better than an all-or-nothing approach. I’m going to suggest that the best way to initiate this is for someone to start a dedicated wiki page, call out to editors and others such as Mapbox for their thoughts on a wikidata ID tag import, then from there devise a first draft, get more feedback, a second draft, etc. As far as I can tell the programming part isn’t the problem; it’s the organization and consensus.

I think using coordinates from Wikipedia/Wikidata should be fine for the script. Unless it is a known fact that a significant proportion of them comes from Google’s geocoding API without permission, I don’t see any point in deciding whether to use them based on speculation. I’ve come across at least several other geocoding services out there which could have been used. And even then, the coordinates are used strictly to check proximity and are not being copied into OSM, so I don’t see how that would be a copyright issue. And if those two arguments don’t hold up, one could use Google’s geocoder API to generate coordinates and, where they match wikidata, exclude that data. And at what point do coordinates become facts?
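
(To make “checking proximity” concrete - this is only a hedged sketch with an arbitrary threshold, not the actual script anyone has run - the external coordinate is compared against the existing OSM node's position and then discarded; nothing from it is written into OSM.)

from math import radians, sin, cos, asin, sqrt

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def plausible_match(osm_lat, osm_lon, ext_lat, ext_lon, max_km=25):
    """True if the external (e.g. Wikidata) coordinate lies close enough to the OSM node.
    The external coordinate is only compared here; it is never copied into OSM."""
    return distance_km(osm_lat, osm_lon, ext_lat, ext_lon) <= max_km

# e.g. the OSM place node for Berlin against a Wikidata coordinate for Berlin
print(plausible_match(52.5170365, 13.3888599, 52.516667, 13.383333))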

@DevonF thanks for a thoughtful reply. I ran many bots on Wikipedia back in 2005-7, totaling ~3 million edits, and wrote the MediaWiki API to help the bots be more efficient. I fully understand how important it is to make bots more helpful and less damaging. There have been many discussions on this topic, and I would like to summarize the general approach to bots at Wikipedia, which overall has been a great success.

  • It should be easy for many people to write (small) task-specific bots. Without a vibrant bot developer community that can jump on all sorts of small tasks, bots will continue to be a nuisance rather than a helpful force. Don’t try to create “one bot to rule them all” - it doesn’t work. Let the community create tiny task-specific bot code based on a well-known bot platform. For the matter at hand, it won’t be enough to just have a “find wikidata ID based on coordinates” bot, as that mostly works for POIs but not for ways/relations. Maybe multiple bots are needed - one for churches and one that tags cities (just guessing here). There should be others that match the outlines of a country’s admin levels with wikidata, or that attempt to sort through the UK’s civil parishes vs the ceremonial ones and match wikidata with OSM. But the barrier to entry should be low, or else the bot community will never improve.
  • It should always be very easy to communicate with the bot owner, to report issues, and to block the bot (temporarily, until the owner notices, in case it runs amok). For that, bots should run under a separate, easy-to-identify account (e.g. user:YurikBot instead of user:Yurik) and have a “STOP” button (a minimal sketch of such a stop mechanism follows after this list).
  • Until the bot is approved, it should work at a very slow pace, with each edit verified by its owner. At the same time, unless it is something controversial, it shouldn’t take months for the approval.
  • Easy mass-reverting of bad edits is much better than a tedious approval process. Bots are guaranteed to go crazy sooner or later, and when that happens it should be easy for the community to block and revert. If it’s easy to revert, e.g. by using a well-known “Revert Bot”, the bot’s damage will be negligible and it won’t cause much aggravation. The bot owner fixes it and re-runs it, and everyone benefits.
  • Incremental improvements are still improvements. We don’t have to jump from “nothing” to “everything perfect” in one step. It’s ok for one bot to make marginal improvements to tags, and then for another bot to use those tags to make further changes.
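
As a minimal sketch of the “STOP button” idea from the list above - the account, file and function names here are entirely hypothetical and not modelled on any existing OSM or Wikipedia bot framework - the bot simply checks a community-editable flag before every edit:

import time

def apply_edit(edit):
    """Stand-in for uploading one change through the editing API (hypothetical)."""
    print("applied:", edit)

def stop_requested(flag_file="bot_stop_flag.txt"):
    """Hypothetical 'STOP button': a flag (in practice an on-wiki page that any
    community member can edit) which halts the bot until its owner has a look."""
    try:
        with open(flag_file) as f:
            return f.read().strip().lower() == "stop"
    except FileNotFoundError:
        return False

def run_bot(pending_edits, approved=False):
    for edit in pending_edits:
        if stop_requested():
            print("Stop flag set by the community - halting until the owner reviews.")
            break
        apply_edit(edit)
        if not approved:
            # Pre-approval: run slowly, with the owner verifying every single edit.
            input("Unapproved bot: press Enter after manually checking this edit... ")
        else:
            time.sleep(1)  # throttle even after approval

run_bot(["example edit 1", "example edit 2"], approved=True)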

Lastly, unlike Wikipedia, which mostly deals with plain text, OSM is much closer to a combination of an SVG file store and a Wikidata-style database. Writing a bot for visual objects is hard, while the Wikidata db is operated almost entirely by bots. Having different rules for the two, and encouraging bots at least for the database portion, might greatly improve OSM quality - tag-manipulating bots would catch typos and ensure tag consistency and organization, which humans tend to be very bad at. Geometries are a totally different beast, and I don’t think I am qualified to evaluate how helpful or damaging bots would be there.

@nyuriks , as you’re probably already aware OSM has clearly defined rules for bots - see https://wiki.openstreetmap.org/wiki/Automated_Edits_code_of_conduct .

Also, by the way, you’re still mismatching wikidata links - http://www.openstreetmap.org/changeset/43883335 shows just one example, but there may be many more.

@SomeoneElse, thanks for the link. My post was more a set of general thoughts on the topic than a proposal for new policy. I know that some members of the OSM community don’t like bots, while others are all for them, so I decided to share my own experience.

I am working through all the wikidata IDs for admin levels 1-6; I have made many manual corrections, and will clean things up further in a second pass when I start admin-tree matching them against the current wikidata structure.
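
(For anyone following the admin-tree matching idea, here is a hedged sketch of one way to pull an item's administrative parent(s) from the public Wikidata Query Service so they can be compared with the wikidata tag on the parent OSM relation. The endpoint and property P131 are standard Wikidata ones, but the workflow shown is only an illustration, not necessarily the one described above.)

# Illustrative only: fetch the Wikidata admin parent(s) of an item via the
# Wikidata Query Service, so an OSM boundary's wikidata tag can be compared
# with the tag on its parent admin relation.
import requests

WDQS = "https://query.wikidata.org/sparql"

def admin_parents(qid):
    """QIDs the item is 'located in the administrative territorial entity' of (P131)."""
    query = "SELECT ?parent WHERE { wd:%s wdt:P131 ?parent . }" % qid
    r = requests.get(
        WDQS,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "osm-wikidata-admin-check-example/0.1"},  # WDQS expects a descriptive UA
    )
    r.raise_for_status()
    return [b["parent"]["value"].rsplit("/", 1)[-1]
            for b in r.json()["results"]["bindings"]]

print(admin_parents("Q23098"))  # Q23098 = Derbyshire, per the discussion above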