wikidata tag added to thousands of place nodes. Automated mass edit?

For info, I’m in discussion with another wikidata adder who has added a couple of questionable links local to me.

The same problem crops up in all sorts of areas in OSM - imagine if a number of people carefully add roads to an inaccessible area from all available sources including aerial imagery, and then someone comes along and adds lots more “roads” along every straight line they can see (fences, hedges, etc.) - it devalues the work of all the people who were careful adding data in the first place.

Personally, I’d only add wikidata tags of things that I was familiar with or had some understanding of. Anything else is guesswork.

Indeed - it’s reasonable to assume that a lot of wikidata positional information comes from Google. In the problematical examples near me it doesn’t look like that’s a factor (people are just searching for something that “might match” by name) but I’m concerned that processes such as https://github.com/mapbox/mapping/issues/242#issuecomment-261457939 explicitly do use wikidata positional information.

It’d be a shame if all wikidata information had to be redacted from OSM simply because a few people were careless.

Ok, just to confirm: in no way are geographical coordinates being transferred to OSM.

Hi Denis,
please don’t take offense so easily, just because the sheer number and speed of your changes prompted someone to ask for a break so that more people can take a closer look first. I’m not someone who could judge that stuff myself, but the difference between your more-or-less manual edits and “normal” manual edits is still the sheer number! Numbers do make a difference to the level of caution, regardless of the methods. The faster and the more numerous the changes, the higher the chance of mass accidents too, and then it would be too late. For this reason alone, I sure am glad if “mass-number” edits of any sort are treated with extra caution. So please don’t take a request for a short break in high-speed mass editing as an offense or a negative judgement; it’s just caution. And it may very well be confirmed that your edits are great and fantastic and everyone is glad about them.

The only thing being added to OSM is the wikidata ID, which is available under public domain: https://creativecommons.org/publicdomain/zero/1.0/ . No new information or locations are being added to OSM based upon wikidata/wikipedia; as stated above, that would be a risky move to take.

tl;dr :
The foreign key to wikidata is the only thing being added to osm, which in itself is NOT copyrighted and is in the public domain.

Ignoring the fact that “public domain” isn’t really a legal status in England and Wales (where OSM is based) and ignores database rights, the argument that’s been put forward is that the use of wikidata location in matching clearly is using that data to contribute to OSM.

I have to say where I have looked at local wikidata contributions near me (not from you) there’s no evidence of proximity-based matching (indeed some of the matching errors would suggest that the local matching I’m looking at is based on name only). However you seem to be using a different mechanism (see https://github.com/mapbox/mapping/issues/242#issuecomment-261457939 ).

Whether it’s OK to use Google-derived data in this way is something that as a community we’d need to discuss. In the context of the UK it has been discussed on the “imports” list (see https://lists.openstreetmap.org/pipermail/imports/2016-March/thread.html#4342 ), and other communities may have had similar discussions that I’m unaware of. I’m not a lawyer and am unaware of any English and Welsh case law that could argue either for or against these concerns being valid, but do think that additions such as this should be discussed within the wider community - in the UK some communities were in favour of importing wikidata (which occurred) and some were against.

Of course, many people have added wikidata links implicitly (via iD) or explicitly with local knowledge - no-one’s complaining about that process because the wikidata location isn’t being used in the match; local knowledge is.

I’m going to play devil’s advocate:

Let’s say someone knows Berlin’s name and general location
So what we know is:
Name: Berlin
Location: Somewhere in east Germany, near poland kind of

If that user looked up “Berlin”, well guess what he might be using google derived data. Someone could have created the wikipage and wikidata based entirely off google’s location and name service. So with your logic being applied to the name as well you could never match any data from wikidata or wikipedia as it might have been created with google maps/bing maps(lol) data and visually confirming on the screen is still using said derived data.

There’s a difference between using knowledge to look something up versus blatantly copying data to create something new. I will now take OSM’s example of looking stuff up:
What we know:
Name: Berlin
Location: 52.5170365, 13.3888599
So if we take the same steps as looking up the city, it comes down to the same thing as “using local knowledge”. OSM data has been created and shared by local mappers who are sharing said knowledge that Berlin is indeed located in East Germany.

To my knowledge Google cannot copyright the GPS coordinates of places when the source of those coordinates is local knowledge (OSM editors). Again, no data is being derived from wikidata; only the link or ID to the offsite database is being added.

It would be as if you had to look up a website of say Berlin Cathedral Church. So you launch Google, lookup “Berlin Cathedral Church” and find the website: http://www.berlinerdom.de/index.php and decide to add it to OSM. At that point are you deriving data from Google? No because it’s factual and verifiable information.

p.s. WHAT I DO NOT SUPPORT: is using wikidata to create places/nodes in OSM as those may be derived from copyrighted materials.

So just to confirm - you are just using “name and local knowledge” searching and you are not using “location and proximity” searching (which was described in https://github.com/mapbox/mapping/issues/242#issuecomment-261457939 )?

With regard to statements such as “Google cannot copyright gps locations” I suspect you need to read up a little bit about England and Wales law (as opposed to others). There’s lots of prior discussion from around the time of the licence change; wikipedia’s starter page on the EU Database Directive is https://en.wikipedia.org/wiki/Database_Directive .

I was not using proximity (geolocation) to look up IDs for this. Most of the information (I say most, as some was from my geography lessons; would that count as copyrighted material?) came from the name and the OSM node location (i.e. Berlin is in East Germany, in its own city-state, but most importantly in the middle-ish of Brandenburg state, same as I know London is in the South-East of the UK and not a city near Toronto, Ontario, Canada)

I’ve been to Germany many times during my employment with my previous employer, so I know Germany well.

Thanks - good to know.

LogicalViolinist or DenisCarriere could you clarify how this tool is being used? Looking at the changesets, there are hundreds of changes being made per second, more than could possibly be done entirely manually in that time or even just reviewed. Are files somehow prepared with more thorough review and then just uploaded in rapid sequence?

There’s a real problem with uploading thousands and thousands of changes without explaining much about how you are doing it (I think I’ve read most of the explanations and am too thick to have actually figured out how I would do it so quickly myself) and then just saying that someone who wants to question the changes has to review them all. By the time they have done any review, you will have uploaded thousands more changes. This is part of the reason for the policy on automated edits. Not to make life difficult for people with good ideas, but to make sure that other people have a chance to understand what is being done prior to the work being carried out, to make sure that poor edits can be improved and so on.

same going on for thousands of relations:

e.g. https://www.openstreetmap.org/changeset/43751676

regards
walter

I read a post on talk-ca in which someone from Canada suggested adding wikidata tags in Canada, via a task manager, and I thought well that sounds like a measured approach. Seeing people adding automatically matched wikidata tags world-wide without prior discussion certainly is something else. Apparently I misread “Join for a more data rich Canada”… I’ll revert these mechanical edits but archive them so that if in the future a decision about mass-adding these tags should be reached, the work was not for nothing.

@woodpeck on what basis are you reverting this?

Last time you reverted Ottawa, you deleted buildings, places, POIs. Your workflows are very destructive to OSM and someone should be monitoring your reverting process because it’s very poor.

OSM Community, please validate @woodpeck reverts since he will actually be deleting data instead of reverting.

We have many examples of @woodpeck poor revert workflow.

Thanks woodpeck for once again reverting and blocking me. This was manually done over the past couple of weeks (I was preparing data offline). For once I do stuff manually and you want to revert progress to the map, once again proving to the world how the [edit by moderator] some people hate improvements made to the map.

P.s. are you ever going to fix your horrible revert in Ottawa?

[Edited by moderator: I changed the wording to “some people” as I got complaints about racism in the original post. Since I found the rest of the posting relevant, I decided to keep it]

@woodpeck_repair You call this a mass edit?? http://www.openstreetmap.org/changeset/43673242 There’s one edit which was clearly done manually.

http://osm.mapki.com/history/way.php?id=211202482

You reverted 15 items???!! What’s wrong with you? You are actively trying to prevent people from editing OSM.

https://www.openstreetmap.org/changeset/43780386

I want the users following this thread to monitor & comment on your reverts.

https://www.openstreetmap.org/user/woodpeck_repair/history

@LogicalViolinist: If I understand the post of DenisCarriere here correctly he pretends to be the same person as LogicalViolinist … and just continues with those edits (yes, apparently at lower speed): https://www.openstreetmap.org/user/DenisCarriere/history . Additionally you were doing more edits under user LogicalViolinist. Really? :frowning: Can’t you wait for the end of the discussion – and, if it has a positive result, continue then? (although you never should have started without a discussion)

It would be useful if you could answer the open questions. Did I miss your answer?

@all thank you!

Denis and I are two different people thank you.

Walter, I just read the discussion, and it appears to be a very different concern. I simply convert existing Wikipedia links into corresponding Wikidata IDs. I don’t do any kind of location lookup, nor do I use any kind of an automated script - only JOSM with Wikipedia plugin. My workflow:

  • download relations for an area of my personal interest to JOSM using this query:

[out:xml][timeout:50];
(
    relation["wikipedia"]["wikidata"!~".*"]["boundary"]({{bbox}});
);
out meta; >; out meta qt;

  • Use “Fetch Wikidata IDs” command in JOSM Wikipedia plugin
  • Resolve any Wikipedia links that were not found in Wikidata (most of it could be auto-fixed with this feature request).
  • upload
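For readers who want to see what the “Fetch Wikidata IDs” step amounts to, here is a minimal Python sketch. It is not the JOSM plugin’s code, just an illustration under stated assumptions: the helper names are made up for this example, it assumes a `wikipedia` tag of the form `lang:Title`, and it uses the public MediaWiki `action=query` API with `prop=pageprops` to read the `wikibase_item` page property. No network call is made here; the sample response is hand-written in the shape that API returns.

```python
from urllib.parse import urlencode


def parse_wikipedia_tag(value):
    """Split an OSM wikipedia tag such as 'en:Derbyshire' into (lang, title)."""
    lang, sep, title = value.partition(":")
    if not sep or not lang.strip() or not title.strip():
        raise ValueError(f"not a lang:Title wikipedia tag: {value!r}")
    return lang.strip(), title.strip().replace("_", " ")


def build_lookup_url(lang, title):
    """Build a MediaWiki API URL asking for the page's Wikidata item.

    redirects=1 asks the API to follow renamed articles, which is the
    'stale link' case mentioned in this thread.
    """
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,
        "titles": title,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def extract_wikidata_id(response):
    """Return the Q-id from a decoded API response, or None if not found."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None


# Hand-written sample response (assumed shape, illustrative page id).
sample = {"query": {"pages": {"1": {"title": "Derbyshire",
                                     "pageprops": {"wikibase_item": "Q23098"}}}}}
lang, title = parse_wikipedia_tag("en:Derbyshire")
url = build_lookup_url(lang, title)
qid = extract_wikidata_id(sample)
```

Links that come back with no Q-id are the ones that still need resolving by hand before upload.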

I have observed thousands of stale Wikipedia links, which is clearly a problem - the articles are very often renamed (usually leaving a redirect page behind), or worse - become a disambiguation page to multiple meanings of the title or deleted. Wikidata tags are much more permanent and reliable.

My edits simply lock the existing wikipedia links in place, preventing them from becoming stale. In some cases, @SomeoneElse discovered that two relations use the same WP link, e.g. rel 195384 and rel 88077 both link to Derbyshire (WP). Adding Wikidata ID Q23098 to both seems to be a sensible first step, but upon closer inspection it seems it would be better to use Q11775003 for the administrative non-metropolitan county. Both are accurate in a way - Q23098 represents the Wikipedia article, so it does apply to both, but from a data purity standpoint, Q11775003 improves it. In other words, the first edit makes the data better (preventing it from going stale), but editing it further to specify a more accurate Wikidata ID makes it better still.
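Cases like the Derbyshire one could be flagged for manual review before upload. The sketch below is only an illustration: the function name is hypothetical, the `en:Derbyshire` tag value is an assumed form of the link both relations carry, and relation 1 with `en:Example` is invented filler data. It groups relations by their `wikipedia` tag and reports any link used by more than one relation.

```python
from collections import defaultdict


def find_shared_links(relations):
    """Map each wikipedia tag value to the relation ids that share it.

    Only links used by two or more relations are returned, since those
    are the ones where a single Wikidata ID may be too coarse.
    """
    by_link = defaultdict(list)
    for rel_id, tags in relations.items():
        link = tags.get("wikipedia")
        if link:
            by_link[link].append(rel_id)
    return {link: sorted(ids) for link, ids in by_link.items() if len(ids) > 1}


relations = {
    195384: {"wikipedia": "en:Derbyshire"},  # tag value assumed
    88077: {"wikipedia": "en:Derbyshire"},   # tag value assumed
    1: {"wikipedia": "en:Example"},          # invented filler relation
}
shared = find_shared_links(relations)
```

Anything in `shared` would get the article-level ID at most as a first pass, with a more specific ID chosen by someone who knows the area.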

Wikidata IDs are highly useful to Wikipedia maps project, as they allow editors to draw objects by their ID, including directly from Wikidata Query. See examples.

LogicalViolinist, let’s try to discuss the issue at hand without sliding into a less civilized discussion. I assume good faith on the part of all participants (yes, I do come from a Wikipedia background), so let’s figure out how we can all benefit from the good work we all try to do.