OpenStreetMap Forum

The Free Wiki World Map


#26 2016-11-19 01:59:19

nyuriks
Member
Registered: 2016-11-18
Posts: 17

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

LogicalViolinist, let's try to discuss the issue at hand without sliding into a less civilized discussion. I assume good faith from all participants (yes, I do come from a Wikipedia background), so let's figure out how we can all benefit from the good work we are all trying to do.


#27 2016-11-19 08:43:28

siberiano
Moderator
From: Novosibirsk
Registered: 2010-02-25
Posts: 991

Re: wikidata tag added to thousands of place nodes. Automated mass edit?


Just a note from a moderator in one section:
I guess LogicalViolinist is upset with the discussion, but generalizing does not help make any point.

We recently changed the team of moderators in the Russian section to cope with a flood of trolling. I made a long post on what I consider offensive. I have no concept of racism or sexism in my vocabulary (they are too vague), but I do watch for one thing: bad discussion starts when someone uses a generalization, distances himself from that group, and attaches a negative characteristic to it. Intentionally or not, this usually ignites more bad debate, which either makes some open-hearted contributors quit or escalates quickly.


#28 2016-11-22 04:41:24

DevonF
Member
From: Ottawa, Canada
Registered: 2016-11-21
Posts: 4

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

I would not point blame at Frederik Ramm for poor reverts while leaving out the rest of the story. That import should have been better planned and documented. For example, buildings which I had edited were better than the imported ones but were replaced anyway. The logistics of the import were not ideal, which contributed to the loss of data in multiple ways. Not to mention that the reason for the revert in the first place was that people felt it was too hasty and needed more discussion. What has become quite obvious to me, after having my manual edits deleted, is the need for much better transparency, discussion, and documentation around scripting and bots. Both the import and the reverts were botched, and the results can still be seen. I love the idea of automation, since there is so much tedious work that could otherwise be streamlined, but there clearly needs to be improvement. It was quite upsetting to find the map missing hours of my time, but luckily a full history exists.

Deciding between labelling editing as automated or manual is really just another Sorites paradox. But one thing remains very clear: anything which can rapidly modify data has the potential to cause rapid damage. Even a misguided manual editor editing constantly could be just as disruptive as a problematic fully automated script (as I've encountered on Wikipedia). Luckily, manual editing is slower and involves more human interaction, and because we are intelligent, we can observe and learn as we go. Scripts, on the other hand, need to be at a much higher level of perfection right from the beginning. Thus the design of a script should be well discussed, proof-read, tested, and documented before executing, versus the "jump-right-in" approach used in manual editing.
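
The "tested and documented before executing" point suggests, at minimum, a dry-run mode. Here is a minimal sketch of that pattern in Python; the data shapes and function names are hypothetical, not any existing OSM tool:

```python
# Hypothetical sketch of a dry-run guard for a mass-edit script.
# The place/edit dict shapes are illustrative, not a real OSM API.

def plan_edits(places, matches):
    """Build the list of proposed tag additions without touching the database."""
    edits = []
    for place in places:
        qid = matches.get(place["id"])
        if qid and "wikidata" not in place["tags"]:
            edits.append({"id": place["id"], "add": {"wikidata": qid}})
    return edits

def run(places, matches, dry_run=True):
    """In dry-run mode, only print the planned changes for human review."""
    edits = plan_edits(places, matches)
    if dry_run:
        for e in edits:
            print(f"would tag {e['id']} with {e['add']}")
        return []
    return edits  # in a real script, this is where the upload would happen
```

A real script would only flip `dry_run` to False after the planned edits have been reviewed and the run has been discussed.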

As for wikidata, many people have been interested in this for years now, including Mapbox. There are 4M+ place objects, and adding wikidata IDs to them manually would be brutally tedious. Because this is a global issue and will periodically need updating for new nodes, it should be discussed on a single global wiki page instead of everyone doing their own thing and re-inventing the wheel around the world. Here's an example that started in 2013 but seems to have gone stale now. This latest attempt at importing was not even documented; I don't see any reference to it from wikidata. I feel the best method right now would be to write a script that is largely agreed upon. Even if it ends up being so conservative that only half the tags get done, that's still a great start and better than an all-or-nothing approach. I'd suggest the best way to initiate this is for someone to start a dedicated wiki page and call out to editors and others such as Mapbox for their thoughts on a wikidata ID import, then from there devise a first draft, get more feedback, a second draft, and so on. As far as I can tell, the programming part isn't the problem; it's the organization and consensus.

I think using coordinates from Wikipedia/Wikidata should be fine for the script. Unless it is a known fact that a significant proportion of them came from Google's geocoding API without permission, I don't see any point deciding whether to use them based on speculation. I've come across several other geocoding services that could have been used. And in any case, the coordinates are used strictly to check proximity and are not copied into OSM, so I don't see how that would be a copyright issue. Even if those two arguments don't hold up, one could use Google's geocoder API to generate coordinates and, where they match wikidata's, exclude that data. And at what point do coordinates become facts?
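
The proximity check described above, where an external coordinate only corroborates a match and is never copied into OSM, could be sketched like this (haversine great-circle distance; the 5 km radius and the data shapes are illustrative assumptions):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def plausible_match(osm_node, external_coord, radius_km=5.0):
    """True if the external coordinate is close enough to corroborate a match.
    The external coordinate only gates the decision; it is never written to OSM."""
    d = haversine_km(osm_node["lat"], osm_node["lon"],
                     external_coord[0], external_coord[1])
    return d <= radius_km
```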


#29 2016-11-22 10:00:56

nyuriks
Member
Registered: 2016-11-18
Posts: 17

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

@DevonF thanks for a thoughtful reply. I ran many bots on Wikipedia back in 2005-7, totaling ~3 million edits, and wrote the MediaWiki API to help the bots be more efficient. I fully understand how important it is to make bots more helpful and less damaging. There have been many discussions on this topic, and I would like to summarize the general approach to bots at Wikipedia, which overall has been a great success.
* It should be easy for many people to write (small) task-specific bots. Without a vibrant bot-developer community that can jump on all sorts of small tasks, bots will continue to be a nuisance rather than a helpful force. Don't try to create "one bot to rule them all"; that doesn't work. Let the community create tiny task-specific bot code based on a well-known bot platform. For the matter at hand, it won't be enough to have just a "find wikidata ID based on coordinates" bot, as that mostly works for POIs, not for ways/relations. Maybe multiple bots are needed: one for churches and one that tags cities (just guessing here). There could be others that match the outlines of a country's admin levels with wikidata, or that attempt to sort through the UK's civil parishes vs. the ceremonial ones and match wikidata with OSM. But the barrier to entry should be low, or else the bot community will never improve.
* It should always be very easy to communicate with the bot owner, to report issues, and to block the bot (temporarily, until the owner notices, in case it runs amok). For that, bots should run under a separate, easy-to-identify account (e.g. user:YurikBot instead of user:Yurik) and have a "STOP" button.
* Until the bot is approved, it should work at a very slow pace, with each edit verified by its owner. At the same time, unless it is doing something controversial, approval shouldn't take months.
* Easy mass-reverting of bad edits is much better than a tedious approval process. Bots are guaranteed to go crazy sooner or later, and when that happens it should be easy for the community to block and revert. If reverting is easy, e.g. via a well-known "Revert Bot", the bot's damage will be negligible and won't cause much aggravation. The bot owner fixes it and re-runs it; everyone benefits.
* Incremental improvements are still improvements. We don't have to jump from "nothing" to "everything perfect" in one step. It's OK for one bot to make marginal improvements to tags, and for another bot to use those tags to make further changes.
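
The "slow pace until approved" and "STOP button" points above can be sketched together. Every name here is hypothetical, since OSM has no standard bot framework:

```python
import time

def stop_requested():
    """Placeholder: a real bot would poll a flag file or its user page
    so anyone in the community can halt it."""
    return False

def run_bot(tasks, edit, approved=False, stop=stop_requested):
    """Apply edits one at a time. Unapproved bots go slowly so the owner
    can verify each change before the next one fires."""
    delay = 1 if approved else 60  # seconds between edits (illustrative values)
    done = 0
    for task in tasks:
        if stop():  # the community "STOP" button
            break
        edit(task)
        done += 1
        if done < len(tasks):
            time.sleep(delay)
    return done
```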

Lastly, unlike Wikipedia, which mostly deals with plain text, OSM is much closer to a combination of an SVG file store and a Wikidata-like database. Writing a bot for visual objects is hard, while the Wikidata database is operated almost entirely by bots. Having different rules for the two, and encouraging bots at least on the database portion, might greatly improve OSM quality: tag-manipulating bots would catch typos and ensure tag consistency and organization, things humans tend to be very bad at. Geometries are a totally different beast, and I don't think I am qualified to evaluate how helpful or damaging bots would be there.
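
As an illustration of the tag-consistency idea, a bot could flag tag keys that look like typos of well-known keys and queue them for human review. A rough sketch, where the known-key list is a tiny made-up subset rather than the real OSM schema:

```python
import difflib

# Illustrative subset of well-known keys; a real bot would load these
# from taginfo or the wiki rather than hard-coding them.
KNOWN_KEYS = {"amenity", "building", "highway", "name", "place",
              "wikidata", "wikipedia"}

def suspect_typos(tags, cutoff=0.8):
    """Map unknown tag keys to their closest known key, for human review."""
    suggestions = {}
    for key in tags:
        if key in KNOWN_KEYS:
            continue
        close = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=cutoff)
        if close:
            suggestions[key] = close[0]
    return suggestions
```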


#30 2016-11-23 02:39:16

SomeoneElse
Member
Registered: 2010-10-13
Posts: 483

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

@nyuriks, as you're probably already aware, OSM has clearly defined rules for bots - see https://wiki.openstreetmap.org/wiki/Aut … of_conduct .

Also, by the way, you're still mismatching wikidata links - http://www.openstreetmap.org/changeset/43883335 shows just one example, but there may be many more.


#31 2016-11-23 08:09:51

nyuriks
Member
Registered: 2016-11-18
Posts: 17

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

@SomeoneElse, thanks for the link. My post was more a set of general thoughts on the topic than a proposal for new policy. I know that some members of the OSM community don't like bots, while others are all for them, so I decided to share my own experience.

I am working through all the wikidata IDs for admin levels 1-6, have made many manual corrections, and will clean it up further in a second pass when I start matching the admin tree against the current wikidata structure.


#32 2016-11-23 09:25:37

SomeoneElse
Member
Registered: 2010-10-13
Posts: 483

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

nyuriks wrote:

I am working through all the wikidata IDs for admin levels 1-6, have made many manual corrections, and will clean it up further in a second pass when I start matching the admin tree against the current wikidata structure.

If you've got a plan that you're following, it would be great if you could share it with the OSM community (somewhere people will read, like the "talk" mailing list), so that people know the current wikidata matches you're adding are just a rough draft that will be checked properly later, and don't waste time checking the initial wikidata added.


#33 2016-11-23 20:28:32

PlaneMad
Member
Registered: 2008-09-16
Posts: 20

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

@nyuriks that's a great overview of how bots should work; the Wikimedia community has definitely leveraged bots much better than OSM has. It's not clear whether the wiki listing of OSM bots is up to date, and there is a general feeling of the project being against automated edits rather than encouraging sensible ones.

Having been contributing to the map for the last 10 years, I think a lot of the negativity towards automated edits comes from the fact that mapping is pretty challenging as it is, even for a human, and the chances of a machine being better at interpreting the physical world are not much higher. In OSM we're missing a critical feedback loop: being able to 'improve' the map and then finding out from ground truthing whether it was really improved.

Maybe any attempt at automated or mass editing the map should come with a well-documented strategy and tools for community validation of the data. The OSM history tab is woefully inadequate for the community to keep track of what has changed, and large edits like this one almost always raise suspicion, eventually stall, and result in endless discussion.

What about using the dev server as a staging area to push large changes and check whether a bot or mass edit gives the desired result? Let's create an environment that encourages more experimentation for map improvement and frees up more human time for ground truthing and validation.


#34 2016-11-24 05:08:35

DevonF
Member
From: Ottawa, Canada
Registered: 2016-11-21
Posts: 4

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

I read over Wikipedia's bot policy, and it's a mature resource which addresses many of the concerns here, such as starting small and slow, doing trial periods with community feedback, good documentation, addressing specific concerns from others, etc. It's pretty similar to what is already in the Automated Edits code of conduct and the import guidelines, though not quite as relevant to OSM.

I agree that if the barrier to entry is too high and there is too much discussion, nothing gets done. The problem right now is getting everyone on board with the values mentioned above.

@LogicalViolinist and @DenisCarriere, I noticed you have gone ahead once again while there are still unresolved problems with your mass editing. It seems pretty clear from your instructions that you are manually (re)importing offline data. If your edits are reverted again, could you please address some of the concerns? It would truly be faster and more productive for everyone to get along, discuss, and resolve the concerns.
Here are some concerns I have:

  1. Follow the code of conduct and the import guidelines. For example:

    1. Make a wiki page which includes the procedure and notes on where and when it has been deployed.

    2. Add an entry in the import catalog

    3. Write a more meaningful changeset description with a link to the wiki page. "Add #wikidata to #Africa places" suggests a project of some kind but doesn't say where to find it. When my edits were deleted, I had to do a bunch of research to find out what the #Ottawa project was all about.

    4. Initially run it in places with which you are familiar and edit often, so it will be easier to spot mistakes. I feel like Africa should be the last place it is applied.

  2. Fix the rounding error in the scripts (@aseerel4c26 already pointed this out). It should be fixed to keep coordinates compatible with OSM even at 1 cm precision. The objective was to add the wikidata ID, not to modify the position.

  3. That SPARQL query has an error in it: it doesn't sort ascending because the variable passed to ORDER BY is incorrect. Wikidata returns the data unsorted anyway, without raising an error, which could be problematic downstream.

  4. I think the radius should be larger, to better detect potential errors in either dataset. If it's too small and there happen to be duplicates, or a neighbourhood and a city with the same name just outside the radius, the problem will go undetected when it really should be flagged for closer inspection.

  5. I'm also curious to know where the rest of the procedure is. I noticed references to "extra instructions" which I can't seem to find.

    1. how are you handling multiple matches?

    2. what if an object already has a wikidata tag?

    3. does that bug matter for this import?
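
On concern 3 above: the Wikidata endpoint will silently ignore an ORDER BY whose variable is never bound, so a quick sanity check before running the query can catch the mistake. A rough regex-based sketch in Python, not a full SPARQL parser:

```python
import re

def unbound_order_vars(query):
    """Return ORDER BY variables that never appear earlier in the query.
    A non-empty result usually means the sort will be silently ignored."""
    m = re.search(r'ORDER\s+BY(.*)$', query, re.IGNORECASE | re.DOTALL)
    if not m:
        return set()
    order_vars = set(re.findall(r'\?\w+', m.group(1)))
    body_vars = set(re.findall(r'\?\w+', query[:m.start()]))
    return order_vars - body_vars
```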

looking forward to helping out :)


#35 2016-11-24 13:09:00

SimonPoole
Member
Registered: 2010-03-14
Posts: 1,189

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

@DenisCarriere, looking at http://www.openstreetmap.org/user/Denis … .55/-16.07 I see you are continuing to make undiscussed mass edits. Could you please explain why the few rules that OSM has do not apply to you?

As for adding wikidata tags in Africa: as DevonF points out, this is far away from where you have local knowledge and can verify your edits. It is even more dicey than usual because Africa is full of bad place data, mainly from (again, mostly undiscussed and unvetted) HOT-related imports; adding more stuff on top is not going to help.

Last edited by SimonPoole (2016-11-24 13:45:12)


#36 2016-11-24 13:32:46

pigsonthewing
Member
From: Birmingham, England
Registered: 2014-04-20
Posts: 4

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

nyuriks wrote:

I have observed thousands of stale Wikipedia links, which is clearly a problem

Wikidata IDs are, of course, much more stable.


#37 2016-11-24 13:51:23

pigsonthewing
Member
From: Birmingham, England
Registered: 2014-04-20
Posts: 4

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

DevonF wrote:

Here's an example that started in 2013 but seems to have gone stale now.

Far from stale; it has resulted in a good deal of ongoing, policy-compliant work which has engendered support from local mappers in a number of parts of the globe.


#38 2016-11-24 13:55:56

SomeoneElse
Member
Registered: 2010-10-13
Posts: 483

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

SimonPoole wrote:

because Africa is full of bad place data mainly from (again mostly undiscussed and unvetted) HOT related imports, adding more stuff on top is not going to help.

Indeed.  "GNS" seems to be the main culprit for "unlikely positional data", such as https://www.openstreetmap.org/node/2229102764 which apparently has a latitude of "-1". I don't doubt that there's something around there (the Bing imagery suggests habitation to the west), but a latitude of exactly "-1" just means "clearly this data is rubbish".


#39 2016-11-25 08:44:51

escada
Moderator
Registered: 2011-08-13
Posts: 983

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

<Moderator Hat On>
To all: please stick to the topic. No personal attacks. No mention of race or nationality. No generalisations. Thank you.
Please spend your time answering questions instead of fighting one another.

I removed a couple of posts and adapted one.
<Moderator Hat Off>


#40 2016-11-25 14:45:53

PlaneMad
Member
Registered: 2008-09-16
Posts: 20

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

Hey everyone, here is a first pass of a Wikidata-tagged OSM feature explorer: https://osmlab.github.io/wikidata-osm/

Every OSM feature with a Wikidata tag was extracted and converted to a centroid. Each centroid was compared against the Wikidata location, and the circle styling is based on the distance: big red circles are matches over 10 km apart.

The styling needs some more tweaking, since most of the large-distance mismatches are on area features like districts, where the locations would indeed vary. Hoping this helps both communities fix data in their respective projects. Code: https://github.com/osmlab/wikidata-osm/
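
For reference, the centroid-and-distance classification described above might look roughly like this. The bucket thresholds echo the 10 km rule; everything else (vertex-average centroid, flat-earth distance approximation) is a simplifying assumption, not the tool's actual code:

```python
import math

def centroid(ring):
    """Average of the vertices: a rough stand-in for a true polygon centroid,
    good enough for flagging purposes."""
    lats = [p[0] for p in ring]
    lons = [p[1] for p in ring]
    return sum(lats) / len(lats), sum(lons) / len(lons)

def distance_km(a, b):
    """Equirectangular approximation; fine for classifying, not surveying."""
    lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(lat)
    dy = math.radians(b[0] - a[0])
    return 6371.0 * math.hypot(dx, dy)

def style(osm_ring, wikidata_point):
    """Bucket a feature by how far its centroid sits from the Wikidata point."""
    d = distance_km(centroid(osm_ring), wikidata_point)
    if d > 10:
        return "big-red"      # likely mismatch, inspect
    if d > 1:
        return "medium"       # plausible for area features
    return "small-green"      # close match
```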


#41 2016-11-25 17:33:17

escada
Moderator
Registered: 2011-08-13
Posts: 983

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

Thanks a lot for this tool @PlaneMad


#42 2016-11-27 23:12:15

DevonF
Member
From: Ottawa, Canada
Registered: 2016-11-21
Posts: 4

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

@pigsonthewing it looked stale because it was still written as a proposal from years ago, and I can't find any bot/tool associated with it on that page. Maybe that wiki page needs updating?

@PlaneMad cool map! It's interesting to pick out some of the mistakes that have propagated. For example, check out the place node for the town of Chesterville, ON. @DenisCarriere added the wikipedia tag, which is legit. Then recently @LogicalViolinist added the wikidata ID, but clearly whatever tool he used didn't notice that the Chesterville wikipedia page is a redirect to North Dundas, so it added that wikidata ID instead. It's now also obvious that Mapbox prefers names based on the wikidata ID rather than the OSM database, since there are currently two North Dundas labels.
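
The Chesterville failure mode, a Wikipedia title that silently redirects elsewhere, is detectable before tagging: the MediaWiki query API reports followed redirects when called with redirects=1. Here's a sketch that inspects such a response (the sample dict mimics that response shape by hand; no network call is made):

```python
def resolved_title(api_response, title):
    """Return the final article title for `title`, following any redirect
    reported by the MediaWiki API (action=query&redirects=1)."""
    for r in api_response.get("query", {}).get("redirects", []):
        if r["from"] == title:
            return r["to"]
    return title

def safe_to_tag(api_response, title):
    """Only reuse the title's wikidata ID if the page is not a redirect;
    otherwise a human should check whether the target really is the place."""
    return resolved_title(api_response, title) == title

# Hand-made sample mimicking the API response shape:
# safe_to_tag(sample, "Chesterville, Ontario") would be False -> flag for review.
```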


#43 2016-11-28 15:06:30

LogicalViolinist
Member
Registered: 2016-11-18
Posts: 10

Re: wikidata tag added to thousands of place nodes. Automated mass edit?

DevonF wrote:

@pigsonthewing it looked stale because it was still written as a proposal from years ago, and I can't find any bot/tool associated with it on that page. Maybe that wiki page needs updating?

@PlaneMad cool map! It's interesting to pick out some of the mistakes that have propagated. For example, check out the place node for the town of Chesterville, ON. @DenisCarriere added the wikipedia tag, which is legit. Then recently @LogicalViolinist added the wikidata ID, but clearly whatever tool he used didn't notice that the Chesterville wikipedia page is a redirect to North Dundas, so it added that wikidata ID instead. It's now also obvious that Mapbox prefers names based on the wikidata ID rather than the OSM database, since there are currently two North Dundas labels.

I didn't use a tool; see the reason for the wikidata ID I put here: http://www.openstreetmap.org/changeset/43544541


Board footer

Powered by FluxBB