If you’ve got a plan that you’re following, it would be great if you could share it with the OSM community (somewhere people will actually read, like the “talk” mailing list), so that people know the current wikidata matches you’re adding are just a rough draft that will be checked properly later, and don’t waste time checking the initial wikidata you added.
@nyuriks that’s a great overview of how bots should work; the Wikimedia community has definitely leveraged bots much better than OSM has. It’s not clear whether the wiki listing of OSM bots is kept up to date, and there is a general feeling of the project being against automated edits rather than encouraging sensible ones.
Having contributed to the map for the last 10 years, I think a lot of the negativity towards automated edits comes from the fact that mapping is challenging enough even for a human, and the chances of a machine being better at interpreting the physical world are not much higher. In OSM we’re missing a critical feedback loop: being able to “improve” the map and then find out from ground truthing whether it was really improved.
Maybe any attempt at automated or mass editing of the map should come with a well-documented strategy and tools for community validation of the data. The OSM history tab is woefully inadequate for the community to keep track of what has changed, and large edits like this one almost always raise suspicion, eventually stall, and result in endless discussion.
What about using the dev server as a staging area to push large changes and check whether a bot or mass edit gives the desired result? Let’s create an environment that encourages more experimentation for map improvement and frees up more human time for ground truthing and validation.
I read over Wikipedia’s bot policy and it’s a great, mature resource which addresses a bunch of the concerns here, such as starting small and slow, doing trial periods with community feedback, good documentation, addressing specific concerns from others, etc. It’s pretty similar to what is already in the Automated Edits code of conduct and import guidelines, though not all of it is directly relevant to OSM.
I agree that if the barrier to entry is too high and there is too much discussion, nothing gets done. The problem right now is getting everyone on board with the values mentioned above.
@LogicalViolinist and @DenisCarriere, I noticed you guys have gone ahead once again while there are still unresolved problems with your mass editing. It seems pretty clear from your instructions that you are manually (re)importing offline data. Before your edits are reverted again, could you please address some of the concerns? It would truly be faster and more productive for everyone to get along, discuss, and resolve them.
Here are some concerns I have:
- make a more meaningful changeset description with a link to its wiki page. “Add #wikidata to #Africa places” suggests a project of some kind but doesn’t say where. When my edits were deleted, I had to do a bunch of research to find out what the #Ottawa project was all about.
- initially run it in places which you are familiar with and edit often, so it will be easier to spot mistakes. I feel like Africa should be the last place it is applied.
- fix the rounding error in the scripts (@aseerel4c26 already pointed this out). It should be fixed to be compatible with OSM even at 1 cm precision: the objective was to add the wikidata ID, not modify the position.
- the SPARQL query has an error [in it](https://github.com/DenisCarriere/geocoder-geojson/blob/f92983adadf21bcec70e39a2eaeef73f74873431/providers/wikidata.ts#L93). It doesn’t sort ascending because the variable passed is incorrect; Wikidata returns the unsorted data without an error anyway, which could be problematic downstream.
- I think the radius should be larger to better detect potential errors in either dataset. If it’s too small and there happen to be duplicates, or a neighbourhood and a city with the same name in close proximity just outside the radius, the problem will go undetected when it should really be flagged for closer inspection.
- I’m also curious to know where the rest of the procedure is. I noticed there have been references to “extra instructions” which I can’t seem to find.
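On the SPARQL sorting point, here is a minimal sketch of what a corrected query might look like. The variable names and the `wikibase:around` service parameters below are assumptions modelled on the Wikidata Query Service syntax, not the actual code in the linked repository; the key point is simply that `ORDER BY` must reference the same variable the service binds the distance to, otherwise the endpoint silently returns unsorted results.

```python
def nearby_places_query(lat: float, lon: float, radius_km: float) -> str:
    """Build a WDQS query for places near a point, sorted by distance.

    Sketch only: the shape follows the wikibase:around service, but the
    real script's variable names and filters may differ.
    """
    return f"""
SELECT ?place ?location ?distance WHERE {{
  SERVICE wikibase:around {{
    ?place wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
    bd:serviceParam wikibase:distance ?distance .
  }}
}}
ORDER BY ASC(?distance)
"""
```

The bug described above amounts to writing `ORDER BY ASC(?dist)` (or any other name) while the service binds `?distance`: the query still runs, but the sort is silently a no-op.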
As to adding wikidata tags in Africa, as DevonF points out, this is far away from where you have local knowledge and can verify your edits. It is even more dicey than usual because Africa is full of bad place data, mainly from (again, mostly undiscussed and unvetted) HOT-related imports; adding more stuff on top is not going to help.
Far from stale; it’s resulted in a good deal of ongoing, policy-compliant work which has engendered support from local mappers in a number of parts of the globe.
Indeed. “GNS” seems to be the main culprit for “unlikely positional data”, such as https://www.openstreetmap.org/node/2229102764 which apparently has a latitude of “-1”. I don’t doubt that there’s something around there (the Bing imagery suggests habitation to the west) but a latitude of “-1” just means “clearly this data is rubbish”
To: all please stick to the topic. No personal attacks. No mention of race or nationality. No generalisations. Thank you
Please spend your time to answer questions instead of fighting one another.
Every OSM feature with a Wikidata tag was extracted and converted to a centroid, then compared with the location stored on the Wikidata item; the circle styling is based on the distance between the two. Big red circles are matches over 10 km apart.
The styling needs some more tweaking since most of the large distance mismatches are on area features like districts, where the locations would indeed vary. Hoping this helps both communities fix data on the respective projects. Code: https://github.com/osmlab/wikidata-osm/
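The centroid-to-Wikidata distance check described above can be sketched with a plain haversine. Only the over-10 km red bucket comes from the post; the other thresholds and style names here are made-up placeholders for illustration, and the actual map's styling may differ.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def circle_style(distance_km: float) -> str:
    # Bucket names below 10 km are assumptions; >10 km = big red per the post.
    if distance_km > 10:
        return "big-red"
    if distance_km > 1:
        return "medium-orange"
    return "small-green"
```

As noted, area features like districts will legitimately land in the red bucket, since an OSM centroid and a Wikidata point coordinate can be far apart without either being wrong.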
@pigsonthewing it looked stale because it was still written as a proposal from years ago, and I can’t seem to find any bot/tool associated with it on that page. Maybe that wiki page needs updating?
@PlaneMad cool map! Interesting to pick out some of the mistakes which have propagated. For example, check out the place node for the town of Chesterville, ON. @DenisCarriere added the wikipedia tag, which is legitimate. Then recently @LogicalViolinist added the wikidata ID, but clearly whatever tool he used didn’t notice that the Chesterville Wikipedia article is a redirect to North Dundas, and so it added that wikidata ID instead. And now it’s obvious that Mapbox prefers names based on the wikidata ID rather than the OSM database, since there are two North Dundas on the map.
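The Chesterville case suggests one guard any such tool should have: refuse to take a wikidata ID through a redirect. A minimal sketch, where the page-record shape and the Q-IDs are invented for illustration (loosely modelled on what the MediaWiki API reports, not on the actual tool involved):

```python
from typing import Optional

def wikidata_id_for_title(title: str, pages: dict) -> Optional[str]:
    """Return a page's wikidata ID, or None if it needs manual review.

    pages maps an article title to a record like
    {"wikidata": "Q...", "redirect_to": <target title or None>}.
    A redirect (Chesterville -> North Dundas style) gets None rather than
    the target's ID, because the target may describe a different object.
    """
    page = pages.get(title)
    if page is None:
        return None  # no such article: flag for a human
    if page.get("redirect_to"):
        return None  # redirect: taking the target's ID would mis-tag
    return page.get("wikidata")
```

A bot built this way leaves redirects in a manual-review queue instead of silently tagging the wrong entity.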
I’m doing it almost the same way (inspired by nyuriks), but I find iD much faster for manual updates.
Actually, I think the automatic updates (i.e. matching a wikidata ID to an existing wikipedia link) should be done by some bot regularly, so that only objects requiring manual action are left. In Poland there are a few new wikipedia links missing a wikidata ID every day; in other countries too I could see nyuriks has been there before me, but new wikipedia links have been added since.
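A daily run like that could find its work queue with an Overpass query for objects that have a wikipedia tag but no wikidata tag yet. A sketch, assuming the public Overpass API and a name-based area filter (a real bot would more likely iterate by country relation or tile):

```python
OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public instance

def missing_wikidata_query(area_name: str) -> str:
    """Overpass QL for nodes/ways/relations with wikipedia= but no wikidata=."""
    return f"""
[out:json][timeout:120];
area["name"="{area_name}"]->.a;
(
  nwr["wikipedia"][!"wikidata"](area.a);
);
out tags center;
"""
```

Each result's wikipedia tag can then be resolved to a wikidata ID, with anything ambiguous (redirects, missing pages, disambiguation pages) left for mappers.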
But they didn’t. We didn’t, should I say. Write. Whatever.
The mismatches were there before; nobody caught them until wikidata IDs were added. Don’t look at it as creating mismatches, but as noticing them. Mass-adding wikidata IDs prepares the ground for cleaning up wikipedia issues, like nyuriks’ list of disambiguation pages. We would be better off if some bot did it on a daily basis; it’s a mechanical job, really. Only what is left after this mechanical job requires mappers’ attention. This includes:
correcting wikipedia titles that have changed since they were copied to OSM
finding incorrect links to wikipedia
creating relations for rivers, highways and others (*)
…
I think all these tasks are easier when we have wikidata IDs.
(*) I have filed a GitHub issue proposing that editors reveal when an object belongs to a relation that already has a wikipedia link and/or website defined, so that users won’t try to add a wikipedia link to every member of the relation; please back me up there if you think it makes sense.
In this case they weren’t. What tends to happen is something like:
o OSM has an object for a village and an admin entity
o An OSM user adds a wikipedia tag to the admin entity. The wikipedia entry describes itself as covering both the village and the admin entity, so that’s OK.
o A wikipedian writes a bot that creates a wikidata item from the wikipedia article. The bot creates wikidata entries for villages, not admin entities. That’s not entirely wrong, because the wikipedia article actually covers both.
o A different wikipedian spots that there is an OSM admin entity and a wikidata item with the same name in a similar location and links them via a wikidata tag. This results in the wrong OSM entity being linked to a wikidata item.
That’s not exactly the case here. What happens now is: there are two OSM objects with the same Wikipedia link, so they get the same wikidata ID. At least if we are still talking about the semi-automated adding of wikidata IDs that nyuriks and I do.
I think that the bottom line is that if you’re adding a wikidata link to OSM you have to check that the wikidata article actually applies to the OSM object - you can’t rely on what’s happened between wikipedia and wikidata to ensure that.
And that’s what we do for every link that does not get wikidata ID on batch run.
Still, I think it’s better to do this batch run, since the majority of wikipedia links are correct, and then catch duplicated wikidata IDs, than to do the whole job manually, link by link.
One thing worries me, because I may not understand wikidata correctly: is it all right to have two wikidata entries pointing to the same wikipedia article?
I think that is possible. A single Wikipedia article can describe a group of objects (e.g. a museum and the paintings in it), while a painting might already be a separate Wikidata item.