broken wikipedia tags

I made a small tool to detect broken wikipedia tags. It was done at the request of a person from the Polish OSM community, but it can obviously also check data in Germany.

Results of a small test run are at https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/Bremen.html. The page lists OSM elements with obvious problems.

If somebody is interested in using this data, I can generate such lists for other locations (and improve how the listing of broken OSM elements looks; currently it is extremely ugly).

(I am posting it here because, in my experience, notes opened in Germany are fixed faster than anywhere else, and it seems that Germany has plenty of people who both care about OSM data quality and are capable of maintaining it.)

I don’t see why they should be broken? Can you explain?
http://wiki.openstreetmap.org/wiki/Key:wikipedia

“Broken” means that the link leads to the Wikipedia message “page does not exist” (Diese Seite existiert nicht),
e.g. https://de.wikipedia.org/wiki/Agamemnon%20(Skulptur%20Bremen) should be
https://de.wikipedia.org/wiki/Agamemnon%20(Skulptur,%20Bremen) or (… Skulptur,_Bremen)
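To make the link between the tag value and the URL concrete, here is a rough Python sketch of how a wikipedia tag value in the `lang:Title` form could be turned into an article URL. The function name `wikipedia_tag_to_url` is made up for this example and is not taken from the tool.

```python
from urllib.parse import quote


def wikipedia_tag_to_url(tag_value):
    """Turn an OSM wikipedia tag value such as 'de:Some title' into a URL.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    language, separator, title = tag_value.partition(":")
    if not separator or not language.strip() or not title.strip():
        raise ValueError(f"expected 'lang:Title', got {tag_value!r}")
    # Wikipedia URLs use underscores where the title has spaces.
    path = quote(title.strip().replace(" ", "_"), safe="(),_:")
    return f"https://{language.strip()}.wikipedia.org/wiki/{path}"


print(wikipedia_tag_to_url("de:Agamemnon (Skulptur, Bremen)"))
```

A tag missing the comma would produce a different URL, which is exactly the kind of link that ends up on the "page does not exist" message.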

Description should be better now. Thanks for mentioning the problem!

I’m thinking about creating a maproulette challenge for each city, correcting wikipedia links and maybe adding wikidata IDs manually in the same step. Local knowledge is helpful but not necessary in this situation. Any thoughts or annotations from other mappers?

For an overpass query it would be better to have the OSM tag value in a separate row to extract it.

I can produce more reports, also in YAML format, which should be easier for programs to parse. Or are you planning to write your own detector of wikipedia tag problems?


for adding wikidata there is already https://osm.wikidata.link/

Does this help? (Font? Changes?)

https://de.wikipedia.org/wiki/Liste_der_Kulturdenkmäler_in_Bremen-Schwachhausen

… /Liste_der_Kulturdenkm**%C3%A4**ler_in_Schwachhausen

https://de.wikipedia.org/wiki/Liste_der_Kulturdenkm%C3%A4ler_in_Schwachhausen

https://de.wikipedia.org/wiki/Agamemnon_(Skulptur_Bremen)

https://de.wikipedia.org/w/index.php?title=Agamemnon_(Skulptur%2C_Bremen)&type=revision&diff=164269986&oldid=146847727

… /Agamemnon_(Skulptur**,**_Bremen) - comma!

The problem with most of the reported items was that the Wikipedia article had been renamed, so the link changed. So maybe adding wikidata in the same step would be a good thing. The ID doesn’t change very often :slight_smile:

I don’t want to commit time that I won’t have, so creating a separate solution for each problem is not in my interest. Creating endless lists in cloud storage for processing by a group of mappers is also not a successful practice.

I would prefer using some established tools.
Maproulette can use the Overpass API or GeoJSON files, so GeoJSON would be a good output format, or a format your output could be converted to.
In the long term, integration into Osmose or similar tools would be great if you don’t want to create a new webpage.
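As a sketch of what the GeoJSON conversion could look like: the snippet below turns a list of report entries into a GeoJSON FeatureCollection of points. The entry fields (`lat`, `lon`, `osm_id`, `wikipedia`, `problem`) and the sample values are assumptions made for this example; the real report format may differ.

```python
import json


def report_to_geojson(entries):
    """Convert report entries (hypothetical dict layout) to GeoJSON."""
    features = []
    for entry in entries:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [entry["lon"], entry["lat"]],
            },
            "properties": {
                "osm_id": entry["osm_id"],
                "wikipedia": entry["wikipedia"],
                "problem": entry["problem"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample entry, for illustration only.
sample = [{
    "lat": 53.08,
    "lon": 8.81,
    "osm_id": "node/1",
    "wikipedia": "de:Agamemnon (Skulptur Bremen)",
    "problem": "linked page does not exist",
}]
print(json.dumps(report_to_geojson(sample), ensure_ascii=False, indent=2))
```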

Thanks for the link, I didn’t know that tool!

Adding wikidata tags based on wikipedia tags sounds like a task for a bot, not humans.

I know, but I was unsure whether there is a better solution.

Is Maproulette used mainly by people who make tasks for themselves, or are there also people looking for things to fix there?

http://maproulette.org/ui/metrics seems to indicate rather low activity, but I may be misreading it. Generating a Maproulette task should not be too complicated; maybe I will just try it and see how it works…

Thanks for the tool! One feature that I like is that it helps with finding wikipedia links that aren’t about the objects themselves and should really be changed to something like subject:wikipedia. :slight_smile: I agree with others here that integrating this feature into existing tools like Keepright or Maproulette would probably be the best way to reach users.

Can you explain this rule a bit? I’m not entirely sure what “unwanted language” means. Does it check if the article being linked is the one in the locally spoken language, perhaps?

Yeah, this tool started as a “show interesting locations in a region” project. Unfortunately, many, many entries were false positives, such as a supermarket with a wikipedia tag (wikipedia=pl:Lidl).

Yes, I am now making a test Maproulette challenge. The nice thing is that this can be generated for any location, and Maproulette has many countries without any tasks.

In this case it checks whether the tag links to a page in a non-German Wikipedia when a matching page exists in German Wikipedia.

It does not detect the local language (a single specified language code is used, in this case de), though it does attempt to check the location (using Wikidata, so it may work poorly).

So some objects in Germany link to “en:Kaufhaus des Westens” even though “de:Kaufhaus des Westens” (marked as the same topic on Wikipedia) is available. AFAIK in that situation linking to “de:Kaufhaus des Westens” is preferable.

This rule was added because some low-quality imports of wikipedia tags for many villages in Poland linked to English Wikipedia, even though articles also existed in the local language.

Is the “German Wikipedia is preferable if that is possible” rule followed in Germany? Though maybe “unwanted language” is too strong; maybe something like “unexpected language” should be used?
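Roughly, the rule can be sketched like this in Python. This is only an illustration under assumptions: `suggest_preferred_language` is a made-up name, and `langlinks` stands for a mapping from language code to article title for the same topic, however it is obtained.

```python
def suggest_preferred_language(tag_value, expected_language, langlinks):
    """Return a replacement tag value in the expected language, or None.

    tag_value: current wikipedia tag, e.g. 'en:Kaufhaus des Westens'.
    expected_language: the single language code the check is run with.
    langlinks: assumed mapping of language code -> title of the same topic.
    """
    language, _, _title = tag_value.partition(":")
    if language == expected_language:
        return None  # already links to the expected language
    preferred_title = langlinks.get(expected_language)
    if preferred_title is None:
        return None  # no matching article, so nothing to complain about
    return f"{expected_language}:{preferred_title}"


print(suggest_preferred_language(
    "en:Kaufhaus des Westens", "de", {"de": "Kaufhaus des Westens"}
))
```

Note that a sketch like this knows nothing about local minority languages, so those cases would still show up as reports.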

Looking forward to it!

In my experience, German Wikipedia links tend to be preferred, yes. As there was no big event like the import you mentioned, the topic hasn’t been discussed a lot. But now that I understand what the tool is checking for, it makes sense to me.

Not sure how to best word it, but “unwanted” might be a bit harsh – especially if it’s a false positive (like with local minority languages).

btw: Osmose already has some Wikipedia checks.
But as far as I know, Osmose only has data (quality) checks, and for this one you need an (HTTP) call to Wikipedia to check the HTTP status code … right?

Yes, I am checking whether the Wikipedia page exists.

I am also calling Wikidata to check whether links lead to a disambiguation page or another type of page that indicates a problem.
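For anyone curious what such a check might look like, here is a minimal Python sketch. It asks the MediaWiki query API instead of looking at the HTTP status code, which is a choice made for this example and not necessarily what the tool does. The function names and the User-Agent string are made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_url(language, title):
    """Build a MediaWiki API URL asking about a single title."""
    params = urlencode({
        "action": "query",
        "titles": title,
        "redirects": 1,  # follow redirects left behind by renamed articles
        "format": "json",
    })
    return f"https://{language}.wikipedia.org/w/api.php?{params}"


def page_exists(api_response):
    """Interpret a decoded query response; missing pages carry a 'missing' key."""
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        return False
    return all(
        "missing" not in page and "invalid" not in page
        for page in pages.values()
    )


def fetch(language, title):
    """Perform the network call (not exercised in this sketch)."""
    request = Request(
        build_query_url(language, title),
        headers={"User-Agent": "wikipedia-tag-check-example/0.1"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Usage would be something like `page_exists(fetch("de", "Agamemnon (Skulptur, Bremen)"))`, with the network call kept separate so the interpretation step can be tested offline.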
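A possible way to do the disambiguation part, sketched in Python: look at the "instance of" (P31) statements of the Wikidata entity and see whether one of them is Q4167410 (Wikimedia disambiguation page). The function names are invented for this example, the input is assumed to be an already decoded entity in Wikidata's JSON layout, and the real tool may check more page types than this.

```python
DISAMBIGUATION_PAGE = "Q4167410"  # Wikimedia disambiguation page


def instance_of_ids(entity):
    """Collect the item ids from the entity's P31 (instance of) statements."""
    ids = []
    for statement in entity.get("claims", {}).get("P31", []):
        value = (
            statement.get("mainsnak", {})
            .get("datavalue", {})
            .get("value", {})
        )
        # Statements without a concrete value have no datavalue; skip them.
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def is_disambiguation(entity):
    """True if the entity is marked as a disambiguation page."""
    return DISAMBIGUATION_PAGE in instance_of_ids(entity)
```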

R0bst3r created http://maproulette.org/map/2742 (thanks!)

I am planning to make more (especially in regions without any Maproulette tasks), but I am waiting for the Maproulette maintainer(s) to respond to https://github.com/maproulette/maproulette2/issues/335