Automated edit proposal - convert non-standard dashes to standard dashes

Looks like a good idea to me! Does anyone know more than I do about parsing this specific data in Overpass?

This could be a starting point:

In keeping with the mood in this thread, you could add all the other quotation mark characters as alternatives to the ASCII double straight quotation mark. :smile:

Something like this I think: nwr[opening_hours!~"[\"”„“]"] (I don’t have them all on my keyboard)

This doesn’t exclude tags such as opening_hours=opens on fridays of course
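For readers who want to try the same filter outside Overpass, here is a minimal Python sketch of the check. It assumes the filter is meant to skip values that contain any kind of double quotation mark; the function name is hypothetical and the character list is only the one from the query above, not a complete set.

```python
import re

# Quotation marks from the query above: ASCII ", U+201C, U+201D, U+201E.
# This is not an exhaustive list of quotation mark characters.
QUOTE_CHARS = re.compile('["\u201C\u201D\u201E]')


def passes_quote_filter(value: str) -> bool:
    """Return True if the value contains none of the listed quotation marks."""
    return QUOTE_CHARS.search(value) is None
```

As noted above, a free-text value such as `opens on fridays` still passes this filter, because it contains no quotation mark at all.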

I was only now able to work on the data. I’ve manually checked these 37+44 elements through MapRoulette and “fixed” the dashes (I also fixed the OH where I could, although I left much of the more complex stuff untouched).

MapRoulette 1
MapRoulette 2

After checking all these elements, I can safely affirm that an automatic change would not negatively affect the data at all. Of course, as a first iteration, it was good to be more conservative, but in the future, an automatic edit can be used even for these more specific cases.

Now, the next step is to write the wiki page with the proposal. When I have some free time again I’ll work on that and let you all know.

Initially I just want to work with the opening_hours tag, and I don’t want to mess with dashes in other tags (e.g. name). Since I was having trouble with regexp and text editors, I lazily came up with (AKA asked ChatGPT for) a very simple Python script:

I made some manual checks and it seems to work. This also covers all the dashes listed on the Wikipedia page mentioned here, so it’s an improvement.
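The script itself is not reproduced in this thread. As a rough idea of the approach, here is a minimal sketch, not the code that was actually run. The function name and the dash list are assumptions: the list covers common dash-like characters and may differ from the set used in the real edit. Only the opening_hours value is touched, so dashes in other tags such as name stay as they are.

```python
import re

# Assumed list of dash-like characters (hyphen, non-breaking hyphen, figure
# dash, en dash, em dash, horizontal bar, minus sign, small em dash, small
# hyphen-minus, fullwidth hyphen-minus). The real edit may use another set.
NON_STANDARD_DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212\uFE58\uFE63\uFF0D"
DASH_RE = re.compile(f"[{NON_STANDARD_DASHES}]")


def normalize_opening_hours_dashes(tags: dict) -> dict:
    """Return a copy of tags with dashes in opening_hours replaced by '-'.

    All other tags are left untouched.
    """
    value = tags.get("opening_hours")
    if value is None:
        return dict(tags)
    return {**tags, "opening_hours": DASH_RE.sub("-", value)}
```

For example, a hypothetical element tagged `opening_hours=Mo–Fr 08:00—18:00` would come out as `Mo-Fr 08:00-18:00`.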

I uploaded a changeset covering just Brazil, and you can find the wiki page here.

Anything else I should consider?

Out of curiosity, how many of the non-standard dashes are just regular time interval separators?

/\d\d:\d\d\s?NONSTANDARD\s?\d\d:\d\d/

(or equivalently, weekday intervals)

If that’s the vast majority of the cases, it’s perhaps worth it to just deal with those “simple” cases mechanically, and review everything else.
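One way to get that number from an Overpass export is sketched below. This is only an illustration: the function name is hypothetical, the dash list is an assumption, and weekday intervals are not handled. A value counts as “simple” when every non-standard dash in it sits between two hh:mm times.

```python
import re

# Assumed set of non-standard dashes; extend as needed.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"
ANY_DASH = re.compile(f"[{DASHES}]")
# hh:mm, optional whitespace, one non-standard dash, optional whitespace, hh:mm
SIMPLE_TIME_INTERVAL = re.compile(rf"\d\d:\d\d\s?[{DASHES}]\s?\d\d:\d\d")


def split_simple_and_other(values):
    """Split values containing non-standard dashes into (simple, other).

    Values without any non-standard dash are ignored.
    """
    simple, other = [], []
    for value in values:
        if not ANY_DASH.search(value):
            continue
        # Drop all simple time intervals; any dash left over needs review.
        remainder = SIMPLE_TIME_INTERVAL.sub("", value)
        if ANY_DASH.search(remainder):
            other.append(value)
        else:
            simple.append(value)
    return simple, other
```

The ratio `len(simple) / (len(simple) + len(other))` then answers the question for whatever export is fed in.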

Not sure if I understood properly, but using an online regex service on the latest output from Overpass, I couldn’t find any match. Did you take a look as well?

It has been 10 days since the last message, so I performed the edits.

You can find them all here (check the latest edits from my user): Changesets by matheusgomesms-import | OpenStreetMap

It was a good exercise and an easy fix that I think will be useful for correcting many POIs. Obviously there are many things still to be fixed, so a MapRoulette task can be used (local language knowledge required, though).

Also, what stood out was that in Japan some OH were in the wrong format because of the different character set used there. A more focused task could also be done there (replacing the digit and colon characters, for example).
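A possible starting point for such a task is sketched below. It assumes the problem characters are the fullwidth variants of ASCII (U+FF01 to U+FF5E), which sit at a fixed offset from their ASCII counterparts, plus the ideographic space. Nobody in this thread has checked that assumption against the actual data, and the function name is hypothetical.

```python
# Fullwidth forms U+FF01..U+FF5E map to ASCII U+0021..U+007E by subtracting
# 0xFEE0. The ideographic space U+3000 is mapped to a plain space as well.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def fullwidth_to_ascii(value: str) -> str:
    """Replace fullwidth digits, colons, etc. with their ASCII equivalents."""
    return value.translate(FULLWIDTH_TO_ASCII)
```

Other characters, such as kanji in a free-text value, are left unchanged, so such values would still need manual review.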

I intend to repeat this maybe every 6 months or once a year; let’s see what the situation in OSM is in 6 months.

Here are some numbers on opening_hours (plus collection_times etc.). The data should be quite recent, not sure if your change is included in the numbers, though.

  • there is a total of 3502562 opening hours strings in OSM (not including opening hours strings within conditional strings)

  • 96.10% are considered valid¹

  • 2.99% are invalid but can usually be unambiguously parsed by a lenient parser

Of the latter, here is a list (the number in front is the number of times that unique string appears in OSM):

https://raw.githubusercontent.com/westnordost/osm-opening-hours/master/src/jvmTest/resources/invalid_but_unambiguous_opening_hours.tsv


¹ What exactly is considered valid differs a bit from parser to parser. E.g. the “reference implementation” parser does not understand everything that is in the spec, while understanding other constructs that are not in the spec. Here, “valid” means considered valid by my own osm-opening-hours parser.

FWIW, StreetComplete considers opening hours that can be unambiguously parsed but are invalid according to the spec as immediately due for re-survey, i.e. an opening hours quest is created. (And completely invalid opening hours strings anyway.)

When the user then acknowledges that the displayed times are still correct or edits the opening hours, a valid opening_hours in canonical form is saved.

In general, the app asks once every year whether the opening hours are still correct. This only works if either the shop hasn’t been edited for at least one year or a check_date:opening_hours with a date that is more than one year old has been set. I.e. for most shops whose opening hours syntax you just corrected, the app won’t ask whether the opening hours are still correct for another year.
