Automated incremental Bus Stop (GTFS) updates

This script performs incremental updates as explained in this Wiki page to permanently fix the severely outdated Israel bus stop import. GTFS files are fetched from http://he.mot.gov.il/.

If you’re mapping in Israel, it is important to read the Information For Mappers section at the script’s Wiki page.

The script can probably work in other countries, maybe with some modifications.

Last bus stop update/import: See the changeset history to find out.

Script pages:

I need mapper feedback and review because this will be a massive edit. To make this easy for everyone, I uploaded a JOSM Session file which has the result of a test run. Anyone can download it and open it with JOSM.

Please check out a place you are familiar with, and tell me whether the bus stop edits there look good, bad, new, or out of date, and let me know if you have any other comments.

There are 3 layers in that file:

  • stops_21_10_2017_original.osm - The layer before the test run was made. Fetched from Overpass at ~22:00, 21 Dec.

  • stops_21_10_2017_script1.osm - The layer after the test run was made, but without running the helper script to eliminate duplicates. The “new” gtfs file was also fetched at ~22:00, 21 Dec. (The “old” file was reconstructed from the 2012 import using gtfs_bootstrap.js, but this appears to be practically useless; too many things have changed.)

  • stops_21_10_2017_script2.osm - Final result with many duplicates eliminated using spaces.js.

Note that no live runs have been made yet. This test is local.

For those unfamiliar with JOSM: You can hide/show the different layers, or make some of them transparent, using the panel on the top right (or ALT+SHIFT+L).

NOTE: The file opens up about 90k nodes. Although it runs smoothly for me, I don’t know if it’s the same for everyone. Make sure your PC is not too old. JOSM seems to consume about 600 - 800 MiB of RAM with all layers open. Close unneeded programs if you don’t have much RAM.

EDIT: session file updated - translations added, and the versions are now from 21 Dec.

AR, EN translations added! (Session file updated)

I grabbed them from translations.txt as suggested by anonymous_gushdan_mapper

I checked several places, and most of the changes look good.

I did notice one bus stop in the middle of Herzliya’s rail station (right on the platform) which obviously should not be in that location (ref=17034).

Actually, after checking other rail stations, it seems there is such a “bus stop” in the location of many (all?) rail stations (ref=17036, 17042, 17094, etc)!

Also, there’s a stop near Petah-Tikva Kiryat Arie’s rail station (ref=35378), which is about 160 m too far south. There is a stop in the original layer, with no ref=*, in the correct location.

Thank you for the feedback.

The gtfs file contains all national railway station refs. The script has been modified and it now ignores them as a temporary solution.

As for the deleted stop: Note that it’s present in the intermediate layer (…_script1). I’m still trying to tune the duplicate removal algorithm (_script2), but it cannot remove duplicates while always knowing which of the two stops has the right position. Even an armchair mapper cannot do that. However, if you ever move that newly added stop to the right position, the bot will not insist on moving it back, and all will be good. (It would also generate a log message that could perhaps someday be sent upstream to MOT.)

Lastly, to minimize similar deletion damage, if that stop had any extra tags whatsoever (shelter / ref / name / etc), it wouldn’t have been removed and would have been marked with a fixme instead.
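
The rule described above could be sketched roughly like this. It is only an illustration and is not taken from spaces.js: the function name resolveDuplicate, the list of "bare" tags, and the fixme text are all assumptions.

```javascript
// Hypothetical sketch of the "delete or flag" rule for a suspected duplicate.
// Not the actual spaces.js code; BARE_TAGS and the fixme text are assumptions.
const BARE_TAGS = new Set(["highway", "public_transport", "bus"]);

function resolveDuplicate(oldStop) {
  // Any tag beyond the bare bus-stop tags counts as extra mapper data
  // (shelter, ref, name, etc.).
  const extraTags = Object.keys(oldStop.tags).filter((k) => !BARE_TAGS.has(k));

  if (extraTags.length === 0) {
    // Nothing would be lost, so the old stop can be removed.
    return { action: "delete", stop: oldStop };
  }

  // Keep the stop and mark it for human review instead of deleting it.
  const tags = { ...oldStop.tags, fixme: "possible duplicate of a GTFS stop" };
  return { action: "keep", stop: { ...oldStop, tags } };
}
```

The point of the design is that a wrong deletion can only ever cost a node that carried no extra information.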

I think the tradeoffs are optimal that way and that the de-duplication step is necessary. Please let me know if you have improvement suggestions, or if you find anything else that is interesting in the test files.

I’ve published a text log of unsynchronized data where human judgment is needed and the bot will not edit automatically. For each error, either we need to fix our version or MOT needs to fix theirs.
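
To illustrate what "unsynchronized" means here, below is a minimal sketch of a three-way comparison for a single tag value (for example, name). It is an assumption about how such an incremental update can be structured, not the script's actual code; the function name decide and the returned labels are hypothetical. Positions would need a distance tolerance rather than strict equality.

```javascript
// Hypothetical three-way comparison for one tag of one stop.
// oldGtfs: value in the previous GTFS file
// newGtfs: value in the current GTFS file
// osm:     value currently in OSM
function decide(oldGtfs, newGtfs, osm) {
  if (osm === newGtfs) return "in_sync";   // nothing to do
  if (osm === oldGtfs) return "update";    // mappers did not touch it: take the new GTFS value
  return "log_conflict";                   // a mapper changed it: leave it alone and log it
}
```

With this scheme, a value a mapper has changed is never overwritten silently; it only shows up in the log.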

I’ll see if I can easily convert this to a map view with points, via Leaflet.

Once we fix our side, we could in principle send the remaining errors back to MOT if they’re cooperative.

Edit: the first number is the “ref” value. The last couple of stops have strange ref values like ref=(dan=3937)

Edit 2: Many conflicts are “style/taste” differences (e.g. transliteration schemes, or single quotes versus double quotes). I’ll manually edit OSM to match the MOT version for cases where it’s a trivial style difference.

I have a fundamental question: Presumably, whatever name value is in the gtfs is the actual value printed on the stop sign. So, should users really ever change that? Even if the bus stop name is “Road X stop 2”, which is a total mistake because that road was renamed to Y, that stop name is on the ground, written on the sign, making it the true name in accordance with global name tag conventions.

This would mean the gtfs should override the users’ edits in almost all cases in the above log.

Presumably, but not necessarily; the MOT database could potentially be out-of-sync with respect to the on-the-ground signage.

I agree that the name of the bus station is whatever is printed on its sign (#505 in http://media.mot.gov.il/PDF/HE_TRAFFIC_PLANNING/tamrurim2010.pdf), even if streets have since been renamed. In your example, the bus station could be tagged addr:street=Y.

I agree. But I wonder, in practice, how many of the above discrepancies are actually mappers reading mismatching data from stop signs and then changing the stop.

Let’s not have 3 datasets; 2 are complex enough. Let’s just assume that the GTFS name is identical to the physical stop signs. If a mapper ever encounters an exception, we’ll handle it individually.

Under that assumption, is there an agreement that GTFS should always override user changes for name tags? (NOT location)

Every override would still generate a log message, to warn the provider of an error if present, and to contact the user if needed.

Scratch that. I’m thinking out loud.

Let’s just keep it as it is and just add a clear guideline: name = what’s on the sign. When a mapper overrides a GTFS name, I contact them asking if this was an armchair guess or a legit survey. If no reply / not a survey, put GTFS back in, otherwise do nothing; it’s a GTFS error (or send it upstream if we ever establish a communication channel with MOT).

Seems fine, as far as I can see.

Thanks!

I’ll be running the script live in a week or so, unless there are objections or reported problems.

Edit: rephrased to reflect newer info.

The update is currently underway. Some manual intervention is required in some cases:

The removal of stops that are part of route relations causes JOSM conflicts that I’m manually resolving. For most routes I am simply removing the stop from the route.

Many routes are possibly out of date. No one is actively maintaining them.

There were other minor conflicts that were trivial to resolve (e.g. users adding stops as members of highways).

Update complete. This is a massive update because it’s the first one. From now on there will be weekly updates that are much smaller.

Changesets:

I’ve come across some duplicate bus stop nodes in Ramat Bet Shemesh.

http://overpass-turbo.eu/s/sLK

I have no idea if this is an isolated case.

Can upload issues cause this? The JOSM upload was interrupted multiple times, and at some point I changed the chunk size partway through the upload in an attempt to mitigate the interruptions.

No duplicates appear for me when I run the script on a local copy of the older stops.

This is not an isolated case. There are 295 cases. Apparently the first chunk was uploaded twice or something like that. I will try cleaning this up.
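
For anyone who wants to check for this kind of duplication themselves, here is a minimal sketch that groups bus stop nodes by their ref tag and reports every ref that appears on more than one node. It assumes the input is an array of elements shaped like Overpass JSON output (each with id and tags); the function name findDuplicateRefs is hypothetical, and this is not necessarily how the 295 cases were counted.

```javascript
// Hypothetical helper: find ref values that are shared by more than one node.
// `nodes` is assumed to look like the "elements" array of an Overpass JSON reply.
function findDuplicateRefs(nodes) {
  const byRef = new Map();
  for (const node of nodes) {
    const ref = node.tags && node.tags.ref;
    if (ref === undefined) continue; // stops without a ref cannot be matched this way
    if (!byRef.has(ref)) byRef.set(ref, []);
    byRef.get(ref).push(node.id);
  }
  // Keep only refs carried by two or more nodes, as [ref, [ids...]] pairs.
  return [...byRef].filter(([, ids]) => ids.length > 1);
}
```

Each returned pair lists the node ids sharing a ref, which is enough to review them in JOSM before deleting anything.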

Fixed in https://www.openstreetmap.org/changeset/53524487

If someone can explain how and when this occurs, I’d appreciate it, so I can avoid this in the future.