Automated incremental Bus Stop (GTFS) updates

http://www.openstreetmap.org/way/280293501 - here a stop is glued to the park…

Oh man, I was focused on roads.
I’ve fixed that one. Thanks!
I’ll do a more thorough cleanup this weekend, or sooner if I have the time.

I am fixing this now. For anyone who has done fixes: please make sure that after the fix, the stop remains precisely where the bot moved it, at the exact same coordinates.

Done. Did I miss anything?

I’ve done lots of duplicate cleanups. As of now we still have 545 bus stops that do not appear in GTFS. Most of them are duplicates of real GTFS stops that sit a few meters away, and most are clustered in Jerusalem, Be'er Sheva, and the Tel Aviv district. If you are familiar with one of those areas, you can help by cleaning up duplicate stops. Most of them are marked with “fixme=Suspected duplicate stop. Flagged by SafwatHalaby_bot (flag-gtfs1).”

I propose the following tagging changes:

  • Add public_transport=platform, bus=yes, for compatibility with “public transport v2” and for future routes.

  • Remove the gtfs:verified tag. Reasons:

      • The Israel GTFS database is rather accurate most of the time.

      • The update bot tolerates and respects user changes. If a user finds a wrong stop, they should simply update or delete it rather than fiddle with the tag.

More about transport V2:


Two incremental updates were performed today. It appears 34 stops were updated by MOT in a single day.

Morning update - probably yesterday’s GTFS file: https://www.openstreetmap.org/changeset/53606095 (The numbers are misleading. Deletions include duplicate removals due to an earlier Overpass mistake that was fixed).
Evening update - today’s GTFS file: https://www.openstreetmap.org/changeset/53621985

The update had a minor, unexpected positive effect on https://www.efobus.com/: when you scroll around their map, the markers need a few seconds to appear, but thanks to the synchronized blue square markers on the background map, you can see the position of the stops even before the clickable markers appear.

Here’s a visual map of osm-gtfs conflicts:

http://www.safwat.xyz/stops/ (rudimentary work in progress)

Hi

Thanks for the update.
I’ve found (really, Osmose found) a duplicate stop here: http://www.openstreetmap.org/#map=18/30.01790/34.95113
Is this an import bug, or does the GTFS database have a glitch?

Thank you for the report. I’ve fixed this in changeset #55249140. 99 stop duplicates were removed.

This appears to be an upload issue and has happened before. It seems when JOSM fails an upload, things can get duplicated when I resume it. I’ll investigate this further.

Full list of duplicated stops:

Following this question on Facebook, how does one know when the last Israeli GTFS import to OSM was?

The import frequency is not clear from the Israeli Wiki.

The last bus stop update seems to have been made 10 days ago.

You can follow the history of bus stop updates here:
https://www.openstreetmap.org/user/SafwatHalaby_bot/history

A log of all imports can be found here: https://wiki.openstreetmap.org/wiki/User:SafwatHalaby/scripts/gtfs#Changesets

The last import can be inferred from the log.

There is no defined frequency as of now. zstadler and I discussed this in private and we think the frequency should be well defined, and perhaps the entire import process should be automated. (It is currently semi-automated).

I think we can infer “Which stations does each bus route stop on” from the GTFS data. This should allow us to automatically maintain bus route relations (https://wiki.openstreetmap.org/wiki/Buses#Services) based on the GTFS data.

Let’s take Egged’s Route 480 as an example. That route is from Tel Aviv to Jerusalem (and back) with one intermediate stop. In routes.txt we have:

route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_color
7020,3,480,מסוף 2000/עליה-תל אביב יפו<->תחנה מרכזית ירושלים/הורדה-ירושלים-1#,10480-1-#,3,
7022,3,480,מסוף רידינג-תל אביב יפו<->ממילא/קריב-ירושלים-1ק,10480-1-ק,3,9933FF
7023,3,480,תחנה מרכזית ירושלים קומה 3/רציפים-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3#,10480-3-#,3,
7024,3,480,רמות/מסוף-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-31,10480-3-1,3,
7026,3,480,מסוף אגד/הרב פרדס-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-33,10480-3-3,3,
7027,3,480,כניסה ראשית/הדסה עין כרם-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-34,10480-3-4,3,
7028,3,480,מסוף אגד/קורן-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-35,10480-3-5,3,
7030,3,480,תחנה מרכזית ירושלים קומה 3/רציפים-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ד,10480-3-ד,3,
7033,3,480,מסוף אגד/צביה ויצחק-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ן,10480-3-ן,3,
7034,3,480,ממילא/קריב-ירושלים<->רדינג-תל אביב יפו-3ק,10480-3-ק,3,9933FF
10958,3,480,רב החובל/אדם-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ב,10480-3-ב,3,
15337,3,480,הראל/יסמין-מבשרת ציון<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ז,10480-3-ז,3,

which correspond to 8 variants. The primary variants are 7020 and 7023/7030; the other variants run once a day each (you can see the details in your favourite public transport app). All variants run under the same route number, 480 (third column).

We can then look up the route variant id, route_id=7020, in trips.txt where we find lines such as

route_id,service_id,trip_id,trip_headsign,direction_id,shape_id
7020,55452791,30445690_030318,ירושלים _ תחנה מרכזית,0,91509

and then we can look up the trip_id in stop_times.txt, which tells us what stops this route variant has (including the departure terminal, destination terminal, and all intermediate stops)

trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,shape_dist_traveled
30445690_030318,18:12:00,18:12:00,41427,1,0,1,0
30445690_030318,18:48:08,18:48:08,41132,2,1,0,53910
30445690_030318,19:00:38,19:00:38,11734,3,1,0,62466

and now we look the stop_id up in stops.txt

stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,zone_id
41427,20178,מסוף 2000/עליה, רחוב:  עיר: תל אביב יפו רציף: 2   קומה:  ,32.083251,34.795334,0,41391,5000

and 20178 is the ref=* value of the highway=bus_stop node! (See https://overpass-turbo.eu/s/vs1)

Bottom line: using this lookup process, we can build a type=route relation for every bus route variant and automatically maintain its role=stop members.
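The lookup chain described above can be sketched in a few lines of Python. This is only a minimal illustration, not the import script: the function names are made up, the Hebrew text fields are replaced with placeholders, and the excerpts contain only the rows quoted in this post, so only stop 41427 resolves to a stop_code. It also assumes that every trip of a route variant serves the same stops, which would need checking against the real data.

```python
import csv
import io

# Excerpts of the four GTFS files, limited to rows quoted in this thread.
# Hebrew text fields are replaced by placeholders for readability.
ROUTES = """route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_color
7020,3,480,long-name-placeholder,10480-1-#,3,
"""
TRIPS = """route_id,service_id,trip_id,trip_headsign,direction_id,shape_id
7020,55452791,30445690_030318,headsign-placeholder,0,91509
"""
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,shape_dist_traveled
30445690_030318,18:12:00,18:12:00,41427,1,0,1,0
30445690_030318,18:48:08,18:48:08,41132,2,1,0,53910
30445690_030318,19:00:38,19:00:38,11734,3,1,0,62466
"""
STOPS = """stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,zone_id
41427,20178,name-placeholder,desc-placeholder,32.083251,34.795334,0,41391,5000
"""


def read_rows(text):
    """Parse a GTFS text file into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def route_ids_for_line(short_name, routes):
    """All route variant ids that run under one public line number."""
    return [r["route_id"] for r in routes if r["route_short_name"] == short_name]


def stop_refs_for_route(route_id, trips, stop_times, stops):
    """Return [(stop_id, stop_code or None), ...] in stop_sequence order.

    Assumption: the first trip found for the variant is representative.
    """
    trip_id = next(t["trip_id"] for t in trips if t["route_id"] == route_id)
    code_by_stop_id = {s["stop_id"]: s["stop_code"] for s in stops}
    visits = sorted(
        (st for st in stop_times if st["trip_id"] == trip_id),
        key=lambda st: int(st["stop_sequence"]),
    )
    return [(st["stop_id"], code_by_stop_id.get(st["stop_id"])) for st in visits]


variants = route_ids_for_line("480", read_rows(ROUTES))
members = stop_refs_for_route(
    variants[0], read_rows(TRIPS), read_rows(STOP_TIMES), read_rows(STOPS)
)
print(members)
```

Each stop_code in the result is the value to match against ref=* on the highway=bus_stop nodes; a None means the stop was not present in the excerpt.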

(That will allow OSM-based public transport routing apps to work in Israel)

Kudos to whomever got the ministry to release this information publicly under a permissive license. :slight_smile:

By the way, the CSV lines above appear to have misordered columns because of how Unicode infers text direction. If you actually parse them with split(',') they would look right.
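To illustrate the point about column order, here is a small sketch that parses the stops.txt row quoted above. It uses Python's csv module instead of a plain split, on the assumption that some GTFS fields could be quoted; for this particular row both approaches give the same nine fields.

```python
import csv

header = "stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,zone_id"
line = "41427,20178,מסוף 2000/עליה, רחוב:  עיר: תל אביב יפו רציף: 2   קומה:  ,32.083251,34.795334,0,41391,5000"

# csv.reader accepts any iterable of lines, so a one-element list works.
names = next(csv.reader([header]))
values = next(csv.reader([line]))

# The fields come out in logical (file) order, whatever the display order is.
stop = dict(zip(names, values))
print(stop["stop_id"], stop["stop_code"], stop["stop_lat"], stop["stop_lon"])
```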

I am aware I can extract a lot more data, including routes, but right now I don’t have the time or motive to extract routes. Note that the trickiest part is thinking of how to deal with conflicts.

Regarding motive: No one really uses OSM for bus stop routing in Israel. Even if I import everything, there isn’t a convenient app or web interface that allows the layman to actually use the data. Professional users can already use the MOT data directly. So it seems like wasted effort. Please do correct me if I’m wrong in this regard.

Hi Safwat!

Do you accept patches, so that someone else could do the work? (I’m interested, but I can’t promise I’d have time.)

I know you’ve thought a lot about this, so I might be overlooking something simple, but why don’t we just let the MOT data override anything else until somebody complains that MOT data is outdated?

There are some private/unregulated bus services, e.g., a Shabbat bus from Ramat Gan HaYarden to the sea, but those should be easy to distinguish from official services by, e.g., the value of the ‘operator’ tag.

It’s a chicken and egg problem. Someone has to make the first move.

The code is open source and patches are welcome, but I’d strongly recommend discussion first.

Overriding is the easy, trivial option. But when I wrote the import scripts, I didn’t like that. It would have destroyed legitimate edits. The MOT data is not perfect, and I’ve confirmed this on multiple occasions.

My basic premise was that the vast majority of OSM edits are good edits that improve the dataset. So, when an OSM editor moves a bus stop to the other side of the road, or deletes a stop, or adds a missing stop, it probably means MOT has it wrong. Every such change makes the OSM data a tiny bit better than MOT data.

If we mirror the MOT data as-is and override all mapper data, we get an identical dataset. If we incorporate mapper data, we get an enhanced, superior dataset.

At least that’s the theory. In practice, I’ve observed that certain kinds of user edits degrade the data unintentionally, and I am considering an update.

Namely:

  • Unintentionally moving a bus stop a tiny distance (<3m) while editing something unrelated, creating needless differences with MOT.
  • Renaming bus stops. I now believe that even if there’s a typo or an incorrect name in the MOT set, it should remain as-is, because it is the name used by all other navigation apps, it’s the name used in the bus speaker systems, and it’s likely the name on the physical bus stop sign. Names should be fixed MOT-side.

The update would override osm-side name edits or movements smaller than 3 (maybe 5) meters, but would retain other user edits.
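A rough sketch of what such a rule could look like, assuming each stop is a simple dict with "name", "lat" and "lon" keys (a hypothetical shape, not the script's actual data model) and using the haversine formula for distance. The 3 meter threshold is the value floated above and is exposed as a parameter.

```python
import math

EARTH_RADIUS_M = 6371000.0
MOVE_TOLERANCE_M = 3.0  # proposed threshold; 5 was also mentioned


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reconcile(osm_stop, gtfs_stop, tolerance_m=MOVE_TOLERANCE_M):
    """Merge one OSM stop with its GTFS counterpart.

    The name always follows GTFS. The position follows GTFS only when the
    OSM stop is less than tolerance_m away (treated as an accidental nudge);
    larger moves are kept as deliberate mapper edits.
    """
    result = dict(osm_stop)
    result["name"] = gtfs_stop["name"]
    moved = haversine_m(
        osm_stop["lat"], osm_stop["lon"], gtfs_stop["lat"], gtfs_stop["lon"]
    )
    if moved < tolerance_m:
        result["lat"], result["lon"] = gtfs_stop["lat"], gtfs_stop["lon"]
    return result
```

For example, a stop nudged by about one meter would be snapped back to the GTFS coordinates, while a stop moved across the road (tens of meters) would keep its OSM position and only have its name reset.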

Bus stops are optional members of route=bus relations. The route’s roads (ways), on the other hand, are mandatory according to the route relations wiki.

An automatic process for identifying the route’s member ways from the MOT data is non-trivial at best:

  1. The MOT data includes route shapes, but most likely they differ from the shapes of the OSM roads. Code will be needed to reliably identify the participating OSM roads from a given MOT shape.

  2. This code will need to split existing OSM roads when a bus route travels along only part of an OSM road.
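To give a feel for point 1, here is a deliberately naive sketch: each shape point is assigned to the nearest OSM way segment within a cutoff, and consecutive repeats are collapsed. Everything here is hypothetical (the function names, the 15 meter cutoff, the dict-of-coordinate-lists input), it uses a flat-earth approximation that is only reasonable over short distances, and it does not address way splitting (point 2) or ambiguous cases such as parallel roads and junctions, which is where the real difficulty lies.

```python
import math


def to_xy(lat, lon, lat0):
    """Project to local meters (equirectangular approximation around lat0)."""
    r = 6371000.0
    return (math.radians(lon) * r * math.cos(math.radians(lat0)), math.radians(lat) * r)


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, all in planar (x, y) meters."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_ways(shape_points, ways, max_dist_m=15.0):
    """Return way ids in travel order for a GTFS shape.

    shape_points: [(lat, lon), ...] from shapes.txt
    ways: {way_id: [(lat, lon), ...]} node coordinates of candidate OSM ways
    Shape points farther than max_dist_m from every way are ignored.
    """
    lat0 = shape_points[0][0]
    projected = {
        wid: [to_xy(lat, lon, lat0) for lat, lon in nodes]
        for wid, nodes in ways.items()
    }
    matched = []
    for lat, lon in shape_points:
        p = to_xy(lat, lon, lat0)
        best_id, best_d = None, max_dist_m
        for wid, nodes in projected.items():
            for a, b in zip(nodes, nodes[1:]):
                d = point_segment_distance(p, a, b)
                if d < best_d:
                    best_id, best_d = wid, d
        if best_id is not None and (not matched or matched[-1] != best_id):
            matched.append(best_id)
    return matched
```

The brute-force loop over every segment is fine for a toy example; a real implementation would need a spatial index and some notion of route continuity between consecutive points.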

Some updates:

  • I will now run the script weekly, on Saturday evenings.
  • A page dedicated to changeset history has been created: https://wiki.openstreetmap.org/wiki/User:SafwatHalaby/scripts/gtfs/changesets
  • I will now focus my OSM time mainly on scripts, and less on map editing or monitoring. This should prevent me from “spreading too thin” and allow me to adequately maintain and update the script.
  • I now use SafwatHalaby and not SafwatHalaby_bot for applying scripts

Useful info. Thank you! I imagine a road snapping algorithm would be very interesting for uses beyond this script. It’d be an interesting - and definitely non-trivial - challenge. If I have the time, I may investigate algorithms to do this.

By the way, I think Mapzen had an API for this.