Automated incremental Bus Stop (GTFS) updates

A log of all imports can be found here: https://wiki.openstreetmap.org/wiki/User:SafwatHalaby/scripts/gtfs#Changesets

The last import can be inferred from the log.

There is no defined frequency as of now. zstadler and I discussed this in private and we think the frequency should be well defined, and perhaps the entire import process should be automated. (It is currently semi-automated).

I think we can infer which stops each bus route serves from the GTFS data. This should allow us to automatically maintain bus route relations (https://wiki.openstreetmap.org/wiki/Buses#Services) based on the GTFS data.

Let’s take Egged’s Route 480 as an example. That route is from Tel Aviv to Jerusalem (and back) with one intermediate stop. In routes.txt we have:

route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_color
7020,3,480,מסוף 2000/עליה-תל אביב יפו<->תחנה מרכזית ירושלים/הורדה-ירושלים-1#,10480-1-#,3,
7022,3,480,מסוף רידינג-תל אביב יפו<->ממילא/קריב-ירושלים-1ק,10480-1-ק,3,9933FF
7023,3,480,תחנה מרכזית ירושלים קומה 3/רציפים-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3#,10480-3-#,3,
7024,3,480,רמות/מסוף-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-31,10480-3-1,3,
7026,3,480,מסוף אגד/הרב פרדס-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-33,10480-3-3,3,
7027,3,480,כניסה ראשית/הדסה עין כרם-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-34,10480-3-4,3,
7028,3,480,מסוף אגד/קורן-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-35,10480-3-5,3,
7030,3,480,תחנה מרכזית ירושלים קומה 3/רציפים-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ד,10480-3-ד,3,
7033,3,480,מסוף אגד/צביה ויצחק-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ן,10480-3-ן,3,
7034,3,480,ממילא/קריב-ירושלים<->רדינג-תל אביב יפו-3ק,10480-3-ק,3,9933FF
10958,3,480,רב החובל/אדם-ירושלים<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ב,10480-3-ב,3,
15337,3,480,הראל/יסמין-מבשרת ציון<->מסוף ארלוזורוב/הורדה-תל אביב יפו-3ז,10480-3-ז,3,

which correspond to 12 variants (one per route_id row above). The primary variants are 7020 and 7023/7030; the other variants run once a day each (you can see the details on your favourite public transport app). All variants run under the same route number, 480 (third column).

We can then look up the route variant id, route_id=7020, in trips.txt where we find lines such as

route_id,service_id,trip_id,trip_headsign,direction_id,shape_id
7020,55452791,30445690_030318,ירושלים _ תחנה מרכזית,0,91509

and then we can look up the trip_id in stop_times.txt, which tells us what stops this route variant has (including the departure terminal, destination terminal, and all intermediate stops)

trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type,shape_dist_traveled
30445690_030318,18:12:00,18:12:00,41427,1,0,1,0
30445690_030318,18:48:08,18:48:08,41132,2,1,0,53910
30445690_030318,19:00:38,19:00:38,11734,3,1,0,62466

and now we look the stop_id up in stops.txt

stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,location_type,parent_station,zone_id
41427,20178,מסוף 2000/עליה, רחוב:  עיר: תל אביב יפו רציף: 2   קומה:  ,32.083251,34.795334,0,41391,5000

and 20178 is the ref=* value of the highway=bus_stop node! (See https://overpass-turbo.eu/s/vs1)

Bottom line: using this lookup process, we can build a type=route relation for every bus route variant and automatically maintain its role=stop members.
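
The lookup chain above can be sketched in Python roughly as follows. This is only an illustration, not the import script's actual code: the function names are made up for this post, and it assumes that every trip of a variant serves the same stops, so the first trip found is used as a representative.

```python
import csv
from pathlib import Path


def read_gtfs_table(feed_dir, name):
    """Read one GTFS text file (e.g. 'trips.txt') into a list of dicts."""
    # utf-8-sig strips a byte-order mark if the feed has one.
    with open(Path(feed_dir) / name, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def stop_refs_for_variant(route_id, trips, stop_times, stops):
    """Return the ordered stop_code values (OSM ref=*) for one route variant.

    Assumption: all trips of a variant serve the same stops, so the first
    trip found for the route_id is used as a representative.
    """
    trip_id = next(t["trip_id"] for t in trips if t["route_id"] == route_id)
    visits = sorted(
        (row for row in stop_times if row["trip_id"] == trip_id),
        key=lambda row: int(row["stop_sequence"]),
    )
    code_by_stop_id = {s["stop_id"]: s["stop_code"] for s in stops}
    return [code_by_stop_id[row["stop_id"]] for row in visits]
```

With the three tables loaded via read_gtfs_table (from wherever the feed was unpacked), stop_refs_for_variant("7020", trips, stop_times, stops) would return the ref=* values in travel order, starting with 20178 for the trip shown above. Note that stop_times.txt is large, so a real script would probably filter it while streaming rather than load it whole.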

(That will allow OSM-based public transport routing apps to work in Israel)

Kudos to whoever got the ministry to release this information publicly under a permissive license. :)

By the way, the CSV lines above may appear to have misordered columns because of how the Unicode bidirectional algorithm lays out mixed Hebrew and Latin text. If you actually parse them with split(',') the fields come out in the right order.

I am aware I can extract a lot more data, including routes, but right now I don’t have the time or motive to extract routes. Note that the trickiest part is thinking of how to deal with conflicts.

Regarding motive: No one really uses OSM for bus stop routing in Israel. Even if I import everything, there isn’t a convenient app or web interface that allows the layman to actually use the data. Professional users can already use the MOT data directly. So it seems like wasted effort. Please do correct me if I’m wrong in this regard.

Hi Safwat!

Do you accept patches, so that someone else could do the work? (I’m interested, but I can’t promise I’d have time)

I know you’ve thought a lot about this, so I might be overlooking something simple, but why don’t we just let the MOT data override anything else until somebody complains that MOT data is outdated?

There are some private/unregulated bus services, e.g., a Shabbat bus from Ramat Gan HaYarden to the sea, but those should be easy to distinguish from official services by, e.g., the value of the ‘operator’ tag.

It’s a chicken and egg problem. Someone has to make the first move.

The code is open source and patches are welcome, but I’d strongly recommend discussion first.

Overriding is the easy, trivial option. But when I wrote the import scripts, I didn’t like that. It would have destroyed legitimate edits. The MOT data is not perfect, and I’ve confirmed this on multiple occasions.

My basic premise was that the vast majority of OSM edits are good edits that improve the dataset. So, when an OSM editor moves a bus stop to the other side of the road, or deletes a stop, or adds a missing stop, it probably means MOT has it wrong. Every such change makes the OSM data a tiny bit better than MOT data.

If we mirror the MOT data as-is and override all mapper data, we get an identical dataset. If we incorporate mapper data, we get an enhanced, superior dataset.

At least that’s the theory. In practice, I’ve observed that certain kinds of user edits degrade the data unintentionally, and I am considering an update.

Namely:

  • Unintentionally moving a bus stop a tiny distance (<3m) while editing something unrelated, which creates needless differences with MOT.
  • Renaming bus stops. I now believe that even if there’s a typo or an incorrect name in the MOT set, it should remain as-is, because it is the name used by all other navigation apps, it’s the name used in the bus speaker systems, and it’s likely the name on the physical bus stop sign. Names should be fixed MOT-side.

The update would override OSM-side name edits and movements smaller than 3 (maybe 5) meters, but would retain other user edits.

Bus stops are optional members of a route=bus relation. The route’s roads (ways), on the other hand, are mandatory according to the route relations wiki.

An automatic process for identifying the route’s member ways from the MOT data is non-trivial at best:

  1. The MOT data includes route shapes, but they most likely differ from the shapes of the OSM roads. Code will be needed that reliably identifies the participating OSM roads from a given MOT shape.

  2. This code will need to split existing OSM roads when a bus route travels along only part of an OSM road.

Some updates:

  • I will now run the script weekly, on Saturday evenings.
  • A page dedicated to changeset history has been created: https://wiki.openstreetmap.org/wiki/User:SafwatHalaby/scripts/gtfs/changesets
  • I will now focus my OSM time mainly on scripts, and less on map editing or monitoring. This should prevent me from “spreading too thin” and allow me to adequately maintain and update the script.
  • I now use SafwatHalaby and not SafwatHalaby_bot for applying scripts.

Useful info. Thank you! I imagine a road snapping algorithm would be very interesting for uses beyond this script. It’d be an interesting, and definitely non-trivial, challenge. If I have the time I may investigate algorithms to do this.

By the way, I think Mapzen had an API for this.

I believe Mapzen shut down in January.

Fortunately, Israel Hiking has a routing API and its OSM data is updated on a daily basis.
See “/api/Routing” at https://israelhiking.osm.org.il/swagger

Notes:

  • To improve the accuracy of the road snapping, I believe it would be best to run the routing API between every pair of adjacent points in the MOT shape. This approach could easily overload the server, so please restrict the rate of requests if you decide to use it.

  • The API does not have a PSV (Public Service Vehicle) mode, so some discrepancies are expected.
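
To make the “adjacent points, rate-limited” idea concrete, here is a minimal Python sketch. It deliberately does not call the Israel Hiking API itself, since the request parameters are not spelled out here: route_fn is a hypothetical caller-supplied wrapper around whatever routing request is used, and the one-second default pause is an arbitrary assumption, not a documented limit.

```python
import time


def snap_shape(points, route_fn, min_interval_s=1.0, sleep=time.sleep):
    """Route between each pair of adjacent shape points, one request at a time.

    points:   list of (lat, lon) tuples taken from one MOT shape.
    route_fn: hypothetical caller-supplied function (a, b) -> list of
              (lat, lon) points, wrapping the actual routing request.
    min_interval_s: pause between requests, to avoid overloading the server.
    """
    snapped = []
    for i, (a, b) in enumerate(zip(points, points[1:])):
        if i > 0:
            sleep(min_interval_s)  # throttle every request after the first
        leg = route_fn(a, b)
        # Drop the first point of a leg when it repeats the previous leg's end.
        if snapped and leg and leg[0] == snapped[-1]:
            leg = leg[1:]
        snapped.extend(leg)
    return snapped
```

The result is a single snapped polyline; mapping it back to OSM way IDs (and splitting ways) is the harder part and is not covered by this sketch.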

I think importing bus routes is probably a bad idea.

The MOT data is missing accurate shapes for a lot of the intercity lines (it has direct lines between stops instead of accurate shapes), and it’s a lot of data to load on OSM without any clear usage. Israel’s “spaghetti” bus lines would not make a good-looking (or usable) map, and there are no apps using OSM data directly for transit navigation today (it’d be very inefficient).

The only thing loading the bus lines would do is make the “transport map” layer (which at the moment is good for clearly seeing railroad tracks) basically unusable.

AFAIK, no other country has imported ALL of its bus lines into OSM, even when GTFS data is available.

@Safwat - I’m happy to defer to you on how to handle conflicts. You have far more experience than I there.

@zstadler - This may sound heretical, but why don’t we import the routes without importing the ways? I realise that the schema specifies that member ways are mandatory, but (1) adding the stops serves a real use-case (it enables apps to be written that can’t be written without it; travellers wouldn’t care whether the precise path is represented in the dataset, so long as they see where the buses stop); (2) adding the stops would be an accurate, incomplete edit, which is generally a good thing (as opposed to an inaccurate edit, which would be frowned upon); which is to say, we shouldn’t let the perfect (having routes that have both ways and stops) be the enemy of the good (having routes that have just the stops); (3) since OSM is a wiki, data consumers already have to be defensive and validate the relations they work on.

@anonymous_gushdan_mapper - As I said, the thinking is that if we import the data, somebody will write a smartphone app that uses it for navigation. That app would be usable in any country that has bus routes defined in OSM. At the moment it’s not possible to have such an app for Israel because OSM doesn’t have the necessary data for Israel (though we have the bus stops with gtfs:id’s; that really is fantastic, but doesn’t enable navigation apps to be written based on OSM data alone). Re good-looking, I don’t think that’s a valid argument. OSM should map the world as it is. If you don’t find the shapes of bus routes aesthetically pleasing, ask the MOT to change the routes… but if the world is spaghetti-like, then OSM’s map data should contain spaghetti. Regarding your argument about the transport map, isn’t the right answer to that to ask the maintainers of the osm.org slippy map (carto) to add a mode that shows only railroads but not bus lines? Again, the primary criterion for map data is accuracy.

Like @anonymous_gushdan_mapper, I see no value in entering data into OSM that would not be used. This is also applicable to the proposal to enter routes without their way segments. Existing sites and applications assume that bus routes follow the required scheme. As a result, their handling of ill-constructed routes is unpredictable. This is why standards are created.

Indeed, OSM allows using any tags you like. However, when using existing tags, one is expected to comply with the standards. Not using the standards can be considered bad mapping because of its effects on users of these tags/schemes.

@dsh4,
You can create a relation that includes just the bus stops of a route, and use it in a new application or site that you will build, but please avoid using the route=bus tag.

@dsh4 - it’s not about being “ugly”, it’s about being usable. Just like we don’t map individual trees (unless they have special significance). OSM is not meant to collect every possible detail about the world, just ones that would be usable.

This episode of Map Men is a good explanation of this philosophy in a broader context https://www.youtube.com/watch?v=kwprznh3d-o

If every street in Tel Aviv is covered in a red line that indicates a bus line, it won’t be too usable.

There exist apps that support bus routing in many countries. Saying that adding this data to Israel specifically will enable such apps is a bit far-fetched, as it’s much easier for a developer to just consume GTFS feeds than to interact with the OSM API or keep an updated copy of the entire OSM database.
Also, bus routing that is based on OSM only would not be very useful, as it won’t contain the actual schedule - which is very important when doing public transit routing.

route=bus data exists for some European cities, but I haven’t seen an app that uses it. Do you know of any such app? And if not, why do you think adding Israel specifically would cause these apps to be created?
What would be the use case of a transit routing app that doesn’t have the actual schedule?
What would be the use case of a bus route map so dense that it can’t possibly be used for navigation?

None, but that’s a presentation layer problem, which is a different kettle of fish to the “which data should be added to the map” question.

Agreed with your good points about schedule and GTFS feeds.

@zstadler Yes, I thought that counter-argument might be offered :) I suppose I should try and convince the tagging list to make the way members optional (or, more generally, to invent some incomplete=yes tag to facilitate the “accurate, incomplete survey” case).

Thanks everyone for all the enlightening feedback. It’s clear there’s no consensus on proceeding so I’ll drop the matter (and seek some non-OSM-based solution to my original problem).

Can you elaborate on the original problem? Perhaps we could assist…

The discussion on routes was essentially about having a partial copy of MOT GTFS information within the OSM DB. The idea to link this data with other OSM data, such as roads, was dropped during the conversation. As such, I wonder what value OSM could bring to your original problem.

The Saturday updates were postponed due to algorithm changes. Although this change has been proposed several times before with no objections, I will propose it once more and wait 1-3 weeks in case someone has comments.

The change is based on ground experience. Are user edits better than the original MOT data? The answer appears to be NO for tags, and YES for bus stop locations. Therefore, the algorithm will change as follows:

  • The bot will OVERRIDE all name tag changes users make. As explained before, the reasoning is that a bus stop’s MOT name is the one true official name, and the name used on digital monitors, bus stop voice systems, etc. Therefore, even if it is logically incorrect or has some spelling errors, it is the correct name of the stop as long as it is not fixed upstream in MOT.
  • As before, the bot WILL NOT override bus stop location changes (unless MOT has a more recent update). However, if a user moves a stop only slightly (less than 4 meters), the move is assumed to be an accident and the bot will OVERRIDE the location, snapping it back to the original MOT position.
  • The rest of the behavior remains identical, e.g. the bot will not re-add user-deleted stops (unless MOT has updated a stop after deletion), and so on.
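
For readers who prefer code, the name and location rules above could look roughly like this in Python. This is a simplified sketch and not the bot’s actual code: the dict layout and function names are invented for illustration, the haversine formula is just one reasonable way to measure the move, and the “unless MOT has a more recent update” and deletion rules are left out.

```python
import math

EARTH_RADIUS_M = 6371000.0


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two WGS84 points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def merge_stop(mot, osm, snap_threshold_m=4.0):
    """Combine the MOT and OSM versions of one stop.

    Both arguments are dicts with 'name', 'lat' and 'lon' keys
    (a hypothetical layout chosen for this example).
    """
    merged = dict(osm)
    merged["name"] = mot["name"]  # the MOT name always wins
    moved = distance_m(mot["lat"], mot["lon"], osm["lat"], osm["lon"])
    if moved < snap_threshold_m:
        # Tiny move: treated as accidental, snap back to the MOT position.
        merged["lat"], merged["lon"] = mot["lat"], mot["lon"]
    return merged
```

A stop that a mapper renamed and nudged by about a meter would come out with the MOT name and the MOT coordinates, while a stop moved across the road (well over 4 meters) would keep its OSM coordinates and only have its name reset.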

edit: removed redundant points that have already been made before.

Sounds good!

I thought that if we imported the set of stops in each bus route into the OSM DB, then bus routing apps could be written that would work both in Israel and in other countries, without depending on the peculiarities of each country’s upstream bus route formats. If we don’t do such an import, I’ll keep using per-country public transport routing apps, that’s all.

Cheers.

I apologize for the delay in the incremental updates. The code requires certain updates before the next run, which I haven’t had the time to make. I will do this as soon as I can.

An incremental update has been applied with the new code: https://www.openstreetmap.org/changeset/59589851

The logic for the “description” tag has been refined, but the tag itself was intentionally not updated in this changeset and will be updated in a separate one.