Automated incremental Bus Stop (GTFS) updates

I apologize for the delay in the incremental updates. The code requires certain changes before the next run, which I haven’t had the time to make. I will do this as soon as I can.

An incremental update has been applied with the new code: https://www.openstreetmap.org/changeset/59589851

The format of the “description” tag has been refined in the code, but the tag was intentionally left untouched in this changeset and will be updated in a separate one.

Version 2 is out. As a mapper, here is what you should know:

https://wiki.openstreetmap.org/wiki/User:SafwatHalaby/scripts/gtfs#Information_for_mappers

Version 2 now adds “addr:address, addr:number” and a much cleaner description tag, and it overrides user changes for certain tags. The PTv2 scheme is now also used alongside highway=bus_stop. Lastly, certain parts of the source code have been cleaned up.

No runs have been performed with version 2 so far (the run described in the previous post was v1 + name overrides), but the source code can be found in the repository. I am first making sure everything works as expected. This is also a final opportunity for anyone to leave comments before I run it.

Version 2 first run: #59668811, #59670297, #59670983

@Safwat Appreciate the continued work on this. What’s to be done if the name in MoT’s database doesn’t match the name on the ground? (The station’s name is included in the black-ruled-yellow-background station sign and usually also in the in-station maps.) The wiki page says that the MoT database name wins, but I think we should map both names: the MoT name for interoperability with other services, the on-the-ground name due to https://wiki.openstreetmap.org/wiki/Good_practice#Map_what.27s_on_the_ground .

Hello dsh4,

I am glad you like my work :slight_smile:

I wish you had brought up the issue earlier. This was decided after several messages and after I waited a long time for comments. Nevertheless, I am open to change. Let us discuss this. As of now, I believe it’s not worth the effort to handle this problem. Here are some reasons.

This change is based on past experience. Nearly all mapper edits to bus stops are armchair fixes. They are not the “on the ground name”; the mapper sees a typo, or a stop bearing the name of a long-gone road, and changes it. Even if the change makes sense, it is not OK to make it so long as the actual stop name has not changed on the sign and in the MOT data. It seems most mapping nowadays is armchair mapping, and that kind of mapping cannot possibly tell whether the physical name differs from the MOT name.

Experience also shows that we as a community are incapable of maintaining the stops. There are 27k stops, and they change rapidly every day. Maintenance would be a full-time job for several people. The staleness of the stops prior to the introduction of the script, and the “gtfs:verified” tag, which I removed yesterday, are both testimony to this. That tag was effectively dead. We don’t have the capacity to physically survey the stops, verify them, and flip the gtfs:verified flag or update the stops. An “on_the_ground_name” tag would be the new stale “gtfs:verified”, consuming 27KiB multiplied by ~10 or more with little use.

The physically printed bus stop name on the sign is becoming less and less important. Digital display screens, which are increasingly common right next to the stops, voice systems in buses, and other apps and services all use the digital MOT name. Consequently, the digital names are highly maintained.

MOT has an active support E-mail. They are willing to change their data if it has mistakes. Highly confusing mistakes can be fixed on their part, and trivial mistakes too. Since the data is heavily relied on, perhaps they are even willing to physically change the on-the-ground name for critical mistakes, though I have never tried this.

If this is a solely theoretical problem, then I think we ought to ignore it. A “third dataset” in addition to the two existing ones worsens everything - the code, mapper confusion, consumer confusion, the amount of warning logs I get, the debugging time required, the Israel map size, and more. However, if actual problems are arising because of this, then that’s something to consider.

So I think it should be added only if it’s really needed, and right now, I am not convinced it is. But I am open minded and would love to hear other opinions.

By the way, if there are specific examples of MOT-ground discrepancies, I would love to hear about them.

On further thought, I think there’s a way that already works: manually add a different name tag, such as alt_name, whenever a ground name differs from the MOT name. This doesn’t require 2 name tags per stop and doesn’t require any script changes, because the script ignores alt_name and similar tags.
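To illustrate why this works, here is a minimal sketch of a tag merge in which the script only overwrites the keys it owns and leaves everything else, alt_name included, untouched. The key set `SCRIPT_OWNED_KEYS` and the function `merge_tags` are hypothetical names made up for this example; they are not taken from the actual script.

```python
# Hypothetical set of keys the script overwrites on every run.
# The real script's list may differ; this is only an illustration.
SCRIPT_OWNED_KEYS = {"name", "description", "ref"}


def merge_tags(osm_tags, gtfs_tags):
    """Return the stop's tags after an update.

    Script-owned keys take the GTFS (MOT) value; any other key that a
    mapper added by hand, such as alt_name or note, is kept as-is.
    """
    merged = dict(osm_tags)  # copy, so the caller's dict is not modified
    for key in SCRIPT_OWNED_KEYS:
        if key in gtfs_tags:
            merged[key] = gtfs_tags[key]
    return merged
```

With this approach a mapper's alt_name survives every run, while a hand-edited name is reset to the MOT value.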

Adding the on-the-ground name in the “name” tag and the MOT in another does complicate things.

Would adding something like this to the Wiki be sufficient?


If the on the ground name is not the same as the MOT name, please:
1. add it to alt_name.
2. Optionally add a note that mentions this difference.
3. Optionally report the problem to MOT.


Good approach!
I’d like to suggest adding the appropriate email or web address to number 3 above.

The on-the-ground notes have been added to the Wiki.

From now on, updates will usually run on Saturday evenings.

Most bus stops have an “addr:street” and an “addr:number” now. This covers most of Israel. I wonder how this affects navigation apps such as Osmand. Does it improve the search?

What would be the suitable OSM equivalent of “רציף” (platform)? E.g.


55137,ת. רכבת כרמיאל/רציפים, רחוב:  עיר: כרמיאל רציף: 2   קומה:  ,32.923817,35.298353
55137,ת. רכבת כרמיאל/רציפים, רחוב:  עיר: כרמיאל רציף: 3   קומה:  ,32.923817,35.298353
55137,ת. רכבת כרמיאל/רציפים, רחוב:  עיר: כרמיאל רציף: 4   קומה:  ,32.923817,35.298353
55137,ת. רכבת כרמיאל/רציפים, רחוב:  עיר: כרמיאל רציף: 5   קומה:  ,32.923817,35.298353
63111,מסוף אורנית, רחוב:  עיר: שומרון רציף: 1   קומה:  ,32.107438,34.999197
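For reference, the description field in lines like these can be split into its parts with a short regular expression. This is only a sketch: it assumes the field always uses the four Hebrew labels seen above (רחוב street, עיר city, רציף platform, קומה floor), and the function name `parse_stop_desc` and the English output keys are made up for this example.

```python
import re

# Hebrew labels seen in the sample lines, mapped to illustrative English keys.
FIELDS = {"רחוב": "street", "עיר": "city", "רציף": "platform", "קומה": "floor"}
_LABEL = re.compile("(%s):" % "|".join(FIELDS))


def parse_stop_desc(desc):
    """Split a description like 'רחוב:  עיר: X רציף: 2 קומה:' into a dict.

    Labels with an empty value are omitted from the result.
    """
    # re.split with a capture group yields [prefix, label, value, label, value, ...]
    parts = _LABEL.split(desc)
    result = {}
    for label, value in zip(parts[1::2], parts[2::2]):
        value = value.strip()
        if value:
            result[FIELDS[label]] = value
    return result
```

Which OSM tag the resulting platform value should go into is exactly the open question above; the sketch only extracts it.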

I have made a stupid mistake. I added “addr:number”, which is a nonexistent tag.

But what should be used instead? “addr:housenumber” doesn’t seem right either.

https://wiki.openstreetmap.org/wiki/Key:addr

addr:number changed to addr:housenumber in https://www.openstreetmap.org/changeset/59727988

Osmose now jumps on "suspicious tag combination: highway together with addr:*"

I don’t think the address should be saved in the bus stop node in OSM. This might cause duplicate addresses when there’s already an address node (or tag) on a nearby building.

I apologize for the problematic tagging. I am not very familiar with address tagging and should have studied this further first.

What do you propose? It is possible to revert this and put “addr:housenumber” and “addr:street” in the description tag. But I would prefer something that allows clients like Osmand to use the addresses even in areas where no one has tagged the houses. That would be much better for usability.

Is it OK to keep “addr:street” and put the housenumber in the description?

Isn’t Osmose wrong here? (Assuming we keep addr:street only)

Since address duplication is dangerous and may cause unknown behavior in client address lookup, I have decided to move addr:housenumber to gtfs:addr:housenumber in the meantime, even before the discussion is finished. Please feel free to post your opinions on what should be done with addr:housenumber and addr:street.
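The rename itself is simple. As a minimal sketch (the helper `move_tag` is a hypothetical name for this example, not a function from the script), moving a value from one key to another looks like this:

```python
def move_tag(tags, old_key, new_key):
    """Return a copy of tags with old_key renamed to new_key.

    If old_key is absent, the tags are returned unchanged.
    """
    updated = dict(tags)  # copy, so the original dict is left alone
    if old_key in updated:
        updated[new_key] = updated.pop(old_key)
    return updated


stop = {"highway": "bus_stop", "addr:street": "Herzl", "addr:housenumber": "12"}
stop = move_tag(stop, "addr:housenumber", "gtfs:addr:housenumber")
```

After the call, address lookups that only read addr:housenumber no longer see the bus stop, while the value is still kept under the gtfs: prefix.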

Changesets: #60009323, #60010547

I’m investigating ways to increase the “bus factor”. Currently, when fetching new MOT GTFS files, the script needs to compare them with the files fetched in the previous run. This means that if I lose both my PC and my PC backup, or if I ever get hit by a bus, running the script properly would be a bit fiddly, because one would have to reconstruct the lost file.

I would like to make the script completely stateless locally. This should be technically possible: fetch the latest bus stop changeset by SafwatHalaby_bot, and use that as the “old gtfs file”. This would make the script depend solely on the OSM servers, and not on any local hard drive.
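A rough sketch of the first step, finding the bot's most recent changeset, might look like the following. It uses the OSM API 0.6 changeset query with the `display_name` parameter. The helper names are made up for this example, and the sketch assumes every changeset by the bot account is a bus stop update, which would need checking.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API = "https://api.openstreetmap.org/api/0.6"


def changesets_url(display_name):
    """Build the changeset query URL for one user."""
    query = urllib.parse.urlencode({"display_name": display_name})
    return API + "/changesets?" + query


def latest_changeset_id(xml_text):
    """Return the id of the newest changeset in an API response, or None."""
    root = ET.fromstring(xml_text)
    changesets = root.findall("changeset")
    if not changesets:
        return None
    # created_at is an ISO 8601 UTC timestamp, so string order is time order.
    newest = max(changesets, key=lambda cs: cs.get("created_at", ""))
    return int(newest.get("id"))


def fetch_latest_changeset_id(display_name):
    """Query the live API (needs network access)."""
    with urllib.request.urlopen(changesets_url(display_name)) as resp:
        return latest_changeset_id(resp.read().decode("utf-8"))
```

Calling `fetch_latest_changeset_id("SafwatHalaby_bot")` would then give the changeset to reconstruct the previous state from. Rebuilding the full “old gtfs file” from that changeset is not covered here.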

This might be a major rewrite, so while we’re at it, I would like to rewrite the script so that it does not depend on JOSM. This should make running it headless natural and pave the way for complete automation.