Israel GTFS release

No replies; it seems I’m a couple of years late for the main Israel OSM activity :)

In the meantime, I downloaded the current GTFS data and found that it’s much more accurate (at least in the places where I could check the real situation) than the data uploaded to OSM in 2012.

I managed to run the GO_Sync app mentioned in the first posts here, solving an encoding problem with the -Dfile.encoding parameter:

java -Xmx1200M -Dfile.encoding=UTF-8 -jar gtfs-osm-sync-1.0-SNAPSHOT-jar-with-dependencies.jar

(I added -Xmx1200M because it crashed with an out-of-memory error after analyzing all stops.)
But as the current OSM data has no GO_Sync-specific tags, GO_Sync can’t help much with updating it.

Perhaps all current bus_stop POIs that were sourced from GTFS should be removed from OSM and new POIs uploaded in a GO_Sync-compatible manner? After that, we would be able to update the stops with GO_Sync in the future.
On the other hand, if some of the POIs were fixed manually, we’d lose that work.

I also played a little with the stops CSV file and extracted the data in the GTFS description field into separate columns (city, address, floor, platform); these could be put into separate tags as well.
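To illustrate the kind of extraction I mean, here is a minimal Python sketch. It assumes the description is a single string of "Label: value" pairs; the English labels used here are placeholders of mine, not the labels in the actual GTFS file, and `split_description` is a made-up helper name.

```python
import re

def split_description(desc, labels):
    """Split 'Label: value Label: value ...' text into a dict keyed by label.

    `labels` lists the field labels expected in the text. Labels that do
    not occur in `desc` map to an empty string.
    """
    # One capturing group, so re.split keeps the matched labels in the result.
    pattern = "(" + "|".join(re.escape(label) for label in labels) + r")\s*:"
    parts = re.split(pattern, desc)
    # parts looks like [prefix, label1, value1, label2, value2, ...]
    result = {label: "" for label in labels}
    for label, value in zip(parts[1::2], parts[2::2]):
        result[label] = value.strip()
    return result

# Placeholder labels and a made-up description, for illustration only.
LABELS = ["Address", "City", "Platform", "Floor"]
fields = split_description("Address: Main St 12 City: Haifa Platform: 3 Floor: ", LABELS)
print(fields)
```

Each resulting key could then be written to its own CSV column or mapped to a tag of your choice. A value that itself contains one of the labels followed by a colon would be split wrongly, so real data needs a sanity check.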

I’ve never made such a big upload to OSM, so I need advice from more experienced mappers.

The current OSM bus stops are 5 years old and are significantly different from the most up to date Israel GTFS in some areas. I am looking into ways of updating this in a way that guarantees easy periodic syncing.

OSM statistics:

  • 33,519 total bus stops

  • 1,622 have neither gtfs:id nor ref

  • 31,932 have either gtfs:id or ref

  • 31,930 have ref

  • 2 bus stops have gtfs:id but not ref

  • 35 bus stops have ref but not gtfs:id
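For anyone who wants to recompute these numbers, this is roughly how the breakdown can be counted in Python once the bus stops have been downloaded. It is only a sketch: it assumes each stop is available as a plain dict of its tags, and the sample data is made up.

```python
from collections import Counter

def classify_stop(tags):
    """Say which of the two identifiers (gtfs:id, ref) a bus stop carries."""
    has_gtfs = "gtfs:id" in tags
    has_ref = "ref" in tags
    if has_gtfs and has_ref:
        return "both"
    if has_ref:
        return "ref_only"
    if has_gtfs:
        return "gtfs_id_only"
    return "neither"

def summarize(stops):
    """Count stops per category; `stops` is an iterable of tag dicts."""
    return Counter(classify_stop(tags) for tags in stops)

# Made-up sample, not real OSM nodes.
sample = [
    {"highway": "bus_stop", "ref": "12345", "gtfs:id": "987"},
    {"highway": "bus_stop", "ref": "20001"},
    {"highway": "bus_stop", "gtfs:id": "555"},
    {"highway": "bus_stop", "name": "Old stop"},
]
counts = summarize(sample)
print(counts)
```

"Have ref" in the list above corresponds to both + ref_only, and "have either" to everything except neither.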

SwiftFast, the last time I did a bulk update the rule was simple: gtfs:id is the internal ID for back-reference to the GTFS data, and ref is the actual stop number written on the yellow signs. Bus stops without either of these IDs are usually old and were added manually; in many cases they can be merged with the GTFS ones.

Thanks. So the correlation appears to be quite simple, but I think the merging should be a bit more sophisticated.

The simple way to merge is to just remove all existing nodes and re-add them. I think this is not suitable for frequent periodic updates. (It destroys history and manually added tags like “sheltered”, and needlessly causes the entire IHM to re-render.)

My initial plan (subject to many changes) is roughly like this:

  • Turn the GTFS into a JOSM bus_stop layer using GO_Sync

  • Grab the current OSM bus_stops to a different layer using Overpass API.

  • Merge the two layers

  • Use the JOSM Scripting plugin to detect and merge duplicate nodes that represent the same stop (based on ref/gtfs:id and possibly distance), preserving node IDs and manual tags.

  • Somehow delete the bus stops that are no longer in the GTFS.
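To make the matching step concrete, here is a rough standalone Python sketch of the logic (not a JOSM script). The dict layouts, the `plan_sync` name and the rule that only stops carrying gtfs:id or ref are candidates for deletion are all assumptions of mine. It only builds a plan and changes nothing.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def plan_sync(osm_stops, gtfs_stops):
    """Return (updates, creates, deletes) without modifying the inputs.

    osm_stops:  list of {"id", "lat", "lon", "tags"} dicts (assumed layout)
    gtfs_stops: list of {"gtfs_id", "ref", "lat", "lon"} dicts (assumed layout)
    """
    by_gtfs_id = {s["tags"]["gtfs:id"]: s for s in osm_stops if "gtfs:id" in s["tags"]}
    by_ref = {s["tags"]["ref"]: s for s in osm_stops if "ref" in s["tags"]}
    updates, creates, matched = [], [], set()
    for g in gtfs_stops:
        # Prefer the internal gtfs:id, fall back to the sign number (ref).
        node = by_gtfs_id.get(g["gtfs_id"]) or by_ref.get(g["ref"])
        if node is None:
            creates.append(g)
            continue
        matched.add(node["id"])
        # Keep the node ID and all manual tags; overwrite only the two IDs.
        new_tags = {**node["tags"], "gtfs:id": g["gtfs_id"], "ref": g["ref"]}
        updates.append({
            "id": node["id"],
            "tags": new_tags,
            "distance_m": distance_m(node["lat"], node["lon"], g["lat"], g["lon"]),
        })
    # Only stops that look GTFS-sourced are candidates for deletion.
    deletes = [s["id"] for s in osm_stops
               if ("gtfs:id" in s["tags"] or "ref" in s["tags"]) and s["id"] not in matched]
    return updates, creates, deletes
```

The reported distance could then be used to decide whether a matched node should actually be moved, and stops with neither identifier are left alone for manual merging.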

The script should be reusable, allowing it to be shared with the community. Also, before starting I need to make sure nobody has already solved this before.

Also, I might be underestimating GO-Sync since I have not tried it. It might save me the trouble of writing a script, if it can perform syncs in a smart way.

Another case you should think about: the GTFS coordinates turned out to be not very accurate, so many bus stops were moved manually to their real location. I marked all imported stops with a fixme tag asking to verify location and existence. So stops with the fixme are free to be moved, but stops without the fixme tag should be handled separately.
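In script terms the rule could look something like this Python sketch. `decide_position` is a made-up name, and checking only for the presence of a fixme key is a simplification: a real script would probably compare the value against the import’s fixme text so that unrelated fixmes are not treated as permission to move a node.

```python
def decide_position(tags, osm_pos, gtfs_pos):
    """Return (position_to_use, needs_review) for one matched stop.

    tags:     the OSM node's tags
    osm_pos:  (lat, lon) currently in OSM
    gtfs_pos: (lat, lon) from the new GTFS feed
    """
    if osm_pos == gtfs_pos:
        return osm_pos, False
    if "fixme" in tags:
        # Still carries a fixme, so nobody has verified it: free to move.
        return gtfs_pos, False
    # No fixme: possibly placed by hand, so keep it and flag it for review.
    return osm_pos, True
```

Stops flagged for review could be collected in a separate JOSM layer and checked by hand.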

Can I remove all the auto-added bus-stop fixmes? They are shadowing real fixmes.

Thanks. I will consider that too.

Can I remove the GTFS fixmes? They’re shadowing legitimate fixmes and there are 30k of them.

+1 from me (for what it’s worth), I agree that they’re shadowing. Will there be another way to identify all the stops that are currently marked fixme? E.g. would they be added to a new ‘stops for review’ relation or something?

They were added to draw editors’ attention to a low-quality data import. I don’t see any better way of doing this and IMHO they are no less important than any other fixme. If the current GTFS data turns out to be more accurate, we can easily clear all the fixmes.

Were there any specific examples of seriously inaccurate GTFS data (e.g. more than 5-10 meters)? Or was it just a general suspicion of the import accuracy?

I can’t give you any exact statistics, but I saw many stops on the wrong side of the road or too far from their real location. Even Moovit added an option to report the correct location.

I’ve fixed many stops in Rishon that were on the wrong side of the street or on another block (it may be only 10-20 meters, but if you are coming from a side street it means a right turn instead of a left one, and pedestrian turn-by-turn routing will fail in that case).

Is there a way to tag the nodes as ‘position is inaccurate’ in a way that routers, etc. will understand? Something like a precision=10m tag?

The vast majority of the GTFS fixmes are false positives. The other fixmes, on the other hand, almost always require attention.

Maybe we could use another tag, e.g. gtfs:accuracy_fixme/source=“Auto-imported GTFS, check for accuracy!”/precision=10m.

Preferably something already being used.

I’ve asked about accuracy=* on help.osm.org:

https://help.openstreetmap.org/questions/57436/how-to-tag-approximately-positioned-in-a-machine-parseable-way

The discussion is ongoing.

Worth noting this comment by SomeoneElse:

We can combine the methods, e.g., add gtfs:verified=no and accuracy=20m (or whatever the worst-case error is known to be).
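As a sketch of what that retagging could look like in Python: the function name and the fixme text are placeholders (the exact wording used by the import would have to be looked up), while gtfs:verified=no and accuracy=20m are the values suggested above.

```python
def retag_import_fixme(tags, import_fixme_text, accuracy="20m"):
    """Return a new tag dict; the input dict is not modified.

    Only a fixme whose value equals the import's text is replaced, so
    manually written fixmes stay untouched.
    """
    if tags.get("fixme") != import_fixme_text:
        return dict(tags)
    new_tags = {k: v for k, v in tags.items() if k != "fixme"}
    new_tags["gtfs:verified"] = "no"
    new_tags["accuracy"] = accuracy
    return new_tags

# Placeholder text; the real import fixme wording is an unknown here.
IMPORT_FIXME = "placeholder import fixme text"

imported = {"highway": "bus_stop", "ref": "100", "fixme": IMPORT_FIXME}
manual = {"highway": "bus_stop", "fixme": "name is probably wrong"}
print(retag_import_fixme(imported, IMPORT_FIXME))
print(retag_import_fixme(manual, IMPORT_FIXME))
```

This way the 30k import fixmes stop shadowing the real ones, while the unverified stops can still be found by searching for gtfs:verified=no.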