How to extract relation members from .osm XML files

All,

I’ve been trying to build a website (in Django) which is to be an index of all MTB routes in the world. I’m a Pythonian so wherever I can I try to use Python.

I’ve successfully extracted data from the OSM API (https://stackoverflow.com/questions/68248846/display-relation-trail-in-leaflet) but found that doing this for all MTB trails (tag: route=mtb) is too much data (processing takes very long). So I tried to do everything locally by downloading a torrent of the entire OpenStreetMap dataset (from https://planet.openstreetmap.org/) and filtering for tag: route=mtb using osmfilter (part of osmctools in Ubuntu 20.04), like this:

    osmfilter $unzipped_osm_planet_file --keep="route=mtb" -o=$osm_planet_dir/world_mtb_routes.osm

This produces a file of about 1.2 GB which, on closer inspection, seems to contain all the data I need. My goal was to transform the file into a pandas.DataFrame() so I could do some further filtering and transforming before pushing the relevant parts into my Django DB. I tried to load the file as a regular XML file using pandas, but this crashed the Jupyter notebook kernel. I guess the data is too big.

My second approach was this solution: https://stackoverflow.com/questions/45771809/how-to-extract-and-visualize-data-from-osm-file-in-python. It worked for me, at least partly: I can get some of the information, like the tags of the relations in the file (and the other specified details). What I’m missing is the relation members (the ways), and then the way members (the nodes) and their latitudes/longitudes. I need these to achieve what I did here: https://stackoverflow.com/questions/68375034/plotting-openstreetmap-relations-does-not-generate-continous-lines.
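For reference, relation members can be read from the .osm XML one element at a time, so the whole file never has to fit in memory. The following is only a minimal sketch using the standard library's xml.etree.ElementTree.iterparse. The function name iter_relation_members and the tiny SAMPLE document are made up for illustration, and it is untested on a file of 1.2 GB.

```python
import io
import xml.etree.ElementTree as ET


def iter_relation_members(source):
    """Yield (relation_id, tags, members) for each relation in an OSM XML stream.

    `source` is a filename or a binary file object. `members` is a list of
    (type, ref, role) tuples, e.g. ("way", 10, "").
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "relation":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            members = [
                (m.get("type"), int(m.get("ref")), m.get("role"))
                for m in elem.findall("member")
            ]
            yield int(elem.get("id")), tags, members
        if elem.tag in ("node", "way", "relation"):
            # Drop elements that are already processed so memory stays flat.
            root.clear()


# Hypothetical miniature file, just to show the shape of the result.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" lat="52.0" lon="5.0"/>
  <way id="10"><nd ref="1"/><tag k="highway" v="path"/></way>
  <relation id="100">
    <member type="way" ref="10" role=""/>
    <tag k="route" v="mtb"/>
  </relation>
</osm>"""

relations = list(iter_relation_members(io.BytesIO(SAMPLE)))
print(relations)
```

The same loop could collect the `<nd ref=…>` children of ways and the lat/lon attributes of nodes, but matching them to relations needs the IDs gathered first.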

I’m open to many solutions. For example, one could break the file up into many separate files, each containing one relation and its members, using an osmium-based script. Perhaps then I can move on with pandas.read_xml(). This would be nice for batch processing and filling the database. Loading the whole OSM XML file into a pd.DataFrame would be nice, but I guess this really is a lot of data. Perhaps this can also be done on a per-relation basis with pyosmium?

Any help is appreciated.

Anyone? Perhaps my question could also be as simple as:

How can I extract node level info from a .OSM file, preferably using pyosmium but also possibly using osmium directly?

Don’t know how it works in pyosmium, but the general algo with OSM is always the same:

  1. First pass: collect the relations with type=route + route=mtb, store the way members
  2. Second pass: collect all ways which might be interesting because of tags like mtb:scale=* AND all ways which appear in the stored list of members
  3. Third pass: collect the nodes for the ways

Don’t use the XML format (.osm) as input; .osm.pbf or .o5m are much better for this.
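The three passes above can be sketched in plain Python. Several passes are needed because OSM files normally list nodes first, then ways, then relations, so the relations that say which ways matter are only seen at the end. This is a sketch under assumptions, not pyosmium code: the element records are hypothetical plain dicts, `read_elements` stands in for whatever reader is used, and the names collect_mtb_routes and way_keys are made up for illustration.

```python
def collect_mtb_routes(read_elements, way_keys=("mtb:scale",)):
    """Run three passes; `read_elements()` must return a fresh iterator each call.

    Each element is assumed to be a dict with "kind", "id" and "tags", plus
    "members" (relations), "nodes" (ways) or "lat"/"lon" (nodes).
    """
    # Pass 1: relations with type=route + route=mtb; remember their way members.
    routes = {}
    wanted_ways = set()
    for el in read_elements():
        tags = el["tags"]
        if (el["kind"] == "relation"
                and tags.get("type") == "route"
                and tags.get("route") == "mtb"):
            way_refs = [ref for kind, ref, _role in el["members"] if kind == "way"]
            routes[el["id"]] = way_refs
            wanted_ways.update(way_refs)

    # Pass 2: ways that are route members or carry an interesting tag key.
    ways = {}
    wanted_nodes = set()
    for el in read_elements():
        if el["kind"] != "way":
            continue
        if el["id"] in wanted_ways or any(k in el["tags"] for k in way_keys):
            ways[el["id"]] = list(el["nodes"])
            wanted_nodes.update(el["nodes"])

    # Pass 3: coordinates of the nodes used by the collected ways.
    coords = {}
    for el in read_elements():
        if el["kind"] == "node" and el["id"] in wanted_nodes:
            coords[el["id"]] = (el["lat"], el["lon"])

    return routes, ways, coords


# Hypothetical miniature dataset in the usual node, way, relation order.
elements = [
    {"kind": "node", "id": 1, "lat": 52.0, "lon": 5.0, "tags": {}},
    {"kind": "node", "id": 2, "lat": 52.1, "lon": 5.1, "tags": {}},
    {"kind": "node", "id": 3, "lat": 0.0, "lon": 0.0, "tags": {}},
    {"kind": "way", "id": 10, "nodes": [1, 2], "tags": {"highway": "path"}},
    {"kind": "way", "id": 11, "nodes": [3], "tags": {"highway": "residential"}},
    {"kind": "relation", "id": 100, "members": [("way", 10, "")],
     "tags": {"type": "route", "route": "mtb"}},
]

routes, ways, coords = collect_mtb_routes(lambda: iter(elements))
print(routes, ways, coords)
```

With a real file, `read_elements` would reopen the input for each pass; only the ID sets and the final results stay in memory between passes.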

Thanx! I’ll try this when I get home from my holiday!

Hmm, I downloaded the PBF file, which is the only alternative to .osm here: https://planet.openstreetmap.org/, right? But osmfilter reports:

    osmfilter Error: .pbf format is not supported. Please use .o5m.

Any way to fix that? Is there an o5m world file anywhere?

Edit, ok, wait, I’m now doing this:

    osmconvert --verbose --drop-version $osm_planet_file -o=planet.o5m

…When it’s finished, I’ll see if it works for me :)

Ok, the conversion worked and I can do the same things with the file (while it is considerably smaller). The result is also the same though… No lists of members per relation. I have a hard time understanding pyosmium.

I made some progress.

When I use the following code:

    import osmium as osm
    import pandas as pd
    class OSMHandler(osm.SimpleHandler):
        def __init__(self):
            osm.SimpleHandler.__init__(self)
            self.osm_data = []

        def tag_inventory(self, elem, elem_type):
            for tag in elem.tags:
                if elem_type == 'relation':
                    members = elem.members
                else:
                    members = 'None'
        
                self.osm_data.append([elem_type, 
                                   elem.id, 
                                   elem.version,
                                   elem.visible,
                                   pd.Timestamp(elem.timestamp),
                                   elem.uid,
                                   elem.user,
                                   elem.changeset,
                                   len(elem.tags),
                                   tag.k, 
                                   tag.v,
                                   members
                                   ])
            
                
        def node(self, n):
            self.tag_inventory(n, "node")

        def way(self, w):
            self.tag_inventory(w, "way")

        def relation(self, r):
            self.tag_inventory(r, "relation")

    osmhandler = OSMHandler()
    osmhandler.apply_file("../data/world_mtb_routes.o5m")

I get the error:

    RuntimeError: Relation callback keeps reference to OSM object. This is not allowed.

Strangely if I just use:

    if elem_type == 'relation':
        print(elem.members)

instead of trying to add the info to self.osm_data, I do see the information I want scrolling by!

What am I missing? Why can I add elem.id etc to the self.osm_data but not elem.members?
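A likely explanation, going by the error message: pyosmium only guarantees that the objects it hands to a callback are valid while that callback runs. Values such as elem.id or tag.k become ordinary Python ints and strings when they are read, but elem.members is still a view onto pyosmium's own data, so appending it to a list keeps a reference to the OSM object. One way around this is to copy the member fields into plain tuples before storing them. A minimal sketch, assuming each member exposes the attributes type, ref and role; the helper name copy_members is made up, and the SimpleNamespace objects only stand in for real members so the snippet runs without pyosmium:

```python
from types import SimpleNamespace


def copy_members(members):
    """Return plain (type, ref, role) tuples that are safe to keep after the callback."""
    return [(m.type, m.ref, m.role) for m in members]


# Stand-ins for elem.members (hypothetical values, not real OSM data).
fake_members = [
    SimpleNamespace(type="w", ref=10, role=""),
    SimpleNamespace(type="n", ref=1, role="guidepost"),
]

copied = copy_members(fake_members)
print(copied)
```

In the handler above, this would mean writing `members = copy_members(elem.members)` in place of `members = elem.members`.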

Maybe you should decide first, if you want to continue here or on help osm. It doesn’t really make sense to have the same discussion on both sites.

https://help.openstreetmap.org/questions/81521/how-to-extract-relation-members-from-o5m-files

My reasoning was that it also doesn’t really make sense to maintain 2 help fora with the same target audience. I posted this here first, about 3 weeks ago. Recently, because there was no satisfying answer, I turned to the pyosmium GitHub page: https://github.com/osmcode/pyosmium/issues/184. Shortly after posting that issue, I read that they prefer questions to be posted at help.openstreetmap.org, and thus I tried it there.

Anyway, since I have no idea if there is any overlap in visitors between here and help.openstreetmap.org, and it could well be that the probability of getting a correct answer is much higher at help.openstreetmap.org, I tried it there as well.

In any event, my plan was to post the outcome and the answer on both forums, helping both audiences and increasing the chance that someone DDG-ing/Googling hits the right answer.

If this is undesirable, I apologize and I would like to hear what the preferred forum is and where the largest number of knowledgeable people roam the posts.

Highest regards.

I would at least expect you to add links to the other sites where you have posted the same question. That way people don’t spend their time answering a question that has already been answered elsewhere. Thank you.

Now that is sound advice, will do!

There is now an answer that helps me significantly. I will post the full details here once I have implemented everything the way I need it: https://help.openstreetmap.org/questions/81521/how-to-extract-relation-members-from-o5m-files

I have finally solved this and posted the answer on Stack Overflow: https://stackoverflow.com/questions/68622198/how-to-extract-relation-members-from-osm-xml-files