I am trying to extract a single node for each building in an .osm file - preferably the centerpoint.
My aim is to create a csv file with a single lat,lon and id for each building, i.e. one row per building.
Ultimately I want to run this globally, so I need to keep the final file as small as possible.
The following generates, for example, a file with a node at each of the four corners of a square building.
I would like to be able to filter the file to keep just the central node (which has appeared in previous attempts at this).
I downloaded the macedonia.pbf from geofabrik, and converted to .osm with osmconvert.
In general, buildings should not have a central node, if they have been mapped as an area!
I’m afraid you will need to obtain the complete outline and then post-process to derive the centroid or other definition of the centre of the building.
Note that U-shaped buildings may need some thought, as the centroid is likely to be outside the structure. Different people may also have different interpretations of what counts as a constituent building where several buildings run together (this may also change with time as more of the true construction is discovered).
Yes, I assumed that there is no central node, as I have added buildings to OSM myself and could not find any description of such a label on the wiki.
However, the following command (as opposed to the one I showed above, which just gives me the nodes that make up the footprint) produces a node in the centre of each building, in addition to the footprint nodes.
This generates nodes on the building walls plus one in the centre (i.e. four corner nodes for a square house, plus the centre node).
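To illustrate the U-shape caveat, here is a minimal Python sketch of an area-weighted centroid (shoelace formula). The function name `polygon_centroid` and the `U_SHAPE` outline are made up for illustration, and the coordinates are treated as planar x/y, which is only an approximation for lon/lat but should be reasonable at building scale.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    The ring may be open or closed (first point repeated at the end).
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Centroid = sum / (6 * area) and area = area2 / 2, hence 3 * area2.
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical U-shaped building: walls 1 unit thick, open towards the top.
# The courtyard covers x from 1 to 9 and y from 1 to 10.
U_SHAPE = [(0, 0), (10, 0), (10, 10), (9, 10),
           (9, 1), (1, 1), (1, 10), (0, 10)]

cx, cy = polygon_centroid(U_SHAPE)
print(round(cx, 3), round(cy, 3))  # 5.0 3.714
```

The result (5, 3.714) falls in the courtyard rather than on the building itself, so some other definition of "centre" may be needed for such shapes.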
Be aware that this step will output not only the buildings’ “ways” but also every corner node which is referred to by any of these ways. That is because OSM way objects do not have geocoordinates. The way objects rely on coordinates stored in node objects.
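The way/node relationship described above can be sketched in Python with the standard library. This is only an illustration on a tiny hand-written sample: `SAMPLE_OSM`, its ids and coordinates, and the helper `building_outlines` are all invented here, and a real extract would be far too large to load into memory this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature .osm document: one square building (way 100)
# whose outline refers to four corner nodes.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="41.0000" lon="21.0000"/>
  <node id="2" lat="41.0000" lon="21.0002"/>
  <node id="3" lat="41.0002" lon="21.0002"/>
  <node id="4" lat="41.0002" lon="21.0000"/>
  <way id="100">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="4"/><nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
</osm>
"""


def building_outlines(osm_xml):
    """Return {way_id: [(lat, lon), ...]} for every way tagged building=*."""
    root = ET.fromstring(osm_xml)
    # Ways carry no coordinates themselves, so index the nodes first.
    coords = {
        node.get("id"): (float(node.get("lat")), float(node.get("lon")))
        for node in root.iter("node")
    }
    outlines = {}
    for way in root.iter("way"):
        keys = {tag.get("k") for tag in way.findall("tag")}
        if "building" not in keys:
            continue
        # Resolve each <nd ref="..."> to the coordinates of that node.
        outlines[way.get("id")] = [
            coords[nd.get("ref")]
            for nd in way.findall("nd")
            if nd.get("ref") in coords
        ]
    return outlines


outlines = building_outlines(SAMPLE_OSM)
print(outlines["100"][0])  # (41.0, 21.0)
```

This is why the corner nodes have to travel with the ways: without them the outline cannot be reconstructed.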
Now all way objects (and relation objects of complex buildings) will be replaced by node objects. And: the above-mentioned corners will also be in the output file.
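One possible way to drop the corner nodes afterwards is to keep only nodes that carry a `building` tag. This is a sketch under assumptions: it assumes the converted nodes keep the tags of the original way or relation while plain corner nodes are untagged, and the sample data, ids and the helper `building_nodes_to_csv` are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical output of the conversion step: two untagged corner nodes
# and one node that replaced a building way (id made up for the example).
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="41.0000" lon="21.0000"/>
  <node id="2" lat="41.0000" lon="21.0002"/>
  <node id="1000000000000100" lat="41.0001" lon="21.0001">
    <tag k="building" v="yes"/>
  </node>
</osm>
"""


def building_nodes_to_csv(osm_source, out):
    """Write id,lat,lon for every node tagged building=*; return row count.

    osm_source may be a file name or a binary file object.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "lat", "lon"])
    count = 0
    # iterparse streams the file instead of building the whole tree first.
    for _, elem in ET.iterparse(osm_source, events=("end",)):
        if elem.tag != "node":
            continue
        if any(tag.get("k") == "building" for tag in elem.findall("tag")):
            writer.writerow([elem.get("id"), elem.get("lat"), elem.get("lon")])
            count += 1
        elem.clear()  # drop the node's children and attributes once handled
    return count


out = io.StringIO()
n_rows = building_nodes_to_csv(io.BytesIO(SAMPLE), out)
print(out.getvalue())
```

Buildings that were mapped as a single tagged node in the first place would also pass this filter, which is probably what you want for a one-row-per-building file.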
Your assumption is right: Buildings in OSM raw data do not have centre nodes. These nodes are artificially created by osmconvert. The program uses a very simple (and fast) method: it takes the centre of the building’s bounding box. That means of course, the calculated centre node does not necessarily lie within the building’s walls if the building is U-shaped, for example.
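To make the bounding-box method concrete, here is a small Python sketch. It is not osmconvert's actual code, only an illustration of the idea; the helpers `bbox_centre` and `point_in_polygon` and both sample outlines are invented, and coordinates are treated as planar x/y.

```python
def bbox_centre(ring):
    """Centre of the axis-aligned bounding box of [(x, y), ...]."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


def point_in_polygon(pt, ring):
    """Ray-casting test: True if pt lies strictly inside the polygon."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt matter.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical outlines: a plain square and a U shape open towards the top.
SQUARE = [(0, 0), (4, 0), (4, 4), (0, 4)]
U_SHAPE = [(0, 0), (10, 0), (10, 10), (9, 10),
           (9, 1), (1, 1), (1, 10), (0, 10)]

for name, ring in (("square", SQUARE), ("U shape", U_SHAPE)):
    centre = bbox_centre(ring)
    print(name, centre, point_in_polygon(centre, ring))
# square (2.0, 2.0) True
# U shape (5.0, 5.0) False
```

For the square the centre lands inside the walls; for the U shape it lands in the courtyard. For a rough building location over large areas that is usually acceptable.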
If you encounter any speed problems when processing OSM data, you might want to avoid the .osm format. The toolchain above will be much faster if the .o5m format is used (SK53 already suggested this). The following command converts OSM data from .osm to .o5m format:
Thanks Marqqs and SK53!
Special thanks to Markus for the steps, which work well!
Good point about switching to .o5m if I want to scale this up.
It looks like this is a good approach to approximate building locations over large areas, for ingestion into R for example.