OpenStreetMap Forum

The Free Wiki World Map

#1 2019-08-20 19:46:50

ofamuyibo
New Member
Registered: 2019-08-20
Posts: 2

Join osm_ids between points file and multi-poly files

Hello,

I'm trying to load the Central America data from OpenStreetMap into Tableau to create some custom visualisations.
To do that, I've converted the Central America .pbf file from OpenStreetMap into Excel form and tried a simple left join in SQL between the dataset containing points (which has the longitude and latitude coordinates) and the dataset containing multipolygons.

For the left join, I'm using the osm_id across the 2 files (this seems to be the only unique identifier I could find in both datasets).
When I run the script, the join finds no pairings.
I also tried an inner join and got no results.

From what I can see, it cannot find any links between the 2 datasets. Is the osm_id not the right field to use for the join?
Is it actually possible to join the 2? If so, how?
I'm trying to do this because I want to pull in some info from the multipolygons dataset, but since that one doesn't have coordinates I have to join it with the points dataset to get the longitude and latitude.


For context, the points dataset has about 800k rows, and the multipolygons dataset has about 1.4 million rows.

Appreciate the help, many thanks.

#2 2019-08-20 21:44:50

alester
Member
Registered: 2011-09-21
Posts: 252

Re: Join osm_ids between points file and multi-poly files

I'm not familiar with the joins you're trying to do, but I suspect you may be skipping a step. A multipolygon relation would typically contain ways, not nodes. If you look at one of the multipolygons, you'll likely see references to the member ways. You'd need to look up those ways first, and then look up the nodes that compose those ways. Trying to jump straight from a relation to nodes won't work for multipolygons.

#3 2019-08-20 22:24:38

OverThere
Member
Registered: 2017-02-28
Posts: 76

Re: Join osm_ids between points file and multi-poly files

Since you seem to be somewhat familiar with SQL and databases, the short explanation is that you are missing join conditions and tables.

You need to extract the ways (paths within the multipolygons) as well as the multipolygons and the nodes (points).

The database schema should help with the SQL table connections.

Diagram symbols for table relations

An OSM relation is a mixed-type collection of OSM entities (nodes, ways and other relations). Member order is stored, but it is not significant for multipolygons.

A way is an ordered list of nodes, which are points with latitudes, longitudes and possibly other data.

Multipolygons are a particular type of relation.

In your situation, the connections are

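CURRENT_RELATIONS: id ->

CURRENT_RELATION_MEMBERS: relation_id / member_type="Way", member_id ->

CURRENT_WAYS: id ->

CURRENT_WAY_NODES: way_id (sequence_id is used to maintain order in a way) / node_id ->

CURRENT_NODES, with the latitude and longitude directly in the node and the other data in the key (k) and value (v) pair tables.

Note that member_type names the kind of member (node, way or relation), not the kind of relation. A relation is a multipolygon when CURRENT_RELATION_TAGS holds k="type", v="multipolygon" for it.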
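
As a rough illustration of that chain, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the ones listed above, but the schema is heavily simplified and the rows are made up for the example (relation 7, way 100 and the coordinates are hypothetical), so treat it as a sketch rather than the real OSM database layout.

```python
import sqlite3

# Toy, in-memory stand-in for the tables named above (simplified, assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_relation_tags (relation_id INTEGER, k TEXT, v TEXT);
CREATE TABLE current_relation_members (
    relation_id INTEGER, member_type TEXT, member_id INTEGER, sequence_id INTEGER);
CREATE TABLE current_way_nodes (way_id INTEGER, node_id INTEGER, sequence_id INTEGER);
CREATE TABLE current_nodes (id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);

-- Hypothetical data: one multipolygon relation with one closed member way.
INSERT INTO current_relation_tags VALUES (7, 'type', 'multipolygon');
INSERT INTO current_relation_members VALUES (7, 'Way', 100, 1);
INSERT INTO current_way_nodes VALUES (100, 1, 1), (100, 2, 2), (100, 3, 3), (100, 1, 4);
INSERT INTO current_nodes VALUES (1, 14.0, -87.0), (2, 14.0, -86.9), (3, 14.1, -86.9);
""")

# relation -> member ways -> way nodes -> node coordinates
rows = conn.execute("""
SELECT rm.relation_id, wn.way_id, wn.sequence_id, n.id, n.latitude, n.longitude
FROM current_relation_tags AS rt
JOIN current_relation_members AS rm
  ON rm.relation_id = rt.relation_id AND rm.member_type = 'Way'
JOIN current_way_nodes AS wn ON wn.way_id = rm.member_id
JOIN current_nodes AS n ON n.id = wn.node_id
WHERE rt.k = 'type' AND rt.v = 'multipolygon'
ORDER BY rm.relation_id, rm.sequence_id, wn.sequence_id
""").fetchall()

for row in rows:
    print(row)
```

The point of the sketch is that no join condition ever compares a relation id with a node id directly: each hop uses the id of the next object type down.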

Last edited by OverThere (2019-08-20 22:27:24)

#4 2019-08-21 10:38:44

GerdP
Member
Registered: 2015-12-18
Posts: 843

Re: Join osm_ids between points file and multi-poly files

You also need some logic to compute the areas described by the multipolygon. This logic isn't simple and has to handle several possibilities.
I think tools like Osm2pgsql do this for you. https://wiki.openstreetmap.org/wiki/Osm2pgsql
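
To give a feel for one piece of that logic, here is a rough sketch of joining member ways end to end into closed rings. It assumes each way is given as a list of node IDs; `assemble_rings` is a hypothetical helper written for this post, not something taken from Osm2pgsql. It is greedy and ignores inner/outer roles, self-intersections and ambiguous junctions, which real tools have to deal with.

```python
def assemble_rings(ways):
    """Join ways (lists of node IDs) end to end into closed rings.

    Returns (rings, leftovers): rings start and end on the same node,
    leftovers are chains that could not be closed. Greedy sketch only.
    """
    remaining = [list(w) for w in ways if len(w) >= 2]
    rings, leftovers = [], []
    while remaining:
        ring = remaining.pop()
        extended = True
        # Keep attaching ways to the tail until the ring closes or nothing fits.
        while ring[0] != ring[-1] and extended:
            extended = False
            for i, way in enumerate(remaining):
                if way[0] == ring[-1]:
                    ring.extend(way[1:])              # way continues forwards
                elif way[-1] == ring[-1]:
                    ring.extend(reversed(way[:-1]))   # way must be walked backwards
                else:
                    continue
                del remaining[i]
                extended = True
                break
        if ring[0] == ring[-1] and len(ring) >= 4:
            rings.append(ring)
        else:
            leftovers.append(ring)
    return rings, leftovers


# Three open ways that together form one ring, plus one way that is already closed.
rings, leftovers = assemble_rings([[1, 2, 3], [5, 4, 3], [5, 6, 1], [10, 11, 12, 10]])
print(rings)
print(leftovers)
```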

#5 2019-08-21 11:04:54

ofamuyibo
New Member
Registered: 2019-08-20
Posts: 2

Re: Join osm_ids between points file and multi-poly files

Many thanks for the replies, guys.
I have to admit, I'm still a bit stumped.

So in the .pbf file for the Central America region, there are 5 layer files, as follows:

layer name (geometry type)
lines (LineString)
multilinestrings (MultiLineString)
multipolygons (MultiPolygon)
other_relations (GeometryCollection)
points (Point)

The "points" file is the only one with x and y coordinates, and it also contains an 'osm_id' field.
The "multipolygons" file has both an 'osm_id' field and an 'osm_way_id' field.

However, when I join the 2 osm_id fields across the points file and the multipolygons file, I don't get a match.

You said before that I was missing a step and a table?
Do I need to bring in another one of those tables and join on osm_way_id first before trying to join on osm_id?

Many thanks for your help.

#6 2019-08-21 15:23:57

OverThere
Member
Registered: 2017-02-28
Posts: 76

Re: Join osm_ids between points file and multi-poly files

Extract from PBF Format describing the missing table (ways):

Ways and Relations
For ways and relations, which contain the IDs of other nodes in the field refs, I exploit the tendency of consecutive nodes in a way or relation to have nearby node IDs by using delta compression, resulting in small integers. (I.E., instead of encoding x_1, x_2, x_3, I encode x_1, x_2-x_1, x_3-x_2, ...). Except for that, ways and relations are mostly encoded in the way one would expect. Tags are encoded as two parallel arrays, one array of string-IDs of the keys, and the other of string-IDs of the values.

message Way {
   required int64 id = 1;
   // Parallel arrays.
   repeated uint32 keys = 2 [packed = true];
   repeated uint32 vals = 3 [packed = true];

   optional Info info = 4;

   repeated sint64 refs = 8 [packed = true];  // DELTA coded
}
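
A small sketch of what the delta coding of `refs` means in practice. The function names are made up for illustration, and the sample node IDs are invented; a real reader would get the deltas from the protobuf decoder.

```python
from itertools import accumulate


def delta_encode(values):
    """Store the first value, then each difference from the previous value."""
    values = list(values)
    return [v - prev for prev, v in zip([0] + values[:-1], values)]


def delta_decode(deltas):
    """Running sum of the deltas restores the original IDs."""
    return list(accumulate(deltas))


node_ids = [1000, 1002, 1001]          # hypothetical node IDs of one way
encoded = delta_encode(node_ids)       # small numbers when IDs are close together
decoded = delta_decode(encoded)
print(encoded, decoded)
```

Because differences can be negative, the field is declared as `sint64`, which protobuf stores with ZigZag encoding so that small negative numbers stay small on disk.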

Another extract from PBF Format:

Relations use an enum to represent member types.

message Relation {
   enum MemberType {
     NODE = 0;
     WAY = 1;
     RELATION = 2;
   }
   required int64 id = 1;

   // Parallel arrays.
   repeated uint32 keys = 2 [packed = true];
   repeated uint32 vals = 3 [packed = true];

   optional Info info = 4;

   // Parallel arrays
   repeated int32 roles_sid = 8 [packed = true];
   repeated sint64 memids = 9 [packed = true]; // DELTA encoded
   repeated MemberType types = 10 [packed = true];
}
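
To make the parallel arrays concrete, here is a rough sketch of turning `roles_sid`, `memids` and `types` into a list of members. It assumes the three arrays and the string table have already been read into plain Python lists (real files store the strings as bytes); `relation_members` and the sample values are hypothetical.

```python
from itertools import accumulate

# Mirrors the MemberType enum in the message definition above.
MEMBER_TYPES = {0: "node", 1: "way", 2: "relation"}


def relation_members(roles_sid, memids, types, stringtable):
    """Zip the parallel arrays into (member type, member id, role) tuples."""
    ids = accumulate(memids)  # memids are delta coded, so take the running sum
    return [
        (MEMBER_TYPES[t], member_id, stringtable[sid])
        for sid, member_id, t in zip(roles_sid, ids, types)
    ]


stringtable = ["", "outer", "inner"]   # made-up string table for the example
members = relation_members([1, 2], [100, 5], [1, 1], stringtable)
print(members)
```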

Relations may contain relations. It is a problem if a relation contains itself directly or indirectly.

A multipolygon is represented by a relation which has, among its key value pairs, the pair type=multipolygon.

A multipolygon contains only ways, as I remember. Therefore the MemberType is always WAY and each memid is the id of a way.

The member ways of a multipolygon must together form closed rings; a ring can be a single closed way or several ways joined end to end.

Each value in the relation's memids field equals the id field of a member way.

The way's refs field contains the ordered list of node ids.

The nodes contain the location and other data in key value pairs.

#7 2019-08-21 17:11:54

alester
Member
Registered: 2011-09-21
Posts: 252

Re: Join osm_ids between points file and multi-poly files

ofamuyibo wrote:

The "points" file is the only one with x and y coordinates, and it also contains an 'osm_id' field.
The "multipolygons" file has both an 'osm_id' field and an 'osm_way_id' field.

However, when I join the 2 osm_id fields across the points file and the multipolygons file, I don't get a match.

I think this is the source of your confusion. While these fields are labelled the same, they're not the same IDs.

There are three different object types in the OSM database: nodes, ways, and relations. Each of the object types has its own ID numbering starting at 1. For example, you can have node #1, way #1, and relation #1, and these all refer to different objects in the database.
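
A tiny sketch of why this matters for a join. The two rows below are invented for illustration; the idea is only that a bare ID comparison between a points table and a multipolygons table compares node IDs with relation IDs, so any match is a coincidence, while a key that includes the object type keeps the numberings apart.

```python
# Hypothetical rows: the same number used in two different ID numberings.
points = [{"osm_id": 3, "name": "bus stop"}]      # osm_id is a node ID here
multipolygons = [{"osm_id": 3, "name": "lake"}]   # osm_id is a relation ID here

# Naive join on the bare number pairs up two unrelated objects.
naive = [
    (p["name"], m["name"])
    for p in points
    for m in multipolygons
    if p["osm_id"] == m["osm_id"]
]


def typed_key(osm_type, row):
    """Qualify an ID with its object type so numberings cannot collide."""
    return (osm_type, row["osm_id"])


# With typed keys, node 3 and relation 3 are correctly seen as different objects.
typed = [
    (p["name"], m["name"])
    for p in points
    for m in multipolygons
    if typed_key("node", p) == typed_key("relation", m)
]

print(naive)
print(typed)
```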

For the multipolygon relations you're looking at, each relation has its own ID ("osm_id") within the relation numbering. This multipolygon relation contains a number of member way objects, each referenced by its ID within the way numbering ("osm_way_id"). Going further, each of these ways contains an ordered list of the nodes that compose the way, each referenced by its ID within the node numbering. Therefore, if you want the nodes that ultimately compose a multipolygon, you'll first need to do a join to look up the ways that compose the relation, and then look up the nodes that compose those ways.
