The current design of osm data structure wastes a lot of space

vgps · July 26, 2007, 4:49am

I have studied the OSM data structure.
It contains 3 main components (nodes, segments, ways):
nodes (id, lat, lon, tag)
segments (id, fromnodeid, tonodeid)
ways (id, segmentid[0],segmentid[1],…,segmentid[n], tag)

Actually, to store map data we need only 2 components: nodes and ways.
nodes (id, lat, lon, tag)
ways (id, nodeid[0],nodeid[1],…,nodeid[n], tag)
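
To make the comparison concrete, here is a minimal Python sketch of how a way stored as a list of segments could be flattened into a list of nodes. The function name and the in-memory dictionary are my own assumptions for illustration, not anything from the OSM code, and it assumes the way's segments are listed in order and joined end to end (which real data may not guarantee).

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of the current model:
# segment id -> (fromnodeid, tonodeid)
Segments = Dict[int, Tuple[int, int]]


def way_segments_to_nodes(segment_ids: List[int], segments: Segments) -> List[int]:
    """Flatten a way's segment list into an ordered node list.

    Assumes each segment starts where the previous one ended;
    raises ValueError if that does not hold.
    """
    node_ids: List[int] = []
    for seg_id in segment_ids:
        from_id, to_id = segments[seg_id]
        if not node_ids:
            node_ids.append(from_id)
        elif node_ids[-1] != from_id:
            raise ValueError(f"segment {seg_id} does not continue the way")
        node_ids.append(to_id)
    return node_ids


# Three segment records plus three references in the way ...
segments = {10: (1, 2), 11: (2, 3), 12: (3, 4)}
# ... become four node references and no segment records at all.
flat = way_segments_to_nodes([10, 11, 12], segments)
print(flat)  # [1, 2, 3, 4]
```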

I wondered why the OSM foundation introduced the segment component between the node and way components, so I read the OSM wiki to find the answer.
The OSM foundation says the segment component is needed because ways can share segments!
Well, that may be true in theory, but in real life how many segments are actually shared among ways? I guess not many.

So, by introducing the segment component between node and way to save some space (sharing segments between ways), the OSM foundation has made the data structure waste a lot of space when the data is output to planet.osm in XML format!

Users of planet.osm (like me) need to write our own programs (Perl, Java or whatever language) to process a 4GB planet.osm text file. And this 4GB will grow very fast in the near future.

Furthermore, the current OSM data structure design (nodes, segments, ways) makes the planet.osm XML file very difficult to process.

For example: if I want to get the data for the city of London from planet.osm, what do I need to do?
My program needs to do the following tasks:

1. Open and read planet.osm xml file
2. Trim all the nodes that are outside London’s bounds → write result to file
3. Trim all segments that are outside London’s bounds → write result to file
4. Trim all ways that are outside London’s bounds → write result to file

Task 2 is very simple: just check that the node’s lat and lon are within the bounds → after task 2 finishes we have a set of nodes within bounds
Task 3 is a little bit complicated: we need to check each segment’s fromnodeid and tonodeid against the set of nodes within bounds → after task 3 finishes we have a set of segments within bounds
Task 4 is extremely complicated: we need to check each of the way’s segments against the set of segments within bounds.
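
Here is a rough Python sketch of tasks 2 to 4, working on data already loaded into memory (parsing the XML is left out). Everything in it is an assumption for illustration: the function name, the dictionaries, the rule that a segment is kept only if both of its nodes are inside, the rule that a way is kept if at least one of its segments is kept, and the bounding box, which is only an approximate box around London.

```python
from typing import Dict, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def filter_with_segments(
    nodes: Dict[int, Tuple[float, float]],   # node id -> (lat, lon)
    segments: Dict[int, Tuple[int, int]],    # segment id -> (fromnodeid, tonodeid)
    ways: Dict[int, List[int]],              # way id -> list of segment ids
    bbox: BBox,
) -> Tuple[Set[int], Set[int], Set[int]]:
    min_lat, min_lon, max_lat, max_lon = bbox

    # Task 2: keep nodes whose coordinates fall inside the bounds.
    kept_nodes = {
        nid for nid, (lat, lon) in nodes.items()
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    }
    # Task 3: keep segments whose two end nodes were both kept (assumed rule).
    kept_segments = {
        sid for sid, (a, b) in segments.items()
        if a in kept_nodes and b in kept_nodes
    }
    # Task 4: keep ways that reference at least one kept segment (assumed rule).
    kept_ways = {
        wid for wid, seg_ids in ways.items()
        if any(sid in kept_segments for sid in seg_ids)
    }
    return kept_nodes, kept_segments, kept_ways


london = (51.28, -0.51, 51.69, 0.33)  # illustrative bounds only
nodes = {1: (51.50, -0.12), 2: (51.51, -0.10), 3: (48.85, 2.35)}
segments = {10: (1, 2), 11: (2, 3)}
ways = {100: [10, 11], 101: [11]}
kept_nodes, kept_segments, kept_ways = filter_with_segments(nodes, segments, ways, london)
print(kept_nodes, kept_segments, kept_ways)  # {1, 2} {10} {100}
```

Note that the set of kept segments has to stay in memory until all ways are processed, on top of the set of kept nodes.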

If we can implement a new data structure like this:
nodes (id, lat, lon, tag)
ways (id, nodeid[0],nodeid[1],…,nodeid[n], tag)

Then we can save a lot of space in the planet.osm XML file, and it also becomes easier for other programs to process it.

Other programs just need to do fewer tasks:

1. Open and read planet.osm xml file
2. Trim all the nodes that are outside London’s bounds → write result to file
3. Trim all ways that are outside London’s bounds → write result to file

The program will require less memory and run faster because it does not need to allocate memory to store and process segment components.
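
As a rough Python sketch of the shorter process under the proposed nodes-and-ways model (again on data already in memory, with a hypothetical function name, an assumed rule that a way is kept if at least one of its nodes is inside, and only approximate bounds for London):

```python
from typing import Dict, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def filter_nodes_and_ways(
    nodes: Dict[int, Tuple[float, float]],  # node id -> (lat, lon)
    ways: Dict[int, List[int]],             # way id -> ordered list of node ids
    bbox: BBox,
) -> Tuple[Set[int], Set[int]]:
    min_lat, min_lon, max_lat, max_lon = bbox

    # Keep nodes whose coordinates fall inside the bounds.
    kept_nodes = {
        nid for nid, (lat, lon) in nodes.items()
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    }
    # Keep ways that reference at least one kept node (assumed rule).
    # No intermediate segment set is needed.
    kept_ways = {
        wid for wid, node_ids in ways.items()
        if any(nid in kept_nodes for nid in node_ids)
    }
    return kept_nodes, kept_ways


london = (51.28, -0.51, 51.69, 0.33)  # illustrative bounds only
nodes = {1: (51.50, -0.12), 2: (51.51, -0.10), 3: (48.85, 2.35), 4: (48.86, 2.36)}
ways = {100: [1, 2, 3], 101: [3, 4]}
kept_nodes, kept_ways = filter_nodes_and_ways(nodes, ways, london)
print(kept_nodes, kept_ways)  # {1, 2} {100}
```

Only the set of kept node ids has to be held while the ways are checked.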

Ben · August 7, 2007, 10:01pm

If the editors treated untagged ways as segments, and displayed them in the same way segments currently are, then I think dropping segments would work. I don’t think it’s correct that nobody ever uses segments for multiple ways though. I have done this many, many times, and I’m aware that others do also. Am I correct to assume this would still be possible, though, as the nodes could still be used many times, and therefore be linked by many ways?