Searchable compressed planet.osm

Has anyone tried to seek within bz2 files, or does anyone know whether the planet.osm.gz files are seekable? You can use dictzip to create seekable gzip files, which is really handy when you want to compress databases, though the result is a bit bigger than a normally compressed gzip file.

I want this because I’m going to decompress a planet.osm file on a machine that will abort the process partway through, so I need some way of resuming where I left off, and dictzip would let me do that.

PS. This is one of those information-in-a-question posts :slight_smile: DS.

So I downloaded the planet.osm.gz and no, it’s not compressed with dictzip. A bit sad really, since it’s really useful to have it compressed that way.

Most applications working on OSM data are able (or should be able) to work with bz2 or gz compressed files because of the amount of data involved. Perhaps having those zipped files searchable would be nice, but if that eliminates Windows users then that would be a shame really. Do you know if dictzip is supported by Windows tools like decompress or 7z?

Another question, is a planet file being searchable really going to be helpful? I.e. are you able to find random stuff without having to scan the entire file?

Any Windows program that can read gzip can read dictzip files, since a dictzip file is an ordinary gzip file with some extra header data that can be ignored at decompression if you don’t need to seek. Cool huh… :slight_smile: Being able to start decompressing at “any point” in the file can be very useful considering that the uncompressed version is 91GB(!). E.g. that would allow you to jump straight to the way data, or the relations data.
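To make the "extra header data" concrete, here is a rough Python sketch of how one might check whether a gzip file was written by dictzip. It assumes the layout described in the dictzip man page: a gzip extra-field subfield tagged "RA" holding a version, the uncompressed chunk length, the chunk count and then one 16-bit compressed size per chunk. The function name read_dictzip_chunks is made up for this example, and it only handles the 16-bit form of the table.

```python
import struct


def read_dictzip_chunks(header: bytes):
    """Return (chunk_len, [compressed chunk sizes]) or None for plain gzip.

    `header` is the first part of the file; it must be long enough to
    contain the whole gzip extra field.
    """
    if len(header) < 10 or header[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = header[3]
    if not flags & 0x04:  # FEXTRA bit clear: no extra field, so plain gzip
        return None
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]

    pos = 0
    while pos + 4 <= len(extra):
        subfield_id = extra[pos:pos + 2]
        (sublen,) = struct.unpack_from("<H", extra, pos + 2)
        data = extra[pos + 4:pos + 4 + sublen]
        if subfield_id == b"RA":  # dictzip's random-access subfield
            _version, chunk_len, chunk_count = struct.unpack_from("<HHH", data, 0)
            sizes = list(struct.unpack_from("<%dH" % chunk_count, data, 6))
            return chunk_len, sizes
        pos += 4 + sublen  # skip subfields written by other tools
    return None
```

A normal gzip reader just skips the extra field, which is why the file still decompresses everywhere. A seeking reader would divide the wanted uncompressed offset by chunk_len to pick a chunk, then add up the compressed sizes before it to find where to start reading.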

If someone wanted to, that would also make it possible, with “some” work, to organize the compressed planet.osm by geographic region. You could then create an index like this:

pos in file  place name
150MB        Scandinavia
2GB          Estonia, Lithuania and Latvia
4GB          islands of the world
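Such an index could be used roughly like this in Python. This is only a sketch: the offsets are the illustrative ones from the table above (read as decimal MB and GB), and REGION_INDEX, region_at and start_of are names invented for the example.

```python
import bisect

MB = 1000 ** 2
GB = 1000 ** 3

# (start offset in the uncompressed file, place name), sorted by offset.
# Values are the illustrative ones from the table, not real measurements.
REGION_INDEX = [
    (150 * MB, "Scandinavia"),
    (2 * GB, "Estonia, Lithuania and Latvia"),
    (4 * GB, "islands of the world"),
]


def region_at(offset, index=REGION_INDEX):
    """Return the name of the region containing `offset`, or None if the
    offset lies before the first indexed region."""
    starts = [start for start, _ in index]
    i = bisect.bisect_right(starts, offset) - 1
    return index[i][1] if i >= 0 else None


def start_of(name, index=REGION_INDEX):
    """Return the offset at which to start decompressing for `name`."""
    for start, place in index:
        if place == name:
            return start
    raise KeyError(name)
```

With a seekable file you would pass the result of start_of to the seek call and decompress from there until the next region's offset.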

EDIT: did a check on how big planet.osm was

I was already wondering how on earth you’re going to find anything in a zip file. But dictzip apparently makes a dictionary, oh duh!

Well, I can see why OSM doesn’t use dictzip, as there are a heck of a lot of ways in which one might want to organize such a zip (by country is only one example), which would mean that we need a lot of copies (and the accompanying disk space)…