acquiring planet data

(1) I’m using osm2pgsql_latest (command line) attempting to parse planet-080507.osm.bz2 (also tried planet-080423.osm.bz2), but it gets to node 211610 (node 211730 with planet-080423.osm.bz2) and trips an ‘allocating nodes’ error. It would appear to be a size-related issue? Any ideas?
(2) I’m using those older downloads because with IE (click link, save/cancel dialog) any newer files result in a much smaller file download (184 MB → 15 MB) than is indicated (~4 GB). I tried right-click ‘save target as’ with the same result on newer files. Is there something else I can use (FTP)?

At first glance this looks like you’ve run out of disk space…

No - I’ve got 500 GB of disk space.

I saw some reference in the Mapnik wiki to a 2 GB limit in wget?

Nah, countless people use wget daily to download the latest planet dump, so at least a reasonably up-to-date wget will certainly go beyond 2 GB.

Other options:

  • Is your computer overclocked? I used a slightly overclocked machine once and experienced planet file corruption on download.
  • Are you using Windows 98 with FAT16/32? (FAT32 can’t store a file larger than 4 GB, so a ~4 GB planet dump wouldn’t fit.)

Thanks for the push-back, Lambertus. We’re 8-9 hours apart, geospatially speaking, hence my less-than-prompt response.

The primary box is a fresh install of Windows 2003 Server (Std, SP2) on a dual-core Intel with 2 GB RAM and 500 GB of NTFS RAID1 storage, so hardware shouldn’t be a constraint. I’m just standing it up as a map server and have a subset of OSM and SharpMap/OpenLayers running on it OK.

I’m having the same issue (erroring out at roughly the same node) on two other machines, an older HP Windows 2003 server and a Dell XP box, though storage could be a factor on those.

Could items 1 & 2 (original post) be related, i.e. is the problem in the downloaded files themselves? I’m seeing planet-080507.osm.bz2 as 4,117,547 KB and planet-080423.osm.bz2 as 4,025,070 KB for the .bz2 files downloaded from an OSM mirror.

Well, the error message could be coming from Postgres or from the osm2pgsql tool. In this case it seems to be the tool that is at fault. Apparently you need more memory, or you should switch to using Postgres as storage for the nodes while you are importing.

Am I making sense? :-/

About downloading: there are several tools that help with downloading, and if you are using wget you can use the -c option to make it continue a failed download. I don’t think that will fix a file that is already corrupted, with garbage at the end, though…
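
For example, a minimal sketch (the URL below is only a placeholder, substitute whichever planet mirror you actually use):

  wget -c http://planet.openstreetmap.org/planet-latest.osm.bz2

The -c flag resumes from where the partial file ended; curl has a comparable resume option (-C -). Resuming trusts the bytes already on disk, so it won’t repair a download that was already corrupted.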

Thanks for the push, emj. I’m not really following, I think:
Do I need more than 2 GB of RAM to process planet files with osm2pgsql?
I’m pushing into Postgres/PostGIS; is there a flag for osm2pgsql to write each node/way as it is read (or to batch the writes)?

I was using IE to download the planet files - is FTP a possibility?

I’m sorry, I have no clues left :-(

Our NL tileserver is quite modest in ‘size’ and I must say it copes quite well (could do with a faster I/O subsystem though)… So ‘only’ 2 GB should not be a problem. Our tileserver worked fine with just 1 GB before.

Thanks Lambertus -

re ‘NL tileserver’ - I’m looking for data representing the other side of the globe though (US-Alaska in particular) that I can get into a database. I’ll look for other sources…

I would hope that another forum member might have some thoughts. Given that I’m starting with a clean install (OS/Postgres/PostGIS etc.) with all components at default values, I shouldn’t be the first guy to climb this hill.

We’ll see. If I come upon a solution elsewhere I’ll share it back here.

I don’t know the options since I don’t use the tool myself, but the error you get should only happen when you don’t have enough memory.

from my link:

RAM:

/* Implements the mid-layer processing for osm2pgsql
 * using several arrays in RAM. This is fastest if you
 * have sufficient RAM+Swap.
 *
 * This layer stores data read in from the planet.osm file
 * and is then read by the backend processing code to
 * emit the final geometry-enabled output formats
*/

Postgres:

/* Implements the mid-layer processing for osm2pgsql
 * using several PostgreSQL tables
 *
 * This layer stores data read in from the planet.osm file
 * and is then read by the backend processing code to
 * emit the final geometry-enabled output formats
*/
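
If it does turn out to be a RAM limit, the PostgreSQL middle layer is (at least in the osm2pgsql builds I’ve looked at) selected with the --slim option, so the import would look roughly like this (database name is just an example):

  osm2pgsql --slim -d gis planet-080507.osm.bz2

--slim keeps the node store in Postgres tables instead of arrays in RAM, which is slower but should avoid the in-memory ‘allocating nodes’ failure.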

Thanks emj - on the primary box this script is consuming ~50% of available RAM (so < 1 GB) with no swapping (I have it running now), so I don’t think it’s a memory problem.
I don’t see any configuration issues with Postgres/PostGIS that would preclude that many records being inserted, but I’ll try shutting off vacuum and give it another go.
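
For what it’s worth, the postgresql.conf tweaks people usually suggest for a big one-off import look roughly like this (ballpark values for a 2 GB machine; parameter names vary a bit between Postgres versions, so treat this as a sketch):

  autovacuum = off
  checkpoint_segments = 20
  maintenance_work_mem = 256MB

and then run a manual VACUUM ANALYZE once the import has finished.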

Well, the US and Canada data is many times bigger than the NL data, so emj could have a point here. But like I said before, there are lots of people downloading/importing the entire planet file into their PostGIS databases. Please join the dev mailing list and send your question there, as there are usually more developers and hard-core users present.

will do - thanks Lambertus

You could try using Firefox; some (maybe all) versions of Internet Explorer can’t handle downloads larger than 2 GB. Maybe that’s your problem?

That’s a good suggestion. Most browsers’ built-in download tools aren’t the best implementations. Use a proper download tool to fetch the planet file (see the link in the previous sentence). Most users working with planet files use Linux and hence use a good tool: wget or curl.

That’s understating it; they should be banned completely… :-) "wget -c" to continue failed transfers is very useful… though I’ve actually managed to get a corrupted planet.osm even when using "wget -c" or the curl equivalent.

I had that too but it was a result of my computer being overclocked which caused the network interface hardware to flip some bits occasionally.