How big are the data?

Hi!

I already did this last year, but I really don’t remember how long it took…

So, I downloaded https://download.geofabrik.de/europe-latest.osm.pbf and I’m importing the data into a PostGIS database.
It has been running for 22 days now… :roll_eyes:

Currently I see:


Using lua based tag processing pipeline with script /home/lucabert/src/osm-icao/osm-icao.lua
Using projection SRS 3857 (Spherical Mercator)
Setting up table: planet_osm_point
Setting up table: planet_osm_line
Setting up table: planet_osm_polygon
Setting up table: planet_osm_roads
Allocating memory for dense node cache
Allocating dense node cache in one big chunk
Allocating memory for sparse node cache
Sharing dense sparse
Node-cache: cache=8500MB, maxblocks=136000*65536, allocation method=11
Mid: pgsql, cache=8500
Setting up table: planet_osm_nodes
Setting up table: planet_osm_ways
Setting up table: planet_osm_rels

Reading in file: /home/lucabert/data/europe-latest.osm.pbf
Using PBF parser.
Processing: Node(2875001k 186.4k/s) Way(348143k 0.26k/s) Relation(3113580 5.35/s)

but I have no idea how many relations there are to import, so I can’t tell whether the import will need another 20 days or just 20 minutes…
Does anyone have an idea how much data there is to import, or how I can check it from the downloaded file?

Thanks a lot
Luca

22 days definitely doesn’t sound right; Europe should be possible in less than a day …

Can you share your actual osm2pgsql command line, RAM size of your machine, and whether you applied any tuning to the PostgreSQL configuration?

According to osmium-tool the current europe-latest file has:


  Number of nodes: 2889660244
  Number of ways: 350057550
  Number of relations: 5929492

Seeing that you are only at about half of that relation count, it’s probably not worth waiting; we should try to get the setup properly tuned and then restart instead …?
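
A quick back-of-envelope check, assuming the observed rate of 5.35 relations/s stays roughly constant: (5929492 − 3113580) / 5.35 ≈ 526,000 seconds, so roughly another six days just for the remaining relations.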

Full output:


$osmium fileinfo -e europe-latest.osm.pbf 

File:
  Name: europe-latest.osm.pbf
  Format: PBF
  Compression: none
  Size: 26258482781
Header:
  Bounding boxes:
    (-34.49296,29.635548,46.75348,81.47299)
  With history: no
  Options:
    generator=osmium/1.8.0
    osmosis_replication_base_url=http://download.geofabrik.de/europe-updates
    osmosis_replication_sequence_number=3205
    osmosis_replication_timestamp=2022-01-04T21:21:12Z
    pbf_dense_nodes=true
    timestamp=2022-01-04T21:21:12Z
[======================================================================] 100% 
Data:
  Bounding box: (-74.2324546,27.6272558,68.5788605,83.718314)
  Timestamps:
    First: 2005-05-03T13:27:18Z
    Last: 2022-01-04T21:21:12Z
  Objects ordered (by type and id): yes
  Multiple versions of same object: no
  CRC32: not calculated (use --crc/-c to enable)
  Number of changesets: 0
  Number of nodes: 2889660244
  Number of ways: 350057550
  Number of relations: 5929492
  Smallest changeset ID: 0
  Smallest node ID: 1
  Smallest way ID: 37
  Smallest relation ID: 11
  Largest changeset ID: 0
  Largest node ID: 9393065921
  Largest way ID: 1018377086
  Largest relation ID: 13635578
  All objects have following metadata attributes: version+timestamp
  Some objects have following metadata attributes: version+timestamp
  Number of buffers: 4341240 (avg 747 objects per buffer)
  Sum of buffer sizes: 272002246912 (272.002 GB)
  Sum of buffer capacities: 284720168960 (284.72 GB, 96% full)

Hi Hartmut,

the PC has a Xeon(R) CPU E31225 @ 3.10GHz (4 cores), 16GB RAM, and a 1 TB HDD.
I started the import with:

sudo -u _renderd osm2pgsql -d gis --create --slim  -G --hstore --tag-transform-script ~/src/osm-icao/osm-icao.lua -C 8500 --number-processes 12 -S ~/src/osm-icao/osm-icao.style ~/data/europe-latest.osm.pbf

In the standard PostgreSQL configuration I changed:

max_connections = 20
shared_buffers = 4GB
work_mem = 1GB
maintenance_work_mem = 8GB
synchronous_commit = off
track_activity_query_size = 32768
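
To confirm the running server actually picked these settings up, one can query pg_settings (a generic PostgreSQL check, nothing osm2pgsql-specific; running it as the postgres superuser against the gis database is just an assumption about this setup):

sudo -u postgres psql -d gis -c "SELECT name, setting, unit FROM pg_settings WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'max_connections');"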

All these changes were suggested to me by a friend who works as a PostgreSQL expert consultant, so I hope they are correct… They certainly helped to speed up the rendering…

Thanks
Luca

Hi again,

do you have any suggestions on how I can tune the import to make it faster?
I already posted the PC’s specs as well as the PostgreSQL configuration changes and the command line I use to start the import.

Thanks
Luca

I’m currently running an “all of Europe” import into a test VM of mine which I usually use for my MapOSMatic tests. I have given the VM 16GB and four cores, but for a first run I have not yet modified my usual settings to match yours.

Import has been running for about 15h so far, and is currently at:


Processing: Node(2890244k 581.0k/s) Way(199433k 4.58k/s) Relation(0 0.0/s)

compared to your

Processing: Node(2875001k 186.4k/s) Way(348143k 0.26k/s) Relation(3113580 5.35/s)

even though my setup uses a smaller cache and fewer processes.

Looking at “top” output: currently, in the way import phase, CPU load is somewhat evenly divided between osm2pgsql and the postgres processes. I will send another update on these numbers when my import reaches the relation phase.

What I suspect is that your own Lua transform script may be the culprit here, with most of the time being spent in Lua processing. This is just an educated guess for now, but if that’s the case you should see most CPU time spent on the osm2pgsql process and only a little on the actual database processes, as these would mostly just be waiting for osm2pgsql to provide more processed data to store.
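
One way to verify that hypothesis (a rough, generic sketch, nothing osm2pgsql-specific) is to watch how CPU time is split between the processes while the import runs, e.g.:

top -b -n 1 -o %CPU | head -n 20

or, with the sysstat package installed:

pidstat -u 5

If the Lua transform were the bottleneck, osm2pgsql should be using close to a full core while the postgres backends stay comparatively idle.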

Also for reference, these are the only modifications I do to the postgres configuration in those test VMs:


work_mem=5466000kB
maintenance_work_mem=10932000kB
autovacuum=off

So I split the 16GB the machine has into basically 3.5G for the osm2pgsql node cache, 5.5G for regular pgsql work_mem, and 10G for maintenance_work_mem.
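
For comparison, the node cache part of that split would correspond to an osm2pgsql call roughly along these lines (only the -C 3500 value follows from the 3.5G mentioned above; the remaining flags simply mirror Luca’s command and are not necessarily what this test VM uses):

osm2pgsql -d gis --create --slim -G --hstore -C 3500 --number-processes 4 europe-latest.osm.pbf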

PS: I just noticed that I don’t have --disable-parallel-indexing in my test setup. That’s not an issue with the usual district or state level imports I do for testing, but at this PBF size, having multiple postgres processes doing indexing in parallel and each allocating several GB for it may lead to out-of-memory situations once the actual import is through and table reorganization and indexing starts …
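
For reference, on a command line like the one posted earlier in the thread, that flag would simply be appended (a sketch, all other options unchanged):

sudo -u _renderd osm2pgsql -d gis --create --slim -G --hstore --tag-transform-script ~/src/osm-icao/osm-icao.lua -C 8500 --number-processes 12 -S ~/src/osm-icao/osm-icao.style --disable-parallel-indexing ~/data/europe-latest.osm.pbf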

Hi Hartmut.
In top I see:

top - 10:24:36 up 25 days,  1:44,  1 user,  load average: 1,14, 1,17, 1,13
Tasks: 135 total,   1 running, 134 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0,3 us,  0,7 sy,  0,0 ni, 78,3 id, 20,7 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu1  :  0,7 us,  0,3 sy,  0,0 ni, 49,7 id, 49,3 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  :  0,3 us,  0,3 sy,  0,0 ni, 96,7 id,  2,7 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  :  0,0 us,  0,3 sy,  0,0 ni, 97,3 id,  2,3 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :  15921,2 total,    173,6 free,   9439,3 used,   6308,3 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.   1907,1 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1947 postgres  20   0 4385360   4,1g   4,1g S   1,3  26,4 599:50.28 postgres
 1942 _renderd  20   0 9534528   8,8g   1424 S   0,7  56,5 781:14.63 osm2pgsql
   47 root      20   0       0      0      0 S   0,3   0,0  11:47.77 kswapd0
    1 root      20   0  169516   5696   3320 S   0,0   0,0   0:15.58 systemd
    2 root      20   0       0      0      0 S   0,0   0,0   0:00.87 kthreadd
    3 root       0 -20       0      0      0 I   0,0   0,0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0,0   0,0   0:00.00 rcu_par_gp

So I think the Lua processing is OK, isn’t it?

Thanks
Luca

Uh, that shows very low CPU load for both postgres and osm2pgsql, and rather high IO waits on the first two CPUs.

Maybe the actual problem is a rather slow HDD (aka: “rotating rust”)?

I forgot to mention that the host machine I’m running this on has pretty decent NVMe SSDs. Classic HDDs are not really good for this kind of workload at all as they are especially bad at random disk reads and writes unfortunately …
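
A simple way to confirm that the disks are the limiting factor is to watch their utilization while the import runs, e.g. with iostat from the sysstat package (a generic check, not osm2pgsql-specific):

iostat -x 5

If the device holding the database sits close to 100% in the %util column with high wait times while the CPUs mostly show “wa”, the import is IO-bound rather than CPU-bound.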

I can’t do any tests on this as I don’t have any systems with classic rotating disks at hand anymore …

Hi Hartmut,

well, the PC I use for this job is “not the newest anymore”…
And it has two HDDs (Western Digital Caviar Green 1TB) in RAID10…

Well, since I generate the maps just once per year, I’ll wait… I think the import could be finished tomorrow or Monday…

Regards
Luca

Hi,

so, after successfully importing the relations, the script began to create the tables and crashed with this error:

FATAL:  remaining connection slots are reserved for non-replication superuser connections

:frowning:
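
For context: that message means PostgreSQL had no regular connection slots left and was keeping the remaining ones for superusers. With the settings from above that is quite plausible: max_connections was lowered to 20, and PostgreSQL reserves superuser_reserved_connections (3 by default) of those, leaving only 17 slots for ordinary connections. osm2pgsql opens several database connections at once, especially during the final table and index creation phase, so it can run into that limit. Raising the limit again should avoid the error, roughly like this (the exact value is not critical; 100 is simply the PostgreSQL default):

max_connections = 100
superuser_reserved_connections = 3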

Since last year I imported the data without problems (with the standard PostgreSQL configuration), I decided to restore the standard configuration and restart the import.
After the import I’ll use the “new configuration” I created to speed up the rendering.
Hopefully it works…

Regards
Luca