Osmosis 'Unable to parse xml file' error

Hello mappers,

I’m importing OSM XML extracts into PostgreSQL using the dump approach (--write-pgsql-dump). So far this has worked fine, with extracts from Geofabrik as input. Now I want to switch from pre-generated extracts to my own region, defined as a polygon. Again, this works fine as long as I use Geofabrik XML extracts as the input. An example:


C:\Program Files\osmosis\bin>osmosis.bat --read-xml file=C:\osmosis\germany-latest.osm.bz2 --bounding-polygon file=C:\osmosis\bayern.poly --write-pgsql-dump directory=C:\osmosis\20131103_bayern
Jan 15, 2015 10:27:53 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.43.1
Jan 15, 2015 10:27:53 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Jan 15, 2015 10:27:53 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Jan 15, 2015 10:27:53 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
[...]

I’m aware, of course, that Geofabrik already provides a Bayern extract, so cutting it out of the Germany file makes little sense. I only used it (together with bayern.poly) as a simple example that should work, and it does.

A problem arises when I switch from the Germany extract to a full planet.osm: Osmosis fails with an ‘Unable to parse xml file’ error. Any idea why this happens? Are the extracts on Geofabrik different in some way from the planet.osm files hosted on planet.openstreetmap.org?
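One way to compare the two inputs, sketched under an assumption this thread does not confirm: a .bz2 file can consist of several bzip2 streams concatenated back to back (parallel compressors such as pbzip2 write files like that), and a reader that stops at the end of the first stream only ever sees the beginning of the XML. The Python helpers below (`count_bz2_streams` and `count_bz2_streams_in_chunks` are hypothetical names) count the streams in a file by decompressing it chunk by chunk and discarding the output. For a planet file this reads the whole file and takes a while; a result greater than 1 for the planet file but 1 for germany-latest.osm.bz2 would point in that direction.

```python
import bz2
from typing import Iterable


def count_bz2_streams_in_chunks(chunks: Iterable[bytes]) -> int:
    """Count the concatenated bzip2 streams in a sequence of compressed chunks."""
    streams = 0
    dec = bz2.BZ2Decompressor()
    pending = False  # True while the current stream has started but not ended
    for chunk in chunks:
        while chunk:
            dec.decompress(chunk)  # decompressed XML is discarded
            if dec.eof:
                streams += 1
                # Bytes after the end-of-stream marker belong to the next stream.
                chunk = dec.unused_data
                dec = bz2.BZ2Decompressor()
                pending = False
            else:
                pending = True
                chunk = b""
    if pending:
        raise ValueError("input ends in the middle of a bzip2 stream")
    return streams


def count_bz2_streams(path: str, chunk_size: int = 1 << 20) -> int:
    """Count bzip2 streams in a file, reading it in 1 MiB pieces."""
    with open(path, "rb") as f:
        return count_bz2_streams_in_chunks(iter(lambda: f.read(chunk_size), b""))
```

Usage, with the path from the command above: `count_bz2_streams(r"C:\osmosis\planet-141008.osm.bz2")`.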


C:\Program Files\osmosis\bin>osmosis.bat --read-xml file=C:\osmosis\planet-141008.osm.bz2 --bounding-polygon file=C:\osmosis\bayern.poly --write-pgsql-dump directory=C:\osmosis\20131103_bayern
Jan 15, 2015 10:34:03 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.43.1
Jan 15, 2015 10:34:03 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Jan 15, 2015 10:34:03 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Jan 15, 2015 10:34:03 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
Jan 15, 2015 10:34:04 AM org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager waitForCompletion
SEVERE: Thread for task 1-read-xml failed
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to parse xml file C:\osmosis\planet-141008.osm.bz2.  publicId=(null), systemId=(null), lineNumber=3956, columnNumber=223.
        at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:116)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.xml.sax.SAXParseException; lineNumber: 3956; columnNumber: 223; XML document structures must start and end within the same entity.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
[...]

Because of the message ‘XML document structures must start and end within the same entity’, I suspect there is a problem with the XML in the planet file. Any help is appreciated.
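For comparison, Xerces typically reports ‘XML document structures must start and end within the same entity’ when the input simply ends while elements are still open, i.e. when the XML it receives looks truncated rather than malformed. A minimal sketch with Python's standard SAX parser reproduces the effect on a hypothetical, deliberately cut-off OSM snippet. Python uses expat rather than the Xerces parser in the Osmosis log, so the message wording differs; `check_well_formed` is a hypothetical helper name.

```python
import xml.sax


def check_well_formed(xml_bytes: bytes):
    """Return None if xml_bytes parses cleanly, else (line, column, message)."""
    try:
        # A plain ContentHandler ignores all events; only errors matter here.
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None


# Hypothetical OSM snippet; cutting it short leaves <osm> unclosed.
complete = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<osm version="0.6">\n'
    b' <node id="1" lat="48.1" lon="11.5"/>\n'
    b'</osm>\n'
)
print(check_well_formed(complete))       # None
print(check_well_formed(complete[:60]))  # (line, column, message) of the failure
```

The complete snippet parses; the truncated one fails at the point where the input stops, much like the error at line 3956 above.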

Edit:
In case anyone suspects a corrupt local copy of the file: that is not the case. The FCIV tool reports the correct MD5 checksum:

//
// File Checksum Integrity Verifier version 2.05.
//
732620d02e2f14112b65df61120dcc9b c:\osmosis\planet-141008.osm.bz2
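For anyone without FCIV, a rough Python equivalent of this check; `md5_of_file` and `md5_of_chunks` are hypothetical helper names, and the expected digest is the one FCIV printed above. Note that a matching checksum only proves the download is byte-for-byte intact; it says nothing about whether a particular reader can handle the file's compression format.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of data supplied piece by piece."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so memory use stays small."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

Usage: compare `md5_of_file(r"C:\osmosis\planet-141008.osm.bz2")` with 732620d02e2f14112b65df61120dcc9b.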

A further remark: the error persists even if the “--bounding-polygon” switch is omitted. This seems very strange to me, because I’m sure I’m not the first person to create a pgsql dump from a full planet.osm file under Windows. Could my Java version be the problem? I’m using this one:


C:\Program Files\osmosis\bin>java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

On the other hand, a Java issue would not explain why the Geofabrik files such as germany-latest.osm.bz2 work fine.

Regards,
Pete.

It could still be a Java issue if the planet file contains some XML structure that the Germany file doesn’t. Or perhaps that particular planet dump simply contains an XML error and a newer one doesn’t have this problem.

It was neither Java nor a problem with this particular planet file. The problem was solved by first decompressing the data separately with 7-Zip; see this posting in the German section of the forum.
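The 7-Zip step can also be scripted. The sketch below assumes, which this thread does not confirm, that the planet file consists of many concatenated bzip2 streams and that the bzip2 reader in Osmosis 0.43.1 stopped after the first one; the failure about one second into the run, at line 3956, would be consistent with that. Python's `bz2.open` continues across all streams, so decompressing with it should produce the complete XML. `first_stream_only` exists only to illustrate the suspected behaviour, and `decompress_bz2_file` is a hypothetical helper name.

```python
import bz2
import shutil


def first_stream_only(data: bytes) -> bytes:
    """Illustration only: decode just the first bzip2 stream, ignoring the rest."""
    # A single decompressor stops at the first end-of-stream marker;
    # everything after it is left in .unused_data.
    return bz2.BZ2Decompressor().decompress(data)


def decompress_bz2_file(src: str, dst: str, chunk_size: int = 1 << 20) -> None:
    """Write the plain XML from a (possibly multi-stream) .osm.bz2 file to dst."""
    # bz2.open transparently continues across concatenated streams.
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)


# Two streams, as a parallel compressor might write them:
data = bz2.compress(b"<osm>") + bz2.compress(b"</osm>")
print(first_stream_only(data))  # b'<osm>'  (truncated XML)
print(bz2.decompress(data))     # b'<osm></osm>'
```

After something like `decompress_bz2_file(r"C:\osmosis\planet-141008.osm.bz2", r"C:\osmosis\planet-141008.osm")`, the original command can be rerun with file=C:\osmosis\planet-141008.osm. The uncompressed planet needs a lot of free disk space.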

Check this one to know more about…xml parsing

Wells