Parsing .osm files

Hello,

I am new to OpenStreetMap and have enjoyed the time I have spent on it so far. This may be a repeated question, but I could not find an answer to it on the forum.

I need to parse the data for a city, say New Delhi. After parsing, I will load the data into Neo4j, a graph database, on which I can later run some computations. Is there already an API or library available in Java that parses .osm input files efficiently?

This is so that I don't reinvent the wheel.

Thanks in advance

Hello, welcome to the forum!

Maybe the Osmium toolkit could help you parse .osm files: https://wiki.openstreetmap.org/wiki/Osmium
You could even download and parse the file in .pbf format, which gives smaller files and can be read faster than the .osm format.

Markus

Hello, thanks a lot for the support.

But Osmium is for C++; I need something in Java, as Neo4j currently does not have a library for C++.
Or can you please suggest an alternative way I can do it?

Thanks again

Osmosis?
https://wiki.openstreetmap.org/wiki/Osmosis

I also recommend Osmosis, as I’ve used it in my own Java programs. It’s a mature tool and supports reading and writing all popular OSM file formats. Don’t let the command-line interface put you off: the same .jar archives can also be used as libraries.

Hello,

Thanks for the suggestion. One more piece of advice needed, please…
I sat up overnight and wrote a parser myself. The extract for the city of New Delhi has:
Number of ways: 26165
Number of relations: 44
Number of nodes: 130003

On my machine it took 58 minutes to parse.

My question:

  1. Can this time come down by using the Osmosis jar archives? Insertion into the graph database will probably take many more hours on top of the parsing. If yes, any suggestions on how I should proceed?

Best Regards,
Jatin

Parsing this file with Osmosis should take less than a minute. I don’t know how long inserting into your database takes, though.

If you want to try Osmosis: Download the latest version, add the jars to your classpath (you don’t need all of them, but I suggest that you first add them all, get it to work, and then remove those that are not actually required). The actual code will then look somewhat like this:


import java.io.File;
import java.io.FileInputStream;

import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer;
import org.openstreetmap.osmosis.core.domain.v0_6.*;
import org.openstreetmap.osmosis.core.task.v0_6.*;
import org.openstreetmap.osmosis.xml.common.CompressionMethod;
import org.openstreetmap.osmosis.xml.v0_6.XmlReader;

...

File file = ...; // the input file

Sink sinkImplementation = new Sink() {
    public void process(EntityContainer entityContainer) {
        Entity entity = entityContainer.getEntity();
        if (entity instanceof Node) {
            //do something with the node
        } else if (entity instanceof Way) {
            //do something with the way
        } else if (entity instanceof Relation) {
            //do something with the relation
        }
    }
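    // Lifecycle callbacks; nothing to clean up in this example.
    // Note: newer Osmosis releases may require further lifecycle methods here
    // (for example an initialize(...) callback); check the Sink interface of your version.
    public void release() { }  // free resources, also after errors
    public void complete() { } // called once after the last entity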
};

boolean pbf = false;
CompressionMethod compression = CompressionMethod.None;

if (file.getName().endsWith(".pbf")) {
    pbf = true;
} else if (file.getName().endsWith(".gz")) {
    compression = CompressionMethod.GZip;
} else if (file.getName().endsWith(".bz2")) {
    compression = CompressionMethod.BZip2;
}

RunnableSource reader;

if (pbf) {
    reader = new crosby.binary.osmosis.OsmosisReader(
            new FileInputStream(file));
} else {
    reader = new XmlReader(file, false, compression);
}

reader.setSink(sinkImplementation);

Thread readerThread = new Thread(reader);
readerThread.start();

while (readerThread.isAlive()) {
    try {
        readerThread.join();
    } catch (InterruptedException e) {
        /* do nothing */
    }
}

...

This is copied from one of my own tools that use Osmosis, and can also parse .osm.pbf, .osm.gz, and .osm.bz2 files. If you only need .osm (xml) parsing, the code gets a lot shorter.
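
As a dependency-free point of comparison for a hand-written parser like the one mentioned earlier in the thread, here is a minimal sketch that uses the JDK's built-in StAX streaming API (javax.xml.stream) instead of Osmosis. It is not Osmosis code and handles only plain .osm XML; the class name OsmCounter and its count method are made up for illustration, and it merely counts nodes, ways, and relations rather than building any objects.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper (not part of Osmosis): counts OSM elements
// by streaming through the XML once, without loading it into memory.
final class OsmCounter {

    /** Returns {nodes, ways, relations} found in the given .osm XML. */
    static long[] count(Reader osmXml) {
        long nodes = 0;
        long ways = 0;
        long relations = 0;
        try {
            XMLStreamReader xml =
                    XMLInputFactory.newInstance().createXMLStreamReader(osmXml);
            while (xml.hasNext()) {
                // Only opening tags matter for counting.
                if (xml.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                switch (xml.getLocalName()) {
                    case "node":     nodes++;     break;
                    case "way":      ways++;      break;
                    case "relation": relations++; break;
                    default:         break; // osm, bounds, tag, nd, member, ...
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed OSM XML", e);
        }
        return new long[] {nodes, ways, relations};
    }
}
```

Passing a reader over the extract to OsmCounter.count returns the three counts in one pass. A streaming approach like this keeps memory use flat regardless of file size, which is also how the Osmosis sink above receives its entities one at a time.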

Thanks,

Well, it is indeed very fast. A much better tool.

Regards,
Jatin