Importing official data from Keren Kayemet LeIsrael

Friends,

Following our (talkat’s and my) previous meetings with representatives of Keren Kayemet LeIsrael,
it is my pleasure to announce that KKL is making the following two datasets available to OSM, which will no doubt enhance our map:

  1. KKL-managed forest boundaries. Overall 277 polygons/multipolygons.
  2. Public KKL sites such as camping, picnic, archaeological and recreational sites. Overall 1157 points of interest.

The coverage includes all of Israel except most of the areas beyond the 1949 armistice line.
Before the import, I still need to receive the final OK from KKL regarding liability and credit, so please do not add anything to OSM yet.

The original archive with all the data is available here:
http://www.wisdom.weizmann.ac.il/~dmitryb/kkl/KKL_GIS.zip

The quickest way to look at the data is probably through a Shapefile viewer such as Quantum GIS.

I’ve already converted the forest layer into OSM format (using http://wiki.openstreetmap.org/wiki/Ogr2osm), available here (again, please do not upload anything yet):
http://www.wisdom.weizmann.ac.il/~dmitryb/kkl/forests.osm.7z

The file is pretty big for JOSM/Merkaartor on my netbook, so I am thinking of dividing the whole thing into chunks of manageable size (say, 30 forests each).
For now, I have added the following tags to every polygon/multipolygon:

name - the forest name
kkl:forest_num - the field FOR_NO, which will be useful for future updates
kkl:update_date - the field UPDATE_DAT
source - the value KKL
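A hypothetical sketch of the attribute-to-tag mapping above, written as a plain Python function rather than in any particular ogr2osm translation-file format (the hook ogr2osm expects differs between versions). FOR_NO and UPDATE_DAT are the fields named above; NAME is an assumed name for the attribute holding the forest name.

```python
def filter_tags(attrs: dict) -> dict:
    """Translate one KKL shapefile record into OSM tags.

    FOR_NO and UPDATE_DAT come from the KKL dataset; "NAME" is an
    assumed attribute name for the forest name.
    """
    tags = {"source": "KKL"}
    name = (attrs.get("NAME") or "").strip()
    if name:
        tags["name"] = name
    if attrs.get("FOR_NO") not in (None, ""):
        # Keep the KKL forest number as a stable key for future updates.
        tags["kkl:forest_num"] = str(attrs["FOR_NO"]).strip()
    if attrs.get("UPDATE_DAT"):
        tags["kkl:update_date"] = str(attrs["UPDATE_DAT"]).strip()
    return tags


print(filter_tags({"NAME": "Beer Sheva", "FOR_NO": 42, "UPDATE_DAT": "2010-05-01"}))
# → {'source': 'KKL', 'name': 'Beer Sheva', 'kkl:forest_num': '42', 'kkl:update_date': '2010-05-01'}
```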

It appears that the conversion script has some problems, as some ways are not tagged at all.
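Not every untagged way is necessarily a bug: in a multipolygon, the member ways can be left untagged while the tags sit on the relation. A minimal sketch for telling the two cases apart, assuming the converted .osm file fits in memory and is parsed with Python's xml.etree; it reports only untagged ways that no relation refers to.

```python
import xml.etree.ElementTree as ET


def orphan_untagged_ways(osm_xml: str) -> list:
    """Return ids of ways that have no tags and belong to no relation."""
    root = ET.fromstring(osm_xml)
    # Collect every way referenced as a relation member.
    in_relation = {
        m.get("ref")
        for rel in root.iter("relation")
        for m in rel.iter("member")
        if m.get("type") == "way"
    }
    orphans = []
    for way in root.iter("way"):
        has_tags = way.find("tag") is not None
        if not has_tags and way.get("id") not in in_relation:
            orphans.append(way.get("id"))
    return orphans
```

For a file on disk, the text could be read with `open(path, encoding="utf-8").read()` and passed in as is.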

I haven’t had time to look into the second dataset yet; it will no doubt be trickier to convert, as there is much more variability in the data.

There are duplicates/overlaps with existing OSM data, so we need to decide carefully how to proceed with the import.

I cannot dedicate myself fully to this project right now, so any help will be appreciated.

dimka

Great news!
How would you suggest merging the new data with the existing data? A quick how-to would be very helpful.

I just received the final OK from KKL. We agreed that:

  1. The following notice should be added to the page http://wiki.openstreetmap.org/wiki/Contributors:
  2. Imported data should contain a reference to the source of the data (such as source=KKL).

I can think of two options for the forest import:

  1. Bulk import everything, then go over each forest manually and delete the duplicates.
  2. Divide the import into small OSM-file chunks, then upload each chunk one at a time, taking care of the duplicates along the way.

I prefer option 1, because not many people here work with offline editors such as JOSM (required for uploading vector data). I can do the initial upload, and then the manual check can be distributed. Also, everybody will then be able to see the real situation for themselves, at which point our collective wisdom will take over.

Thoughts?

I agree with you.
Option 1 is better.
If there are overlaps, then the forests not from KKL should be deleted.
Maybe you could add a “note” tag that says so?

That option is also better for future imports of updates: Just delete all forests with source=KKL, and add the new ones.
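A variant of the update procedure above, sketched under assumptions: instead of deleting and re-adding every source=KKL forest, forests could be matched on kkl:forest_num, with kkl:update_date deciding which ones actually changed. The input shape (forest number mapped to update date, for both the data already in OSM and the new release) is hypothetical.

```python
def plan_update(existing: dict, incoming: dict) -> dict:
    """Compare forests already in OSM with a new KKL release.

    Both arguments map kkl:forest_num -> kkl:update_date (assumed shape).
    """
    old, new = set(existing), set(incoming)
    return {
        # Forests no longer present in the new release.
        "delete": sorted(old - new),
        # Forests KKL added since the last import.
        "add": sorted(new - old),
        # Forests present in both, but re-dated by KKL.
        "replace": sorted(k for k in old & new if existing[k] != incoming[k]),
    }
```

Forests whose update date is unchanged are left alone, so manual fixes made in OSM to those forests would survive the update.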

talkat.

I agree, except for the automatic deletion of non-KKL forests. Some of them were painstakingly traced from aerial imagery and if their boundaries are more detailed than the KKL import, they should stay.

I know what you mean…
KKL claims (and it does appear so) that their data is accurate to 2m. So my guess is that whenever we have a duplicate, the KKL version will be superior. I guess we’ll have to decide on a case-by-case basis.

What I meant is that AFTER the first upload and the manual fix (deletion) of duplicates, we’ll be able to update easily (by automatic upload and deletion).

If we want to keep forests that shouldn’t be deleted, we’ll tag them accordingly,
e.g. kkl-forest=no (or similar), but that will be part of the first manual pass over all existing forests.

After the initial upload, we could have a script list all non-KKL forests on the wiki, then go over them and either delete each one or manually add a specific tag (tagged forests will be excluded from the list, or clearly marked).
So the work of going over all non-KKL forests should be easy enough.
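The listing script could look roughly like this sketch. It assumes forests are tagged landuse=forest, uses the kkl-forest=no example tag from above as the "reviewed, keep" marker (any agreed tag would do), and emits the result as MediaWiki table markup for the wiki page.

```python
import xml.etree.ElementTree as ET


def non_kkl_forests(osm_xml: str) -> list:
    """List (type, id, name) for non-KKL forests that were not yet reviewed."""
    root = ET.fromstring(osm_xml)
    rows = []
    for elem in root:
        if elem.tag not in ("way", "relation"):
            continue
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        if tags.get("landuse") != "forest":
            continue
        if tags.get("source") == "KKL" or tags.get("kkl-forest") == "no":
            continue  # imported by us, or already reviewed and kept
        rows.append((elem.tag, elem.get("id"), tags.get("name", "")))
    return rows


def wiki_table(rows) -> str:
    """Render the rows as a MediaWiki table."""
    lines = ['{| class="wikitable"', "! Type !! Id !! Name"]
    for kind, osm_id, name in rows:
        lines += ["|-", f"| {kind} || {osm_id} || {name}"]
    lines.append("|}")
    return "\n".join(lines)
```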

talkat.

Is there any chance of getting the details of KKL bicycle tracks around the country?

We have discussed this with KKL previously; as I understand it, there is a legal problem because they do not have exclusive ownership of the data.

dimka

After some problems with converting between coordinate systems, I have started testing the import scripts on the test database at:
http://api06.dev.openstreetmap.org/

I have split the forest dataset into chunks of 10000 objects each, and uploaded two such chunks.
A special user was created for the import: http://api06.dev.openstreetmap.org/user/kkl_import_test

There are 6 forests uploaded thus far:
http://api06.dev.openstreetmap.org/browse/relation/5166
http://api06.dev.openstreetmap.org/browse/relation/5165
http://api06.dev.openstreetmap.org/browse/relation/5164
http://api06.dev.openstreetmap.org/browse/relation/5163
http://api06.dev.openstreetmap.org/browse/relation/5162
http://api06.dev.openstreetmap.org/browse/relation/5161

Each forest is actually a relation of type multipolygon, having one or more outer/inner ways. The ways themselves are not tagged, all the data is in the relations.

The upload script is based on http://wiki.openstreetmap.org/wiki/Bulk_upload.py and seems to work well, recovering from crashes during the import.
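The crash recovery described above boils down to persisting progress. This is not bulk_upload.py's code, only a minimal sketch of the idea: a JSON state file maps each placeholder id to the id the server assigned, and a retry skips everything already recorded. `upload_one` is a hypothetical stand-in for the actual API call, and rewriting references (e.g. way node lists) from placeholder ids to real ids is left out.

```python
import json
import os


def upload_all(objects, upload_one, state_path, max_attempts=5):
    """Upload objects in order, resuming after failures.

    objects: list of (placeholder_id, payload) pairs; ids are strings.
    upload_one: hypothetical callable that sends one payload and returns
        the server-assigned id; it may raise OSError on network failure.
    state_path: JSON file mapping placeholder ids to real ids.
    """
    done = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)
    for _ in range(max_attempts):
        try:
            for placeholder, payload in objects:
                if placeholder in done:
                    continue  # uploaded during an earlier attempt
                done[placeholder] = upload_one(payload)
                with open(state_path, "w") as f:
                    json.dump(done, f)  # persist progress after each object
            return done
        except OSError:
            continue  # retry; a real script would open a fresh changeset here
    raise RuntimeError("upload did not finish after %d attempts" % max_attempts)
```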

Please review the above data (Potlatch can be used as usual), I would like to get as much feedback as possible before I proceed to the live database. Some issues which I think need to be discussed (please add more if possible):

  1. Should the ways comprising the multipolygon be tagged somehow (at least with landuse=forest)?
  2. English names. The original data has only Hebrew names, so it seems we will need to add name:en by hand after the import. Once this is done, we could have a script collect the English names for future updates.

My OSM user is not valid on http://api06.dev.openstreetmap.org/, and the relations you created do not show up on http://www.openstreetmap.org/

  1. Do I need a user for http://api06.dev.openstreetmap.org/ to help?
  2. Will the data eventually show up on the ‘general’ OSM site?

Is there a way to upload atomically,
i.e. to upload some ways together with all their nodes (rather than splitting them across several uploads)?

They look good!
Are these “full” forests? They seem to be missing some areas,
e.g. the Kfar HaHoresh, Zipori, and Haifa forests.

Yes. landuse=forest is a must.

In some places (e.g. the UK), every way is also named,
i.e. every little part of the forest carries the name of the whole forest.
AFAIK, the standards say nothing about it, so this is another issue we could discuss.

Can you send me the Hebrew names list in a private message?
I’ll see if I can translate them all at once, so you’ll have reference for future uploads.

talkat.

  1. If you want to try to edit the data on the dev server, you’ll need a user. Otherwise, you can browse everything unregistered.
  2. No, the databases are totally disconnected, that’s why I first wanted to try the whole process on the test database.

In fact, the chunks are built so that each one contains all the objects (nodes, ways, relations) belonging to specific forests only. That is, a relation in one chunk can never have a way from another chunk as a member. In any case, the script I’m using can recover from network failures: it opens a new changeset for every attempt, but remembers which objects were already uploaded. I think this is inevitable, because network errors happen all the time.
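The chunking rule above can be sketched as follows, assuming (as in the current dataset) that no way or node is shared between forests. Each forest contributes its relation, member ways and their nodes as one indivisible unit, and a new chunk starts when the next forest would push the object count past the limit; 10000 is the chunk size mentioned earlier. A single forest larger than the limit still becomes one oversized chunk.

```python
def make_chunks(relations, ways, max_objects=10000):
    """Group forests into upload chunks that never split a forest.

    relations: dict relation_id -> list of member way ids
    ways: dict way_id -> list of node ids
    Returns a list of chunks, each a dict of sets "relations", "ways", "nodes".
    """
    def empty():
        return {"relations": set(), "ways": set(), "nodes": set()}

    chunks = []
    current = empty()
    for rel_id, member_ways in relations.items():
        # Everything one forest needs: the relation, its ways, their nodes.
        rel_ways = set(member_ways)
        rel_nodes = {n for w in member_ways for n in ways[w]}
        size = 1 + len(rel_ways) + len(rel_nodes)
        current_size = sum(len(s) for s in current.values())
        if current["relations"] and current_size + size > max_objects:
            chunks.append(current)
            current = empty()
        current["relations"].add(rel_id)
        current["ways"] |= rel_ways
        current["nodes"] |= rel_nodes
    if current["relations"]:
        chunks.append(current)
    return chunks
```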

Yes, these are the entire forests as they appear in the dataset, but only the forests managed by KKL. In fact, KKL warned us that the boundaries may not correspond to actual areas with trees; rather, they reflect the official designation by the land authorities (or something similar).

I agree.

Maybe tagging every way is overkill (there are lots of ways compared to just 278 relations, after all…). Although right now no way is shared by two or more relations, this might change in the future (e.g. as a result of some automatic simplification/duplicate node removal).

I’ve put the metadata table here (editable in-place):
https://spreadsheets.google.com/spreadsheet/ccc?key=0AlhvTH5eFdJvdGk4aGFRZ3B6cUVYSWNUenBoeW8zdmc&hl=en_US&authkey=CMGMgdkK

I did a few, but right now don’t have the time to go over all of them.

I added all English names. :)

If possible, I’d suggest appending " Forest" to the English names and prefixing "Yaar " (Hebrew for "forest") to the Hebrew names.
e.g.
current name in the spreadsheet: Beer Sheva
name:he (in Hebrew)=Yaar Beer Sheva
name:en (in English)=Beer Sheva Forest
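A small sketch of the naming convention proposed above. In the data, "Yaar" would be written as the Hebrew word יער; the function is hypothetical and skips names that already carry the prefix or suffix, so re-running it does not double them.

```python
HEBREW_FOREST = "יער"  # "Yaar", Hebrew for "forest"


def forest_names(base_en: str, base_he: str) -> dict:
    """Build name:en / name:he following the 'X Forest' / 'Yaar X' convention."""
    en = base_en.strip()
    he = base_he.strip()
    if not en.endswith(" Forest"):
        en += " Forest"
    if not he.startswith(HEBREW_FOREST + " "):
        he = HEBREW_FOREST + " " + he
    return {"name:en": en, "name:he": he}
```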

talkat.

Actually, the wiki does say something about tagging the member ways of multipolygons here:
http://wiki.openstreetmap.org/wiki/Relation:multipolygon#Usage

So now I tend to think that the ways should not be tagged at all.

dimka

I’ve incorporated English names into the script (thanks talkat!), here is an example (scroll to the bottom to see the 5 new forests):
http://api06.dev.openstreetmap.org/browse/changeset/11062
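One possible way for an import script to pick up the English names from the shared spreadsheet, sketched under assumptions: the sheet is exported to CSV, and its columns are called FOR_NO and NAME_EN (both hypothetical). Keying on kkl:forest_num means the same table can be reused for future updates.

```python
import csv
import io


def load_english_names(csv_text: str) -> dict:
    """Map kkl:forest_num -> English name from a CSV export of the sheet.

    The column names FOR_NO and NAME_EN are assumptions about the layout.
    """
    names = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        num = (row.get("FOR_NO") or "").strip()
        en = (row.get("NAME_EN") or "").strip()
        if num and en:  # skip rows nobody has translated yet
            names[num] = en
    return names


def add_english_name(tags: dict, names: dict) -> dict:
    """Return a copy of a forest's tags with name:en filled in, if known."""
    out = dict(tags)
    en = names.get(tags.get("kkl:forest_num", ""))
    if en:
        out["name:en"] = en
    return out
```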

If everybody agrees that the tagging is OK, I will proceed with the full import on the real database.

dimka

Tagging looks ok.

Some ways aren’t closed, although they should all be closed loops without self-intersections.
Is this intentional?

Here are 2 of them:
http://api06.dev.openstreetmap.org/browse/way/289828
http://api06.dev.openstreetmap.org/browse/way/289829

talkat.

Yes, this is necessary because in OSM (more precisely, API 0.6) each way can have at most 2000 nodes.
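Splitting a long ring to respect the limit can be sketched as below (the 2000-node figure is from the post above; the function name is illustrative). Consecutive pieces share their boundary node, so the pieces still form one closed ring when joined, and the closed-ring requirement moves from the individual ways to the relation.

```python
MAX_WAY_NODES = 2000  # API 0.6 per-way node limit


def split_way(nodes: list, limit: int = MAX_WAY_NODES) -> list:
    """Split a node list into consecutive ways of at most `limit` nodes.

    Neighbouring pieces share their boundary node, so they still join
    end to end (and a closed ring stays closed as a whole).
    """
    if limit < 2:
        raise ValueError("limit must be at least 2")
    pieces = []
    start = 0
    while start < len(nodes) - 1:
        end = min(start + limit, len(nodes))
        pieces.append(nodes[start:end])
        start = end - 1  # next piece starts at this piece's last node
    return pieces or [list(nodes)]
```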

I know of this limitation. (I split a few ways in the past that were created before this limitation was imposed… Last time I checked, the longest way had around 800 nodes.)

How many ways have more than 2000 nodes?
Can they be simplified somehow to have fewer than 2000 nodes?
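One way to test the simplification question is the Ramer-Douglas-Peucker algorithm, sketched below with an explicit stack instead of recursion. It assumes planar coordinates in metres (e.g. after projecting), and the tolerance is a judgment call: since KKL states an accuracy of about 2m, a tolerance well below that would presumably lose little real information. Distances are measured to the segment rather than the infinite line, which also handles a closed ring whose first and last points coincide.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Keep only the points needed to stay within `tolerance` of the input."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        best_i, best_d = None, tolerance
        for i in range(first + 1, last):
            d = _point_segment_distance(points[i], points[first], points[last])
            if d > best_d:
                best_i, best_d = i, d
        if best_i is not None:
            # The farthest point must stay; refine both halves around it.
            keep[best_i] = True
            stack.append((first, best_i))
            stack.append((best_i, last))
    return [p for p, k in zip(points, keep) if k]
```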

If not, another option is to cut one big loop into two smaller closed loops that share nodes,
but this would have to be done manually.

talkat.