Extraction of the ISO3166-1 capital cities

Hello everyone!

We’re trying to make a world political map from the raw OSM data. We have successfully and quickly extracted the administrative boundaries of all ISO3166-1 territories (with osmfilter and ogr2ogr), but there’s a problem with the extraction of their capital cities.

Although the capital city is often stored in the country relation as “admin_centre”, it’s hard to extract because of the large amount of data, which enormously slows down the whole map-making process (osm2geojson can do it, but it would take a few days or even weeks). Other tags that could potentially satisfy our needs (for example, “admin_level”=“2” and “capital”=“yes”) are too heterogeneous, because the “admin_level” values of ISO3166-1 territories range from 2 to 6, and the tag “capital”=“yes” is sometimes added at all “admin_levels”.
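For illustration, here is a rough sketch of how the “admin_centre” lookup could be done in two cheap passes instead of building full geometries. It assumes the relations and nodes have already been parsed into plain Python dicts (the dict layout and the function names are made up for this example, not any library’s format). Two passes are used because OSM files normally list nodes before relations, so the node ids of interest are only known after the relations have been read.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory shapes (assumptions for this sketch):
#   relation: {"id": int, "tags": {...}, "members": [{"type", "ref", "role"}, ...]}
#   node:     {"id": int, "tags": {...}, "lat": float, "lon": float}

def admin_centre_refs(relations: Iterable[dict]) -> Dict[int, str]:
    """Pass 1: map each admin_centre node id to the ISO3166-1 code of its relation."""
    refs: Dict[int, str] = {}
    for rel in relations:
        code = rel["tags"].get("ISO3166-1")
        if code is None:
            continue  # relation carries no ISO3166-1 code, so it is ignored
        for member in rel["members"]:
            if member["type"] == "node" and member["role"] == "admin_centre":
                refs[member["ref"]] = code
    return refs

def capitals(nodes: Iterable[dict], refs: Dict[int, str]) -> List[Tuple[str, str, float, float]]:
    """Pass 2: keep only the nodes that pass 1 found as admin_centre members."""
    found = []
    for node in nodes:
        code = refs.get(node["id"])
        if code is not None:
            found.append((code, node["tags"].get("name", ""), node["lat"], node["lon"]))
    return found
```

Only the member lists are read, never the boundary ways, which is why this should be much lighter than converting whole relations. It still depends on the relations actually having an “admin_centre” member and an ISO3166-1 tag.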

We don’t want to be dependent on existing services because they can crash or be hacked. Some of them aren’t free, and others limit the amount of data that can be extracted (Overpass). Additionally, it’s sometimes not clear how they have produced the data they offer (geometrical and topological operations, generalization etc.).

We think that the easiest and the fastest option would be to introduce a key “capital_ISO3166-1” directly on the nodes which represent only the capitals of the ISO3166-1 territories.

How would you extract the capital cities of the ISO3166-1 territories from the raw OSM data and distinguish them from the other big cities which should also appear on a world political map (e.g. New York, Shanghai, Sao Paulo etc.)?

Thanks in advance for your opinion!

To be honest, I don’t know the exact state of current country relations and admin_centres in OSM, but the chance of such relations being broken is probably much higher than for the simple node data representing capitals (which are likely largely part of such relations anyway).

I am not sure what you are actually trying to achieve. On the one hand you are talking about ISO3166-1 territories, which, according to this Wiki page (https://en.wikipedia.org/wiki/ISO_3166-1), seem to indicate almost exclusively “country level” data; on the other hand you start referring to capitals at admin_levels below level 2 (that is, with numerically higher values), which are almost certainly not “country level” capitals, see the boundary=administrative page (http://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative).

I see few problems with a simple query like this:

place IN ('city','town') AND (capital IN ('yes') AND admin_level IN ('2'))

to extract the main capital cities of most countries from a table containing node data.
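The same filter can be written as a small Python predicate for anyone working on parsed tags rather than a database table. This is only a sketch: the function name is made up, and it assumes each node’s tags are available as a plain dict of strings.

```python
def is_country_capital(tags: dict) -> bool:
    """Mirror of the query: a city/town node tagged capital=yes and admin_level=2."""
    return (
        tags.get("place") in ("city", "town")
        and tags.get("capital") == "yes"
        and tags.get("admin_level") == "2"  # tag values are strings in OSM
    )
```

For example, `is_country_capital({"place": "city", "capital": "yes", "admin_level": "2"})` is true, while a node with admin_level=4 or without a capital tag is rejected.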

Thank you for the answer.

We have decided to show all ISO3166-1 territories and all of their administrative centers on the map.

According to the Wiki page: “ISO 3166-1 is part of the ISO 3166 standard published by the International Organization for Standardization (ISO), and defines codes for the names of countries, dependent territories, and special areas of geographical interest.”

In the OSM dataset, most of the countries have admin_level=2 and boundary=administrative, but not all of them. It would be good to correct this for the countries that are missing these tags. On the other hand, some dependent territories (Greenland) are tagged with admin_level=2 and boundary=administrative, though they shouldn’t be (if this tag is reserved only for independent countries). As I wrote, admin_levels range from 2 to 6 for ISO3166-1 territories, which isn’t good, because we had to dig through this heterogeneous data. The confusion with admin_centres is even worse, so they can’t be clearly connected with their territories. That’s why I’ve proposed a tag for the administrative centres of ISO3166-1 territories, no matter which admin_level they belong to, so that they can be simply identified.

That query extracts the capitals of most countries, but not of all of them, nor the capitals of the ISO3166-1 dependent territories.

With this approach, where it is enough for most of the data to be correct rather than all of it, OSM will never be useful and uniform at the global level.

Wikipedia says that Greenland is an autonomous country (https://en.wikipedia.org/wiki/Greenland), so level 2 is OK.
The “inverse” case is also true. England is a country and has no admin_level=2. The level 2 boundary is that of the UK. That is also OK.

Could you list which ISO3166 territories have admin_level<>2?

There are special cases in which countries and territories aren’t what most people expect (I remember France, Hong Kong, Macau; maybe there are more).
It seems there are slightly different meanings of “country”. I think this should be clarified: both the details of your requirements and the details of how the territories should be mapped in OSM are very important, so that we eliminate the risk described in your last sentence.

The first thing that is worth mentioning at this stage is that “everyone” is not here…

The “Questions and Answers” section of this forum is pretty low volume. If you’re suggesting tagging changes to countries I’d do it somewhere that people in those countries are likely to see. The main “talk” list https://lists.openstreetmap.org/listinfo/talk is probably the best place, and I’d also raise it in the “Internationale Admingrenzen 2016” section of the German forum http://forum.openstreetmap.org/viewtopic.php?id=53173 which is where a number of people who keep an eye on such things are to be found.

I also wouldn’t trust a single Wikipedia language edition as a source directly. It can be a helpful summary, but you’ll need to understand (a) who wrote what and what the original source was and (b) what that Wikipedia’s definition of a “country” is and how it might differ from OSM’s.

Fundamentally there is not just one world political map - countries have disputes, and some countries think that other countries belong to them, even though the people living there disagree. Wikipedia’s answer to this problem is to have different answers by language - compare the map on the Serbia wiki’s article on Serbia https://sr.wikipedia.org/wiki/%D0%A1%D1%80%D0%B1%D0%B8%D1%98%D0%B0 , the Albanian one https://sq.wikipedia.org/wiki/Serbia and the English one https://en.wikipedia.org/wiki/Serbia for example. The OSMF has very few policy documents, but it has one on this: http://wiki.osmfoundation.org/w/images/d/d8/DisputedTerritoriesInformation.pdf You’ll have to consider the target market for your political map and how you choose to represent those disputes on it.

Full disclosure - I’m a member of OSM’s Data Working Group, and we’re the people who often mediate in disputes like the example above.

The problem isn’t in the definition of a country, nor in the map (what was my question?). A country is just one type of political-territorial entity, along with dependent territories and special areas of geographical interest. ISO and the UN are the most relevant organizations for clarifying the international status of all main territorial entities. Of course, when we’re talking about politics, there’ll never be consensus. But whom would you trust more than the UN and ISO? OpenStreetMap obviously shares my opinion (http://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative#National). ISO3166-1 codes are reliable and logical identifiers of these areas. One of the reasons why I want to show all ISO territories is that it’s easier than connecting dependent territories to their parent countries.

I’ve already made some changes, but some of the minor territories that have an ISO code are disconnected, so they’re duplicated: Caribisch Nederland (admin_level=4) consists of Bonaire, Saba and Sint Eustatius (admin_level=8), and they are separate objects in OSM. Guadeloupe is entered 3 times, each time with a different admin_level. The problem with these duplicates is that their boundaries are sometimes at sea and sometimes on the coastline. Monaco has 2 objects, the first with boundary=administrative and the second with boundary=land_area. Martinique, Reunion, Sint Maarten, Wallis and Futuna, and Mayotte are also problematic. In my opinion, these territories don’t have to have admin_level=2. The bigger problem is that they’re sometimes duplicated.
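A quick way to list such duplicates, sketched in Python, is to group the extracted relations by their ISO3166-1 code and report every code that appears more than once. The dict layout and the function name are assumptions for this example, and the tag key used here is just one of the ISO keys found in the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

def duplicated_codes(relations: Iterable[dict]) -> Dict[str, List[Tuple[int, Optional[str]]]]:
    """Return ISO3166-1 codes carried by more than one relation.

    Each value lists (relation id, admin_level) so the differing levels are visible.
    """
    by_code: Dict[str, List[Tuple[int, Optional[str]]]] = defaultdict(list)
    for rel in relations:
        code = rel["tags"].get("ISO3166-1")
        if code:
            by_code[code].append((rel["id"], rel["tags"].get("admin_level")))
    # Keep only the codes that occur on two or more relations.
    return {code: items for code, items in by_code.items() if len(items) > 1}
```

The output only shows where to look. Whether two relations with the same code are a mapping error or a deliberate land/sea distinction still has to be checked by hand.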

So, I’ve decided to show all territories that have ISO3166-1 codes on my map. I’m not saying it’s the best solution, but it’s maybe the simplest. Someone else would do it differently, and that’s their right.

By the way, we wrote our own algorithm that extracts admin_centres based on osm_id. I can revert the changes I’ve made to the nodes representing cities (adding the tag capital_ISO3166-1), if necessary.

Not really. OSM follows the “on the ground rule” - see https://wiki.osmfoundation.org/w/images/d/d8/DisputedTerritoriesInformation.pdf for more details.

ISO3166-1 codes are indeed reliable identifiers of areas to which some bureaucrat has handed out a code. If that’s what you want to measure; great - but don’t expect other OSM tags to follow that when it comes to “edge cases” - see for example https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Reserved_code_elements and https://en.wikipedia.org/wiki/ISO_3166-2:EH .

IMHO, Caribisch Nederland http://www.openstreetmap.org/relation/1216720 is OK with admin_level=4, and its islands are mapped as outer members of Nederland ( http://www.openstreetmap.org/relation/47796 ), which is OK.

As you stated, not every ISO 3166-1 code should have a relation with admin_level=2 in OSM.

Anyway, there are several keys for the ISO3166-1 code, which seem redundant and aren’t consistently used:
country_code_iso3166_1_alpha_2
ISO3166-1
ISO3166-1:alpha2
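Until the tagging is unified, a data consumer could read the code from whichever of these keys is present. A minimal sketch in Python, where the function name and the order of preference are my own assumptions:

```python
from typing import Optional

# The three keys listed above, in an arbitrary order of preference (assumption).
ISO_KEYS = ("ISO3166-1", "ISO3166-1:alpha2", "country_code_iso3166_1_alpha_2")

def iso_alpha2(tags: dict) -> Optional[str]:
    """Return the first well-formed two-letter code found under any known key."""
    for key in ISO_KEYS:
        value = tags.get(key, "").strip().upper()
        # Accept only two ASCII letters; anything else (e.g. "FR-GP") is skipped.
        if len(value) == 2 and value.isascii() and value.isalpha():
            return value
    return None
```

This only works around the inconsistency on the reading side. It doesn’t answer which key should be the canonical one.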

In OSM the territories are represented as a relation (which represents the boundaries) and also as a node (with key “place=*” and with role “label” in the relation). Both OSM objects represent the same real-world entity. Should an ISO3166-1 related key be on the node, on the relation, or on both?

The ISO3166-2 code is not mapped consistently either. There are OSM objects named Guadeloupe that do not all represent the same real-world entity; only one of them should have the ISO 3166-1 alpha2 code (GP), and the others should have only the ISO 3166-2 code (FR-GP).

It seems we should first clarify which OSM entities should carry which ISO3166-related keys and what those keys mean in the context of OSM, and then check whether the OSM data is right or needs to be corrected.

I’m not arguing about your solution; it seems that there are bigger problems that need to be solved first. If the data were OK you should have no problem extracting what you require, as it would be obvious from well-mapped data, so we should try to get all this fixed.

An oldie, but goldie: https://www.youtube.com/watch?v=4AivEQmfPpk

And the UN and ISO definitely do -not- define what a country is.