Best practice for sub-tag key names – A case for :type

I am currently working on a proposal for a new sub-tag specifying the type of an object and was wondering about the best approach for key naming.

A short excursion into OSM sub-tags for types
Historically there was a tendency in OSM to use a generic type=* tag to specify the type of things ranging from pipelines (nowadays using substance=*) through trees to aerodromes. Nowadays this tag is mostly used to denote the type of a relation.
Then there’s the typical OSM “tagging chain”, e.g. highway=service + service=parking_aisle, amenity=parking + parking=multi-storey or generically foo=bar + bar=baz + baz=main, with sub-tags picking up the value of their parent tag.

The tagging scheme I like most has evolved around power infrastructure, particularly generators, where a power=generator is specified by a set of generator:source, generator:method and generator:type values. The beauty here is that through the colon pattern the sub-tags indicate to human readers that they require a parent value of generator (as in power=generator) to be present. This should decrease the likelihood of these tags being used alone (something I’ve seen happening with the parking=* sub-tags quite a few times). Quite a few new type sub-tags have lately been going in that direction, e.g. aerodrome:type and fire_station:type.
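A minimal sketch of how a tool could exploit that convention to spot sub-tags used alone. It assumes the rule "the part of the key before the first colon must appear as the value of some tag on the same object"; that rule and the function name orphaned_subtags are illustrative assumptions, not an existing validator, and keys such as addr:street that do not follow the convention would be flagged as well.

```python
def orphaned_subtags(tags: dict[str, str]) -> list[str]:
    """Return '<parent>:<suffix>' keys whose <parent> is not the value of any tag."""
    present_values = set(tags.values())
    orphans = []
    for key in tags:
        parent, sep, _suffix = key.partition(":")
        # Keys without a colon are not sub-tags under this (assumed) convention.
        if sep and parent not in present_values:
            orphans.append(key)
    return orphans


complete = {"power": "generator", "generator:source": "wind"}
stray = {"generator:source": "wind"}  # parent tag power=generator is missing

print(orphaned_subtags(complete))  # []
print(orphaned_subtags(stray))     # ['generator:source']
```

With a plain chain key such as parking=multi-storey, the key itself gives no hint about which parent it needs, so the same check would require a hand-maintained lookup table.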

“OSM tag chain” versus “bar:type” scheme
Now, back to my original challenge: I am wondering whether my proposal should go for the “OSM tag chain” model (foo=bar + bar=baz) or the “bar:type” approach (foo=bar + bar:type=baz).

Curious to learn from you what your preference would be and if there are any arguments for one or the other I haven’t mentioned.

But if some tag values contain localizable strings, how can the “x:type” approach be applied to them?

I have no established preference for one or the other. Your argument about making the requirement for a parent tag more obvious through use of :type is a good one, though. While I haven’t really thought about it from that perspective before, there are indeed examples of keys (e.g. information=*) where this might have avoided mistagging.

Overall, however, I feel that one should stop and think before using either the tag chain or a generic “:type”: Both make the assumption that there’s a single way (or at least a main way) of categorizing this kind of feature. But often there are different orthogonal category hierarchies that should be available at the same time. Using a generic key makes it easy to mix aspects that ought to be separated (such as crossing mixing the presence of islands and the controlled/uncontrolled/traffic_light distinction so that you cannot easily map crossings that have both traffic lights and islands; or fence_type mixing material and shape of the fence – are metal railings fence_type=metal or fence_type=railing?) If possible, I think it’s best to make the criteria for subdivision explicit, as keys like generator:source or bridge:structure attempt to.
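A small sketch of the crossing problem described above. The values "traffic_signals" and "island" and the keys crossing:signals and crossing:island are used here purely as illustrative assumptions to show the difference between one generic key and explicit criteria.

```python
# One generic key: a single value slot has to carry two independent facts,
# so only one of them can be recorded.
generic = {"highway": "crossing", "crossing": "traffic_signals"}

# Explicit criteria: each independent aspect gets its own (hypothetical) key.
explicit = {"highway": "crossing", "crossing:signals": "yes", "crossing:island": "yes"}


def aspects_generic(tags: dict[str, str]) -> dict[str, bool]:
    """Read both aspects from the single generic key; at most one can be True."""
    value = tags.get("crossing")
    return {"signals": value == "traffic_signals", "island": value == "island"}


def aspects_explicit(tags: dict[str, str]) -> dict[str, bool]:
    """Read each aspect from its own key; they vary independently."""
    return {
        "signals": tags.get("crossing:signals") == "yes",
        "island": tags.get("crossing:island") == "yes",
    }


print(aspects_generic(generic))    # {'signals': True, 'island': False}
print(aspects_explicit(explicit))  # {'signals': True, 'island': True}
```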

Subtyping is usually done for an element’s “main” tag, the one that tells you which kind of object it is (e.g. amenity=*, shop=*, man_made=*), rather than keys with localizable values (such as name or description). So luckily, that tends to not be an issue.

I strongly favor the namespace (“bar:type”) scheme.
Take for example the current voting for valves. Here one new top-level key (valve) is introduced to refine the top-level key pipeline. All examples and text only refer to pipeline. So it’s questionable whether a valve could be used in any other context. Unless this question is resolved, the item (here valve) should be clearly marked as pipeline:valve.
In this proposal, it gets worse. “turn_to_close” is proposed as a top-level key, but in the descriptions it is only explained as an attribute of a valve. I would prefer to have a separate proposal for this key. Then the authors would have trouble explaining why “turn_to_close=” is on the same level as “highway=”. Using pipeline:valve:turn_to_close avoids all these questions.
A ‘good’ example of this is the railway tagging schema. “railway:signal:main:states=DE-ESO:hp0;DE-ESO:ks1” can be read (ok, by a model railroader), but more importantly the top-level namespace is not cluttered. Unless you are mapping railways, you will never be bothered with this tagging.
And I see the similarity with pipeline mapping. All keys in the proposal are for a very specific topic and this should be reflected in the naming.

Thank you, Claudius, for opening this really interesting thread, and thanks to the others who are contributing.

Unhappily I’m on the verge of leaving for a short journey and I’m really unsure if I will be able to post any further until about January 6: my apologies in advance.

I’m strongly opinionated about this matter: I’m totally pro a namespaced tagging scheme for OSM, and my reasons are exactly those that you all have already expressed in your posts, to which I would like to add something more:

1) Special interest groups. - Special interest groups could have their private spaces (like mailing lists or the like) where they discuss their private interests and perfect, approve and adopt whatever object types and attributes they deem necessary to fulfill their needs in their private namespace, without any risk of “colliding” with what other interest groups are doing in other fields.

I think the community can benefit greatly from this approach: dedicated interest groups could be much more palatable to individuals interested in mapping particular infrastructures.

2) Technical optimizations - It is at least conceivable that in the future (probably a very far one) particular “namespaces” can be organized with ad-hoc data structures in ad hoc tables.

Think about the valves: once you know the maker/model number of a valve it would be much more efficient to store its characteristics in relations (in the database meaning, not the OSM one): once you know that valve XYZ is a globe valve of a certain diameter, with a certain handle, that must be turned in a particular direction to close, etc., it would make much more sense to store that information once for all XYZ valves instead of duplicating it for every XYZ valve node. The same goes for every other similar object which typically can and does recur in infrastructures.

Having that information namespaced now will surely be valuable in the future for migrating it to ad-hoc data structures.
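The valve idea above can be sketched roughly as follows. Everything here is hypothetical: the key valve:model, the catalogue contents and the field names are assumptions chosen for illustration, not part of any proposal or of the OSM data model.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ValveModel:
    """Characteristics shared by every valve of one maker/model (hypothetical fields)."""
    kind: str
    diameter_mm: int
    handle: str
    turn_to_close: str


# Stored once per model instead of once per node.
CATALOGUE = {"XYZ": ValveModel("globe", 50, "wheel", "clockwise")}

# Each node only carries a reference to the model.
nodes = [
    {"id": 1, "valve:model": "XYZ"},
    {"id": 2, "valve:model": "XYZ"},
]


def expand(node: dict) -> dict:
    """Join a node with the shared characteristics of its model."""
    return {**node, **asdict(CATALOGUE[node["valve:model"]])}


print(expand(nodes[1])["turn_to_close"])  # clockwise
```

Correcting a characteristic of model XYZ then means changing one catalogue entry rather than every tagged node.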

But yes, I understand that as things stand and as far as I know, there would be no technical benefit in adopting a namespaced (chained) tagging scheme, as tags are stored in the OSM database as key/value pairs in a PostgreSQL hstore. The chained, namespaced tags will just be the “k” values in those tables and there will be no architectural benefit in having them structured in a hierarchical form.

But there will be no harm either…

3) Quality assurance. - I’m quite sure having namespaced objects/attributes can be invaluable to automated data quality assurance tools/projects.

As a last thing, let me answer to the objection that this line of thinking assumes “that there’s a single way (or at least a main way) of categorizing this kind of feature”.

I agree that oftentimes this is a false assumption, but we have a problem only if we categorically cannot tag a single feature (node, way or relation) with two (or more) different sets of namespaced information: if we instead assume that a single feature can be tagged “as seen” from different perspectives (interests) in different namespaces, we open the way to many better ways of organizing knowledge.

Regards,

Sergio

Sorry guys, still a few hours to go…

There have been contributions to the topic in the mailing list and I don’t know how to manage the situation: do we continue here?

Cheers,

Sergio

… what is concerning me is that colons are now being used to define keys but not in a correctly hierarchical way. I’m afraid that could jeopardize any attempt to correctly use namespaces…

Copying the replies from the OSM Tagging Mailing list:

Richard said:
the OSM tag chain should be imho used only for very common things because each member
of the chain will turn up as a “top level” tag in the database and taginfo.
If used extensively for attributes I would consider it pollution of the database.
It is also much less flexible as you can specify only one attribute at a time.

Then François replied:

the OSM tag chain should be imho used only for very common things because each member
of the chain will turn up as a “top level” tag in the database and taginfo.

We are using such chains in the Power, Pipeline and Telecom groups. It works well:
power=transformer + transformer=distribution + voltage:primary=20000 + voltage:secondary=400
man_made=street_cabinet + street_cabinet=telecom + telecom=exchange + telecom:medium=copper + operator=Orange

Adding power: and telecom: prefixes would seriously discourage contribution and would be extremely redundant.

Furthermore, refining well-used tags often gets discouraged because of their existing usage.
And that is without counting the redundancy in namespace prefixes, which is worse.

If used extensively for attributes I would consider it pollution of the database.
It is also much less flexible as you can specify only one attribute at a time.

If you have to define more than one attribute with the same name, it may be that the attribute isn’t well defined.

Do you have examples, please?

To which Sergio responded:
“Transformers” is a perfect example of “namespacing done backward”. Why “voltage:secondary=220”? In a correctly namespaced world it would be “secondary:voltage=220”.

I understand that in spoken English you can say "the voltage of the secondary is 220 Volt", and that’s probably why those keys have been built with the terms in that particular order. (BTW, logic and wording are very different in different cultures and languages. I think it wouldn’t have been in that order in, say, German: can a German speaker please confirm that?)

Transformers can have, and very often do have, more than one secondary: you have dealt with that using things like "voltage:tertiary=" and the like (windings:tertiary=, I suppose…). And what if the transformer has 3 secondaries? Or 4?

Isn’t “secondary:1:voltage=220” better? Don’t you see that it’s more logical and expandable? Don’t you see that here we assign a quantity (220) to something that has the correct dimensions (voltage), as in the previously globally defined key "voltage="? Don’t you see how with that syntax everything related to the first (second, third, fourth, … nth) secondary (windings, current, whatever…) would be grouped under "secondary:n:="?
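To illustrate the grouping argument, here is a minimal sketch that folds colon-separated keys into a nested structure. The function name nest and the second secondary with 110 V are assumptions made up for the example; the sketch also assumes no key is both a leaf and a prefix of another key (e.g. secondary=* alongside secondary:1:voltage=*), which it does not handle.

```python
def nest(tags: dict[str, str]) -> dict:
    """Fold keys like 'secondary:1:voltage' into nested dictionaries."""
    tree: dict = {}
    for key, value in tags.items():
        *path, leaf = key.split(":")
        node = tree
        for part in path:
            # Walk down, creating one level per namespace segment.
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


tags = {
    "power": "transformer",
    "secondary:1:voltage": "220",
    "secondary:2:voltage": "110",  # illustrative second secondary
}

print(nest(tags)["secondary"])
# {'1': {'voltage': '220'}, '2': {'voltage': '110'}}
```

With voltage:secondary and voltage:tertiary, the same fold groups everything by quantity (voltage) rather than by winding, which is the ordering difference under discussion.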

And if transformers weren’t meant to be a “namespaced thing”, why use the colons? Why not voltage_secondary=?

Don’t you see that with the transformers a new first-level keyword, "rating=", has been implicitly defined and documented on the transformers page, and how that keyword can be useful in other contexts… or namespaces, if you prefer?

BTW, what is that telecom:medium=copper thing (https://wiki.openstreetmap.org/wiki/Key:telecom:medium)? “Telecoms” do not have a medium: local loops have. Is that meant to be a namespaced thing? Has this been debated/approved? I have seen it applied to buildings: what is the meaning of that?

Adding power: and telecom: prefixes would seriously discourage contribution and would be extremely redundant.

To the contrary! Please read in the forum my rationale explaining exactly how that would be beneficial…

Reading through these comments I think we have to clarify terminology for tagging schemes:

  a. OSM tag chain: foo=bar + bar=baz

  b. Namespaced subtags: foo=bar + foo:bar=baz + foo:bar:colour=red

  c. Bastardized OSM namespacing: foo=bar + bar:type=baz + bar:colour=red

Let me know if I correctly laid out all the alternatives.

My understanding is that Sergio favours the true namespace model b, while François and I favour c, mainly for reasons of readability and typing economy.

Maybe it would make sense to try applying it to my concrete example of an aviation infrastructure tagging scheme. There are two sub-tags I would propose and I’m curious how you would name them. My proposal currently leans towards model c:
(NB: I am currently not planning to have other sub-tags besides these “type” sub-tags. This information might be relevant for the tagging method you’d suggest.)

aeroway=aerodrome + aerodrome:type=international

aeroway=holding_position + holding_position:type=intermediate

Am I correct to assume that you, Sergio, would suggest tagging them as follows, under model b?

aeroway=aerodrome + aeroway:aerodrome:type=international

aeroway=holding_position + aeroway:holding_position:type=intermediate

I find those keys simply very long. I can see the value of the argument that they reveal all information about their tagging chain from the first level down, but I’m not sure it’s worth the burden it puts on staying readable to humans.

For completeness, here’s model a, the OSM tag chain:
aeroway=aerodrome + aerodrome=international

aeroway=holding_position + holding_position=intermediate
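For a side-by-side comparison, here is a small sketch that generates the tagging for the three models from the same inputs. The helper name subtag is an assumption for illustration; for model b it follows the aeroway:aerodrome:type form used in the examples above.

```python
def subtag(model: str, key: str, value: str, subvalue: str) -> dict[str, str]:
    """Build the tag set for one feature under model 'a', 'b' or 'c'."""
    if model == "a":      # OSM tag chain: foo=bar + bar=baz
        sub_key = value
    elif model == "b":    # full namespace: foo=bar + foo:bar:type=baz
        sub_key = f"{key}:{value}:type"
    elif model == "c":    # short namespace: foo=bar + bar:type=baz
        sub_key = f"{value}:type"
    else:
        raise ValueError(f"unknown model: {model!r}")
    return {key: value, sub_key: subvalue}


for model in "abc":
    tags = subtag(model, "aeroway", "holding_position", "intermediate")
    sub_key = list(tags)[1]
    print(model, sub_key, len(sub_key))
# a holding_position 16
# b aeroway:holding_position:type 29
# c holding_position:type 21
```

The printed lengths make the typing cost of each model concrete for this one key; they say nothing about which model is easier to validate or document.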

Maybe it would be helpful if a separate proposal were written for each new top-level key. Then everyone could easily check whether the key can be used in a different domain.

For the above example, holding_position would go into a proposal. Can it be used somewhere else? If not, it’s better to move it to a namespace.

The same goes for the https://wiki.openstreetmap.org/wiki/Proposed_features/Pipeline_valves_proposal proposal currently in voting. Put the new top-level keys valve, sensor, actuator, handle and turn_to_close into separate voting pages. Then we can check every item for global usefulness.

Hi all,

I’m François from @tagging ML.

It depends on the logic expressed in the tagging.
Regarding voltage:primary, voltage:secondary: voltage is the base concept and it is extended with primary and additional transformer interfaces. I agree this is not the logic Sergio would have used, but it’s not a wrong one anyway.
Voltage was previously documented, and secondary and primary are available on some power equipment (not only transformers, but some converters also).
It was relevant to use a namespace so as to take advantage of the existing and suitable voltage=* definitions. According to you, should I have introduced power:transformer:secondary:voltage and power:converter:secondary:voltage?

Because voltage:primary isn’t so different from voltage=*, only referring to a given function of a particular kind of power device (not only transformers, again).

I agree this is not a perfect solution.
Initially, telecom:medium was proposed to be medium=*, like material=*, but we agreed on telecom:medium=* as we deliberately wanted to have our very own key.
That’s not restricted to local loops; this key is useful for long-distance telecom lines also.

JOSM maintainers were asked to give their point of view regarding advantages of namespaces to build presets: they don’t need them since they have the context definition
https://josm.openstreetmap.de/wiki/TaggingPresets#Attributes

I see three big issues with namespaces as you want to use them:

  • Very long key names in which most of the text is redundant with keys previously used on the same feature. It is a joke to expect anyone to type pipeline:valve:actuator instead of actuator.
  • Limited re-usability of common concepts. If we see the isolation of special interest groups as an advantage, then we can directly split OSM into several projects. The main advantage of sharing keys is to reuse what others took time to define and document.
    As said on the ML, I’m happy to use location=* instead of any more particular key when I need it. Fire hydrant people did define fire_hydrant:position instead; what kind of benefit is this?
  • The :type suffix doesn’t bring any additional information. There is no additional advantage in using power:substation:type instead of power:substation (actually we use the simpler substation=* and share it between power and pipelines; this is not restricted to a single special interest group).

As explained on the valve proposal Talk page:

  • actuator can be used on movable highway bollards
  • handle can be used on doors, especially wheel ones.

All the best