Optimizing this Overpass query

I am still learning the Overpass API and wonder whether this query can be optimized in any way; the Wiki doesn't seem to have much information about performance. Although this particular query runs fast, I have several variations with a similar structure that take much longer.

Specific questions:

  • I am duplicating the same filters again and again. Should I avoid that, and how?

  • I want the elements not last edited by SwiftFast or SwiftFast_bot. Can this be negated with a unary operator rather than requiring a difference block statement?

  • Does the order of the filters matter?

      • Should I place the filters that weed out the most elements first?

      • Are there “heavy” filters that are best placed last, so that as few elements as possible reach them and are evaluated by them?

  • Does using a difference block mean the second union has to be buffered in memory, so that the query will use far more memory than a single union?

Here is the query:



[out:xml][timeout:500][bbox:{{bbox}}];
(
	area(3601473946);
	area(3603791785);
)->.a;

((
  node["amenity"="bank"](area.a)(changed:"2017-05-26T07:00:00Z");
  way["amenity"="bank"](area.a)(changed:"2017-05-26T07:00:00Z");
  relation["amenity"="bank"](area.a)(changed:"2017-05-26T07:00:00Z");
  node["amenity"="fuel"](area.a)(changed:"2017-05-26T07:00:00Z");
  way["amenity"="fuel"](area.a)(changed:"2017-05-26T07:00:00Z");
  relation["amenity"="fuel"](area.a)(changed:"2017-05-26T07:00:00Z");
);  - (
  node["amenity"="bank"](area.a)(user:"SwiftFast_bot","SwiftFast")(changed:"2017-05-26T07:00:00Z");
  way["amenity"="bank"](area.a)(user:"SwiftFast_bot","SwiftFast")(changed:"2017-05-26T07:00:00Z");
  relation["amenity"="bank"](area.a)(user:"SwiftFast_bot","SwiftFast")(changed:"2017-05-26T07:00:00Z");
  node["amenity"="fuel"](area.a)(user:"SwiftFast_bot","SwiftFast")(changed:"2017-05-26T07:00:00Z");
  way["amenity"="fuel"](area.a)(user:"SwiftFast_bot","SwiftFast")(changed:"2017-05-26T07:00:00Z");
  relation["amenity"="fuel"](area.a)(user:"SwiftFast_bot","SwiftFast")(changed:"2017-05-26T07:00:00Z");
));

out meta;

Currently, you cannot express this any other way. However, a new user/uid-based filter for (if: …) is in development but not yet released. It would allow you to simplify the query as follows (I left out the areas and only check one tag):


(
  node["amenity"="fuel"](if:user() != "SwiftFast_bot" && user() != "SwiftFast")(changed:"2017-05-26T07:00:00Z");
  way["amenity"="fuel"](if:user() != "SwiftFast_bot" && user() != "SwiftFast")(changed:"2017-05-26T07:00:00Z");
  relation["amenity"="fuel"](if:user() != "SwiftFast_bot" && user() != "SwiftFast")(changed:"2017-05-26T07:00:00Z");
);
out meta;
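Going one step further, the repeated tag filters can be folded together as well. The following is a sketch, assuming a server version that already ships the user() evaluator mentioned above and the nwr shorthand (node, way and relation in one statement); it restores the two areas from the original query and matches both tags with a single regular expression.

```
[out:xml][timeout:500][bbox:{{bbox}}];
(
  area(3601473946);
  area(3603791785);
)->.a;

// nwr queries nodes, ways and relations in one statement;
// the anchored regex matches exactly "bank" or "fuel"
nwr["amenity"~"^(bank|fuel)$"](area.a)
   (changed:"2017-05-26T07:00:00Z")
   (if:user() != "SwiftFast_bot" && user() != "SwiftFast");
out meta;
```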

It would be best to provide some actual queries; otherwise it is impossible to give you further hints.

My main interest is theoretical. I would like to understand how the optimizations work and answer the questions above, so that I can perhaps write something about performance in the Wiki. This query only serves as a discussion example.

Thanks!

That covers it all except the filter order question.

That’s quite a complex topic, due to the large number of possible combinations, data-wise and to some extent also configuration-wise.

Your observations on overpass-api.de might be heavily skewed by the current CPU load, and there are still quite a few pending performance improvements on GitHub. We run this kind of analysis on dedicated servers, where no other traffic disturbs the measurements.

So the best approach would be to post the queries you found expensive, or even to open a GitHub issue for them.

No doubt it’s complex, yet some specifics should be clear-cut; e.g., either the order of the filters matters or it does not. These questions are not documented as of now, hence my post.

In my original query, I wonder whether the second union is simply buffered in memory so that it can be compared with the first. If so, that query could use much more memory than your (if: ...) variation.

The query evaluation is described in more detail in one of the presentations given by Roland; I’d have to check which one it was. IIRC it is based on 9 different stages, with cheap constraints being checked first. Within a single query statement, reordering the filters is of limited use. However, splitting a query statement into two separate ones can sometimes be used to influence the evaluation order; there are some examples of this approach on help.openstreetmap.org. Also, you should check whether you could use newer instead of changed, as it is much cheaper!
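A minimal sketch of the split-statement idea, using the areas and tags from the original query: the first statement performs only the tag and area selection and stores the result in a named set, and the second statement applies the time filter to that set alone. The set name .candidates is hypothetical, the user exclusion is left out to keep the example short, and whether the split actually helps depends on the data. The second statement uses newer, as suggested above.

```
[out:xml][timeout:500][bbox:{{bbox}}];
(
  area(3601473946);
  area(3603791785);
)->.a;

// Statement 1: tag and area filters only; the result is kept in
// the (hypothetically named) set .candidates
nwr["amenity"~"^(bank|fuel)$"](area.a)->.candidates;

// Statement 2: apply the time filter to that input set only,
// using newer instead of changed
nwr.candidates(newer:"2017-05-26T07:00:00Z");
out meta;
```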

There’s a default upper memory limit per query. I haven’t checked the exact memory consumption of your query, but in any case it stayed below the default cap. Your query probably does need more memory than the (if: ) example, but that doesn’t matter in this case: if a query exceeds the limit, it is cancelled anyway.
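For reference, the per-query memory limit can be set explicitly with the global maxsize setting (in bytes), alongside timeout. This is a sketch only: the value below is an arbitrary example, the query body is a simplified version of the one above, and a busy public server may refuse requests that ask for a large allowance.

```
[out:xml][timeout:500][maxsize:1073741824][bbox:{{bbox}}];
// maxsize is given in bytes (here 1 GiB); a query that needs more
// than this is still cancelled
(
  area(3601473946);
  area(3603791785);
)->.a;
nwr["amenity"~"^(bank|fuel)$"](area.a)(changed:"2017-05-26T07:00:00Z");
out meta;
```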

To reuse results in a difference statement, you should probably read https://github.com/drolbr/Overpass-API/issues/317 as well.
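A sketch of that reuse pattern, which avoids repeating the filters without relying on the unreleased user() evaluator. The expensive selection is evaluated once into a named set, and the difference then subtracts the subset of that same set edited by the two accounts. This assumes nwr and filtering on an input set are available on the server. The set name .recent is hypothetical, and the thread does not establish whether this uses less memory than two full unions.

```
[out:xml][timeout:500][bbox:{{bbox}}];
(
  area(3601473946);
  area(3603791785);
)->.a;

// Evaluate the tag, area and time filters only once
nwr["amenity"~"^(bank|fuel)$"](area.a)(changed:"2017-05-26T07:00:00Z")->.recent;

// Difference: everything in .recent minus the elements of .recent
// whose last editor is one of the two accounts
(
  .recent; - nwr.recent(user:"SwiftFast_bot","SwiftFast");
);
out meta;
```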

Thank you for your time. I believe you’re talking about the State of the Map 2013 presentation (starting at 7:20). Unfortunately, the pipeline explanation was too brief. It wasn’t explained how the “Collect ids of potential results” step works exactly, and the last three filtering steps were not discussed at all.

Are you sure? The Wiki says changed is cheaper, but requires Attic Data support.

The changed filter currently has issues with memory consumption, especially if you’re looking at a large bbox and/or timespan; see https://github.com/drolbr/Overpass-API/issues/278. A larger redesign of this area is underway anyway, due to https://github.com/drolbr/Overpass-API/issues/346 and https://github.com/drolbr/Overpass-API/issues/322.
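Because the cost of changed grows with the bbox and the timespan, one mitigation is to bound the interval explicitly with the two-date form of changed and keep the search area small. This is a sketch: the end date is a hypothetical example, and the area is reduced to the first of the two areas from the original query.

```
[out:xml][timeout:500][bbox:{{bbox}}];
area(3601473946)->.a;

// Two-date form: only changes between the first and the second
// timestamp are considered (end date chosen for illustration)
nwr["amenity"~"^(bank|fuel)$"](area.a)
   (changed:"2017-05-26T07:00:00Z","2017-06-02T07:00:00Z");
out meta;
```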

You just sped up the above query by 6x! I will update that Wiki page ASAP.