API limits #35
Replies: 14 comments
-
I believe they are limits of the data model. Treating them as configurable API limits is, in my opinion, wrong. "Documenting" them in the API has just been something we did because we don't have a proper specification of the data model, the API, or our file formats.

The reason this is more than just some API limit is that a lot more than the API depends on these limits. There is a lot of software out there, and some of it will depend on some of those limits. The limits were introduced so that a) software can protect itself against unreasonably large amounts of data, and b) software can take "shortcuts" for performance reasons, for instance by using pre-allocated fixed-size buffers or something like that.

In addition, there are follow-on limits that depend on the documented limits. For instance, there are limits on how large some data blocks in the PBF file format can become. If a way were allowed to have a much larger number of member nodes, a way containing that many nodes could no longer be written to a PBF file. (In fact this can still happen in other cases, because we don't have limits for everything.)

So these so-called API limits are in practice limits of the data model and the file formats, too. We might be able to make them smaller, but we cannot make them bigger without risking breakage somewhere else.
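The kind of dependence described above can be made concrete with a small sketch. This is not code from any project mentioned here, just an illustration assuming the currently documented maximum of 2,000 nodes per way; the buffer layout and function names are made up.

```python
# Minimal sketch (not from the discussion): how a consumer might rely on a
# documented limit. The 2,000 nodes-per-way maximum is the value currently
# advertised by the API; everything else here is illustrative.
from array import array

MAX_WAY_NODES = 2_000  # documented API/data-model limit, treated as fixed

# Pre-allocate one fixed-size buffer of 64-bit node IDs and reuse it for
# every way. This only works because the limit is treated as a hard bound.
node_id_buffer = array("q", [0] * MAX_WAY_NODES)

def load_way_nodes(node_ids):
    """Copy a way's node references into the shared buffer."""
    if len(node_ids) > MAX_WAY_NODES:
        # If the limit were ever raised, this "optimization" becomes a bug.
        raise ValueError(
            f"way has {len(node_ids)} nodes, buffer holds {MAX_WAY_NODES}"
        )
    node_id_buffer[: len(node_ids)] = array("q", node_ids)
    return len(node_ids)
```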
-
The capabilities endpoint, as part of the API (https://api.openstreetmap.org/api/0.6/capabilities), is meant to communicate the currently valid limits pertaining to some aspects of the data model, and consuming applications are expected to adhere to them. So the data model / API already has some flexibility built in to deal with varying limits, and I see them as configurable limits.

However, some downstream consumers use hardcoded constraints, and some data formats break down at some point, as in the case of PBF. What limits the flexibility of the API, I think, is that we don't want to break such downstream data consumers, and for good reason. But these are not limits specific to the API; we're only trying to be nice to the rest of the world. If you're not bound by these limits, you could run your own instance of the API with your own limits and toolchain. I think some folks outside the OSM mainstream tooling are even doing that.

Fun fact: the only limit still missing is the number of tags, because we simply can't agree on a maximum number to put in place.
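For illustration, a consumer that honours these advertised limits could read them at startup instead of compiling them in. A minimal Python sketch, assuming only the endpoint URL quoted above; the XML element names are simply whatever the server currently returns, so the code reads them generically.

```python
# Rough sketch of a consumer reading the advertised limits instead of
# hardcoding them. Only the endpoint URL is taken from the comment above.
import urllib.request
import xml.etree.ElementTree as ET

CAPABILITIES_URL = "https://api.openstreetmap.org/api/0.6/capabilities"

def fetch_limits(url=CAPABILITIES_URL):
    """Return {element name: attribute dict} for the <api> section."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        root = ET.fromstring(resp.read())
    api = root.find("api")
    return {child.tag: dict(child.attrib) for child in api} if api is not None else {}

if __name__ == "__main__":
    for name, attrs in fetch_limits().items():
        print(name, attrs)
    # A client would look up, say, the way-node maximum here rather than
    # assuming a fixed constant compiled into the program.
```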
-
I don't think that is a useful way of looking at the situation, because it just leads to us being locked in to whatever joe random dev wanted to hyper-optimize on, and in some cases these limits are really just random historic implementation quirks (and fall under neither your a) nor your b)).

Just consider the tag key and value length restrictions. Obviously I could write a piece of software that exploits the fact that the strings are at most 255 characters and not 256 (ignoring, for the sake of argument, that the actual limit is not 255 bytes). Should we forever be constrained by such an assumption? Before the recently introduced maximum relation member count we actually ignored the fact that osm2pgsql couldn't handle larger relations. If somebody now starts using a hardwired 32,000-entry array to reference members, are we no longer allowed to raise the limit again?

PS: it should further be noted that the API limits are in practice input limits, and *not* guarantees that the API will never return non-conforming data.
-
My idea on this: I think it is a good idea to document de facto limits, because it is fantastic for someone developing new tools not to build something that would break. However, these limits may change over time, so the terminology for them might need to be different, in particular with regard to the distinction between informative and normative. They might even go into a separate, focused document [1]. Note that it is not unusual for technical standards to stay in use for decades, so this part is the one most likely to invalidate the entire specification by, say, 2040. So I think we could reconcile both views, whether the limits are part of the deliverable or not, as long as the terminology is different.

[1] Examples: PostgreSQL limits (https://www.postgresql.org/docs/current/limits.html) and SQLite limits (https://www.sqlite.org/limits.html).
-
Right, what is currently missing is a good concept for keeping track of API limit changes over time. As an example, if I'm processing a 2022 planet, it's of little help to know what the capabilities endpoint reports today if the API limits were only changed recently.
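One purely hypothetical way to capture that would be a machine-readable history of limits keyed by the date they took effect, so a consumer processing an old planet can look up the values that were in force at the time. A sketch, with placeholder names, dates and values (not actual history):

```python
# Purely hypothetical sketch of the "limits over time" idea: a table of
# limit values keyed by the date they took effect.
# The names, dates and values below are placeholders, not actual history.
from bisect import bisect_right
from datetime import date

LIMIT_HISTORY = {
    "waynodes_maximum": [
        (date(2009, 1, 1), 2000),      # placeholder effective date
    ],
    "relationmembers_maximum": [
        (date(2009, 1, 1), None),      # None = no limit in force
        (date(2022, 1, 1), 32000),     # placeholder effective date
    ],
}

def limit_at(name, when):
    """Return the value of a limit as it was in force on a given date."""
    history = LIMIT_HISTORY[name]
    dates = [d for d, _ in history]
    idx = bisect_right(dates, when) - 1
    return history[idx][1] if idx >= 0 else None

# Example: which relation member limit applied to a mid-2022 planet?
print(limit_at("relationmembers_maximum", date(2022, 6, 1)))
```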
-
Well, I don't think that is necessary if we take a step back and differentiate between the API limits and what is fundamental to the data model (or at least what we consider its practical limits). An application should then be expected to handle the latter, which might just mean bailing out gracefully in some circumstances.
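For instance, "bailing out gracefully" could be as simple as the following sketch. The limit value here is this hypothetical application's own ceiling, not anything mandated by the API or the data model.

```python
# Illustrative only: bail out gracefully when input exceeds what this
# particular application is prepared to handle, instead of crashing or
# silently truncating.
class UnsupportedElementError(Exception):
    """Raised when an element exceeds this application's practical limits."""

APP_MAX_RELATION_MEMBERS = 50_000  # this application's own ceiling (assumption)

def process_relation(relation_id, members):
    if len(members) > APP_MAX_RELATION_MEMBERS:
        raise UnsupportedElementError(
            f"relation {relation_id} has {len(members)} members; "
            f"this tool only supports up to {APP_MAX_RELATION_MEMBERS}"
        )
    # ... normal processing would go here ...
    return len(members)
```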
-
What it all comes down to is writing a specification that is well thought through. That specification can then say: this part and that part will never change; this part can change with a new version of the specification; and this part can, in specific implementations, be further restricted by limits published via an API. Such a specification needs, of course, some kind of versioning.
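As a hypothetical sketch of that idea (not an actual OSM specification), the constraints could be classified by how stable they are, so tools know which values are fixed, which are tied to a spec version, and which must be looked up from the running API:

```python
# Hypothetical sketch: classify a versioned specification's constraints as
# fixed, spec-versioned, or implementation-defined. All names and values
# are illustrative.
from dataclasses import dataclass
from enum import Enum

class Stability(Enum):
    FIXED = "never changes"
    SPEC_VERSIONED = "changes only with a new specification version"
    IMPLEMENTATION = "restricted further by each API instance"

@dataclass(frozen=True)
class Constraint:
    name: str
    stability: Stability
    value: object = None  # None when the value comes from the running API

SPEC_VERSION = "example-0.1"
CONSTRAINTS = [
    Constraint("node coordinates are WGS84 lat/lon", Stability.FIXED),
    Constraint("maximum tag key/value length", Stability.SPEC_VERSIONED, 255),
    Constraint("maximum nodes per way", Stability.IMPLEMENTATION),
]
```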
-
Have you talked to the Rails maintainers about the specification topic already? Would they have the bandwidth to keep such a document up-to-date, or even to write it in the first place (and if not, who else would do that)?
-
@mmd-osm Writing such a document would, of course, be part of this "new data model" effort we are just engaging in. As I said, this is not just about the API; it has to be a larger agreement on how we interpret, transport, store, interact with (and so on) the OSM data. And yes, this is a large effort, but, I believe, a necessary one. The OSM ecosystem is not just the API; everything has to fit together.
-
That is going to be a larger chore; currently there is not even agreement on something as trivial as what the version number in the API output means (API version vs. data model/file format version). But in my experience, documentation efforts in OSM fail not because it isn't possible to document the status quo, but because nobody is willing to be bound by such documents (and, as a consequence, to adhere even to a skeleton process when things have to or should change).
-
@joto it might be a good idea to move this issue and #26 to discussions.
-
Speaking from past experience, I already tried to generate some API documentation with the nice extra of being (somewhat) machine readable: openstreetmap/openstreetmap-website#3107. It would move the task of updating the documentation from the Wiki (where the documentation suffers from severe NIH syndrome) to the Rails repo itself. Well, it wasn't all that successful, because there was no buy-in and nobody with enough time to spend on the topic.
-
@joto @simonpoole @mmd-osm I was drafting this proposal (with a bit more context about W3C, IETF, OGC, etc.) for the OSMF subforum on Discourse, but I think I can outline the core parts here.

IMHO ReSpec is fantastic for editors/authors (and if done at scale, some conventions like copyright, disclaimers, etc. need to be handled in a centralized manner; but despite being HTML, the editor's drafts pull in their data with JavaScript). The older version used something called Bikeshed, which required a command-line tool (I used it circa 2016), but now things can pretty much be done with static hosting. There is even built-in validation for errors in the documents (such as mismatched external references). I've been working on one draft here (live documentation here), but I'm still missing a way to cite all the other specifications/conventions. And yes, I volunteer to help set all of this up! Even if mostly to allow the more RDF-related content to be well documented, because even those documents would need to cite others that may not exist yet, but could at least work as dummy target pages; think, for example, of the need to cite what would be the "specification of the API v0.5", which may never be created beyond the Wiki but needs to exist so as not to cause a massive rewrite across several repos.
-
OpenAPI 3.1 is fantastic; it used to be called Swagger. The tooling even allows generating mock servers and simulating tests. One page worth knowing is https://openapi.tools/. However, OpenAPI/Swagger is machine-readable documentation aimed mostly at tooling; it is not a data standard. We could also have a reference, but some tools may have small differences, so maybe we could try some sort of selective importing. The ReSpec approach I mentioned mostly targets what today would be the Wiki, and would likely allow a workflow for working groups and some top-down decisions (like licensing, style guides, the statuses of official recommendations, etc.). Judging by comments such as the one Tom made there, your suggestion about Swagger/OpenAPI might deserve a dedicated place, because it is actually supposed to be shared by different applications.

Trivia: while still an early draft, I'm writing tooling to parse Wiki markup into some sort of JSON-LD (I plan to start discussions with others to define a schema) and also to convert that JSON-LD into files (currently the implementation mostly extracts files from a Wiki page as a zip download). This is mostly to extract
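To give a flavour of what "machine readable" buys you here, the following is a tiny, hypothetical fragment of an OpenAPI 3.1 description of the capabilities endpoint, built and serialized with nothing but the Python standard library. The path and descriptions are illustrative, not an official OSM API document.

```python
# Tiny, hypothetical OpenAPI 3.1 fragment describing the capabilities
# endpoint, serialized as JSON. Illustrative only; not an official
# description of the OSM API.
import json

spec = {
    "openapi": "3.1.0",
    "info": {"title": "OSM API (illustrative fragment)", "version": "0.6"},
    "paths": {
        "/api/0.6/capabilities": {
            "get": {
                "summary": "Advertised API limits and status",
                "responses": {
                    "200": {
                        "description": "Capabilities document (XML)",
                        "content": {"application/xml": {}},
                    }
                },
            }
        }
    },
}

print(json.dumps(spec, indent=2))  # feed this to validators, mock servers, etc.
```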
-
Just a nitpick: on page 21 there's a table that lists some "issues" with the data model; however, most of these are configurable API limits, not actual attributes of the data model. The wording implies that these would go away with a data model change, but that is obviously not a given at all.