API limits #35
Replies: 14 comments
-
I believe they are limits of the data model. Treating them as configurable API limits is, in my opinion, wrong. "Documenting" them in the API has just been something we did because we don't have a proper specification of the data model, the API, or our file formats.

The reason this is more than just some API limit is that a lot more than the API depends on these limits. There is a lot of software out there, and some of it will depend on some of those limits. The limits were introduced so that a) software can protect itself against unreasonably large amounts of data, and b) software can take "shortcuts" for performance reasons, for instance by using pre-allocated fixed-size buffers or something like that.

In addition, there are follow-on limits that depend on the documented limits. For instance, there are limits on how large some data blocks in the PBF file format can become. If a way were allowed to have a much larger number of member nodes, a way containing that many nodes could no longer be written to a PBF file. (In fact this can still happen in other cases, because we don't have limits for everything.)

So these so-called API limits are in practice limits of the data model and the file formats, too. We might be able to make them smaller, but we cannot make them bigger without risking breakage somewhere else.
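The kind of dependence described above can be made concrete with a small sketch. This is not code from any project mentioned here, just an illustration assuming the currently documented maximum of 2,000 nodes per way; the buffer layout and function names are made up.

```python
# Minimal sketch (not from the discussion): how a consumer might rely on a
# documented limit. The 2,000 nodes-per-way maximum is the value currently
# advertised by the API; everything else here is illustrative.
from array import array

MAX_WAY_NODES = 2_000  # documented API/data-model limit, treated as fixed

# Pre-allocate one fixed-size buffer of 64-bit node IDs and reuse it for
# every way. This only works because the limit is treated as a hard bound.
node_id_buffer = array("q", [0] * MAX_WAY_NODES)

def load_way_nodes(node_ids):
    """Copy a way's node references into the shared buffer."""
    if len(node_ids) > MAX_WAY_NODES:
        # If the limit were ever raised, this "optimization" becomes a bug.
        raise ValueError(
            f"way has {len(node_ids)} nodes, buffer holds {MAX_WAY_NODES}"
        )
    node_id_buffer[: len(node_ids)] = array("q", node_ids)
    return len(node_ids)
```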
-
The capabilities endpoint, as part of the API (https://api.openstreetmap.org/api/0.6/capabilities), is meant to communicate the currently valid limits pertaining to some aspects of the data model, and consuming applications are expected to adhere to them. So the data model / API already has some flexibility built in to deal with varying limits, and I see them as configurable limits.

However, some downstream consumers use hardcoded constraints, and some data formats break down at some point, as in the case of PBF. What limits the flexibility of the API, I think, is that we don't want to break such downstream data consumers, and for good reason. But these are not limits specific to the API; we're only trying to be nice to the rest of the world. If you're not bound by these limits, you could run your own instance of the API with your own limits and toolchain. I think some folks outside the OSM mainstream tooling are even doing that.

Fun fact: the only limit still missing is the number of tags, because we simply can't agree on a maximum number to put in place.
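For illustration, a consumer that honours these advertised limits could read them at startup instead of compiling them in. A minimal Python sketch, assuming only the endpoint URL quoted above; the XML element names are simply whatever the server currently returns, so the code reads them generically.

```python
# Rough sketch of a consumer reading the advertised limits instead of
# hardcoding them. Only the endpoint URL is taken from the comment above.
import urllib.request
import xml.etree.ElementTree as ET

CAPABILITIES_URL = "https://api.openstreetmap.org/api/0.6/capabilities"

def fetch_limits(url=CAPABILITIES_URL):
    """Return {element name: attribute dict} for the <api> section."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        root = ET.fromstring(resp.read())
    api = root.find("api")
    return {child.tag: dict(child.attrib) for child in api} if api is not None else {}

if __name__ == "__main__":
    for name, attrs in fetch_limits().items():
        print(name, attrs)
    # A client would look up, say, the way-node maximum here rather than
    # assuming a fixed constant compiled into the program.
```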
-
I don't think that is a useful way of looking at the situation, because it just leads to us being locked in to whatever joe random dev wanted to hyper-optimize on, and in some cases these limits are really just random historic implementation quirks (and fall under neither your a) nor your b)).

Just consider the tag key and value length restrictions. Obviously I could write a piece of software that exploits the fact that the strings are at most 255 characters and not 256 (ignoring, for the sake of argument, that the actual limit is not 255 bytes). Should we forever be constrained by such an assumption? Before the recently introduced maximum relation member count we actually ignored the fact that osm2pgsql couldn't handle larger relations. If somebody now starts using a hardwired 32,000-entry array to reference members, are we no longer allowed to raise the limit again?

PS: it should further be noted that the API limits are in practice input limits, and *not* guarantees that the API will never return non-conforming data.
-
My idea on this: I think it is a good idea to document de facto limits, because it is fantastic for someone developing new tools not to build something that would break. However, these limits may change over time, so the terminology for them might need to be different, in particular with regard to the distinction between informative and normative. They might even go into a separate, focused document [1]. Note that it is not unusual for technical standards to stay in use for decades, so this part is the one most likely to invalidate the entire specification by, say, 2040. So I think we could reconcile both views, whether the limits are part of the deliverable or not, as long as the terminology is different.

[1] Examples: PostgreSQL limits (https://www.postgresql.org/docs/current/limits.html) and SQLite limits (https://www.sqlite.org/limits.html).
-
Right, what is currently missing is a good concept for keeping track of API limit changes over time. As an example, if I'm processing a 2022 planet, it's of little help to know what the capabilities endpoint reports today if the API limits were only changed recently.
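One purely hypothetical way to capture that would be a machine-readable history of limits keyed by the date they took effect, so a consumer processing an old planet can look up the values that were in force at the time. A sketch, with placeholder names, dates and values (not actual history):

```python
# Purely hypothetical sketch of the "limits over time" idea: a table of
# limit values keyed by the date they took effect.
# The names, dates and values below are placeholders, not actual history.
from bisect import bisect_right
from datetime import date

LIMIT_HISTORY = {
    "waynodes_maximum": [
        (date(2009, 1, 1), 2000),      # placeholder effective date
    ],
    "relationmembers_maximum": [
        (date(2009, 1, 1), None),      # None = no limit in force
        (date(2022, 1, 1), 32000),     # placeholder effective date
    ],
}

def limit_at(name, when):
    """Return the value of a limit as it was in force on a given date."""
    history = LIMIT_HISTORY[name]
    dates = [d for d, _ in history]
    idx = bisect_right(dates, when) - 1
    return history[idx][1] if idx >= 0 else None

# Example: which relation member limit applied to a mid-2022 planet?
print(limit_at("relationmembers_maximum", date(2022, 6, 1)))
```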
-
Well, I don't think that is necessary if we take a step back and differentiate between the API limits and what is fundamental to the data model (or at least what we consider its practical limits). An application should then be expected to handle the latter, which might just mean bailing out gracefully in some circumstances.
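For instance, "bailing out gracefully" could be as simple as the following sketch. The limit value here is this hypothetical application's own ceiling, not anything mandated by the API or the data model.

```python
# Illustrative only: bail out gracefully when input exceeds what this
# particular application is prepared to handle, instead of crashing or
# silently truncating.
class UnsupportedElementError(Exception):
    """Raised when an element exceeds this application's practical limits."""

APP_MAX_RELATION_MEMBERS = 50_000  # this application's own ceiling (assumption)

def process_relation(relation_id, members):
    if len(members) > APP_MAX_RELATION_MEMBERS:
        raise UnsupportedElementError(
            f"relation {relation_id} has {len(members)} members; "
            f"this tool only supports up to {APP_MAX_RELATION_MEMBERS}"
        )
    # ... normal processing would go here ...
    return len(members)
```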
-
What it all comes down to is writing a specification that is well thought through. That specification can then say: this part and that part will never change; this part can change with a new version of the specification; and this part can, in specific implementations, be further restricted by limits published via an API. Such a specification needs, of course, some kind of versioning.
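As a hypothetical sketch of that idea (not an actual OSM specification), the constraints could be classified by how stable they are, so tools know which values are fixed, which are tied to a spec version, and which must be looked up from the running API:

```python
# Hypothetical sketch: classify a versioned specification's constraints as
# fixed, spec-versioned, or implementation-defined. All names and values
# are illustrative.
from dataclasses import dataclass
from enum import Enum

class Stability(Enum):
    FIXED = "never changes"
    SPEC_VERSIONED = "changes only with a new specification version"
    IMPLEMENTATION = "restricted further by each API instance"

@dataclass(frozen=True)
class Constraint:
    name: str
    stability: Stability
    value: object = None  # None when the value comes from the running API

SPEC_VERSION = "example-0.1"
CONSTRAINTS = [
    Constraint("node coordinates are WGS84 lat/lon", Stability.FIXED),
    Constraint("maximum tag key/value length", Stability.SPEC_VERSIONED, 255),
    Constraint("maximum nodes per way", Stability.IMPLEMENTATION),
]
```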
-
Have you talked to the Rails maintainers about the specification topic already? Would they have the bandwidth to keep such a document up-to-date, or even to write it in the first place (and if not, who else would do that)?
-
@mmd-osm Writing such a document would, of course, be part of this "new data model" effort we are just engaging in. As I said, this is not just about the API; it has to be a larger agreement on how we interpret, transport, store, interact with (and so on) the OSM data. And yes, this is a large effort, but, I believe, a necessary one. The OSM ecosystem is not just the API; everything has to fit together.
-
That is going to be a larger chore; currently there is not even agreement on something as trivial as what the version number in the API output means (API version vs. data model/file format version). But in my experience, documentation efforts in OSM fail not because it isn't possible to document the status quo, but because nobody is willing to be bound by such documents (and, as a consequence, to adhere even to a skeleton process when things have to or should change).
-
@joto it might be a good idea to move this issue and #26 to discussions.
-
Speaking from past experience, I already tried to generate some API documentation with the nice extra of being (somewhat) machine readable: openstreetmap/openstreetmap-website#3107. It would move the task of updating the documentation from the Wiki (where the documentation suffers from severe NIH syndrome) to the Rails repo itself. Well, it wasn't all that successful, because there was no buy-in and nobody with enough time to spend on the topic.
-
@joto @simonpoole @mmd-osm I was drafting this proposal (with a bit more context about W3C, IETF, OGC, etc.) for the OSMF subforum on Discourse, but I think I can outline the core parts here.

IMHO ReSpec is fantastic for editors/authors (and if done at scale, some conventions like copyright, disclaimers, etc. need to be handled in a centralized manner; but despite being HTML, the editor's drafts pull in their data with JavaScript). The older version used something called Bikeshed, which required a command-line tool (I used it circa 2016), but now things can pretty much be done with static hosting. There is even built-in validation for errors in the documents (such as mismatched external references). I've been working on one draft here (live documentation here), but I'm still missing a way to cite all the other specifications/conventions. And yes, I volunteer to help set all of this up! Even if mostly to allow the more RDF-related content to be well documented, because even those documents would need to cite others that may not exist yet, but could at least work as dummy target pages; think, for example, of the need to cite what would be the "specification of the API v0.5", which may never be created beyond the Wiki but needs to exist so as not to cause a massive rewrite across several repos.
-
OpenAPI 3.1 is fantastic; it used to be called Swagger. The tooling even allows generating mock servers and simulating tests. One page worth knowing is https://openapi.tools/. However, OpenAPI/Swagger is machine-readable documentation aimed mostly at tooling; it is not a data standard. We could also have a reference, but some tools may have small differences, so maybe we could try some sort of selective importing. The ReSpec approach I mentioned mostly targets what today would be the Wiki, and would likely allow a workflow for working groups and some top-down decisions (like licensing, style guides, the statuses of official recommendations, etc.). Judging by comments such as the one Tom made there, your suggestion about Swagger/OpenAPI might deserve a dedicated place, because it is actually supposed to be shared by different applications.

Trivia: while still an early draft, I'm writing tooling to parse Wiki markup into some sort of JSON-LD (I plan to start discussions with others to define a schema) and also to convert that JSON-LD into files (currently the implementation mostly extracts files from a Wiki page as a zip download). This is mostly to extract
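To give a flavour of what "machine readable" buys you here, the following is a tiny, hypothetical fragment of an OpenAPI 3.1 description of the capabilities endpoint, built and serialized with nothing but the Python standard library. The path and descriptions are illustrative, not an official OSM API document.

```python
# Tiny, hypothetical OpenAPI 3.1 fragment describing the capabilities
# endpoint, serialized as JSON. Illustrative only; not an official
# description of the OSM API.
import json

spec = {
    "openapi": "3.1.0",
    "info": {"title": "OSM API (illustrative fragment)", "version": "0.6"},
    "paths": {
        "/api/0.6/capabilities": {
            "get": {
                "summary": "Advertised API limits and status",
                "responses": {
                    "200": {
                        "description": "Capabilities document (XML)",
                        "content": {"application/xml": {}},
                    }
                },
            }
        }
    },
}

print(json.dumps(spec, indent=2))  # feed this to validators, mock servers, etc.
```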
-
Just a nitpick: on page 21 there's a table that lists some "issues" with the data model; however, most of these are configurable API limits, not actual attributes of the data model. The wording implies that these would go away with a data model change, but that is obviously not a given at all.