Impedance mismatches, a.k.a. The "PHP/Lua array problem" #275
awwright
started this conversation in
Specification
Replies: 0 comments
In recent calls (#267, #270) a common issue was raised on how programming languages should operate when they cannot distinguish between two non-equivalent JSON values. For example, the JSON documents `{}` and `[]` both map to an empty PHP array or Lua array, since those languages use key-value maps to represent ordered lists (the two are one and the same type).

In database design, this category of problems is generally known as an "impedance mismatch." In a bit of a tortured analogy to electrical engineering, I use the term here for situations where neither value space can completely represent the other, so certain values cannot be stored and then recalled correctly.
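To make the collapse concrete, here is a minimal Python sketch of a toy runtime that, like the PHP and Lua arrays described above, has only one map type. The helper names `decode_single_map` and `encode_single_map`, and the rule for guessing array versus object on output, are assumptions made for illustration; they do not reproduce any real PHP or Lua JSON library.

```python
import json

def decode_single_map(text):
    """Decode JSON, storing both arrays and objects as one map type (dict)."""
    def collapse(value):
        if isinstance(value, list):
            # An ordered list becomes a map keyed by position.
            return {i: collapse(v) for i, v in enumerate(value)}
        if isinstance(value, dict):
            return {k: collapse(v) for k, v in value.items()}
        return value
    return collapse(json.loads(text))

def encode_single_map(value):
    """Encode a collapsed value, guessing 'array' when keys are exactly 0..n-1."""
    def expand(v):
        if isinstance(v, dict):
            if list(v) == list(range(len(v))):
                return [expand(x) for x in v.values()]
            return {str(k): expand(x) for k, x in v.items()}
        return v
    return json.dumps(expand(value))

# Both documents decode to the same empty map, so the distinction is lost.
same_in_memory = decode_single_map("{}") == decode_single_map("[]")
# Round-tripping the empty object yields an empty array under this guessing rule.
round_tripped = encode_single_map(decode_single_map("{}"))
```

In this sketch `same_in_memory` is true and `round_tripped` is `[]`, so the empty object cannot be stored and then recalled correctly. A different guessing rule would lose the empty array instead.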
While JSON Schema is defined only over the lexical value space (the actual characters that make up an application/json document), JSON Schema implementations are generally allowed to validate values in memory if the result on the lexical serialization would be the same. A couple passages describe this:
JSON Schema Core 4.1: Instance Data Model
And JSON Schema Core 4.2.3:
What this means practically is that empty arrays that can map to both `[]` and `{}` are not compliant. This is a problem for the class of applications where JSON Schema is being used to define a protocol: If I'm defining an API, and I say only booleans are allowed, it would be odd if I submitted an empty string, or number zero, and that were accepted. To me, this would suggest that the API is broken or not following its own schema. The behavior of a protocol should not be influenced by the underlying language or runtime.

However, the vast majority of implementations are not concerned with validating lexical space (protocols, or other forms of JSON documents), but rather with validating the "application value space": the data that will actually be used, after it's been unmarshalled. From the viewpoint of the developer of an application, you are obviously more interested in the value the application sees, rather than how the value was encoded for the wire.
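The gap between the two viewpoints can be sketched in Python. The functions below are hypothetical and cover only a check for `"type": "array"`. `is_array_in_memory` assumes a runtime whose only container is a map and treats a map as an array when its keys are exactly 0..n-1, which is one possible rule rather than what any particular implementation does.

```python
import json

def is_array_lexical(text):
    # Python's json module keeps arrays (list) and objects (dict) distinct,
    # so this stands in for judging the JSON document itself.
    return isinstance(json.loads(text), list)

def is_array_in_memory(value):
    # Assumed permissive rule for a single-map-type runtime:
    # a map counts as an array when its keys are exactly 0..n-1.
    # An empty map satisfies this trivially.
    return isinstance(value, dict) and list(value) == list(range(len(value)))

document = "{}"
lexical_result = is_array_lexical(document)   # judged on the JSON document
in_memory_result = is_array_in_memory({})     # judged on the decoded empty map
```

For the document `{}`, `lexical_result` is false while `in_memory_result` is true: the same instance is rejected when judged as a JSON document and accepted when judged as the value the application sees.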
While I think specifications should shy away from excessive implementation guidance, guidance is obviously justified for the purpose of addressing divergent behavior or interoperability problems, as is clearly the case here for two reasons: ① different implementations will address the same problem differently, and ② the very act of encoding or decoding JSON will lead to data loss, which is inherently an interoperability problem.
It seems to me that the spec should, at the minimum, expand the language on "in-memory" validation to point out that the results may not interoperate if those values are ever converted to lexical space (JSON), and to explain how to avoid such problems.