There is no charset parameter on application/json #6901

Open
1 task done
reschke opened this issue Jan 3, 2024 · 5 comments
Labels
P3 no SLO package=net type=defect Bug, not working as expected

Comments

@reschke

reschke commented Jan 3, 2024

Description

Ref: https://datatracker.ietf.org/doc/html/rfc8259#section-11 (last paragraph)

But see:

public static final MediaType JSON_UTF_8 = createConstantUtf8(APPLICATION_TYPE, "json");

This type should be deprecated (and users encouraged to use a variant without charset).

Example

MediaType.JSON_UTF_8 is a nonsensical media type; no charset parameter is defined for application/json.

Expected Behavior

Use should lead to a deprecation warning.
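
As a rough sketch of what that could look like, here is a hypothetical stand-in for a constants class. The class name `MediaTypeConstantsSketch` and the plain `String` constants are assumptions made for illustration; this is not Guava's actual source, which uses `MediaType` objects.

```java
/** Hypothetical stand-in for a media type constants class; not Guava's actual source. */
final class MediaTypeConstantsSketch {
  private MediaTypeConstantsSketch() {}

  /** Parameter-free JSON media type, matching the RFC 8259 registration. */
  static final String JSON = "application/json";

  /**
   * JSON media type with a charset parameter attached.
   *
   * @deprecated RFC 8259 defines no charset parameter for application/json;
   *     prefer {@link #JSON}.
   */
  @Deprecated
  static final String JSON_UTF_8 = "application/json; charset=utf-8";
}
```

With the annotation in place, javac would flag any reference to `JSON_UTF_8` from other classes with a deprecation warning, while existing callers would keep compiling.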

Actual Behavior

It does not.

Packages

com.google.common.net

Platforms

No response

Checklist

@reschke reschke added the type=defect Bug, not working as expected label Jan 3, 2024
@cpovirk
Member

cpovirk commented Jan 3, 2024

Interesting, thanks. I don't know that this has come up before.

We actually had application/json without a charset before replacing it with the current constant back in 2012 (before MediaType was added to Guava). That presumably was the right move then (since it predates the 2017 RFC you've shared).

It is interesting that the RFC also says "Adding [a charset] really has no effect on compliant recipients," which suggests that including one should be harmless for compliant recipients.

But wait, https://www.rfc-editor.org/errata/eid5853 says that that sentence should be replaced :\ I'd have to read more to understand whether including a charset parameter should in fact technically be harmless.

There's additionally the question of whether the charset parameter makes things better or worse for non-compliant recipients. (And then there's the question of whether helping non-compliant recipients is a good thing or a bad thing... :))

Our internal security guidance says that it is "critical" to include the charset parameter. That said, the guidance dates from at least 7 years ago, and I don't know how recently it's been reevaluated. Some chain of other links led me to https://portswigger.net/research/json-hijacking-for-the-modern-web, which was from 2016 (with some kind of update in 2022), which likewise suggests that the charset is important (or at least was back then). However, I haven't read it nearly closely enough to have much confidence in anything.

Someone seems to be reporting that Dart needed the parameter back in 2019. Ditto some "HttpClient" in 2020.

And I've seen another report or two that some receivers reject anything that includes charset (example)....

I fear that we could end up as the latest project that keeps "ping-ponging this back and forth, and there's always some broken client."

We could consider talking more with our security people to see what they recommend. We'd want to have a pretty solid understanding before nudging users toward a change that might break something that had previously been working (whether it was really supposed to be working or not).

@netdpb
Member

netdpb commented Jan 3, 2024

In general, are extra, unrecognized parameters considered an error in media types?

@eamonnmcmanus
Member

> But wait, https://www.rfc-editor.org/errata/eid5853 says that that sentence should be replaced :\ I'd have to read more to understand whether including a charset parameter should in fact technically be harmless.

That's marked as "Reported", which just means that someone thought it would be a good idea to make that change. I don't think we can conclude anything from it.

(I've been fooled by RFC Errata before.)

@reschke
Author

reschke commented Jan 3, 2024

Exactly - unless it's verified it doesn't mean anything.

@reschke
Author

reschke commented Jan 3, 2024

> In general, are extra, unrecognized parameters considered an error in media types?

Usually no.

The problem is more educational: sending "charset=UTF-8" sort of implies that "charset=UTF-16" would change the encoding detection. And that would be a bug.

So would requiring the presence of the parameter.
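
To make the recipient side concrete, here is a minimal sketch of a receiver that behaves that way: it accepts application/json with or without parameters, never requires a charset, and always decodes the body as UTF-8. The class `JsonRecipientSketch` and its methods are hypothetical names for illustration, not part of Guava or any other library, and the parameter handling is deliberately simplistic (it just cuts at the first semicolon).

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical recipient-side helper for illustration; not part of Guava. */
final class JsonRecipientSketch {
  private JsonRecipientSketch() {}

  /** Returns true if the Content-Type value names application/json, whatever its parameters. */
  static boolean isJson(String contentType) {
    // Everything after the first ';' is parameters; ignore all of it.
    int semicolon = contentType.indexOf(';');
    String essence = semicolon < 0 ? contentType : contentType.substring(0, semicolon);
    // Type and subtype names are case-insensitive.
    return essence.trim().toLowerCase(Locale.ROOT).equals("application/json");
  }

  /** Decodes a JSON body as UTF-8, deliberately ignoring any charset parameter. */
  static String decodeBody(String contentType, byte[] body) {
    if (!isJson(contentType)) {
      throw new IllegalArgumentException("Not application/json: " + contentType);
    }
    // The charset parameter, present or absent, never influences decoding.
    return new String(body, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] body = "{\"ok\":true}".getBytes(StandardCharsets.UTF_8);
    System.out.println(decodeBody("application/json", body));
    System.out.println(decodeBody("application/json; charset=UTF-16", body));
  }
}
```

Both calls in `main` decode the same bytes the same way, which is the point: a "charset=UTF-16" parameter does not change encoding detection, and a missing parameter is not an error.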
