Supporting alternative base64/json encoder/decoder #168

mattrobenolt · 2022-02-17T20:16:59Z

Would you be open to supporting alternative base64 implementations? The one we've been using is https://github.com/segmentio/asm/tree/main/base64

It's significantly faster than the stdlib encoding, and takes shortcuts that I don't see why wouldn't be acceptable here.

Ideally, if the token parsers were abstracted to accept a Encoding interface, we could pass in our own, with default behaviors to maintain encoding/base64.

In similar vein, similar can be said for JSON codecs. The stdlib is known for having sub-optimal performance here. Swapping encoding/json for something like, fastjson (https://github.com/valyala/fastjson) would be nice.

I'm new to this project and haven't dug in much, so I don't know all the consequences of this or how to gauge it yet, but I know for us internally, we've adopted these codec alternatives for packages we control.

The text was updated successfully, but these errors were encountered:

oxisto · 2022-02-17T22:06:36Z

In theory, that could be an interesting option to explore. However, because this library is used by many users, we basically have two major design principles maybe should write them down somewhere :):

Zero (0, nil, null) external dependencies
Strict adherence to a non-API breaking way of adding features (within the same major version)

So, if we find a way that is non-breaking (and defaults for the stdlib), has no dependencies to the mentioned project (which should be doable, if we introduce interfaces here), I would not strongly object it. Personally, I do not see a major use case though. Not having any hard numbers here, but my guess is that the most expensive operation is the crypto part, so not sure how large the performance gain really is. I do get the idea of having the other libraries in your project and having this consistent.

Testing it would be a little tricky though, without importing the other libraries and we cannot really make any guarantees that everything works with external libraries.

Would you mind drafting up a PR?

mattrobenolt · 2022-02-17T22:30:07Z

I can look through it. The nice part about working through interfaces here is we wouldn't need to add any external dependencies.

We can just define a Base64Encoder interface and JSONEncoder interface, and as long as libraries conformed to the expected methods, we'd allow them to be plugged in without strictly depending on them here.

I agree that crypto bits would likely outweigh this, but if you're operating on the order of tens of thousands of rps, these wins with encoding/decoding are likely a measurable %age gain.

I'll probably take a pass and see what a theoretical implementation would look like and throw together a benchmark to prove/disprove that this would be worthwhile.

oxisto · 2022-02-17T22:38:44Z

I had a quick look, starting with the base64 part and I fear that this is not done easily. The main encoding is done in Token.SigningString via EncodeSegment (which then calls base64.RawURLEncoding.EncodeToString). Unfortunately EncodeSegment is exported (although we want to deprecate the export), so changing it is not that easy. One way I tried is to use a function type and have this in the Token struct as a variable.

diff --git a/token.go b/token.go
index 09b4cde..d6a576e 100644
--- a/token.go
+++ b/token.go
@@ -7,7 +7,6 @@ import (
 	"time"
 )

-
 // DecodePaddingAllowed will switch the codec used for decoding JWTs respectively. Note that the JWS RFC7515
 // states that the tokens will utilize a Base64url encoding with no padding. Unfortunately, some implementations
 // of JWT are producing non-standard tokens, and thus require support for decoding. Note that this is a global
@@ -26,6 +25,9 @@ var TimeFunc = time.Now
 // Header of the token (such as `kid`) to identify which key to use.
 type Keyfunc func(*Token) (interface{}, error)

+type EncodeSegmentFunc func([]byte) string
+type DecodeSegmentFunc func(string) ([]byte, error)
+
 // Token represents a JWT Token.  Different fields will be used depending on whether you're
 // creating or parsing/verifying a token.
 type Token struct {
@@ -35,6 +37,8 @@ type Token struct {
 	Claims    Claims                 // The second segment of the token
 	Signature string                 // The third segment of the token.  Populated when you Parse a token
 	Valid     bool                   // Is the token valid?  Populated when you Parse/Verify a token
+
+	b64enc EncodeSegmentFunc
 }

 // New creates a new Token with the specified signing method and an empty map of claims.
@@ -76,15 +80,19 @@ func (t *Token) SigningString() (string, error) {
 	var err error
 	var jsonValue []byte

+	if t.b64enc == nil {
+		t.b64enc = EncodeSegment
+	}
+
 	if jsonValue, err = json.Marshal(t.Header); err != nil {
 		return "", err
 	}
-	header := EncodeSegment(jsonValue)
+	header := t.b64enc(jsonValue)

 	if jsonValue, err = json.Marshal(t.Claims); err != nil {
 		return "", err
 	}
-	claim := EncodeSegment(jsonValue)
+	claim := t.b64enc(jsonValue)

 	return strings.Join([]string{header, claim}, "."), nil
 }

This could then be configured by NewWithClaims and functional options that we explored elsewhere as well.

Problematic is the decode part. Decoding is done in 7 places (1x for the header, 1x for the claim and 5x for the individual signatures). Header and claim are probably easy because they are part of the Parser struct, which already has functional options. More tricky (or maybe close to impossible is the signature part, and you would need to extends its Verify method. Not impossible, but not one-liner either.

dcalsky · 2022-03-04T10:29:53Z

After fine-tuning the structure of Token, I tested the SigningString method that gains the significant benefit from substituting std json encoder, from std to bytedance/sonic, which is likely to improve performance for parsing and signing both.

And promising no breaking changes, I'll raise an PR to implement this idea.

func BenchmarkSigningString(t *testing.B) {
	token := &jwt.Token{
		Raw:       data.fields.Raw,
		Method:    data.fields.Method,
		Header:    data.fields.Header,
		Claims:    data.fields.Claims,
		Signature: data.fields.Signature,
		Valid:     data.fields.Valid,
	}
	t.Run("std", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_, _ = token.SigningString()
		}
	})

	t.Run("sonic", func(b *testing.B) {
		token.JsonEncFunc = sonic.Marshal
		for i := 0; i < b.N; i++ {
			_, _ = token.SigningString()
		}
	})
}

Benchmark:

goos: darwin
goarch: amd64
pkg: github.com/golang-jwt/jwt/v4
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkStdSignString
BenchmarkStdSignString/std
BenchmarkStdSignString/std-12         	  670904	      1545 ns/op
BenchmarkStdSignString/sonic
BenchmarkStdSignString/sonic-12       	 1538072	       766.3 ns/op

oxisto · 2023-02-21T18:27:22Z

#278 is a first step towards that direction. While we will probably not solve this particular issue with v5.0.0, my goal would be to get #278 accepted for v5.0.0, since it breaks some of the API, but lays the groundwork to implement the rest of it without breaking the API again.

oxisto · 2023-03-31T06:09:15Z

@mattrobenolt @dcalsky Now that this should be possible with v5 (at least for b64), would any of you be interested in drafting a PR?

dcalsky · 2023-04-02T10:36:52Z

@mattrobenolt @dcalsky Now that this should be possible with v5 (at least for b64), would any of you be interested in drafting a PR?

I'd like to. I'll submit the PR after ending my vacation.

orangeswim · 2024-10-14T06:56:40Z

Hi everyone. Does this issue need a new PR to keep it moving for custom Json encoder/decoder?
echo repo did something similar, labstack/echo#1880

oxisto · 2024-10-14T10:07:24Z

Hi everyone. Does this issue need a new PR to keep it moving for custom Json encoder/decoder? echo repo did something similar, labstack/echo#1880

It is actually already part of #301. Progress on this is slow for two reasons:

The PR was since reworked (largely my be) from the original authors contributions, but some minor things (mostly documentation, cleanup) is still missing, it is a little bit unclear who is actually doing that (or should do that), probably me 🤣
@mfridman was not 100 % convinced of the benefits of this feature in contrast to the inner reworking that we need to do; I wanted to do some follow up investigation whether we have any detrimental effects of this (e.g. on performance), but I also didn't do that (yet)

This was referenced Feb 21, 2023

Remove package globals #277

Closed

Moving DecodeSegement to Parser #278

Merged

dcalsky linked a pull request Apr 2, 2023 that will close this issue

Support custom json and base64 encoders for Token and Parser #301

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Supporting alternative base64/json encoder/decoder #168

Supporting alternative base64/json encoder/decoder #168

mattrobenolt commented Feb 17, 2022

oxisto commented Feb 17, 2022 •

edited

Loading

mattrobenolt commented Feb 17, 2022

oxisto commented Feb 17, 2022 •

edited

Loading

dcalsky commented Mar 4, 2022 •

edited

Loading

oxisto commented Feb 21, 2023

oxisto commented Mar 31, 2023

dcalsky commented Apr 2, 2023

orangeswim commented Oct 14, 2024

oxisto commented Oct 14, 2024

Supporting alternative base64/json encoder/decoder #168

Supporting alternative base64/json encoder/decoder #168

Comments

mattrobenolt commented Feb 17, 2022

oxisto commented Feb 17, 2022 • edited Loading

mattrobenolt commented Feb 17, 2022

oxisto commented Feb 17, 2022 • edited Loading

dcalsky commented Mar 4, 2022 • edited Loading

oxisto commented Feb 21, 2023

oxisto commented Mar 31, 2023

dcalsky commented Apr 2, 2023

orangeswim commented Oct 14, 2024

oxisto commented Oct 14, 2024

oxisto commented Feb 17, 2022 •

edited

Loading

oxisto commented Feb 17, 2022 •

edited

Loading

dcalsky commented Mar 4, 2022 •

edited

Loading