
Set of features for Thoth.Json v5 #56

Open · MangelMaxime opened this issue Feb 10, 2020 · 21 comments

@MangelMaxime

  • Move the path construction to the error case only (see the sketch after this list)
  • Make Thoth.Json(.Core) parser agnostic. The idea is to have all the decoder/encoder logic written once, with a contract to fill in order to change the parser. That way, people could use Newtonsoft, the native JavaScript API, their own parser, Parsimmon, etc. depending on their preferences
  • Force the encoder/decoder to be in sync
  • Allow the user to override the standard decoders. For example, perhaps they want to represent DateTime in a specific manner
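
A minimal sketch of the first point, using hypothetical types (not the actual Thoth.Json internals): decoders return a structured reason on failure, and the path string is only assembled when an error is rendered, so nothing is paid on the success path.

```fsharp
type ErrorReason =
    | BadPrimitive of expected: string * got: string
    | BadField of field: string * inner: ErrorReason

type Decoder<'T> = obj -> Result<'T, ErrorReason>

let stringDecoder : Decoder<string> =
    fun value ->
        match value with
        | :? string as s -> Ok s
        | other -> Error(BadPrimitive("a string", string other))

// Field access is elided; the point is that the path wrapping only
// happens on the error path, Ok values pass through untouched.
let atField (name: string) (decoder: Decoder<'a>) : Decoder<'a> =
    fun value ->
        decoder value |> Result.mapError (fun reason -> BadField(name, reason))

// The full path (e.g. ".user.name") is only built when formatting the error.
let rec renderPath (reason: ErrorReason) : string =
    match reason with
    | BadField(name, inner) -> "." + name + renderPath inner
    | BadPrimitive _ -> ""
```
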
@MangelMaxime

Should we add the possibility to support different representations for the types? (see #57)

It should already be possible to "override the standard decoder" with a Decode.oneOf inside it, but we can/want to provide a cleaner and easier API for this case.
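
For illustration, here is that oneOf pattern with the current manual API: a DateTime decoder that accepts either an ISO-8601 string or a Unix timestamp (a sketch; flexibleDateTime is a made-up name):

```fsharp
open Thoth.Json

let flexibleDateTime : Decoder<System.DateTime> =
    Decode.oneOf [
        // Standard representation: ISO-8601 string
        Decode.datetime
        // Alternative representation: Unix timestamp in seconds
        Decode.int64
        |> Decode.map (fun seconds ->
            System.DateTimeOffset.FromUnixTimeSeconds(seconds).UtcDateTime)
    ]
```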

@MangelMaxime MangelMaxime pinned this issue Feb 14, 2020
@MangelMaxime MangelMaxime changed the title Set of feature for Thoth.Json v5 Set of features for Thoth.Json v5 Feb 14, 2020
@MangelMaxime

Add a way to not crash the encoding/decoding process if no coder is found for a type. This is needed for Elmish.Debugger, because people can store anything in their model and it's not really friendly to ask them to provide encoders/decoders just to use a dev tool. (see #59)

Make a strong note in the documentation that this feature is really risky, because it will weaken Thoth.Json's security when this flag is active.

njlr commented Mar 5, 2020

+1 for not unifying the parser into one. I had issues where JSON parsers implemented on the JS side were not as performant as the native JSON.parse.

MangelMaxime commented Mar 5, 2020

@njlr It will never be as performant as the native JSON.parse because we are calling it and then checking the validity of the JSON data.
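
Rough shape of what the Fable-side runner does (a simplification; error handling for malformed JSON is elided):

```fsharp
open Fable.Core
open Thoth.Json

let parseThenValidate (decoder: Decoder<'T>) (json: string) : Result<'T, string> =
    // Step 1: the native parser does the raw parsing work...
    let parsed = JS.JSON.parse json
    // Step 2: ...and Thoth.Json walks the result to validate it,
    // which is the extra cost on top of plain JSON.parse.
    Decode.fromValue "$" decoder parsed
```
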

@MangelMaxime

Today I read this article and it made me think once again about how we deal with JSON.


I never was a big fan of the Auto modules.

The implementation is hard to maintain, and people want complete control over what it generates, customizing just the things they don't like by default or that need to be handled differently depending on the case.

Auto modules are also necessarily slower than manual decoders. A manual decoder just performs the actions needed for the conversion, while auto coders evaluate a lot of conditions, use reflection, etc. before doing the conversion.

Unfortunately, people seem to prefer the auto decoders over the manual decoders because they hide the complexity and/or the boilerplate code. At least, that was the reason that led to their creation.

I think I will remove the auto module in a future release, probably the one where I refactor the whole package and try to mutualize the code.

That said, if I am to remove it, I do agree that writing the coders manually is not fun. For this reason, I will seriously explore the code generation that I have left on the side for a long time.

Code generation will allow people to not write "dummy" manual coders when the default generation does the job for them. And if they need to customize it, they can deactivate the code generation and copy/paste/modify the generated code to fit their needs.
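
To make the idea concrete, here is what the generated coders could look like for a simple record, using the manual API (a sketch of the intent, not actual generator output):

```fsharp
open Thoth.Json

type User = { Name: string; Age: int }

module User =
    // A "dummy" manual decoder: exactly what a user would write by hand.
    let decoder : Decoder<User> =
        Decode.object (fun get ->
            { Name = get.Required.Field "name" Decode.string
              Age = get.Required.Field "age" Decode.int })

    // The matching encoder, kept in sync by the generator.
    let encoder (user: User) =
        Encode.object [
            "name", Encode.string user.Name
            "age", Encode.int user.Age
        ]
```
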

@alexswan10k

I wanted to just briefly weigh in on this, as I am currently using both approaches for different reasons.

For serializing to/from persistent stores, the manual encoders are the only way imo. They give you total control, and this has enabled us to use a versioning system so that historic JSON payloads with different shapes can be automatically transposed onto the new model structure, saving the need to mess around with running data migrations and all that pain.
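
One way such a versioning scheme can look with manual decoders (a sketch with made-up types, not the poster's actual code):

```fsharp
open Thoth.Json

type Contact = { Email: string; Phone: string option }

// Historic v1 payloads were a bare email string.
let decoderV1 : Decoder<Contact> =
    Decode.string |> Decode.map (fun email -> { Email = email; Phone = None })

// Current v2 payloads are objects with an optional phone number.
let decoderV2 : Decoder<Contact> =
    Decode.object (fun get ->
        { Email = get.Required.Field "email" Decode.string
          Phone = get.Optional.Field "phone" Decode.string })

// Try the newest shape first, then transpose older shapes onto the new model.
let contactDecoder : Decoder<Contact> = Decode.oneOf [ decoderV2; decoderV1 ]
```
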

However, going from server to client, we have quite a complicated domain model, and using auto coders is an extremely easy way to pass symmetrical payloads to and from the client without doing any grunt work. This is especially handy when you have shared request/response typed payloads that are forever changing. I totally appreciate that if one of the two goes out of sync, you're going to get catastrophic explosions, but our release cadence mandates that the client always goes out with the server, so this is just not a problem for us (and in my experience this is not uncommon unless you're doing microservices).

It would be a shame to entirely lose auto coders, even though they are probably abused far too often. Perhaps they could exist in a standalone package?

MangelMaxime commented May 16, 2020

Thank you for sharing @alexswan10k

And I do agree, if auto-coders are to be kept during the big rewrite they will have to go in a separate package.

Edit:

I think I need to accept that auto coders are a thing and that they are one of the library's selling points. But I need to re-think the implementation, the set of features, and also the documentation, in order to explain why/when to use them and what is supported, so people know exactly what they can do with them.

@MangelMaxime

Today, I started prototyping the next version of Thoth.Json.

I wanted to check if my idea of using an interface to abstract the implementation details about which JSON parser is being used could work. And it seems like yes 🎉 🎉

Here are the implementations for the Decode modules:

As we can see the code required to add a new parser is quite small :)

My theory is that we will have this architecture:

  • Thoth.Json.Core -> library which provides and uses the abstract interface IDecoderHelpers<'JsonValue> to implement the JSON-manipulation logic
  • Thoth.Json.Fable | Thoth.Json.Newtonsoft | Thoth.Json.XXX -> libraries which implement the interface IDecoderHelpers<'JsonValue> using a specific framework and also provide the "runners" Decode.fromString, Decode.unsafeFromString and Decode.fromValue.

The next step for me is to mutualize the tests between the different implementations, because right now I duplicated them locally for testing purposes. Then, I will take the time to re-activate all the decoders, because I disabled some of them that were a bit more complex to port; I preferred to focus on building a small prototype first.

Mutualizing the tests should show us what consuming Thoth.Json from different parsers will look like. In theory, we should not see the compiler directives anymore :D

I am also introducing another repository called https://github.com/thoth-org/Thoth.Json.Meta which aims to use submodules in order to keep each project in its own repository while still allowing them to stay in sync.

It will probably become more generic to the whole Thoth ecosystem at some point, because Thoth.Fetch could be added too. The idea behind this "meta" repository is to solve the problem of sharing the code/tests/etc. between the different implementations.
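
For reference, the wiring of the meta repository could look something like this .gitmodules file (the exact repository list is an assumption):

```ini
[submodule "Thoth.Json"]
    path = Thoth.Json
    url = https://github.com/thoth-org/Thoth.Json.git
[submodule "Thoth.Fetch"]
    path = Thoth.Fetch
    url = https://github.com/thoth-org/Thoth.Fetch.git
```
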

@MangelMaxime

Hum, I am facing a problem...

I had to use a generic 'JsonValue in order to make the implementation parser agnostic, but the generic needs to propagate to the outer scope.

This leads to code like this:

```fsharp
let levelOfAwesomenessDecoder<'JsonValue> =
    Decode.string<'JsonValue>
    |> Decode.andThen (fun value ->
        match value with
        | "MaxiAwesome" -> Decode.succeed MaxiAwesome
        | "MegaAwesome" -> Decode.succeed MegaAwesome
        | _ ->
            sprintf "`%s` is an invalid value for LevelOfAwesomeness" value
            |> Decode.fail
    )
```

Otherwise, the decoder cannot be used with a ProjectReference. It would work if people used file references, but that's not really idiomatic.

And this code doesn't work (at least I don't know how to make the generic propagate correctly):

[screenshot of the non-compiling code]

A solution could be to not use modules/functions to declare the Decode and Encode features, but to use classes instead. However, I think the code would then look like this:

```fsharp
let levelOfAwesomenessDecoder<'JsonValue> =
    Decode<_>.string
    |> Decode<_>.andThen (fun value ->
        match value with
        | "MaxiAwesome" -> Decode.succeed MaxiAwesome
        | "MegaAwesome" -> Decode.succeed MegaAwesome
        | _ ->
            sprintf "`%s` is an invalid value for LevelOfAwesomeness" value
            |> Decode.fail
    )
```

Which is really ugly

I think I need to explore new ideas in order to make the implementation parser agnostic and cross-platform.

MangelMaxime commented May 29, 2020

New idea:

Disclaimer: I am writing my thoughts down because it helps me shape my ideas and not forget them, so perhaps it is not always well explained, sorry about that. As soon as I have something working, I will make a proper post to ask for feedback, as I do in general.

I think my current rewrite is trying to solve several things at the same time:

  1. Make contribution easier, especially for the tests of Thoth.Json.Net
    -> Use a meta-repository with submodules to organise the project structure

  2. Mutualize the code between the different implementations of Thoth.Json
    -> I tried using generics because it looked like the solution, but it didn't work out

  3. Make it easier to write parser-agnostic code
    -> Related to point 2

  4. Improve the performance by moving the path construction to the error path instead of the success case
    -> This is just an implementation detail, and it is working in my local repo

  5. Try to improve the user experience by removing the compiler directives
    -> Related to point 2

The goal of using generics was to have a contract:

```fsharp
type IDecoderHelpers<'JsonValue> =
    abstract GetField : FieldName : string -> value : 'JsonValue -> 'JsonValue
    abstract IsString : value : 'JsonValue -> bool
    abstract IsBoolean : value : 'JsonValue -> bool
    abstract IsNumber : value : 'JsonValue -> bool
    abstract IsArray : value : 'JsonValue -> bool
    abstract IsObject : value : 'JsonValue -> bool
    abstract IsNullValue : value : 'JsonValue -> bool
    abstract IsIntegralValue : value : 'JsonValue -> bool
    abstract IsUndefined : value : 'JsonValue -> bool
    abstract AnyToString : value : 'JsonValue -> string

    abstract ObjectKeys : value : 'JsonValue -> string seq
    abstract AsBool : value : 'JsonValue -> bool
    abstract AsInt : value : 'JsonValue -> int
    abstract AsFloat : value : 'JsonValue -> float
    abstract AsFloat32 : value : 'JsonValue -> float32
    abstract AsString : value : 'JsonValue -> string
    abstract AsArray : value : 'JsonValue -> 'JsonValue[]
```

where each implementation could implement the contract. The problem, as explained in my previous message, is that the generics escape the bounds of the parser-specific library.
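
For illustration, filling the contract on the Newtonsoft side could look like this (a sketch against the interface above, not the actual Thoth.Json.Newtonsoft source):

```fsharp
open Newtonsoft.Json.Linq

type NewtonsoftHelpers() =
    interface IDecoderHelpers<JToken> with
        member _.GetField fieldName value = value.[fieldName]
        member _.IsString value = value.Type = JTokenType.String
        member _.IsBoolean value = value.Type = JTokenType.Boolean
        member _.IsNumber value =
            value.Type = JTokenType.Integer || value.Type = JTokenType.Float
        member _.IsArray value = value.Type = JTokenType.Array
        member _.IsObject value = value.Type = JTokenType.Object
        member _.IsNullValue value = value.Type = JTokenType.Null
        member _.IsIntegralValue value = value.Type = JTokenType.Integer
        // JSON has no undefined; mostly relevant for the JavaScript side.
        member _.IsUndefined value = value.Type = JTokenType.Undefined
        member _.AnyToString value =
            value.ToString(Newtonsoft.Json.Formatting.Indented)
        member _.ObjectKeys value =
            (value :?> JObject).Properties() |> Seq.map (fun p -> p.Name)
        member _.AsBool value = value.ToObject<bool>()
        member _.AsInt value = value.ToObject<int>()
        member _.AsFloat value = value.ToObject<float>()
        member _.AsFloat32 value = value.ToObject<float32>()
        member _.AsString value = value.ToObject<string>()
        member _.AsArray value = (value :?> JArray) |> Seq.toArray
```
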

If we think in terms of macros, generics are just a compiler feature which allows us to re-use code. In my case, the goal was to have Thoth.Json.Core, Thoth.Json.Fable, Thoth.Json.Newtonsoft, etc.

What if we went back a bit and integrated Thoth.Json.Core directly into each of the library-specific implementations?

The idea in that case would be to share the code not through generics but by referencing the implementation files directly, and just having the "contract" written in another file.

Something like this:

```xml
<!-- Thoth.Json.Fable -->
<?xml version="1.0" encoding="utf-8"?>
<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <Compile Include="Contract.fs" /> <!-- The file would probably be called Helpers.fs -->
    <Compile Include="Thoth.Json.Core/Shared/Decode.fs" />
    <Compile Include="Thoth.Json.Core/Shared/Encode.fs" />
  </ItemGroup>
</Project>
```

```xml
<!-- Thoth.Json.Newtonsoft -->
<?xml version="1.0" encoding="utf-8"?>
<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <Compile Include="Contract.fs" /> <!-- The file would probably be called Helpers.fs -->
    <Compile Include="Thoth.Json.Core/Shared/Decode.fs" />
    <Compile Include="Thoth.Json.Core/Shared/Encode.fs" />
  </ItemGroup>
</Project>
```

The idea is that files under Thoth.Json.Core/Shared/* are re-used directly inside the different libraries. The end user would still need to use compiler directives in their shared code, but at least the implementation would be the same for each specific library.

We could always use a few compiler directives inside the shared implementation when there are runtime-specific requirements:

[screenshot of a compiler directive inside the shared implementation]

For example, in order to support enums on the Fable runtime, we need to inline the decoder to get the type information used for reflection.
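
An illustrative sketch of such a directive (not the actual source; enumDecoder is a made-up name, and the Decoder type and Decode module come from the manual API):

```fsharp
#if THOTH_JSON_FABLE
// On Fable, `inline` is required so the reflection-based conversion
// sees the concrete enum type at the call site.
let inline enumDecoder<'T when 'T : enum<int>> : Decoder<'T> =
#else
let enumDecoder<'T when 'T : enum<int>> : Decoder<'T> =
#endif
    Decode.int
    |> Decode.map (fun value -> LanguagePrimitives.EnumOfValue value)
```
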

This approach is similar to how it is done today. But today, we kind of copy/paste the code between the repos and adapt the code in Thoth.Json.Net, especially for the Auto module. The goal would be to have as much of the implementation shared as possible, so adding support for a new parser would "just" be about implementing the Contract.fs file and releasing a new package (and of course running the tests, etc.).

@MangelMaxime

Work for the next version of Thoth.Json has started in #76

The goal of this work is to mutualize as much of the code as possible between the different implementations. Right now, I have ported the code for:

  • Thoth.Json.Fable -> uses the native JSON.parse API
  • Thoth.Json.Newtonsoft -> equivalent of Thoth.Json.Net, which uses the Newtonsoft API

The goal of mutualizing the code is that it should be possible to add a new target, like JSON.Net, if someone prefers this library or for performance reasons.

These packages are already working and follow the same philosophy as before for sharing code: users currently need to use the compiler directives THOTH_JSON_FABLE and THOTH_JSON_NEWTONSOFT.
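
In practice, shared user code selects an implementation like this (a sketch of the pattern, using the directive names above):

```fsharp
#if THOTH_JSON_FABLE
open Thoth.Json.Fable
#endif

#if THOTH_JSON_NEWTONSOFT
open Thoth.Json.Newtonsoft
#endif

// From here on, the Decode/Encode code is identical for both targets.
let nameDecoder = Decode.field "name" Decode.string
```
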

Current thought:

One goal of the new rewrite was also to make it easier to share code. For this reason, I plan on converting the Thoth.Json package to be platform-independent (#38) so people can share code using NuGet and avoid compiler directives.

However, the performance will probably not be as good as when using the native JSON.parse, Newtonsoft, or any performance-specialized JSON parser. This package will be there to provide easy-to-use code sharing.

If people really need a highly performant parser, they will need to use a dedicated package with compiler directives. I think that for most users it should be fine to use Thoth.Json.

@MangelMaxime MangelMaxime added this to the Beyond milestone Jul 29, 2021
@laurentpayot

Hi, I’m using a web worker to transmit JS objects (MessageEvent.data) to Fable. Thanks to the structured clone algorithm, these objects should be the same as if they were the result of a successful JSON.parse() operation (no functions, etc.).
But to leverage all the great Thoth.Json features (structure validation, getting F# types), I first have to convert my JS object to a JSON string, then use Decode.fromString, which will internally convert the JSON string back to a JS object.

Is there a way I could avoid these unneeded JS to JSON + JSON to JS transformations?
Could something like a Decode.fromObj be conceivable?

njlr commented Jun 14, 2024

> Hi, I’m using a web worker to transmit JS objects (MessageEvent.data) to Fable. Thanks to the structured clone algorithm, these objects should be the same as if they were the result of a successful JSON.parse() operation (no functions, etc.). But to leverage all the great Thoth.Json features (structure validation, getting F# types), I first have to convert my JS object to a JSON string, then use Decode.fromString, which will internally convert the JSON string back to a JS object.
>
> Is there a way I could avoid these unneeded JS to JSON + JSON to JS transformations? Could something like a Decode.fromObj be conceivable?

Perhaps worth opening a fresh issue for this?

Decode.value and Encode.value may be what you need.
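
For instance, combined with the Decode.fromValue runner mentioned earlier in the thread, decoding the structured-clone result directly could look like this (a sketch; payloadDecoder is a made-up example):

```fsharp
open Browser.Types
open Thoth.Json

let payloadDecoder : Decoder<string> = Decode.field "message" Decode.string

let onMessage (event: MessageEvent) =
    // No JSON.stringify / JSON.parse round-trip: decode the JS value directly.
    match Decode.fromValue "$" payloadDecoder event.data with
    | Ok message -> printfn "Got: %s" message
    | Error error -> printfn "Decoding failed: %s" error
```
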

@laurentpayot

@njlr I created the dedicated issue #196 for this.

dansowter commented Nov 8, 2024

@MangelMaxime Thanks for your hard work on Thoth.

I'm one of the founders at https://stacktracehq.com/ (small software consultancy) and https://cubcare.com.au/ (my main job; the stack is F# + Elm), and we use Thoth quite heavily (always from dotnet, never Fable). The vast majority of our Encoders/Decoders are manual (human-authored + code-gen), and we use them to serialize over the wire to Elm, and to store our domain events in PostgreSQL. To rehydrate the domain without snapshotting (i.e. replaying all of history), we might deserialize up to 10M rows, and we regularly transcribe all of these events into a new DB table/schema for a few reasons, one of which is so that we can migrate JSON schema versions over time.

I'd like to understand what performance improvements might be possible in our systems if we keep Thoth, but swap Newtonsoft for System.Text.Json, Utf8Json, SpanJson, etc. Ideally, I'd prefer to test this first (on the assumptions that it wouldn't require changes to the encoding schemas, and that it would be helpful to other folks using the library). If it's not enough of an improvement, we'd try alternative JSON approaches or something binary, but would be doing so having explored the cheaper option first (and we could share the performance implications for anyone else here working with large datasets).

Am I correct in understanding that main in this repo has at least some of the changes you're hoping to make for #38, but that today, the two repos are separate, and may have drifted?

We'd be open to either contributing (if you're happy to give us some pointers on how you'd like things laid out) or providing some funding to you or someone you nominate in order to see if these can be reconciled so we can answer the performance questions.

Happy to chat here or over [email protected] for commercials, if you prefer.

Thanks again - it's a great library, especially for folks targeting Elm.

@MangelMaxime

Hello @dansowter,

Sorry for the late answer, I was taking a few days' break.

Thank you for using Thoth.Json and trusting it.

> I'd like to understand what performance improvements might be possible in our systems if we keep Thoth, but swap Newtonsoft for System.Text.Json, Utf8Json, SpanJson, etc.

In general, I expect Thoth.Json to always be slower than using Newtonsoft or System.Text.Json, etc. directly. This is because Thoth.Json does more validation work under the hood than these frameworks, and that is what gives us the good error messages that Thoth.Json provides.

But I never confirmed it with a benchmark; perhaps, because we are using the manual API and not reflection, things are different in reality.

Regarding performance, the new version of Thoth.Json (the unified implementation) should be faster than the previous version, because we moved some of the error-related computation to the error path only.

> Ideally, I'd prefer to test this first (on the assumptions that it wouldn't require changes to the encoding schemas, and that it would be helpful to other folks using the library).
>
> Am I correct in understanding that main in this repo has at least some of the changes you're hoping to make for #38, but that today, the two repos are separate, and may have drifted?

With the new version of Thoth.Json, supporting a new JSON library is much easier. In the past, you needed to fork the library, making it difficult to maintain and also introducing other problems for people using both Fable and .NET.

Now all target libraries have a unified API provided by Thoth.Json.Core, and they just need to implement 2 interfaces and 2-3 functions.

You can see it done for Thoth.Json.Newtonsoft for example.

And the API between Thoth.Json.Net and Thoth.Json.Core + Thoth.Json.Newtonsoft should be the same, at least with regard to the manual API. The Auto API is not yet supported, but you don't use it, so you should be fine. You just need to adjust the open statements.

Thanks to the new architecture, testing out different frameworks and benchmarking them should not be too difficult.

@MangelMaxime

You can see an example of how to implement a new library in this PR #210

It implements System.Text.Json; I will release it later this weekend.

@MangelMaxime

@dansowter

Thoth.Json.System.Text.Json package has been released.

I didn't run any benchmarks on its performance yet.

@MangelMaxime

@dansowter

I did some basic benchmarks and, as expected, Thoth.Json is slower than the standard libraries. But thanks to the benchmarks, I was able to make Thoth.Json.System.Text.Json a little bit faster.

You can see the whole benchmark result here

@dansowter

Hi @MangelMaxime,

Thank you so much for all your recent efforts, both with the code changes and the thoughtful replies you’ve shared. The benchmarks you provided, along with your explanations, have been incredibly helpful in steering our next steps.

After reviewing everything, we've decided to explore code-generation targeting SpanJson directly, rather than continuing to use Thoth in our specific context. Your work has made it clear that while Thoth offers excellent validation and error messages, direct use of SpanJson aligns better with our performance requirements for large-scale deserialization tasks.

That said, I want to express how much we’ve appreciated working with Thoth and the effort you’ve put into making it such a great library. It’s been a critical part of our workflow for a long time, and I’ll continue recommending it to others where the use case fits.

Thank you again for everything!

If there's anything that would be helpful for us to share (some of the code-gen or performance gains, maybe?), please don't hesitate to ask.

@MangelMaxime

@dansowter

Indeed, your use case with over 10M rows is special and definitely falls under the exception to:

> For most of the use cases, Thoth.Json should be fast enough, not everyone needs to parse JSON at the speed of light.

I think Thoth.Json could be made faster if, instead of being an extensible library as it is now with Thoth.Json.Core and Thoth.Json.JavaScript, Thoth.Json.Newtonsoft, etc., it shipped as a single implementation. But this would make the code harder to work with, with a lot of compiler directives, and there could be issues with the dependency list.

If Thoth.Json were to use its own parser, it would solve the dependency-list issue and would allow the reader/parser to be optimised for Thoth.Json's approach to handling JSON. For example, it would not need to parse the JSON structure twice, which is kind of what is happening now.

But I don't think I would be able to compete with the performance of other libraries, as I don't have a lot of experience in optimising code or using Span/byref code.

> If there's anything that would be helpful for us to share (some of the code-gen or performance gains, maybe?), please don't hesitate to ask.

Yes, for sure. If you have something to share regarding your experience with the pros/cons of using Thoth.Json, you can write a blog post somewhere or open an issue, and we can add it to the list of blog posts linked from the readme.
