
simd_json::from_reader (and from_slice) end in error when trying to deserialize to a serde_json::RawValue #365

Open
bassmanitram opened this issue Jan 24, 2024 · 4 comments

Comments

@bassmanitram

The error is "invalid type: newtype struct, expected any valid JSON value".

The use case is demonstrated by this gist. Switching lambda_http back to version 0.8 makes the use case work.

The difference is that the AWS code now deserializes to a RawValue, where it previously deserialized to a serde::__private::de::Content. The new code is obviously far cleaner, but it doesn't work with the simd_json deserializer.

Any help would be appreciated.

(Equivalent issue on the AWS side: awslabs/aws-lambda-rust-runtime#792)

@Licenser
Member

I'm currently on vacation, so I won't have time to look into this deeply for a while, but at first glance I think RawValue is the issue. Unrelated to that, this approach is going to be fairly slow, so there isn't much benefit to be had from the SIMD decode.

The slowness comes from the bytes first going into a nested structure (RawValue), which is expensive, and then being translated into a struct through the serde traits, which is expensive again.

If you're after performance, I'd suggest using to_tape, introspecting the content to determine the type, and then turning that into a struct. That would have no overhead compared to translating it directly into a struct.
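A minimal, std-only sketch of the "introspect the tape, then build the struct" idea follows. It assumes a hypothetical, simplified flat tape: `Node`, `Object { len }`, `get_str`, `classify` and the request types are all illustrative names, not simd_json's actual tape types or API, and the discriminating keys `rawPath` and `path` are assumptions chosen only for the example. The point is that the type is chosen by peeking at keys, and the struct is filled directly from borrowed tape data without an intermediate value tree.

```rust
// Hypothetical, simplified flat tape. This is NOT simd_json's real Tape/Node layout.
#[derive(Debug, PartialEq)]
enum Node<'a> {
    Object { len: usize }, // number of key/value pairs that follow the header
    Str(&'a str),
    Num(f64),
}

#[derive(Debug, PartialEq)]
struct V1Request<'a> {
    path: &'a str,
}

#[derive(Debug, PartialEq)]
struct V2Request<'a> {
    raw_path: &'a str,
}

#[derive(Debug, PartialEq)]
enum Request<'a> {
    V1(V1Request<'a>),
    V2(V2Request<'a>),
}

/// Find the string value stored under `key` in a flat (non-nested) object.
/// Pairs follow the header as key, value, key, value, ...; non-string values are skipped.
fn get_str<'a>(tape: &[Node<'a>], key: &str) -> Option<&'a str> {
    let len = match tape.first()? {
        Node::Object { len } => *len,
        _ => return None,
    };
    let pairs = tape.get(1..1 + 2 * len)?;
    pairs.chunks(2).find_map(|kv| match kv {
        [Node::Str(k), Node::Str(v)] if *k == key => Some(*v),
        _ => None,
    })
}

/// Introspect which keys are present, then construct the matching struct directly.
fn classify<'a>(tape: &[Node<'a>]) -> Option<Request<'a>> {
    if let Some(raw_path) = get_str(tape, "rawPath") {
        Some(Request::V2(V2Request { raw_path }))
    } else {
        get_str(tape, "path").map(|path| Request::V1(V1Request { path }))
    }
}
```

For example, a tape for `{"version":"2.0","statusCode":200,"rawPath":"/hello"}` would classify as `Request::V2` with `raw_path` borrowing `"/hello"` straight from the input. A real version would walk simd_json's own tape, which also has to handle nesting.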

@bassmanitram
Author

Hey, have a good vacation. Yeah, I spotted the slowness of the approach too and am talking to the AWS guys about it. Thanks for the tape tip!

I have narrowed the problem down to this basic code that fails (yes, it is RawValue):

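	use serde::Deserialize;
	use serde_json::value::RawValue; // needs serde_json's "raw_value" feature

	let mut payload3 = "{}".to_string();
	// simd_json parses in place, so it takes a mutable byte slice
	let payload3 = unsafe { payload3.as_bytes_mut() };
	//let mut de = serde_json::Deserializer::from_slice(payload3);
	let mut de = simd_json::Deserializer::from_slice(payload3).unwrap();
	let raw_value: Box<RawValue> = Box::deserialize(&mut de).expect("Boxed RawValue");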

As written, it fails. Uncomment the serde_json line and comment out the simd_json one, and it works.

@bassmanitram
Author

With respect to speed, you'd be surprised. Well, I was! OK, it's not using a tape, but String->RawValue->&str->type is by FAR the fastest approach with serde_json. Compared with AWS v0.8, which used serde::__private::de::Content::deserialize, and with an attempt I made to speed things up that just used serde_json::Value, the RawValue route came out well ahead. Trying the simd_json deserializer in the two of those three contexts where it works also proved slower than serde_json... back to the drawing board, then.

@bassmanitram
Author

bassmanitram commented Jan 26, 2024

OK, I've got a tape and it should be able to do what I want... but I don't see how to deserialize it into the target struct. I'm hoping there is a bridge to serde_json here?

It seems what I need is to reconstruct a Deserializer from the tape. Since the Deserializer is simply a tape and an index, would it be possible to add a from_tape constructor? Even if it's unsafe (i.e. I have to be certain the tape is valid before using it, which in my case I would be, because I just got the tape from a deserializer)?

Any help will be appreciated.
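To make the request concrete, here is a std-only sketch of a cursor that is "simply a tape and an index", with a hypothetical `unsafe` `from_tape` constructor. Every name here (`Node`, `TapeCursor`, `from_tape`, `next_node`, `read_str_array`) is an assumption for illustration; none of it is existing simd_json API, and the tape layout is a simplified stand-in for simd_json's real one.

```rust
// Hypothetical, simplified tape node. This is NOT simd_json's real node type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Node<'a> {
    Array { len: usize }, // number of elements that follow
    Str(&'a str),
}

/// A deserializer-like cursor: just a borrowed tape plus a position.
struct TapeCursor<'t, 'a> {
    tape: &'t [Node<'a>],
    idx: usize,
}

impl<'t, 'a> TapeCursor<'t, 'a> {
    /// Hypothetical `from_tape` constructor.
    ///
    /// # Safety
    /// The caller promises `tape` is well formed (e.g. it came straight from a parser).
    /// No validation is performed here.
    unsafe fn from_tape(tape: &'t [Node<'a>]) -> Self {
        TapeCursor { tape, idx: 0 }
    }

    /// Return the node at the current index and advance past it.
    fn next_node(&mut self) -> Option<Node<'a>> {
        let node = *self.tape.get(self.idx)?;
        self.idx += 1;
        Some(node)
    }
}

/// Example consumer: read an array of strings, the way a serde visitor would for Vec<&str>.
fn read_str_array<'a>(cur: &mut TapeCursor<'_, 'a>) -> Option<Vec<&'a str>> {
    let len = match cur.next_node()? {
        Node::Array { len } => len,
        _ => return None,
    };
    (0..len)
        .map(|_| match cur.next_node()? {
            Node::Str(s) => Some(s),
            _ => None,
        })
        .collect()
}
```

In this sketch `unsafe` only documents the caller's promise. Because `next_node` uses bounds-checked `get`, a malformed or truncated tape yields `None` rather than undefined behavior; an implementation that skipped those checks for speed would genuinely depend on the promise.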
