Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: Skip req and beresp trailers #4125

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

dridi
Copy link
Member

@dridi dridi commented Jun 18, 2024

This is the first step of the following plan:

  • process but drop incoming trailers to preserve HTTP framing
  • forward verbatim trailers for pass transactions
    • guarded by an experimental::pass_trailers flag
  • manipulate trailers in VCL

The first step, this pull request, should allow applications producing trailers to work with Varnish when the trailers are not strictly required.

The second step should allow protocols like gRPC to pass through Varnish, with the caveats that body filters could break the meaning of trailers (for example checksums), hence the experimental flag.

It will require internal changes, and from the look of it we may not be able to directly use the existing data structures and we may need to a dedicated OA_TRAILERS object attribute.

The last step will likely require changes to the VCL state machines in addition to new symbols.

Most of the work on trailers was done by @walid-git under my supervision.

This pull request includes work from other pull requests too:

This is a lot of commits, but they should all have a reasonable size for reviewers. I already reviewed Walid's work, and already addressed my own review items while Walid is away. I added Walid or myself as a co-author when I made significant changes (at the scale of individual commits).

@dridi
Copy link
Member Author

dridi commented Jun 18, 2024

All failing platforms appear to overflow the workspace used in the h2 test case, except the sanitizers job that fails on something else.

I will have a closer look later.

@daghf daghf self-requested a review June 18, 2024 14:52
@dridi
Copy link
Member Author

dridi commented Jun 24, 2024

As it turned out, the failing tests were exposing a race revolving around the request body status. Transitioning to BS_TAKEN would compete against the new transition to BS_TRAILERS (happening in different threads in the h2 case).

I took care of that and added a couple patches next to the ones picked from #3798 at the beginning of this patch series. I'm no longer able to stress test cases and cause failures and I believe the race is gone.

I have not yet studied the remaining CI failure, triggering a panic from the workspace emulator. I suspect either a logic error in the workspace pipelining (expanding from req keep-alive only to req and beresp trailers) or an error in how it translates in the workspace emulator.

I also started an effort to improve feature parity between HTTP/1 and h2 txreq in varnishtest but this is not ready.

@dridi
Copy link
Member Author

dridi commented Jun 25, 2024

As I suspected, the emulated WS_Pipeline() was incorrect. There was a missing workspace release too.

@dridi
Copy link
Member Author

dridi commented Jul 2, 2024

bugwash:

The following commits can be merged: 23ec1fc...9875258

The workspace refactoring commits can be submitted independently.

@bsdphk
Copy link
Contributor

bsdphk commented Jul 2, 2024

I'm fine with you commiting those first six commits while we wait for @nigoroll

@dridi
Copy link
Member Author

dridi commented Jul 3, 2024

As per bugwash:

nigoroll and others added 13 commits July 8, 2024 19:26
Until now, we read the (CR)?LF at the end of a chunk as part of the
next chunk header (see: /* Skip leading whitespace */).

For a follow up commit, we are going to want to know if the next chunk
header is available for read, so we now consume the chunk end as part
of the chunk itself.

This also fixes a corner case: We previously accepted chunks with a
missing end-of-chunk (see fix of r01729.vtc).

Ref: https://datatracker.ietf.org/doc/html/rfc7230#section-4.1
... which we are going to need in a follow up commit.

No functional changes, diff best viewed with -b
While working on request body caching improvements, it was noticed
that, for chunked encoding, we did not know the request body size
before we attempted to read at least one additional byte because
the chunked fetch did not return VFP_END with the last chunk, but
rather only as a zero-length additional chunk.

This commit changes chunked encoding processing to return VFP_END
opportunistically if the next chunk header is available.

Implementation:

Unless there is pipeline readahead data available, the test for
available data implies a poll(2) call. Relative to the existing cost
of the current implementation, this cost is not considered relevant.

To improve efficiency, we should consider a generic readahead for
chunked encoding, probably by making the file descriptor non blocking
and adding a readahead buffer.
It seems this test now shows no data loss more frequently, which I
hope should be fine for the purpose of the test?
Now that WS_Pipeline() can be used on backend side too, we may have
situations where we run out of workspace when copying pipelined data.

We should gracefully handle these failures by failing the task instead
of panicking. For existing fail-safe call sites, we check the result.
This will be the method responsible for reading and parsing http trailers.
This commit only prepares the structure, the method is implemented in follow
up commits.
This will allow us to check that we have received a complete end of
chunked body, possibly including HTTP traillers.

Co-authored-by: Dridi Boukelmoune <[email protected]>
It will be used to convey that the body was fully fetched and trailers
are expected or were already encountered.
For now, we simply do nothing about them.
It cannot be reused for req headers, but it will apply to beresp and req
trailers.
At this point trailer fields are just checked for correctness and
discarded.

Co-authored-by: Walid Boudebouda <[email protected]>
dridi and others added 10 commits July 8, 2024 19:26
It is no longer the responsibility of the VFP to read the final
(CR)?LF in chunked bodies, because trailers are not part of the
request or response body. For backend responses trailers are
handled separately.

The only reason why this does not break keep-alive for the client
is that (CR)?LF sequences are legitimately ignored between requests,
so an empty trailer list just turns into a no-op (CR)?LF for the
next client request.

This should also be the case for backend responses, but having extra
unread empty lines breaks connection pooling instead. We can probably
as part of recycling connections clear remaining (CR)?LF sequences
and close the connection with SC_JUNK if something remains.
This is not allowed for HEADERS frames containing trailers.
This is currently the only state where HEADERS frames are processed, but
only a subset of the operations will be shared by trailers.

Better diff with the --ignore-all-space option.
It will not necessarily be a new stream once trailers are involved.
It is now possible to receive a HEADERS frame on an OPEN stream. The
HPACK block is processed but trailers are skipped. Therefore a workspace
overflow is not a failure. Trailers are processed from the h2 session,
and the request might be processed concurrently in a different worker,
so we can't use the request workspace, even as a temporary buffer.

Instead of relying on scratch space on the stack, we can use the h2
session's req workspace. This will make things easier once we collect
trailers instead of merely skipping them.
@dridi
Copy link
Member Author

dridi commented Jul 8, 2024

Force-pushed to rebase against trunk after merging #4130.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants