pq writer can become stuck after precisely filling queue max events #16172
We can exhibit this in a pipeline-to-pipeline setup with a simpler config.
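A hypothetical minimal pipeline-to-pipeline pair along these lines (plugin choices and sizes are illustrative, not the reporter's actual config; the key point is the downstream queue's `queue.max_bytes` equal to its `queue.page_capacity`) might look like:

```yaml
# pipelines.yml — hypothetical reproduction sketch, values illustrative
- pipeline.id: upstream
  config.string: |
    input { generator { } }
    output { pipeline { send_to => ["downstream"] } }
- pipeline.id: downstream
  queue.type: persisted
  queue.page_capacity: 64mb
  queue.max_bytes: 64mb   # equal to page_capacity: the edge case described here
  config.string: |
    input { pipeline { address => "downstream" } }
    output { stdout { codec => dots } }
```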
yaauie added a commit to yaauie/logstash that referenced this issue (May 21, 2024):

> A PQ is considered full (and therefore needs to block before releasing the writer) when its persisted size on disk _exceeds_ its `queue.max_bytes` capacity. This removes an edge-case preemptive block when the persisted size after writing an event _meets_ its `queue.max_bytes` precisely AND its current head page has insufficient room to also accept a hypothetical future event. Fixes: elastic#16172
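The heart of the fix is the comparison used in the "is full" predicate. A simplified sketch of the before/after semantics (names are illustrative, not the actual `Queue.java` API):

```java
// Hypothetical simplification of the PQ capacity check described in the
// commit message above; method and constant names are illustrative.
final class PqFullCheck {
    static final long MAX_BYTES = 64L * 1024 * 1024; // queue.max_bytes (64MiB)

    // Before the fix: the queue counted as full when the persisted size
    // *met* capacity, so a write that exactly filled the queue could
    // block the writer preemptively.
    static boolean isFullBefore(long persistedBytes) {
        return persistedBytes >= MAX_BYTES;
    }

    // After the fix: full only when the persisted size *exceeds* capacity,
    // so a precisely-full queue does not park the writer.
    static boolean isFullAfter(long persistedBytes) {
        return persistedBytes > MAX_BYTES;
    }
}
```

At exactly `queue.max_bytes` persisted, the old predicate blocks while the new one does not; both agree once capacity is actually exceeded.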
yaauie changed the title from "singleton pq writer can become stuck after precisely filling head page" to "pq writer can become stuck after precisely filling queue max events" (May 21, 2024).
yaauie added a commit that referenced this issue (May 22, 2024), landing the fix above along with a docs change: PQ `queue.max_bytes` cannot be less than `queue.page_capacity`.
github-actions bot pushed a commit that referenced this issue (May 22, 2024): the same fix, cherry picked from commit ea93086.
Logstash information:

- Logstash version (`bin/logstash --version`): 8.8.0..main
- Plugins installed (`bin/logstash-plugin list --verbose`): N/A
- JVM (e.g. `java -version`): bundled, but N/A
- OS version (`uname -a` if on a Unix-like system): macOS, but N/A
Description of the problem including expected versus actual behavior:

When a pipeline is configured with a `queue.max_bytes` that matches its `queue.page_capacity`, and writes an event whose serialized form precisely fits in the remaining capacity of the queue, the single writer can become blocked indefinitely.

After writing the serialized bytes to the head page, if there is no room for more events, the writer enters a loop in which it blocks until notified on the `notFull` condition variable, repeating the loop until there is room for more bytes. The `notFull` condition is only signaled in a few specific circumstances, and signalling it is not enough: the queue will remain "full" (insufficient capacity to write a single byte) until capacity is recovered by a tail page becoming fully-acked and removed.

In some conditions a second writer could arrive and effectively break the block by rolling the current head page over into a tail page before the events had been acknowledged (temporarily exceeding `queue.max_bytes`), but once that window is missed the block is permanent.

Steps to reproduce:
- Run a pipeline configured as described above, writing events that retain `event.original`.
- Use `watch` to observe the PQ until its `queue_size_in_bytes` exactly matches its `max_queue_size_in_bytes`, observing that the events also stop flowing in a state where (a) the queue backpressure is complete and (b) the workers are functionally starved.
Workarounds

I have thus far been unable to replicate the issue when `queue.max_bytes` is at least _double_ `queue.page_capacity` (whose default is 64MiB, or `67108864`).