-
This is a really good idea.
-
I also think this is a really good idea. However, today WASM in Redpanda is a mechanism for transforming one topic into another; the source topic (or topics) are left unmodified. To me, this idea leans on WASM to customize how Redpanda itself behaves in certain situations, in this case compaction. This could be the future of WASM, and it isn't the first time I've heard ideas of this type: I know @rystsov had ideas about using WASM to customize replication strategies or something of the sort. I think this warrants a wider discussion on the future scope of WASM.
-
Compaction in Redpanda and other brokers (Kafka, Pulsar, ...) has only one strategy, which can be summarized as
group messages by key, keep the last one
or as a SQL query
SELECT DISTINCT
key,
LAST_VALUE(body) OVER (
PARTITION BY key ORDER BY offset
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS body
FROM
messages
That's fine for most use cases, but not all of them.
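To make the default strategy concrete, here is a minimal Python sketch of "group by key, keep the last one". The `Message` type and the `compact_keep_last` function are names made up for illustration; this is not how any broker implements compaction internally, only the semantics described above.

```python
from typing import Iterable, NamedTuple


class Message(NamedTuple):
    # Hypothetical record shape, for illustration only.
    offset: int
    key: str
    body: str


def compact_keep_last(messages: Iterable[Message]) -> list[Message]:
    """Keep only the highest-offset message for each key."""
    latest: dict[str, Message] = {}
    for msg in messages:
        current = latest.get(msg.key)
        if current is None or msg.offset > current.offset:
            latest[msg.key] = msg
    # Return the survivors in offset order, as a compacted log would hold them.
    return sorted(latest.values(), key=lambda m: m.offset)


log = [
    Message(0, "a", "v1"),
    Message(1, "b", "v1"),
    Message(2, "a", "v2"),
]
compacted = compact_keep_last(log)
```

With this input, the older message for key "a" is dropped and the two survivors keep their original offsets.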
As an example, consider messages of the form
(data, timestamp, multiplicity)
where multiplicity is a signed integer (+1 means insert once, -1 means delete once).
In this case the right compaction method would be:
group by data (or hash(data)), then keep the highest timestamp and sum all multiplicities.
or as a SQL query
SELECT
HASH(body->data) as key, -- in this case key is optional and should only be used for partitioning
STRUCT(
body->data AS data,
MAX(body->timestamp) AS timestamp,
SUM(body->multiplicity) AS multiplicity
) AS body
FROM
messages
GROUP BY
body->data
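The same aggregation can be sketched in Python. The `Update` type and `compact_by_multiplicity` are hypothetical names used only for this example. Like the query above, the sketch keeps rows whose multiplicities sum to zero; whether those should be dropped is a separate decision that is not made here.

```python
from typing import Iterable, NamedTuple


class Update(NamedTuple):
    # Hypothetical record shape matching (data, timestamp, multiplicity).
    data: str
    timestamp: int
    multiplicity: int


def compact_by_multiplicity(updates: Iterable[Update]) -> list[Update]:
    """Group by data, keep the highest timestamp, sum the multiplicities."""
    merged: dict[str, Update] = {}
    for u in updates:
        prev = merged.get(u.data)
        if prev is None:
            merged[u.data] = u
        else:
            merged[u.data] = Update(
                u.data,
                max(prev.timestamp, u.timestamp),
                prev.multiplicity + u.multiplicity,
            )
    # Dict insertion order gives one row per distinct data value,
    # in order of first appearance.
    return list(merged.values())


updates = [
    Update("x", 1, +1),
    Update("y", 2, +1),
    Update("x", 3, -1),
    Update("y", 5, +1),
]
result = compact_by_multiplicity(updates)
```

Here "x" is inserted once and deleted once, so it compacts to a multiplicity of 0 at timestamp 3, while "y" compacts to a multiplicity of 2 at timestamp 5.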
There can be even weirder cases, for example if we use the Materialize CDC format with progress messages.
In that case we would also need to compact the progress messages, which means knowing how many messages the new compacted transaction will contain.
A good solution would be to expose a set of customizable functions that the user provides as WASM, so that compaction can be customized to anyone's needs. Each topic could use a different strategy.
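As a rough sketch of what "one strategy per topic" could look like, the Python below registers a compaction function per topic and falls back to keep-last otherwise. Everything here is an assumption made for illustration: `register`, `compact`, `STRATEGIES` and the record shape are invented names, not an existing Redpanda API, and a plain Python callable stands in for what would be a user-supplied WASM module.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: (key, value), listed in log order.
Record = Tuple[str, int]
Strategy = Callable[[List[Record]], List[Record]]


def keep_last(records: List[Record]) -> List[Record]:
    """Default strategy: the last value seen for each key wins."""
    latest: Dict[str, int] = {}
    for key, value in records:
        latest[key] = value
    return list(latest.items())


def sum_values(records: List[Record]) -> List[Record]:
    """Custom strategy: sum all values that share a key."""
    totals: Dict[str, int] = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return list(totals.items())


# Hypothetical registry mapping a topic name to its compaction strategy.
STRATEGIES: Dict[str, Strategy] = {}


def register(topic: str, strategy: Strategy) -> None:
    STRATEGIES[topic] = strategy


def compact(topic: str, records: List[Record]) -> List[Record]:
    # Topics without a registered strategy fall back to keep-last.
    return STRATEGIES.get(topic, keep_last)(records)


register("counters", sum_values)
records = [("a", 1), ("b", 2), ("a", 3)]
default_result = compact("events", records)
custom_result = compact("counters", records)
```

The same input compacts differently depending on the topic: the unregistered topic keeps the last value per key, while "counters" sums them.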