Amazonka-2.0: Better streaming of parts? #26
I think I agree on all points, though in the branch which will target amazonka >= 1.6.1 I'll stick to the current implementation of building a buffer and writing that out when it's full (needing some of the changes we've spoken about regarding capping the chunk size etc.). For Amazonka 2.0, I wouldn't mind playing with concurrency for the chunk sending, but what I'm not too sure about is what constraints I am happy to place on the user so that I can fork threads and still manage resources and exceptions. `lifted-async` seems useful, but I don't know how onerous `MonadBaseControl` would be for a user of the lib. I could potentially offer a concurrent stream-in-and-send-to-S3 version with a `MonadBaseControl` constraint, and one with the current constraints that buffers and sends each chunk before moving on to the next one.
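A minimal sketch of that buffer-and-flush approach, assuming only plain conduit primitives (the `rechunk` name and shape are illustrative, not the library's actual code):

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}

import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import           Data.Conduit (ConduitT, await, yield)

-- Accumulate incoming ByteStrings until at least 'chunkSize' bytes are
-- buffered, then emit them as one part; the final part may be short.
rechunk :: Monad m => Int -> ConduitT ByteString ByteString m ()
rechunk chunkSize = go [] 0
  where
    go acc !len = await >>= \case
      Nothing
        | len > 0   -> yield (BS.concat (reverse acc))  -- flush the last, short part
        | otherwise -> pure ()
      Just bs
        | len' >= chunkSize -> do
            -- A full part is ready; emit it and start a fresh buffer.
            -- (Capping the part size by splitting 'bs' is omitted here.)
            yield (BS.concat (reverse (bs : acc)))
            go [] 0
        | otherwise -> go (bs : acc) len'
        where
          len' = len + BS.length bs
```

Each buffer this emits would then be sent as one part before the next is assembled, matching the sequential behaviour described above.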
You will have to - chunked uploads are broken on 1.6.1. For Amazonka-2.0, if you want streaming directly from the source conduit into amazonka (without assembling the buffer yourself), I can't see a way that doesn't involve some kind of rendezvous structure. I'd stay away from anything demanding `MonadBaseControl`.
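For a sense of what demanding `MonadBaseControl` looks like for callers, here is a minimal sketch using `lifted-async`'s `concurrently` (the `sendWhileBuffering` wrapper and both of its action arguments are hypothetical):

```haskell
import Control.Concurrent.Async.Lifted (concurrently)
import Control.Monad.Trans.Control (MonadBaseControl)

-- Running the upload of a finished part concurrently with buffering the
-- next one pushes a MonadBaseControl IO constraint onto every caller.
sendWhileBuffering
  :: MonadBaseControl IO m
  => m ()  -- ^ hypothetical action: upload the completed part
  -> m a   -- ^ hypothetical action: buffer the next part from the stream
  -> m a
sendWhileBuffering send buffer = snd <$> concurrently send buffer
```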
For example:

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE ScopedTypeVariables #-}

import           Control.Monad.IO.Class (MonadIO)
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import           Data.Conduit.Internal (Pipe (..), SealedConduitT (..))
import           Network.AWS.Data.Body (ChunkSize)

-- | Extract a single chunk from a stream, reallocating as little as possible.
streamChunk :: forall m a.
     MonadIO m
  => ChunkSize
  -> SealedConduitT () ByteString m a
  -> SealedConduitT () ByteString m (SealedConduitT () ByteString m a)
streamChunk size (SealedConduitT pipe) =
    SealedConduitT $ SealedConduitT <$> pipeChunk pipe
  where
    pipeChunk
      :: Pipe () () ByteString () m a
      -> Pipe () () ByteString () m (Pipe () () ByteString () m a)
    pipeChunk = loop (fromIntegral size)  -- ChunkSize is an Int newtype; count down in bytes
      where
        loop !n = \case
          HaveOutput p b -> case compare n bLen of
            -- Emit 'n' bytes from 'b' and push the rest onto the return value
            LT ->
              let (bN, bRest) = BS.splitAt n b
              in  HaveOutput (pure $ HaveOutput p bRest) bN
            -- Emit 'b' and then we're done
            EQ -> HaveOutput (pure p) b
            -- 'b' fits entirely in the stream we want to emit
            GT -> HaveOutput (loop (n - bLen) p) b
            where
              bLen = BS.length b
          NeedInput f _ -> loop n $ f ()
          Done a        -> pure $ Done a
          PipeM m'      -> PipeM $ loop n <$> m'
          Leftover p () -> loop n p
```

I can't figure out how to plug this in, because I am increasingly of the opinion that having "input" in a streaming library's core transformer type is a misfeature, and you should instead consume an input stream as a function argument.
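One hedged way to drive `streamChunk`, using conduit's `unsealConduitT` and `fuseUpstream` and assuming the imports from the snippet above (the per-part sink is a placeholder, and detecting an exhausted leftover stream - the "plug this in" problem - is still unsolved here):

```haskell
import Data.Conduit (fuseUpstream, runConduit, unsealConduitT)
import Data.Void (Void)

-- Run one part's worth of the sealed stream into a per-part sink. The
-- part-sized stream's result *is* the leftover stream, so 'fuseUpstream'
-- (which keeps the upstream result) recovers it after the sink finishes.
uploadOnePart
  :: MonadIO m
  => ChunkSize
  -> ConduitT ByteString Void m ()         -- ^ hypothetical per-part sink
  -> SealedConduitT () ByteString m a
  -> m (SealedConduitT () ByteString m a)  -- ^ leftover stream
uploadOnePart size sendPart sealed =
  runConduit (unsealConduitT (streamChunk size sealed) `fuseUpstream` sendPart)
```

Looping this until the source is exhausted still needs a reliable emptiness test on the leftover stream, which is the sticking point.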
At a high level, `amazonka-s3-streaming` draws `ByteString`s from a `conduit` and assembles them into the parts of an S3 Multi-Part Upload (amazonka-s3-streaming/src/Network/AWS/S3/StreamingUpload.hs, lines 88 to 91 in b5e41dc).
At the moment, chunked uploads are broken on the Hackage 1.6.1 release of amazonka (see e.g. https://github.com/brendanhay/amazonka/issue/596, https://github.com/brendanhay/amazonka/issue/547), but once amazonka-2.0 comes out, better streaming would become possible, were it not for some frustrating limitations:

- Ideally we'd stream each part directly into the `UploadPart` operations without buffering each part in memory or unnecessarily rechunking the stream.
- `conduit` is not the only option for the streaming layer (`streaming` looks nice, `pipes-group` looks completely unusable, `streamly`... does something, I guess?) but I doubt I could ram through a change to such a core part of the library without causing disproportionate amounts of pain.
- Taking a fixed amount from a conduit leaves you with a `SealedConduitT`, which we can't provide to `amazonka` as a `chunkedBody`. Even if we could, there's no way for the request to know how much to take, nor any way to return the leftover stream (see the haddock for e.g. `($$++)`).
- That leaves some kind of rendezvous structure like a `TBQueue`, along with end-of-part/end-of-stream markers. `stm-conduit` has been unmaintained for years, so maybe we need to write something like it ourselves: it could pass `ByteString`s around without copying them, and with a reasonable size on the `TBQueue` we'd start preparing the next upload before the first one finishes.
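To make the `TBQueue` rendezvous concrete, a hedged sketch (all names are illustrative; splitting a chunk that straddles a part boundary is omitted):

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE LambdaCase #-}

import           Control.Concurrent.STM (atomically)
import           Control.Concurrent.STM.TBQueue (TBQueue, readTBQueue, writeTBQueue)
import           Control.Monad.IO.Class (MonadIO, liftIO)
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import           Data.Conduit (ConduitT, await, yield)

-- Items in the rendezvous queue: raw bytes plus the two markers.
data Item
  = Bytes ByteString  -- a chunk belonging to the current part
  | EndOfPart         -- the current part is complete
  | EndOfStream       -- no further parts are coming

-- Producer: push chunks from the source conduit into the queue, marking a
-- part boundary once at least 'partSize' bytes have been queued.
fillQueue :: MonadIO m => Int -> TBQueue Item -> ConduitT ByteString o m ()
fillQueue partSize q = loop 0
  where
    push = liftIO . atomically . writeTBQueue q
    loop !n = await >>= \case
      Nothing -> push EndOfStream
      Just bs -> do
        push (Bytes bs)
        let n' = n + BS.length bs
        if n' >= partSize
          then push EndOfPart >> loop 0
          else loop n'

-- Consumer: a source for one part, ending at the next marker. The result
-- says whether another part follows, so the next UploadPart can begin
-- while the producer keeps filling the queue.
drainPart :: MonadIO m => TBQueue Item -> ConduitT i ByteString m Bool
drainPart q = loop
  where
    loop = liftIO (atomically (readTBQueue q)) >>= \case
      Bytes bs    -> yield bs >> loop
      EndOfPart   -> pure True
      EndOfStream -> pure False
```

Because the queue hands the producer's `ByteString`s to the consumer unchanged, nothing is copied, and a bounded queue of reasonable size lets the next part start filling while the previous upload is still in flight.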