Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite streamUpload #22

Closed
wants to merge 6 commits into from
Closed

Rewrite streamUpload #22

wants to merge 6 commits into from

Conversation

domenkozar
Copy link
Contributor

@domenkozar domenkozar commented Mar 24, 2020

I wanted to rewrite streaming to improve performance of chunking. DList is quite inefficient compared to vectorBuilder. See https://www.fpcomplete.com/blog/2014/07/vectorbuilder-packed-conduit-yielding

A few improvements along the way:

  • control flow is still a mess, bit I think a bit better now
  • concurrent uploads during streaming

Need testing.

Out of scope: rewriting concurrentUpload in terms of streamUpload

@domenkozar domenkozar force-pushed the rewrite branch 2 times, most recently from 79a50ac to 59bc3b4 Compare March 24, 2020 18:41
@domenkozar domenkozar changed the title Rewrite Rewrite streamUpload Mar 24, 2020
@domenkozar domenkozar marked this pull request as ready for review March 24, 2020 18:43
@domenkozar
Copy link
Contributor Author

cc @axman6

@domenkozar
Copy link
Contributor Author

Something is wrong, it's leaking memory.

Copy link
Owner

@axman6 axman6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @domenkozar, despite the mostly negative comments below, I do very much like the restructuring of this - splitting things into conduits is something I probably should have already done, and the use of concurrentMapM_ quite nice, I like the idea of being able to build up Parts while others are being uploaded.

It looks like there's very serious issue with vectorBuilder - unless I'm really not understanding the code, it looks like it completely avoids streaming the data, which will lead to huge memory problems, and defeat the purpose of this library.

src/Network/AWS/S3/StreamingUpload.hs Outdated Show resolved Hide resolved
import qualified Data.Vector.Storable as ScopedTypeVariables
import Conduit ( MonadUnliftIO(..), mapC, PrimMonad )
import Data.Conduit ( ConduitT, Void, await, catchC, handleC, (.|), leftover, yield, awaitForever)
import Data.Conduit.Combinators (vectorBuilder, sinkList)
import Data.Conduit.List ( sourceList )
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the formatting of imports consistent? (Looks like I wasn't very consistent, but I though this repo had a stylish-haskell config so it should be fixed automatically if you run that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (I hope, there weren't consistent beforehand).

Comment on lines +113 to +127
startUpload :: (MonadUnliftIO m, MonadAWS m, MonadFail m, MonadResource m) =>
ConduitT
(Int, ByteString)
Void
m
(Either
(AbortMultipartUploadResponse, SomeException)
CompleteMultipartUploadResponse)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
startUpload :: (MonadUnliftIO m, MonadAWS m, MonadFail m, MonadResource m) =>
ConduitT
(Int, ByteString)
Void
m
(Either
(AbortMultipartUploadResponse, SomeException)
CompleteMultipartUploadResponse)
startUpload :: (MonadUnliftIO m, MonadAWS m, MonadFail m, MonadResource m)
=> ConduitT (Int, ByteString) Void m
(Either (AbortMultipartUploadResponse, SomeException)
CompleteMultipartUploadResponse)

@@ -85,74 +92,107 @@ See the AWS documentation for more details.

May throw 'Network.AWS.Error'
-}
streamUpload :: (MonadUnliftIO m, MonadAWS m, MonadFail m)
streamUpload :: (MonadUnliftIO m, MonadAWS m, MonadFail m, MonadResource m, PrimMonad m)
Copy link
Owner

@axman6 axman6 Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have been talking about replacing thew MonadAWS m constraint with AWSConstraint r m, if possible it'd be good to do that here too. That would bring in the MonadResource constraint IIRC. Can any of these constraints be removed easily enough? MonadFail is a bit of a pain, I had thought about making the return type of this conduit a sum type instead of the Either, which could include a constructor for failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate if that's out of scope here.

src/Network/AWS/S3/StreamingUpload.hs Outdated Show resolved Hide resolved
src/Network/AWS/S3/StreamingUpload.hs Show resolved Hide resolved
src/Network/AWS/S3/StreamingUpload.hs Outdated Show resolved Hide resolved
src/Network/AWS/S3/StreamingUpload.hs Outdated Show resolved Hide resolved
- more performance chunking
- concurrent upload
- a tiny bit better flow
@domenkozar
Copy link
Contributor Author

domenkozar commented May 18, 2020

@axman6 I've updated the PR to use raw memory buffer that should become part of the conduit api at some point snoyberg/conduit#438

The last blocking thing is amazonka release as it needs some extra monad instances for AWST. See brendanhay/amazonka#574

mpickering and others added 3 commits May 20, 2020 10:30
This reduces GC time on the 100mb upload from 2s to 0.1s.

(cherry picked from commit 72c4628)
Signed-off-by: Domen Kožar <[email protected]>
This is approximately 80x faster.

(cherry picked from commit 5824b37)
Signed-off-by: Domen Kožar <[email protected]>
@domenkozar
Copy link
Contributor Author

I can confirm with this code I can upload to s3 bucket with ~80MB/s.

@domenkozar
Copy link
Contributor Author

@axman6 ping?

@axman6
Copy link
Owner

axman6 commented Jun 5, 2020

Sorry, I hadn't seen the updates here. One of the goals I'd had for the conduit side of things was to avoid copying data if possible, which is why the I was using DLists of bytestrings. I'm not sure that is a great idea though, and the use of a preallocated buffer is nice, but it makes me think that maybe we should just be using byte string builders if we're going to go that route, and be more sure that we're not going to have dangerous memory accesses. I might make a fork of this branch and play with it to see if I can come up with something I'm happier with.
Again sorry this has taken so long, life been hectic and maintaining libraries has been far from the front of my mind.

Copy link
Owner

@axman6 axman6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a go at rewriting this at the moment to use Bytestring.Builder. Initial results are very promising, but I'm worried something is going wrong with my benchmarks - I'm unable achieve the 80MB/s speeds you mentioned, for some reason I'm only getting 8MB/s, and there's definitely a memory leak, it seems to be holding onto all of the input data.

With the Builder version, I'm achieving 80MB/s from an EC2 host in the same region as the bucket. I'm also seeing a similar memory leak though.

Comment on lines +311 to +326
processChunk :: ChunkSize -> ByteString -> S -> IO ([ByteString], S)
processChunk chunkSize input =
loop id 0
where
loop front idxIn s@(S fptr ptr idxOut)
| idxIn >= B.length input = return (front [], s)
| otherwise = do
pokeByteOff ptr idxOut (unsafeIndex input idxIn)
let idxOut' = idxOut + 1
idxIn' = idxIn + 1
if idxOut' >= chunkSize
then do
let bs = PS fptr 0 idxOut'
s' <- newS chunkSize
loop (front . (bs:)) idxIn' s'
else loop front idxIn' (S fptr ptr idxOut')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's quite a convoluted way to implement memcpy, which already exists in as copyBytes. Also does line 316 mean that the buffer might end up being filled to less than chunkSize? the multipart upload API requires that parts are at least 5MB, so we need to be sure that we always send at least chunkSize bytes in each request.
I'm struggling to assess the correctness of this code, for example I can't tell without thinking very hard whether it's possible that chunks might get not be included in the output - what happens to input in the idxIn >= B.length input case? I think this is another good reason to use Builders - we can accumulate input bytestrings until we have allocated at least chunkSize bytes, and then output the builder, and its length, so we can use https://hackage.haskell.org/package/bytestring-0.10.10.0/docs/Data-ByteString-Builder-Extra.html to produce a lazy byte string with a single chunk. This avoids doing all the pointer mangling by hand.

@domenkozar
Copy link
Contributor Author

@axman6 what would you like to do with this?

@domenkozar domenkozar closed this Sep 9, 2021
@domenkozar
Copy link
Contributor Author

domenkozar commented Sep 10, 2021

I'm using this in production for http://cachix.org/.

I understand the code is not ideal, but it improves effectiveness and CPU/memory usage over the current code.

I also understand you want to keep the code tidy and correct.

Closing as I don't see a way forward. That said, thanks a lot for this library!

@ghost
Copy link

ghost commented Oct 1, 2021

Hey @domenkozar, would you be interested in taking over this package? I don't have a use for it or time to maintain it (not that I ever did really). If so, I am happy to add/make you the maintainer on Hackage and you can keep it going with releases if you like. It's great to hear it's being used in production for something so useful, and I'm even happier to know it's not using my atrocious code! 😅

@domenkozar
Copy link
Contributor Author

Hey @axman6 / @axman6-da, I gave this a thought and I'm not sure I'm in a good place to maintain this library as well. Already have too many packages on my plate :(

Maybe you could make a call for maintainers and see if someone steps up?

@endgame
Copy link
Collaborator

endgame commented Jul 5, 2024

If the memory leak is in request bodies like the Glacier issue linked upthread, snoyberg/http-client#538 seems pretty suspicious as a cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants