
Azure Blob storage throttling upload blobs #394

Open
ajs97 opened this issue Apr 8, 2021 · 1 comment

Comments


ajs97 commented Apr 8, 2021

Hi, I am facing performance issues while uploading blobs to Azure Blob Storage using this SDK. I am uploading 64 MB blobs and experimenting with several values of parallelism_factor (4/8/16). When I upload around 1 GB of data with parallelism = 8/16, I get around 110 MBps, but when I increase the total to about 5 GB, overall throughput drops to around 50 MBps. Checking the intermediate throughput, I see 80-90 MBps for the first few blobs, but for subsequent blobs it drops to 40-50 MBps, and sometimes as low as 20 MBps.

Note that I am uploading these blobs sequentially.

Do you know what could explain the difference in throughput as the total size grows, and whether there is a configuration that would give better throughput for large amounts of data?

Note that for my use case it is important to upload the data in 64 MB blobs, and the total amount of data uploaded will be in the tens of GBs; I would like to optimize for this case.
Thanks.

@ajs97 ajs97 changed the title Azure Blob storage throttling puts Azure Blob storage throttling upload blobs Apr 8, 2021
@ljluestc

import asyncio
from azure.storage.blob.aio import BlobServiceClient  # async client lives in the .aio package
from azure.storage.blob import BlobType

async def upload_blob(blob_service_client, container_name, blob_name, data):
    container_client = blob_service_client.get_container_client(container_name)
    blob_client = container_client.get_blob_client(blob_name)
    await blob_client.upload_blob(data, blob_type=BlobType.BlockBlob, overwrite=True)

async def main():
    container_name = "<your_container_name>"
    total_size = 5 * 1024 * 1024 * 1024  # 5 GB
    blob_size = 64 * 1024 * 1024         # 64 MB

    # Reuse one client for all uploads instead of creating one per blob.
    blob_service_client = BlobServiceClient.from_connection_string("<your_connection_string>")
    async with blob_service_client:
        tasks = []
        for i in range(total_size // blob_size):
            blob_name = f"blob_{i}"
            data = b"Your 64MB data here"  # Replace with your actual data
            tasks.append(upload_blob(blob_service_client, container_name, blob_name, data))
        await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
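One more thing worth checking: service-side throttling is often tied to request rate rather than raw bandwidth, and the v12 SDK stages each block blob as multiple block-upload requests (the default block size is, as far as I know, 4 MiB, and it can be raised via the client's `max_block_size` option, with `max_concurrency` on `upload_blob` controlling per-upload parallelism). A quick sketch of the request-count arithmetic for the workload described above (the 4 MiB default and the alternative block size here are assumptions, not measured values):

```python
# Rough arithmetic: how many block-staging requests a bulk upload issues,
# assuming the SDK splits each blob into ceil(blob_size / block_size) blocks.

MiB = 1024 * 1024

def staging_requests(total_bytes, blob_size, block_size):
    blobs = total_bytes // blob_size
    blocks_per_blob = -(-blob_size // block_size)  # ceiling division
    return blobs * blocks_per_blob

total = 5 * 1024 * MiB  # 5 GB of data, as in the question
blob = 64 * MiB         # 64 MB blobs

print(staging_requests(total, blob, 4 * MiB))   # assumed default block size -> 1280
print(staging_requests(total, blob, 16 * MiB))  # larger blocks -> 320
```

Fewer, larger requests generally put less pressure on per-request throttling limits, so if the drop-off correlates with request count rather than bytes transferred, raising the block size may be worth trying.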
