Storage

William Silversmith edited this page Jul 2, 2019 · 10 revisions
# Standard Threading

from cloudvolume import Storage

with Storage('s3://testbucket/dataset/layer') as stor:
    files = stor.get_files([ ... ])

# Cooperative (Green) Threading

import gevent.monkey
gevent.monkey.patch_all(threads=False)
from cloudvolume.storage import GreenStorage

with GreenStorage('gs://testbucket/dataset/layer') as stor:
    files = stor.get_files([ ... ])

# No Threading

from cloudvolume.storage import SimpleStorage 

stor = SimpleStorage('file:///path/to/dataset/')
content = stor.get_file('filename')
content = stor['filename']
stor['filename'] = content

Storage reads and writes local files and buckets on AWS S3 and Google Cloud Storage. It is threaded, it tolerates network interruptions by retrying up to seven times with random exponential backoff, and it uses a connection pool to avoid the overhead of initiating new connections.
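To make the retry behavior concrete, here is a minimal sketch of random exponential backoff with seven retries. It is not cloudvolume's implementation: the function name `retry_with_backoff`, the delay values, and the choice to draw each delay uniformly below a doubling ceiling are all assumptions made for illustration.

```python
import random
import time


def retry_with_backoff(operation, max_retries=7, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(); on failure wait a random, exponentially growing delay.

    Hypothetical helper: names and default delays are illustrative assumptions,
    not values taken from cloudvolume.
    """
    for attempt in range(max_retries + 1):  # one initial try plus the retries
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The upper bound doubles on every attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # The "random" part: pick a delay uniformly in [0, ceiling].
            sleep(random.uniform(0, ceiling))
```

Randomizing the delay keeps many clients that failed at the same moment from retrying in lockstep. The `sleep` parameter exists only so the waiting can be swapped out, for example in tests.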

Storage's multi-threading can offer a performance advantage over using the raw client libraries directly. The with statement above ensures that a Storage object's threads are cleaned up; otherwise you'll need to call .kill_threads() yourself.
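The relationship between the with statement and `.kill_threads()` can be sketched with a small stand-in class. `ThreadedWorker` below is hypothetical and is not cloudvolume's Storage; it only illustrates the general pattern in which `__exit__` calls the cleanup method, so the threads are stopped even if the body of the with block raises.

```python
import queue
import threading


class ThreadedWorker:
    """Hypothetical stand-in for a threaded object; not cloudvolume's Storage."""

    def __init__(self, n_threads=4):
        self._tasks = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(n_threads)
        ]
        for thread in self._threads:
            thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: this worker should stop
                break
            task()

    def submit(self, fn):
        self._tasks.put(fn)

    def kill_threads(self):
        # One sentinel per worker; queued tasks ahead of them still run first.
        for _ in self._threads:
            self._tasks.put(None)
        for thread in self._threads:
            thread.join()
        self._threads = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.kill_threads()  # runs whether or not the with body raised
        return False  # do not swallow exceptions
```

Without the with statement, the equivalent is a `try: ... finally: worker.kill_threads()` block around the work.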

Several varieties of Storage are available. Storage uses 20 preemptive Python threads by default. GreenStorage uses gevent cooperative thread pools (this requires monkey patching). SimpleStorage is single-threaded and is suitable for use with other threading solutions or for avoiding the overhead of starting and stopping threads.

Additionally, SimpleStorage supports a dictionary interface, and the with statement is not necessary because there are no threads to tear down. It releases its connection when its destructor is called.
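As a rough illustration of what a dictionary interface over a file store can look like, the sketch below maps `stor[name]` and `stor[name] = content` onto get and put methods. `DictLikeFileStore` is a hypothetical class backed by a local directory, not SimpleStorage itself; its `put_file` signature and its choice to return `None` for a missing file are assumptions made for this example.

```python
import os
import tempfile


class DictLikeFileStore:
    """Hypothetical single-threaded store; not cloudvolume's SimpleStorage."""

    def __init__(self, directory):
        self.directory = directory

    def get_file(self, name):
        path = os.path.join(self.directory, name)
        if not os.path.exists(path):
            return None  # assumption: a missing file reads as None
        with open(path, "rb") as f:
            return f.read()

    def put_file(self, name, content):
        with open(os.path.join(self.directory, name), "wb") as f:
            f.write(content)

    # The dictionary interface simply delegates to the methods above.
    def __getitem__(self, name):
        return self.get_file(name)

    def __setitem__(self, name, content):
        self.put_file(name, content)


with tempfile.TemporaryDirectory() as tmp:
    stor = DictLikeFileStore(tmp)
    stor["filename"] = b"hello"   # same as stor.put_file("filename", b"hello")
    content = stor["filename"]    # same as stor.get_file("filename")
```

Because there are no worker threads, nothing needs to be shut down when the object goes out of scope, which is why no with statement appears around `stor`.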