Storage
```python
# Standard Threading
from cloudvolume import Storage

# (filename, content) pairs to upload
uploadable_files = [ (filename, content) for filename, content in ... ]

# Leaving the with block cleans up the Storage object's threads.
with Storage('s3://testbucket/dataset/layer') as stor:
    files = stor.get_files([ ... ])
    stor.put_files(
        uploadable_files,
        content_type='application/octet-stream',
        compress='gzip',          # gzip each file before upload
        cache_control='no-cache',
    )
    info = stor.get_json('info')
    stor.put_json('info', info)
    result = stor.exists([ ... ])  # list of filenames to check
    stor.delete_files([ ... ])     # list of filenames to delete
```
```python
# Cooperative (Green) Threading
import gevent.monkey
# Patch as early as possible; thread=False leaves the threading module unpatched.
gevent.monkey.patch_all(thread=False)

from cloudvolume.storage import GreenStorage

with GreenStorage('gs://testbucket/dataset/layer') as stor:
    files = stor.get_files([ ... ])
```
```python
# No Threading
from cloudvolume.storage import SimpleStorage

stor = SimpleStorage('file:///path/to/dataset/')
content = stor.get_file('filename')
content = stor['filename']   # dictionary-style read
stor['filename'] = content   # dictionary-style write
```
`Storage` reads and writes files on the local filesystem and in AWS S3 and Google Cloud Storage buckets. It is threaded. It is robust to network interruption because it retries a failed request up to seven times with randomized exponential backoff. It also uses a connection pool, so it avoids the overhead of opening a new connection for every request.
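The sketch below illustrates the retry behavior described above. It is not CloudVolume's implementation. Only the figure of seven retries comes from the text. The helper name `retry_with_backoff`, the 0.5 s base delay and the "full jitter" strategy are assumptions made for illustration. Under full jitter, each wait is a random interval between 0 and `base * 2**attempt`.

```python
import random
import time


def retry_with_backoff(fn, retries=7, base=0.5, sleep=time.sleep):
    """Call fn(), retrying up to `retries` times if it raises.

    Hypothetical helper. Before retry k (k = 1..retries) it sleeps a random
    interval drawn uniformly from [0, base * 2**(k - 1)] ("full jitter"),
    so the upper bound on the wait doubles after every failure.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            # A real client would catch only transient network errors here.
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(random.uniform(0, base * 2 ** attempt))


# Usage: a call that fails three times with a transient error, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 4:
        raise ConnectionError("transient")
    return b"payload"


# sleep=lambda s: None skips the real waits so the demo runs instantly.
print(retry_with_backoff(flaky, sleep=lambda s: None))  # b'payload'
```

The random jitter keeps many clients that failed at the same moment from retrying in lockstep.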
Multi-threading in a `Storage` object can offer a performance advantage over calling the raw client libraries directly, because many requests can be in flight at once. The `with` statement above ensures that the `Storage` object's threads are cleaned up. Without it, you'll need to call `.kill_threads()` yourself.
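Here is a simplified stand-in that shows why the `with` statement matters. The class `ThreadedUploader` is hypothetical and is not part of CloudVolume. It owns a pool of worker threads that are fed by a queue. Its `__exit__` calls a `kill_threads()` method, named after the one mentioned above. Whether the block exits normally or through an exception, the pending work drains and every worker stops.

```python
import queue
import threading


class ThreadedUploader:
    """Hypothetical stand-in for a threaded storage object."""

    def __init__(self, n_threads=4):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self.results = []
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(n_threads)
        ]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: this worker should exit
                self._tasks.task_done()
                return
            with self._lock:
                self.results.append(item)  # placeholder for a network upload
            self._tasks.task_done()

    def put(self, item):
        self._tasks.put(item)

    def kill_threads(self):
        self._tasks.join()          # wait until queued work is finished
        for _ in self._threads:
            self._tasks.put(None)   # one sentinel per worker
        for t in self._threads:
            t.join()                # block until every worker has exited

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.kill_threads()
        return False  # do not swallow exceptions raised in the block


with ThreadedUploader() as up:
    for name in ["a", "b", "c"]:
        up.put(name)

print(sorted(up.results))  # ['a', 'b', 'c']
```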
Several varieties of `Storage` are available. `Storage` uses 20 preemptive Python threads by default. `GreenStorage` uses gevent cooperative thread pools, which require monkey patching. `SimpleStorage` is single-threaded. It suits use alongside other threading solutions, and it avoids the overhead of starting and stopping threads.
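The standard-library simulation below shows why a pool of preemptive threads pays off for network-bound work. `fake_download` is a hypothetical placeholder that sleeps to mimic request latency, and the 20 ms latency is an assumed value. `ThreadPoolExecutor(max_workers=20)` mirrors the 20-thread default mentioned above, but it is not how CloudVolume builds its pool. The sequential loop corresponds to the single-threaded `SimpleStorage` style.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.02  # seconds; assumed per-request network latency


def fake_download(name):
    """Hypothetical stand-in for fetching one object over the network."""
    time.sleep(LATENCY)  # sleep releases the GIL, as blocking socket I/O does
    return name.upper()


names = [f"chunk_{i}" for i in range(20)]

# Single-threaded: requests run one after another.
start = time.perf_counter()
serial = [fake_download(n) for n in names]
serial_time = time.perf_counter() - start

# Preemptive thread pool: up to 20 requests wait on the "network" at once.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    threaded = list(pool.map(fake_download, names))  # map keeps input order
threaded_time = time.perf_counter() - start

print(f"serial {serial_time:.2f}s, threaded {threaded_time:.2f}s")
```

The speedup comes from overlapping waits, not from parallel computation. CPU-bound work would not benefit the same way under the GIL.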
Additionally, `SimpleStorage` supports a dictionary interface. The `with` statement is unnecessary because there are no threads to tear down. `SimpleStorage` releases its connection when its destructor is called.
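A dictionary interface like this one can be built by mapping `obj[key]` onto the existing read and write calls. The class below is a hypothetical, in-memory sketch and is not `SimpleStorage`'s source. `__getitem__` and `__setitem__` delegate to `get_file` and `put_file`. `__del__` only drops a placeholder `_connection` attribute, standing in for returning a pooled connection. Returning `None` for a missing file is also an assumption of this sketch.

```python
class DictStyleStore:
    """Hypothetical single-threaded store with a dictionary interface."""

    def __init__(self, path):
        self.path = path
        self._files = {}                   # in-memory stand-in for a bucket
        self._connection = f"conn:{path}"  # placeholder for a pooled connection

    def get_file(self, name):
        return self._files.get(name)       # None if the file does not exist

    def put_file(self, name, content):
        self._files[name] = content

    # Dictionary interface: stor[name] and stor[name] = ... reuse the methods above.
    def __getitem__(self, name):
        return self.get_file(name)

    def __setitem__(self, name, content):
        self.put_file(name, content)

    def __del__(self):
        # No worker threads to shut down; just let go of the connection.
        self._connection = None


stor = DictStyleStore("file:///path/to/dataset/")
stor["info"] = b'{"type": "image"}'
print(stor["info"])  # b'{"type": "image"}'
```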