plateau

flat files, flat land

plateau is a Python library to manage (create, read, update, delete) large amounts of tabular data in a blob store. It stores data as datasets, which it presents to the user as pandas DataFrames. A dataset is a collection of files with the same schema that reside in a blob store. plateau uses a metadata definition to handle these datasets efficiently. For distributed access to and manipulation of datasets, plateau offers a Dask interface.

Storing data distributed over multiple files in a blob store (S3, ABS, GCS, etc.) allows for a fast, cost-efficient and highly scalable data infrastructure. A downside of storing data solely in an object store is that the store itself gives little to no guarantee beyond the consistency of a single file. In particular, it cannot guarantee the consistency of a dataset that spans many files. If we demand a consistent state of our dataset at all times, we need to track the state of the dataset ourselves. plateau frees us from having to do this manually.
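To make the consistency problem concrete, here is a minimal toy sketch of one general way to track dataset state on top of a store that only guarantees single-file consistency: write the data files first, then publish them with a single manifest write. This is an illustration only, not plateau's implementation or API. `ToyBlobStore`, `MANIFEST_KEY`, `commit`, `read_manifest` and `read_dataset` are all hypothetical names invented for this example.

```python
import json


class ToyBlobStore:
    """In-memory stand-in for a blob store (hypothetical, for illustration).

    Like a real object store, it only promises that a single key is
    written or read as a whole; it knows nothing about datasets.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]  # raises KeyError if the key is missing


# Hypothetical key under which the dataset state is tracked.
MANIFEST_KEY = "dataset/manifest.json"


def read_manifest(store):
    """Return the current dataset state, or an empty state if none exists."""
    try:
        return json.loads(store.get(MANIFEST_KEY))
    except KeyError:
        return {"version": 0, "files": []}


def commit(store, new_files):
    """Add files to the dataset so that readers never see a partial update."""
    manifest = read_manifest(store)
    version = manifest["version"] + 1

    # Step 1: write the data files. They are not referenced by the
    # manifest yet, so readers cannot see them.
    keys = []
    for name, payload in new_files.items():
        key = f"dataset/v{version}/{name}"
        store.put(key, payload)
        keys.append(key)

    # Step 2: publish all new files at once with a single-key write.
    new_manifest = {"version": version, "files": manifest["files"] + keys}
    store.put(MANIFEST_KEY, json.dumps(new_manifest).encode())
    return new_manifest


def read_dataset(store):
    """Read only the files that the manifest declares part of the dataset."""
    return [store.get(key) for key in read_manifest(store)["files"]]


store = ToyBlobStore()
commit(store, {"part-0.csv": b"a,b\n1,2\n"})

# Simulate a writer that crashed after step 1: the file exists in the
# store, but it was never published, so readers ignore it.
store.put("dataset/v2/part-1.csv", b"a,b\n3,4\n")
assert read_dataset(store) == [b"a,b\n1,2\n"]
```

The sketch ignores concurrent writers, updates, deletes and schema checks, which are exactly the kinds of bookkeeping that become tedious to maintain by hand.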

The plateau.io module provides building blocks to create and modify these datasets in data pipelines. plateau handles I/O, tracks dataset partitions and selects subsets of data transparently.

Installation

Installers for the latest released version are available on the Python Package Index (PyPI) and on conda-forge.

# Install with pip
pip install plateau

# Install with conda/micromamba, optionally add conda-forge as a source
# conda config --add channels conda-forge
conda install plateau
micromamba install plateau
