Optimus

Overview

Optimus is an opinionated Python library for loading, processing, and plotting data and for creating ML models. It runs on pandas, Dask, cuDF, Dask-cuDF, Vaex, or Spark.

Some amazing things Optimus can do for you:

Process data through a simple API that is easy for newcomers to pick up.
More than 100 functions to handle strings and to process dates, URLs, and emails.
Easily plot data of any size.
Out-of-the-box functions to explore and fix data quality issues.
Use the same code to process your data on your laptop or on a remote cluster of GPUs.

See Documentation

Try Optimus

To launch a live notebook server and test Optimus on Binder or Colab, click one of the following badges:

Installation (pip):

In your terminal just type:

pip install pyoptimus

By default, Optimus installs pandas as its engine. To install other engines, use the following commands:

Engine	Command
Dask	`pip install pyoptimus[dask]`
cuDF	`pip install pyoptimus[cudf]`
Dask-cuDF	`pip install pyoptimus[dask-cudf]`
Vaex	`pip install pyoptimus[vaex]`
Spark	`pip install pyoptimus[spark]`
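
The table above can be captured in a small lookup. This is only a sketch: the helper name `pip_command` and the `PIP_EXTRAS` mapping are made up for illustration and are not part of Optimus. Note that the pip extra for Dask-cuDF uses a hyphen, while the engine name passed to Optimus uses an underscore. The quotes around the package spec are an addition, since some shells (zsh, for example) treat square brackets as glob patterns.

```python
# Engine name (as passed to Optimus(...)) -> pip extra, copied from the table above.
# None means the default install, which ships with the pandas engine.
PIP_EXTRAS = {
    "pandas": None,
    "dask": "dask",
    "cudf": "cudf",
    "dask_cudf": "dask-cudf",
    "vaex": "vaex",
    "spark": "spark",
}


def pip_command(engine):
    """Return the pip command that installs Optimus for the given engine."""
    extra = PIP_EXTRAS[engine]  # raises KeyError for an unknown engine
    if extra is None:
        return "pip install pyoptimus"
    # Quote the spec so shells that glob on [] pass it through unchanged.
    return f'pip install "pyoptimus[{extra}]"'


print(pip_command("dask_cudf"))
```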

To install from the repo:

pip install git+https://github.com/hi-primus/optimus.git@<branch>

To install other engines from the repo:

pip install git+https://github.com/hi-primus/optimus.git@<branch>#egg=pyoptimus[dask]

Requirements

Python 3.7 or 3.8
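
If you want to check the interpreter before installing, a minimal sketch could look like the following. The `is_supported` helper is hypothetical, and the supported set simply mirrors the two versions listed above.

```python
import sys

# Versions listed in the Requirements section above.
SUPPORTED_VERSIONS = {(3, 7), (3, 8)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if not is_supported():
    print("Warning: this Python version is not listed in the Optimus requirements.")
```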

Examples

Start with 10 minutes to Optimus, which covers the basics of working in a notebook.

You can also go to the Examples section to find notebooks on data cleaning, data munging, profiling, data enrichment, and creating ML and DL models.

Here's a handy Cheat Sheet with the most common Optimus operations.

Start Optimus

Start Optimus using "pandas", "dask", "cudf", "dask_cudf", "vaex" or "spark".

from optimus import Optimus
op = Optimus("pandas")
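
A typo in the engine name is easy to make, so you may want to validate it before starting. The wrapper below is a sketch, not part of Optimus: `start_optimus` and `SUPPORTED_ENGINES` are hypothetical names, and the list of engines is taken from the sentence above. The import is deferred so that a missing installation produces a clear message.

```python
# Engine names as listed in this README.
SUPPORTED_ENGINES = ("pandas", "dask", "cudf", "dask_cudf", "vaex", "spark")


def start_optimus(engine="pandas"):
    """Validate the engine name, then create an Optimus instance."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"Unknown engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )
    try:
        # Imported lazily so the check above works without Optimus installed.
        from optimus import Optimus
    except ImportError as exc:
        raise RuntimeError(
            "Optimus is not installed; run: pip install pyoptimus"
        ) from exc
    return Optimus(engine)


# Usage (requires pyoptimus to be installed):
# op = start_optimus("pandas")
```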

Loading data

Optimus can load data in CSV, JSON, Parquet, Avro, and Excel formats from a local file or from a URL.

# csv
df = op.load.csv("../examples/data/foo.csv")

# json
df = op.load.json("../examples/data/foo.json")

# using a url
df = op.load.json("https://raw.githubusercontent.com/hi-primus/optimus/develop-23.5/examples/data/foo.json")

# parquet
df = op.load.parquet("../examples/data/foo.parquet")

# ...or anything else
df = op.load.file("../examples/data/titanic3.xls")

Also, you can load data from Oracle, Redshift, MySQL and Postgres databases.

Saving Data

# csv
df.save.csv("data/foo.csv")

# json
df.save.json("data/foo.json")

# parquet
df.save.parquet("data/foo.parquet")

You can also save data to Oracle, Redshift, MySQL and Postgres databases.

Create dataframes

You can also create a dataframe from scratch:

df = op.create.dataframe({
    'A': ['a', 'b', 'c', 'd'],
    'B': [1, 3, 5, 7],
    'C': [2, 4, 6, None],
    'D': ['1980/04/10', '1980/04/10', '1980/04/10', '1980/04/10']
})

Using display, you get a readable view of your data with extra information such as the column number, the column data type, and marked whitespace.

display(df)

Cleaning and Processing

Optimus was created to make data cleaning a breeze. The API is designed to be easy for newcomers and familiar to people who come from pandas. Optimus expands the standard DataFrame functionality by adding .rows and .cols accessors.

For example, you can load data from a URL, transform it, and apply some predefined cleaning functions:

new_df = df\
    .rows.sort("rank", "desc")\
    .cols.lower(["names", "function"])\
    .cols.date_format("date arrival", "yyyy/MM/dd", "dd-MM-YYYY")\
    .cols.years_between("date arrival", "dd-MM-YYYY", output_cols="from arrival")\
    .cols.normalize_chars("names")\
    .cols.remove_special_chars("names")\
    .rows.drop(df["rank"] > 8)\
    .cols.rename("*", str.lower)\
    .cols.trim("*")\
    .cols.unnest("japanese name", output_cols="other names")\
    .cols.unnest("last position seen", separator=",", output_cols="pos")\
    .cols.drop(["last position seen", "japanese name", "date arrival", "cybertronian", "nulltype"])
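
To make the intent of a few of these steps concrete, here is a plain-Python sketch of what sorting, lowercasing, dropping rows, and trimming do to a small table. It does not use Optimus: the `clean` function and the sample rows are made up for illustration, and the remaining steps of the chain are omitted.

```python
def clean(rows):
    """Apply four of the steps above to a list of dicts (one dict per row)."""
    # .rows.sort("rank", "desc"): order rows by rank, highest first
    out = sorted(rows, key=lambda r: r["rank"], reverse=True)
    # .cols.lower("names"): lowercase the names column
    out = [{**r, "names": r["names"].lower()} for r in out]
    # .rows.drop(df["rank"] > 8): remove rows that match the condition
    out = [r for r in out if not r["rank"] > 8]
    # .cols.trim("*"): strip surrounding whitespace from every string value
    out = [
        {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        for r in out
    ]
    return out


# Made-up sample rows, not taken from the Optimus example data.
sample = [
    {"names": " Optimus ", "rank": 10},
    {"names": "Bumblebee", "rank": 7},
    {"names": "  IRONHIDE", "rank": 8},
]
print(clean(sample))
# [{'names': 'ironhide', 'rank': 8}, {'names': 'bumblebee', 'rank': 7}]
```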

Need help? 🛠️

Feedback

Feedback is what drives Optimus' future, so please take a couple of minutes to help shape the Optimus roadmap: http://bit.ly/optimus_survey

If you have a suggestion or a feature request, use https://github.com/hi-primus/optimus/issues

Troubleshooting

If you have issues, see our Troubleshooting Guide

Contributing to Optimus 💡

Contributions go far beyond pull requests and commits. We are very happy to receive any kind of contribution, including:

Documentation updates, enhancements, designs, or bugfixes.
Spelling or grammar fixes.
README.md corrections or redesigns.
Adding unit or functional tests.
Triaging GitHub issues, especially determining whether an issue still persists or is reproducible.
Blogging, speaking about, or creating tutorials about Optimus and its many features.
Helping others on our official chats.

Backers and Sponsors

Become a backer or a sponsor and get your image on our README on GitHub with a link to your site.
