Docs #31

Merged 2 commits on Dec 17, 2023

2 changes: 0 additions & 2 deletions Cargo.toml
@@ -17,8 +17,6 @@ num = "0.4.1"
faer = {version = "0.15", features = ["ndarray", "nightly"]}
ndarray = "0.15.6" # see if we can get rid of this
hashbrown = {version = "0.14.2", features=["nightly"]}
# Try realfft instead, which is a wrapper around rustfft but specializes for reals and seems to have better perf
# rustfft = "6.1.0"
itertools = "0.12.0"
aho-corasick = "1.1"
rand = {version = "0.8.5"} # Simd support feature seems to be broken atm
3 changes: 3 additions & 0 deletions docs/docs/complex_ext.md
@@ -0,0 +1,3 @@
## Extension for Complex Numbers

::: polars_ds.complex_ext
44 changes: 44 additions & 0 deletions docs/docs/index.md
@@ -0,0 +1,44 @@
# Polars-ds

A Polars plugin that aims to simplify common numerical and string data analysis procedures. The most basic data science, statistics, and NLP tasks can be done natively inside a dataframe, without leaving the dataframe world. For simple data pipelines, you also do not need to install NumPy, SciPy, or scikit-learn, which saves considerable space and helps in resource-constrained environments.

Its goal is NOT to replace SciPy or NumPy. Rather, it tries to reduce dependencies for simple analysis and to reduce Python-side code and UDFs, which are often performance bottlenecks.

## Getting Started
```bash
pip install polars_ds
```

Then import the package whenever you use the namespaces it provides:

```python
import polars_ds
```

## Examples

Generate random numbers, then run a t-test and a normality test inside a dataframe:
```python
df.with_columns(
    pl.col("a").stats_ext.sample_normal(mean = 0.5, std = 1.).alias("test1")
    , pl.col("a").stats_ext.sample_normal(mean = 0.5, std = 2.).alias("test2")
).select(
    pl.col("test1").stats_ext.ttest_ind(pl.col("test2"), equal_var = False).alias("t-test")
    , pl.col("test1").stats_ext.normal_test().alias("normality_test")
).select(
    pl.col("t-test").struct.field("statistic").alias("t-tests: statistics")
    , pl.col("t-test").struct.field("pvalue").alias("t-tests: pvalue")
    , pl.col("normality_test").struct.field("statistic").alias("normality_test: statistics")
    , pl.col("normality_test").struct.field("pvalue").alias("normality_test: pvalue")
)
```
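
In SciPy's `ttest_ind` convention, `equal_var = False` selects Welch's unequal-variance t-test. Assuming the plugin follows the same convention (an assumption, not confirmed by this page), the statistic it reports could be cross-checked with a small standard-library sketch like the one below. The helper name `welch_t_statistic` is hypothetical and is not part of polars_ds.

```python
import math
import statistics


def welch_t_statistic(x, y):
    """Return Welch's t statistic and approximate degrees of freedom.

    Hypothetical helper for illustration only; not part of polars_ds.
    """
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.fmean(x), statistics.fmean(y)
    # Unbiased sample variances (n - 1 in the denominator).
    v1, v2 = statistics.variance(x), statistics.variance(y)
    # Squared standard error of the difference in means.
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, dof


t, dof = welch_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, dof)
```

The p-value is left out of the sketch because it needs the Student's t distribution, which the standard library does not provide.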

Blazingly fast string similarity comparisons, thanks to [RapidFuzz](https://docs.rs/rapidfuzz/latest/rapidfuzz/):
```python
df2.select(
    pl.col("word").str_ext.levenshtein("world", return_sim = True)
).head()
```
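
For intuition about what a Levenshtein similarity measures, here is a minimal pure-Python sketch. It assumes that `return_sim = True` yields a normalized similarity of the form `1 - distance / max(len(a), len(b))`; that normalization is an assumption made for illustration, and the function names are hypothetical rather than part of polars_ds or RapidFuzz.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein_similarity("word", "world"))
```

The two-row dynamic programming table keeps memory proportional to the shorter string, which is enough for a reference check against the plugin's output on small inputs.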

And a lot more!
3 changes: 3 additions & 0 deletions docs/docs/num_ext.md
@@ -0,0 +1,3 @@
## Extension for General Numerical Features/Metrics/Quantities

::: polars_ds.num_ext
3 changes: 3 additions & 0 deletions docs/docs/stats_ext.md
@@ -0,0 +1,3 @@
## Extension for Statistical Tests and Samples

::: polars_ds.stats_ext
3 changes: 3 additions & 0 deletions docs/docs/str_ext.md
@@ -0,0 +1,3 @@
## Extension for String Manipulation and Metrics

::: polars_ds.str_ext
20 changes: 20 additions & 0 deletions docs/mkdocs.yml
@@ -0,0 +1,20 @@
site_name: Polars-ds Docs

nav:
  - Home: index.md
  - Complex Extension: complex_ext.md
  - Numerical Extension: num_ext.md
  - Stats Extension: stats_ext.md
  - String Extension: str_ext.md

theme:
  name: material

plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          paths: [../python]
          selection:
            docstring_style: numpy
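
The `docstring_style: numpy` setting tells mkdocstrings to parse NumPy-style docstrings when it expands directives such as `::: polars_ds.num_ext`. As a rough illustration of that style, a documented function might look like the following. `jaccard` is a hypothetical function invented for this sketch and is not taken from the package.

```python
def jaccard(a: set, b: set) -> float:
    """
    Compute the Jaccard similarity of two sets.

    Parameters
    ----------
    a
        The first set.
    b
        The second set.

    Returns
    -------
    float
        Size of the intersection divided by size of the union,
        or 1.0 when both sets are empty.
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


print(jaccard({1, 2}, {2, 3}))
```

Section headers underlined with dashes (`Parameters`, `Returns`) are what distinguish this style from Google-style docstrings.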
4 changes: 4 additions & 0 deletions docs/requirements-docs.txt
@@ -0,0 +1,4 @@
mkdocs
mkdocstrings[python]
mkdocs-material
pytkdocs[numpy-style]