Releases: abstractqqq/polars_ds_extension
v0.4.5
Breaking Changes
Previously, to compute the edit distance between a column and a single string, you would write
df.select(
    pds.str_leven(pl.col("c"), "word")
)
Now you must wrap the string in pl.lit:
df.select(
    pds.str_leven(pl.col("c"), pl.lit("word"))
)
With the old call, a bare "word" is now interpreted as a reference to a column named "word" rather than as the literal string "word".
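To make the column-versus-literal distinction concrete, here is a minimal pure-Python sketch. It does not use polars or polars_ds; the `levenshtein` helper and the sample lists are illustrative assumptions, not the library's implementation. Comparing against a literal broadcasts one string to every row, while comparing against another column pairs values row by row.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] is the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical stand-ins for two string columns.
column_c = ["word", "ward", "sword", "wo"]
column_word = ["world", "ward", "sword", "two"]

# Literal case: the single string "word" is compared with every row.
vs_literal = [levenshtein(value, "word") for value in column_c]

# Column case: values are compared row by row with a second column.
vs_column = [levenshtein(a, b) for a, b in zip(column_c, column_word)]

print(vs_literal)  # [0, 1, 1, 2]
print(vs_column)   # [1, 0, 0, 1]
```

The two results differ, which is why an unwrapped string that silently resolves to a column can change the output of existing queries.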
What's Changed
- refactored psi and add benford law by @abstractqqq in #158
- Skip some rows in KNN by @DGolubets in #157
- Added data_mask in knn_ptwise by @abstractqqq in #161
- Pretty Code Snippets by @s1lvester in #162
- Tests for entropies by @abstractqqq in #163
- Woe encoding by @abstractqqq in #164
- Added Median Absolute Deviation and renamed str2 to string by @abstractqqq in #165
New Contributors
- @DGolubets made their first contribution in #157
- @s1lvester made their first contribution in #162
Full Changelog: v0.4.4...v0.4.5
v0.4.4
What's Changed
- added par option in convolve by @abstractqqq in #146
- More dia plots by @abstractqqq in #147
- fixed corr method bug by @abstractqqq in #148
- Add string pre-preprocessing code by @CangyuanLi in #150
- String Cleaning by @CangyuanLi in #152
- Pipeline by @abstractqqq in #155
New Contributors
- @CangyuanLi made their first contribution in #150
Full Changelog: v0.4.3...v0.4.4
v0.4.3
What's Changed
- Kendall tau correlation coefficient by @abstractqqq in #137
- added combined pds.corr expression by @abstractqqq in #139
- new pca plot and lstsq plot by @abstractqqq in #140
- fixed a bug in lstsq regarding group by by @abstractqqq in #143
- added null dist by @abstractqqq in #144
- Improve convolve perf, especially when filter size is small. added method option by @abstractqqq in #145
Full Changelog: v0.4.2...v0.4.3
v0.4.2
Highlights
In Diagnosis, you can now generate 2D principal component plots, which can help you visualize higher-dimensional data.
dia = DIA(df)
dia.plot_pc2(pl.all().exclude("species"), by="species")
In addition, the following PCA-related queries are available:
df.select(
    pds.query_pca("a", "b")  # singular values and weight vectors
).unnest("a")
df.select(
    pds.query_singular_values("a", "b", center=True, as_explained_var=True)
)
Xi correlation is also implemented:
df.select(
    pds.xi_corr("x", "y")
)
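For readers unfamiliar with Xi correlation (Chatterjee's rank correlation), the following is a small pure-Python sketch of the textbook formula for the case with no ties. It is an illustration only, written under that no-ties assumption, and is not taken from the polars_ds source; the function name `xi_corr_sketch` is hypothetical.

```python
def xi_corr_sketch(x, y):
    """Chatterjee's xi for paired samples, assuming no ties in x or y."""
    n = len(x)
    # Reorder the y values by increasing x.
    y_by_x = [yi for _, yi in sorted(zip(x, y), key=lambda pair: pair[0])]
    # rank[i] is the rank (1 = smallest) of the i-th reordered y value.
    order = sorted(range(n), key=lambda i: y_by_x[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    # Sum of absolute rank jumps between neighbours in x-order.
    total = sum(abs(rank[i + 1] - rank[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


# y is a non-monotone function of x, chosen so that no y values tie.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [(v - 0.25) ** 2 for v in xs]

print(xi_corr_sketch(xs, ys))  # how well y is explained as a function of x
print(xi_corr_sketch(ys, xs))  # the reverse direction gives a different value
```

Unlike Pearson or Kendall correlation, the measure is not symmetric: it asks how well the second argument can be written as a function of the first, so argument order matters.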
What's Changed
- try new maturin config by @abstractqqq in #128
- added basic pca exprs by @abstractqqq in #129
- added infer_prob by @abstractqqq in #131
- More pca by @abstractqqq in #132
- xi_corr by @abstractqqq in #134
Full Changelog: v0.4.1-release...v0.4.2
v0.4.1-release
What's Changed
- upgraded download action in CI by @abstractqqq in #124
- removed unnecessary test-deps by @abstractqqq in #126
Full Changelog: v0.4.1-fix-release...v0.4.1-release
v0.4.1-fix-release
v0.4.1
What's Changed
- updated packages, OLS now falls back to LU decomp when X^TX is not strictly positive definite. by @abstractqqq in #122
v0.4.0 was yanked due to incomplete upload
v0.4.0 Breaking Changes
Almost all functions that were previously invoked as
pl.col("a").name_space.method_call(...)
can now be called via
import polars_ds as pds
pds.method_call(...)
Linters should now recognize the package and all its methods.
v0.4.0 What's Changed
- added meta and str_stats for Diagnosis by @abstractqqq in #110
- added iqr in Diagnosis by @abstractqqq in #111
- Refactor F stats by @abstractqqq in #112
- Sampling functionalities by @abstractqqq in #114
- Added weighted corr, etc. by @abstractqqq in #116
- Fix regex SyntaxWarning on Python 3.12+ by @jorenham in #118
- Clean up codebase by @abstractqqq in #119
- added heat map for corr in Diagnosis by @abstractqqq in #121
New Contributors
Full Changelog: v0.3.5...v0.4.0
Full Changelog: v0.4.0...v0.4.1
v0.4.1-test-new-ci
add glob pattern
v0.4.0
Breaking Changes
Almost all functions that were previously invoked as
pl.col("a").name_space.method_call(...)
can now be called via
import polars_ds as pds
pds.method_call(...)
Linters should now recognize the package and all its methods.
What's Changed
- added meta and str_stats for Diagnosis by @abstractqqq in #110
- added iqr in Diagnosis by @abstractqqq in #111
- Refactor F stats by @abstractqqq in #112
- Sampling functionalities by @abstractqqq in #114
- Added weighted corr, etc. by @abstractqqq in #116
- Fix regex SyntaxWarning on Python 3.12+ by @jorenham in #118
- Clean up codebase by @abstractqqq in #119
- added heat map for corr in Diagnosis by @abstractqqq in #121
New Contributors
Full Changelog: v0.3.5...v0.4.0
v0.3.5
Breaking Changes:
- There are now two kinds of random column generation methods:
  1. With a column reference. These methods are renamed from sample_xxx to rand_xxx. They behave the same as before and must be called with a reference column.
  2. Without a column reference. These new methods generate random columns without any reference, so they do not respect nulls and do not use the reference's statistics, but they are easier to use.
Using the new methods to get a random df:
import polars as pl
import polars_ds as pds
df = pds.random_data(size=100_000, n_cols=1).select(
    pds.random(0.0, 12.0).alias("uniform_1"),
    pds.random(0.0, 1.0).alias("uniform_2"),
    pds.random_exp(0.5).alias("exp"),
    pds.random_normal(0.0, 1.0).alias("normal"),
    pds.random_normal(0.0, 1000.0).alias("fat_normal"),
)
df.head()
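The difference between reference-based and reference-free generation can be sketched in plain Python. The helper names `rand_like` and `random_column` below are hypothetical and exist only for this illustration; they model just the null-handling and length behaviour described above, not the actual polars_ds functions.

```python
import random


def rand_like(reference, low=0.0, high=1.0, seed=None):
    """Reference-based: one draw per row of `reference`, nulls (None) preserved."""
    rng = random.Random(seed)
    return [None if value is None else rng.uniform(low, high) for value in reference]


def random_column(n, low=0.0, high=1.0, seed=None):
    """Reference-free: n independent draws, never null."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


reference = [1.0, None, 3.0, None]           # hypothetical column with nulls
with_reference = rand_like(reference, seed=0)
without_reference = random_column(len(reference), seed=0)

print(with_reference)     # None wherever the reference is None
print(without_reference)  # no nulls, length given explicitly
```

In this sketch the reference-based version inherits its length and null positions from the existing column, while the reference-free version needs only a size.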
What's Changed
- Knn entropy by @abstractqqq in #104
- diagnosis basics by @abstractqqq in #105
- Better stats by @abstractqqq in #106
- Add profile by @abstractqqq in #107
- added exclude in dependency plots by @abstractqqq in #108
- add Transfer Entropy and related measures by @remiadon in #84
New Contributors
Full Changelog: v0.3.4-fix-release...v0.3.5