Releases: Zeroto521/my-data-toolkit
v0.0.20
Highlights of this release
Extensive support for H3 (Hexagonal hierarchical geospatial indexing system) via .to_h3 and .H3.*.
>>> import dtoolkit.geoaccessor
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [122, 100], "y": [55, 1]}).from_xy('x', 'y', crs=4326)
>>> df
x y geometry
0 122 55 POINT (122.00000 55.00000)
1 100 1 POINT (100.00000 1.00000)
# GeoDataFrame -> h3 cell
>>> df_with_h3 = df.to_h3(8)
>>> df_with_h3
x y geometry
612845052823076863 122 55 POINT (122.00000 55.00000)
614269156845420543 100 1 POINT (100.00000 1.00000)
# Calculate h3 cell area
>>> df_with_h3.h3.area
612845052823076863 710781.770906
614269156845420543 852134.191671
dtype: float64
# h3 cell -> GeoDataFrame
>>> df_parent_cell = df_with_h3.h3.to_parent()
>>> df_parent_cell
x y geometry
608341453197803519 122 55 POINT (122.00000 55.00000)
609765557230632959 100 1 POINT (100.00000 1.00000)
>>> df_parent_cell.h3.to_points()
x y geometry
608341453197803519 122 55 POINT (122.00991 55.00606)
609765557230632959 100 1 POINT (100.00504 0.99852)
New features and improvements
- #739, #800, #817, #825: New geoaccessor dtoolkit.geoaccessor.geoseries.to_h3 to convert geometry to h3 index.
- #778: Speed up dtoolkit.accessor.series.textdistance_matrix.
- #779, #811, #819: New geoaccessor dtoolkit.geoaccessor.dataframe.H3 to handle h3's geohash.
- #784: New accessor dtoolkit.accessor.series.to_zh.
- #794, #797: New geoaccessor for GeoDataFrame dtoolkit.geoaccessor.geodataframe.xy.
- #801: New accessor for Series dtoolkit.accessor.series.invert_or_not.
- #803: New geoaccessor dtoolkit.geoaccessor.geoseries.select_geom_type.
- #804: New geoaccessor dtoolkit.geoaccessor.geoseries.radius.
- #809: New accessor for Index dtoolkit.accessor.index.len.
Small bug-fix
- #780: Fix dtoolkit.geoaccessor.dataframe.to_geoframe when the geometry is a GeoSeries.
- #816: Fix the missing CRS in the result of dtoolkit.geoaccessor.dataframe.to_geoframe.
- #822: dtoolkit.geoaccessor.dataframe.to_geoframe supports replacing old geometry.
- #824: Fix dtoolkit.accessor.dataframe.repeat returning a DataFrame when the input is a GeoDataFrame.
API changes
v0.0.20rc1
Highlights of this release
Extensive support for H3 (Hexagonal hierarchical geospatial indexing system) via .to_h3 and .H3.*.
>>> import dtoolkit.geoaccessor
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [122, 100], "y": [55, 1]}).from_xy('x', 'y', crs=4326)
>>> df
x y geometry
0 122 55 POINT (122.00000 55.00000)
1 100 1 POINT (100.00000 1.00000)
# GeoDataFrame -> h3 cell
>>> df_with_h3 = df.to_h3(8)
>>> df_with_h3
x y geometry
612845052823076863 122 55 POINT (122.00000 55.00000)
614269156845420543 100 1 POINT (100.00000 1.00000)
# Calculate h3 cell area
>>> df_with_h3.h3.area
612845052823076863 710781.770906
614269156845420543 852134.191671
dtype: float64
# h3 cell -> GeoDataFrame
>>> df_parent_cell = df_with_h3.h3.to_parent()
>>> df_parent_cell
x y geometry
608341453197803519 122 55 POINT (122.00000 55.00000)
609765557230632959 100 1 POINT (100.00000 1.00000)
>>> df_parent_cell.h3.to_points()
x y geometry
608341453197803519 122 55 POINT (122.00991 55.00606)
609765557230632959 100 1 POINT (100.00504 0.99852)
New features and improvements
- #739, #800, #817, #825: New geoaccessor dtoolkit.geoaccessor.geoseries.to_h3 to convert geometry to h3 index.
- #778: Speed up dtoolkit.accessor.series.textdistance_matrix.
- #779, #811, #819: New geoaccessor dtoolkit.geoaccessor.dataframe.H3 to handle h3's geohash.
- #784: New accessor dtoolkit.accessor.series.to_zh.
- #794, #797: New geoaccessor for GeoDataFrame dtoolkit.geoaccessor.geodataframe.xy.
- #801: New accessor for Series dtoolkit.accessor.series.invert_or_not.
- #803: New geoaccessor dtoolkit.geoaccessor.geoseries.select_geom_type.
- #804: New geoaccessor dtoolkit.geoaccessor.geoseries.radius.
- #809: New accessor for Index dtoolkit.accessor.index.len.
Small bug-fix
- #780: Fix dtoolkit.geoaccessor.dataframe.to_geoframe when the geometry is a GeoSeries.
- #816: Fix the missing CRS in the result of dtoolkit.geoaccessor.dataframe.to_geoframe.
- #822: dtoolkit.geoaccessor.dataframe.to_geoframe supports replacing old geometry.
- #824: Fix dtoolkit.accessor.dataframe.repeat returning a DataFrame when the input is a GeoDataFrame.
API changes
v0.0.19
Highlights of this release:
- #574, #752, #757, #758: Support Python 3.11.
- #772: Simplify importing: import dtoolkit is now equivalent to import dtoolkit.accessor.
New features and improvements:
- #724: New accessor for Series to calculate text distance, dtoolkit.accessor.series.textdistance.
- #745: The predicate option of dtoolkit.geoaccessor.geodataframe.duplicated_geometry supports comparing values directly.
- #748: dtoolkit.geoaccessor.geoseries.xy supports returning a DataFrame.
- #760: dtoolkit.accessor.dataframe.repeat supports using a column as the input.
- #768: New accessor dtoolkit.accessor.dataframe.change_axis_type.
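To make the idea of a text distance concrete, here is a minimal pure-Python sketch of one common measure, the Levenshtein edit distance. These notes do not say which metric dtoolkit.accessor.series.textdistance uses, so the functions levenshtein and similarity below are hypothetical illustrations and not the accessor's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

Applying such a function element-wise over a Series against a reference string gives one distance per row, which is the general shape of result a text-distance accessor would produce.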
Small bug-fix:
- #576: Fix DataFrame.append's FutureWarning.
- #765: Fix sklearn pipeline visualization being unable to print OneHotEncoder.
- #776: Fix the GitHub release page no longer providing a tarball file after v0.0.17.
API changes:
- #762: Drop the columns argument for error_report.
v0.0.19rc3
Highlights of this release:
- #574, #752, #757, #758: Support Python 3.11.
- #772: Simplify importing: import dtoolkit is now equivalent to import dtoolkit.accessor.
New features and improvements:
- #724: New accessor for Series to calculate text distance, dtoolkit.accessor.series.textdistance.
- #745: The predicate option of dtoolkit.geoaccessor.geodataframe.duplicated_geometry supports comparing values directly.
- #748: dtoolkit.geoaccessor.geoseries.xy supports returning a DataFrame.
- #760: dtoolkit.accessor.dataframe.repeat supports using a column as the input.
- #768: New accessor dtoolkit.accessor.dataframe.change_axis_type.
Small bug-fix:
- #576: Fix DataFrame.append's FutureWarning.
- #765: Fix sklearn pipeline visualization being unable to print OneHotEncoder.
- #776: Fix the GitHub release page no longer providing a tarball file after v0.0.17.
API changes:
- #762: Drop the columns argument for error_report.
v0.0.18
Highlights of this release
Pandas accessors
- #715: New accessor dtoolkit.accessor.series.equal to compare a pandas object with another object.
GeoPandas accessors
- #699, #701, #702, #704, #705, #706, #707, #735: New geoaccessors to generate great circle distances, dtoolkit.geoaccessor.geoseries.geodistance and dtoolkit.geoaccessor.geoseries.geodistance_matrix.
- #696: New geoaccessor to handle the China web map offset problem, dtoolkit.geoaccessor.geoseries.cncrs_offset.
- #691, #703: New geoaccessor to filter geometry via spatial relationship, dtoolkit.geoaccessor.geoseries.filter_geometry.
- #679, #680, #682: New geoaccessors to check whether a Polygon has holes and to count them, dtoolkit.geoaccessor.geoseries.has_hole and dtoolkit.geoaccessor.geoseries.hole_counts.
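For readers unfamiliar with great circle distances, the sketch below shows the standard haversine formula for points given as (longitude, latitude) in degrees, plus a pairwise matrix built from it. It is an illustration only: the function names haversine and distance_matrix and the mean Earth radius constant are assumptions, and the notes do not state which formula geodistance and geodistance_matrix actually use.

```python
import math

# Mean Earth radius in metres (an assumed constant for this sketch).
EARTH_RADIUS_M = 6_371_008.8


def haversine(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great circle distance in metres between two lon/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * radius * math.asin(math.sqrt(a))


def distance_matrix(points):
    """Pairwise great circle distances for a list of (lon, lat) tuples."""
    return [[haversine(*p, *q) for q in points] for p in points]
```

The matrix is symmetric with zeros on the diagonal, so a real implementation could compute only one triangle; the full double loop is kept here for readability.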
Pipeline
- #688: New accessor dtoolkit.accessor.dataframe.weighted_mean for DataFrame.
- #685: Let Pipeline's fit_predict and predict support outputting DataFrame.
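As a reminder of what a weighted mean computes, here is a minimal sketch on plain Python sequences. The function below is hypothetical and is not the signature of dtoolkit.accessor.dataframe.weighted_mean, which these notes do not show; it only illustrates the arithmetic, sum(value * weight) / sum(weight).

```python
def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w) for two equal-length sequences."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

With equal weights this reduces to the ordinary arithmetic mean; for a DataFrame the same calculation would typically be applied column by column.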
API changes
- #694, #695: pygeos isn't an optional dependency anymore.
- #665: Drop dtoolkit.geoaccessor.geoseries.utm_crs.
Small bug-fix
v0.0.18rc3
New features and improvements
- #721: New accessor for Series to convert datetime type, dtoolkit.accessor.series.to_datetime.
- #715: New accessor dtoolkit.accessor.series.equal to compare a pandas object with another object.
- #712: Support using a DataFrame's column as the distance for dtoolkit.geoaccessor.geodataframe.geobuffer.
- #711, #713: New geoaccessor for GeoSeries to return a tuple of coordinates (x, y), dtoolkit.geoaccessor.geoseries.xy.
- #701, #704, #705, #706: New geoaccessor to generate a great circle distance matrix, dtoolkit.geoaccessor.geoseries.geodistance_matrix.
- #699, #702, #707: New geoaccessor to calculate the distance between two coordinates on Earth, dtoolkit.geoaccessor.geoseries.geodistance.
- #696: New geoaccessor to handle the China web map offset problem, dtoolkit.geoaccessor.geoseries.cncrs_offset.
- #691, #703: New geoaccessor to filter geometry via spatial relationship, dtoolkit.geoaccessor.geoseries.filter_geometry.
- #688: New accessor dtoolkit.accessor.dataframe.weighted_mean for DataFrame.
- #685: Let Pipeline's fit_predict and predict support outputting DataFrame.
- #680, #682: New geoaccessor to check whether a Polygon has holes, dtoolkit.geoaccessor.geoseries.has_hole.
- #679: New geoaccessor to count the number of holes in a Polygon, dtoolkit.geoaccessor.geoseries.hole_counts.
- #668: Add a new option dropna for dtoolkit.accessor.series.values_to_dict to handle nan values.
- #667: New accessor dtoolkit.accessor.series.dropna_index.
API changes
- #694, #695: pygeos isn't an optional dependency anymore.
- #665: Drop dtoolkit.geoaccessor.geoseries.utm_crs.
Small bug-fix
v0.0.17
Highlights of this release
- Speed up geoaccessor geobuffer via UTM CRS (#638).
- Require Python 3.8+ (#554).
- eval and query work for Series now (#492, #551).
New features and improvements
- New geoaccessor to compute geographic area, geoarea (#640).
- Syntactic sugar to parallelize multiple jobs, parallelize (#635, #641).
- New geoaccessors to label / drop duplicate geometry: duplicated_geometry_groups, duplicated_geometry, and drop_duplicates_geometry (#631, #632).
- New accessor for Series, swap_index_values (#630).
- New accessor to group by index, groupby_index (#625).
- New geoaccessor for GeoDataFrame, toposimplify (#624, #649, #651).
- to_series, given only value_column, also returns a Series from a DataFrame (#620).
- New accessors for Series, jenks_bin and jenks_breaks (#618, #629).
- New accessor for Series, filter_in (#614).
- New geoaccessor for GeoDataFrame, to_geoseries (#609).
- New geoaccessor to remove the active geometry, drop_geometry (#599).
- New geoaccessor for Series, from_wkt (#596).
- New geoaccessors to get coordinates from addresses, geocode, and to get addresses from coordinates, reverse_geocode (#591, #594, #643, #636, #652).
- New level option for Index accessor to_set (#586).
- Speed up Series accessor to_set (#585).
- New geoaccessor from_wkb (#584, #598).
- New geoaccessor to_geoframe (#568, #642, #646).
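The "syntactic sugar to parallelize multi-jobs" item refers to running one function over many inputs concurrently. The sketch below shows that general pattern using only the standard library. The helper run_parallel is hypothetical: it is not dtoolkit's parallelize, whose signature and backend are not described in these notes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(func, items, max_workers=4):
    """Apply func to every item using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(func, items))
```

Threads suit I/O-bound jobs such as geocoding requests; CPU-bound work would usually call for a process-based pool instead.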
Small bug-fix
- Avoid the GeoDataFrame constructor mutating the original (input) DataFrame (#644).
- Avoid fillna_regression mutating the original DataFrame (#622).
- Compatibility with sklearn 1.2's stricter class parameter checking (#602).
- geobuffer uses the active geometry to generate buffers (#583).
- Hook accessor method's attrs into both class and instance (#580).
API changes
v0.0.17rc1
Highlights of this release
- Speed up geoaccessor geobuffer via UTM CRS (#638).
- Require Python 3.8+ (#554).
- eval and query work for Series now (#492, #551).
New features and improvements
- New geoaccessor to compute geographic area, geoarea (#640).
- Syntactic sugar to parallelize multiple jobs, parallelize (#635, #641).
- New geoaccessors to label / drop duplicate geometry: duplicated_geometry_groups, duplicated_geometry, and drop_duplicates_geometry (#631, #632).
- New accessor for Series, swap_index_values (#630).
- New accessor to group by index, groupby_index (#625).
- New geoaccessor for GeoDataFrame, toposimplify (#624, #649, #651).
- to_series, given only value_column, also returns a Series from a DataFrame (#620).
- New accessors for Series, jenks_bin and jenks_breaks (#618, #629).
- New accessor for Series, filter_in (#614).
- New geoaccessor for GeoDataFrame, to_geoseries (#609).
- New geoaccessor to remove the active geometry, drop_geometry (#599).
- New geoaccessor for Series, from_wkt (#596).
- New geoaccessors to get coordinates from addresses, geocode, and to get addresses from coordinates, reverse_geocode (#591, #594, #643, #636, #652).
- New level option for Index accessor to_set (#586).
- Speed up Series accessor to_set (#585).
- New geoaccessor from_wkb (#584, #598).
- New geoaccessor to_geoframe (#568, #642, #646).
Small bug-fix
- Avoid the GeoDataFrame constructor mutating the original (input) DataFrame (#644).
- Avoid fillna_regression mutating the original DataFrame (#622).
- Compatibility with sklearn 1.2's stricter class parameter checking (#602).
- geobuffer uses the active geometry to generate buffers (#583).
- Hook accessor method's attrs into both class and instance (#580).
API changes
v0.0.16
New features and improvements
- New accessor dtoolkit.accessor.dataframe.fillna_regression (#556, #567).
- New unique option for dtoolkit.accessor.dataframe.values_to_dict (#548).
- Speed up dtoolkit.util._exception.find_stack_level (#546).
- The how option of dtoolkit.accessor.dataframe.filter_in only works on the condition DataFrame's columns (#545).
- Speed up dtoolkit.accessor.series.to_set, especially for large data (#542, #543).
- The inf option of dtoolkit.accessor.dataframe.drop_inf supports + and - (#539).
- New accessor dtoolkit.accessor.dataframe.boolean for DataFrame (#537, #538).
- New complement option for dtoolkit.accessor.dataframe.filter_in (#533).
- New Index method dtoolkit.accessor.index.to_set (#529).
- New method dtoolkit.accessor.dataframe.decompose for DataFrame (#488, #573).
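The drop_inf item says its inf option accepts + and -. A plausible reading is that the option selects which sign of infinity counts as a value to drop. The sketch below illustrates that reading on plain lists of floats. The function and its "all" default are assumptions made for illustration, not the accessor's documented behaviour.

```python
import math


def drop_inf(rows, inf="all"):
    """Drop rows containing infinite values.

    inf selects what to drop: "all" (either sign), "+" (only +inf),
    or "-" (only -inf). These option names are assumed for this sketch.
    """
    checks = {
        "all": math.isinf,
        "+": lambda x: x == math.inf,
        "-": lambda x: x == -math.inf,
    }
    if inf not in checks:
        raise ValueError(f"invalid inf option: {inf!r}")
    is_unwanted = checks[inf]
    return [row for row in rows if not any(is_unwanted(x) for x in row)]
```

For example, with inf="+" a row holding -inf is kept while a row holding +inf is removed.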
API changes
v0.0.16rc2
New features and improvements
- New accessor dtoolkit.accessor.dataframe.fillna_regression (#556, #567).
- New unique option for dtoolkit.accessor.dataframe.values_to_dict (#548).
- Speed up dtoolkit.util._exception.find_stack_level (#546).
- The how option of dtoolkit.accessor.dataframe.filter_in only works on the condition DataFrame's columns (#545).
- Speed up dtoolkit.accessor.series.to_set, especially for large data (#542, #543).
- The inf option of dtoolkit.accessor.dataframe.drop_inf supports + and - (#539).
- New accessor dtoolkit.accessor.dataframe.boolean for DataFrame (#537, #538).
- New complement option for dtoolkit.accessor.dataframe.filter_in (#533).
- New Index method dtoolkit.accessor.index.to_set (#529).
- New method dtoolkit.accessor.dataframe.decompose for DataFrame (#488).