Commit

Merge branch 'release/v2.8.0'
sbrugman committed May 12, 2020
2 parents 60201e7 + e682ec0 commit 58d4a54
Showing 132 changed files with 2,514 additions and 46,670 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/ci.yml
@@ -9,8 +9,6 @@ on:
jobs:
lint:
name: Lint
# https://developer.github.com/v3/activity/events/types/#pushevent
# if: startsWith(github.event.ref, 'refs/heads/release/v')
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
4 changes: 2 additions & 2 deletions .github/workflows/pypi.yml
@@ -3,7 +3,7 @@ name: Deploy to PyPi
on:
release:
types: [created]

jobs:
release:
if: github.event_name == 'release' && github.event.action == 'created'
@@ -47,4 +47,4 @@ jobs:
uses: pypa/gh-action-pypi-publish@master
with:
user: __token__
password: ${{ secrets.PYPI_TOKEN }}
password: ${{ secrets.PYPI_TOKEN }}
1 change: 1 addition & 0 deletions .gitignore
@@ -78,3 +78,4 @@ examples/*/*.html
examples/*/*.csv
docs/
docsrc/_build/
docsrc/source/pages/api/_autosummary/
21 changes: 17 additions & 4 deletions .travis.yml
@@ -7,22 +7,35 @@ cache:
- data/

jobs:
include:
- os: linux
name: "Python 3.9-dev on Linux"
python: 3.9-dev
env: TEST=examples PANDAS=">=1"
- os: windows
name: "Python 3.8 on Windows"
python: 3.8
env: TEST=examples PANDAS=">=1"
- os: osx
name: "Python 3.8 on osx"
python: 3.8
env: TEST=examples PANDAS=">=1"

allow_failures:
- python: 3.9-dev
- os: windows
- os: osx
- name: "Python 3.9-dev on Linux"

python:
- 3.6
- 3.7
- 3.8
# - 3.9-dev

env:
- TEST=unit PANDAS="<1"
- TEST=issue PANDAS="<1"
- TEST=console PANDAS="<1"
- TEST=examples PANDAS="<1"
- TEST=lint PANDAS="<1"
- TEST=typing PANDAS="<1"
- TEST=unit PANDAS=">=1"
- TEST=issue PANDAS=">=1"
- TEST=console PANDAS=">=1"
48 changes: 38 additions & 10 deletions README.md
@@ -23,9 +23,17 @@ For each column the following statistics - if relevant for the column type - are
* **Correlations** highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
* **Missing values** matrix, count, heatmap and dendrogram of missing values
* **Text analysis** learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
* **File and Image analysis** extract file sizes, creation dates and dimensions and scan for truncated images or those containing EXIF information.

## Announcements

### Version v2.8.0 released

News for users working with image datasets: ``pandas-profiling`` now has built-in support for Files and Images.
The text analysis features have also been reworked and now provide more informative statistics.

To get a better feel for these features, have a look at the [examples](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/examples.html#showcasing-specific-features) section in the docs, or read the changelog for a complete overview of the changes.

### Version v2.7.0 released

#### Performance
@@ -53,6 +61,7 @@ It's extra exciting that GitHub **matches your contribution** for the first year
Find more information here:

- [Changelog v2.7.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-7-0)
- [Changelog v2.8.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-8-0)
- [Sponsor the project on GitHub](https://github.com/sponsors/sbrugman)

*May 7, 2020 💘*
@@ -73,16 +82,23 @@ _Contents:_ **[Examples](#examples)** |
The following examples can give you an impression of what the package can do:

* [Census Income](https://pandas-profiling.github.io/pandas-profiling/examples/master/census/census_report.html) (US Adult Census data relating income)
* [NASA Meteorites](https://pandas-profiling.github.io/pandas-profiling/examples/master/meteorites/meteorites_report.html) (comprehensive set of meteorite landings) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/master/meteorites/meteorites.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Fmaster%2Fmeteorites%2Fmeteorites.ipynb)
* [Titanic](https://pandas-profiling.github.io/pandas-profiling/examples/master/titanic/titanic_report.html) (the "Wonderwall" of datasets) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/master/titanic/titanic.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Fmaster%2Ftitanic%2Ftitanic.ipynb)
* [NASA Meteorites](https://pandas-profiling.github.io/pandas-profiling/examples/master/meteorites/meteorites_report.html) (comprehensive set of meteorite landings) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/meteorites/meteorites.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Fmeteorites%2Fmeteorites.ipynb)
* [Titanic](https://pandas-profiling.github.io/pandas-profiling/examples/master/titanic/titanic_report.html) (the "Wonderwall" of datasets) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/titanic/titanic.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Ftitanic%2Ftitanic.ipynb)
* [NZA](https://pandas-profiling.github.io/pandas-profiling/examples/master/nza/nza_report.html) (open data from the Dutch Healthcare Authority)
* [Stata Auto](https://pandas-profiling.github.io/pandas-profiling/examples/master/stata_auto/stata_auto_report.html) (1978 Automobile data)
* [Vektis](https://pandas-profiling.github.io/pandas-profiling/examples/master/vektis/vektis_report.html) (Vektis Dutch Healthcare data)
* [Website Inaccessibility](https://pandas-profiling.github.io/pandas-profiling/examples/master/website_inaccessibility/website_inaccessibility_report.html) (demonstrates the URL type)
* [Colors](https://pandas-profiling.github.io/pandas-profiling/examples/master/colors/colors_report.html) (a simple colors dataset)
* [Russian Vocabulary](https://pandas-profiling.github.io/pandas-profiling/examples/master/russian_vocabulary/russian_vocabulary.html) (demonstrates text analysis)
* [Orange prices](https://pandas-profiling.github.io/pandas-profiling/examples/master/themes/united_report.html) and [Coal prices](https://pandas-profiling.github.io/pandas-profiling/examples/master/themes/flatly_report.html) (showcase report themes)
* [Tutorial: report structure using Kaggle data (advanced)](https://pandas-profiling.github.io/pandas-profiling/examples/master/kaggle/modify_report_structure.ipynb) (modify the report's structure) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/master/kaggle/modify_report_structure.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Fmaster%F2kaggle%2Fmodify_report_structure.ipynb)

Specific features:
* [Russian Vocabulary](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/russian_vocabulary.html) (demonstrates text analysis)
* [Cats and Dogs](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/cats-and-dogs.html) (demonstrates image analysis from the file system)
* [Celebrity Faces](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/celebrity-faces.html) (demonstrates image analysis with EXIF information)
* [Website Inaccessibility](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/website_inaccessibility_report.html) (demonstrates URL analysis)
* [Orange prices](https://pandas-profiling.github.io/pandas-profiling/examples/master/themes/united_report.html) and [Coal prices](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/flatly_report.html) (showcases report themes)

Tutorials:
* [Tutorial: report structure using Kaggle data (advanced)](https://pandas-profiling.github.io/pandas-profiling/examples/master/kaggle/modify_report_structure.ipynb) (modify the report's structure) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/kaggle/modify_report_structure.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Fkaggle%2Fmodify_report_structure.ipynb)


## Installation

@@ -131,14 +147,24 @@ from pandas_profiling import ProfileReport

df = pd.DataFrame(
np.random.rand(100, 5),
columns=['a', 'b', 'c', 'd', 'e']
columns=["a", "b", "c", "d", "e"]
)
```
To generate the report, run:
```python
profile = ProfileReport(df, title='Pandas Profiling Report', html={'style':{'full_width':True}})
profile = ProfileReport(df, title="Pandas Profiling Report")
```

### Explore deeper

You can configure the profile report in any way you like. The example code below loads the [explorative configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_explorative.yaml), which includes many features for text (length distribution, Unicode information), files (file size, creation time) and images (dimensions, EXIF information). If you are interested in which exact settings were used, you can compare it with the [default configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_default.yaml).

```python
profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)
```

Learn more about configuring `pandas-profiling` on the [Advanced usage](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/advanced_usage.html) page.
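
One way to see which settings differ between two configuration files is to flatten both into dotted keys and compare them. The sketch below is only an illustration using the standard library: the helper names `flatten` and `diff_configs` are hypothetical, and the two dictionaries are made-up excerpts rather than the real contents of `config_default.yaml` or `config_explorative.yaml`.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder shown when a key exists in only one config


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict into {"dotted.key": value} pairs."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def diff_configs(
    base: Dict[str, Any], other: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (base_value, other_value)} for settings that differ."""
    flat_base, flat_other = flatten(base), flatten(other)
    return {
        key: (flat_base.get(key, MISSING), flat_other.get(key, MISSING))
        for key in sorted(set(flat_base) | set(flat_other))
        if flat_base.get(key, MISSING) != flat_other.get(key, MISSING)
    }


# Hypothetical excerpts for illustration only, NOT the real file contents.
default_cfg = {
    "vars": {"cat": {"length": False, "unicode": False}, "file": {"active": False}}
}
explorative_cfg = {
    "vars": {"cat": {"length": True, "unicode": True}, "file": {"active": True}}
}

for key, (old, new) in diff_configs(default_cfg, explorative_cfg).items():
    print(f"{key}: {old} -> {new}")
```

With the real files, the two dictionaries could be loaded from YAML first (for example with PyYAML's `yaml.safe_load`) and passed to the same helper.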

#### Jupyter Notebook

We recommend generating reports interactively by using the Jupyter notebook.
@@ -208,7 +234,7 @@ More settings can be found in the [default configuration file](https://github.co
__Example__
```python
profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': {'bins': 8}})
profile.to_file(output_file="output.html")
profile.to_file("output.html")
```

## Types
@@ -222,14 +248,16 @@ Types are a powerful abstraction for effective data analysis, that goes beyond t
- Categorical
- URL
- Path
- File
- Image

We have developed a type system for Python, tailored for data analysis: [visions](https://github.com/dylan-profiler/visions).
Selecting the right typeset drastically reduces the complexity of your analysis code.
Future versions of `pandas-profiling` will have extended type support through `visions`!

## Contributing

Read about getting involved in the [Contribution Guide](https://pandas-profiling.github.io/pandas-profiling/docs/v2.7.0/rtd/pages/contribution_guidelines.html).
Read about getting involved in the [Contribution Guide](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/contribution_guidelines.html).

## Editor integration
### PyCharm integration
7 changes: 7 additions & 0 deletions docsrc/source/index.rst
@@ -24,3 +24,10 @@
pages/support
pages/contribution_guidelines
pages/resources

.. toctree::
:maxdepth: 3
:caption: Reference
:hidden:

pages/api
60 changes: 54 additions & 6 deletions docsrc/source/pages/advanced_usage.rst
@@ -12,13 +12,18 @@ General settings
:widths: 30, 200, 200, 200
:header-rows: 1

The configuration can be changed in the following way:
The configuration can be changed in the following ways:

.. code-block:: python
:caption: Configuration example
profile = df.profile_report(title='Pandas Profiling Report', pool_size=1)
profile.to_file(output_file="output.html")
# Change the config when creating the report
profile = df.profile_report(title="Pandas Profiling Report", pool_size=1)
# Change the config after
profile.set_variable("html.minify_html", False)
profile.to_file("output.html")
Variable summary settings
-------------------------
@@ -37,11 +42,21 @@ Variable summary settings
vars={
'num':{'low_categorical_threshold': 0},
'cat':{
'check_composition':False,
'length':True,
'unicode':False,
'n_obs': 5,
}
}
)
profile.set_variable('variables.descriptions',
{
'files': 'Files in the filesystem',
'datec': 'Creation date',
'datem': 'Modification date',
}
)
profile.to_file("report.html")
@@ -89,6 +104,12 @@ Disable all correlations:
},
)
# or using a shorthand that is available for correlations
profile = df.profile_report(
title="Report without correlations",
correlations=None,
)
Interactions
------------
@@ -97,6 +118,14 @@ Interactions
:widths: 30, 200, 200, 200
:header-rows: 1
The HTML Report
---------------
.. csv-table::
:file: config_html.csv
:widths: 30, 200, 200, 200
:header-rows: 1
Using a custom configuration file
---------------------------------
@@ -116,5 +145,24 @@ A great way to get an overview of the possible configuration is to look through
The repository contains the following files:
- `default configuration file <https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_default.yaml>`_ (default),
- `minimal configuration file <https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_minimal.yaml>`_ (optimized for performance)
- `dark themed configuration file <https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_dark.yaml>`_ (customizing styles).
- `explorative configuration file <https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_explorative.yaml>`_ (with text, file and image features enabled),
- `minimal configuration file <https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_minimal.yaml>`_ (minimal computation, optimized for performance)
- `dark themed configuration file <https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_dark.yaml>`_ and `orange themed configuration file <https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_united.yaml>`_ (example of customizing styles).
Configuration shorthands
------------------------
It's possible to disable certain groups of features through configuration shorthands.
.. code-block:: python
# Disable samples, correlations, missing diagrams, duplicates and interactions at once
r = ProfileReport(samples=None, correlations=None, missing_diagrams=None, duplicates=None, interactions=None)
# Or use the .set_variable method
r = ProfileReport()
r.set_variable("samples", None)
r.set_variable("duplicates", None)
r.set_variable("correlations", None)
r.set_variable("missing_diagrams", None)
r.set_variable("interactions", None)
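
The dotted keys passed to `set_variable` (for example `"html.minify_html"`) address entries in a nested configuration. The sketch below shows how such a lookup could work on a plain dictionary. It is not the library's implementation: the helper `set_dotted` and the sample values are hypothetical and serve only to illustrate the idea.

```python
from typing import Any, Dict


def set_dotted(config: Dict[str, Any], key: str, value: Any) -> None:
    """Set config["a"]["b"]["c"] = value for key "a.b.c", creating levels as needed."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise TypeError(f"'{part}' in '{key}' is not a nested section")
    node[leaf] = value


# Hypothetical starting configuration, for illustration only.
config = {"html": {"minify_html": True}, "samples": {"head": 10, "tail": 10}}

set_dotted(config, "html.minify_html", False)  # change a single nested setting
set_dotted(config, "samples", None)            # shorthand-style: drop a whole group

print(config)
```

In this sketch, setting a top-level key to `None` replaces the entire group, which mirrors how the shorthands above switch off a group of features in one call.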
2 changes: 2 additions & 0 deletions docsrc/source/pages/announcements.rst
@@ -2,6 +2,8 @@
Announcements
=============

.. include:: announcements/2020-05-12-release-v2-8-0.rst

.. include:: announcements/2020-05-07-release-v2-7-0.rst

Previous announcements
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Version v2.8.0 released
-----------------------

News for users working with image datasets: ``pandas-profiling`` now has built-in support for Files and Images.
The text analysis features have also been reworked and now provide more informative statistics.

To get a better feel for these features, have a look at the `examples <https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/examples.html#showcasing-specific-features>`_ section in the docs, or read the changelog for a complete overview of the changes.
12 changes: 12 additions & 0 deletions docsrc/source/pages/api.rst
@@ -0,0 +1,12 @@
===
API
===

.. toctree::

api/profile_report
api/controller
api/model
api/report
api/utils
api/visualisation
12 changes: 12 additions & 0 deletions docsrc/source/pages/api/controller.rst
@@ -0,0 +1,12 @@
==========
Controller
==========

.. currentmodule:: pandas_profiling.controller
.. toctree::

.. autosummary::
:toctree: _autosummary

console
pandas_decorator
15 changes: 15 additions & 0 deletions docsrc/source/pages/api/model.rst
@@ -0,0 +1,15 @@
=====
Model
=====

.. currentmodule:: pandas_profiling.model
.. toctree::

.. autosummary::
:toctree: _autosummary

base
describe
summary
messages
correlations
16 changes: 16 additions & 0 deletions docsrc/source/pages/api/profile_report.rst
@@ -0,0 +1,16 @@
Profile Report
**************

=============
ProfileReport
=============

.. currentmodule:: pandas_profiling
.. toctree::

.. autosummary::
:toctree: _autosummary

profile_report.ProfileReport
serialize_report.SerializeReport
config.Config