DOC Adding documentation addressing issue 1231 (fairlearn#1284)
alliesaizan authored Oct 17, 2023
1 parent 24663c4 commit cdcd626
Showing 2 changed files with 111 additions and 2 deletions.
13 changes: 12 additions & 1 deletion docs/user_guide/assessment/index.rst
@@ -11,6 +11,17 @@ fairness metrics, such as demographic parity and equalized odds.
We will show how :class:`MetricFrame` can be used to evaluate the metrics
identified during the course of a fairness assessment.

Fairlearn provides two primary ways of assessing fairness:
:class:`MetricFrame`, which performs disaggregated analysis of a
particular performance metric (such as accuracy or false positive rate)
across sensitive groups, and a set of predefined fairness metrics that
use :class:`MetricFrame` internally to output an aggregate value.
:class:`MetricFrame` can also output aggregate values, but the
predefined fairness metrics are the simpler choice when a direct
by-group comparison is not necessary.
In the :ref:`perform_fairness_assessment` section, we dive further into
each of these types of fairness assessment.
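
As a rough sketch of the two approaches, the snippet below evaluates
accuracy per group with :class:`MetricFrame` and then computes a single
demographic parity score on the same data. The arrays here are
hypothetical toy values, chosen only for illustration.

.. code-block:: python

    from sklearn.metrics import accuracy_score

    from fairlearn.metrics import MetricFrame, demographic_parity_difference

    # Hypothetical toy data, for illustration only.
    y_true = [0, 1, 1, 0, 1, 0, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
    sf = ["a", "a", "a", "a", "b", "b", "b", "b"]

    # Disaggregated analysis: one accuracy value per sensitive group.
    mf = MetricFrame(
        metrics=accuracy_score,
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sf,
    )
    print(mf.by_group)  # accuracy is 0.75 for group "a", 0.5 for group "b"

    # Predefined fairness metric: a single aggregate score.
    # Selection rates are 0.25 ("a") and 0.75 ("b"), so the gap is 0.5.
    print(demographic_parity_difference(y_true, y_pred, sensitive_features=sf))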

In the mathematical definitions below, :math:`X` denotes a feature vector
used for predictions, :math:`A` will be a single sensitive feature (such as age
or race), and :math:`Y` will be the true label.
@@ -52,4 +63,4 @@ metrics.
For more information on how to use it, refer to
`https://github.com/microsoft/responsible-ai-toolbox <https://github.com/microsoft/responsible-ai-toolbox>`_.
Fairlearn provides some of the existing functionality through
:code:`matplotlib`-based visualizations. Refer to the :ref:`plot_metricframe` section.
100 changes: 99 additions & 1 deletion docs/user_guide/assessment/perform_fairness_assessment.rst
@@ -110,6 +110,12 @@ for the social context of the problem you seek to solve.
In particular, be careful of falling into one of the
:ref:`abstraction traps <abstraction_traps>`.


.. _assessment_disaggregated_metrics:

Disaggregated metrics
---------------------

The centerpiece of fairness assessment in Fairlearn are disaggregated metrics,
which are metrics evaluated on slices of data.
For example, to measure gender-based harms due to errors, we would begin by
@@ -118,6 +124,16 @@ in our dataset.
If we found that males were experiencing errors at a much lower rate than
females and nonbinary persons, we would flag this as a potential fairness harm.

Note that by "errors" here, we are referring to the methods we use to
assess the performance of the machine learning model overall, for
example accuracy or precision in the classification case.
We distiniguish these model performance metrics from fairness metrics,
which operationalize different definitions of fairness
(such as demographic parity or equal opportunity).
We will review those metrics in a subsequent section of the User Guide.
For more information on fairness metrics,
review :ref:`common_fairness_metrics`.
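
To make "evaluated on slices of data" concrete, the sketch below
computes accuracy on each sensitive group by slicing the data manually;
the arrays are hypothetical toy values. :class:`MetricFrame`, introduced
next, automates exactly this bookkeeping.

.. code-block:: python

    import numpy as np
    from sklearn.metrics import accuracy_score

    # Hypothetical toy data, for illustration only.
    y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
    y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])
    gender = np.array(["m", "m", "m", "m", "f", "f", "f", "f"])

    # Evaluate the performance metric on each slice of the data.
    for group in np.unique(gender):
        mask = gender == group
        print(group, accuracy_score(y_true[mask], y_pred[mask]))
    # f 0.5
    # m 0.75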

Fairlearn provides the :class:`fairlearn.metrics.MetricFrame` class to help
with this quantification.
Suppose we have some 'true' values, some predictions from a model, and also
@@ -199,6 +215,13 @@ These are accessed through the :attr:`MetricFrame.by_group` property:

All of these values can be checked against the original arrays above.
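
A minimal sketch of this access pattern, on hypothetical toy data:

.. code-block:: python

    from sklearn.metrics import accuracy_score

    from fairlearn.metrics import MetricFrame

    mf = MetricFrame(
        metrics=accuracy_score,
        y_true=[0, 1, 1, 0, 1, 0, 1, 1],
        y_pred=[0, 1, 0, 0, 1, 1, 1, 0],
        sensitive_features=["a", "a", "a", "a", "b", "b", "b", "b"],
    )
    print(mf.overall)   # accuracy over all rows: 0.625
    print(mf.by_group)  # a pandas Series indexed by group: a=0.75, b=0.5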

.. note::

   :class:`MetricFrame` is intended for analyzing the disparities between
   groups with regard to a base metric, and consequently cannot take
   predefined fairness metrics, such as
   :func:`demographic_parity_difference`, as input to its `metrics`
   parameter.

.. _assessment_compare_harms:

Compare quantified harms across the groups
@@ -276,4 +299,79 @@ overall values for the data:
count 0.222222
dtype: float64

In every case, the *largest* difference and *smallest* ratio are returned.


.. _assessment_predefined_fairness_metrics:

Predefined fairness metrics
---------------------------

In addition to the disaggregated analysis of base metrics enabled by
:class:`MetricFrame`, Fairlearn also provides a set of predefined fairness
metrics that output a single score. These metrics take
`sensitive_features` as input and compute the largest difference or
smallest ratio between subgroups of a sensitive feature. The predefined
fairness metrics offered by Fairlearn are
:func:`demographic_parity_difference`, :func:`demographic_parity_ratio`,
:func:`equalized_odds_difference`, and :func:`equalized_odds_ratio`.
The ratio and difference can be calculated `between_groups`
or `to_overall`; note that `to_overall` returns more than one value
when the `control_features` parameter is not `None`.
:class:`MetricFrame` can also calculate differences and ratios between
groups. For more information on the available methods of computing
ratios or differences, see the documentation for :meth:`MetricFrame.ratio`
and :meth:`MetricFrame.difference`, respectively.
Note that because these metrics are calculated using aggregations
between groups, they are meant to be called directly, rather than used
within the instantiation of a :class:`MetricFrame`.
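
As an illustrative sketch of the two aggregation methods, consider the
predefined :func:`demographic_parity_difference` on hypothetical toy
data:

.. code-block:: python

    from fairlearn.metrics import demographic_parity_difference

    # Hypothetical toy data, for illustration only.
    y_true = [0, 1, 1, 0, 1, 0, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
    sf = ["a", "a", "a", "a", "b", "b", "b", "b"]

    # Largest gap in selection rate between any two groups:
    # |0.75 - 0.25| = 0.5
    print(demographic_parity_difference(
        y_true, y_pred, sensitive_features=sf, method="between_groups"))

    # Largest gap between any group and the overall selection rate (0.5):
    # max(|0.25 - 0.5|, |0.75 - 0.5|) = 0.25
    print(demographic_parity_difference(
        y_true, y_pred, sensitive_features=sf, method="to_overall"))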

Below, we show an example of calculating demographic parity ratio using the
sample data defined above.

.. doctest:: assessment_metrics
:options: +NORMALIZE_WHITESPACE

>>> from fairlearn.metrics import demographic_parity_ratio
>>> print(demographic_parity_ratio(y_true,
... y_pred,
... sensitive_features=sf_data))
0.66666...

It is also possible to define custom fairness metrics based on any
standard performance metric (e.g., the false positive rate or AUC)
using :func:`make_derived_metric`.
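
For instance, a minimal sketch (again on hypothetical toy data) that
derives a between-group false-positive-rate difference:

.. code-block:: python

    from fairlearn.metrics import false_positive_rate, make_derived_metric

    # Build an aggregate metric from a base performance metric.
    fpr_difference = make_derived_metric(
        metric=false_positive_rate, transform="difference"
    )

    # Hypothetical toy data, for illustration only.
    y_true = [0, 1, 1, 0, 1, 0, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
    sf = ["a", "a", "a", "a", "b", "b", "b", "b"]

    # FPR is 0.0 for group "a" and 1.0 for group "b",
    # so the between-group difference is 1.0.
    print(fpr_difference(y_true, y_pred, sensitive_features=sf,
                         method="between_groups"))
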
Under the hood, the predefined fairness metrics
also use :class:`MetricFrame` to compute a particular base metric across
sensitive groups and subsequently perform an aggregation (the difference
or ratio) on the by-group values. For example,
:func:`equalized_odds_ratio` uses both the :func:`true_positive_rate` and
:func:`false_positive_rate` within a :class:`MetricFrame` on the backend
to generate an output. As demonstrated below, calling
:func:`equalized_odds_ratio` and using the :meth:`MetricFrame.ratio`
method produce the same outcome.

.. doctest:: assessment_metrics
:options: +NORMALIZE_WHITESPACE

>>> from fairlearn.metrics import equalized_odds_ratio
>>> print(equalized_odds_ratio(y_true,
... y_pred,
... sensitive_features=sf_data))
0.0
>>> from sklearn.metrics import recall_score
>>> from fairlearn.metrics import MetricFrame, false_positive_rate
>>> my_metrics = {
...     'tpr' : recall_score,
...     'fpr' : false_positive_rate
... }
>>> mf = MetricFrame(
... metrics=my_metrics,
... y_true=y_true,
... y_pred=y_pred,
... sensitive_features=sf_data
... )
>>> min(mf.ratio(method="between_groups"))
0.0

:ref:`common_fairness_metrics` provides an overview of common metrics used
in fairness analyses. For a deep dive into how to extend the capabilities of
fairness metrics provided by Fairlearn, review :ref:`custom_fairness_metrics`.
