Improve documentation
aaronkurz committed Jun 13, 2024
1 parent 9cc3d3d commit 0b76736
Showing 13 changed files with 161 additions and 106 deletions.
4 changes: 3 additions & 1 deletion package/AUTHORS.rst
@@ -10,4 +10,6 @@ Development Lead
Contributors
------------

None yet. Why not be the first?
* Ronny Seiger <[email protected]>
* Marco Franceschetti <[email protected]>
* Barbara Weber <[email protected]>
20 changes: 11 additions & 9 deletions package/CONTRIBUTING.rst
@@ -68,27 +68,26 @@ Ready to contribute? Here's how to set up `aqudem` for local development.

$ pip install -r requirements.txt

4. Create a branch for local development::
4. To get the necessary developer tools, run::

$ pip install -r requirements-dev.txt

5. Create a branch for local development::

$ git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

5. When you're done making changes, check that your changes pass several requirements::
6. When you're done making changes, check that your changes pass several requirements::

$ ./code-check.sh

To get the necessary tools to execute the checks, run::

$ pip install -r requirements-dev.txt

6. Commit your changes and push your branch to GitHub::
7. Commit your changes and push your branch to GitHub::

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature

7. Submit a pull request through the GitHub website.
8. Submit a pull request through the GitHub website.

Pull Request Guidelines
-----------------------
@@ -109,6 +108,9 @@ Tips
Deploying
---------


TODO: to change

A reminder for the maintainers on how to deploy.
Make sure all your changes are committed (including an entry in HISTORY.rst).
Then run::
34 changes: 17 additions & 17 deletions package/README.rst
@@ -12,45 +12,44 @@ AquDeM



Activity and Sequence Detection Performance Measures: A package to evaluate activity detection results, including the sequence of events given multiple activity types.
Activity and Sequence Detection Evaluation Metrics: A Comprehensive Tool for Event Log Comparison.

* Documentation: https://aqudem.readthedocs.io. (TODO: not yet active)

Installation
------------
.. code-block:: bash
pip install .
pip install aqudem
Usage
-----
.. code-block:: python
import aqudem
aqu_context = aqudem.Context("ground_truth.xes",
"detected.xes")
aqu_context = aqudem.Context("ground_truth_log.xes", "detected_log.xes")
aqu_context.activity_names
aqu_context.case_ids
aqu_context.cross_correlation()
aqu_context.event_analysis(activity_name="Store Workpiece in HBW", case_id="case1")
aqu_context.two_set(activity_name="Store Workpiece in HBW")
aqu_context.levenshtein_distance()
aqu_context.activity_names # get all activity names present in log
aqu_context.case_ids # get all case IDs present in log
aqu_context.cross_correlation() # aggregate over all cases and activities
aqu_context.event_analysis(activity_name="Pack", case_id="1") # filter on case and activity
aqu_context.two_set(activity_name="Pack") # filter on activity, aggregate over cases
For a more detailed description of the available methods, please refer to the rest of the documentation.
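The ``cross_correlation()`` call above returns a value between 0 and 1 together with the relative shift that maximizes the correlation. As a rough, hypothetical sketch of the underlying idea (not aqudem's implementation, whose signal construction and shift scaling are not shown here), the function below takes two equally sampled 0/1 activity signals for one activity in one case and returns the peak normalized cross-correlation and the shift in frames. ``binary_cross_correlation`` is an illustrative name, not part of the package.

```python
from typing import Sequence

import numpy as np


def binary_cross_correlation(gt: Sequence[float],
                             det: Sequence[float]) -> tuple[float, int]:
    """Peak normalized cross-correlation of two equally sampled 0/1 signals."""
    gt_arr = np.asarray(gt, dtype=float)
    det_arr = np.asarray(det, dtype=float)
    norm = np.sqrt(np.sum(gt_arr ** 2) * np.sum(det_arr ** 2))
    if norm == 0:
        return 0.0, 0  # activity absent from one signal: no correlation
    # 'full' mode evaluates every lag k = -(len(gt) - 1) .. len(det) - 1,
    # where corr[i] = sum_n det[n + k] * gt[n] and k = i - (len(gt) - 1).
    corr = np.correlate(det_arr, gt_arr, mode="full")
    best = int(np.argmax(corr))
    lag = best - (len(gt_arr) - 1)  # > 0: detection lags behind ground truth
    return float(corr[best] / norm), lag
```

Dividing by the signal energies keeps the value in [0, 1] for non-negative signals (Cauchy-Schwarz). Turning the frame shift into a relative shift, for example by dividing by the signal length, is left open because the scaling aqudem uses is an assumption here.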

Preface
--------

* Measurements and metrics to evaluate activity detection results
* Metrics to evaluate activity detection results
* Input: two XES files, one with the ground truth and one with the detection results
* Output: a set of metrics to evaluate the detection results
* Prerequisites for the input files: the XES files must...

* ... have a ``sampling_freq`` in Hz associated with each case
* ... have a ``concept:name`` attribute for each case
* ... have a ``sampling_freq`` in Hz associated with each case (only detected file)
* ... have a ``concept:name`` attribute for each case (case ID)
* ... have a ``time:timestamp`` attribute for each event
* ... have a ``concept:name`` attribute for each event (activity name)
* ... have a ``lifecycle:transition`` attribute for each event
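The event-level prerequisites above can be checked before computing any metric. The sketch below is a hypothetical helper (``check_events`` is not part of aqudem). It assumes the events of one log have been flattened into plain dicts keyed by the XES attribute names, with the case ID under ``case:concept:name`` as in the pm4py DataFrames the package reads, and it also checks that each activity instance has a start and a complete event and that the same activity runs at most once at a time per case.

```python
from collections import defaultdict

# Keys every flattened event needs, per the prerequisites above.
REQUIRED_EVENT_KEYS = ("case:concept:name", "concept:name",
                       "time:timestamp", "lifecycle:transition")


def check_events(events: list[dict]) -> list[str]:
    """Return human-readable problems found in a flattened event list."""
    problems = []
    for idx, event in enumerate(events):
        missing = [key for key in REQUIRED_EVENT_KEYS if key not in event]
        if missing:
            problems.append(f"event {idx}: missing {', '.join(missing)}")
    if problems:
        return problems  # lifecycle checks need every attribute present
    # Number of currently open "start" events per (case, activity).
    open_starts = defaultdict(int)
    for event in sorted(events, key=lambda e: e["time:timestamp"]):
        case, activity = event["case:concept:name"], event["concept:name"]
        transition = event["lifecycle:transition"]
        if transition == "start":
            if open_starts[(case, activity)]:
                problems.append(f"case {case}: '{activity}' started again before completing")
            open_starts[(case, activity)] += 1
        elif transition == "complete":
            if not open_starts[(case, activity)]:
                problems.append(f"case {case}: '{activity}' completed without a start")
            else:
                open_starts[(case, activity)] -= 1
    for (case, activity), count in open_starts.items():
        if count:
            problems.append(f"case {case}: '{activity}' started but never completed")
    return problems
```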
@@ -72,12 +71,13 @@ Available SEQUENCE_METRICs are:
* Damerau-Levenshtein Distance
* Levenshtein Distance
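Both SEQUENCE_METRICs compare the ordered activity labels of a case. As an illustration only (aqudem's own implementation is not shown in this diff), the sketch below computes the restricted Damerau-Levenshtein distance, also called optimal string alignment, which counts insertions, deletions, substitutions and transpositions of adjacent activities. Dropping the transposition branch yields the plain Levenshtein distance. ``osa_distance`` is a hypothetical name.

```python
from typing import Sequence


def osa_distance(truth: Sequence[str], detected: Sequence[str]) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    n, m = len(truth), len(detected)
    # dist[i][j] = edits needed to turn truth[:i] into detected[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if truth[i - 1] == detected[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
            # transposition of two adjacent activities
            if (i > 1 and j > 1 and truth[i - 1] == detected[j - 2]
                    and truth[i - 2] == detected[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[n][m]
```

This restricted variant can exceed the unrestricted Damerau-Levenshtein distance (3 instead of 2 for ``CA`` versus ``ABC``), so results may differ if aqudem uses the unrestricted form.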

For requests that span multiple cases, the results are aggregated. The default and only aggregation method is currently averaging.

Classifications are specified in the docstrings of the public
metric methods of aqudem.Context.
All metrics are also available in appropriately normalized versions.
For requests that span multiple cases, the results are aggregated. The default and only aggregation method is currently the mean.
For more detailed definitions of the metrics, please refer to the documentation.
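Since the only aggregation method is currently the mean, a multi-case request can be pictured as computing the metric per case and averaging the results. The sketch below assumes a per-case metric function ``per_case`` (a hypothetical stand-in for any SEQUENCE_METRIC) that returns a tuple, such as the (absolute, normalized) pair of the distance methods, and averages each component separately.

```python
from statistics import mean
from typing import Callable, Iterable, Tuple


def aggregate_mean(case_ids: Iterable[str],
                   per_case: Callable[[str], Tuple[float, ...]]) -> Tuple[float, ...]:
    """Average each tuple component of a per-case metric over the given cases."""
    results = [per_case(case_id) for case_id in case_ids]
    if not results:
        raise ValueError("no cases to aggregate")
    # zip(*results) groups all first components, then all second components, ...
    return tuple(mean(values) for values in zip(*results))
```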



Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
94 changes: 50 additions & 44 deletions package/aqudem/aqudem.py
@@ -15,34 +15,33 @@


class Context:
"""Class that offers main functionality of AquDeM."""
"""Class that offers main functionality of AquDeM.
Both files are expected to be in the XES format, with special constraints:
* The log must have an attribute specifying the sampling frequency in hertz
(key: "sampling_freq") on the trace level (only the detected log).
* Must use the concept:name,
lifecycle:transition and time:timestamp standard extensions.
* Each activity instance must have an event with at least
the lifecycle transitions start and complete.
* In one case, the same activity can only be executed once at a time.
An ACTIVITY_METRIC is a metric that is calculated for each activity type
in each case separately.
For requests that span multiple activities and/or cases, the results
are aggregated.
A SEQUENCE_METRIC is a metric that is calculated for each
case separately.
For requests that span multiple cases, the results are aggregated.
:param str ground_truth: The ground truth log file path.
:param str detected: The detected log file path.
:return: An aqudem context instance,
representing the comparison of two logs.
"""

def __init__(self, ground_truth: str, detected: str):
"""Constructor of AquDeMContext.
Both files are expected to be in the XES format, with special constraints:
- The log must have an attribute specifying the sampling frequency in hertz
(key: "sampling_freq") on the trace level.
- Must use the concept:name, concept:instance,
lifecycle:transition and time:timestamp standard extensions.
- Each activity instance must have an event with at least
the lifecycle transitions tart and complete.
- In one case, the same activity can only be executed once at a time.
An ACTIVITY_METRIC is a metric that is calculated for each activity type
in each case separately.
For requests that span multiple activities and/or cases, the results
are aggregated.
A SEQUENCE_METRIC is a metric that is calculated for each
case separately.
For requests that span multiple cases, the results are aggregated.
Classifications are specified in the docstrings of the public
metric methods of aqudem.Context.
:param str ground_truth: The ground truth log file path.
:param str detected: The detected log file path.
:return: An instance of AquDeMContext.
:rtype: AquDeMContext
"""
"""Initialize the context with the ground truth and detected logs."""
base_gt = sf.FrameHE.from_pandas(
pm4py.read_xes(ground_truth).sort_values(by="time:timestamp"))
base_det = sf.FrameHE.from_pandas(
@@ -74,7 +73,7 @@ def activity_names(self) -> dict[str, list[str]]:
"""Extract all the available activity names from the XES logs.
:return: A dictionary with "ground_truth" and "detected" keys, each
containing a list of activity names.
containing a list of activity names.
"""
return {
"ground_truth": list(set(self._ground_truth["concept:name"].values)),
@@ -86,7 +85,7 @@ def case_ids(self) -> dict[str, list[str]]:
"""Extract all the available case IDs from the XES logs.
:return: A dictionary with "ground_truth" and "detected" keys, each
containing a list of case IDs.
containing a list of case IDs.
"""
return {
"ground_truth": list(set(self._ground_truth["case:concept:name"].values)),
@@ -99,13 +98,14 @@ def cross_correlation(self,
case_id: str = "*") -> Tuple[float, float]:
"""Calculate the cross-correlation between the ground truth and detected logs.
ACTICITY_METRIC
ACTIVITY_METRIC
:param activity_name: The name of the activity to calculate the cross-correlation for.
If "*" is passed, the cross-correlation will be calculated and averaged for all
activities.
activities.
:param case_id: The case ID to calculate the cross-correlation for.
If "*" is passed, the cross-correlation will be calculated and averaged for all
case IDs.
case IDs.
:return: Tuple; first element: cross-correlation value, between 0 and 1.
second element: relative shift to achieve maximum cross correlation.
"""
@@ -127,13 +127,14 @@ def two_set(self, activity_name: str = "*", case_id: str = "*") -> TwoSet:
"""Calculate the 2SET metrics for a given activity. Absolute values.
ACTIVITY_METRIC
With the possibility to average over activities and cases.
Includes the absolute and rate metrics, for details see the
TwoSet class documentation.
For more info on the metrics, see:
See J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
Jan. 2011, doi: 10.1145/1889681.1889687.; 4.1.2
For more info on the metrics, refer to the metrics overview and/or:
J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
Jan. 2011, doi: 10.1145/1889681.1889687.; 4.1.2
:param activity_name: The name of the activity to calculate the two-set metrics for.
If "*" is passed, the two-set metrics will be calculated
and aggregated for all activities.
@@ -161,13 +162,14 @@ def event_analysis(self, activity_name: str = "*", case_id: str = "*") -> EventA
"""Calculate the EA metrics.
ACTIVITY_METRIC
With the possibility to average over activities and cases.
Includes the absolute and rate metrics, for details see the
EventAnalysis class documentation.
For more info on the metrics, see:
See J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
Jan. 2011, doi: 10.1145/1889681.1889687.; 4.2
For more info on the metrics, refer to the metrics overview and/or:
J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
Jan. 2011, doi: 10.1145/1889681.1889687.; 4.2
:param activity_name: The name of the activity to calculate the event analysis metrics for.
If "*" is passed, the metrics will be calculated
and aggregated for all activities.
@@ -193,12 +195,14 @@ def damerau_levenshtein_distance(self, case_id: str = "*") -> Tuple[Union[float,
"""Calculate the Damerau-Levenshtein distance between the ground truth and
detected logs.
Calculates both the absolute distance and the normalized distance.
SEQUENCE_METRIC
Calculates both the absolute distance and the normalized distance.
Order of activities based on start timestamps.
:param case_id: The case ID to calculate the Damerau-Levenshtein distance for.
If "*" is passed, the Damerau-Levenshtein distance will be calculated and
averaged for all case IDs.
averaged for all case IDs.
:return: The Damerau-Levenshtein distance; tuple.
The first value in the tuple represents the (average) absolute distance.
The second value in the tuple represents the (average) normalized distance.
@@ -211,9 +215,11 @@ def damerau_levenshtein_distance(self, case_id: str = "*") -> Tuple[Union[float,
def levenshtein_distance(self, case_id: str = "*") -> Tuple[Union[float, int], float]:
"""Calculate the Levenshtein distance between the ground truth and detected logs.
Calculates both the absolute distance and the normalized distance.
SEQUENCE_METRIC
Calculates both the absolute distance and the normalized distance.
Order of activities based on start timestamps.
:param case_id: The case ID to calculate the Levenshtein distance for.
If "*" is passed, the Levenshtein distance will be
calculated and averaged for all case IDs.
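Both distance methods order a case's activities by their start timestamps before comparing them. The sketch below illustrates that step followed by a plain Levenshtein distance. The helper names, the flattened event dicts, and normalizing by the length of the longer sequence are assumptions for illustration, since the normalization aqudem applies is not visible in this diff.

```python
def start_ordered_activities(events: list[dict]) -> list[str]:
    """Activity names of one case, ordered by the timestamps of their start events."""
    starts = [e for e in events if e["lifecycle:transition"] == "start"]
    return [e["concept:name"] for e in sorted(starts, key=lambda e: e["time:timestamp"])]


def levenshtein(a: list[str], b: list[str]) -> tuple[int, float]:
    """Absolute Levenshtein distance and that distance divided by the longer length."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                        # deletion
                            curr[j - 1] + 1,                    # insertion
                            prev[j - 1] + (item_a != item_b)))  # substitution or match
        prev = curr
    absolute = prev[-1]
    longest = max(len(a), len(b))
    return absolute, (absolute / longest if longest else 0.0)
```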
28 changes: 15 additions & 13 deletions package/aqudem/event_analysis_helper.py
@@ -17,20 +17,22 @@
class EventAnalysis:
"""Data class to hold the EA metrics.
Regarding the ground truth events: d, f, fm, m.
Regarding both the ground truth and detected events: c.
Regarding the (d)etected events: md, fmd, fd, id.
If this is the result of an aggregated request, the values represent the average number of events
over the relevant log-activity pairs.
Regarding the ground truth events:
d: int, Deletions
f: int, Fragmentations
fm: int, Fragmentation and merge
m: int, Merges
Regarding both the ground truth and detected events:
c: int, Correct
Regarding the (d)etected events:
md: int, Merges
fmd: int, Fragmentation and merge
fd: int, Fragmentations
id: int, Insertions
over the relevant case-activity pairs.
Relative metrics are available as properties.
:param d: Deletions
:param f: Fragmentations
:param fm: Fragmentation and merge
:param m: Merges
:param c: Correct
:param md: Merges
:param fmd: Fragmentation and merge
:param fd: Fragmentations
:param id: Insertions
"""
d: Union[int, float]
f: Union[int, float]
Expand Down
29 changes: 17 additions & 12 deletions package/aqudem/two_set_helper.py
@@ -13,18 +13,23 @@
# pylint: disable=too-many-instance-attributes
@dataclass(frozen=True)
class TwoSet:
"""Data class to hold the absolute 2SET metrics.
tp: int, True Positives
tn: int, True Negatives
d: int, Deletions
f: int, Fragmentations
ua: int, Underfullings (at the start)
uo: int, Underfullings (at the end)
i: int, Insertions
m: int, Merges
oa: int, Overfullings (at the start)
oo: int, Overfullings (at the end)
"""Data class to hold the 2SET metrics.
How many of the frames of the detected log are classified as tp, tn, d, f, ua, uo, i, m, oa, oo.
If this is the result of an aggregated request, the values represent the average number of frames
over the relevant case-activity pairs.
Relative metrics are available as properties.
:param tp: True Positives
:param tn: True Negatives
:param d: Deletions
:param f: Fragmentations
:param ua: Underfullings (at the start)
:param uo: Underfullings (at the end)
:param i: Insertions
:param m: Merges
:param oa: Overfullings (at the start)
:param oo: Overfullings (at the end)
"""
tp: Union[int, float]
tn: Union[int, float]
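2SET compares the logs frame by frame, which is where the detected log's ``sampling_freq`` comes in. The sketch below is a simplified illustration, not aqudem's implementation: it turns start/complete intervals (in seconds from a hypothetical case start) into boolean frames at a given sampling frequency and counts tp and tn plus the raw false-negative and false-positive frames. In Ward et al.'s scheme those raw errors are further split into d, f, ua, uo and i, m, oa, oo using segment-based rules that this sketch omits. ``to_frames`` and ``frame_counts`` are hypothetical names.

```python
import math

import numpy as np


def to_frames(intervals: list[tuple[float, float]], duration: float,
              sampling_freq: float) -> np.ndarray:
    """Boolean signal: frame k is active if time k / sampling_freq lies in an interval."""
    n_frames = math.ceil(duration * sampling_freq)
    times = np.arange(n_frames) / sampling_freq
    active = np.zeros(n_frames, dtype=bool)
    for start, end in intervals:
        active |= (times >= start) & (times < end)
    return active


def frame_counts(gt: np.ndarray, det: np.ndarray) -> dict[str, int]:
    """Coarse frame-level comparison of two equally sampled boolean signals."""
    return {
        "tp": int(np.sum(gt & det)),    # both active
        "tn": int(np.sum(~gt & ~det)),  # both inactive
        "fn": int(np.sum(gt & ~det)),   # split into d, f, ua, uo by Ward et al.
        "fp": int(np.sum(~gt & det)),   # split into i, m, oa, oo by Ward et al.
    }
```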