Improve documentation

ics-unisg · Jun 13, 2024 · 0b76736
1 parent 9cc3d3d
commit 0b76736
Showing 13 changed files with 161 additions and 106 deletions.
diff --git a/package/AUTHORS.rst b/package/AUTHORS.rst
@@ -10,4 +10,6 @@ Development Lead
 Contributors
 ------------
 
-None yet. Why not be the first?
+* Ronny Seiger <[email protected]>
+* Marco Franceschetti <[email protected]>
+* Barbara Weber <[email protected]>
diff --git a/package/CONTRIBUTING.rst b/package/CONTRIBUTING.rst
@@ -68,27 +68,26 @@ Ready to contribute? Here's how to set up `aqudem` for local development.
 
     $ pip install -r requirements.txt
 
-4. Create a branch for local development::
+4. To get the necessary developer tools, run::
+
+    $ pip install -r requirements-dev.txt
+
+5. Create a branch for local development::
 
     $ git checkout -b name-of-your-bugfix-or-feature
 
    Now you can make your changes locally.
 
-5. When you're done making changes, check that your changes pass several requirements::
+6. When you're done making changes, check that your changes pass several requirements::
 
     $ ./code-check.sh
-
-   To get the necessary tools to execute the checks, run::
-
-    $ pip install -r requirements-dev.txt
-
-6. Commit your changes and push your branch to GitHub::
+7. Commit your changes and push your branch to GitHub::
 
     $ git add .
     $ git commit -m "Your detailed description of your changes."
     $ git push origin name-of-your-bugfix-or-feature
 
-7. Submit a pull request through the GitHub website.
+8. Submit a pull request through the GitHub website.
 
 Pull Request Guidelines
 -----------------------
@@ -109,6 +108,9 @@ Tips
 Deploying
 ---------
 
+
+TODO: to change
+
 A reminder for the maintainers on how to deploy.
 Make sure all your changes are committed (including an entry in HISTORY.rst).
 Then run::

diff --git a/package/README.rst b/package/README.rst
@@ -12,45 +12,44 @@ AquDeM
 
 
 
-Activity and Sequence Detection Performance Measures: A package to evaluate activity detection results, including the sequence of events given multiple activity types.
+Activity and Sequence Detection Evaluation Metrics: A Comprehensive Tool for Event Log Comparison.
 
 * Documentation: https://aqudem.readthedocs.io. (TODO: not yet active)
 
 Installation
 ------------
 .. code-block:: bash
 
-    pip install .
+    pip install aqudem
 
 Usage
 -----
 .. code-block:: python
 
     import aqudem
 
-    aqu_context = aqudem.Context("ground_truth.xes",
-                                 "detected.xes")
+    aqu_context = aqudem.Context("ground_truth_log.xes", "detected_log.xes")
 
-    aqu_context.activity_names
-    aqu_context.case_ids
-    aqu_context.cross_correlation()
-    aqu_context.event_analysis(activity_name="Store Workpiece in HBW", case_id="case1")
-    aqu_context.two_set(activity_name="Store Workpiece in HBW")
-    aqu_context.levenshtein_distance()
+    aqu_context.activity_names # get all activity names present in log
+    aqu_context.case_ids # get all case IDs present in log
+
+    aqu_context.cross_correlation() # aggregate over all cases and activities
+    aqu_context.event_analysis(activity_name="Pack", case_id="1") # filter on case and activity
+    aqu_context.two_set(activity_name="Pack") # filter on activity, aggregate over cases
 
 
 For a more detailed description of the available methods, please refer to the rest of the documentation.
 
 Preface
 --------
 
-* Measurements and metrics to evaluate activity detection results
+* Metrics to evaluate activity detection results
 * Input: two XES files, one with the ground truth and one with the detection results
 * Output: a set of metrics to evaluate the detection results
 * Prerequisites for the input files: the XES files must...
 
-  * ... have a ``sampling_freq`` in Hz associated with each case
-  * ... have a ``concept:name`` attribute for each case
+  * ... have a ``sampling_freq`` in Hz associated with each case (only detected file)
+  * ... have a ``concept:name`` attribute for each case (case ID)
   * ... have a ``time:timestamp`` attribute for each event
   * ... have a ``concept:name`` attribute for each event (activity name)
   * ... have a ``lifecycle:transition`` attribute for each event
@@ -72,12 +71,13 @@ Available SEQUENCE_METRICs are:
 * Damerau-Levenshtein Distance
 * Levenshtein Distance
 
-For requests that span multiple cases, the results are aggregated. The default and only aggregation method is currently averaging.
 
-Classifications are specified in the docstrings of the public
-metric methods of aqudem.Context.
+All metrics are also available in appropriately normalized versions.
+For requests that span multiple cases, the results are aggregated. The default and only aggregation method is currently the mean.
+For more detailed definitions of the metrics, please refer to the documentation.
+
+
 
-Credits
 -------
 
 This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

diff --git a/package/aqudem/aqudem.py b/package/aqudem/aqudem.py
@@ -15,34 +15,33 @@
 
 
 class Context:
-    """Class that offers main functionality of AquDeM."""
+    """Class that offers main functionality of AquDeM.
+
+    Both files are expected to be in the XES format, with special constraints:
+    * The log must have an attribute specifying the sampling frequency in hertz
+    (key: "sampling_freq") on the trace level (only the detected log).
+    * Must use the concept:name,
+    lifecycle:transition and time:timestamp standard extensions.
+    * Each activity instance must have an event with at least
+    the lifecycle transitions start and complete.
+    * In one case, the same activity can only be executed once at a time.
+
+    An ACTIVITY_METRIC is a metric that is calculated for each activity type
+    in each case separately.
+    For requests that span multiple activities and/or cases, the results
+    are aggregated.
+    A SEQUENCE_METRIC is a metric that is calculated for each
+    case separately.
+    For requests that span multiple cases, the results are aggregated.
+
+    :param str ground_truth: The ground truth log file path.
+    :param str detected: The detected log file path.
+    :return: An aqudem context instance,
+    representing the comparison of two logs.
+    """
 
     def __init__(self, ground_truth: str, detected: str):
-        """Constructor of AquDeMContext.
-
-        Both files are expected to be in the XES format, with special constraints:
-        - The log must have an attribute specifying the sampling frequency in hertz
-            (key: "sampling_freq") on the trace level.
-        - Must use the concept:name, concept:instance,
-            lifecycle:transition and time:timestamp standard extensions.
-        - Each activity instance must have an event with at least
-            the lifecycle transitions start and complete.
-        - In one case, the same activity can only be executed once at a time.
-
-        An ACTIVITY_METRIC is a metric that is calculated for each activity type
-        in each case separately.
-        For requests that span multiple activities and/or cases, the results
-        are aggregated.
-        A SEQUENCE_METRIC is a metric that is calculated for each
-        case separately.
-        For requests that span multiple cases, the results are aggregated.
-        Classifications are specified in the docstrings of the public
-        metric methods of aqudem.Context.
-        :param str ground_truth: The ground truth log file path.
-        :param str detected: The detected log file path.
-        :return: An instance of AquDeMContext.
-        :rtype: AquDeMContext
-        """
+        """Initialize the context with the ground truth and detected logs."""
         base_gt = sf.FrameHE.from_pandas(
             pm4py.read_xes(ground_truth).sort_values(by="time:timestamp"))
         base_det = sf.FrameHE.from_pandas(
@@ -74,7 +73,7 @@ def activity_names(self) -> dict[str, list[str]]:
         """Extract all the available activity names from the XES logs.
 
         :return: A dictionary with "ground_truth" and "detected" keys, each
-            containing a list of activity names.
+        containing a list of activity names.
         """
         return {
             "ground_truth": list(set(self._ground_truth["concept:name"].values)),
@@ -86,7 +85,7 @@ def case_ids(self) -> dict[str, list[str]]:
         """Extract all the available case IDs from the XES logs.
 
         :return: A dictionary with "ground_truth" and "detected" keys, each
-            containing a list of case IDs.
+        containing a list of case IDs.
         """
         return {
             "ground_truth": list(set(self._ground_truth["case:concept:name"].values)),
@@ -99,13 +98,14 @@ def cross_correlation(self,
                           case_id: str = "*") -> Tuple[float, float]:
         """Calculate the cross-correlation between the ground truth and detected logs.
 
-        ACTICITY_METRIC
+        ACTIVITY_METRIC
+
         :param activity_name: The name of the activity to calculate the cross-correlation for.
             If "*" is passed, the cross-correlation will be calculated and averaged for all
-             activities.
+            activities.
         :param case_id: The case ID to calculate the cross-correlation for.
             If "*" is passed, the cross-correlation will be calculated and averaged for all
-                case IDs.
+            case IDs.
         :return: Tuple; first element: cross-correlation value, between 0 and 1.
             second element: relative shift to achieve maximum cross correlation.
         """
@@ -127,13 +127,14 @@ def two_set(self, activity_name: str = "*", case_id: str = "*") -> TwoSet:
         """Calculate the 2SET metrics for a given activity. Absolute values.
 
         ACTIVITY_METRIC
-        With the possibility to average over activities and cases.
+
         Includes the absolute and rate metrics, for details see the
         TwoSet class documentation.
-        For more info on the metrics, see:
-        See J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
-            activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
-            Jan. 2011, doi: 10.1145/1889681.1889687.; 4.1.2
+        For more info on the metrics, refer to the metrics overview and/or:
+        J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
+        activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
+        Jan. 2011, doi: 10.1145/1889681.1889687.; 4.1.2
+
         :param activity_name: The name of the activity to calculate the two-set metrics for.
             If "*" is passed, the two-set metrics will be calculated
             and aggregated for all activities.
@@ -161,13 +162,14 @@ def event_analysis(self, activity_name: str = "*", case_id: str = "*") -> EventA
         """Calculate the EA metrics.
 
         ACTIVITY_METRIC
-        With the possibility to average over activities and cases.
+
         Includes the absolute and rate metrics, for details see the
         EventAnalysis class documentation.
-        For more info on the metrics, see:
-        See J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
-            activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
-            Jan. 2011, doi: 10.1145/1889681.1889687.; 4.2
+        For more info on the metrics, refer to the metrics overview and/or:
+        J. A. Ward, P. Lukowicz, and H. W. Gellersen, “Performance metrics for
+        activity recognition,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 1–23,
+        Jan. 2011, doi: 10.1145/1889681.1889687.; 4.2
+
         :param activity_name: The name of the activity to calculate the event analysis metrics for.
             If "*" is passed, the metrics will be calculated
             and aggregated for all activities.
@@ -193,12 +195,14 @@ def damerau_levenshtein_distance(self, case_id: str = "*") -> Tuple[Union[float,
         """Calculate the Damerau-Levenshtein distance between the ground truth and
             detected logs.
 
-        Calculates both the absolute distance and the normalized distance.
         SEQUENCE_METRIC
+
+        Calculates both the absolute distance and the normalized distance.
         Order of activities based on start timestamps.
+
         :param case_id: The case ID to calculate the Damerau-Levenshtein distance for.
             If "*" is passed, the Damerau-Levenshtein distance will be calculated and
-                averaged for all case IDs.
+            averaged for all case IDs.
         :return: The Damerau-Levenshtein distance; tuple.
             The first value in the tuple represents the (average) absolute distance.
             The second value in the tuple represents the (average) normalized distance.
@@ -211,9 +215,11 @@ def damerau_levenshtein_distance(self, case_id: str = "*") -> Tuple[Union[float,
     def levenshtein_distance(self, case_id: str = "*") -> Tuple[Union[float, int], float]:
         """Calculate the Levenshtein distance between the ground truth and detected logs.
 
-        Calculates both the absolute distance and the normalized distance.
         SEQUENCE_METRIC
+
+        Calculates both the absolute distance and the normalized distance.
         Order of activities based on start timestamps.
+
         :param case_id: The case ID to calculate the Levenshtein distance for.
             If "*" is passed, the Levenshtein distance will be
             calculated and averaged for all case IDs.

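The docstrings above say the sequence metrics return an absolute and a normalized distance, averaged over cases when `case_id="*"`. The following is a minimal sketch of that idea, not the package's implementation. The helper names, the activities "Ship" and "Bill", and the choice to normalize by the longer sequence length are all assumptions made for illustration.

```python
from statistics import mean


def levenshtein(a: list[str], b: list[str]) -> int:
    """Edit distance between two activity sequences (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized(distance: int, a: list[str], b: list[str]) -> float:
    """Assumption: divide by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return distance / longest if longest else 0.0


# Hypothetical per-case activity orders: (ground truth, detected),
# already sorted by start timestamp.
cases = {
    "1": (["Pack", "Ship", "Bill"], ["Pack", "Bill"]),
    "2": (["Pack", "Ship"], ["Pack", "Ship"]),
}

per_case = {cid: levenshtein(gt, det) for cid, (gt, det) in cases.items()}

# Aggregation over cases uses the mean, as the README states.
absolute_mean = mean(per_case.values())
normalized_mean = mean(
    normalized(per_case[cid], gt, det) for cid, (gt, det) in cases.items()
)
```

Keeping the per-case distances in a dictionary before averaging makes it easy to see which case contributes most to the aggregate.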
diff --git a/package/aqudem/event_analysis_helper.py b/package/aqudem/event_analysis_helper.py
@@ -17,20 +17,22 @@
 class EventAnalysis:
     """Data class to hold the EA metrics.
 
+    Regarding the ground truth events: d, f, fm, m.
+    Regarding both the ground truth and detected events: c.
+    Regarding the (d)etected events: md, fmd, fd, id.
     If result of aggregated request, the values represent the average number of events
-    over the relevant log-activity pairs.
-    Regarding the ground truth events:
-    d: int, Deletions
-    f: int, Fragmentations
-    fm: int, Fragmentation and merge
-    m: int, Merges
-    Regarding both the ground truth and detected events:
-    c: int, Correct
-    Regarding the (d)etected events:
-    md: int, Merges
-    fmd: int, Fragmentation and merge
-    fd: int, Fragmentations
-    id: int, Insertions
+    over the relevant case-activity pairs.
+    Relative metrics are available as properties.
+
+    :param d: Deletions
+    :param f: Fragmentations
+    :param fm: Fragmentation and merge
+    :param m: Merges
+    :param c: Correct
+    :param md: Merges
+    :param fmd: Fragmentation and merge
+    :param fd: Fragmentations
+    :param id: Insertions
     """
     d: Union[int, float]
     f: Union[int, float]

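The new docstring notes that relative metrics are available as properties. One way such a design could look is sketched below. `EventCounts`, `total_ground_truth`, and `d_rate` are hypothetical names, and the rate definition (deletions divided by all ground truth events) is an assumption rather than the package's confirmed formula.

```python
from dataclasses import dataclass
from typing import Union

Number = Union[int, float]


@dataclass(frozen=True)
class EventCounts:
    """Hypothetical stand-in for an EA result; not the aqudem class."""
    d: Number   # deletions
    f: Number   # fragmentations
    fm: Number  # fragmentation and merge
    m: Number   # merges
    c: Number   # correct

    @property
    def total_ground_truth(self) -> Number:
        # Assumption: every ground truth event falls in exactly one category.
        return self.d + self.f + self.fm + self.m + self.c

    @property
    def d_rate(self) -> float:
        # Assumed definition: share of ground truth events that were deleted.
        total = self.total_ground_truth
        return self.d / total if total else 0.0


counts = EventCounts(d=1, f=2, fm=0, m=1, c=6)
```

Deriving rates as properties of a frozen dataclass keeps the stored counts as the single source of truth, so the relative values cannot drift out of sync with them.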
diff --git a/package/aqudem/two_set_helper.py b/package/aqudem/two_set_helper.py
@@ -13,18 +13,23 @@
 # pylint: disable=too-many-instance-attributes
 @dataclass(frozen=True)
 class TwoSet:
-    """Data class to hold the absolute 2SET metrics.
-
-    tp: int, True Positives
-    tn: int, True Negatives
-    d: int, Deletions
-    f: int, Fragmentations
-    ua: int, Underfullings (at the start)
-    uo: int, Underfullings (at the end)
-    i: int, Insertions
-    m: int, Merges
-    oa: int, Overfullings (at the start)
-    oo: int, Overfullings (at the end)
+    """Data class to hold the 2SET metrics.
+
+    How many of the det frames can be seen as tp, tn, d, f, ua, uo, i, m, oa, oo.
+    If result of aggregated request, the values represent the average number of frames
+    over the relevant case-activity pairs.
+    Relative metrics are available as properties.
+
+    :param tp: True Positives
+    :param tn: True Negatives
+    :param d: Deletions
+    :param f: Fragmentations
+    :param ua: Underfills (at the start)
+    :param uo: Underfills (at the end)
+    :param i: Insertions
+    :param m: Merges
+    :param oa: Overfills (at the start)
+    :param oo: Overfills (at the end)
     """
     tp: Union[int, float]
     tn: Union[int, float]