openml · PGijsbers · Jul 29, 2024
diff --git a/docs/contributing.md b/docs/contributing.md
@@ -0,0 +1,24 @@
+# Contributing
+
+Contribution to the OpenML package is highly appreciated in all forms.
+In particular, a few ways to contribute to openml-python are:
+
+-   A direct contribution to the package, by means of improving the
+    code, documentation or examples. To get started, see [this
+    file](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md)
+    with details on how to set up your environment to develop for
+    openml-python.
+-   A contribution to an openml-python extension. An extension package
+    allows OpenML to interface with a machine learning package (such
+    as scikit-learn or keras). These extensions are hosted in separate
+    repositories and may have their own guidelines. For more
+    information, see also [extensions](extensions.md).
+-   Bug reports. If something doesn't work for you or is cumbersome,
+    please open a new issue to let us know about the problem. See the
+    [contributing
+    guidelines](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md)
+    for details.
+-   [Cite OpenML](https://www.openml.org/cite) if you use it in a
+    scientific publication.
+-   Visit one of our [hackathons](https://www.openml.org/meet).
+-   Contribute to another OpenML project, such as [the main OpenML
+    project](https://github.com/openml/OpenML/blob/master/CONTRIBUTING.md).
diff --git a/docs/extensions.md b/docs/extensions.md
@@ -0,0 +1,179 @@
+# Extensions
+
+OpenML-Python provides an extension interface to connect machine
+learning libraries other than scikit-learn to OpenML. Please check the
+extensions section of the API reference and use the scikit-learn
+extension in `openml.extensions.sklearn.SklearnExtension` as a
+starting point.
+
+## List of extensions
+
+Here is a list of currently maintained OpenML extensions:
+
+-   `openml.extensions.sklearn.SklearnExtension`
+-   [openml-keras](https://github.com/openml/openml-keras)
+-   [openml-pytorch](https://github.com/openml/openml-pytorch)
+-   [openml-tensorflow (for tensorflow
+    2+)](https://github.com/openml/openml-tensorflow)
+
+## Connecting new machine learning libraries
+
+### Content of the Library
+
+To leverage support from the community and to tap into the potential of
+OpenML, interfacing with popular machine learning libraries is
+essential. The OpenML-Python package is capable of downloading meta-data
+and results (data, flows, runs), regardless of the library that was used
+to upload them. However, to simplify uploading flows and runs from a
+specific library, an additional interface can be built. The
+OpenML-Python team does not have the capacity to develop and maintain
+such interfaces on its own. For this reason, we have built an extension
+interface that allows others to contribute back. Building a suitable
+extension therefore requires an understanding of the current
+OpenML-Python support.
+
+The [flows and runs tutorial](../examples/20_basic/simple_flows_and_runs_tutorial.py)
+shows how scikit-learn currently works with
+OpenML-Python as an extension. The *sklearn* extension packaged with the
+[openml-python](https://github.com/openml/openml-python) repository can
+be used as a template/benchmark to build the new extension.
+
+#### API
+
+-   The extension scripts must import the `openml` package and be able
+    to interface with any function from the OpenML-Python API.
+-   The extension has to be defined as a Python class and must inherit
+    from `openml.extensions.Extension`.
+-   This class needs to override all the methods of the `Extension`
+    class as required.
+-   The redefined methods should have adequate and appropriate
+    docstrings. The scikit-learn extension,
+    `openml.extensions.sklearn.SklearnExtension`, is a good example to
+    follow.
+
+#### Interfacing with OpenML-Python
+
+Once the new extension class has been defined, it must be registered by
+calling `openml.extensions.register_extension` so that OpenML-Python can
+interface with the new extension.
+
+The following methods should be implemented. Although the documentation
+in the `Extension` interface is always authoritative, here we list some
+additional information and best practices. The scikit-learn extension,
+`openml.extensions.sklearn.SklearnExtension`, is a good example to
+follow. Note that most methods are relatively simple and can be
+implemented in a few lines of code.
+
+-   General setup (required)
+    -   `can_handle_flow`: Takes as argument an OpenML flow, and checks
+        whether it can be handled by the current extension. The OpenML
+        database consists of many flows, from various workbenches (e.g.,
+        scikit-learn, Weka, mlr). This method is called before a model is
+        deserialized. Typically, the flow's dependency field is used to
+        check whether the specific library is present and that no unknown
+        libraries are listed there.
+    -   `can_handle_model`: Similar to `can_handle_flow`, except that in
+        this case a Python object is given. As such, in many cases, this
+        method can be implemented by checking whether the object is an
+        instance of a certain base class.
+-   Serialization and Deserialization (required)
+    -   `flow_to_model`: Deserializes the OpenML flow into a model (if
+        the library can indeed handle the flow). This method has an
+        important interplay with `model_to_flow`. Running these two
+        methods in succession should result in exactly the same model (or
+        flow). This property can be used for unit testing (e.g., build a
+        model with hyperparameters, make predictions on a task, serialize
+        it to a flow, deserialize it back, make it predict on the same
+        task, and check whether the predictions are exactly the same).
+        The example in the scikit-learn interface might seem daunting,
+        but note that some complicated design choices were made there
+        that allow for all sorts of interesting research questions. It is
+        probably good practice to start simple.
+    -   `model_to_flow`: The inverse of `flow_to_model`. Serializes a
+        model into an OpenML flow. The flow should preserve the class,
+        the library version, and the tunable hyperparameters.
+    -   `get_version_information`: Returns a tuple with the version
+        information of the important libraries.
+    -   `create_setup_string`: No longer used, and will be deprecated
+        soon.
+-   Performing runs (required)
+    -   `is_estimator`: Gets as input a class, and checks whether it has
+        the status of estimator in the library (typically, whether it has
+        a train method and a predict method).
+    -   `seed_model`: Sets a random seed on the model.
+    -   `_run_model_on_fold`: One of the main requirements for a library
+        to generate run objects for the OpenML server. It receives a
+        train split (with labels) and a test split (without labels); the
+        goal is to train a model on the train split and return the
+        predictions on the test split. On top of the actual predictions,
+        the class probabilities should also be determined. For
+        classifiers that do not return class probabilities, these can
+        simply be the one-hot encoded predicted labels. The predictions
+        will be evaluated on the OpenML server. Additional information
+        can also be returned, for example, user-defined measures (such as
+        runtime information, as this cannot be inferred on the server).
+        Additionally, information about a hyperparameter optimization
+        trace can be provided.
+    -   `obtain_parameter_values`: Obtains the hyperparameters of a given
+        model and their current values. Note that in the case of a
+        hyperparameter optimization procedure (e.g., random search), you
+        should only return the hyperparameters of this procedure (e.g.,
+        the hyperparameter grid, budget, etc.); the chosen model will be
+        inferred from the optimization trace.
+    -   `check_if_model_fitted`: Checks whether the train method of the
+        model has been called (and, as such, whether the predict method
+        can be used).
+-   Hyperparameter optimization (optional)
+    -   `instantiate_model_from_hpo_class`: If a given run has recorded
+        the hyperparameter optimization trace, then this method can be
+        used to reinstantiate the model with the hyperparameters of a
+        given hyperparameter optimization iteration. It has some
+        similarities with `flow_to_model` (as this method also sets the
+        hyperparameters of a model). Note that although this method must
+        be defined, it is not necessary to implement any logic if
+        hyperparameter optimization is not supported; simply raise a
+        `NotImplementedError` in that case.
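+
+The two general-setup checks can be prototyped without OpenML-Python
+itself. The snippet below is a minimal sketch, not the interface's real
+signatures: it assumes a hypothetical library called `toylearn` whose
+estimators all derive from a hypothetical `ToyBaseEstimator` class, and it
+treats a flow's dependency field as a newline-separated string of
+requirement specifiers such as `toylearn==1.0`. A real `can_handle_flow`
+receives an OpenML flow object; here the string is passed in directly so
+that the logic can be tested in isolation.
+
+```python
+import re
+
+# Hypothetical library name, plus the packages this sketch assumes an
+# extension for it is willing to see in a flow's dependency list.
+LIBRARY_NAME = "toylearn"
+ALLOWED_DEPENDENCIES = {"toylearn", "numpy", "scipy"}
+
+
+class ToyBaseEstimator:
+    """Hypothetical base class shared by all toylearn estimators."""
+
+
+class ToyTree(ToyBaseEstimator):
+    """Hypothetical toylearn estimator used for illustration."""
+
+
+def parse_dependency_names(dependencies: str) -> set[str]:
+    """Extract lower-cased package names from a requirement string."""
+    names = set()
+    for line in dependencies.splitlines():
+        # Keep only the leading package name, e.g. "numpy>=1.6.1" -> "numpy".
+        match = re.match(r"[A-Za-z0-9_.\-]+", line.strip())
+        if match:
+            names.add(match.group(0).lower())
+    return names
+
+
+def can_handle_dependencies(dependencies: str) -> bool:
+    """Core check of a can_handle_flow sketch: the target library is
+    present and no unknown libraries are listed."""
+    names = parse_dependency_names(dependencies)
+    return LIBRARY_NAME in names and names <= ALLOWED_DEPENDENCIES
+
+
+def can_handle_model(model: object) -> bool:
+    """can_handle_model sketch: accept instances of the library's base class."""
+    return isinstance(model, ToyBaseEstimator)
+
+
+print(can_handle_dependencies("toylearn==1.0\nnumpy>=1.6.1"))   # True
+print(can_handle_dependencies("sklearn==1.3.0\nnumpy>=1.6.1"))  # False
+print(can_handle_model(ToyTree()))  # True
+print(can_handle_model(object()))   # False
+```
+
+Keeping an explicit allow-list makes the check conservative: a flow that
+mixes `toylearn` with, say, a Weka component is rejected rather than
+deserialized incorrectly.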
+
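+The smaller run-related methods often reduce to conventions of the target
+library. The sketch below assumes a hypothetical `ToyMajorityClassifier`
+that exposes a `random_state` attribute, marks its fitted state with
+trailing-underscore attributes (a scikit-learn convention), and has no
+`predict_proba`. The free functions only mirror the intent of
+`seed_model`, `check_if_model_fitted`, and the probability handling in
+`_run_model_on_fold`; they are not the method signatures of the
+`Extension` interface.
+
+```python
+from typing import Any, Optional
+
+
+class ToyMajorityClassifier:
+    """Hypothetical classifier that always predicts the most frequent
+    training label. It has no predict_proba method."""
+
+    def __init__(self, random_state: Optional[int] = None) -> None:
+        self.random_state = random_state
+
+    def fit(self, X: list, y: list) -> "ToyMajorityClassifier":
+        # Attributes ending in "_" are only set by fit (sklearn convention).
+        self.majority_ = max(set(y), key=y.count)
+        return self
+
+    def predict(self, X: list) -> list:
+        return [self.majority_ for _ in X]
+
+
+def seed_model(model: Any, seed: Optional[int]) -> Any:
+    """Set the seed only if the model has an unset random_state attribute."""
+    has_unset_seed = hasattr(model, "random_state") and model.random_state is None
+    if seed is not None and has_unset_seed:
+        model.random_state = seed
+    return model
+
+
+def check_if_model_fitted(model: Any) -> bool:
+    """Treat the model as fitted once it has a trailing-underscore attribute."""
+    return any(
+        name.endswith("_") and not name.startswith("__") for name in vars(model)
+    )
+
+
+def one_hot_probabilities(predictions: list, class_labels: list) -> list:
+    """Turn predicted labels into one-hot rows, one column per class label."""
+    column = {label: i for i, label in enumerate(class_labels)}
+    rows = []
+    for label in predictions:
+        row = [0.0] * len(class_labels)
+        row[column[label]] = 1.0
+        rows.append(row)
+    return rows
+
+
+model = seed_model(ToyMajorityClassifier(), seed=42)
+print(model.random_state)            # 42
+print(check_if_model_fitted(model))  # False
+model.fit([[0], [1], [2]], ["a", "b", "b"])
+print(check_if_model_fitted(model))  # True
+print(one_hot_probabilities(model.predict([[3], [4]]), ["a", "b"]))
+# [[0.0, 1.0], [0.0, 1.0]]
+```
+
+Only seeding models whose `random_state` is still unset means a seed that
+the user chose explicitly is never overwritten.
+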
+### Hosting the library
+
+Each extension created should be a stand-alone repository, compatible
+with the [OpenML-Python
+repository](https://github.com/openml/openml-python). The extension
+repository should work off-the-shelf with *OpenML-Python* installed.
+
+Create a [public GitHub
+repo](https://docs.github.com/en/github/getting-started-with-github/create-a-repo)
+with the following directory structure:
+
+    | [repo name]
+    |    |-- [extension name]
+    |    |    |-- __init__.py
+    |    |    |-- extension.py
+    |    |    |-- config.py (optional)
+
+### Recommended
+
+-   Test cases to keep the extension up to date with the
+    `openml-python` upstream changes.
+-   Documentation of the extension API, especially if any new
+    functionality is added to OpenML-Python's extension design.
+-   Examples to show how the new extension interfaces and works with
+    OpenML-Python.
+-   Create a PR to add the new extension to the OpenML-Python API
+    documentation.
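+
+The round-trip property described for `flow_to_model` and
+`model_to_flow` is a natural first test case. The sketch below does not
+use OpenML-Python: a plain dictionary stands in for an OpenML flow, and
+`ToyThresholdClassifier`, `model_to_flow`, and `flow_to_model` are
+hypothetical stand-ins for an extension's real classes and methods. A
+real test would call the extension's methods and run the model on an
+OpenML task instead of a hard-coded list.
+
+```python
+from typing import Any
+
+TOY_VERSION = "1.0"  # hypothetical version of the hypothetical library
+
+
+class ToyThresholdClassifier:
+    """Hypothetical model: predicts 1 when the first feature exceeds a threshold."""
+
+    def __init__(self, threshold: float = 0.5) -> None:
+        self.threshold = threshold
+
+    def predict(self, X: list) -> list:
+        return [int(row[0] > self.threshold) for row in X]
+
+
+# Maps serialized class names back to classes during deserialization.
+REGISTRY = {"ToyThresholdClassifier": ToyThresholdClassifier}
+
+
+def model_to_flow(model: Any) -> dict:
+    """Serialize the class, library version, and hyperparameters."""
+    return {
+        "class_name": type(model).__name__,
+        "version": TOY_VERSION,
+        "parameters": dict(vars(model)),
+    }
+
+
+def flow_to_model(flow: dict) -> Any:
+    """Rebuild a model from a dictionary produced by model_to_flow."""
+    if flow["version"] != TOY_VERSION:
+        raise ValueError(f"Unsupported library version: {flow['version']}")
+    model_class = REGISTRY[flow["class_name"]]
+    return model_class(**flow["parameters"])
+
+
+def test_round_trip() -> None:
+    X = [[0.1], [0.7], [0.4], [0.9]]
+    original = ToyThresholdClassifier(threshold=0.45)
+    flow = model_to_flow(original)
+    restored = flow_to_model(flow)
+    # Re-serializing must reproduce the flow, and predictions must match.
+    assert model_to_flow(restored) == flow
+    assert restored.predict(X) == original.predict(X)
+
+
+test_round_trip()
+```
+
+Because this toy model is never trained, comparing predictions only
+checks the hyperparameters. For a trained model, the test would fit the
+deserialized copy on the same train split (with the same seed) before
+comparing predictions.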
+
+Happy contributing!
diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,89 @@
+# OpenML
+
+**Collaborative Machine Learning in Python**
+
+Welcome to the documentation of the OpenML Python API, a connector to
+the collaborative machine learning platform
+[OpenML.org](https://www.openml.org). The OpenML Python package allows
+you to use datasets and tasks from OpenML together with scikit-learn and
+share the results online.
+
+## Example
+
+```python
+import openml
+from sklearn import impute, tree, pipeline
+
+# Define a scikit-learn classifier or pipeline
+clf = pipeline.Pipeline(
+    steps=[
+        ('imputer', impute.SimpleImputer()),
+        ('estimator', tree.DecisionTreeClassifier())
+    ]
+)
+# Download the OpenML task for the pendigits dataset with 10-fold
+# cross-validation.
+task = openml.tasks.get_task(32)
+# Run the scikit-learn model on the task.
+run = openml.runs.run_model_on_task(clf, task)
+# Publish the experiment on OpenML (optional; requires an API key,
+# which you can get by signing up at OpenML.org).
+run.publish()
+print(f'View the run online: {run.openml_url}')
+```
+
+Find more examples in the sidebar on the left.
+
+## How to get OpenML for Python
+
+You can install the OpenML package via `pip` (we recommend using a virtual environment):
+
+```bash
+python -m pip install openml
+```
+
+For more advanced installation information, please see the
+["Introduction"](../examples/20_basic/introduction_tutorial.py) example.
+
+
+## Further information
+
+-   [OpenML documentation](https://docs.openml.org/)
+-   [OpenML client APIs](https://docs.openml.org/APIs/)
+-   [OpenML developer guide](https://docs.openml.org/Contributing/)
+-   [Contact information](https://www.openml.org/contact)
+-   [Citation request](https://www.openml.org/cite)
+-   [OpenML blog](https://medium.com/open-machine-learning)
+-   [OpenML twitter account](https://twitter.com/open_ml)
+
+## Contributing
+
+Contribution to the OpenML package is highly appreciated. Please see the
+["Contributing"](contributing.md) page for more information.
+
+## Citing OpenML-Python
+
+If you use OpenML-Python in a scientific publication, we would
+appreciate a reference to our JMLR-MLOSS paper 
+["OpenML-Python: an extensible Python API for OpenML"](https://www.jmlr.org/papers/v22/19-920.html):
+
+=== "Bibtex"
+
+    ```bibtex
+    @article{JMLR:v22:19-920,
+        author  = {Matthias Feurer and Jan N. van Rijn and Arlind Kadra and Pieter Gijsbers and Neeratyoy Mallik and Sahithya Ravi and Andreas Müller and Joaquin Vanschoren and Frank Hutter},
+        title   = {OpenML-Python: an extensible Python API for OpenML},
+        journal = {Journal of Machine Learning Research},
+        year    = {2021},
+        volume  = {22},
+        number  = {100},
+        pages   = {1--5},
+        url     = {http://jmlr.org/papers/v22/19-920.html}
+    }
+    ```
+
+=== "MLA"
+
+    Feurer, Matthias, et al. 
+    "OpenML-Python: an extensible Python API for OpenML."
+    _Journal of Machine Learning Research_ 22.100 (2021): 1-5.