Docs/mkdoc #1379

Draft
wants to merge 25 commits into
base: develop
ed970ee
Start migration to mkdocs
PGijsbers Jul 29, 2024
ff29c99
Fix reference, avoids duplicate installation headers
PGijsbers Jul 29, 2024
81a0a7b
Improve citation section, fix last broken link
PGijsbers Jul 29, 2024
5cb04dd
Remove files which are to be generated
PGijsbers Jul 29, 2024
94d3024
remove orphan statements
PGijsbers Jul 29, 2024
6dcb538
Add plugin for apidoc generation
PGijsbers Jul 29, 2024
0ef96ee
Generate API documentation pages
PGijsbers Jul 29, 2024
f9dd346
Add API pages to nav, use section-index for init
PGijsbers Jul 29, 2024
12a5066
Remove examples doc as they are generated
PGijsbers Jul 29, 2024
2cc23af
Render notebooks from example folder
PGijsbers Jul 29, 2024
a4efa61
make clearer the symmetry
PGijsbers Jul 29, 2024
2bc16b8
Put notebooks as virtual files in doc, reorder plugins
PGijsbers Jul 29, 2024
62c40d9
Start converting to jupytext
PGijsbers Aug 2, 2024
7d29016
Add other files to toc
PGijsbers Oct 18, 2024
978839d
Fix formatting and links
PGijsbers Oct 18, 2024
1428872
Fix formatting
PGijsbers Oct 18, 2024
f38af1f
fix broken links
PGijsbers Oct 21, 2024
177faf1
Add more notebooks
PGijsbers Oct 21, 2024
ce24b44
Remove notebooks
PGijsbers Oct 21, 2024
e5f60f5
Convert more examples to jupytext
PGijsbers Oct 21, 2024
957c0bd
Convert remaining files to jupytext
PGijsbers Oct 21, 2024
792d759
Add titles
PGijsbers Oct 21, 2024
d148f5d
Make codeblock have light background
PGijsbers Oct 21, 2024
df5099a
Hide the "In [#]" area
PGijsbers Oct 21, 2024
cb49496
Fix (link) formatting errors
PGijsbers Oct 21, 2024
24 changes: 24 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,24 @@
# Contributing

Contribution to the OpenML package is highly appreciated in all forms.
In particular, a few ways to contribute to openml-python are:

- A direct contribution to the package, by means of improving the
code, documentation or examples. To get started, see [this
file](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md)
with details on how to set up your environment to develop for
openml-python.
- A contribution to an openml-python extension. An extension package
allows OpenML to interface with a machine learning package (such
as scikit-learn or keras). These extensions are hosted in separate
repositories and may have their own guidelines. For more
information, see also [extensions](extensions.md).
- Bug reports. If something doesn't work for you or is cumbersome,
please open a new issue to let us know about the problem. See
[this
section](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md).
- [Cite OpenML](https://www.openml.org/cite) if you use it in a
scientific publication.
- Visit one of our [hackathons](https://www.openml.org/meet).
- Contribute to another OpenML project, such as [the main OpenML
project](https://github.com/openml/OpenML/blob/master/CONTRIBUTING.md).
179 changes: 179 additions & 0 deletions docs/extensions.md
@@ -0,0 +1,179 @@
# Extensions

OpenML-Python provides an extension interface to connect machine
learning libraries other than scikit-learn to OpenML. Please check the
`api_extensions` reference and use the scikit-learn extension,
`openml.extensions.sklearn.SklearnExtension`, as a starting point.

## List of extensions

Here is a list of currently maintained OpenML extensions:

- `openml.extensions.sklearn.SklearnExtension`
- [openml-keras](https://github.com/openml/openml-keras)
- [openml-pytorch](https://github.com/openml/openml-pytorch)
- [openml-tensorflow (for tensorflow
2+)](https://github.com/openml/openml-tensorflow)

## Connecting new machine learning libraries

### Content of the Library

To leverage support from the community and to tap into the potential of
OpenML, interfacing with popular machine learning libraries is
essential. The OpenML-Python package is capable of downloading meta-data
and results (data, flows, runs), regardless of the library that was used
to upload them. However, in order to simplify the process of uploading
flows and runs from a specific library, an additional interface can be
built. The OpenML-Python team does not have the capacity to develop and
maintain such interfaces on its own. For this reason, we have built an
extension interface to allow others to contribute back. Building a
suitable extension therefore requires an understanding of the
current OpenML-Python support.

The `sphx_glr_examples_20_basic_simple_flows_and_runs_tutorial.py`
tutorial shows how scikit-learn currently works with
OpenML-Python as an extension. The *sklearn* extension packaged with the
[openml-python](https://github.com/openml/openml-python) repository can
be used as a template/benchmark to build the new extension.

#### API

- The extension scripts must import the `openml` package
  and be able to interface with any function from the OpenML-Python
  `api`.
- The extension has to be defined as a Python class and must inherit
  from `openml.extensions.Extension`.
- This class needs to have all the functions from class
  `Extension` overridden as required.
- The redefined functions should have adequate and appropriate
  docstrings. The Sklearn Extension API
  (`openml.extensions.sklearn.SklearnExtension`)
  is a good example to follow.

#### Interfacing with OpenML-Python

Once the new extension class has been defined,
`openml.extensions.register_extension` must be called with it
to allow OpenML-Python to interface with the new extension.

The following methods should be implemented. Although the documentation
in the `Extension` interface should always be leading, here
we list some additional information and best practices. The Sklearn
Extension API (`openml.extensions.sklearn.SklearnExtension`) is
a good example to follow. Note that most methods are relatively simple
and can be implemented in several lines of code.

- General setup (required)
  - `can_handle_flow`: Takes an OpenML flow as argument and checks
    whether it can be handled by the current extension. The OpenML
    database consists of many flows, from various workbenches (e.g.,
    scikit-learn, Weka, mlr). This method is called before a model is
    deserialized. Typically, the flow-dependency field is used to
    check whether the specific library is present, and no unknown
    libraries are present there.
  - `can_handle_model`: Similar to `can_handle_flow`, except that in
    this case a Python object is given. As such, in many cases, this
    method can be implemented by checking whether the object adheres
    to a certain base class.
- Serialization and De-serialization (required)
  - `flow_to_model`: Deserializes the OpenML Flow into a model (if
    the library can indeed handle the flow). This method has an
    important interplay with `model_to_flow`. Running these two
    methods in succession should result in exactly the same model (or
    flow). This property can be used for unit testing (e.g., build a
    model with hyperparameters, make predictions on a task, serialize
    it to a flow, deserialize it back, make it predict on the same
    task, and check whether the predictions are exactly the same).
    The example in the scikit-learn interface might seem daunting,
    but note that some complicated design choices were made there
    that allow for all sorts of interesting research questions. It is
    probably good practice to start easy.
  - `model_to_flow`: The inverse of `flow_to_model`. Serializes a
    model into an OpenML Flow. The flow should preserve the class,
    the library version, and the tunable hyperparameters.
  - `get_version_information`: Returns a tuple with the version
    information of the important libraries.
  - `create_setup_string`: No longer used, and will be deprecated
    soon.
- Performing runs (required)
  - `is_estimator`: Gets a class as input and checks whether it has
    the status of estimator in the library (typically, whether it has
    a train method and a predict method).
  - `seed_model`: Sets a random seed on the model.
  - `_run_model_on_fold`: One of the main requirements for a library
    to generate run objects for the OpenML server. Obtains a train
    split (with labels) and a test split (without labels); the goal
    is to train a model on the train split and return the predictions
    on the test split. On top of the actual predictions, the class
    probabilities should also be determined. For classifiers that do
    not return class probabilities, this can just be the
    one-hot-encoded predicted label. The predictions will be
    evaluated on the OpenML server. Additional information can also
    be returned, for example, user-defined measures (such as runtime
    information, as this cannot be inferred on the server).
    Additionally, information about a hyperparameter optimization
    trace can be provided.
  - `obtain_parameter_values`: Obtains the hyperparameters of a given
    model and their current values. Please note that in the case of a
    hyperparameter optimization procedure (e.g., random search), you
    should only return the hyperparameters of this procedure (e.g.,
    the hyperparameter grid, budget, etc.); the chosen model will be
    inferred from the optimization trace.
  - `check_if_model_fitted`: Checks whether the train method of the
    model has been called (and as such, whether the predict method
    can be used).
- Hyperparameter optimization (optional)
  - `instantiate_model_from_hpo_class`: If a given run has recorded
    the hyperparameter optimization trace, then this method can be
    used to reinstantiate the model with the hyperparameters of a
    given hyperparameter optimization iteration. Has some
    similarities with `flow_to_model` (as this method also sets the
    hyperparameters of a model). Note that although this method is
    required, it is not necessary to implement any logic if
    hyperparameter optimization is not implemented. Simply raise a
    `NotImplementedError` then.
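
Two points from the list above can be sketched in a few lines: the round-trip property of `model_to_flow` and `flow_to_model`, and the one-hot fallback for classifiers without class probabilities. The sketch is self-contained and does not use the OpenML-Python API; `ToyModel`, `ToyFlow`, `LIBRARY_VERSION` and `one_hot_probabilities` are hypothetical names chosen for illustration, and the real `Extension` methods operate on OpenML flow objects with richer signatures.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for a model of some machine learning library."""

    max_depth: int = 3
    criterion: str = "gini"


@dataclass(frozen=True)
class ToyFlow:
    """Hypothetical stand-in for a flow (not the real OpenML flow class)."""

    class_name: str
    library_version: str
    parameters: Tuple[Tuple[str, object], ...]


LIBRARY_VERSION = "toylib==1.0"  # made-up version string for this sketch


def model_to_flow(model: ToyModel) -> ToyFlow:
    """Serialize: preserve the class, library version and hyperparameters."""
    parameters = tuple(sorted(vars(model).items()))
    return ToyFlow(type(model).__name__, LIBRARY_VERSION, parameters)


def flow_to_model(flow: ToyFlow) -> ToyModel:
    """Deserialize: rebuild the model from the stored hyperparameters."""
    if flow.class_name != ToyModel.__name__:
        raise ValueError(f"Cannot deserialize flow of class {flow.class_name!r}")
    return ToyModel(**dict(flow.parameters))


def one_hot_probabilities(
    predicted_labels: Sequence[str], class_labels: Sequence[str]
) -> List[List[float]]:
    """Fallback 'probabilities': 1.0 for the predicted class, 0.0 elsewhere."""
    return [
        [1.0 if label == candidate else 0.0 for candidate in class_labels]
        for label in predicted_labels
    ]


# Round trip: model -> flow -> model should give back an equal model and flow.
model = ToyModel(max_depth=5, criterion="entropy")
flow = model_to_flow(model)
restored = flow_to_model(flow)
round_trip_ok = restored == model and model_to_flow(restored) == flow

probabilities = one_hot_probabilities(["b", "a"], class_labels=["a", "b", "c"])
print(round_trip_ok, probabilities)
```

In a real extension the same round-trip check would additionally compare predictions on a task before and after serialization, as described for `flow_to_model`.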

### Hosting the library

Each extension created should be a stand-alone repository, compatible
with the [OpenML-Python
repository](https://github.com/openml/openml-python). The extension
repository should work off-the-shelf with *OpenML-Python* installed.

Create a [public GitHub
repo](https://docs.github.com/en/github/getting-started-with-github/create-a-repo)
with the following directory structure:

| [repo name]
| |-- [extension name]
| | |-- __init__.py
| | |-- extension.py
| | |-- config.py (optionally)

### Recommended

- Test cases to keep the extension up to date with the
  `openml-python` upstream changes.
- Documentation of the extension API, especially if any new
  functionality is added to OpenML-Python's extension design.
- Examples to show how the new extension interfaces and works with
  OpenML-Python.
- Create a PR to add the new extension to the OpenML-Python API
  documentation.

Happy contributing!
89 changes: 89 additions & 0 deletions docs/index.md
@@ -0,0 +1,89 @@
# OpenML

**Collaborative Machine Learning in Python**

Welcome to the documentation of the OpenML Python API, a connector to
the collaborative machine learning platform
[OpenML.org](https://www.openml.org). The OpenML Python package allows
you to use datasets and tasks from OpenML together with scikit-learn and
share the results online.

## Example

```python
import openml
from sklearn import impute, tree, pipeline

# Define a scikit-learn classifier or pipeline
clf = pipeline.Pipeline(
    steps=[
        ('imputer', impute.SimpleImputer()),
        ('estimator', tree.DecisionTreeClassifier())
    ]
)
# Download the OpenML task for the pendigits dataset with 10-fold
# cross-validation.
task = openml.tasks.get_task(32)
# Run the scikit-learn model on the task.
run = openml.runs.run_model_on_task(clf, task)
# Publish the experiment on OpenML (optional, requires an API key;
# you can get your own API key by signing up to OpenML.org).
run.publish()
print(f'View the run online: {run.openml_url}')
```

Find more examples in the sidebar on the left.

## How to get OpenML for Python

You can install the OpenML package via `pip` (we recommend using a virtual environment):

```bash
python -m pip install openml
```

For more advanced installation information, please see the
["Introduction"](../examples/20_basic/introduction_tutorial.py) example.


## Further information

- [OpenML documentation](https://docs.openml.org/)
- [OpenML client APIs](https://docs.openml.org/APIs/)
- [OpenML developer guide](https://docs.openml.org/Contributing/)
- [Contact information](https://www.openml.org/contact)
- [Citation request](https://www.openml.org/cite)
- [OpenML blog](https://medium.com/open-machine-learning)
- [OpenML twitter account](https://twitter.com/open_ml)

## Contributing

Contribution to the OpenML package is highly appreciated. Please see the
["Contributing"][contributing] page for more information.

## Citing OpenML-Python

If you use OpenML-Python in a scientific publication, we would
appreciate a reference to our JMLR-MLOSS paper
["OpenML-Python: an extensible Python API for OpenML"](https://www.jmlr.org/papers/v22/19-920.html):

=== "Bibtex"

    ```bibtex
    @article{JMLR:v22:19-920,
        author  = {Matthias Feurer and Jan N. van Rijn and Arlind Kadra and Pieter Gijsbers and Neeratyoy Mallik and Sahithya Ravi and Andreas Müller and Joaquin Vanschoren and Frank Hutter},
        title   = {OpenML-Python: an extensible Python API for OpenML},
        journal = {Journal of Machine Learning Research},
        year    = {2021},
        volume  = {22},
        number  = {100},
        pages   = {1--5},
        url     = {http://jmlr.org/papers/v22/19-920.html}
    }
    ```

=== "MLA"

    Feurer, Matthias, et al.
    "OpenML-Python: an extensible Python API for OpenML."
    _Journal of Machine Learning Research_ 22.100 (2021): 1–5.