To contribute to the development of cnlp-transformers, please follow these steps.
- Fork this project on GitHub. (Click "Fork" near the top of the project homepage.) Leave the repository name the same, and select "Copy the default branch only".
- Clone your fork to your local machine:

    git clone https://github.com/{your username}/cnlp_transformers.git
    cd cnlp_transformers

- Add this repository as your upstream remote:

    git remote add upstream https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers.git

  Now running

    git remote -v

  should show:

    origin    https://github.com/{your username}/cnlp_transformers.git (fetch)
    origin    https://github.com/{your username}/cnlp_transformers.git (push)
    upstream  https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers.git (fetch)
    upstream  https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers.git (push)
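To double-check the remote setup from a script, you could use a minimal Python sketch like the one below. It is a hypothetical helper and is not part of the repository. It parses `git remote -v` output and reports anything that differs from the layout shown above. It also assumes your fork keeps the repository name cnlp_transformers, as the first step recommends.

```python
import subprocess

UPSTREAM_URL = (
    "https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers.git"
)


def parse_remotes(output: str) -> dict:
    """Map (remote name, "fetch"/"push") to URL from `git remote -v` output."""
    remotes = {}
    for line in output.splitlines():
        parts = line.split()  # git separates name and URL with a tab
        if len(parts) == 3:
            name, url, direction = parts
            remotes[(name, direction.strip("()"))] = url
    return remotes


def check_remotes(output: str) -> list:
    """Return a list of problems; an empty list means the setup looks right."""
    remotes = parse_remotes(output)
    problems = []
    for direction in ("fetch", "push"):
        if remotes.get(("upstream", direction)) != UPSTREAM_URL:
            problems.append(f"upstream ({direction}) is not {UPSTREAM_URL}")
        origin = remotes.get(("origin", direction), "")
        # Assumption: the fork is named cnlp_transformers and is not the upstream repo.
        if not origin.endswith("/cnlp_transformers.git") or origin == UPSTREAM_URL:
            problems.append(f"origin ({direction}) does not look like your fork")
    return problems


def current_remotes() -> str:
    """Return the raw `git remote -v` output for the current repository."""
    return subprocess.run(
        ["git", "remote", "-v"], capture_output=True, text=True, check=True
    ).stdout
```

Run from inside your clone, check_remotes(current_remotes()) should return an empty list if both remotes are configured as shown above.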
You can set up a Python development environment with a number of tools. We have instructions for uv (recommended) and conda.

With uv:

- From the project's base directory, run:

    uv sync --python 3.11  # 3.9 and 3.10 are also supported; uv installs dev dependencies by default
    source .venv/bin/activate  # activate the virtual environment

With conda:

- Install conda or miniconda.
- Create a new conda environment:

    conda create -n cnlpt python=3.11  # 3.9 and 3.10 are also supported
    conda activate cnlpt

- From the project's base directory, install dependencies:

    # editable install with dev dependencies
    pip install dependency-groups
    dependency-groups dev | xargs pip install -e .
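Whichever tool you use, you can quickly confirm that the activated environment runs a supported Python version. The sketch below is illustrative only, and its helper names are hypothetical. The supported set is copied from the comments in the commands above.

```python
import sys

# Supported versions as listed in this guide (3.9, 3.10, and 3.11).
SUPPORTED = {(3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's (major, minor) is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED


def describe(version_info=sys.version_info) -> str:
    """Human-readable summary of whether this interpreter is supported."""
    major, minor = version_info[:2]
    status = "supported" if is_supported(version_info) else "NOT supported"
    return f"Python {major}.{minor} is {status} by cnlp-transformers"
```

Call describe() from inside the activated environment so that it checks the interpreter you will actually develop with.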
Install the pre-commit hooks with:

    make hooks

This automatically checks the code style before each commit and warns you if there are any linting or formatting errors.

We use Ruff for linting and formatting. Run

    make check

to lint and format your code. If you use VS Code, the Ruff extension can be handy during development (e.g., for format on save).

You can run the pytest test suite with:

    make test
This repository has GitHub Actions set up to automatically ensure that the codebase is linted and formatted and that the test suite in test/ passes. These actions run whenever a commit is pushed, or a pull request is opened, that changes any of the following files:

- src/**
- test/**
- pyproject.toml
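The following rough sketch shows whether a set of changed files would trigger these workflows. It is a hypothetical helper that approximates GitHub's path-filter rules with Python's fnmatch, which is not an exact reimplementation of them.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above. In fnmatch, "*" also matches "/", so
# "src/**" matches files at any depth below src/. This approximates GitHub's
# path-filter rules but does not reproduce them exactly.
WATCHED_PATTERNS = ("src/**", "test/**", "pyproject.toml")


def triggers_ci(changed_files) -> bool:
    """Return True if any changed path matches one of the watched patterns."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in WATCHED_PATTERNS
    )
```

For example, you could pass it the lines printed by git diff --name-only main...HEAD to see whether your branch's changes will run the CI.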
The lint-and-format workflow should always pass if make check reports that everything is correct.

The build-and-test workflow runs pytest on Linux, macOS, and Windows for each Python version this project supports (currently 3.9, 3.10, and 3.11). You can see the structure of these CI runs in the Actions tab of this repository.
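The build-and-test matrix is the cross product of operating systems and Python versions. Assuming no combinations are excluded, that gives nine pytest jobs. The sketch below enumerates the matrix. The runner labels are GitHub-hosted runner names chosen for illustration and may not match the actual workflow file.

```python
from itertools import product

# Illustrative runner labels (assumed, not copied from the workflow file)
# and the Python versions listed in this guide.
OPERATING_SYSTEMS = ("ubuntu-latest", "macos-latest", "windows-latest")
PYTHON_VERSIONS = ("3.9", "3.10", "3.11")


def build_matrix():
    """Return one (os, python_version) pair per build-and-test job."""
    return list(product(OPERATING_SYSTEMS, PYTHON_VERSIONS))
```

When a single cell fails, its (os, python_version) pair tells you which environment to reproduce locally.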
If you have changes to the code that you wish to contribute to the repository, please follow these steps.

- Create and check out a new branch on your fork for the changes. For example:

    git checkout -b my-new-feature

- Start developing! Commit your changes to your new branch.
- When you're ready, run make check and make test to make sure the linter and formatter like your code and that the test suite passes.
- When this is done and all your changes are pushed to your fork, open a pull request for us to review your changes. Link any related issues that your PR addresses.
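The two checks can be chained so that you stop at the first failure. Below is a minimal sketch of such a pre-PR script. It is hypothetical, and the repository does not ship it. The runner parameter defaults to subprocess.run but can be swapped for a fake, so the stop-at-first-failure logic can be tested without invoking make.

```python
import subprocess

# The two commands this guide asks you to run before opening a pull request.
CHECKS = (["make", "check"], ["make", "test"])


def run_checks(runner=subprocess.run):
    """Run each check in order and stop at the first failure.

    Returns the failing command as a string, or None if every check passed.
    """
    for command in CHECKS:
        result = runner(command)
        if result.returncode != 0:
            return " ".join(command)
    return None
```

run_checks() returns None when both commands succeed. Otherwise it returns the first command that failed, for example "make check".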
All new features and changes should be added to CHANGELOG.md
under the "Unreleased" heading. A new heading will be added on every release to
absorb all the unreleased changes.
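If you script your changelog updates, a sketch like the following can add an entry under the Unreleased heading. It assumes, hypothetically, that CHANGELOG.md uses a Markdown heading line exactly equal to "## Unreleased" followed by bulleted entries. Check the actual file and adjust the heading argument if it differs. New entries go at the top of the section.

```python
def add_unreleased_entry(changelog: str, entry: str, heading: str = "## Unreleased") -> str:
    """Insert `entry` as a bullet directly under the Unreleased heading.

    Assumes (hypothetically) the heading is a line exactly equal to `heading`;
    raises ValueError if no such line exists.
    """
    lines = changelog.splitlines()
    try:
        index = lines.index(heading)
    except ValueError:
        raise ValueError(f"no {heading!r} heading found") from None
    # Skip one blank line after the heading, if present, so spacing is preserved.
    insert_at = index + 1
    if insert_at < len(lines) and lines[insert_at] == "":
        insert_at += 1
    lines.insert(insert_at, f"- {entry}")
    return "\n".join(lines) + "\n"
```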
When deciding whether to create a major, minor, or patch version, follow the Semantic Versioning guidelines. The key points are as follows:
Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes
- MINOR version when you add functionality in a backwards compatible manner
- PATCH version when you make backwards compatible bug fixes
Note: At the time of writing, we are still in major version 0, meaning the project is in its initial development stage. In major version 0, both incompatible API changes and backward-compatible feature additions increment the MINOR version, and the API is not considered stable.
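The rules above, including the major-version-0 exception, fit in a few lines of code. This is an illustrative sketch, not a project script. prepare_release.py expects you to supply the new version number yourself. The change-kind labels ("breaking", "feature", "fix") are hypothetical, and only plain MAJOR.MINOR.PATCH strings are handled. The release_tag helper reflects the v-prefixed tag format used when drafting the GitHub release.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a change of kind "breaking", "feature", or "fix".

    In major version 0, breaking changes and new features both bump MINOR;
    from 1.0.0 on, breaking changes bump MAJOR.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking" and major > 0:
        return f"{major + 1}.0.0"
    if change in ("breaking", "feature"):
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


def release_tag(version: str) -> str:
    """Release tags are the version number preceded by a "v" (e.g. v0.7.0)."""
    return f"v{version}"
```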
1. When the codebase is ready for release, run

    python scripts/prepare_release.py <new version number>

   This will walk you through the last few changes that need to be made before release, including updating the changelog and setting the setuptools_scm fallback version. It will also update the lockfile and your venv with the new package version.

   Warning: prepare_release.py requires uv to update the lockfile. It will not work if uv is not installed on your machine.

2. Once you're done, commit your changes and open a PR against main with a title like "Release 0.7.0". Make sure the CI passes, then merge it and continue to step 3.

3. Go to the releases page and click "Draft a new release". Create a new tag with the new version number preceded by a v (e.g. v0.7.0), and set the release title to be the same as the tag. Click "Generate release notes" to automatically generate release notes from the commit history. You can edit these release notes as necessary. When you're ready, click "Publish release".
- Log into your PyPI account.
- Generate an API token for the cnlp-transformers project here.
- Create a file ~/.pypirc:

    [pypi]
    username = __token__
    password = <the token value, including the `pypi-` prefix>

- Check out the commit for the new version. This will usually be the latest commit in main.
- Double-check that the version number shown by cnlpt --version has been incremented from the previous version on PyPI.
- Build the package using

    make build

  This will build the package in the ./dist/ directory, creating it if it does not exist.
- Upload to PyPI with twine:

    python -m twine upload dist/*
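Several of the steps above are easy to get subtly wrong. Common slips include a token without its pypi- prefix, a version that was not bumped, and leftover files in dist/ from an earlier build. The sketch below collects a few pre-upload checks in Python. These helpers are hypothetical and do not exist in the repository. The sketch assumes plain MAJOR.MINOR.PATCH version strings. It also assumes that make build produces both a wheel and an sdist, which has not been confirmed against the Makefile.

```python
import configparser
import io
from pathlib import Path


def pypirc_text(token: str) -> str:
    """Render ~/.pypirc contents for token-based uploads to PyPI."""
    if not token.startswith("pypi-"):
        raise ValueError("expected the full token value, including the 'pypi-' prefix")
    # interpolation=None so any '%' characters are written literally.
    config = configparser.ConfigParser(interpolation=None)
    config["pypi"] = {"username": "__token__", "password": token}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def parse_version(version: str) -> tuple:
    """Parse a plain MAJOR.MINOR.PATCH string (no dev or local suffixes)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(local: str, published: str) -> bool:
    """True if the local version is strictly greater than the published one."""
    return parse_version(local) > parse_version(published)


def check_dist(version: str, dist_dir: str = "dist") -> tuple:
    """Return (ready, stale) for the files in dist_dir.

    ready: at least one wheel and one sdist whose names contain the version.
    stale: files for other versions, which `twine upload dist/*` would also pick up.
    """
    names = sorted(path.name for path in Path(dist_dir).glob("*"))
    current = [name for name in names if f"-{version}" in name]
    stale = [name for name in names if name not in current]
    ready = any(n.endswith(".whl") for n in current) and any(
        n.endswith(".tar.gz") for n in current
    )
    return ready, stale
```

For example, if check_dist("0.7.0") returns (True, []), dist/ contains only the new release's files. In that case, twine upload dist/* will not try to re-upload an older version.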
Here are some pointers for updating the Sphinx configuration. This is not exhaustive.
- Whenever a new class from a third-party package (usually Transformers) is added to a type annotation, a link will need to be added to the Intersphinx mappings. For Transformers, you will have to add an entry for every namespace path you use in the code. For instance, if you import InputExample from both transformers and transformers.data.processors.utils, you will need two lines in transformer_objects.txt as follows:

    transformers.InputExample py:class 1 main_classes/processors#$ -
    transformers.data.processors.utils.InputExample py:class 1 main_classes/processors#transformers.InputExample -

  The specification for the Intersphinx mappings can be found here.
- To add mappings for other libraries, first check whether an objects.inv file is published for that project somewhere. Then add it to intersphinx_mapping in conf.py per the instructions here.
- To rebuild the autodoc toctrees and the transformers Intersphinx mappings, run build_doc_source.sh.
- ReadTheDocs should automatically begin building documentation for the latest version when the release is created on GitHub. To build the docs locally and test documentation changes before uploading to ReadTheDocs, first uncomment lines 36 and 65 in docs/conf.py, then run:

    cd docs
    make html

  This will write the docs to docs/build/html. Open docs/build/html/index.html in your browser of choice to view the built documentation.
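Each namespace path needs its own entry in transformer_objects.txt, so generating the lines can help when a class is importable from several places. The following minimal sketch shows a hypothetical helper, not part of build_doc_source.sh, that reproduces the two example lines in the first pointer above. It relies on the standard Sphinx inventory line format: name domain:role priority URI display-name. A trailing $ in the URI stands for the entry's own name, and a display name of - means the same as the name. Alias paths point at the canonical anchor explicitly, because their own names are not anchors on the Transformers page.

```python
def inventory_line(name, uri, role="py:class", priority=1, dispname="-"):
    """Format one entry line in the plain-text Sphinx inventory format."""
    return f"{name} {role} {priority} {uri} {dispname}"


def entries_for(canonical, aliases, page):
    """One line for the canonical path ("$" = its own anchor), plus one line
    per alias that points explicitly at the canonical anchor."""
    lines = [inventory_line(canonical, f"{page}#$")]
    lines += [inventory_line(alias, f"{page}#{canonical}") for alias in aliases]
    return lines
```

The helper only formats entry lines. Whatever header or post-processing build_doc_source.sh applies to the file is left unchanged.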