Contributing

Developing `cnlp-transformers`

To contribute to the development of cnlp-transformers, please follow these steps.

Fork the repository

Fork this project on GitHub. (Click "Fork" near the top of the project homepage.)

Leave the repository name the same, and select "Copy the default branch only".

Clone your fork to your local machine.

```
git clone https://github.com/{your username}/cnlp_transformers.git
cd cnlp_transformers
```

Add this repository as your upstream remote.

```
git remote add upstream https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers.git
```

Now running git remote -v should show:

```
origin  https://github.com/{your username}/cnlp_transformers.git (fetch)
origin  https://github.com/{your username}/cnlp_transformers.git (push)
upstream        https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers.git (fetch)
upstream        https://github.com/Machine-Learning-for-Medical-Language/cnlp_transformers.git (push)
```

Set up your Python environment

You can set up a Python development environment with several tools; we provide instructions for uv (recommended) and conda.

Using uv (recommended)

Install uv.

From the project's base directory, run:

```
uv sync --python 3.11 # 3.9 and 3.10 are also supported. uv will install dev dependencies by default.
source .venv/bin/activate # activate the virtual environment
```

Using conda

Install conda or miniconda.

Create a new conda environment:

```
conda create -n cnlpt python=3.11 # 3.9 and 3.10 are also supported
conda activate cnlpt
```

From the project's base directory, install dependencies:

```
# editable install with dev dependencies
pip install dependency-groups
dependency-groups dev | xargs pip install -e .
```

Development tools

Pre-commit hooks

Install the pre-commit hooks with:

```
make hooks
```

This automatically checks the code style before each commit and warns you about any linting or formatting errors.

Linting and formatting

We use Ruff for linting and formatting.

Run make check to lint and format your code.

If you use VS Code, the Ruff extension can be handy during development (e.g., for format on save).

Testing your code

You can run the pytest test suite with make test.

CI

This repository has GitHub Actions set up to automatically ensure the codebase is linted and formatted and that the test suite in test/ is passing.

These actions run whenever a pushed commit or an opened pull request changes any of the following files:

src/**
test/**
pyproject.toml
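As an illustration of which changes trigger CI, here is a minimal Python sketch that approximates the path filters above with the standard-library `fnmatch` module. It is an approximation, not GitHub's matcher: GitHub Actions has its own glob rules, and `fnmatchcase` lets `*` match `/`, which is what makes `src/**` match files at any depth here. The names `CI_PATH_FILTERS` and `triggers_ci` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical copy of the path filters listed above (not read from the workflow files).
CI_PATH_FILTERS = ["src/**", "test/**", "pyproject.toml"]


def triggers_ci(changed_paths: list[str]) -> bool:
    """Return True if any changed path matches one of the filters.

    Approximation only: fnmatchcase's "*" also matches "/", so "src/**"
    matches files at any depth under src/, similar to GitHub's "**".
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in CI_PATH_FILTERS
    )


print(triggers_ci(["src/cnlpt/some_module.py"]))  # True
print(triggers_ci(["docs/conf.py", "README.md"]))  # False
```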

The lint-and-format workflow should always pass if make check reports that everything is correct.

The build-and-test workflow runs pytest on Linux, macOS, and Windows for each Python version this project supports (currently 3.9, 3.10, and 3.11).
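The number of test jobs follows from crossing the two lists. A small Python sketch, assuming the OS labels below; the actual workflow may use runner names such as `ubuntu-latest` instead, so check the workflow file for the real values.

```python
from itertools import product

# Labels taken from the text above; the real workflow's runner names may differ.
OPERATING_SYSTEMS = ["Linux", "macOS", "Windows"]
PYTHON_VERSIONS = ["3.9", "3.10", "3.11"]

# One build-and-test job per (OS, Python version) combination.
jobs = list(product(OPERATING_SYSTEMS, PYTHON_VERSIONS))

for os_name, py_version in jobs:
    print(f"pytest on {os_name} with Python {py_version}")

print(len(jobs))  # 9
```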

You can see the structure of these CI runs in the Actions tab of this repository.

Proposing changes

If you have changes to the code that you wish to contribute to the repository, please follow these steps.

Create and check out a new branch on your fork for the changes. For example:
```
git checkout -b my-new-feature
```
Start developing! Commit your changes to your new branch.
When you're ready, run make check and make test to make sure the linter and formatter like your code, and that the test suite passes.
When this is done and all your changes are pushed to your fork, you can open a pull request for us to review your changes. Link any related issues that your PR addresses.

Instructions for maintainers

Updating the changelog

All new features and changes should be added to CHANGELOG.md under the "Unreleased" heading. A new heading will be added on every release to absorb all the unreleased changes.
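A hypothetical Python sketch of what "absorbing" the unreleased changes means at release time. It assumes CHANGELOG.md uses `## Unreleased` and `## <version>` headings, which may not match the real file; `roll_changelog` is an illustrative name, not a project function.

```python
# Assumption: pending changes live under this exact heading in CHANGELOG.md.
UNRELEASED = "## Unreleased"


def roll_changelog(text: str, version: str) -> str:
    """Put a new version heading under Unreleased so it takes over the pending entries."""
    if UNRELEASED not in text:
        raise ValueError("no Unreleased heading found")
    # Replace only the first occurrence; the entries beneath now belong to `version`,
    # and an empty Unreleased section remains above for future changes.
    return text.replace(UNRELEASED, f"{UNRELEASED}\n\n## {version}", 1)


before = "# Changelog\n\n## Unreleased\n\n- Add a feature\n"
print(roll_changelog(before, "0.7.0"))
```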

Releasing the next version

1. Choosing a version number

When deciding whether to create a major, minor, or patch version, follow the Semantic Versioning guidelines. The key points are as follows:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes

MINOR version when you add functionality in a backwards compatible manner

PATCH version when you make backwards compatible bug fixes

Note

At the time of writing, we are still in major version 0, meaning the project is in its initial development stage. In major version 0, both incompatible API changes and backwards-compatible feature additions bump the MINOR version, and the API is not considered stable.
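To make the rules concrete, here is a minimal Python sketch of choosing the next version, including the major-version-0 rule from the note. The `next_version` function and its `"breaking"`/`"feature"`/`"fix"` labels are hypothetical, and the sketch only handles plain MAJOR.MINOR.PATCH strings (no pre-release or build suffixes).

```python
def next_version(current: str, change: str) -> str:
    """Return the next version under SemVer plus the major-version-0 rule.

    change is one of "breaking", "feature", or "fix" (labels chosen for this sketch).
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "breaking" and major > 0:
        return f"{major + 1}.0.0"
    if change in ("breaking", "feature"):
        # In 0.x, breaking changes and new features both bump MINOR.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(next_version("0.6.1", "breaking"))  # 0.7.0
print(next_version("0.6.1", "fix"))  # 0.6.2
print(next_version("1.2.3", "breaking"))  # 2.0.0
```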

2. Creating a PR for the release

When the codebase is ready for release, run python scripts/prepare_release.py <new version number>. This will walk you through the last few changes that need to be made before release, including updating the changelog and setting the setuptools_scm fallback version, and will also update the lockfile and your venv with the new package version.

Warning

prepare_release.py requires uv to update the lockfile. It will not work if uv is not installed on your machine.

Once you're done, commit your changes and open a PR against main with a title like "Release 0.7.0". Make sure CI passes, then merge it and continue to step 3.

3. Creating a release with GitHub

Go to the releases page, and click "Draft a new release". Create a new tag with the new version number preceded by a v (e.g. v0.7.0), and set the release title to be the same as the tag. Click "Generate release notes" to automatically generate release notes from the commit history. You can edit these release notes as necessary. When you're ready, click "Publish release".
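A tiny Python sketch of the tag naming convention, assuming plain MAJOR.MINOR.PATCH versions; `release_tag` is a hypothetical helper, and GitHub itself does not enforce this format.

```python
import re

# Assumption: release versions are plain MAJOR.MINOR.PATCH numbers.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def release_tag(version: str) -> str:
    """Return the tag (also used as the release title) for a version, e.g. "v0.7.0"."""
    if not VERSION_RE.fullmatch(version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"v{version}"


print(release_tag("0.7.0"))  # v0.7.0
```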

4. Setting up your PyPI API key

Log into your PyPI account
Generate an API key for the cnlp-transformers project here

Create a file ~/.pypirc:

```
[pypi]
username = __token__
password = <the token value, including the `pypi-` prefix>
```
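Before uploading, you may want to sanity-check the file. The following Python sketch (hypothetical, standard library only) parses a `.pypirc` with `configparser` and reports whether the `[pypi]` section has the shape shown above. It does not validate the token against PyPI.

```python
import configparser
from pathlib import Path


def pypirc_problems(text: str) -> list[str]:
    """Return problems found in the [pypi] section of a .pypirc (empty list if none)."""
    # interpolation=None so a "%" in a token is not treated as a substitution.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    if not config.has_section("pypi"):
        return ["missing [pypi] section"]
    problems = []
    if config.get("pypi", "username", fallback="") != "__token__":
        problems.append("username should be __token__")
    if not config.get("pypi", "password", fallback="").startswith("pypi-"):
        problems.append("password should be the full token, including the pypi- prefix")
    return problems


if __name__ == "__main__":
    pypirc = Path.home() / ".pypirc"
    text = pypirc.read_text() if pypirc.exists() else ""
    print(pypirc_problems(text) or "~/.pypirc looks OK")
```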

5. Building and uploading to PyPI

Checkout the commit for the new version; this will usually be the latest commit in main.
Double-check that the version number shown by cnlpt --version has been incremented from the previous version on PyPI.
Build the package using make build.

This will build the package in the ./dist/ directory, creating it if it does not exist.
Upload to PyPI with twine:
```
python -m twine upload dist/*
```
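If you prefer to automate the version check from the steps above, here is a minimal Python sketch. It assumes plain MAJOR.MINOR.PATCH versions (a setuptools_scm development version with a `.dev` suffix would not parse) and uses PyPI's public JSON API at `https://pypi.org/pypi/<project>/json`; the helper names are hypothetical.

```python
import json
import urllib.request


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (no dev/pre-release suffixes)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def latest_pypi_version(project: str = "cnlp-transformers") -> str:
    """Fetch the latest released version from PyPI's JSON API (needs network)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]["version"]


def is_newer(local: str, published: str) -> bool:
    """True if the local version is strictly greater than the published one."""
    return parse_version(local) > parse_version(published)


if __name__ == "__main__":
    local = "0.7.0"  # replace with the version number shown by `cnlpt --version`
    print(is_newer(local, latest_pypi_version()))
```

Comparing integer tuples rather than strings matters here: as strings, "0.10.0" sorts before "0.9.0".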

Building the documentation

Here are some pointers for updating the Sphinx configuration. This is not exhaustive.

Whenever a class from a third-party package (usually Transformers) is added to a type annotation, a link for it must be added to the Intersphinx mappings. For Transformers, you have to add an entry for every namespace path you use in the code; for instance, if you import InputExample from both transformers and transformers.data.processors.utils, you need two lines in transformer_objects.txt:
```
transformers.InputExample py:class 1 main_classes/processors#$ -
transformers.data.processors.utils.InputExample py:class 1 main_classes/processors#transformers.InputExample -
```
The specification for the Intersphinx mappings can be found here.
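The two lines above follow a pattern that can be generated. A minimal Python sketch, assuming the Sphinx inventory conventions that a trailing `$` in the URI expands to the object's own name and a display name of `-` means "same as name"; `inventory_lines` is a hypothetical helper, not taken from the project's scripts.

```python
def inventory_lines(canonical: str, aliases: list[str], doc_page: str) -> list[str]:
    """Build transformer_objects.txt lines for one class and its extra import paths.

    Each line has the form "name domain:role priority uri dispname".
    """
    # The canonical path can use "$" because its anchor equals its own name.
    lines = [f"{canonical} py:class 1 {doc_page}#$ -"]
    # Every other import path must point explicitly at the canonical anchor.
    lines += [f"{alias} py:class 1 {doc_page}#{canonical} -" for alias in aliases]
    return lines


for line in inventory_lines(
    "transformers.InputExample",
    ["transformers.data.processors.utils.InputExample"],
    "main_classes/processors",
):
    print(line)
# transformers.InputExample py:class 1 main_classes/processors#$ -
# transformers.data.processors.utils.InputExample py:class 1 main_classes/processors#transformers.InputExample -
```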

To add mappings for other libraries, first check whether that project publishes an objects.inv file somewhere; then add it to intersphinx_mapping in conf.py per the instructions here.
To rebuild the autodoc toctrees and the transformers Intersphinx mappings, run build_doc_source.sh.
ReadTheDocs should automatically begin building documentation for the latest version once the release is created on GitHub. To build the docs locally and test documentation changes before uploading to ReadTheDocs, first uncomment lines 36 and 65 in docs/conf.py, then run:
```
cd docs
make html
```
This will write the docs to docs/build/html; simply open docs/build/html/index.html in your browser of choice to view the built documentation.
