Merge pull request #9 from ml-stat-Sustech/development
update scores
hongxin001 authored Dec 24, 2023
2 parents 6f0c70e + df02ac3 commit 3416992
Showing 23 changed files with 170 additions and 202 deletions.
50 changes: 25 additions & 25 deletions .github/workflows/deploy.yml
@@ -2,7 +2,7 @@ name: Publish Python 🐍 distributions 📦 to PyPI


on:
# automatically running github actions when push a tag
# automatically running github actions when push a tag
push:
tags:
- '*'
@@ -20,27 +20,27 @@ jobs:
id-token: write
contents: read
steps:
- uses: actions/checkout@master
- name: Set up Python 3.10
uses: actions/setup-python@v3
with:
python-version: '3.10'
- name: Install pypa/setuptools
run: >-
python -m
pip install wheel
pip install readme_renderer[md]
- name: Build a binary wheel
run: >-
python setup.py sdist bdist_wheel
# - name: Publish distribution 📦 to TestPyPI
# uses: pypa/gh-action-pypi-publish@release/v1
# with:
# user: __token__
# password: ${{ secrets.jianguo_test_pypi_password }}
# repository_url: https://test.pypi.org/legacy/
- name: Publish distribution 📦 to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
user: __token__
password: ${{ secrets.jianguo_pypi_password }}
- uses: actions/checkout@master
- name: Set up Python 3.10
uses: actions/setup-python@v3
with:
python-version: '3.10'
- name: Install pypa/setuptools
run: >-
python -m
pip install wheel
pip install readme_renderer[md]
- name: Build a binary wheel
run: >-
python setup.py sdist bdist_wheel
# - name: Publish distribution 📦 to TestPyPI
# uses: pypa/gh-action-pypi-publish@release/v1
# with:
# user: __token__
# password: ${{ secrets.jianguo_test_pypi_password }}
# repository_url: https://test.pypi.org/legacy/
- name: Publish distribution 📦 to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
user: __token__
password: ${{ secrets.jianguo_pypi_password }}
6 changes: 4 additions & 2 deletions CONTRIBUTING.md
@@ -4,9 +4,11 @@ Thank you considering contributing to TorchCP!

This document provides brief guidelines for potential contributors.

Please use pull requests for new features, bug fixes, new examples, etc. If you work on something with significant efforts, please mention it in early stage using issues.
Please use pull requests for new features, bug fixes, new examples, etc. If you work on something with significant
efforts, please mention it in early stage using issues.

We ask that you follow the `PEP8` coding style in your pull requests, [`flake8`](http://flake8.pycqa.org/) is used in continuous integration to enforce this.
We ask that you follow the `PEP8` coding style in your pull requests, [`flake8`](http://flake8.pycqa.org/) is used in
continuous integration to enforce this.

---

31 changes: 19 additions & 12 deletions README.md
@@ -1,11 +1,15 @@
TorchCP is a Python toolbox for conformal prediction research on deep learning models, using PyTorch. Specifically, this toolbox has implemented some representative methods (including posthoc and training methods) for
classification and regression tasks. We build the framework of TorchCP based on [`AdverTorch`](https://github.com/BorealisAI/advertorch/tree/master). This codebase is still under construction. Comments, issues, contributions, and collaborations are all welcomed!


TorchCP is a Python toolbox for conformal prediction research on deep learning models, using PyTorch. Specifically, this
toolbox has implemented some representative methods (including posthoc and training methods) for
classification and regression tasks. We build the framework of TorchCP based
on [`AdverTorch`](https://github.com/BorealisAI/advertorch/tree/master). This codebase is still under construction.
Comments, issues, contributions, and collaborations are all welcomed!

# Overview

TorchCP has implemented the following methods:

## Classification

| Year | Title | Venue | Code Link |
|------|--------------------------------------------------------------------------------------------------------------------------------------------------|---------|-----------------------------------------------------------------------------------|
| 2023 | [**Class-Conditional Conformal Prediction with Many Classes**](https://arxiv.org/abs/2306.09335) | NeurIPS | [Link](https://github.com/tiffanyding/class-conditional-conformal) |
@@ -18,15 +22,15 @@ TorchCP has implemented the following methods:
| 2013 | [**Applications of Class-Conditional Conformal Predictor in Multi-Class Classification**](https://ieeexplore.ieee.org/document/6784618) | ICMLA | |

## Regression

| Year | Title | Venue | Code Link |
|------|------------------------------------------------------------------------------------------------------------------------------------------------|---------|------------------------------------------------------|
| 2021 | [**Adaptive Conformal Inference Under Distribution Shift**](https://arxiv.org/abs/2106.00170) | NeurIPS | [Link](https://github.com/isgibbs/AdaptiveConformal) |
| 2019 | [**Conformalized Quantile Regression**](https://proceedings.neurips.cc/paper_files/paper/2019/file/5103c3584b063c431bd1268e9b5e76fb-Paper.pdf) | NeurIPS | [Link](https://github.com/yromano/cqr) |
| 2016 | [**Distribution-Free Predictive Inference For Regression**](https://arxiv.org/abs/1604.04173) | JASA | [Link](https://github.com/ryantibs/conformal) |



## TODO

TorchCP is still under active development. We will add the following features/items down the road:

| Year | Title | Venue | Code Link |
@@ -37,24 +41,24 @@ TorchCP is still under active development. We will add the following features/it
| 2022 | [**Conformal Prediction Sets with Limited False Positives**](https://arxiv.org/abs/2202.07650) | ICML | [Link](https://github.com/ajfisch/conformal-fp) |
| 2021 | [**Optimized conformal classification using gradient descent approximation**](https://arxiv.org/abs/2105.11255) | Arxiv | |





## Installation

TorchCP is developed with Python 3.9 and PyTorch 2.0.1. To install TorchCP, simply run

```
pip install torchcp
```

To install from TestPyPI server, run

```
pip install --index-url https://test.pypi.org/simple/ --no-deps torchcp
```

## Examples

Here, we provide a simple example for a classification task, with THR score and SplitPredictor.

```python
from torchcp.classification.scores import THR
from torchcp.classification.predictors import SplitPredictor
@@ -88,19 +92,21 @@ result_dict = predictor.evaluate(test_dataloader)
print(result_dict["Coverage_rate"], result_dict["Average_size"])

```
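
To make the roles of the score and the predictor in the example above more concrete, here is a minimal, library-free sketch of split conformal prediction with a threshold-style score. It assumes the score is one minus the predicted probability of a label; the function names `thr_score`, `calibrate`, and `predict_set` are hypothetical and are not TorchCP's API.

```python
import math


def thr_score(probs, label):
    """Threshold-style nonconformity score: one minus the probability of `label`."""
    return 1.0 - probs[label]


def calibrate(cal_probs, cal_labels, alpha):
    """Return the conformal threshold q_hat from held-out calibration data."""
    scores = sorted(thr_score(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the threshold is infinite (every label is kept).
    return scores[k - 1] if k <= n else float("inf")


def predict_set(probs, q_hat):
    """Keep every label whose score does not exceed the calibrated threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= q_hat]


# Toy calibration set: three examples, two classes (made-up probabilities).
cal_probs = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
cal_labels = [0, 1, 0]
q_hat = calibrate(cal_probs, cal_labels, alpha=0.5)
print(predict_set([0.8, 0.2], q_hat))
```

In this sketch, `alpha` is the target miscoverage rate, so a smaller `alpha` yields a larger threshold and larger prediction sets.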

You may find more tutorials in [`examples`](https://github.com/ml-stat-Sustech/TorchCP/tree/master/examples) folder.

## Documentation

The documentation webpage is on readthedocs https://torchcp.readthedocs.io/en/latest/index.html.


## License

This project is licensed under the LGPL. The terms and conditions can be found in the LICENSE and LICENSE.GPL files.

## Citation

We will release the technical report of TorchCP recently. If you find our repository useful for your research, please consider citing our paper:
We will release the technical report of TorchCP recently. If you find our repository useful for your research, please
consider citing our paper:

```
@article{huang2023conformal,
@@ -110,6 +116,7 @@ We will release the technical report of TorchCP recently. If you find our reposi
year={2023}
}
```

## Contributors

* [Hongxin Wei](https://hongxin001.github.io/)
6 changes: 2 additions & 4 deletions docs/source/conf.py
@@ -7,9 +7,11 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
import os
import sys

sys.path.insert(0, os.path.abspath('../../'))

from unittest.mock import Mock # noqa: F401, E402

# from sphinx.ext.autodoc.importer import _MockObject as Mock
Mock.Module = object
sys.modules['torch'] = Mock()
@@ -49,8 +51,6 @@
with open(os.path.join(os.path.abspath('../../'), 'torchcp/VERSION')) as f:
version = f.read().strip()



# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

@@ -78,7 +78,6 @@
# The master toctree document.
master_doc = 'index'


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

@@ -88,7 +87,6 @@
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]


# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']

6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -1,7 +1,7 @@
.. TorchCP documentation master file, created by
sphinx-quickstart on Fri Dec 22 16:28:31 2023.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
sphinx-quickstart on Fri Dec 22 16:28:31 2023.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to TorchCP
===================================
12 changes: 7 additions & 5 deletions examples/clip/clip.py
@@ -15,15 +15,14 @@

try:
from torchvision.transforms import InterpolationMode

BICUBIC = InterpolationMode.BICUBIC
except ImportError:
BICUBIC = Image.BICUBIC


if packaging.version.parse(torch.__version__) < packaging.version.parse("1.7.1"):
warnings.warn("PyTorch version 1.7.1 or higher is recommended")


__all__ = ["available_models", "load", "tokenize"]
_tokenizer = _Tokenizer()

@@ -57,7 +56,8 @@ def _download(url: str, root: str):
warnings.warn(f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file")

with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
with tqdm(total=int(source.info().get("Content-Length")), ncols=80, unit='iB', unit_scale=True, unit_divisor=1024) as loop:
with tqdm(total=int(source.info().get("Content-Length")), ncols=80, unit='iB', unit_scale=True,
unit_divisor=1024) as loop:
while True:
buffer = source.read(8192)
if not buffer:
@@ -91,7 +91,8 @@ def available_models() -> List[str]:
return list(_MODELS.keys())


def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu", jit: bool = False, download_root: str = None):
def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu",
jit: bool = False, download_root: str = None):
"""Load a CLIP model
Parameters
@@ -202,7 +203,8 @@ def patch_float(module):
return model, _transform(model.input_resolution.item())


def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) -> Union[torch.IntTensor, torch.LongTensor]:
def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) -> Union[
torch.IntTensor, torch.LongTensor]:
"""
Returns the tokenized representation of given input string(s)
10 changes: 7 additions & 3 deletions examples/clip/model.py
@@ -224,7 +224,9 @@ def forward(self, x: torch.Tensor):
x = self.conv1(x) # shape = [*, width, grid, grid]
x = x.reshape(x.shape[0], x.shape[1], -1) # shape = [*, width, grid ** 2]
x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
x = torch.cat([self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1) # shape = [*, grid ** 2 + 1, width]
x = torch.cat(
[self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device),
x], dim=1) # shape = [*, grid ** 2 + 1, width]
x = x + self.positional_embedding.to(x.dtype)
x = self.ln_pre(x)

@@ -401,12 +403,14 @@ def build_model(state_dict: dict):

if vit:
vision_width = state_dict["visual.conv1.weight"].shape[0]
vision_layers = len([k for k in state_dict.keys() if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")])
vision_layers = len(
[k for k in state_dict.keys() if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")])
vision_patch_size = state_dict["visual.conv1.weight"].shape[-1]
grid_size = round((state_dict["visual.positional_embedding"].shape[0] - 1) ** 0.5)
image_resolution = vision_patch_size * grid_size
else:
counts: list = [len(set(k.split(".")[2] for k in state_dict if k.startswith(f"visual.layer{b}"))) for b in [1, 2, 3, 4]]
counts: list = [len(set(k.split(".")[2] for k in state_dict if k.startswith(f"visual.layer{b}"))) for b in
[1, 2, 3, 4]]
vision_layers = tuple(counts)
vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
output_width = round((state_dict["visual.attnpool.positional_embedding"].shape[0] - 1) ** 0.5)
24 changes: 13 additions & 11 deletions examples/clip/simple_tokenizer.py
@@ -23,13 +23,13 @@ def bytes_to_unicode():
To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
And avoids mapping to whitespace/control characters the bpe code barfs on.
"""
bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
bs = list(range(ord("!"), ord("~") + 1)) + list(range(ord("¡"), ord("¬") + 1)) + list(range(ord("®"), ord("ÿ") + 1))
cs = bs[:]
n = 0
for b in range(2**8):
for b in range(2 ** 8):
if b not in bs:
bs.append(b)
cs.append(2**8+n)
cs.append(2 ** 8 + n)
n += 1
cs = [chr(n) for n in cs]
return dict(zip(bs, cs))
@@ -64,30 +64,32 @@ def __init__(self, bpe_path: str = default_bpe()):
self.byte_encoder = bytes_to_unicode()
self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
merges = gzip.open(bpe_path).read().decode("utf-8").split('\n')
merges = merges[1:49152-256-2+1]
merges = merges[1:49152 - 256 - 2 + 1]
merges = [tuple(merge.split()) for merge in merges]
vocab = list(bytes_to_unicode().values())
vocab = vocab + [v+'</w>' for v in vocab]
vocab = vocab + [v + '</w>' for v in vocab]
for merge in merges:
vocab.append(''.join(merge))
vocab.extend(['<|startoftext|>', '<|endoftext|>'])
self.encoder = dict(zip(vocab, range(len(vocab))))
self.decoder = {v: k for k, v in self.encoder.items()}
self.bpe_ranks = dict(zip(merges, range(len(merges))))
self.cache = {'<|startoftext|>': '<|startoftext|>', '<|endoftext|>': '<|endoftext|>'}
self.pat = re.compile(r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""", re.IGNORECASE)
self.pat = re.compile(
r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""",
re.IGNORECASE)

def bpe(self, token):
if token in self.cache:
return self.cache[token]
word = tuple(token[:-1]) + ( token[-1] + '</w>',)
word = tuple(token[:-1]) + (token[-1] + '</w>',)
pairs = get_pairs(word)

if not pairs:
return token+'</w>'
return token + '</w>'

while True:
bigram = min(pairs, key = lambda pair: self.bpe_ranks.get(pair, float('inf')))
bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float('inf')))
if bigram not in self.bpe_ranks:
break
first, second = bigram
@@ -102,8 +104,8 @@ def bpe(self, token):
new_word.extend(word[i:])
break

if word[i] == first and i < len(word)-1 and word[i+1] == second:
new_word.append(first+second)
if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
new_word.append(first + second)
i += 2
else:
new_word.append(word[i])