
[Refactor/Feat] load dataset (#254)
* [Major] refactor imports

* [CI] update tests

* [Feat] add hfd and hf-mirror

* [doc] more guide on loading datasets

* [fix] update hfd

* [fix] dataset formatting

* [fix] load with hfd

* [fix] resolve huggingface-cli import error

* [CI] test dataset formatting

* [CI] skip OOM

* [fix] fix failed tests

* support gpt-4o

* update customize dataset

* [doc] customize model

* [CI] annotate failures

* [Feat] load evaluation_data

* [Feat] hfd_cache_path

* [CI] split pytest

* [CI] fix splits

* [CI] skip cuda

* [doc] add CONTRIBUTING.md

* [fix] evaluation_data is not None

* [CI] download nltk

* [CI] fix temp folder

* [CI] fix cache path

* [CI] skip DatasetGenerationError

* [CI] re-run failures

* [ci] fix winograd

* [CI] fix pytest-results-action

* [CI] fix

* [ci] fix xlsum
huyiwen authored Jun 6, 2024
1 parent e40fcf8 commit 82daf6e
Showing 65 changed files with 2,189 additions and 1,021 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/isort-check.yml
@@ -7,7 +7,7 @@ on:
- 'utilization/**'

jobs:
build:
formatting-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
80 changes: 64 additions & 16 deletions .github/workflows/pytest-check.yml
@@ -9,30 +9,78 @@ on:
- '.github/workflows/**'

jobs:
build:
name: Run tests
Pytest:
name: subtest
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8.18"]
group: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

steps:
- uses: szenius/set-timezone@v1.2
- uses: szenius/set-timezone@v2.0
with:
timezoneLinux: "Europe/Berlin"
timezoneLinux: "Asia/Shanghai"
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
- name: Set up Python 3.8.18
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
run: pip install uv pip -U
python-version: 3.8.18
- name: Install dependencies
run: uv pip install -r tests/requirements-tests.txt --system
- name: Install isolation dependencies
run: uv pip install vllm --no-build-isolation --system
- uses: pavelzw/pytest-action@v2
run: |
pip install uv pip -U
uv pip install -r tests/requirements-tests.txt --system
uv pip install vllm --no-build-isolation --system
- name: Run tests
run: pytest --cov --junit-xml=test-results.xml --splits 10 --group ${{ matrix.group }} --reruns 3 --only-rerun PermissionError
env:
GITHUB_ACTION: 1
- name: Surface failing tests
if: always()
uses: pmeier/pytest-results-action@multi-testsuites
with:
emoji: false
verbose: true
job-summary: true
# A list of JUnit XML files, directories containing the former, and wildcard
# patterns to process.
# See @actions/glob for supported patterns.
path: test-results.xml

# (Optional) Add a summary of the results at the top of the report
summary: true

# (Optional) Select which results should be included in the report.
# Follows the same syntax as `pytest -r`
display-options: fEX

# (Optional) Fail the workflow if no JUnit XML was found.
fail-on-empty: true

# (Optional) Title of the test results section in the workflow summary
title: Test results
- name: Upload coverage
uses: actions/upload-artifact@v2
with:
name: coverage${{ matrix.group }}
path: .coverage

Coverage:
needs: Pytest
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.8.18
uses: actions/setup-python@v4
with:
python-version: 3.8.18
- name: Install uv
run: |
pip install uv pip -U
uv pip install -r tests/requirements-tests.txt --system
uv pip install vllm --no-build-isolation --system
- name: Download all artifacts
# Downloads coverage1, coverage2, etc.
uses: actions/download-artifact@v2
- name: Run coverage
run: |
coverage combine coverage*/.coverage*
coverage report --fail-under=90
coverage xml
- uses: codecov/codecov-action@v1
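The `Pytest` job fans out over a 10-entry matrix and runs `pytest --splits 10 --group ${{ matrix.group }}` (flags from the `pytest-split` plugin), and the `Coverage` job then merges the per-group `.coverage` artifacts. The sketch below only illustrates the splitting idea: each test lands in exactly one of the ten groups, so the combined report covers the whole suite. It is a hypothetical stand-in that assumes equal test durations; pytest-split itself can balance groups using stored test durations, and `split_into_groups`/`select_group` are illustrative names.

```python
from typing import List, Sequence


def split_into_groups(test_ids: Sequence[str], splits: int) -> List[List[str]]:
    """Partition tests into `splits` contiguous chunks whose sizes differ by at most one.

    Hypothetical simplification of pytest-split: every test is assumed to take the same time.
    """
    if splits < 1:
        raise ValueError("splits must be >= 1")
    base, extra = divmod(len(test_ids), splits)
    groups: List[List[str]] = []
    start = 0
    for i in range(splits):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more test
        groups.append(list(test_ids[start:start + size]))
        start += size
    return groups


def select_group(test_ids: Sequence[str], splits: int, group: int) -> List[str]:
    """Return the tests of a 1-indexed group, like `--splits 10 --group N`."""
    if not 1 <= group <= splits:
        raise ValueError("group must be between 1 and splits")
    return split_into_groups(test_ids, splits)[group - 1]


tests = [f"tests/test_{i:02d}.py" for i in range(23)]
sizes = [len(select_group(tests, 10, g)) for g in range(1, 11)]
print(sizes)  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

Because the groups are disjoint and together contain every test, running all ten matrix jobs and combining their coverage data is equivalent (for coverage purposes) to one full run.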
82 changes: 82 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,82 @@
# Contributing

Thanks for your interest in contributing to LLMBox! We welcome and appreciate contributions.
To report bugs, create a [GitHub issue](https://github.com/RUCAIBox/LLMBox/issues).

## Contribution Guide
### 1. Fork the Official Repository

Fork the [LLMBox repository](https://github.com/RUCAIBox/LLMBox) into your own account, then clone your fork into your local environment.

```shell
git clone git@github.com:<YOUR-USERNAME>/LLMBox.git
```

### 2. Configure Git

Set the official repository as your [upstream](https://www.atlassian.com/git/tutorials/git-forks-and-upstreams) so you can synchronize with the latest updates in the official repository:

```shell
cd LLMBox
git remote add upstream git@github.com:RUCAIBox/LLMBox.git
```

Verify that the remote is set.
```shell
git remote -v
```
You should see both `origin` and `upstream` in the output.

### 3. Synchronize with Official Repository
Before coding, synchronize with the latest commit in the official repository.

```shell
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
```

### 4. Create a New Branch and Open a Pull Request
Create a new branch for your changes. After you finish the implementation, push the branch to your fork and open a pull request: the source branch is your new branch, and the target branch is the `main` branch of `RUCAIBox/LLMBox`. The PR should then appear in [LLMBox PRs](https://github.com/RUCAIBox/LLMBox/pulls).

The LLMBox team will then review your code.

## PR Rules

### 1. Pull Request title

As described [here](https://github.com/commitizen/conventional-commit-types/blob/master/index.json), a valid PR title should begin with one of the following prefixes:

- `feat`: A new feature
- `fix`: A bug fix
- `doc`: Documentation only changes
- `refactor`: A code change that neither fixes a bug nor adds a feature
- `style`: A refactoring that improves code style
- `test`: Adding missing tests or correcting existing tests
- `ci`: Changes to CI configuration files and scripts (example scopes: `.github`, `ci` (Buildkite))
- `revert`: Reverts a previous commit

For example, a PR title could be:
- `refactor: modify package path`
- `feat(training): xxxx`, where `(training)` means that this PR mainly focuses on the training component.
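To check a title before opening a PR, a small regular expression is enough. The helper below is a hypothetical local check, not part of LLMBox's CI; it assumes the form `type(scope): summary`, with an optional scope and only the prefixes listed in this section.

```python
import re

# Hypothetical helper: the prefixes listed in the PR title rules.
ALLOWED_TYPES = ("feat", "fix", "doc", "refactor", "style", "test", "ci", "revert")

# <type>, an optional "(scope)", then ": " and a non-empty summary.
TITLE_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")(\((?P<scope>[^()\s]+)\))?: \S.*$"
)


def is_valid_pr_title(title: str) -> bool:
    """Return True if `title` follows the `type(scope): summary` convention."""
    return TITLE_PATTERN.match(title) is not None


for title in ["refactor: modify package path", "feat(training): xxxx", "Update README"]:
    print(title, "->", is_valid_pr_title(title))
```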

You may also check out previous PRs in the [PR list](https://github.com/RUCAIBox/LLMBox/pulls).

### 2. Pull Request description

- If your PR is small (such as a typo fix), a brief description is enough.
- If it is large, describe your changes in more detail.


## How to begin
Please refer to the README in each module:
- [training](./training)
- [utilization](./utilization)
- [docs](./docs)

## Tests
Please navigate to the `tests` folder to see the existing test suites.
At the moment, we have three kinds of checks: `pytest`, `isort`, and `yapf`.
7 changes: 3 additions & 4 deletions README.md
@@ -57,7 +57,7 @@ bash bash/run_7b_ds3.sh
To utilize your model, or evaluate an existing model, you can run the following command:

```python
python inference.py -m gpt-3.5-turbo -d copa # --num_shot 0 --model_type instruction
python inference.py -m gpt-3.5-turbo -d copa # --num_shot 0 --model_type chat
```

By default, this command runs the OpenAI GPT-3.5 Turbo model on the CoPA dataset in a zero-shot manner.
@@ -118,12 +118,11 @@ We provide a broad support on Huggingface models (e.g. `LLaMA-3`, `Mistral`, or
Currently a total of 56+ commonly used datasets are supported, including: `HellaSwag`, `MMLU`, `GSM8K`, `GPQA`, `AGIEval`, `CEval`, and `CMMLU`. For a full list of supported models and datasets, view the [utilization](https://github.com/RUCAIBox/LLMBox/tree/main/utilization) documentation.

```bash
python inference.py \
CUDA_VISIBLE_DEVICES=0 python inference.py \
-m llama-2-7b-hf \
-d mmlu agieval:[English] \
--model_type instruction \
--model_type chat \
--num_shot 5 \
--cuda 0 \
--ranking_type ppl_no_option
```

47 changes: 47 additions & 0 deletions docs/examples/customize_dataset.py
@@ -0,0 +1,47 @@
import os
import sys

sys.path.append(".")
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from utilization import DatasetArguments, ModelArguments, get_evaluator, register_dataset
from utilization.dataset import GenerationDataset


@register_dataset(name="my_data")
class MyData(GenerationDataset):

    instruction = "Reply to my message: {input}\nReply:"
    metrics = []

    def format_instance(self, instance: dict) -> dict:
        return instance

    @property
    def references(self):
        return [i["target"] for i in self.evaluation_data]


evaluator = get_evaluator(
    model_args=ModelArguments(model_name_or_path="gpt-4o"),
    dataset_args=DatasetArguments(
        dataset_names=["my_data"],
        num_shots=1,
        max_example_tokens=2560,
    ),
    evaluation_data=[
        {
            "input": "Hello",
            "target": "Hi"
        },
        {
            "input": "How are you?",
            "target": "I'm fine, thank you!"
        },
    ],
    example_data=[{
        "input": "What's the weather like today?",
        "target": "It's sunny today."
    }]
)
evaluator.evaluate()
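To see roughly what the model receives in this example, the sketch fills `instruction` for an instance and prepends one solved example, matching `num_shots=1`. It is a simplified approximation that assumes plain `str.format` substitution. The separators are assumptions, LLMBox's actual prompt assembly (chat templates, the `max_example_tokens` budget) is not shown in this commit, and `render`/`build_prompt` are hypothetical names.

```python
from typing import Dict, List

# Same template and data as in customize_dataset.py.
instruction = "Reply to my message: {input}\nReply:"
evaluation_data = [
    {"input": "Hello", "target": "Hi"},
    {"input": "How are you?", "target": "I'm fine, thank you!"},
]
example_data = [{"input": "What's the weather like today?", "target": "It's sunny today."}]


def render(instance: Dict[str, str]) -> str:
    """Fill the instruction template with the fields of one instance (extra keys are ignored)."""
    return instruction.format(**instance)


def build_prompt(instance: Dict[str, str], examples: List[Dict[str, str]], num_shots: int) -> str:
    """Prepend `num_shots` solved examples, then the unanswered query.

    Hypothetical: the "\n\n" separator and the space before each target are assumptions.
    """
    shots = [render(ex) + " " + ex["target"] for ex in examples[:num_shots]]
    return "\n\n".join(shots + [render(instance)])


references = [i["target"] for i in evaluation_data]  # what `MyData.references` returns
print(build_prompt(evaluation_data[0], example_data, num_shots=1))
print(references)
```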
10 changes: 6 additions & 4 deletions docs/examples/customize_huggingface_model.py
@@ -1,12 +1,14 @@
import sys

import torch
from transformers import LlamaForCausalLM

from utilization import Evaluator
from utilization.model.huggingface_model import get_model_max_length, load_tokenizer
from utilization.utils import DatasetArguments, ModelArguments
sys.path.append(".")
from utilization import DatasetArguments, ModelArguments, get_evaluator


def load_hf_model(model_args: ModelArguments):
    from utilization.model.huggingface_model import get_model_max_length, load_tokenizer

    # load your own model
    model = LlamaForCausalLM.from_pretrained(
@@ -24,7 +26,7 @@ def load_hf_model(model_args: ModelArguments):
    return model, tokenizer


evaluator = Evaluator(
evaluator = get_evaluator(
    model_args=ModelArguments(
        model_name_or_path="../your-model-path",
        model_type="chat",
@@ -2,6 +2,8 @@

If you find some datasets are not supported in the current version, feel free to implement your own dataset and submit a PR.

See the full list of supported datasets [here](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/supported-datasets.md).

## Choose the Right Dataset

We provide two types of datasets: [`GenerationDataset`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/dataset/generation_dataset.py) and [`MultipleChoiceDataset`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/dataset/multiple_choice_dataset.py).
@@ -35,7 +37,7 @@ These are the attributes you can define in a new dataset:

- `example_set` (`Optional[str]`): The example split of the dataset. Example data will be automatically loaded if this is not None.

- `load_args` (`Union[Tuple[str], Tuple[str, str], Tuple[()]]`, **required\***): Arguments for loading the dataset with huggingface `load_dataset`. See [load from source data](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/customize-dataset.md#load-from-source-data) for details.
- `load_args` (`Union[Tuple[str], Tuple[str, str], Tuple[()]]`, **required\***): Arguments for loading the dataset with huggingface `load_dataset`. See [load from source data](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/how-to-customize-dataset.md#load-from-source-data) for details.

- `extra_model_args` (`Dict[str, Any]`): Extra arguments for the model like `temperature`, `stop` etc. See `set_generation_args`, `set_prob_args`, and `set_ppl_args` for details.
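A minimal sketch of how the two non-empty `load_args` shapes could map onto a Hugging Face `load_dataset(path, name, split=...)` call. It assumes the tuple is unpacked positionally. It uses a fake loader so it runs offline. It leaves the empty-tuple case to the linked "load from source data" section. `load_split` and `fake_load_dataset` are hypothetical names.

```python
from typing import Any, Callable, Optional, Tuple


def load_split(load_args: Tuple[str, ...], split: str, loader: Callable[..., Any]) -> Any:
    """Unpack `load_args` positionally into a `load_dataset`-style loader.

    Assumption: a 1-tuple is (path,), a 2-tuple is (path, name).
    """
    if len(load_args) == 0:
        raise ValueError("empty load_args: not modeled in this sketch")
    if len(load_args) > 2:
        raise ValueError("expected (path,) or (path, name)")
    return loader(*load_args, split=split)


def fake_load_dataset(path: str, name: Optional[str] = None, split: Optional[str] = None) -> str:
    """Hypothetical stand-in for datasets.load_dataset that just describes the call."""
    return f"load_dataset({path!r}, {name!r}, split={split!r})"


print(load_split(("super_glue", "copa"), "validation", fake_load_dataset))
print(load_split(("my_org/my_dataset",), "test", fake_load_dataset))
```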

@@ -45,7 +47,7 @@ Then implement the following methods or properties:
- `references` (**required**): Return the reference answers for evaluation.
- `init_arguments`: Initialize the arguments for the dataset. This is called before the raw dataset is loaded.

See [here](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/customize-dataset.md#advanced-topics) for advanced topics.
See [here](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/how-to-customize-dataset.md#advanced-topics) for advanced topics.


## Load from Source Data
28 changes: 28 additions & 0 deletions docs/utilization/how-to-customize-model.md
@@ -0,0 +1,28 @@
# How to Customize Model

## Customizing HuggingFace Models

If you are building on your own model, such as a fine-tuned one, you can easily evaluate it from a Python script. Detailed steps and example code are provided in the [customize HuggingFace model guide](https://github.com/RUCAIBox/LLMBox/tree/main/docs/examples/customize_huggingface_model.py).

## Adding a New Model Provider

If you're integrating a new model provider, begin by extending the [`Model`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/model/model.py) class. Implement essential methods such as `generation`, `get_ppl` (get perplexity), and `get_prob` (get probability) to support different functionalities. For instance, here's how you might implement the `generation` method for a new model:

```python
from typing import Any, List

from utilization.model.model import Model


class NewModel(Model):

    model_backend = "new_provider"

    def call_model(self, batched_inputs: List[str]) -> List[Any]:
        return ...  # call to model, e.g., self.model.generate(...)

    def to_text(self, result: Any) -> str:
        return ...  # convert result to text, e.g., result['text']

    def generation(self, batched_inputs: List[str]) -> List[str]:
        results = self.call_model(batched_inputs)
        # to_text is a method, so it must be called through self
        results = [self.to_text(result) for result in results]
        return results

Then, register your model in the [`load`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/model/load.py) file.
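The same `call_model` / `to_text` / `generation` pattern can be exercised end to end with a toy backend. Everything here is hypothetical. The stub `Model` base class replaces LLMBox's real class. `EchoModel` fakes a provider. `MODEL_BACKENDS` only mimics the idea of registering a backend, since the actual contents of `load.py` are not shown in this commit.

```python
from typing import Any, Dict, List


class Model:
    """Hypothetical stub for utilization.model.model.Model (only what this sketch needs)."""

    model_backend = "base"

    def generation(self, batched_inputs: List[str]) -> List[str]:
        raise NotImplementedError


class EchoModel(Model):
    """Toy provider whose 'API' returns the prompt in upper case."""

    model_backend = "echo"

    def call_model(self, batched_inputs: List[str]) -> List[Dict[str, Any]]:
        # A real backend would send one batched request to the provider here.
        return [{"text": prompt.upper()} for prompt in batched_inputs]

    def to_text(self, result: Dict[str, Any]) -> str:
        return result["text"]

    def generation(self, batched_inputs: List[str]) -> List[str]:
        results = self.call_model(batched_inputs)
        return [self.to_text(result) for result in results]


# Hypothetical registry standing in for the registration step in load.py.
MODEL_BACKENDS = {cls.model_backend: cls for cls in (EchoModel,)}

model = MODEL_BACKENDS["echo"]()
print(model.generation(["hello", "reply:"]))  # ['HELLO', 'REPLY:']
```

In this sketch, only `call_model` and `to_text` are provider-specific, so `generation` can stay identical across backends.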