composable-sft/examples/text-classification at main · cambridgeltl/composable-sft

data
README.md
eval_nli.sh
eval_sa.sh
map_nusa.py
nli_multisource.json
run_text_classification.py
sa_multisource.json
train_nli.sh
train_nli_ms.sh
train_sa.sh
train_sa_ms.sh

AmericasNLI

Before using the example scripts, you should download the AmericasNLI test data:

cd data
git clone https://github.com/nala-cub/AmericasNLI.git

Then you can use the train_nli.sh and train_nli_ms.sh scripts to train single- and multi-source task SFTs for NLI, and eval_nli.sh to evaluate on the AmericasNLI test data.

NusaX sentiment analysis

To create a version of the SMSA dataset with examples from the test set of NusaX-senti removed, run

# first run "pip install editdistance" if you don't already have it installed
python map_nusa.py

Then you can use train_sa.sh and train_sa_ms.sh to train single- and multi-source task SFTs for NusaX, and eval_sa.sh to evaluate. Single-source training uses SMSA (trimmed as above to avoid information leaks) as the source; multi-source training additionally uses the NusaX training data for languages other than Indonesian.
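
To illustrate the kind of trimming described above, here is a minimal sketch of removing training examples that nearly duplicate test examples, using edit distance. It is not the code in map_nusa.py: the function names (levenshtein, normalize, remove_leaked), the normalization, and the max_ratio threshold are all assumptions made for illustration. The edit distance is implemented in pure Python so the sketch stays self-contained; the editdistance package mentioned above could presumably be swapped in for speed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (hypothetical normalization)."""
    return " ".join(text.lower().split())


def remove_leaked(train_texts, test_texts, max_ratio=0.1):
    """Keep only training texts that are not near-duplicates of any test text.

    A training text counts as leaked when its edit distance to some test text
    is at most max_ratio times the longer of the two lengths. The threshold
    is an assumption, not a value taken from map_nusa.py.
    """
    test_norm = [normalize(t) for t in test_texts]
    kept = []
    for text in train_texts:
        norm = normalize(text)
        leaked = any(
            levenshtein(norm, t) <= max_ratio * max(len(norm), len(t), 1)
            for t in test_norm
        )
        if not leaked:
            kept.append(text)
    return kept


if __name__ == "__main__":
    # Made-up example sentences, not taken from SMSA or NusaX-senti.
    train = ["Makanannya enak sekali!", "Pelayanan sangat lambat.", "Harga terlalu mahal"]
    test = ["makanannya enak sekali"]
    print(remove_leaked(train, test))
```

This pairwise comparison is quadratic in the number of examples, which is acceptable for small datasets but would need indexing or blocking for large ones.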
