This repository contains the source code of *Improving Colloquial Case Legal Judgment Prediction via Abstractive Text Summarization*.
PekoNet is a colloquial-case legal judgment prediction (LJP) framework based on abstractive text summarization (ATS). The framework aims to improve LJP performance on colloquial case facts and, in turn, the experience of ordinary, non-professional users of LJP services.
The framework consists of two modules: the Abstractive Text Summarization Module (ATSM) and the Legal Judgment Prediction Module (LJPM). We first trained the ATSM on a news summarization dataset (CNewSum). Then, we used the ATSM to convert Taiwanese criminal case facts from a formal to a colloquial register. Finally, we trained the LJPM on the colloquial case facts.
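The two-stage pipeline described above can be sketched as follows. This is a minimal illustration only, not the repository's actual code: the real ATSM is a trained summarization model and the real LJPM a trained classifier, while the callables below are toy stand-ins.

```python
# Minimal sketch of the PekoNet two-stage pipeline (illustrative only).
# The real ATSM is a BART-style summarizer and the real LJPM a BERT-style
# classifier; toy callables stand in for the trained modules here.

from typing import Callable, List, Tuple


class PekoNetSketch:
    """Chains an abstractive summarizer (ATSM) with a judgment predictor (LJPM)."""

    def __init__(self,
                 summarize: Callable[[str], str],
                 predict: Callable[[str], Tuple[str, List[str]]]):
        self.summarize = summarize  # formal fact -> colloquial summary
        self.predict = predict      # colloquial summary -> (charge, legal articles)

    def judge(self, formal_fact: str) -> Tuple[str, List[str]]:
        colloquial = self.summarize(formal_fact)
        return self.predict(colloquial)


# Toy components standing in for the trained modules.
def toy_summarizer(text: str) -> str:
    return text.split(".")[0]  # keep only the first sentence


def toy_predictor(text: str) -> Tuple[str, List[str]]:
    return ("theft", ["Article 320"])  # fixed toy prediction


model = PekoNetSketch(toy_summarizer, toy_predictor)
charge, articles = model.judge("The defendant took the wallet. More detail follows.")
print(charge, articles)  # theft ['Article 320']
```

The point of the sketch is the module boundary: whether the ATSM is frozen, fine-tuned, or trained independently (the model variants listed below), the LJPM always consumes the summarizer's colloquial output.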
Here are the details of the datasets and models:
- Datasets (each dataset has two data formats: `Orig. Fact` and `Summary`)
  - TCI Training Set: 305,240 examples, 147 charges, and 89 legal articles.
  - BART Testing Set: the `Summary` data are generated by the BART model; 30,289 examples, 101 charges, and 89 legal articles.
  - ChatGPT Testing Set: the `Summary` data are generated by ChatGPT; 30,289 examples, 101 charges, and 89 legal articles.
  - Human Testing Set: the `Summary` data are written by humans; 235 examples, 53 charges, and 78 legal articles.
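Assuming the examples are stored as JSON Lines with one field per data format, a record might look like the sketch below. The field names (`fact`, `summary`, `charge`, `articles`) are assumptions for illustration; check the released files for the actual schema.

```python
import json

# Hypothetical JSONL record layout for one example; the actual field names
# in the released datasets may differ ("fact"/"summary"/"charge"/"articles"
# are assumptions, not the repository's documented schema).
record_line = json.dumps({
    "fact": "Formal case fact written in legal register ...",      # Orig. Fact
    "summary": "Colloquial summary of the same case ...",          # Summary
    "charge": "theft",
    "articles": ["Article 320"],
}, ensure_ascii=False)

record = json.loads(record_line)
print(record["charge"], len(record["articles"]))  # theft 1
```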
- Models
  - `Baseline Model`: the PekoNet framework without the ATS module, trained on the formal data of the TCI Training Set.
  - `Independent Training Model`: the ATS and LJP modules are independent models, trained on CNewSum and the colloquial data of the TCI Training Set.
  - `ATS-Freezing Model`: the ATS module was frozen while training the LJP module; trained on CNewSum and the colloquial data of the TCI Training Set.
  - `ATS-Finetuning Model`: the ATS module was fine-tuned while training the LJP module; trained on CNewSum and the colloquial data of the TCI Training Set.
- Ubuntu 18.04.6 LTS
- Python 3.7.8
- OpenCC 1.1.4
- numpy 1.21.6
- scikit-learn 1.0.2
- tabulate 0.8.10
- torch 1.12.1
- tqdm 4.64.0
- transformers 4.21.2
Download the `models` folder and put it in the root directory of this repository.
These are the data we processed and used to train the models and run the experiments.
Create the `results` folder in the root directory of this repository. Then, download the `tvt_dataset` folder and put it in `results`.
These are the source data we used in this project. If you want to generate all data yourself, download the `data` folder, put it in the root directory of this repository, and use `generate.py` to process the data.
Create the `results` folder in the root directory of this repository. Then, download the `checkpoints` folder and put it in `results`.
Not yet updated.
If you have any questions, feel free to open an issue or contact me!
@article{clsr2023hong,
title = {Improving Colloquial Case Legal Judgment Prediction via Abstractive Text Summarization},
journal = {Computer Law \& Security Review},
volume = {51},
pages = {105863},
year = {2023},
issn = {0267-3649},
doi = {10.1016/j.clsr.2023.105863},
url = {https://www.sciencedirect.com/science/article/pii/S0267364923000730},
author = {Yu-Xiang Hong and Chia-Hui Chang},
keywords = {Legal judgment prediction, Legal text summarization, Abstractive text summarization, Legal artificial intelligence}
}