
PekoNet

This repository contains the source code of Improving Colloquial Case Legal Judgment Prediction via Abstractive Text Summarization.

Introduction

PekoNet is a colloquial-case legal judgment prediction (LJP) framework based on abstractive text summarization. Its main goal is to improve LJP performance on colloquial case facts and, in turn, the experience of ordinary, non-professional users of LJP services.

The framework is composed of two modules: the Abstractive Text Summarization Module (ATSM) and the Legal Judgment Prediction Module (LJPM). We first used a news summarization dataset (CNewSum) to train ATSM. Then, we used ATSM to convert Taiwan criminal case facts from a formal to a colloquial style. Finally, we used the colloquial case facts as the dataset to train LJPM.
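The two-stage flow above can be sketched as follows. This is a minimal illustration only: the class and method names (`SummarizerStub`, `PredictorStub`, `pekonet_pipeline`) are hypothetical placeholders, not the repository's actual API.

```python
class SummarizerStub:
    """Stands in for ATSM: rewrites a formal case fact into colloquial text."""

    def summarize(self, formal_fact: str) -> str:
        # A real ATSM runs an abstractive summarization model (e.g. BART);
        # this stub only performs a trivial word substitution.
        return formal_fact.replace("defendant", "the person")


class PredictorStub:
    """Stands in for LJPM: predicts charges and legal articles from a fact."""

    def predict(self, fact: str) -> dict:
        # A real LJPM runs a trained classifier over the fact text.
        return {"charges": ["<predicted charge>"],
                "articles": ["<predicted article>"]}


def pekonet_pipeline(formal_fact: str) -> dict:
    """ATSM first converts the style, then LJPM predicts the judgment."""
    colloquial = SummarizerStub().summarize(formal_fact)
    return PredictorStub().predict(colloquial)


result = pekonet_pipeline("The defendant took the property of another person.")
print(result)
```

In the ATS-Freezing and ATS-Finetuning variants described below, the two stubs would be a single end-to-end model rather than two independent ones.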

Here are the details of the datasets and models:

  • datasets (each dataset has two data formats: Orig. Fact and Summary)
    • TCI Training Set: a dataset of 305,240 samples, 147 charges, and 89 legal articles.
    • BART Testing Set: a dataset whose Summary data are generated by the BART model; 30,289 samples, 101 charges, and 89 legal articles.
    • ChatGPT Testing Set: a dataset whose Summary data are generated by ChatGPT; 30,289 samples, 101 charges, and 89 legal articles.
    • Human Testing Set: a dataset whose Summary data are written by humans; 235 samples, 53 charges, and 78 legal articles.
  • models
    • Baseline Model: the PekoNet model without the ATS module, trained on the formal data of the TCI Training Set.
    • Independent Training Model: the PekoNet model whose ATS and LJP modules are independent models, trained on CNewSum and the colloquial data of the TCI Training Set.
    • ATS-Freezing Model: the PekoNet model whose ATS module was frozen while training the LJP module, trained on CNewSum and the colloquial data of the TCI Training Set.
    • ATS-Finetuning Model: the PekoNet model whose ATS module was fine-tuned while training the LJP module, trained on CNewSum and the colloquial data of the TCI Training Set.

Environment and Requirements

  • Ubuntu 18.04.6 LTS
  • Python 3.7.8
  • OpenCC 1.1.4
  • numpy 1.21.6
  • scikit-learn 1.0.2
  • tabulate 0.8.10
  • torch 1.12.1
  • tqdm 4.64.0
  • transformers 4.21.2
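The package versions above can be pinned in a `requirements.txt` file; the sketch below simply transcribes the list (it assumes pip inside the Python 3.7.8 environment noted above).

```
OpenCC==1.1.4
numpy==1.21.6
scikit-learn==1.0.2
tabulate==0.8.10
torch==1.12.1
tqdm==4.64.0
transformers==4.21.2
```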

Pretrained Language Models

You should download the models folder and put it in the root directory of this repository.

Data

Processed Data (Recommended)

These are the data we processed and used to train the models and run the experiments. You should create a results folder in the root directory of this repository, then download the tvt_dataset folder and put it in results.
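A sketch of the expected layout (the tvt_dataset contents come from the authors' download, which this placeholder does not replace):

```shell
# Create the results folder at the repository root, then place the
# downloaded tvt_dataset folder inside it.
mkdir -p results/tvt_dataset   # placeholder; replace with the real download
ls -d results/tvt_dataset
```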

Source Data

These are the source data we used in this project. If you want to generate all of the data yourself, you should download the data folder, put it in the root directory of this repository, and run generate.py to process the data.
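A sketch of that step, guarded so it only runs where the script exists (it assumes the data folder has already been downloaded to the repository root):

```shell
# Regenerate the processed data from the source data folder.
if [ -f generate.py ]; then
    python generate.py
else
    echo "generate.py not found; run this from the PekoNet repository root"
fi
```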

Checkpoints

You should create the results folder in the root directory of this repository (if it does not already exist), then download the checkpoints folder and put it in results.
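As with the processed data, a sketch of the expected layout (the checkpoint files themselves come from the authors' download):

```shell
# Place the downloaded checkpoints folder inside results/ at the repo root.
mkdir -p results/checkpoints   # placeholder; replace with the real download
ls -d results/checkpoints
```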

Usage

Not yet written; usage instructions will be added later.

Contact

If you have any questions, feel free to raise issues or contact me!

Citation

@article{clsr2023hong,
    title = {Improving Colloquial Case Legal Judgment Prediction via Abstractive Text Summarization},
    journal = {Computer Law \& Security Review},
    volume = {51},
    pages = {105863},
    year = {2023},
    issn = {0267-3649},
    doi = {10.1016/j.clsr.2023.105863},
    url = {https://www.sciencedirect.com/science/article/pii/S0267364923000730},
    author = {Yu-Xiang Hong and Chia-Hui Chang},
    keywords = {Legal judgment prediction, Legal text summarization, Abstractive text summarization, Legal artificial intelligence}
}
