Bibliographic Reference Parser for German Humanities Journals

Disclaimer: The model currently supports only the extraction of author names from references. The annotations in gold/dvjs_annot_references_full.xml contain much more information.

Model

The model consists of two bidirectional GRUs and two dense layers. The first GRU receives the output of the last layer of a multilingual BERT model as input. A German-only BERT model is not sufficient, because it cannot be adapted to the mostly English data from the GROBID project. The multilingual model, on the other hand, can be trained on both English and German data, and the combination achieves better results.
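The layer stack described above can be sketched roughly as follows. This is a minimal illustration, not the code from code/train_model.py: the use of PyTorch, the class name ReferenceTagger, the hidden size, the ReLU activation and the number of output labels are all assumptions. Only the input width of 768 follows from the hidden size of the multilingual BERT base model. A random tensor stands in for the BERT output so that the sketch runs without downloading any weights.

```python
import torch
from torch import nn


class ReferenceTagger(nn.Module):
    """Hypothetical tagger head: two bidirectional GRUs followed by two dense layers."""

    def __init__(self, bert_dim=768, hidden=128, num_labels=3):
        super().__init__()
        # First GRU consumes the last-layer BERT output (one vector per token).
        self.gru1 = nn.GRU(bert_dim, hidden, batch_first=True, bidirectional=True)
        # Second GRU consumes the concatenated forward/backward states of the first.
        self.gru2 = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.dense1 = nn.Linear(2 * hidden, hidden)
        self.dense2 = nn.Linear(hidden, num_labels)

    def forward(self, bert_out):
        x, _ = self.gru1(bert_out)
        x, _ = self.gru2(x)
        x = torch.relu(self.dense1(x))  # activation choice is an assumption
        return self.dense2(x)  # per-token label scores (logits)


model = ReferenceTagger()
# Stand-in for BERT output: batch of 2 sequences, 16 tokens each, 768 features per token.
dummy_bert_output = torch.randn(2, 16, 768)
logits = model(dummy_bert_output)
print(logits.shape)
```

In a full pipeline the dummy tensor would be replaced by the last hidden state of the multilingual BERT encoder for the tokenized reference.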

Training

Training has two stages. The first stage uses the gold data from the GROBID project. This data is adapted beforehand so that it more closely resembles humanities references (6817 references, gold/grobid_hum.tsv). For this purpose, typical markers such as "vgl." or "siehe dazu" are inserted, or the reference is embedded completely in continuous text and divided into segments. The second stage then uses labelled data (341 references) from the Deutsche Vierteljahrsschrift für Literaturwissenschaft und Geistesgeschichte (DVJS). More details about the training can be found in the script (code/train_model.py).
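The adaptation of the GROBID references could look roughly like the sketch below. It is an illustration under assumptions, not the procedure used to build gold/grobid_hum.tsv: the function names, the whitespace tokenization, the fixed segment length and the example reference and context sentences are all hypothetical. Only the two markers are taken from the description above.

```python
import random

# The two markers named in this README; a real list would likely be longer.
MARKERS = ["vgl.", "siehe dazu"]


def add_marker(reference: str, rng: random.Random) -> str:
    """Prefix a reference with a randomly chosen humanities-style marker."""
    marker = rng.choice(MARKERS)
    return f"{marker} {reference}"


def embed_in_text(reference: str, before: str, after: str,
                  segment_len: int = 8) -> list[list[str]]:
    """Embed a reference in running text and split the result into token segments.

    Tokens are split on whitespace and grouped into windows of at most
    segment_len tokens (both choices are assumptions for illustration).
    """
    tokens = f"{before} {reference} {after}".split()
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]


rng = random.Random(0)
reference = "Smith, J.: A Study of Form. London 1999."  # made-up example reference
print(add_marker(reference, rng))
for segment in embed_in_text(reference,
                             "Dies wurde bereits gezeigt,",
                             "was hier nicht wiederholt wird."):
    print(segment)
```

A seeded random.Random instance keeps the augmentation reproducible, which helps when the adapted data set has to be regenerated.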

Usage

See code/predict.py

LeKonArD/bibl_parser

Folders and files

Latest commit

History

Repository files navigation

Bibliographic Reference Parser for German Humanities Journals

Model

Training

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages