Disclaimer: The model currently only supports the extraction of author names from references. The annotations in gold/dvjs_annot_references_full.xml contain considerably more information.
The model consists of two bidirectional GRUs and two dense layers. The first GRU receives the output of the last layer of a multilingual BERT model as input. A German-only BERT model is not sufficient because it cannot be adapted to the mostly English data from the GROBID project; the multilingual model, on the other hand, can be trained on both the English and the German data and achieves better results in combination.

The training has a two-stage structure. First, the model is trained on the gold data from the GROBID project (6817 references, gold/grobid_hum.tsv). These data are adapted beforehand so that they more closely resemble humanities references: typical markers such as "vgl." or "siehe dazu" are inserted, or the reference is embedded entirely in continuous text and divided into segments. The second training stage then uses labelled data (341 references) from the Deutsche Vierteljahrsschrift für Literaturwissenschaft und Geistesgeschichte (DVJS). More details about the training can be found in the script (code/train_model.py). For prediction, see code/predict.py.
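The GRU stack described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name, hidden size, and the three-way author/non-author label set are assumptions, and the real implementation in code/train_model.py may differ.

```python
# Hypothetical sketch: two bidirectional GRUs and two dense layers on top
# of the last-layer output of a multilingual BERT model. All sizes and
# names here are illustrative assumptions, not the project's actual code.
import torch
import torch.nn as nn

class AuthorTagger(nn.Module):
    def __init__(self, bert_dim=768, hidden=256, num_labels=3):
        super().__init__()
        # First GRU consumes the mBERT token embeddings
        self.gru1 = nn.GRU(bert_dim, hidden, bidirectional=True, batch_first=True)
        # Second GRU consumes the (forward + backward) output of the first
        self.gru2 = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.dense1 = nn.Linear(2 * hidden, hidden)
        # Assumed label set, e.g. B-AUTHOR / I-AUTHOR / O
        self.dense2 = nn.Linear(hidden, num_labels)

    def forward(self, bert_out):              # (batch, seq_len, bert_dim)
        x, _ = self.gru1(bert_out)
        x, _ = self.gru2(x)
        x = torch.relu(self.dense1(x))
        return self.dense2(x)                 # per-token label logits

# Dummy embeddings stand in for real mBERT output:
logits = AuthorTagger()(torch.randn(2, 30, 768))
print(tuple(logits.shape))  # (2, 30, 3)
```

In this reading, the dense layers map each token's contextual representation to per-token label logits, so author names are recovered as tagged token spans.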
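The adaptation of the GROBID gold data might look something like the following sketch. The marker list and the embedding template are invented for illustration; the actual augmentation logic lives in the project's own scripts.

```python
# Hypothetical illustration of the first-stage data adaptation: prepend a
# typical humanities citation marker such as "vgl." or "siehe dazu", or
# embed the reference in continuous text. Markers and the surrounding
# sentence template are assumptions, not taken from the project.
import random

MARKERS = ["vgl.", "siehe dazu", "s. dazu auch"]  # assumed marker list

def augment(reference, rng):
    if rng.random() < 0.5:
        # Variant 1: insert a citation marker before the reference
        return f"{rng.choice(MARKERS)} {reference}"
    # Variant 2: embed the reference in running text (template assumed)
    return f"Wie bereits gezeigt wurde ({reference}), gilt dies auch hier."

print(augment("Smith, J.: A Title. 1999.", random.Random(0)))
```

Either variant keeps the token-level labels of the original reference intact, so the augmented string can be segmented and used for training directly.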