Skip to content

AMR Corpus of scientific news texts in Brazilian Portuguese

License

Notifications You must be signed in to change notification settings

LALIC-UFSCar/AMRScien-Br-Corpus

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 

Repository files navigation

AMRScien-Br Corpus

This repository contains AMRScien-Br, a corpus of scientific news texts in Brazilian Portuguese annotated with Abstract Meaning Representation - AMR. The corpus is composed by two hundred sentences which come from the FAPESP corpus (Aziz and Specia, 2011), a Portuguese-English paralell corpus (pt-en) of scientific news texts belonging to the Brazilian magazine Pesquisa FAPESP Online (https://revistapesquisa.fapesp.br).

For more information about what is AMR and its specific notations, we indicate the AMR guidelines (https://github.com/amrisi/amr-guidelines).

This corpus is distributed under the CC-BY-NC-SA license.

Corpus Notation

AMRScien-Br follows a standard notation to ease the reading of files. The corpus file contains multiple sentences, each with some metainformation, which starts with a hashtag followed by double colons (# ::) and a keyword (id, snt). Then, the AMR graph representation in the PENMAN notation is written. An example is shown below:

 
  # ::id 16
  # ::snt Os otimistas entendem que a crise é passageira. 
  (u1 / entender-01
        :ARG0 (o0 / otimista)
        :ARG1 (t3 / passageiro
              :domain (c2 / crise))) 

A blank line separates each sentence.

Annotation Guidelines

For more details about the annotation guidelines, please see here.

About

AMR Corpus of scientific news texts in Brazilian Portuguese

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published