S3 Dataset

S3 Dataset with Generation Code and Resources

Datasets
MedlinePlus
1. MedlinePlus-SemRep.xls
   - A two-column table (in .xls format) that pairs the annotated version of each sentence in the MedlinePlus dataset with its proposed simplification.
2. MedlinePlus-SemRep.xml
   - For each sentence in the MedlinePlus dataset, we provide:
     1. Its annotated version
     2. Its simplification
     3. The triples that have been identified in the original sentence

WikiAstronauts
1. WikiAstronauts-DBpedia.xls
   - A two-column table (in .xls format) that pairs the annotated version of each sentence in the WikiAstronauts dataset with its proposed simplification.
2. WikiAstronauts-DBpedia.xml
   - For each sentence in the WikiAstronauts dataset, we provide:
     1. Its annotated version
     2. Its simplification
     3. The triples that have been identified in the original sentence
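
As a rough illustration of how one entry of the .xml files could be consumed, the sketch below loads the three pieces of information listed above (annotated version, simplification, triples) into a small record. The element names (`entry`, `annotated`, `simplification`, `triple`), the `|` separator inside a triple, and the sample sentence are all assumptions made for this example; they are not taken from the actual files, so check the real layout before reusing it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical layout and placeholder content: the real files may use
# different element names and a different triple encoding.
SAMPLE_XML = """
<dataset>
  <entry>
    <annotated>[person] was born in [birthPlace] .</annotated>
    <simplification>[person] was born in [birthPlace] .</simplification>
    <triples>
      <triple>Placeholder Person | birthPlace | Placeholder City</triple>
    </triples>
  </entry>
</dataset>
"""


@dataclass
class Entry:
    annotated: str
    simplification: str
    triples: list = field(default_factory=list)


def load_entries(xml_text):
    """Parse every <entry> element into an Entry record."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for node in root.iter("entry"):
        # Each triple is assumed to be "subject | predicate | object".
        triples = [
            tuple(part.strip() for part in (t.text or "").split("|"))
            for t in node.iter("triple")
        ]
        entries.append(
            Entry(
                annotated=node.findtext("annotated", default="").strip(),
                simplification=node.findtext("simplification", default="").strip(),
                triples=triples,
            )
        )
    return entries


if __name__ == "__main__":
    for entry in load_entries(SAMPLE_XML):
        print(entry.annotated, "->", entry.simplification, entry.triples)
```

To read one of the real files, the same function could be given the file contents once the element names have been adjusted to match.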

Evaluation
* MedlinePlus-SemRep.xls
* WikiAstronauts-DBpedia.xls

The tables (in .xls format) that were used to evaluate the methods by which a simplification is selected.

src
CrowdFlower
1. MedlinePlus
   1. experiment.csv
      - The table (in .csv format) that was used on the CrowdFlower platform.
      - It contains the annotated version of the sentences from the MedlinePlus dataset.
   2. f902529.csv
      - The table (in .csv format) that contains all the simplifications that have been submitted by the contributors.
2. WikiAstronauts
   1. experiment.csv
      - The table (in .csv format) that was used on the CrowdFlower platform.
      - It contains the annotated version of the sentences from the WikiAstronauts dataset.
   2. f900315.csv
      - The table (in .csv format) that contains all the simplifications that have been submitted by the contributors.

Data
1. MedlinePlus
2. WikiAstronauts
- Folders that contain (in .csv and .xml format) the original sentences from the two datasets along with their respective annotated versions and the triples.

Dataset-MedlinePlus.py
* Python script that parses the original sentences found at ./Data/MedlinePlus/XML.
* Provides various statistics regarding the dataset.
* Stores the CrowdFlower table (in .csv format) at ./CrowdFlower/MedlinePlus/experiment.csv.

Dataset-WikiAstronauts.py
* Python script that parses the original sentences found at ./Data/WikiAstronauts/XML.
* Provides various statistics regarding the dataset.
* Stores the CrowdFlower table (in .csv format) at ./CrowdFlower/WikiAstronauts/experiment.csv.

Output-MedlinePlus.py
* Python script that processes the simplifications proposed by the contributors, located at ./CrowdFlower/MedlinePlus/f902529.csv.
* Implements a variety of metrics in order to choose the most appropriate simplification.
* When run as python Output-MedlinePlus.py --evaluation, it selects 30 random sentences for evaluation and stores them by default at ../Evaluation/MedlinePlus-SemRep.xls.

Output-WikiAstronauts.py
* Python script that processes the simplifications proposed by the contributors, located at ./CrowdFlower/WikiAstronauts/f900315.csv.
* Implements a variety of metrics in order to choose the most appropriate simplification.
* When run as python Output-WikiAstronauts.py --evaluation, it selects 30 random sentences for evaluation and stores them by default at ../Evaluation/WikiAstronauts-DBpedia.xls.
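
The general flow of the Output scripts (group the contributed simplifications per sentence, pick one with a metric, optionally sample 30 sentences for evaluation) might look roughly like the sketch below. It is not the code of the scripts themselves: the column names `annotated` and `simplification` are hypothetical, and the selection metric (highest mean Jaccard token overlap with the other candidates) is a stand-in chosen for illustration, since the metrics the scripts actually implement are not described here. Writing the .xls output is left out.

```python
import argparse
import random
from collections import defaultdict


def group_by_sentence(rows, key="annotated", value="simplification"):
    """Collect all contributed simplifications per annotated sentence.

    The column names are assumptions, not the real CrowdFlower headers.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return dict(grouped)


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def most_central(candidates):
    """Stand-in metric: the candidate most similar, on average, to the rest."""
    if len(candidates) == 1:
        return candidates[0]

    def score(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(jaccard(candidates[i], c) for c in others) / len(others)

    return candidates[max(range(len(candidates)), key=score)]


def sample_for_evaluation(selected, k=30, seed=None):
    """Randomly keep at most k sentences from the selected simplifications."""
    rng = random.Random(seed)
    keys = sorted(selected)
    chosen = rng.sample(keys, min(k, len(keys)))
    return {key: selected[key] for key in chosen}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Select simplifications (sketch).")
    parser.add_argument("--evaluation", action="store_true",
                        help="sample 30 random sentences for evaluation")
    return parser.parse_args(argv)


def run(rows, evaluation=False, seed=None):
    """Choose one simplification per sentence, optionally sampling 30 of them."""
    selected = {s: most_central(c) for s, c in group_by_sentence(rows).items()}
    return sample_for_evaluation(selected, seed=seed) if evaluation else selected
```

With the contributor table loaded through `csv.DictReader`, `run(rows, evaluation=parse_args().evaluation)` would return one simplification per sentence, or a random subset of at most 30 when the flag is set.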

License

This project is licensed under the terms of the Apache 2.0 License.

pvougiou/KB-Text-Alignment
