recreated CLDF data
xrotwang committed Apr 22, 2024
1 parent a1c4dcf commit dace5f6
Showing 28 changed files with 56,596 additions and 7,832 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/cldf-validation.yml
@@ -0,0 +1,29 @@
name: CLDF-validation

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest-cldf
      - name: Test with pytest
        run: |
          pytest --cldf-metadata=cldf/StructureDataset-metadata.json test.py
38 changes: 38 additions & 0 deletions .zenodo.json
@@ -0,0 +1,38 @@
{
"creators": [
{
"name": "Louise Baird"
},
{
"name": "Nicholas Evans"
},
{
"name": "Simon J. Greenhill"
}
],
"contributors": [
{
"name": "Tiago Tresoldi",
"type": "Other"
},
{
"name": "Johann-Mattis List",
"type": "Other"
},
{
"name": "Robert Forkel",
"type": "Other"
}
],
"title": "CLDF dataset with phoneme inventories from the \"Journal of the IPA\", aggregated by Baird et al. (2021)",
"access_right": "open",
"keywords": [
"cldf:StructureDataset",
"linguistics"
],
"upload_type": "dataset",
"description": "<p>Cite the source of the dataset as:</p>\n\n<blockquote>\n<p>Baird, L., Evans, N., &amp; Greenhill, S. J. (2021). Blowing in the wind: Using &#x27;North Wind and the Sun&#x27; texts to sample phoneme inventories. Journal of the International Phonetic Association, 1\u201342. doi:10.1017/s002510032000033x</p>\n</blockquote>",
"license": {
"id": "CC0-1.0"
}
}
10 changes: 10 additions & 0 deletions CONTRIBUTORS.md
@@ -0,0 +1,10 @@
# Contributors

| Name | Role |
|:-------------------|:-------|
| Louise Baird | Author |
| Nicholas Evans | Author |
| Simon J. Greenhill | Author |
| Tiago Tresoldi | other |
| Johann-Mattis List | other |
| Robert Forkel | other |
30 changes: 27 additions & 3 deletions README.md
@@ -1,5 +1,29 @@
# JIPA
# CLDF dataset with phoneme inventories from the "Journal of the IPA", aggregated by Baird et al. (2021)

CLDF dataset with phoneme inventories from the *Journal of the International Phonetic Association*. Aggregated by Baird et al. 2021.
[![CLDF validation](https://github.com/cldf-datasets/jipa/workflows/CLDF-validation/badge.svg)](https://github.com/cldf-datasets/jipa/actions?query=workflow%3ACLDF-validation)

* Baird L, Evans N, & Greenhill SJ. 2021. Blowing in the wind: Using 'North Wind and the Sun' texts to sample phoneme inventories. *Journal of the International Phonetic Association*, 1–42. [doi:10.1017/s002510032000033x](https://doi.org/10.1017/s002510032000033x)
## How to cite

If you use these data please cite
- the original source
> Baird, L., Evans, N., & Greenhill, S. J. (2021). Blowing in the wind: Using 'North Wind and the Sun' texts to sample phoneme inventories. Journal of the International Phonetic Association, 1–42. doi:10.1017/s002510032000033x
- the derived dataset using the DOI of the [particular released version](../../releases/) you were using

## Description


This dataset is licensed under a CC0-1.0 license.

Available online at https://doi.org/10.1017/S002510032000033x



Languages represented in the dataset, color-coded by language family.

![](map.svg)

## CLDF Datasets

The following CLDF datasets are available in [cldf](cldf):

- CLDF [StructureDataset](https://github.com/cldf/cldf/tree/master/modules/StructureDataset) at [cldf/StructureDataset-metadata.json](cldf/StructureDataset-metadata.json)
30 changes: 30 additions & 0 deletions RELEASING.md
@@ -0,0 +1,30 @@
# Releasing the JIPA CLDF dataset

- Install requirements:
```shell
pip install cldfviz[cartopy]
```
- Re-create the CLDF dataset by running
```shell
cldfbench makecldf cldfbench_jipa.py --glottolog-version v5.0 --with-cldfreadme --with-zenodo
cldfbench readme cldfbench_jipa.py
```
- Make sure the data is valid by running
```shell
pytest
```
- Make sure data can be loaded into SQLite
```shell
rm -f jipa.sqlite
cldf createdb cldf/StructureDataset-metadata.json jipa.sqlite
```
- Recreate the coverage map
```shell
cldfbench cldfviz.map cldf --format svg --width 20 --output map.svg --with-ocean --language-properties Family --no-legend --pacific-centered
```
- Recreate the ER diagram
```shell
cldferd --format compact.svg cldf > erd.svg
```
- Commit all changes, tag the release, push code and tags.
- Create a release on GitHub and make sure it is picked up by Zenodo.
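
After the SQLite step above, a quick sanity check is to list the tables in the resulting file together with their row counts. The sketch below uses only the standard-library `sqlite3` module and makes no assumption about the table names that `cldf createdb` produces; the helper name `table_row_counts` is hypothetical.

```python
import sqlite3


def table_row_counts(db_path):
    """Return a dict mapping each table name in the database to its row count."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT count(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `table_row_counts("jipa.sqlite")` should give counts that can be compared with the `dc:extent` values listed in `cldf/README.md`. Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty result may simply mean the path is wrong.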
109 changes: 109 additions & 0 deletions cldf/README.md
@@ -0,0 +1,109 @@
<a name="ds-structuredatasetmetadatajson"> </a>

# StructureDataset CLDF dataset with phoneme inventories from the "Journal of the IPA", aggregated by Baird et al. (2021)

**CLDF Metadata**: [StructureDataset-metadata.json](./StructureDataset-metadata.json)

**Sources**: [sources.bib](./sources.bib)

property | value
--- | ---
[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Baird, L., Evans, N., & Greenhill, S. J. (2021). Blowing in the wind: Using 'North Wind and the Sun' texts to sample phoneme inventories. Journal of the International Phonetic Association, 1–42. doi:10.1017/s002510032000033x
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF StructureDataset](http://cldf.clld.org/v1.0/terms.rdf#StructureDataset)
[dc:identifier](http://purl.org/dc/terms/identifier) | https://doi.org/10.1017/S002510032000033x
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/publicdomain/zero/1.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/cldf-datasets/jipa
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-clts/clts/tree/v2.3.0">Catalog v2.3.0</a></li><li><a href="https://github.com/cldf-datasets/jipa/tree/a1c4dcf">cldf-datasets/jipa a1c4dcf</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v5.0">Glottolog v5.0</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.10.12</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | jipa
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution


## <a name="table-valuescsv"></a>Table [values.csv](./values.csv)

property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ValueTable](http://cldf.clld.org/v1.0/terms.rdf#ValueTable)
[dc:extent](http://purl.org/dc/terms/extent) | 6660


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [features.csv::ID](#table-featurescsv)
[Value](http://cldf.clld.org/v1.0/terms.rdf#value) | `string` |
[Code_ID](http://cldf.clld.org/v1.0/terms.rdf#codeReference) | `string` |
[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
[Source](http://cldf.clld.org/v1.0/terms.rdf#source) | list of `string` (separated by `;`) | References [sources.bib::BibTeX-key](./sources.bib)
[Contribution_ID](http://cldf.clld.org/v1.0/terms.rdf#contributionReference) | `string` | References [contributions.csv::ID](#table-contributionscsv)
`Marginal` | `boolean` |
`Allophones` | list of `string` (separated by ` `) |
`InventorySize` | `integer` |
`Value_in_Source` | `string` |
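
As an illustration of the column types above, the following sketch parses rows in the layout of `values.csv` with the standard-library `csv` module. The sample rows are hypothetical, and the sketch assumes that booleans are serialized as `true`/`false`; for real work a CLDF-aware reader is likely the better choice.

```python
import csv
import io

# Hypothetical rows in the layout of values.csv; real IDs and values will differ.
SAMPLE = """ID,Language_ID,Parameter_ID,Value,Code_ID,Comment,Source,Contribution_ID,Marginal,Allophones,InventorySize,Value_in_Source
1,lang1,p,p,,,src1;src2,lang1,false,p b,2,p
2,lang1,a,a,,,src1,lang1,true,,2,a
"""


def parse_row(row):
    """Convert the string cells of one row into typed Python values."""
    return {
        "id": row["ID"],
        "language": row["Language_ID"],
        "segment": row["Value"],
        # Source is a list separated by ";"
        "sources": [s for s in row["Source"].split(";") if s],
        # Assumed serialization: "true"/"false"; an empty cell becomes None
        "marginal": {"true": True, "false": False}.get(row["Marginal"]),
        # Allophones is a list separated by spaces
        "allophones": row["Allophones"].split(),
        "inventory_size": int(row["InventorySize"]) if row["InventorySize"] else None,
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

To read the actual file, replace `io.StringIO(SAMPLE)` with a file object opened with `encoding="utf-8"` and `newline=""`.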

## <a name="table-featurescsv"></a>Table [features.csv](./features.csv)

property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ParameterTable](http://cldf.clld.org/v1.0/terms.rdf#ParameterTable)
[dc:extent](http://purl.org/dc/terms/extent) | 956


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
`CLTS_BIPA` | `string` |
`CLTS_Name` | `string` |

## <a name="table-languagescsv"></a>Table [languages.csv](./languages.csv)

property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF LanguageTable](http://cldf.clld.org/v1.0/terms.rdf#LanguageTable)
[dc:extent](http://purl.org/dc/terms/extent) | 159


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Macroarea](http://cldf.clld.org/v1.0/terms.rdf#macroarea) | `string` |
[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal`<br>&ge; -90<br>&le; 90 |
[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal`<br>&ge; -180<br>&le; 180 |
[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string`<br>Regex: `[a-z0-9]{4}[1-9][0-9]{3}` |
[ISO639P3code](http://cldf.clld.org/v1.0/terms.rdf#iso639P3code) | `string`<br>Regex: `[a-z]{3}` |
`Family` | `string` |
`Glottolog_Name` | `string` |
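
The constraints in the table above (two regular expressions and two numeric ranges) can be checked row by row. The sketch below is one way to do so, assuming the regular expressions must match the whole cell and that empty cells are allowed; the function name `check_language` and the example rows in the usage note are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation

GLOTTOCODE = re.compile(r"[a-z0-9]{4}[1-9][0-9]{3}")
ISO639_3 = re.compile(r"[a-z]{3}")
BOUNDS = {"Latitude": Decimal(90), "Longitude": Decimal(180)}


def check_language(row):
    """Return the names of the columns in one languages.csv row that violate a constraint."""
    problems = []
    for column, pattern in (("Glottocode", GLOTTOCODE), ("ISO639P3code", ISO639_3)):
        value = row.get(column, "")
        # Empty cells are treated as "no value" and skipped.
        if value and not pattern.fullmatch(value):
            problems.append(column)
    for column, bound in BOUNDS.items():
        value = row.get(column, "")
        if not value:
            continue
        try:
            number = Decimal(value)
        except InvalidOperation:
            problems.append(column)
            continue
        # Reject NaN/Infinity as well as values outside the allowed range.
        if not number.is_finite() or not -bound <= number <= bound:
            problems.append(column)
    return problems
```

For example, a row with `Glottocode` set to `abcd0234` and `Latitude` set to `91` would be reported as `["Glottocode", "Latitude"]`, because the fifth character of a Glottocode must be a digit from 1 to 9 and latitudes are capped at 90.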

## <a name="table-contributionscsv"></a>Table [contributions.csv](./contributions.csv)

property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ContributionTable](http://cldf.clld.org/v1.0/terms.rdf#ContributionTable)
[dc:extent](http://purl.org/dc/terms/extent) | 159


### Columns

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
[Contributor](http://cldf.clld.org/v1.0/terms.rdf#contributor) | `string` |
[Citation](http://cldf.clld.org/v1.0/terms.rdf#citation) | `string` |
`URL` | `string` |
[Source](http://cldf.clld.org/v1.0/terms.rdf#source) | list of `string` (separated by `;`) | References [sources.bib::BibTeX-key](./sources.bib)
[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
`Metadata` | `json` |
`Minimal_Pairs` | `json` |
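
The `Metadata` and `Minimal_Pairs` columns have datatype `json`, so their cells need a decoding step after the CSV is read. A minimal sketch follows, assuming each non-empty cell holds one JSON document; the structure inside those documents is not described here, so the sketch does not interpret it. The helper name `decode_json_cells` is hypothetical.

```python
import json


def decode_json_cells(row, columns=("Metadata", "Minimal_Pairs")):
    """Return a copy of the row with the given JSON columns decoded.

    Empty or missing cells are mapped to None.
    """
    decoded = dict(row)
    for column in columns:
        cell = row.get(column, "")
        decoded[column] = json.loads(cell) if cell else None
    return decoded
```

This can be applied to each dict produced by `csv.DictReader` over `contributions.csv`.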
