Merge pull request #306 from korpling/fix/exmaralda-tlis-without-times
Fix/exmaralda tlis without times
MartinKl authored Aug 20, 2024
2 parents 2bbc831 + a41c6e3 commit ff957db
Showing 9 changed files with 498 additions and 24 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- `exmaralda` import now gives the order of tlis priority over sorting by time value (more compatible with modern EXMARaLDA files)

### Fixed

- `exmaralda` import keeps events with missing time values

## [0.15.0] - 2024-08-14

2 changes: 1 addition & 1 deletion docs/README.md
@@ -1,5 +1,5 @@
| Type | Modules |
|------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Import formats | [conllu](importers/conllu.md), [exmaralda](importers/exmaralda.md), [graphml](importers/graphml.md), [meta](importers/meta.md), [none](importers/none.md), [opus](importers/opus.md), [path](importers/path.md), [ptb](importers/ptb.md), [relannis](importers/relannis.md), [saltxml](importers/saltxml.md), [textgrid](importers/textgrid.md), [toolbox](importers/toolbox.md), [treetagger](importers/treetagger.md), [xlsx](importers/xlsx.md), [xml](importers/xml.md) |
| Export formats | [graphml](exporters/graphml.md), [exmaralda](exporters/exmaralda.md), [sequence](exporters/sequence.md), [table](exporters/table.md), [textgrid](exporters/textgrid.md), [xlsx](exporters/xlsx.md) |
| Export formats | [conllu](exporters/conllu.md), [graphml](exporters/graphml.md), [exmaralda](exporters/exmaralda.md), [sequence](exporters/sequence.md), [table](exporters/table.md), [textgrid](exporters/textgrid.md), [xlsx](exporters/xlsx.md) |
| Graph operations | [check](graph_ops/check.md), [collapse](graph_ops/collapse.md), [filter](graph_ops/filter.md), [visualize](graph_ops/visualize.md), [enumerate](graph_ops/enumerate.md), [link](graph_ops/link.md), [map](graph_ops/map.md), [revise](graph_ops/revise.md), [chunk](graph_ops/chunk.md), [split](graph_ops/split.md), [none](graph_ops/none.md) |
151 changes: 151 additions & 0 deletions docs/exporters/conllu.md
@@ -0,0 +1,151 @@
# conllu (exporter)

This module exports a graph in CoNLL-U format.

## Configuration

### doc

This key is used to determine the nodes whose part-of subgraph constitutes a document, i.e. the entire input for a file.
Default is `annis::doc`, or `{ ns = "annis", name = "doc" }`.

Example:
```toml
[export.config]
doc = "annis::doc"
```

### groupby

This optional annotation key is used to identify annotation spans that constitute a sentence. Default is no export of sentence blocks.

Example:
```toml
[export.config]
groupby = "norm::sentence"
```

### ordering

The nodes connected by this annotation component are used as nodes defining a line in a CoNLL-U file. Usually you want to use an ordering.
Default is `{ ctype = "Ordering", layer = "annis", name = "" }`.

Example:
```toml
[export.config]
ordering = { ctype = "Ordering", layer = "annis", name = "norm" }
```
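Conceptually, the ordering component forms a chain of nodes, and each node in that chain becomes one line of the output. A minimal sketch of walking such a chain, assuming the component is given as a plain map from each node name to its successor; `walk_ordering` and this map representation are hypothetical stand-ins for annatto's actual graph storage:

```rust
use std::collections::{HashMap, HashSet};

/// Returns the node names of an ordering chain from first to last.
/// `next` maps a node name to its successor (hypothetical representation).
/// Returns `None` if there is not exactly one start node or if the chain loops.
fn walk_ordering(next: &HashMap<String, String>) -> Option<Vec<String>> {
    // The start node has an outgoing edge but no incoming edge.
    let targets: HashSet<&String> = next.values().collect();
    let mut starts = next.keys().filter(|n| !targets.contains(n));
    let start = starts.next()?;
    if starts.next().is_some() {
        return None; // more than one chain
    }
    let mut seen: HashSet<&String> = HashSet::new();
    seen.insert(start);
    let mut chain = vec![start.clone()];
    let mut current = start;
    // Follow successor edges until the last node is reached.
    while let Some(succ) = next.get(current) {
        if !seen.insert(succ) {
            return None; // cycle detected
        }
        chain.push(succ.clone());
        current = succ;
    }
    Some(chain)
}
```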

### form

This annotation key is used to write the form column.
Default is `{ ns = "annis", name = "tok" }`.

Example:
```toml
[export.config]
form = { ns = "norm", name = "norm" }
```

### lemma

This annotation key is used to write the lemma column.
Default is `{ ns = "", name = "tok" }`.

Example:
```toml
[export.config]
lemma = { ns = "norm", name = "lemma" }
```

### upos

This annotation key is used to write the upos column.
Default is `{ ns = "", name = "upos" }`.

Example:
```toml
[export.config]
upos = { ns = "norm", name = "pos" }
```

### xpos

This annotation key is used to write the xpos column.
Default is `{ ns = "", name = "xpos" }`.

Example:
```toml
[export.config]
xpos = { ns = "norm", name = "pos_spec" }
```

### features

This list of annotation keys will be represented in the feature column.
Default is the empty list.

Example:
```toml
[export.config]
features = ["Animacy", "Tense", "VerbClass"]
```
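In CoNLL-U, the FEATS column holds `Name=Value` pairs separated by `|`, sorted alphabetically by feature name without regard to case, and `_` when a token has no features. A minimal sketch of that formatting, assuming the feature values of one token have already been collected as `(name, value)` pairs; `format_feats` is a hypothetical name, and whether annatto sorts in exactly this way is an assumption:

```rust
/// Formats the FEATS column of one token.
/// `feats` holds (feature name, value) pairs; empty input yields "_".
fn format_feats(feats: &[(&str, &str)]) -> String {
    if feats.is_empty() {
        return "_".to_string();
    }
    let mut sorted = feats.to_vec();
    // CoNLL-U orders features by name, ignoring case.
    sorted.sort_by_key(|(name, _)| name.to_lowercase());
    sorted
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect::<Vec<_>>()
        .join("|")
}
```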

### dependency_component

The nodes connected by this annotation component are used to export dependencies.
Default is none, so nothing will be exported.

Example:
```toml
[export.config]
dependency_component = { ctype = "Pointing", layer = "", name = "dependencies" }
```

### dependency_anno

This annotation key is used to write the dependency relation; its value is read from the edges of the dependency component.
Default is none, so nothing will be exported.

Example:
```toml
[export.config]
dependency_anno = { ns = "", name = "deprel" }
```
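Together, `dependency_component` and `dependency_anno` supply the HEAD and DEPREL columns. In CoNLL-U, HEAD is the 1-based ID of the governing token within the sentence (`0` for the root) and DEPREL is the relation label. A minimal sketch, assuming the dependency edges of one sentence have been collected as hypothetical `(head, dependent, label)` triples of 0-based token positions, and assuming a token without an incoming edge is the root:

```rust
/// Computes (HEAD, DEPREL) for each of the `n` tokens of a sentence.
/// `edges` holds (head position, dependent position, label), 0-based.
fn head_and_deprel(n: usize, edges: &[(usize, usize, &str)]) -> Vec<(String, String)> {
    // Assumption of this sketch: no incoming edge means the token is the root.
    let mut columns = vec![("0".to_string(), "root".to_string()); n];
    for &(head, dep, label) in edges {
        if head < n && dep < n {
            // CoNLL-U IDs are 1-based, so position p is written as p + 1.
            columns[dep] = ((head + 1).to_string(), label.to_string());
        }
    }
    columns
}
```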

### enhanced_components

The listed components will be used to export enhanced dependencies. More than
one component can be listed.
Default is the empty list, so nothing will be exported.

Example:
```toml
[export.config]
enhanced_components = [{ ctype = "Pointing", layer = "", name = "dependencies" }]
```

### enhanced_annos

This list of annotation keys names the edge label annotation for each component
listed in `enhanced_components`: the i-th key in `enhanced_annos` belongs to the
i-th component in `enhanced_components`. Default is the empty list.

Example:
```toml
[export.config]
enhanced_annos = ["func"]
```
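Enhanced dependencies end up in the DEPS column, which CoNLL-U writes as `head:deprel` pairs joined by `|` and ordered by head ID, or `_` if there are none. The sketch below shows how the i-th entry of `enhanced_annos` pairs with the i-th entry of `enhanced_components` for one token. The `Edge` type and `format_deps` are hypothetical, and it is assumed that each component's incoming edges for the token are already available:

```rust
/// One incoming edge of a token: the 1-based head ID (0 = root) and
/// the edge's annotations as (key, value) pairs. Hypothetical type.
struct Edge<'a> {
    head: usize,
    annos: Vec<(&'a str, &'a str)>,
}

/// Builds the DEPS column of one token. `per_component[i]` holds the
/// token's incoming edges in the i-th component, and `label_keys[i]`
/// names the edge annotation used as the label for that component.
fn format_deps(per_component: &[Vec<Edge<'_>>], label_keys: &[&str]) -> String {
    let mut deps: Vec<(usize, String)> = Vec::new();
    // The i-th component is paired with the i-th label key.
    for (edges, key) in per_component.iter().zip(label_keys) {
        for edge in edges {
            if let Some((_, label)) = edge.annos.iter().find(|(k, _)| k == key) {
                deps.push((edge.head, label.to_string()));
            }
        }
    }
    if deps.is_empty() {
        return "_".to_string();
    }
    // Order by head ID (ties broken by label).
    deps.sort();
    deps.iter()
        .map(|(head, label)| format!("{head}:{label}"))
        .collect::<Vec<_>>()
        .join("|")
}
```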

### misc

This list of annotation keys will be represented in the misc column.
Default is the empty list.

Example:
```toml
[export.config]
misc = ["NoSpaceAfter", "Referent"]
```
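Taken together, these keys fill the ten tab-separated columns of a CoNLL-U token line (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `_` standing in for any empty value. A minimal sketch of assembling one line, assuming every column value has already been resolved to a string; the `TokenRow` type is hypothetical and not part of annatto:

```rust
/// Column values of one CoNLL-U token line (hypothetical type).
/// `None` marks a column without a value.
struct TokenRow<'a> {
    id: usize,
    form: Option<&'a str>,
    lemma: Option<&'a str>,
    upos: Option<&'a str>,
    xpos: Option<&'a str>,
    feats: Option<&'a str>,
    head: Option<&'a str>,
    deprel: Option<&'a str>,
    deps: Option<&'a str>,
    misc: Option<&'a str>,
}

impl TokenRow<'_> {
    /// Renders the row as ten tab-separated columns, using `_` for gaps.
    fn to_line(&self) -> String {
        let id = self.id.to_string();
        let cols = [
            Some(id.as_str()),
            self.form, self.lemma, self.upos, self.xpos, self.feats,
            self.head, self.deprel, self.deps, self.misc,
        ];
        cols.iter()
            .map(|c| c.unwrap_or("_"))
            .collect::<Vec<_>>()
            .join("\t")
    }
}
```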

10 changes: 10 additions & 0 deletions docs/exporters/table.md
@@ -39,6 +39,16 @@ Example:
quote_char = "\""
```

### no_value

Sets the string that is written for missing values (n/a). Default is the empty string.

Example:
```toml
[export.config]
no_value = "n/a"
```
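The effect of `no_value` can be pictured as a fallback applied to every cell whose annotation is missing. A minimal sketch, assuming a row is a list of optional cell values; `render_row` and its delimiter parameter are illustrative only and not the exporter's actual code:

```rust
/// Joins the cells of one table row, writing `no_value` for missing cells.
fn render_row(cells: &[Option<&str>], delimiter: &str, no_value: &str) -> String {
    cells
        .iter()
        .map(|cell| cell.unwrap_or(no_value))
        .collect::<Vec<_>>()
        .join(delimiter)
}
```

With the example configuration above, a row with a missing middle cell would render as `a,n/a,c` under a `,` delimiter.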

### ingoing

By listing annotation components, the ingoing edges of that component and their annotations
26 changes: 6 additions & 20 deletions src/importer/exmaralda/mod.rs
@@ -93,8 +93,7 @@ impl ImportEXMARaLDA {
let mut speaker_map = BTreeMap::new();
let mut parent_map: BTreeMap<String, BTreeMap<String, String>> = BTreeMap::new();
let mut already_defined: BTreeSet<String> = BTreeSet::new();
let mut named_orderings: BTreeMap<String, Vec<(OrderedFloat<f64>, String)>> =
BTreeMap::new();
let mut named_orderings: BTreeMap<String, Vec<(usize, String)>> = BTreeMap::new();
let mut tlis = Vec::new();
// reader
let f = File::open(document_path)?;
@@ -441,18 +440,6 @@ impl ImportEXMARaLDA {
"{}#{}_{}_{}-{}",
doc_node_name, tier_type, speaker_id, start_id, end_id
); // this is not a unique id as not intended to be
let start_time = if let Some((Some(t), _)) = timeline.get(key) {
t
} else {
if let Some(sender) = tx {
let msg = format!(
"Could not determine start time of event {}::{}:{}-{}. Event will be skipped.",
&speaker_id, &anno_name, &start_id, &end_id
);
sender.send(StatusMessage::Warning(msg))?;
}
continue;
};
if !already_defined.contains(&node_name) {
update.add_event(UpdateEvent::AddNode {
node_name: node_name.to_string(),
@@ -492,7 +479,9 @@ impl ImportEXMARaLDA {
}
continue;
};
if let Some((Some(end_time), _)) = node_tpl {
if let (Some((Some(start_time), _)), Some((Some(end_time), _))) =
(timeline.get(key), node_tpl)
{
update.add_event(UpdateEvent::AddNodeLabel {
node_name: node_name.to_string(),
anno_ns: ANNIS_NS.to_string(),
@@ -517,7 +506,7 @@ impl ImportEXMARaLDA {
anno_value: text.to_string(),
})?;
// order nodes
let order_tpl = (*start_time, node_name.to_string());
let order_tpl = (start_i, node_name.to_string());
match named_orderings.entry(anno_name.to_string()) {
std::collections::btree_map::Entry::Vacant(e) => {
e.insert(vec![order_tpl]);
@@ -574,10 +563,7 @@ impl ImportEXMARaLDA {
// build order relations
for (name, node_name_vec) in named_orderings {
let mut prev = None;
for (_, node_name) in node_name_vec
.into_iter()
.sorted_by(|a, b| a.0.total_cmp(&b.0))
{
for (_, node_name) in node_name_vec.into_iter().sorted_by(|a, b| a.0.cmp(&b.0)) {
if let Some(source) = prev {
update.add_event(UpdateEvent::AddEdge {
source_node: source,
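The importer change replaces the `OrderedFloat<f64>` time value in `named_orderings` with a `usize` (`start_i`, which appears to be the position of the event's start tli in the timeline), so events are ordered by timeline position and no longer need a time value to be kept. A simplified sketch of that idea using only the standard library; `Event`, `order_events`, and the input shapes are hypothetical stand-ins, not annatto's actual code:

```rust
use std::collections::HashMap;

/// An event on a tier, referring to its start tli by ID (hypothetical type).
struct Event {
    name: String,
    start_tli: String,
}

/// Orders events by the position of their start tli in `timeline`, where
/// `timeline` lists tli IDs in document order together with optional times.
/// Events whose start tli is not in the timeline are dropped.
fn order_events(timeline: &[(String, Option<f64>)], events: Vec<Event>) -> Vec<String> {
    // Map each tli ID to its position; the time value is never consulted,
    // so tlis without a time still take part in the ordering.
    let position: HashMap<&str, usize> = timeline
        .iter()
        .enumerate()
        .map(|(i, (id, _time))| (id.as_str(), i))
        .collect();
    let mut keyed: Vec<(usize, String)> = events
        .into_iter()
        .filter_map(|e| position.get(e.start_tli.as_str()).map(|&i| (i, e.name)))
        .collect();
    // Sort on the integer position, as the new code does with `a.0.cmp(&b.0)`.
    keyed.sort_by(|a, b| a.0.cmp(&b.0));
    keyed.into_iter().map(|(_, name)| name).collect()
}
```

Sorting on an integer index also removes the need for `total_cmp` on floats, since `usize` already has a total order.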