Thoughts on 'third generation' community taxonomy editing system

Jonathan A Rees edited this page Oct 15, 2019 · 5 revisions

See also: Declarative 'patch' system

While the Open Tree reference taxonomy assembly tool now has two generations of taxonomy editing features, it has become increasingly clear that neither is adequate for imagined future applications, and that we can learn from and improve on both. Here are some thoughts on what a third generation system ought to be like.

Major goals for such a system:

  • It should in general be possible to write and test edit directives without the help or intervention of Open Tree personnel ("community editing").
  • Provenance should be captured for all directives (typically in the form of a URL) and all uses of taxon names.
  • Directives should be reusable. If a set of directives applies to one of Open Tree's input taxonomies such as IRMNG, it should be possible to apply them to IRMNG on its own, independent of the Open Tree reference taxonomy. That is, the syntax and semantics of the directives should not be tied to the Open Tree project.
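
To make the provenance and reusability goals concrete, here is a minimal sketch of a directive as a plain data record. The `Directive` class, its field names, the `has_parent` claim type, and the example URL are all hypothetical, chosen only for illustration; nothing here is an existing Open Tree format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Directive:
    """One edit directive, stated independently of any particular project."""
    kind: str        # hypothetical claim type, e.g. "has_parent"
    args: tuple      # taxon names the claim is about
    taxonomy: str    # taxonomy the names are drawn from, e.g. "IRMNG"
    source: str      # provenance for the claim, typically a URL

    def __post_init__(self):
        # Refuse directives that arrive without provenance.
        if not self.source:
            raise ValueError("directive has no provenance")


def to_json(directive):
    """Serialize to plain JSON so any tool can read the directive."""
    return json.dumps(asdict(directive), sort_keys=True)


def from_json(text):
    """Rebuild a Directive from its JSON form."""
    data = json.loads(text)
    data["args"] = tuple(data["args"])  # JSON has lists, not tuples
    return Directive(**data)
```

Because the record names the taxonomy it is addressed to and carries no Open Tree specific fields, the same serialized directive could in principle be applied to IRMNG alone or to a taxonomy assembled from it.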

As before, the directives should be representable as human-readable and human-writable snippets. Version 1 uses spreadsheets, which is inappropriate since different directives take different numbers and kinds of arguments. Version 2 uses Python syntax, which may or may not be the best choice. Other notational choices should be considered.

Features that I would like to see:

  • Declarative semantics. Each directive should be interpretable as a scientific claim. The current systems don't do this; directives are edits to the tree, and the outcome of executing directives is very sensitive to the order in which they are processed. Order dependence is not completely avoidable, but it should be minimized.
  • Idempotence. An edit that appears twice in an edit set should have the same effect as one that appears once. In other words, if a taxonomy already reflects a claim, then applying the claim to the taxonomy should have no effect. This is actually a consequence of declarative semantics but is important enough to be called out separately.
  • Completeness. Anything that can be done through taxonomy ingest should be doable (if not as efficiently) using the edit system. This includes adding synonyms and attaching "flags" to taxa. Similarly, operations performed by smasher's alignment procedure, such as asserting identity and non-identity of taxa in different taxonomies, should be expressible.
  • Graphical front end. Some users will find a graphical interface very attractive, and it may be possible to create one with a modest amount of effort. What I have in mind is that a part of a taxonomic tree would be displayed, and one would be allowed to do a few simple operations like adding a new node or changing the topology by moving a node from one location to another.
  • Multiple interpreters. There are things you might want to do with a set of directives other than apply them to a base taxonomy. Any parser should deliver a data structure that can be processed differently by different tools. E.g. it might be desirable to display directives to users in a form other than the 'native' directive language.
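
The declarative, idempotent, and multiple-interpreter features above can be sketched in a few lines. This is only an illustration under stated assumptions: a claim is a hypothetical `(kind, subject, object)` tuple, the taxonomy is a pair of dictionaries, and the taxon names are placeholders.

```python
def apply_claim(taxonomy, claim):
    """Make `taxonomy` reflect `claim`; return True only if something changed."""
    kind, subject, obj = claim
    if kind == "has_parent":
        table = taxonomy["parent"]
    elif kind == "synonym_of":
        table = taxonomy["synonym"]
    else:
        raise ValueError(f"unknown claim kind: {kind}")
    if table.get(subject) == obj:
        return False      # claim already holds, so applying it is a no-op
    table[subject] = obj
    return True


def describe(claim):
    """A second interpreter: render the same claim as an English sentence."""
    kind, subject, obj = claim
    templates = {
        "has_parent": "{0} is a child of {1}.",
        "synonym_of": "{0} is a synonym of {1}.",
    }
    return templates[kind].format(subject, obj)


# Placeholder data: applying the same claim twice equals applying it once.
taxonomy = {"parent": {"Aus bus": "Aus"}, "synonym": {}}
claim = ("has_parent", "Aus bus", "Cus")
apply_claim(taxonomy, claim)   # changes the parent of "Aus bus"
apply_claim(taxonomy, claim)   # already true, so nothing happens
```

Both functions consume the same parsed structure, which is the point of the multiple-interpreters item. Two claims that contradict each other would still be order dependent in this sketch; it illustrates idempotence only.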

I envision that the front end would not itself execute directives on the reference taxonomy directly, but that the output of the graphical front end would be a set of directives that can go through quality control and staging before being deployed in the reference taxonomy. (Cody and I talked about this kind of tool back in August 2013.)
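
As a rough sketch of that separation, the front end could keep a private preview of the tree and record each user operation as a directive, leaving the reference taxonomy untouched until a later review step. The `EditSession` class, the `quality_control` function, and the checks it performs are hypothetical, as is the child-to-parent dictionary used to stand in for the taxonomy.

```python
class EditSession:
    """Records front-end operations as directives instead of applying them."""

    def __init__(self, reference_parents):
        # Work on a copy so the reference taxonomy is never modified here.
        self.preview = dict(reference_parents)
        self.directives = []

    def add_node(self, name, parent, source):
        self._record(name, parent, source)

    def move_node(self, name, new_parent, source):
        if name not in self.preview:
            raise KeyError(f"cannot move unknown taxon: {name}")
        self._record(name, new_parent, source)

    def _record(self, name, parent, source):
        self.preview[name] = parent          # update only the displayed tree
        self.directives.append(("has_parent", name, parent, source))


def quality_control(directives, reference_parents):
    """Return a list of problems found before staging; empty means none found."""
    known = set(reference_parents) | set(reference_parents.values())
    problems = []
    for kind, name, parent, source in directives:
        if not source:
            problems.append(f"{name}: missing provenance")
        if parent not in known:
            problems.append(f"{name}: parent {parent} not in taxonomy")
        known.add(name)
    return problems
```

Note that both "add" and "move" are recorded as the same kind of claim about a taxon's parent, so the output of the front end stays declarative even though the user interacted with it by editing a tree.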

Although this project is very different from Nico Franz's work (e.g.), I take his method and philosophy to be inspirational.