Daniel Jacob edited this page Mar 9, 2020 · 27 revisions

ODAM: Open Data, Access to data Mining

Give open access to your data and make it ready to be mined

Purpose

Here, we propose a simple way to make research data broadly accessible and fully available for reuse, including from a scripting language such as R or Python. The main purpose is to make a dataset accessible online with minimal effort from the data provider, and to let any scientist or bioinformatician explore the dataset and then extract all or part of the data according to their needs.

Whenever we plan to share data from a common experimental design, the classic obstacles to rapid use of the data by every partner are data storage and data access. We propose an approach for sharing project data throughout its development phase, from the setup of the experimental design to the acquisition of data from the various sample analyses, so that all data are readily available as soon as they are generated. The approach is based on the following criteria:

  • Centrally manage identifiers (plants, harvests, samples, ...) so that they are unique and shared by all
  • Avoid implementing a complex data management system (one requiring a data model), given that many changes can occur during the project (new analyses or measurements may be added, others abandoned, ...)
  • Facilitate the subsequent publication of data: the data can either be used to populate an existing database or be distributed through a web-service approach with the associated metadata.
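
To illustrate the first criterion, here is a minimal Python sketch of why shared, unique identifiers matter: two tables produced by different partners can be combined directly when both use the same sample identifier. The table contents, the column names (`SampleID`, `Harvest`, `Glucose`) and the helper functions `check_unique` and `join_on` are hypothetical and serve only as an illustration; this is not ODAM's implementation.

```python
# Hypothetical tables from two partners, both keyed by the same SampleID.
samples = [
    {"SampleID": "S1", "Harvest": "H1"},
    {"SampleID": "S2", "Harvest": "H1"},
    {"SampleID": "S3", "Harvest": "H2"},
]
measurements = [
    {"SampleID": "S1", "Glucose": 4.2},
    {"SampleID": "S3", "Glucose": 5.0},
]


def check_unique(rows, key):
    """Raise ValueError if any identifier appears more than once."""
    seen = set()
    for row in rows:
        if row[key] in seen:
            raise ValueError(f"duplicate identifier: {row[key]}")
        seen.add(row[key])


def join_on(left, right, key):
    """Inner join of two lists of dicts on a shared identifier column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


check_unique(samples, "SampleID")
check_unique(measurements, "SampleID")
merged = join_on(samples, measurements, "SampleID")
print(merged)
```

If identifiers were assigned independently by each partner, the join would need an extra mapping table, which is the overhead the criterion is meant to avoid.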

For this work, we chose to keep the familiar way scientists work with worksheets, using the same tool for both data files and metadata definition files. Moreover, our approach gives access to data through web services, which provides a good way to connect distributed data. This approach should be regarded as complementary to publishing the data online in an institutional data repository such as those listed on re3data.org (e.g. INRAE Data Portal), whether or not it is associated with a scientific paper. Whereas institutional data repositories focus on describing the experiment with the corresponding descriptive metadata (e.g. FRIM1 dataset), our approach, by adding some minimal but relevant structural metadata, gives access to the data themselves, with the possibility to explore and mine them.

Objectives

  • make research data locally or broadly accessible all along the project
  • allow any (data) scientist to explore the dataset and then extract all or part of the data according to their needs
  • allow data to be selected and then downloaded through a web API
  • allow data and analysis to be visualized online

Guideline keywords

  • simplicity, flexibility, efficiency

The ODAM framework makes experimental data tables widely accessible and fully reusable, including through a scripting language such as R, with minimal effort on the part of the data provider.

  • The approach consists of building a web-based data network that relies on appropriate technologies (web API) and standard data formats (TSV, JSON).
  • Web applications, each with a clearly defined objective, then operate this network.
  • A dataset can therefore be used by several applications, and an application can use several datasets. Data management thus becomes completely independent of how the data are used.
  • The data is thus “decompartmentalized”, a sine qua non condition for the Web of Data.
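
As a minimal sketch of how the two formats named above relate, the following Python snippet parses a TSV table, selects a subset of rows, and serializes that subset as JSON. The table content, the column names, and the helpers `parse_tsv` and `select_rows` are hypothetical and used only for illustration; they are not part of ODAM's API. In practice the TSV text would come from a web API response rather than an inline string.

```python
import csv
import io
import json

# Hypothetical TSV payload, standing in for the body of a web API response.
TSV_TEXT = (
    "SampleID\tTreatment\tFreshWeight\n"
    "S1\tControl\t12.5\n"
    "S2\tStress\t9.8\n"
    "S3\tControl\t13.1\n"
)


def parse_tsv(text):
    """Parse TSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def select_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


rows = parse_tsv(TSV_TEXT)
subset = select_rows(rows, "Treatment", "Control")
subset_json = json.dumps(subset, indent=2)
print(subset_json)
```

Note that `csv.DictReader` keeps every value as a string, so numeric columns such as `FreshWeight` would need an explicit conversion before any computation.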

See FAIR_and_DataLife_DJ_Oct2019.pdf for a presentation on the ODAM framework, its aims, and what it can be used for.
