Justin Clark-Casey edited this page Mar 8, 2018 · 29 revisions

State of play

schema.org is a community project to develop a set of schemas that can be embedded in webpages, in formats such as JSON-LD, RDFa and Microdata. Example schemas include Movie, Store and Product. Among other use cases, this embedded data can then be crawled by search engines such as Google and Yandex, and used to return structured results for queries (such as the information boxes you see on some Google search results).
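
As a rough illustration of what "embedded in webpages" means for a crawler, the sketch below pulls JSON-LD blocks out of a page using only the Python standard library. The example page, the `JsonLdExtractor` class and the `extract_jsonld` helper are all made up for this illustration; this is not how bsbang-crawler itself is implemented, and it ignores RDFa and Microdata.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: a Movie described with schema.org JSON-LD.
EXAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Movie", "name": "Example Film"}
</script>
</head><body><h1>Example Film</h1></body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD <script> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # JSON-LD is carried in <script type="application/ld+json"> elements.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return a list of the JSON-LD objects embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in extract_jsonld(EXAMPLE_HTML):
        print(item["@type"], item.get("name"))
```

A real crawler would also need to fetch pages, cope with malformed JSON and handle the other embedding formats; those concerns are left out here.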

Bioschemas is a project by the life sciences community to specify how schemas from schema.org can be used to mark up life sciences information. As such, it has two aspects:

  1. When using existing schema.org schemas, such as DataCatalog and Dataset, Bioschemas will specify which properties are mandatory, which are optional, and the cardinality of each property. This is because schema.org itself specifies none of these things.
  2. In some cases, Bioschemas will come up with new schemas, such as BioChemEntity to describe biological and chemical entities, where nothing suitable pre-exists in schema.org. Once these have gone through review by the Bioschemas community, they will also be suggested to the main schema.org community.
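
To make the first aspect concrete, here is a minimal sketch of checking a piece of markup against a profile that records, for each property, whether it is mandatory or optional and its cardinality. The profile contents, the `EXAMPLE_PROFILE` name and the `check_against_profile` function are invented for illustration; they are not taken from any actual Bioschemas specification.

```python
# Hypothetical profile: property -> (requirement, cardinality).
# The entries are illustrative only, not a real Bioschemas specification.
EXAMPLE_PROFILE = {
    "name": ("mandatory", "one"),
    "url": ("mandatory", "one"),
    "keywords": ("optional", "many"),
}


def check_against_profile(item, profile):
    """Return a list of human-readable problems found in `item`."""
    problems = []
    for prop, (requirement, cardinality) in profile.items():
        if prop not in item:
            if requirement == "mandatory":
                problems.append(f"missing mandatory property: {prop}")
            continue
        value = item[prop]
        # In JSON-LD, several values for one property appear as a list.
        if cardinality == "one" and isinstance(value, list) and len(value) > 1:
            problems.append(f"property {prop} expects a single value")
    return problems


if __name__ == "__main__":
    dataset = {"@type": "Dataset", "name": "Example dataset", "keywords": ["a", "b"]}
    for problem in check_against_profile(dataset, EXAMPLE_PROFILE):
        print(problem)
```

Keeping the profile as plain data, separate from the checking logic, means a specification that is still changing can be updated without touching the code.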

Bioschemas is an extremely young project. As such, the specifications are subject to considerable change and some are not final (in particular BioChemEntity). In addition, very few life sciences information sources have yet implemented this markup. Nonetheless, bsbang-crawler is an alpha project to start crawling this data so that it can then be searched in the companion frontend project.

As an alpha project, bsbang-crawler is itself subject to considerable change. Until now, the crawler has been custom written. However, this is a poor choice for future scalability and maintainability, so although some further work may be done on the custom crawler code, we are actively looking at a transition to an established crawler package.
