Biomedical data science, the merger of advanced and rigorous methods with large-scale biological and clinical data, has the potential to change the way we study and think about human disease. One of the greatest challenges for biomedical data scientists, however, is identifying an important problem to work on. Those just entering the field often lack the biological domain expertise needed to ask the right questions. Further, there is little guidance on where to start.
udib is a database of important questions. Here, the focus is on diseases that are understudied and high impact. Consider endometriosis, for example. Endometriosis is a condition that will affect 10% of women over the course of their lifetimes, yet it receives only $10 million per year in research funding from the National Institutes of Health (NIH). By comparison, diabetes affects 10% of the population and receives over $1 billion per year in NIH research funding. An emerging data scientist could make a large impact by working in areas that are both important and understudied.
The udib database will be constantly evolving. To begin, we will use two resources: the Columbia Open Health Dataset (http://cohd.nsides.io) and the NIH RePORTER (https://projectreporter.nih.gov/reporter.cfm). We will rank diseases by the ratio of each disease's prevalence in a large academic health center to the proportion of annual NIH funding it receives.
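The ranking described above can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not udib's actual pipeline: the `Disease` record and `rank_understudied` function are hypothetical names, prevalence is assumed to already be a fraction of patients (for example, derived from the Columbia Open Health Dataset), and funding is assumed to already be an annual dollar total per disease (for example, summed from NIH RePORTER). How diseases are matched between the two resources is not shown.

```python
from dataclasses import dataclass


@dataclass
class Disease:
    # Hypothetical record; field meanings are assumptions, not udib's schema.
    name: str
    prevalence: float   # fraction of patients with the condition (assumed precomputed)
    nih_funding: float  # annual NIH funding in dollars (assumed precomputed)


def rank_understudied(diseases: list[Disease]) -> list[tuple[str, float]]:
    """Rank diseases by prevalence divided by their share of total NIH funding.

    A higher score means a disease is more common relative to the funding
    it receives, i.e. more understudied under this (assumed) metric.
    """
    total_funding = sum(d.nih_funding for d in diseases)
    if total_funding <= 0:
        raise ValueError("total NIH funding across diseases must be positive")

    scored = []
    for d in diseases:
        funding_share = d.nih_funding / total_funding
        # A disease with no recorded funding gets an infinite score so it ranks first.
        score = d.prevalence / funding_share if funding_share > 0 else float("inf")
        scored.append((d.name, score))

    # Most understudied (highest score) first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Dividing by each disease's share of funding, rather than by raw dollars, keeps the score independent of the overall NIH budget. Two diseases with equal prevalence but a 100-fold difference in funding will always differ 100-fold in score.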
Please feel free to use and contribute to this resource. It is provided as a service from the Tatonetti Lab at Columbia University (http://tatonettilab.org).
An initial (and incomplete) data table has been generated for High Level Terms: