Merge pull request #13 from harmsm/main
added fit quality tests
harmsm authored Sep 2, 2024
2 parents f46ca1c + c03cf9b commit 1983573
Showing 36 changed files with 241,108 additions and 3,530 deletions.
85 changes: 61 additions & 24 deletions README.rst
@@ -6,27 +6,30 @@ dataprob

.. image:: docs/badges/coverage-badge.svg

dataprob was designed to allow scientists to easily fit user-defined models to
experimental data. It allows maximum likelihood, bootstrap, and Bayesian
analyses with a simple and consistent interface.

dataprob was designed to allow experimentalists to fit parameters from arbitrary
models to experimental data.
Design principles
=================

+ **ease of use:** Users write a python function that describes their model,
then load in their experimental data as a dataframe. A full analysis can
be run with two python commands.
+ **dataframe centric:** Users use a dataframe to specify parameter bounds,
then load in their experimental data as a dataframe.
+ **dataframe centric:** Uses a pandas dataframe to specify parameter bounds,
guesses, fixedness, and priors. Observed data can be passed in as a
dataframe or numpy vector. All outputs are simple pandas dataframes.
dataframe or numpy vector. All outputs are pandas dataframes.
+ **consistent experience:** Users can run maximum-likelihood, bootstrap
resampling, or Bayesian MCMC analyses with an identical interface and nearly
identical diagnostic outputs.
+ **interpretable:** Provides simple diagnostic plots and runs tests assessing
fit results, flagging problems with residuals and co-varying parameters.
+ **interpretable:** Provides diagnostic plots and runs tests to validate
fit results.

Simple example
==============

The following code generates noisy linear data and uses dataprob to find
the maximum likelihood estimate of its slope and intercept.
`Run on Google Colab <simple-example_>`_.

.. code-block:: python
@@ -54,9 +57,23 @@ the maximum likelihood estimate of its slope and intercept.
y_std=0.5)
# 4. Access results
print(f.fit_df)
fig = dataprob.plot_summary(f)
fig = dataprob.plot_corner(f)
print(f.fit_df)
print(f.fit_quality)
The plots will be:

.. image:: _static/simple-example_plot-summary.svg
:align: center
:alt: data.plot_summary result
:width: 75%

.. image:: _static/simple-example_plot-corner.svg
:align: center
:alt: data.plot_corner result
:width: 75%


The ``f.fit_df`` dataframe will look something like:

@@ -68,24 +85,42 @@ The ``f.fit_df`` dataframe will look something like:
| ``b`` | ``b`` | 5.644 | 0.274 | 4.465 | 6.822 | ... | ``NaN`` |
+-------+-------+----------+-------+--------+---------+-------+-----------+
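For intuition about what the maximum likelihood estimate in the simple example represents, here is a minimal NumPy-only sketch. It does not use dataprob. ``linear_ml_fit`` is a hypothetical helper, and the true slope and intercept (5.0 and 5.7) are assumed values chosen for illustration. When every point has the same known Gaussian noise, the maximum likelihood estimate of a linear model coincides with the ordinary least-squares solution. dataprob may compute its reported uncertainties differently.

```python
import numpy as np


def linear_ml_fit(x, y_obs, y_std):
    """Hypothetical helper (not part of dataprob): ML fit of y = m*x + b.

    Assumes independent Gaussian noise with the same known standard
    deviation, y_std, on every observation.
    """
    x = np.asarray(x, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)

    # Design matrix with one column per parameter: [x, 1] -> [m, b]
    design = np.column_stack([x, np.ones_like(x)])

    # With constant known noise, maximizing the likelihood is the same as
    # minimizing the sum of squared residuals.
    estimate, *_ = np.linalg.lstsq(design, y_obs, rcond=None)

    # Parameter covariance for a linear model with known noise
    cov = y_std**2 * np.linalg.inv(design.T @ design)
    std = np.sqrt(np.diag(cov))

    return estimate, std


# Noisy linear data (assumed true values: m = 5.0, b = 5.7)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 25)
y_obs = 5.0 * x + 5.7 + rng.normal(0, 0.5, size=x.size)

estimate, std = linear_ml_fit(x, y_obs, y_std=0.5)
print("m =", estimate[0], "+/-", std[0])
print("b =", estimate[1], "+/-", std[1])
```

The closed-form covariance only holds for models that are linear in their parameters; nonlinear models need a numerical approach.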

The plots will be:
The ``f.fit_quality`` dataframe will look something like:

+---------------+---------------------------------------------+---------+---------+
| name | description | is_good | value |
+===============+=============================================+=========+=========+
| num_obs | number of observations | True | 25.000 |
+---------------+---------------------------------------------+---------+---------+
| num_param | number of fit parameters | True | 2.000 |
+---------------+---------------------------------------------+---------+---------+
| lnL | log likelihood | True | -18.761 |
+---------------+---------------------------------------------+---------+---------+
| chi2 | chi^2 goodness-of-fit | True | 0.241 |
+---------------+---------------------------------------------+---------+---------+
| reduced_chi2 | reduced chi^2 | True | 1.192 |
+---------------+---------------------------------------------+---------+---------+
| mean0_resid | t-test for residual mean != 0 | True | 1.000 |
+---------------+---------------------------------------------+---------+---------+
| durbin-watson | Durbin-Watson test for correlated residuals | True | 2.265 |
+---------------+---------------------------------------------+---------+---------+
| ljung-box | Ljung-Box test for correlated residuals | True | 0.943 |
+---------------+---------------------------------------------+---------+---------+
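Several of the statistics named in the table above have standard textbook definitions. The sketch below computes them with plain NumPy. ``fit_quality_sketch`` is a hypothetical helper, not dataprob's implementation, and the library's exact definitions may differ (for example, a reported value may be a p-value rather than the raw statistic).

```python
import numpy as np


def fit_quality_sketch(y_obs, y_calc, y_std, num_param):
    """Hypothetical helper (not part of dataprob): basic fit statistics."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)

    resid = y_obs - y_calc
    num_obs = resid.size

    # chi^2: sum of squared residuals, each scaled by its uncertainty
    chi2 = float(np.sum((resid / y_std) ** 2))

    # Reduced chi^2: chi^2 per degree of freedom; values near 1 suggest
    # the residuals are consistent with the stated uncertainty.
    reduced_chi2 = chi2 / (num_obs - num_param)

    # Durbin-Watson: ranges from 0 to 4; values near 2 suggest no
    # correlation between successive residuals.
    durbin_watson = float(np.sum(np.diff(resid) ** 2) / np.sum(resid**2))

    return {"num_obs": num_obs,
            "num_param": num_param,
            "chi2": chi2,
            "reduced_chi2": reduced_chi2,
            "durbin_watson": durbin_watson}


# Noisy linear data fit with a straight line (assumed values)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 25)
y_obs = 5.0 * x + 5.7 + rng.normal(0, 0.5, size=x.size)
slope, intercept = np.polyfit(x, y_obs, 1)

quality = fit_quality_sketch(y_obs, slope * x + intercept,
                             y_std=0.5, num_param=2)
for name, value in quality.items():
    print(name, value)
```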

.. image:: docs/source/_static/simple-example_plot-summary.svg

.. image:: docs/source/_static/simple-example_plot-corner.svg

Installation
============

We recommend installing dataprob with pip:

.. code-block:: bash
.. code-block:: shell
pip install dataprob
To install from source and run tests:

.. code-block:: bash
.. code-block:: shell
git clone https://github.com/harmslab/dataprob.git
cd dataprob
@@ -101,16 +136,18 @@ A good way to learn how to use the library is by working through examples. The
following notebooks are included in the `dataprob/examples/` directory. They are
self-contained demonstrations in which dataprob is used to analyze various
classes of experimental data. The links below launch each notebook in Google
colab:

+ `linear.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/linear.ipynb>`_: fit a linear model to noisy data (2 parameter, linear)
+ `binding.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/binding.ipynb>`_: a single-site binding interaction (2 parameter, sigmoidal curve)
+ `michaelis-menten.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/michaelis-menten.ipynb>`_: Michaelis-Menten model of enzyme kinetics (2 parameter, sigmoidal curve)
+ `lagged-exponential.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/lagged-exponential.ipynb>`_: bacterial growth curve with initial lag phase (3 parameter, exponential)
+ `multi-gaussian.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/multi-gaussian.ipynb>`_: two overlapping normal distributions (6 parameter, Gaussian)
+ `periodic.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/periodic.ipynb>`_: periodic data (3 parameter, sine)
+ `polynomial.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/polynomial.ipynb>`_: nonlinear data with no obvious form (5 parameter, polynomial)
+ `linear-extrapolation-folding.ipynb <https://githubtocolab.com/harmslab/dataprob/blob/main/examples/linear-extrapolation-folding.ipynb>`_: protein equilibrium unfolding data (6 parameter, linear embedded in sigmoidal)
Colab:

+ `api-example.ipynb <api-example_>`_: shows various features of the API when analyzing a linear model
+ `linear.ipynb <linear-example_>`_: fit a linear model to noisy data (2 parameter, linear)
+ `binding.ipynb <binding-example_>`_: a single-site binding interaction (2 parameter, sigmoidal curve)
+ `michaelis-menten.ipynb <michaelis-menten-example_>`_: Michaelis-Menten model of enzyme kinetics (2 parameter, sigmoidal curve)
+ `lagged-exponential.ipynb <lagged-exponential-example_>`_: bacterial growth curve with initial lag phase (3 parameter, exponential)
+ `multi-gaussian.ipynb <multi-gaussian-example_>`_: two overlapping normal distributions (6 parameter, Gaussian)
+ `periodic.ipynb <periodic-example_>`_: periodic data (3 parameter, sine)
+ `polynomial.ipynb <polynomial-example_>`_: nonlinear data with no obvious form (5 parameter, polynomial)
+ `linear-extrapolation-folding.ipynb <linear-extrapolation-folding-example_>`_: protein equilibrium unfolding data (6 parameter, linear embedded in sigmoidal)


Documentation
=============
2 changes: 1 addition & 1 deletion docs/badges/tests-badge.svg
236,154 changes: 236,154 additions & 0 deletions docs/source/_static/good-bad-fit.ai

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/source/_static/good-bad-fit.svg
