Skip to content

End-to-End project integrating Statistical Learning (Machine Learning) and remote sensed and crop features to predict cover crop performance (Precision Sustainable Agriculture, USDA Beltsville Agricultural research Center)

Notifications You must be signed in to change notification settings

GuilleMarc/CoverCrops_ML_Project

Repository files navigation

Cover crops are species known to improve soil and water quality relative to a bare-ground if planted after or along with commercial crops. The success of a cover crop depends however on good establishment and healthy growth during and after the winter months. If early on the season growers could anticipate how much growth they expect from the cover crop, late field operations can be scheduled more effectively (e.g. termination of the cover crop, harvesting or planting the next cash crop, etc.). Statistical learning (~ Machine Learning), built upon statistical and computational algorithms to "learn" from data may help to accurately predict cover crop biomass and shoot-N contents across N limited environments based on early-season field information.

This repository features an end-to-end Machine learning project where two supervised regression based algorithms were trained, optimized, and validated on a set of remote sensing and field observed allometric features of winter-seeded rye grown at different N rates in the Mid-Atlantic US. Random Forests capture additional information gains of data-splitting strategies, such as those in decision trees, and add statistical power via resampling and bagging. LASSO (Least Absolute Shrinkage and Selection Operators), in turn, penalizes long and likely overfitting models, such as multiple regression on highly collinear datasets, and help balancing accurate but otherwise unstable models (i.e. low bias- high variance models).

Data description

The notebook summarizes the full data analysis process, from exploratory checks, to feature selection and creation, up to model training and validation. Lastly, a final test on unseen data revealed the potential of modern intensive approaches to characterize end-season preformance of winter cover crops and small grains alike, and may be adapted to data-oriented support tools that optimize farm decision making.

X= Selected features from a three-year study on rye adaptation at different fall-spring N rate applications (MD, PA). Common to each, features recorded or created* at stages GS20, and GS30: remote sensed NDVI, C and N shoot-contents, C/N-ratios*, early-season biomass, tiller-counts, tiller-efficiency relative to seeding rate*, NUE relative to groundbase soil N*. Files (.csv): data_25, data_30 --- The column names are straitghforward and self-explanatory

For validation purposes, The observations of two years in NC was hidden at all times from the training and testing phases.

y= Dry-weight biomass (Kg.ha), Shoot-Nitrogen content (Kg.ha), recorded at GS60 colnames in above files: ["biomass_60_y"],["shoot_n_kg_ha_60_y"]

Additional resources:

Precision Sustainable Agriculture

Goatech (Gathering for open Agricultural Technology) A group of folks who advocate open source hardware and software to meet, learn, share, and establish a common vision for creating technologies for our food system)

Sustainable Agricultural Systems Laboratory: Beltsville, MD (USDA-ARS)

About

End-to-End project integrating Statistical Learning (Machine Learning) and remote sensed and crop features to predict cover crop performance (Precision Sustainable Agriculture, USDA Beltsville Agricultural research Center)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published