- Infrastructure: Amazon Web Services SageMaker
- Deep Learning Library: PyTorch, transfer learning with pretrained ResNet18
- Training Dataset: PlantVillage Disease Classification Challenge - Color Images
Model Training
- Training Data: For this project, we used the PlantVillage Disease Classification Challenge - Color Images dataset, which contains 38 classes of plant diseases. One limitation of the dataset is that it only contains images of leaves, so it is of little use for predicting diseases on fruits or other farm produce. This limitation informs the next step for DiseaseFinder.
- Deep Learning Model: We started from the PyTorch image classification tutorial and used a pretrained ResNet18 from the PyTorch model zoo.
- Model Training Infrastructure: Our model was trained on an Amazon SageMaker notebook GPU instance (ml.p2.xlarge). You can find the training Jupyter Notebook here. Once the model was trained on the GPU, we saved it as a pickle object in both GPU and CPU versions.
Model Deployment:
- Hosting: For the purposes of the hackathon, we realized that hosting real-time inference on Amazon SageMaker would be expensive. We therefore decided to host the model on Heroku Containers behind a Python Flask API, although this poses latency challenges. Once we see traction with the solution, deployment on Amazon SageMaker would be the next step.
- Continuous Deployment: In addition to hosting the model on Heroku, we implemented a continuous deployment pipeline with AWS CodeBuild, with the goal of embracing Continuous Delivery for Machine Learning.
- Currently, we only have access to a dataset containing images of leaves. This alone limits the model's performance, as it cannot classify diseases on fruits. We hope to collect data on fruits and improve the model's performance.
- We would also like to move from only classifying diseases to detecting the affected spots with object detection, or to improve further with instance segmentation.
- Training on a GPU was expensive, though still affordable, because of infrastructure cost: GPUs do not come cheap, even in the cloud.
- Deployment might also be expensive. However, we hosted the model for free on Heroku. If the project at some point starts generating revenue, we will move to scale the deployment on AWS SageMaker.
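For illustration, the Flask API mentioned under Model Deployment might look roughly like the sketch below. Only the `/invocations` route, the `ImageUrl` request field, and the `prediction` response field come from the example request in this document. The `predict_from_url` helper is a hypothetical stub standing in for image download, preprocessing, and the CPU model's forward pass, and the 400 error response is an assumption.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_from_url(image_url):
    """Hypothetical stub: a real service would download the image,
    preprocess it, and run the saved CPU model to get a class name."""
    return "Cherry (including sour) healthy"  # placeholder label, not a real inference


@app.route("/invocations", methods=["POST"])
def invocations():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    image_url = payload.get("ImageUrl") if isinstance(payload, dict) else None
    if not image_url:
        # Assumed error shape; the deployed API's error format is not documented here.
        return jsonify({"error": "ImageUrl is required"}), 400
    return jsonify({"prediction": predict_from_url(image_url)})
```

Keeping the prediction logic in its own function makes it possible to swap the stub for the real model without touching the request handling.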
Below is an example API request to the deployed deep learning model:
$ curl -XPOST -H "Content-Type: application/json" -d '{"ImageUrl": "https://www.sciencesource.com/Doc/TR1_WATERMARKED/7/5/4/f/SS2839121.jpg?d63644905319" }' https://disease-finder-api.herokuapp.com/invocations
{"prediction": "Cherry (including sour) healthy"}
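The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above. The function names `build_request` and `classify` are illustrative, and the 30-second timeout is an assumption chosen with the latency of the Heroku deployment in mind.

```python
import json
import urllib.request

API_URL = "https://disease-finder-api.herokuapp.com/invocations"


def build_request(image_url, api_url=API_URL):
    """Build the POST request with the JSON body the API expects."""
    body = json.dumps({"ImageUrl": image_url}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(image_url, timeout=30.0):
    """Send the request and return the value of the "prediction" field."""
    # The timeout is an assumed value, not one documented by the project.
    with urllib.request.urlopen(build_request(image_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["prediction"]
```

Calling `classify` with a publicly reachable image URL should return a class name string like the one in the response above, provided the service is still online.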