The Kaggle computer vision task: https://www.kaggle.com/datasets/prashant268/chest-xray-covid19-pneumonia
The dataset is organized into 2 folders (train, test), and both train and test contain 3 subfolders (COVID19, PNEUMONIA, NORMAL). The dataset contains 6432 X-ray images in total, with the test set holding 20% of them.
pip: pip install -r requirements.txt
conda: conda create --name <env> --file requirements.txt
The most important packages are:
- tensorflow
- pillow
- scikit-learn
- matplotlib
The code for this section is available in the data_exploration/data_exploration.ipynb notebook.
To start with, we can check how many images are in each class.
- Normal: 1266
- COVID-19: 460
- Pneumonia: 3418
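
A minimal sketch of how such counts can be obtained, assuming the dataset was extracted to a local data/ directory (the path is a placeholder, adjust it to your setup):

```python
from pathlib import Path

# Placeholder path: adjust to wherever the Kaggle dataset was extracted.
train_dir = Path("data/train")

# Count files in each class subfolder (COVID19, NORMAL, PNEUMONIA).
for class_dir in sorted(train_dir.iterdir()):
    if class_dir.is_dir():
        n_images = sum(1 for f in class_dir.iterdir() if f.is_file())
        print(f"{class_dir.name}: {n_images}")
```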
As we can see, the classes are heavily imbalanced. Hence, I will use precision, recall, and F1-score as metrics in addition to accuracy.
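
For reference, a minimal sketch of computing these metrics with scikit-learn; the labels and predictions below are hypothetical stand-ins for real model output:

```python
from sklearn.metrics import classification_report

# Hypothetical labels/predictions (0=COVID19, 1=NORMAL, 2=PNEUMONIA).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

# Prints per-class precision, recall, and F1-score alongside overall accuracy.
print(classification_report(y_true, y_pred,
                            target_names=["COVID19", "NORMAL", "PNEUMONIA"]))
```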
- What about bias? Is one gender much more prone to COVID-19 or pneumonia? If so, the model may learn to recognize breast tissue rather than the disease itself...
- Can the mean image be used as a feature? More precisely, can the "distance" of an image from a class mean be used as a feature? (See the sketch below.)
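
A rough sketch of that idea using NumPy; the array names and shapes are illustrative assumptions, not project code:

```python
import numpy as np

def mean_image(images):
    # images: array of shape (n, H, W), grayscale X-rays resized to a common size.
    return images.mean(axis=0)

def distance_from_means(image, class_means):
    # One Euclidean distance per class mean; a small distance may hint at class membership.
    return np.array([np.linalg.norm(image - m) for m in class_means])

# Illustrative usage with random arrays standing in for resized X-rays.
rng = np.random.default_rng(0)
normal_imgs = rng.random((10, 64, 64))
covid_imgs = rng.random((10, 64, 64))
means = [mean_image(normal_imgs), mean_image(covid_imgs)]
print(distance_from_means(rng.random((64, 64)), means))
```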
The models are built on CNN base models pretrained on ImageNet (see the sketch after this list):
- EfficientNetB3
- InceptionV3
- MobileNetV2
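
A minimal sketch of how such a transfer-learning model might be assembled in Keras, assuming a frozen ImageNet base and a small classification head; EfficientNetB3 is shown, and the other bases plug in the same way:

```python
import tensorflow as tf

def build_model(num_classes=3, dropout_rate=0.3, learning_rate=1e-3):
    # ImageNet-pretrained base, frozen so only the new head is trained at first.
    base = tf.keras.applications.EfficientNetB3(
        include_top=False, weights="imagenet",
        input_shape=(300, 300, 3), pooling="avg")
    base.trainable = False

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```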
I use the TensorBoard tool to track experiments and metrics like accuracy and loss.
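
A minimal sketch of how a training run can be logged for TensorBoard; the log directory layout and dataset names are assumptions, not necessarily the repo's:

```python
import datetime
import tensorflow as tf

# Each run gets its own timestamped log directory so runs show up separately in TensorBoard.
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

# Hypothetical training call; train_ds / val_ds stand in for the actual datasets.
# model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=[tensorboard_cb])
# Inspect with: tensorboard --logdir logs/fit
```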
To inspect the results of the experiments, click the Launch TensorBoard Session button in the hp_search/hp_search.ipynb notebook.
Optimized hyperparameters:
- dropout rate
- learning rate
- optimizer
Nevertheless, I could not conduct thorough tuning across many models and hyperparameters because of limited time and computational power. Hence, I used a grid search over only these three hyperparameters. The main aim was to try out the TensorBoard tooling, which was accomplished.
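
For illustration, a sketch of such a grid search logged with the TensorBoard HParams plugin; build_model, train_ds, and val_ds are assumptions standing in for the project's actual code (see the model sketch above):

```python
import itertools
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

# Small grid over the three tuned hyperparameters (values are illustrative).
grid = itertools.product([0.2, 0.4],       # dropout rate
                         [1e-3, 1e-4],     # learning rate
                         ["adam", "sgd"])  # optimizer

for run_id, (dropout, lr, opt_name) in enumerate(grid):
    hparams = {"dropout_rate": dropout, "learning_rate": lr, "optimizer": opt_name}
    log_dir = f"logs/hp_search/run-{run_id}"

    model = build_model(dropout_rate=dropout, learning_rate=lr)
    # Re-compile with the optimizer chosen by the grid (build_model defaults to Adam).
    optimizer = (tf.keras.optimizers.Adam(lr) if opt_name == "adam"
                 else tf.keras.optimizers.SGD(lr))
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy", metrics=["accuracy"])

    model.fit(train_ds, validation_data=val_ds, epochs=5,
              callbacks=[tf.keras.callbacks.TensorBoard(log_dir=log_dir),
                         hp.KerasCallback(log_dir, hparams)])  # logs this run's hyperparameters
```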