Image Recognition Task Execution Times in Mobile Edge Computing

At the University of Pristina

  • Faculty: Electrical and Computer Engineering
  • Level: Master's Program
  • Course: Machine Learning
  • Professor: Prof. Dr. Lule AHMEDI
  • Assistant: PhD. Candidate Mërgim HOTI
  • Student: Gentrit IBISHI

This project undertakes a comprehensive analysis of execution times for image recognition tasks offloaded to various edge servers, including MacBook Pro models, a Raspberry Pi, and a Virtual Machine (VM). The aim is to evaluate the performance and efficiency of mobile edge computing environments in handling image recognition tasks. The dataset and Python code utilized in this analysis are integral to an experimental study designed to benchmark execution times across diverse hardware configurations.

Dataset Overview

  • Dataset name: Image Recognition Task Execution Times in Mobile Edge Computing
  • Dataset link: https://archive.ics.uci.edu/dataset/859/image+recognition+task+execution+times+in+mobile+edge+computing
  • Dataset size: 4 CSV files with 1,000 rows each
  • Dataset columns: Local Time, Execution Time
  • Total rows before preprocessing: 4,000
  • Total rows after preprocessing: around 3,759 (used for both the classification and regression experiments)
  • Dataset description: The datasets detail the execution times (in seconds) of an image recognition task performed on different machines/edge servers. The tasks were executed using the imageai.Prediction machine learning library. The "Turnaround Time" (TAT) covers the period from when the image is transferred to the edge server until the recognition result is received back at the mobile edge node.
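
The file names inside the dataset archive are not listed in this README, so the snippet below is only a minimal sketch of how the four CSV files could be loaded and combined with pandas; the data/*.csv layout and the derived Server column are illustrative assumptions.

```python
import glob

import pandas as pd

frames = []
for path in sorted(glob.glob("data/*.csv")):  # assumed location of the 4 CSV files
    df = pd.read_csv(path)
    # Remember which machine/edge server the measurements came from.
    df["Server"] = path.split("/")[-1].removesuffix(".csv")
    frames.append(df)

data = pd.concat(frames, ignore_index=True)
print(data.shape)   # roughly (4000, 3) before preprocessing
print(data.head())
```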

Phase I: Data Analysis and Model Preparation

The initial phase of the analysis covers data preprocessing, dataset combination, feature extraction, and training a Random Forest Regressor model to identify the factors influencing execution times. A minimal sketch of this pipeline is given after the list of key steps below.

Key Steps

  1. Preprocessing: This step involves imputing missing values, handling outliers, and extracting time-based features.

    image

    image

    image

    image

  2. Feature Engineering: Features such as the hour and day of the week are extracted from the timestamps.

  3. Model Training: A RandomForestRegressor is employed to predict execution times.

    image

    image

  4. Evaluation: The model's performance is gauged using metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R2 Score.

    Evaluation Results
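
The notebooks themselves appear above only as screenshots, so the following is a minimal end-to-end sketch of the Phase I pipeline under the assumptions already stated (median imputation, IQR outlier removal, hour/day-of-week features, a RandomForestRegressor); the exact column handling and hyperparameters in the actual notebooks may differ. It reuses the combined `data` frame from the loading sketch in the Dataset Overview.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Preprocessing: impute missing execution times with the median and
#    drop outliers using the IQR rule.
data["Execution Time"] = data["Execution Time"].fillna(data["Execution Time"].median())
q1, q3 = data["Execution Time"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = data["Execution Time"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
data = data[mask].copy()

# 2. Feature engineering: extract hour and day of week from the timestamp.
ts = pd.to_datetime(data["Local Time"])
data["Hour"] = ts.dt.hour
data["DayOfWeek"] = ts.dt.dayofweek

# 3. Model training.
X = data[["Hour", "DayOfWeek"]]
y = data["Execution Time"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 4. Evaluation: MAE, MSE, RMSE and R2 Score on the held-out test set.
pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```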

Phase II: Analysis and Evaluation

During training, the model learns patterns in the data related to the target variable. It is therefore essential to analyze and compare the model's performance against other algorithms to determine whether further optimizations are needed. Different splits of test and training data are analyzed to evaluate performance using two methods:

Regression

Evaluated through the MSE and R2 Score metrics. The algorithms we analyze are Linear Regression, Random Forest Regressor, and Ridge Regression.

image
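
A sketch of the comparison loop behind the four cases below, reusing the X and y features from the Phase I sketch; the hyperparameters (e.g. alpha=1.0 for Ridge) are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
    "Ridge Regression": Ridge(alpha=1.0),
}

results = []
for test_size in (0.1, 0.2, 0.3, 0.4):          # Cases I-IV below
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results.append({
            "Test size": test_size,
            "Model": name,
            "MSE": mean_squared_error(y_test, pred),
            "R2": r2_score(y_test, pred),
        })

results_df = pd.DataFrame(results)
print(results_df)
```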

Results for Regression:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Final Decision

Based on the results generated by these metrics, Case I (test/training split of 0.1:0.9) performs slightly better than the other cases for the regression models.


Visualization

Mean Squared Error (MSE)

image

R2 Score

image
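
One way to produce the bar charts referenced above, assuming matplotlib is installed (it is not in the pip command in the Usage section) and reusing results_df from the comparison sketch; the actual plots in the notebook may have been drawn differently.

```python
import matplotlib.pyplot as plt

# Pivot the collected results into a model x test-size grid and draw
# one grouped bar chart per metric.
for metric in ("MSE", "R2"):
    pivot = results_df.pivot(index="Model", columns="Test size", values=metric)
    pivot.plot(kind="bar", figsize=(8, 4), title=f"{metric} by model and test size")
    plt.ylabel(metric)
    plt.tight_layout()
    plt.show()
```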


Classification

Evaluated through the Accuracy, F1-score, Recall, and Precision metrics. The algorithms we analyze are Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes.

image
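
The README does not state what the class label is, so the sketch below assumes execution times are binned into fast/slow around the median purely for illustration; the classifier settings are plain scikit-learn defaults, and X and y are reused from the Phase I sketch.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumed target: "slow" tasks (1) are those above the median execution time.
y_class = (y > y.median()).astype(int)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

for test_size in (0.1, 0.2, 0.3, 0.4):          # Cases I-IV below
    X_train, X_test, y_train, y_test = train_test_split(
        X, y_class, test_size=test_size, random_state=42, stratify=y_class
    )
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        print(test_size, name,
              f"acc={accuracy_score(y_test, pred):.3f}",
              f"f1={f1_score(y_test, pred):.3f}",
              f"prec={precision_score(y_test, pred):.3f}",
              f"rec={recall_score(y_test, pred):.3f}")
```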

Results for Classification:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Final Decision

Based on the results generated by these metrics, Case I (test/training split of 0.1:0.9) performs slightly better than the other cases for the classification models.


Visualization

Accuracy

image

F1 Score

image

Precision

image

Recall

image

Phase III: Analysis and Evaluation (Retraining) & Application of ML Tools

Function for generating 15K+ random rows

image
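
The actual generator is shown only as a screenshot, so the following is a hypothetical stand-in: a helper that produces 15K synthetic rows with the same two columns as the original data. The function name, timestamp spacing, and gamma distribution are assumptions, not the author's implementation.

```python
import numpy as np
import pandas as pd

def generate_rows(n=15000, start="2020-01-01", seed=42):
    """Hypothetical helper: create n synthetic rows shaped like the original
    dataset (a timestamp plus an execution time in seconds)."""
    rng = np.random.default_rng(seed)
    timestamps = pd.date_range(start=start, periods=n, freq="min")
    # Execution times drawn from a skewed positive distribution, roughly
    # mimicking the spread of the real measurements.
    exec_times = rng.gamma(shape=2.0, scale=0.5, size=n)
    return pd.DataFrame({"Local Time": timestamps, "Execution Time": exec_times})

synthetic = generate_rows(15000)
print(len(synthetic))
print(synthetic.head())
```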

We retrained the model for both regression and classification methods and obtained these results:

Regression

Evaluated through the MSE and R2 Score metrics. The algorithms we analyze are Linear Regression, Random Forest Regressor, and Ridge Regression.

image

Results for Regression:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Visualization

Mean Squared Error by Model and Test Size

image

R-squared by Model and Test Size

image

Classification

Evaluated through the Accuracy, F1-score, Recall, and Precision metrics. The algorithms we analyze are Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes.

Results for Classification:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Visualization

Accuracy by Test Size

image

F1 Score by Test Size

image

Precision by Test Size

image

Recall by Test Size

image

Final Decision: Random Forest

After analyzing all the results, we observed that classification with more data performs better than regression. Consequently, we decided to use classification for prediction. Among the evaluated algorithms—Logistic Regression, Decision Tree, Random Forest, SVM, and Naive Bayes—Random Forest consistently demonstrated higher accuracy and reliability with larger datasets. Therefore, for our predictive tasks, we have selected Random Forest due to its superior performance and robustness.
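
A minimal sketch of fitting the chosen Random Forest on all preprocessed data and querying it for a new task, reusing X and the assumed fast/slow label y_class from the classification sketch above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Fit the selected model on all available (preprocessed) data.
final_model = RandomForestClassifier(n_estimators=100, random_state=42)
final_model.fit(X, y_class)

# Predict the class of a task arriving at 14:00 on a Wednesday
# (hour=14, day_of_week=2), using the same feature layout as in training.
sample = pd.DataFrame({"Hour": [14], "DayOfWeek": [2]})
print(final_model.predict(sample))
```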

Usage

To replicate this analysis or apply the methodology to similar datasets, ensure the following libraries are installed:

pip install pandas numpy scikit-learn

To run the model preparation and training notebooks, open them in Jupyter (for example with jupyter notebook <notebook>.ipynb):

  • model_preparation.ipynb --> Shows how the model is prepared: preprocessing steps, outlier removal with the IQR method, and median imputation of missing values.
  • trainingModelRegressionAlgorithms.ipynb --> Shows how the regression models perform across several algorithms when predicting execution time.
  • trainingModelClassificationAlgorithms.ipynb --> Shows how the classification models perform across several algorithms when predicting execution time.
  • retraining_classification.ipynb --> Shows how the classification models perform on the enlarged dataset (15K+ rows).
  • retraining_regression.ipynb --> Shows how the regression models perform on the enlarged dataset (15K+ rows).