- Faculty: Electrical and Computer Engineering
- Level: Master's Program
- Course: Machine Learning
- Professor: Prof. Dr. Lule AHMEDI
- Assistant: PhD. Candidate Mërgim HOTI
- Student: Gentrit IBISHI
This project undertakes a comprehensive analysis of execution times for image recognition tasks offloaded to various edge servers, including MacBook Pro models, a Raspberry Pi, and a Virtual Machine (VM). The aim is to evaluate the performance and efficiency of mobile edge computing environments in handling image recognition tasks. The dataset and Python code utilized in this analysis are integral to an experimental study designed to benchmark execution times across diverse hardware configurations.
- Dataset name: Image Recognition Task Execution Times in Mobile Edge Computing
- Dataset link: (https://archive.ics.uci.edu/dataset/859/image+recognition+task+execution+times+in+mobile+edge+computing)
- Dataset length: 4 CSV files, each with 1,000 rows
- Dataset columns: Local Time, Execution Time
- Total dataset rows before preprocessing: 4,000
- Total dataset rows after preprocessing: approximately 3,759 (used for both the classification and regression experiments)
- Dataset description: The datasets detail the execution times (in seconds) for an image recognition task performed on different machines/edge servers.
The tasks were executed using the imageai.Prediction machine learning library. The "Turnaround Time" (TAT) encompasses the period from when the image is transferred to the edge server until the image recognition result is received back at the mobile edge node.
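Since the dataset ships one CSV per edge server, a natural first step is to stack the four files into a single DataFrame while remembering which machine each row came from. A minimal sketch (the filenames and the `Server` column name are assumptions, not taken from the original code):

```python
import pandas as pd

# Hypothetical filenames -- one CSV per edge server in the UCI dataset.
SOURCES = {
    "MacBookPro_1": "macbookpro1.csv",
    "MacBookPro_2": "macbookpro2.csv",
    "RaspberryPi": "raspberrypi.csv",
    "VM": "vm.csv",
}

def combine_datasets(sources: dict) -> pd.DataFrame:
    """Load each per-server CSV and stack them with a 'Server' column."""
    frames = []
    for server, path in sources.items():
        df = pd.read_csv(path)  # expected columns: Local Time, Execution Time
        df["Server"] = server   # keep track of the originating machine
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

The `Server` column lets later steps compare execution times per machine instead of only in aggregate.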
The initial phase of the analysis involves data preprocessing, dataset combination, feature extraction, and constructing a Random Forest Regressor model to decipher the factors influencing execution times.
- Preprocessing: imputing missing values, handling outliers, and extracting time-based features.
- Feature Engineering: features such as the hour and day of the week are extracted from the timestamps.
- Model Training: a RandomForestRegressor is employed to predict execution times.
- Evaluation: the model's performance is gauged using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R2 Score.
During training, the model learns patterns in the data related to the target variable. It is therefore essential to compare the model's performance against other algorithms to determine whether further optimization is needed. Different test/training splits are evaluated using two methods:
Through MSE and R2 Score metrics. The algorithms analyzed are Linear Regression, Random Forest Regressor, and Ridge Regression.
- Test-to-training data ratio of 0.1 to 0.9
- Test-to-training data ratio of 0.2 to 0.8
- Test-to-training data ratio of 0.3 to 0.7
- Test-to-training data ratio of 0.4 to 0.6
Based on the results generated by these metrics, Case I (test/training split ratio of 0.1:0.9) performs slightly better than the other cases for Linear Regression.
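The regression comparison can be sketched as a loop over split ratios and models; this is an illustrative sketch rather than the project's notebook code (feature columns and hyperparameters are assumptions):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# The three regression algorithms compared in this project.
MODELS = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
    "Ridge Regression": Ridge(alpha=1.0),
}

def compare_regressors(X, y, test_sizes=(0.1, 0.2, 0.3, 0.4)):
    """Evaluate each model at each test/training split; return MSE and R2 per run."""
    results = []
    for size in test_sizes:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=size, random_state=42)
        for name, model in MODELS.items():
            pred = model.fit(X_tr, y_tr).predict(X_te)
            results.append({
                "test_size": size,
                "model": name,
                "MSE": mean_squared_error(y_te, pred),
                "R2": r2_score(y_te, pred),
            })
    return results
```

Fixing `random_state` keeps the splits identical across models, so the metric differences reflect the algorithms rather than the sampling.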
Through Accuracy, F1-score, Recall, and Precision metrics. The algorithms analyzed are Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes.
- Test-to-training data ratio of 0.1 to 0.9
- Test-to-training data ratio of 0.2 to 0.8
- Test-to-training data ratio of 0.3 to 0.7
- Test-to-training data ratio of 0.4 to 0.6
Based on the results generated by these metrics, Case I (test/training split ratio of 0.1:0.9) performs slightly better than the other cases for Logistic Regression.
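Since execution time is continuous, treating this as classification requires discretizing the target first. A minimal sketch, assuming quantile binning into classes (e.g. fast/medium/slow) and weighted averaging for the multi-class metrics; the binning scheme is an assumption, not taken from the original notebooks:

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

def to_classes(exec_times, n_bins=3):
    """Discretize continuous execution times into quantile bins
    (hypothetical scheme: roughly equal-sized fast/medium/slow classes)."""
    return pd.qcut(exec_times, q=n_bins, labels=False, duplicates="drop")

def evaluate_classifier(model, X, y, test_size):
    """Fit one classifier at a given split; report the four metrics used here."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y)
    pred = model.fit(X_tr, y_tr).predict(X_te)
    return {
        "accuracy": accuracy_score(y_te, pred),
        "f1": f1_score(y_te, pred, average="weighted"),
        "precision": precision_score(y_te, pred, average="weighted", zero_division=0),
        "recall": recall_score(y_te, pred, average="weighted"),
    }
```

The same helper can be called with any of the six classifiers (LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, SVC, KNeighborsClassifier, GaussianNB) at each of the four split ratios.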
Through MSE and R2 Score metrics, the results for Linear Regression, Random Forest Regressor, and Ridge Regression are visualized for each test/training split (0.1/0.9, 0.2/0.8, 0.3/0.7, 0.4/0.6):
- Mean Squared Error by Model and Test Size
- R-squared by Model and Test Size
Through Accuracy, F1-score, Recall, and Precision metrics, the results for Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes are likewise visualized for each split:
- Accuracy by Test Size
- F1 Score by Test Size
- Precision by Test Size
- Recall by Test Size
After analyzing all the results, we observed that classification performs better than regression as the amount of data grows, so we decided to use classification for prediction. Among the evaluated algorithms (Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes), Random Forest consistently demonstrated the highest accuracy and reliability on larger datasets. We therefore selected Random Forest for our predictive tasks due to its superior performance and robustness.
To replicate this analysis or apply the methodology to similar datasets, ensure the following libraries are installed:
```shell
pip install pandas numpy scikit-learn
```
To initiate the model preparation and training process, open and run the notebooks (e.g. with Jupyter; `.ipynb` files cannot be executed with `python` directly):

- jupyter notebook model_preparation.ipynb --> Model preparation and preprocessing: outlier removal via the IQR method and median imputation of missing values.
- jupyter notebook trainingModelRegressionAlgorithms.ipynb --> Regression model performance across several algorithms for predicting execution time.
- jupyter notebook trainingModelClassificationAlgorithms.ipynb --> Classification model performance across several algorithms for predicting execution time.
- jupyter notebook retraining_classification.ipynb --> Classification performance across several algorithms with up to 15K+ rows.
- jupyter notebook retraining_regression.ipynb --> Regression performance across several algorithms with up to 15K+ rows.