Image Recognition Task Execution Times in Mobile Edge Computing

At the University of Pristina

  • Faculty: Electrical and Computer Engineering
  • Level: Master's Program
  • Course: Machine Learning
  • Professor: Prof. Dr. Lule AHMEDI
  • Assistant: PhD. Candidate Mërgim HOTI
  • Student: Gentrit IBISHI

This project undertakes a comprehensive analysis of execution times for image recognition tasks offloaded to various edge servers, including MacBook Pro models, a Raspberry Pi, and a Virtual Machine (VM). The aim is to evaluate the performance and efficiency of mobile edge computing environments in handling image recognition tasks. The dataset and Python code utilized in this analysis are integral to an experimental study designed to benchmark execution times across diverse hardware configurations.

Dataset Overview

  • Dataset name: Image Recognition Task Execution Times in Mobile Edge Computing
  • Dataset link: https://archive.ics.uci.edu/dataset/859/image+recognition+task+execution+times+in+mobile+edge+computing
  • Dataset size: 4 CSV files with 1,000 rows each
  • Dataset columns: Local Time, Execution Time
  • Total rows before preprocessing: 4,000
  • Total rows after preprocessing: around 3,759 (used for both the classification and regression experiments)
  • Dataset description: The datasets detail the execution times (in seconds) of an image recognition task performed on different machines/edge servers. The tasks were executed using the imageai.Prediction machine learning library. The "Turnaround Time" (TAT) covers the period from when the image is transferred to the edge server until the recognition result is received back at the mobile edge node.
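
The file names inside the dataset archive are not listed in this README, so the snippet below is only a minimal sketch of how the four CSV files could be loaded and combined with pandas; the data/*.csv layout and the derived Server column are illustrative assumptions.

```python
import glob

import pandas as pd

frames = []
for path in sorted(glob.glob("data/*.csv")):  # assumed location of the 4 CSV files
    df = pd.read_csv(path)
    # Remember which machine/edge server the measurements came from.
    df["Server"] = path.split("/")[-1].removesuffix(".csv")
    frames.append(df)

data = pd.concat(frames, ignore_index=True)
print(data.shape)   # roughly (4000, 3) before preprocessing
print(data.head())
```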

Phase I: Data Analysis and Model Preparation

The initial phase of the analysis covers data preprocessing, dataset combination, feature extraction, and training a Random Forest Regressor model to identify the factors influencing execution times. A minimal sketch of this pipeline is given after the list of key steps below.

Key Steps

  1. Preprocessing: This step involves imputing missing values, handling outliers, and extracting time-based features.

    image

    image

    image

    image

  2. Feature Engineering: Features such as the hour and day of the week are extracted from the timestamps.

  3. Model Training: A RandomForestRegressor is employed to predict execution times.

    image

    image

  4. Evaluation: The model's performance is gauged using metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R2 Score.

    Evaluation Results
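
The notebooks themselves appear above only as screenshots, so the following is a minimal end-to-end sketch of the Phase I pipeline under the assumptions already stated (median imputation, IQR outlier removal, hour/day-of-week features, a RandomForestRegressor); the exact column handling and hyperparameters in the actual notebooks may differ. It reuses the combined `data` frame from the loading sketch in the Dataset Overview.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Preprocessing: impute missing execution times with the median and
#    drop outliers using the IQR rule.
data["Execution Time"] = data["Execution Time"].fillna(data["Execution Time"].median())
q1, q3 = data["Execution Time"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = data["Execution Time"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
data = data[mask].copy()

# 2. Feature engineering: extract hour and day of week from the timestamp.
ts = pd.to_datetime(data["Local Time"])
data["Hour"] = ts.dt.hour
data["DayOfWeek"] = ts.dt.dayofweek

# 3. Model training.
X = data[["Hour", "DayOfWeek"]]
y = data["Execution Time"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 4. Evaluation: MAE, MSE, RMSE and R2 Score on the held-out test set.
pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```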

Phase II: Analysis and Evaluation

During training, the model learns patterns in the data related to the target variable. It is therefore essential to analyze and compare the model's performance against other algorithms to determine whether further optimizations are needed. Different splits of test and training data are analyzed to evaluate performance using two methods:

Regression

Evaluated through the MSE and R2 Score metrics. The algorithms we analyze are Linear Regression, Random Forest Regressor, and Ridge Regression.

image
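
A sketch of the comparison loop behind the four cases below, reusing the X and y features from the Phase I sketch; the hyperparameters (e.g. alpha=1.0 for Ridge) are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=42),
    "Ridge Regression": Ridge(alpha=1.0),
}

results = []
for test_size in (0.1, 0.2, 0.3, 0.4):          # Cases I-IV below
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results.append({
            "Test size": test_size,
            "Model": name,
            "MSE": mean_squared_error(y_test, pred),
            "R2": r2_score(y_test, pred),
        })

results_df = pd.DataFrame(results)
print(results_df)
```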

Results for Regression:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Final Decision

Based on the results generated by these metrics, Case I (test/training split of 0.1:0.9) performs slightly better than the other cases for the regression models.


Visualization

Mean Squared Error (MSE)

image

R2 Score

image
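
One way to produce the bar charts referenced above, assuming matplotlib is installed (it is not in the pip command in the Usage section) and reusing results_df from the comparison sketch; the actual plots in the notebook may have been drawn differently.

```python
import matplotlib.pyplot as plt

# Pivot the collected results into a model x test-size grid and draw
# one grouped bar chart per metric.
for metric in ("MSE", "R2"):
    pivot = results_df.pivot(index="Model", columns="Test size", values=metric)
    pivot.plot(kind="bar", figsize=(8, 4), title=f"{metric} by model and test size")
    plt.ylabel(metric)
    plt.tight_layout()
    plt.show()
```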


Classification

Evaluated through the Accuracy, F1-score, Recall, and Precision metrics. The algorithms we analyze are Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes.

image
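
The README does not state what the class label is, so the sketch below assumes execution times are binned into fast/slow around the median purely for illustration; the classifier settings are plain scikit-learn defaults, and X and y are reused from the Phase I sketch.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumed target: "slow" tasks (1) are those above the median execution time.
y_class = (y > y.median()).astype(int)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

for test_size in (0.1, 0.2, 0.3, 0.4):          # Cases I-IV below
    X_train, X_test, y_train, y_test = train_test_split(
        X, y_class, test_size=test_size, random_state=42, stratify=y_class
    )
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        print(test_size, name,
              f"acc={accuracy_score(y_test, pred):.3f}",
              f"f1={f1_score(y_test, pred):.3f}",
              f"prec={precision_score(y_test, pred):.3f}",
              f"rec={recall_score(y_test, pred):.3f}")
```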

Results for Classification:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Final Decision

Based on the results generated by these metrics, Case I (test/training split of 0.1:0.9) performs slightly better than the other cases for the classification models.


Visualization

Accuracy

image

F1 Score

image

Precision

image

Recall

image

Phase III: Analysis and Evaluation (Retraining) & Application of ML Tools

Function for generating 15K+ random rows

image
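
The actual generator is shown only as a screenshot, so the following is a hypothetical stand-in: a helper that produces 15K synthetic rows with the same two columns as the original data. The function name, timestamp spacing, and gamma distribution are assumptions, not the author's implementation.

```python
import numpy as np
import pandas as pd

def generate_rows(n=15000, start="2020-01-01", seed=42):
    """Hypothetical helper: create n synthetic rows shaped like the original
    dataset (a timestamp plus an execution time in seconds)."""
    rng = np.random.default_rng(seed)
    timestamps = pd.date_range(start=start, periods=n, freq="min")
    # Execution times drawn from a skewed positive distribution, roughly
    # mimicking the spread of the real measurements.
    exec_times = rng.gamma(shape=2.0, scale=0.5, size=n)
    return pd.DataFrame({"Local Time": timestamps, "Execution Time": exec_times})

synthetic = generate_rows(15000)
print(len(synthetic))
print(synthetic.head())
```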

We retrained the model for both regression and classification methods and obtained these results:

Regression

Evaluated through the MSE and R2 Score metrics. The algorithms we analyze are Linear Regression, Random Forest Regressor, and Ridge Regression.

image

Results for Regression:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Visualization

Mean Squared Error by Model and Test Size

image

R-squared by Model and Test Size

image

Classification

Evaluated through the Accuracy, F1-score, Recall, and Precision metrics. The algorithms we analyze are Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes.

Results for Classification:


Case I:

The ratio of test and training data is 0.1 to 0.9

image

Case II:

The ratio of test and training data is 0.2 to 0.8

image

Case III:

The ratio of test and training data is 0.3 to 0.7

image

Case IV:

The ratio of test and training data is 0.4 to 0.6

image

Visualization

Accuracy by Test Size

image

F1 Score by Test Size

image

Precision by Test Size

image

Recall by Test Size

image

Final Decision: Random Forest

After analyzing all the results, we observed that classification with more data performs better than regression. Consequently, we decided to use classification for prediction. Among the evaluated algorithms—Logistic Regression, Decision Tree, Random Forest, SVM, and Naive Bayes—Random Forest consistently demonstrated higher accuracy and reliability with larger datasets. Therefore, for our predictive tasks, we have selected Random Forest due to its superior performance and robustness.
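
A minimal sketch of fitting the chosen Random Forest on all preprocessed data and querying it for a new task, reusing X and the assumed fast/slow label y_class from the classification sketch above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Fit the selected model on all available (preprocessed) data.
final_model = RandomForestClassifier(n_estimators=100, random_state=42)
final_model.fit(X, y_class)

# Predict the class of a task arriving at 14:00 on a Wednesday
# (hour=14, day_of_week=2), using the same feature layout as in training.
sample = pd.DataFrame({"Hour": [14], "DayOfWeek": [2]})
print(final_model.predict(sample))
```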

Usage

To replicate this analysis or apply the methodology to similar datasets, ensure the following libraries are installed:

pip install pandas numpy scikit-learn

To run the model preparation and training notebooks, open them in Jupyter (for example with jupyter notebook <notebook>.ipynb):

  • model_preparation.ipynb --> Shows how the model is prepared: preprocessing steps, outlier removal with the IQR method, and median imputation of missing values.
  • trainingModelRegressionAlgorithms.ipynb --> Shows how the regression models perform across several algorithms when predicting execution time.
  • trainingModelClassificationAlgorithms.ipynb --> Shows how the classification models perform across several algorithms when predicting execution time.
  • retraining_classification.ipynb --> Shows how the classification models perform on the enlarged dataset (15K+ rows).
  • retraining_regression.ipynb --> Shows how the regression models perform on the enlarged dataset (15K+ rows).