Link to the dataset - https://www.kaggle.com/grassknoted/asl-alphabet
The dataset was taken from Kaggle.
The ASL dataset is a collection of images of the letters of the American Sign Language alphabet. It is divided into training and test datasets. The training dataset contains 87,000 images in 29 classes: 26 for the letters A-Z and 3 for SPACE, DELETE and NOTHING. Each class has about 3,000 images. The test dataset contains sample images for each letter.
To read and pre-process the images, OpenCV (cv2) was used. Because the full dataset is large and training and testing on it would take a long time, only a subset of 1,500 images per class was used, which gives 43,500 images in total.
In this section, a data.csv file was created. It maps each image path to its target class and has two columns. The first is image_path, which holds the image path. The second is the target column, which indicates the class of the image (0 to 28).
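The mapping described above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes the images sit in one sub-folder per class, that targets follow the sorted folder order, and that the second column is literally named target. The function name build_csv and its parameters are hypothetical.

```python
import csv
import os


def build_csv(root_dir, csv_path, per_class=1500):
    """Write image_path/target rows for up to per_class images of each class."""
    # Assumption: one sub-folder per class; the target index is the
    # position of the folder name in sorted order.
    classes = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    rows = []
    for target, name in enumerate(classes):
        class_dir = os.path.join(root_dir, name)
        # Keep only the first per_class files to limit the dataset size.
        files = sorted(os.listdir(class_dir))[:per_class]
        rows.extend((os.path.join(class_dir, f), target) for f in files)

    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["image_path", "target"])
        writer.writerows(rows)
    return classes
```

With 29 class folders and per_class=1500, this would produce 43,500 rows plus the header.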
In this section, a custom CNN model was created. It has four 2D convolutional layers (self.conv1 to self.conv4), two linear layers (self.fc1 and self.fc2) and a max-pooling layer (self.pool). In the forward function, max-pooling is applied to the activations of every convolutional layer.
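A model with this layout might look like the sketch below. Only the layer names and counts come from the description; the class name CustomCNN, the channel widths, the 3x3 kernels, the ReLU activation and the 3x64x64 input size are all assumptions made so that the example runs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CustomCNN(nn.Module):
    def __init__(self, num_classes=29):
        super().__init__()
        # Assumed channel widths and kernel sizes; padding=1 keeps H and W.
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # One shared 2x2 max-pool layer, reused after every convolution.
        self.pool = nn.MaxPool2d(2, 2)
        # Assumed 64x64 input: four poolings give 64 -> 32 -> 16 -> 8 -> 4.
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = torch.flatten(x, 1)  # keep the batch dimension
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # raw logits, one per class
```

If the real input size differs from 64x64, the in_features of self.fc1 has to change accordingly.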
-
The CNN model is trained on the pre-processed images.
-
Two functions are used:
i) fit - for training the model on the training dataset
ii) validation - for checking the model's performance
-
These functions compute and return the model's loss and accuracy on the training and validation datasets. After each epoch, these values are appended to lists so that they can be plotted and visualized. The accuracy and loss plots are drawn using matplotlib.
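The two functions and the per-epoch bookkeeping could be sketched like this. The names fit and validation come from the text; the signatures, the helper run_training, the per-sample loss averaging and the percentage accuracy are assumptions and may differ from the original code.

```python
import torch


def fit(model, loader, optimizer, criterion, device="cpu"):
    """Train for one epoch; return (mean loss per sample, accuracy in %)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * targets.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, 100.0 * correct / seen


def validation(model, loader, criterion, device="cpu"):
    """Evaluate without gradient updates; same return values as fit."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, targets).item() * targets.size(0)
            correct += (outputs.argmax(dim=1) == targets).sum().item()
            seen += targets.size(0)
    return total_loss / seen, 100.0 * correct / seen


def run_training(model, train_loader, val_loader, optimizer, criterion, epochs):
    """Hypothetical driver: collect the per-epoch metrics in lists."""
    history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
    for _ in range(epochs):
        t_loss, t_acc = fit(model, train_loader, optimizer, criterion)
        v_loss, v_acc = validation(model, val_loader, criterion)
        history["train_loss"].append(t_loss)
        history["train_acc"].append(t_acc)
        history["val_loss"].append(v_loss)
        history["val_acc"].append(v_acc)
    return history
```

Each list in the returned dictionary can then be passed to matplotlib.pyplot.plot to draw the loss and accuracy curves.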
-
The image to be tested is loaded using the cv2 package, then resized and pre-processed to match the format of the images in the test dataset.
-
The test code was then written: the cam_test.py file detects sign language letters in a real-time webcam feed.
-
The image is then passed to the model and the final predictions are made.
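The prediction step might be sketched as follows. The helper predict and the order of CLASS_NAMES are hypothetical: the text only states that the targets run from 0 to 28, so the mapping from index to label shown here is an assumption.

```python
import string

import torch

# Assumed label order: A-Z first, then the three special classes.
CLASS_NAMES = list(string.ascii_uppercase) + ["DELETE", "NOTHING", "SPACE"]


def predict(model, batch):
    """Return a (label, probability) pair for every image in the batch."""
    model.eval()
    with torch.no_grad():
        # Softmax turns the raw logits into class probabilities.
        probs = torch.softmax(model(batch), dim=1)
    confidence, index = probs.max(dim=1)
    return [
        (CLASS_NAMES[i], c)
        for i, c in zip(index.tolist(), confidence.tolist())
    ]
```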
- The model predicts the letters correctly.
- Model Performance Summary
i) The final validation accuracy is 96.99% and the training accuracy is 98.93%.
ii) The final validation loss is 0.0046 and the training loss is 0.0012.