[INTELLIGENT MACHINES]

Face2Emoji (classifying facial emotions using pretrained networks)

Description

This app demonstrates how a pretrained convolutional network can be retrained to classify facial emotions in real time, and how to integrate it into one's own application.

The main goal of this project is to translate human emotions into a digital format. A pretrained Xception model was used, with the last dense layers removed and replaced by a custom layer for our prediction task. Xception was chosen for its smaller model size compared to VGG, while offering far better accuracy.
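The setup above can be sketched in Keras roughly as follows. This is a minimal sketch, not the author's original code: the 299x299 input size, the pooling layer, and the freezing of the base are assumptions.

```python
# Sketch of the transfer-learning setup: Xception with its top dense
# layers removed and a custom head added for the 8 emotion classes.
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Anger, Disgust, Fear, Happy, Sadness, Surprise, Neutral, Contempt
NUM_EMOTIONS = 8

def build_model(weights="imagenet"):
    # Load Xception without its top dense layers (assumed input size).
    base = Xception(weights=weights, include_top=False,
                    input_shape=(299, 299, 3))
    # Freeze the pretrained convolutional base so only the new head trains.
    for layer in base.layers:
        layer.trainable = False
    # Custom head replacing the removed dense layers.
    x = GlobalAveragePooling2D()(base.output)
    outputs = Dense(NUM_EMOTIONS, activation="softmax")(x)
    return Model(inputs=base.input, outputs=outputs)
```

Keeping the base frozen at first lets the small emotion dataset train only the new head; the base can be unfrozen later for fine-tuning.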



Why Xception:

I originally heard about the Xception model in this video by sentdex, where he develops an AI system for a self-driving car in the game GTA 5. Right from the start I was sold on the model for 2 reasons:

  • Smaller model size than Inception V3
  • Far better accuracy than Inception and other models with almost the same number of parameters.



The Dataset:

I decided to do this project while I was building a gesture recognition system but wasn't willing to use a large video dataset (no powerful GPU :P). The dataset I used for this task was the CK dataset, which consists of 486 image sequences from 96 people, yielding about 1000 images in total across 8 emotions:
Anger, Disgust, Fear, Happy, Sadness, Surprise, Neutral and Contempt.



Training process:

The model was trained in Keras with data augmentation for zoom, horizontal flipping, and rotation. It was trained for about 1000 epochs using the SGD optimizer with learning rate decay. In the end, training was stopped at a training accuracy of about 0.89 and a validation accuracy of about 0.60, which can be explained by 2 factors:

  • a shortage of training and validation samples
  • redundancy in the data: since the data is a sequence of changing emotions on different subjects, the last 3 images of every sequence were assigned to that emotion's class and the first 2 images to the neutral category, so there wasn't a lot of variation in the images.
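The training configuration described above can be sketched as follows. The augmentation ranges, learning rate, and decay schedule are assumptions; the post does not state the exact hyperparameters.

```python
# Sketch of the training setup: Keras data augmentation (zoom,
# horizontal flip, rotation) and SGD with learning-rate decay.
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers.schedules import ExponentialDecay
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation for the small CK dataset (ranges are assumptions).
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    zoom_range=0.2,
    horizontal_flip=True,
    rotation_range=15,
)

# SGD with learning-rate decay (initial rate and decay are assumptions).
lr_schedule = ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.96)
opt = SGD(learning_rate=lr_schedule, momentum=0.9)

# Typical usage (model and data directory are placeholders):
# model.compile(optimizer=opt, loss="categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(datagen.flow_from_directory("ck_images",
#           target_size=(299, 299)), epochs=1000)
```

Augmentation matters here precisely because of the two factors above: with few, highly redundant images, random zooms, flips, and rotations are a cheap way to add variation.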

In addition to training, I wrote a small script that performs real-time detection using the laptop's webcam, and even though accuracy wasn't that high while training, the predictions were pretty accurate most of the time.
I thought of implementing a TensorFlow.js version for detection in the browser itself, but for now I steered clear of that due to the size of the model (small compared to other models, but still large for the browser) and not wanting to run a proper server to store and host the model just for one project.