MOODIFY – Suggestion of Songs on the basis of Facial Emotion Recognition Project

Moodify is a song suggester that recommends songs to the user according to their mood. 'Moodify' does the selection work, leaving the user free to get carried away with the music.

I/We, student(s) of B.Tech, hereby declare that the project entitled “MOODIFY (Suggestion of Songs on the basis of Facial Emotion Recognition)” is submitted to the Department of CSE in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in CSE. The roles of the team members involved in the project are listed below:

  • Training the model for facial emotion recognition.
  • Designing the algorithm for image segregation.
  • Designing the algorithm for the music player.
  • Designing the graphical user interface.
  • Testing the model.
  • Collecting data for the model and the music player.
  • Preprocessing the data and images.

Dataset:

The dataset we have used is “Cohn-Kanade”. 
This dataset is restricted, so we cannot provide it directly, but you can request and download it from:
http://www.consortium.ri.cmu.edu/ckagree/index.cgi
To read more about the dataset, refer to:
http://www.pitt.edu/~emotion/ck-spread.htm

Feature Extraction and Selection:

1. Lips
2. Eyes
3. Forehead
4. Nose

These features are processed by the CNN layers, selected by the algorithm, and converted to a NumPy array. The model is then trained on that array, and the three mood classes used later (Happy, Excited, and Sad) are predicted.
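To make this concrete, a small Keras CNN of the kind described above might look like the following sketch. The 48×48 grayscale input size, the layer widths, and the variable names are illustrative assumptions, not the exact architecture used in the project.

```python
# Minimal sketch of a CNN for 3-class facial emotion recognition (Keras).
# Input size (48x48 grayscale) and layer widths are illustrative assumptions.
from tensorflow.keras import layers, models

def build_fer_model(num_classes=3):
    model = models.Sequential([
        layers.Input(shape=(48, 48, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # Happy / Excited / Sad
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# X: NumPy array of face crops, y: one-hot labels (assumed to be prepared earlier)
# model = build_fer_model()
# model.fit(X, y, epochs=30, validation_split=0.2)
```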

How this project works:

  • First, open the application and CHOOSE THE MODE IN WHICH YOU WANT TO LISTEN TO THE SONG.
  • The screen then shows “YOUR MOOD, YOUR MUSIC”.
  • Press “OKAY TO CAPTURE THE IMAGE”.
  • After that, press “c” to capture the image (see the capture sketch after this list).
  • If you seem Happy, you are asked to select your favorite genre.
  • If you seem Excited, you are asked to select your favorite genre.
  • If you seem Sad, you are asked to select your favorite genre.
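The capture step could be implemented with OpenCV roughly as in the sketch below; only the “press ‘c’ to capture” behaviour comes from the steps above, while the camera index, window title, and output file name are illustrative assumptions.

```python
# Sketch of the "press 'c' to capture" step using OpenCV (cv2).
# Camera index, window name, and output path are illustrative assumptions.
import cv2

cap = cv2.VideoCapture(0)            # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("YOUR MOOD, YOUR MUSIC", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):              # capture the current frame
        cv2.imwrite("capture.jpg", frame)
        break
    if key == ord("q"):              # quit without capturing
        break
cap.release()
cv2.destroyAllWindows()
```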

CODE DESCRIPTION

  • Importing all required libraries.
  • Model initialization and building.
  • Splitting the data into training and test sets, and testing.
  • Training our model.
  • Model building, splitting of the train and test sets, and training of the model.
  • Saving the model.
  • Loading a saved model.
  • Saving the image with OpenCV after cropping, loading it, and then making the prediction (see the sketch after this list).
  • Suggesting songs in offline mode.
  • Suggesting songs online (YouTube).
  • The rest of the GUI.
  • Variable Explorer.
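The saving/loading, crop-and-predict, and online-suggestion items above could be sketched roughly as follows. The file names, label order, Haar-cascade face detector, and the use of the webbrowser module for the YouTube mode are assumptions for illustration, not necessarily what the project code does.

```python
# Sketch of saving/loading the model, cropping the face, predicting the mood,
# and opening a YouTube search in online mode. File names, label order, and
# the webbrowser-based "online mode" are illustrative assumptions.
import cv2
import numpy as np
import webbrowser
from tensorflow.keras.models import load_model

# model.save("fer_model.h5")                 # saving a trained model
model = load_model("fer_model.h5")           # loading the saved model

LABELS = ["Happy", "Excited", "Sad"]         # assumed label order

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("capture.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Assumes at least one face is detected; take the first one.
x, y, w, h = face_cascade.detectMultiScale(gray, 1.3, 5)[0]
face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0   # crop + normalise
mood = LABELS[int(np.argmax(model.predict(face.reshape(1, 48, 48, 1))))]

# Online mode: open a YouTube search for the detected mood and chosen genre.
genre = "pop"
webbrowser.open(f"https://www.youtube.com/results?search_query={mood}+{genre}+songs")
```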

IPython Console

  • Importing Libraries
  • Model Training
  • Model Summary
  • Online Mode
  • Offline Mode

GUI

  • Splash Screen
  • Main Screen
  • Selection screen
  • Song list screen: select a song and it will play (a minimal GUI sketch follows this list).
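As a rough illustration of the splash-screen-to-main-screen flow, assuming the GUI is built with Tkinter (the toolkit is not stated above, so this is an assumption), widget text and timings below are illustrative only.

```python
# Minimal Tkinter sketch of the splash -> main screen flow (assumed toolkit).
import tkinter as tk

def show_main_screen(root, splash):
    splash.destroy()
    main = tk.Frame(root)
    main.pack(fill="both", expand=True)
    tk.Label(main, text="YOUR MOOD, YOUR MUSIC", font=("Arial", 18)).pack(pady=20)
    tk.Button(main, text="OKAY TO CAPTURE THE IMAGE").pack(pady=10)  # capture callback goes here

root = tk.Tk()
root.title("MOODIFY")
splash = tk.Frame(root)
splash.pack(fill="both", expand=True)
tk.Label(splash, text="MOODIFY", font=("Arial", 24)).pack(pady=40)
root.after(2000, lambda: show_main_screen(root, splash))  # show splash for 2 seconds
root.mainloop()
```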

Summary

We successfully built a model for Facial Emotion Recognition (FER) and trained it, achieving an average accuracy of over 75% across various test sets. We then built a desktop application that suggests songs on the basis of the user's facial expression, completing our project. This FER model can be widely used for various purposes such as home automation, social media, and e-commerce, and we are motivated to take this project to the next level.

Download the complete project code and report for MOODIFY – Suggestion of Songs on the basis of Facial Emotion Recognition.

Audio Classification of Cats and Dogs – Python Project

Our audio classification project illustrates a straightforward audio classification model based on deep learning. We address the problem of classifying the type of sound from short audio signals and the spectrograms generated from them, distinguishing dog audio from cat audio during model training. To meet this challenge, we use a model based on a Convolutional Neural Network (CNN). The audio was processed with Mel-Frequency Cepstral Coefficients (MFCC) into what are commonly called Mel spectrograms, and was thus transformed into images. Our final CNN model achieved 89% accuracy on the testing dataset.

Project Overview:

The input to our model in this project is cat and dog audio recordings in WAV format. The task falls under supervised machine learning, so a dataset with target classes is used. The goal is to classify whether a given input WAV file is that of a cat or a dog. Cat and dog sounds are quite distinguishable, for example in their pitch and frequency, and different sounds have different sample rates. By default, Librosa mixes all audio down to mono and resamples it to 22,050 Hz at load time. Librosa is an open-source Python package for music and audio analysis; it provides both the audio data and the sampling rate. Audio in its raw form must be pre-processed to extract significant and meaningful features, so we applied the MFCC (Mel-Frequency Cepstral Coefficients) algorithm. After feature extraction, the data is fed to the model and the dataset is split into training and test sets. After this preprocessing, a Convolutional Neural Network model is designed using TensorFlow; the Keras API was used for all the code and model building, implemented in Google Colab.
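A minimal sketch of the Librosa loading and MFCC extraction step described above might look like this; the file path and the number of MFCC coefficients are illustrative assumptions.

```python
# Sketch of loading a WAV file with Librosa and extracting MFCC features.
# File path and n_mfcc are illustrative assumptions.
import librosa
import numpy as np

path = "cat_1.wav"                                        # example input file
signal, sr = librosa.load(path)                           # mono, resampled to 22050 Hz by default
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)   # shape: (40, n_frames)
features = np.mean(mfcc.T, axis=0)                        # one 40-dim feature vector per clip
print(features.shape)                                     # (40,)
```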

Motivation

Machine learning can be used in image processing, speech understanding, musical instrument recognition, speech-to-text, environmental sound classification, and many more areas. For our project, we implemented one class of speech processing, namely audio classification: converting sound waves into audio features and spectrograms, which are visual representations of frequencies, with the help of functions provided by machine learning libraries.

There are many techniques to classify images, as many ready-made CNN-based neural network architectures already exist, especially for image data. It is straightforward to extract features from images because images already come in the form of numbers: an image is a collection of pixels, and pixels are numbers. When we have data as text, we use sequential encoder- and decoder-based techniques to find features. But sound or audio recognition is more difficult than text because it is based on frequency and time. Therefore, a proper model has to be built to extract the frequency and pitch of the audio so that it becomes easier to recognize later.
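To make that last point concrete, a CNN of the kind used here could be defined roughly as below, treating the MFCC/Mel-spectrogram matrix as a single-channel image. The input shape and layer sizes are assumptions for illustration, not the project's exact architecture.

```python
# Sketch of a Keras CNN that classifies MFCC/Mel-spectrogram "images" as cat vs dog.
# Input shape and layer widths are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(40, 173, 1)),        # (n_mfcc, n_frames, 1), assumed dimensions
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(2, activation="softmax"),   # cat vs dog
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```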

Flow Chart:

Preliminaries and Background 

Related work

Machine learning: Image classification of cats and dogs – A decade ago, many problems in computer vision had saturated in terms of their accuracy. However, the accuracy on these problems improved significantly with the rise of deep learning techniques. Image classification is defined as predicting the distinct category an image belongs to. For the given input image, with the aim of achieving high precision, a state-of-the-art approach was used: a convolutional neural network was built for the image classification task of dogs and cats. The dataset was taken from Kaggle and comprised a total of 25,000 images of dogs and cats.

Machine learning: Audio classification of different bird species – Here, the methodology and results of using deep learning to assist in the classification of birds by their sounds are presented. As birds indicate the health of an ecosystem, this topic is of high importance. Random Forest classification and six custom CNN models from the literature were run on a dataset of ten bird species compiled from xeno-canto.org. The highest accuracy achieved was around 65% with the Random Forest and about 58% with the CNN model.

Conclusion and Future Work

In this report, we first briefly explained the overview of this project and discussed some related, already established work. Then we precisely described our task, including the learning task and the performance task. After that, we explained the approach we took to classify the dataset. The model we used is a neural network, a trainable deep-learning model with which we were able to classify the dog and cat audio. The highest accuracy we achieved was 89.6%.

  1. In the future, we will try to implement different high-level models in order to achieve much higher accuracy.
  2. We will build a system that can directly take in live raw audio as input.