Automatic Video Surveillance System AI & IoT Project

Surveillance is an integral part of security. The main objective of this Video Surveillance System IoT project is to build an effective system that can be used across different domains and technologies. The system detects human intrusion and breaches in the user's personal or commercial property in real time using AI and IoT.

It helps users secure their property with the help of advanced artificial intelligence. The resulting system is fast and accurate, giving users a more secure surveillance system.

For the most part, surveillance means watching out for something undesirable to happen. The aim of the application is a system that monitors in real time and alerts security when a human is detected on the user's property in their absence.

OBJECTIVES:

The main objective is to build an effective Video Surveillance System that can be used across different domains and technologies. The system is used to detect people trying to breach security in the personal or commercial property of the user in real time and send a message along with a short video clip to the user. 

PURPOSE OF EXISTING SYSTEM:

Currently, existing surveillance systems can keep video recordings of homes, offices, banks, and so on. But that is useful only after an incident or robbery has happened; no real-time alert is raised while a breach is in progress.

Just imagine: you're at home and someone breaks in and steals money or goods from your office or property. Or consider that you're out of town for a few days and there is a robbery at your home. Only after you come back, or after someone else notices it, will you learn about the breach at your place.

You can act only after a breach has happened, not while it is happening. Our system resolves that issue with real-time monitoring and updates.

SCOPE OF SYSTEM:

The Video Surveillance System can be implemented in any residential, industrial, or commercial property. The system detects any human intrusion on the user's property and immediately sends a notification along with a short (about 10-second) video clip as soon as it detects a human.

PROBLEM DEFINITION:

This Video Surveillance System project aims to develop an advanced surveillance system that keeps monitoring homes, offices, banks, and so on. With its help, you can find out if anyone breaches your security in your absence. We simply integrate our system into the user's existing surveillance setup.

Module specification: 

  1. Raspberry Pi
  2. Camera
  3. Server
  4. SNS
  5. S3  

Need Of Modules:

  • Raspberry Pi as a client that sends frames to the server (a minimal client sketch follows this list).
  • Camera to capture live video streams.
  • Server for processing frames and detecting humans.
  • SNS to send a multimedia message to the user when someone tries to breach security.
  • S3 to store a short video clip of the breach and send it to the user.
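As a rough sketch of the client side (the server address, port, and /frame endpoint below are hypothetical, not part of the original design), OpenCV captures frames on the Pi and posts them to the server as JPEG:

```python
# pi_client.py - minimal sketch of the Raspberry Pi frame sender.
# Assumes a hypothetical server endpoint http://192.168.1.10:8000/frame
# that accepts JPEG-encoded frames via HTTP POST.
import time
import cv2
import requests

SERVER_URL = "http://192.168.1.10:8000/frame"  # hypothetical server address

def stream_frames(fps=5):
    cap = cv2.VideoCapture(0)  # Pi camera or USB camera
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                continue
            # Encode the frame as JPEG to keep the payload small.
            ok, jpeg = cv2.imencode(".jpg", frame)
            if ok:
                try:
                    requests.post(SERVER_URL,
                                  data=jpeg.tobytes(),
                                  headers={"Content-Type": "image/jpeg"},
                                  timeout=5)
                except requests.RequestException:
                    pass  # drop the frame if the server is unreachable
            time.sleep(1.0 / fps)  # throttle to ~5 frames per second
    finally:
        cap.release()

if __name__ == "__main__":
    stream_frames()
```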

Non-Functional Requirements:

EFFICIENCY REQUIREMENT:

With AI taking care of the property, customers can relax and not have to worry about their security.

RELIABILITY REQUIREMENT:

The system should provide a reliable environment for both the client and the server.

USABILITY REQUIREMENT :

The system is designed for a secure environment and ease of use.

IMPLEMENTATION REQUIREMENT:

Implementation of the system with a Raspberry Pi, a night-vision camera, Python, machine learning, and AI.

DELIVERY REQUIREMENT:

The whole system is expected to be delivered within four months, with a weekly evaluation by the project guide.

Limitations of the System:

False Positives

Due to varying lighting conditions and camera resolutions, the system sometimes reports a human when there is none. Such false positives can be tolerated; the real problem would be a false negative, where a human is present but goes undetected.

Limited Processing Power

As we are using a microcontroller-class device to send feeds to the server, the system cannot handle multiple feeds at once and will slow down as the number of devices increases.


Decision Model for Prediction of Movie Success Rate Data Mining J Component Project

ABSTRACT

The purpose of this Movie Success Rate Prediction project is to predict the success of any upcoming movie using Data Mining Tools. For this purpose, we have proposed a method that will analyze the cast and crew of the movie to find the success rate of the film using existing knowledge. Many factors like the cast (actors, actresses, directors, producers), budget, worldwide gross, and language will be considered for the algorithm to train and test the data. Two algorithms will be tested on our dataset and their accuracy will be checked.

 LITERATURE REVIEW

  • They developed a model to find the success of upcoming movies based on certain factors; audience numbers play a vital role in a movie becoming successful.
  • A Factorization Machines approach was used to predict movie success by predicting IMDb ratings for newly released movies, combining movie metadata with social media data.
  • The gross attribute was used as a training element for the model. The data were converted into .csv files after pre-processing.
  • Using S-PLSA to extract sentiment information from online reviews and tweets, the ARSA model predicted the sales performance of movies from sentiment information and past box-office performance.
  • A mathematical model was used to predict the success or failure of upcoming movies based on certain criteria. Their work makes use of historical data to predict the ratings of movies yet to be released.
  • According to them, Twitter is a platform that can provide geographical as well as timely information, making it a perfect source for spatiotemporal models.
  • The data they collected was gathered from Box Office Mojo and Wikipedia and comprised movies released in 2016.
  • Initially having a dataset of 3183 movies, they removed movies whose budget could not be found or that were missing key features; in the end, a dataset of 755 movies was obtained after key-feature extraction.
  • They performed useful data mining on the IMDb data and uncovered information that cannot be seen by browsing the regular web front end to the database.
  • According to their conclusion, brand power, actors, or directors aren't strong enough to affect the box office.
  • Their neural network was able to obtain an accuracy of 36.9% and, allowing for mistakes within one adjacent category, an accuracy of 75.2%.
  • They divided the movies into three classes (rise, stay, and fall), finding that the SMO support vector machine can give up to 60% correct predictions.
  • The data was taken from the Internet Movie Database (IMDb) and covered the years 1945 to 2017.
  • A more accurate classifier is also well within the realm of possibility and could even lead to an intelligent system capable of making suggestions for a movie in preproduction, such as a change of director or actor likely to increase the rating of the resulting film.
  • In this study, a movie investor assurance system (MIAS) was proposed to aid movie investment decisions at the early stage of movie production. MIAS learns from freely available historical data derived from various sources and tries to predict movie success based on profitability.
  • The data they gathered from movie databases was cleaned, integrated, and transformed before the data mining techniques were applied.
  • They used feature extraction techniques and polarity scores to create a list of successful or unsuccessful movies, gathering the data from IMDb and YouTube.

PROBLEM STATEMENT

In this Movie Success Rate Prediction project, using the ratings of films by their cast and crew is an innovative way to address a film producer's dilemma. Producers often have trouble casting successful actors and directors while still keeping to a budget. Looking at the average ratings of each actor and director, together with all the films they participated in, should give the producer a good idea of whom to cast in an upcoming film.

Implementation:

  • Data Preprocessing & Correlation Analysis
  • Application of Decision Tree Algorithm
  • Application of Random Forest Algorithm
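A hedged sketch of the two classification steps with scikit-learn; the file and column names below are hypothetical placeholders for the preprocessed IMDb data:

```python
# Sketch: comparing Decision Tree and Random Forest on the movie dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("movies_preprocessed.csv")   # hypothetical file name
X = df.drop(columns=["success"])              # hypothetical label column
y = df["success"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          accuracy_score(y_test, model.predict(X_test)))
```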

RESULTS & CONCLUSION

After testing both algorithms (Decision Tree and Random Forest) on the IMDb dataset, we found that the Random Forest algorithm achieved better accuracy (99.6%) than the Decision Tree algorithm, which obtained just 60% accuracy.

Predict the Forest Fires Python Project using Machine Learning Techniques

Predict the Forest Fires Python Project using Machine Learning Techniques is a summer internship report submitted in partial fulfillment of the requirements for the undergraduate degree of Bachelor of Technology in Computer Science Engineering. I submit this industrial training workshop entitled “PREDICT THE FOREST FIRES” to the University, Hyderabad, in partial fulfillment of the requirements for the award of that degree.

Apart from my own effort, the success of this internship largely depended on the encouragement and guidance of many others. I take this opportunity to express my gratitude to the people who helped me in the successful completion of this internship.

I would like to thank the respected faculties who helped me to make this internship a successful accomplishment.

I would also like to thank my friends who helped me to make my work more organized and well-stacked till the end.

OBJECTIVE OF THE PROJECT:

This is a regression problem with clear outliers that cannot be predicted using any reasonable method. A comparison of three methods has been done:

(a) Random Forest Regressor,
(b) Neural Network,
(c) Linear Regression

The output ‘area’ was first transformed with an ln(x+1) function.

Two regression metrics were measured: RMSE and the r2 score. An analysis of the regression error characteristic (REC) curve shows that the RFR model predicts more examples within a lower admitted error; in effect, the RFR model is better at predicting small fires. The r2 score is obtained using Linear Regression.
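As an illustration of the transform and the two metrics (the numbers below are made-up sample values, not project results):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

area = np.array([0.0, 0.9, 4.6, 10.7, 746.3])   # sample raw 'area' values
y = np.log1p(area)                               # ln(area + 1) transform

y_pred = np.array([0.1, 0.5, 1.5, 2.2, 5.0])     # hypothetical predictions
rmse = np.sqrt(mean_squared_error(y, y_pred))    # root mean squared error
r2 = r2_score(y, y_pred)                         # coefficient of determination
print(f"RMSE={rmse:.3f}  r2={r2:.3f}")
# np.expm1 inverts the transform to recover the predicted burned area.
```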

Best Algorithm for the project:

The best model is the Random Forest Regressor, which achieves an RMSE value of 0.628; its hyperparameters were tuned using GridSearchCV.

Scikit-learn has the functionality of trying a bunch of combinations and seeing what works best, built-in with GridSearchCV. The CV stands for cross-validation.
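A minimal GridSearchCV sketch for the Random Forest Regressor, assuming the UCI file is saved locally as forestfires.csv; the parameter grid is illustrative, not the one used in the report:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

df = pd.read_csv("forestfires.csv")              # UCI forest fires data
X = pd.get_dummies(df.drop(columns=["area"]))    # dummy-encode month/day
y = np.log1p(df["area"])                         # ln(area + 1) target

param_grid = {                                   # illustrative grid only
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
}
search = GridSearchCV(RandomForestRegressor(random_state=42),
                      param_grid,
                      scoring="neg_root_mean_squared_error",
                      cv=5)                      # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_, "CV RMSE:", -search.best_score_)
```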

MODEL BUILDING

PREPROCESSING OF THE DATA:

Preprocessing of the data involves the following steps:

GETTING THE DATASET:

We can get the data from the client or from a database. Here we use the UCI forest fires dataset:
https://archive.ics.uci.edu/ml/datasets/forest+fires

IMPORTING THE LIBRARIES:

We have to import the libraries as per the requirement of the algorithm.

IMPORTING THE DATA SET:

Pandas provides a convenient method, read_csv(). The read_csv function reads the entire dataset from a comma-separated values file, and we can assign it to a DataFrame on which all operations can be performed. It lets us access every row and every column, and every value can be accessed through the DataFrame. Any missing or NaN values have to be cleaned.
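For example, assuming the UCI file is saved locally as forestfires.csv:

```python
import pandas as pd

df = pd.read_csv("forestfires.csv")   # load the CSV into a DataFrame
print(df.shape)                       # number of rows and columns
print(df.head())                      # first five rows
print(df.isnull().sum())              # count missing/NaN values per column
```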

HANDLING MISSING VALUES:

OBSERVATION:

As we can see, there are no missing values in the given forest fires dataset.

DATA VISUALIZATION:

  • scatterplots and distributions of numerical features to see how they may affect the output ‘area’
  • Boxplot of how categorical column day affects the outcome
  • Boxplot of how categorical column month affects the outcome

CATEGORICAL DATA:

  • Machine learning models are based on equations, so we need to replace text with numbers in order to include those values in the equations.
  • Categorical variables are of two types: nominal and ordinal.
  • Nominal: the categories do not have any numeric ordering between them, and no ordered relationship exists between them. Examples: male or female, any color.
  • Ordinal: the categories have a numerical ordering between them. Examples: Graduate is less than Post Graduate, Post Graduate is less than Ph.D.; customer-satisfaction ratings (low, medium, high).
  • Categorical data can be handled by using dummy variables, which are also called indicator variables.
  • Handling categorical data using dummies: the pandas library has a method called get_dummies() that creates dummy variables for categorical data in the form of 0s and 1s (see the sketch after this list).
  • Once these dummies are created, we concatenate the dummy set to our DataFrame.
  • Categorical data: column 'month'
  • Dummy set for column 'month'
  • Categorical data: column 'day'
  • Dummy set for column 'day'
  • Concatenating dummy sets to the DataFrame
  • Getting dummies using LabelEncoder from the scikit-learn package
  • Scikit-learn provides a LabelEncoder method; we import it from the scikit-learn package and then fit and transform the DataFrame to turn the categorical data into dummies.
  • If we use this method to get dummies, then in place of categorical data we get numerical values (0, 1, 2, …).
  • Importing LabelEncoder and OneHotEncoder
  • Handling categorical data of column 'month'
  • Handling categorical data of column 'day'
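A short sketch of both approaches, continuing from the df loaded in the read_csv example above:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# df is the forest fires DataFrame loaded earlier with pd.read_csv.
# Approach 1: get_dummies creates 0/1 indicator columns for month and day.
dummies = pd.get_dummies(df[["month", "day"]])
df_dummies = pd.concat([df.drop(columns=["month", "day"]), dummies], axis=1)

# Approach 2: LabelEncoder replaces each category with an integer code.
le_month, le_day = LabelEncoder(), LabelEncoder()
df["month"] = le_month.fit_transform(df["month"])
df["day"] = le_day.fit_transform(df["day"])
```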

TRAINING THE MODEL:

  • Splitting the data: after preprocessing is done, the data is split into a train set and a test set.
  • In machine learning, in order to assess the performance of the classifier, you train it using a 'training set' and then test it on an unseen 'test set'. An important point is that during training the classifier only uses the training set; the test set must not be used during training and is only available during testing.
  • Training set: a subset used to train the model (the model learns patterns between input and output).
  • Test set: a subset used to test the trained model (to check whether the model has learned correctly).
  • The split percentage can be specified as needed (e.g., train data = 75%, test data = 25%, or train data = 80%, test data = 20%).
  • First we need to identify the input and output variables and separate the input set from the output set.
  • The scikit-learn library has a package called model_selection in which the train_test_split method is available; we need to import this method (see the sketch after this list).
  • This method splits the input and output data into train and test portions based on the percentage specified by the user and assigns them to four different variables, which we need to name.
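For example, with a 75/25 split on the dummy-encoded DataFrame from the previous step:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Separate input variables (X) and output variable (y = transformed 'area'),
# using the dummy-encoded DataFrame df_dummies from the previous step.
X = df_dummies.drop(columns=["area"])
y = np.log1p(df_dummies["area"])

# Four variables receive the split results; 75% train, 25% test here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)
```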

 EVALUATING THE CASE STUDY:

Building the model (using splitting):

First, we have to retrieve the input and output sets from the given dataset

  • Retrieving the input columns
  • Retrieving output column

MODEL BUILDING:

  • Defining Regression Error Characteristic (REC)


Audio Classification On Cat’s And Dog’s Python Project

Our Audio Classification project illustrates a straightforward audio classification model based on deep learning. We address the problem of classifying the type of sound from short audio signals and the spectrograms generated from them, classifying dog audio versus cat audio during model training. To meet this challenge, we use a model based on a Convolutional Neural Network (CNN). The audio was processed with Mel-Frequency Cepstral Coefficients (MFCC) into what are commonly called Mel spectrograms and hence transformed into images. Our final CNN model achieved 89% accuracy on the testing dataset.

Project Overview :

The input to our model in this project is cat and dog audio recordings in WAV format. The task falls under supervised machine learning: a dataset is present along with a target class, and the intention is to classify whether a given input WAV file is that of a cat or a dog. Dog and cat sounds are quite distinct in their pitch and frequency levels, and different recordings have different sample rates. By default, Librosa mixes all audio to mono and resamples it to 22050 Hz at load time. Librosa is an open-source Python package for music and audio analysis; it provides the audio data and the sampling rate. Audio in its raw form must be pre-processed to extract significant and meaningful features, so we applied the MFCC (Mel-Frequency Cepstral Coefficients) algorithm. Once audio feature extraction is done, the dataset is split into a training set and a test set. After the preprocessing, a Convolutional Neural Network model is designed using TensorFlow. All code and model building used the Keras API on Google Colab.
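A minimal sketch of the Librosa loading and MFCC extraction step (the file path and the choice of 40 coefficients are illustrative):

```python
import numpy as np
import librosa

# Librosa loads audio as mono and resamples to 22050 Hz by default.
signal, sample_rate = librosa.load("data/cat_1.wav")   # hypothetical path

# Extract 40 MFCCs per frame, then average over time to get a
# fixed-length feature vector suitable for the classifier.
mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=40)
features = np.mean(mfcc.T, axis=0)
print(features.shape)   # (40,)
```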

Motivation

Machine learning can be used in image processing, speech understanding, musical instruments, speech-to-text, environmental sound classification, and much more. For our project, we implemented a branch of speech processing: audio classification, converting sound waves into audio features and spectrograms, which are visual representations of frequencies, with the help of functions provided by machine learning libraries.

There are many techniques to classify images, as many ready-made neural network architectures under CNN already exist, especially for images. It is straightforward to extract features from images because images already come in the form of numbers: an image is a collection of pixels, and pixels are numbers. When we have text data, we use sequential encoder- and decoder-based techniques to find features. But sound recognition is more difficult than text because it is based on frequency and time; therefore a proper model must be built to extract the frequency and pitch of the audio so that it is easier to recognize later.

Flow Chart:

Preliminaries and Background 

Related work

Machine learning: Image classification of cats and dogs – A decade ago in computer vision, many problems had plateaued in accuracy. That accuracy improved significantly with the rise of deep learning techniques. Image classification is defined as predicting the distinct categories an image can belong to. Hence, to achieve high precision for the given input images, a state-of-the-art approach was incorporated: a convolutional neural network was built for the image-classification task of dogs and cats. The dataset came from Kaggle, comprising a total of 25000 images of dogs and cats.

Machine learning: Audio classification of different bird species – Here, the methodology and results of using deep learning to assist in the classification of birds by their sounds are presented. As birds indicate the health of an ecosystem, this topic is of high importance. Random Forest classification and six custom CNN models from the literature were evaluated on a dataset of ten bird species compiled from xeno-canto.org. The highest accuracy achieved was around 65% for the Random Forest and about 58% for the CNN model.

Conclusion and Future Work

In this report, we first briefly explained the overview of this project and cited some related work. Then we described our task precisely, including the learning task and the performance task. After that, we explained the approach we took to classify the datasets: a convolutional neural network, a trainable deep model with which we were able to classify dog and cat audio. The highest accuracy we obtained was 89.6%.

  1. In the future, we will try to implement different high-level models in order to achieve much higher accuracy.
  2. We will build a system that can directly take in a live raw audio stream.

Fake Disaster Tweet Detection Web-App Python Machine Learning Project

This project, “Fake Disaster Tweet Detection”, aims to predict whether a tweet is fake or real. It uses the Multinomial Naïve Bayes approach to detect fake or real tweets using an existing dataset available on Kaggle. The classifier is trained only on text data. Traditionally, text analysis is performed using Natural Language Processing (NLP), a field under Artificial Intelligence whose main focus is letting computers understand and process human language. NLP helps recognize and predict diseases from speech and supports sentiment analysis, cognitive assistants, spam detection, the healthcare industry, and more. In this project, the training data is pre-processed and sent to the classifier, and the classifier then predicts whether a tweet is real or fake.

This project was made in a Jupyter Notebook, which is part of Anaconda Navigator, and ran successfully there. The dataset was loaded into the notebook, along with all the extra Python packages required for the project. The model is also deployed successfully using HTML, CSS, Python, and Flask.

The accuracy score on test data is 77.977%. The average recall value is 0.775 and the average precision score is 0.775. Precision measures the number of correct positive predictions made by the model; recall measures the number of correct positive predictions made out of all the positive predictions that could have been made.

System Design

System Flowchart

Problem: To detect whether a disaster tweet is fake or real using a machine learning algorithm, applying concepts from Natural Language Processing.

Identification of data: In this project, I have used a dataset from a Kaggle competition based on Natural Language Processing. This project works only on text data. The dataset has five columns:

  1. Id: It tells the unique identification of each tweet
  2. Text: It tells the tweet in text form
  3. Location: It tells the place from where the tweet was sent and it can be blank
  4. Keyword: It tells a particular word in the tweet and it can be blank
  5. Target: It tells the actual value of the tweet, whether it is a real tweet or fake

Data-preprocessing: First, preprocessing is done on the dataset, which includes removal of punctuation, URLs, digits, non-alphabetic characters, and contractions, followed by tokenization, stopword removal, and Unicode removal. Then lemmatization is done on the dataset. After preprocessing, CountVectorizer is used to convert the text data into numerical data, as the classifier only works on numerical data (a condensed sketch of this pipeline follows below). The dataset is then split into 70% training data and 30% test data.
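A condensed sketch of this cleaning pipeline, assuming NLTK (the regex patterns and helper name are illustrative):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads: nltk.download('punkt'), nltk.download('stopwords'),
# nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean_tweet(text):
    text = re.sub(r"http\S+", "", text)       # remove URLs
    text = re.sub(r"[^a-zA-Z\s]", "", text)   # keep alphabetic characters only
    tokens = word_tokenize(text.lower())       # lowercase and tokenize
    tokens = [lemmatizer.lemmatize(t) for t in tokens
              if t not in stop_words]          # drop stopwords, lemmatize
    return " ".join(tokens)

print(clean_tweet("Forest fire near La Ronge Sask. Canada http://t.co/x"))
```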

Definition of Training Data: The training dataset which contains 70% of the whole dataset is used for training the model.

Algorithm Section: In this project, the Multinomial Naïve Bayes classifier algorithm is used to detect whether disaster tweets are fake or real.

Evaluation with test set: Several text samples are passed through the model to check whether the classification algorithm gives the correct result or not.

Prediction Model

Implementation Work Details

The data-set which is used in this project “Fake disaster tweet detection” is taken from the Kaggle competition “Natural Language Processing with Disaster Tweets”. The data set contains 7613 samples. This project works only on text data. It has five columns:

  • Id: It tells the unique identification of each tweet
  • Text: It tells the tweet in text form
  • Location: It tells the place from where the tweet was sent and it can be blank
  • Keyword: It tells a particular word in the tweet and it can be blank
  • Target: It tells the actual value of the tweet, whether it is a real tweet or fake

Step 2: Data-Preprocessing

  1. Removing punctuation: punctuation marks are removed with the help of Python code.
  2. Removing URLs, digits, non-alphabetic characters, and underscores: True means the text contains HTTP (a URL), and False means it does not.
  3. Removing contractions: words written in short form are expanded, e.g., can’t becomes cannot, I’ll becomes I will.
  4. Lowercasing the text, tokenizing it, and removing stopwords: tokenizing means splitting the text into a list of tokens; stopwords are words that do not add meaning to the text.
  5. Lemmatizing: converts any word into its root form, e.g., running and ran become run.
  6. CountVectorizer:

Text cannot be used directly to train our model; it has to be converted into numbers that the computer can understand, and for this purpose CountVectorizer is used in this project. CountVectorizer counts the number of times each word appears in a document. It works as follows:

Step 1: It first identifies the unique words in the complete dataset.

Step 2: It then creates an array of zeros for each sample, of the same length as the vocabulary identified above.

Step 3: It then takes each word in turn and finds its occurrences in each sample in the dataset; the number of times the word appears in a sample replaces the zero positioned at that word's index in the list. This repeats for every word.
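A small sketch of CountVectorizer on two toy samples:

```python
from sklearn.feature_extraction.text import CountVectorizer

samples = ["earthquake hits the city", "the city marathon was fun"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(samples)    # sparse count matrix

print(vectorizer.get_feature_names_out())     # unique words (vocabulary)
print(counts.toarray())                       # one row of counts per sample
```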

Step 3: Model Used:

In this project, the Multinomial Naïve Bayes approach is used for detecting fake or real tweets from the existing dataset available on Kaggle. The Naïve Bayes classifier is based on Bayes' theorem and assumes conditional independence between every pair of features.
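A hedged end-to-end sketch; the file name train.csv and the text/target columns follow the Kaggle competition's format:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("train.csv")                    # Kaggle disaster tweets
X = CountVectorizer().fit_transform(df["text"])  # text -> count features
y = df["target"]                                 # 1 = real, 0 = fake

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)       # 70/30 split as above

clf = MultinomialNB().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```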

System Testing


To evaluate the machine learning model, we normally use classification accuracy, which is the number of correct predictions divided by the total number of predictions.

This accuracy measure works well when there is an equal number of samples belonging to each class in the dataset. The accuracy score on test data is 77.977%. The average recall value is 0.775 and the average precision score is 0.775. Precision measures the number of correct positive predictions made by the model; recall measures the number of correct positive predictions made out of all the positive predictions that could have been made.

  • Precision = True Positives / (True Positives + False Positives)
  • Recall = True Positives / (True Positives + False Negatives)

Conclusion

In this project, only one classification algorithm is used: Multinomial Naïve Bayes. First, preprocessing is done on the dataset, including removal of punctuation, URLs, digits, non-alphabetic characters, and contractions, followed by tokenization, stopword removal, and Unicode removal. Then lemmatization is done on the dataset. After preprocessing, CountVectorizer converts the text data into numerical data, as the classifier only works on numerical data. The dataset is then split into 70% training data and 30% test data. The accuracy score on test data is 77.977%; the average recall value is 0.775 and the average F1 score is 0.775.

Future Scope

In the future, other classification algorithms can be tried on this dataset, such as KNN, Support Vector Machine (SVM), and Logistic Regression; even deep learning algorithms, which give very high accuracy, can be used. Vectorizing can be done using other methods like word2vec, the Tf-Idf vectorizer, etc.


Body Fitness Prediction using Random Forest Classifier Project

Purpose of the Project

To avoid several health issues, we should monitor our body fitness using various fitness-prediction gadgets like smartwatches, oximeters, and B-P machines. We can monitor our blood pressure, calories burnt, bone weight, etc.; the devices work with smart-device technology to exchange data via the Bluetooth communication protocol.

Existing problem

Body fitness prediction plays a key role in leading a healthy life. Fitness is a state of health and well-being, more specifically the ability to perform daily activities. Body fitness is generally achieved through proper nutrition, physical exercise, and rest; without these, we lose body fitness, which leads to various chronic issues.

Proposed solution

Importing Dataset

Exploratory Data Analysis: df.shape

Here, in this project, we import the data, which consists of date, step count, mood, calories burned, hours of sleep, bool of activity, and weight in kg, and split the dataset into a testing set and a training set. We are using a Random Forest classifier in this project.

EXPERIMENTAL INVESTIGATIONS

Dataset:

We will use the body fitness prediction dataset which was retrieved from Kaggle.com.

  • Check if there are associations between physical activity (step counts), caloric expenditure, body weight, hours of sleep, and the feeling of being active and/or inactive.
  • Compare caloric expenditure between the categories of mood and self-perceived activity (active and inactive).
  • Compare the hours of sleep between the categories of mood and self-perceived activity (active and inactive).
  • Compare body weight between categories of self-perceived activity (active and inactive).
  • Database: the database has 96 observations and 7 columns. Its quantitative variables are number of steps (step_count), caloric expenditure (calories_burned), hours of sleep (hours_of_sleep), and body weight (weight_kg). Its qualitative variables are date (date), mood (mood), and self-perceived activity, active or inactive (bool_of_active). The mood variable was assigned the value 300 to mean "Happy", 200 for "Neutral", and 100 for "Sad"; the self-perceived activity variable was coded similarly.
  • Contingency tables of the categorical variables will be presented.
  • A correlation matrix between variables will be presented.
  • Bar charts and violin plots will demonstrate the distribution of quantitative variables by category.
  • Scatter plots will be used to analyze possible linear relationships between two variables.

RESULT

Output figures: the Random Forest Classifier results, the correlation plot, and the final body fitness prediction output.

APPLICATIONS

There are so many different kinds of applications used to predict the fitness of human beings today.

TRAINING AND TESTING:

Splitting the data:

We use the train_test_split function from the sklearn.model_selection module for the training and testing split.

Dependent and Independent variables:

Independent variables are the variables on which the bool of activity depends.

The dependent variable is the variable whose value is determined by the other variables.

The independent variables are mood, step_count, calories_burned, hours_of_sleep, and weight_kg.

The dependent variable is bool_of_active.

MODEL BUILDING:

We use a Random Forest classifier for body fitness prediction because it gives accurate predictions.
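A minimal sketch under these assumptions (the CSV file name is hypothetical; the column names follow the dataset description above):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("fitness.csv")                  # hypothetical file name
X = df[["mood", "step_count", "calories_burned",
        "hours_of_sleep", "weight_kg"]]          # independent variables
y = df["bool_of_active"]                         # dependent variable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```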

CONCLUSION

We have analyzed the body fitness prediction data and used machine learning to predict the fitness of a human being. We used a Random Forest classifier and its variations to make predictions and compared their performance. The XGBoost regressor has the lowest RMSE and is a good choice for this problem.

Intelligent Customer Help Desk Python and Node-Red Project

Project Summary:

In this Intelligent Customer Help Desk project, we need to create a chatbot application that can answer questions that fall outside the scope of the pre-determined question set.

This can be done using a chatbot that will use the intelligent document understanding feature of Watson Discovery. 

Project Requirements:

IBM Cloud, IBM Watson, Python, Node-Red.

Project Scope:

In this Python and Node-Red project, we first need to create a website using HTML code. Next, we create a chatbot with the help of IBM Watson Assistant and Watson Discovery.

Using Node-Red we need to build a web application that integrates all services and deploys the same on the IBM cloud.

This project will answer all queries of the user and if any question falls outside the scope of the predetermined question set then this project will use the Smart Document Understanding feature of Watson Discovery to train it on what text in the owner’s manual is important and what is not.

This will improve the answers returned from the queries.

AI-Powered News Articles Search Web Application using IBM Cloud and Slack Bot

Purpose

The purpose of this News Articles Search project is to develop a web application that fulfills our need to find relevant, recent news articles and update them regularly. After the Discovery service is integrated with a Slack workspace, a bot serves as an intermediary to search news by keyword. In addition, the web application analyzes the sentiment present in each news article and extracts keywords and concepts, presenting the results in an attractive, understandable format so the user can see what is important and what is not.

Literature Survey

Existing Problem

The news article applications currently in use confuse users: with too many functions and overloaded designs, these applications still do not fulfill the demands of news readers, and they often return results from past days, weeks, and months, which only confuses users further. Also, these apps provide no way to know the approximate feeling of the audience regarding an article or news topic, which makes them less interactive and leaves them with a very low number of users.

Proposed Solution

Using the Discovery service available in the IBM Cloud, we create a web app to get the latest, most relevant news results quickly in a user-friendly way. When integrated with a Node-Red flow, the IBM Discovery service can drive a simple, engaging, organized user interface that provides users with relevant news articles, as the Discovery service continuously crawls the web for the latest news. By adding sentiment analysis, we make the user interface more interactive, easier to understand, and able to attract more users.

Project Tasks

1. Creating and deploying the Watson discovery news app locally.
2. Integrating Slack-bot with Watson Discovery.
3. Creating node-red user Interface.
4. Integrating node-red UI with Watson Discovery.

Flow Chart:

Experimental Investigation

First, we use the Discovery service to configure and query our added collection. A Node-Red application is created in which Discovery is integrated, and a simple flow of 5 nodes is built to enter a news topic and show related news results. Slack is then integrated with the Watson Discovery service so that news articles can be searched on more than one platform, and finally sentiment analysis is performed on the searched data/news articles.

Advantages and Disadvantages

1. The News Articles Search web application provides interactive sentiment analysis.
2. It can be accessed through more than one platform, such as Slack.
3. It collects and delivers the most recent data.
4. It does not have additional features like storing news history.
5. It does not provide a stand-alone app but rather uses a web application.

Applications

1. This News Articles Search web application can be used by any user in need of accurate and fast results.
2. Can be used by firms and organizations.
3. Can be used in the stock market to make predictions.

Bot on slack

Conclusion

This News Articles Search project gives some basic working knowledge of the Watson Discovery service and shows how to use Discovery along with JavaScript and Node.js to build your own news-mining web application. It also gives insight into real-world applications of AI and helps us understand Slack better.

Future Scope

1. The IBM Cloud and Slack Bot web application can be integrated with the cloud and made into a mobile app to use on the go.
2. Additional sentiments can be added to the UI.
3. Related and trending news topics can be shown to the user.


Development of Speech Recognition AI Project with Python

Methodology

This section describes the working of the Speech Recognition Python project: the design and development of a speech recognition AI assistant in Python using NLP, PLP, and deep neural networks.

Speak – The assistant speaks the introduction, the output, and other responses according to the code. It uses the laptop microphone to hear input from the user, recognizes what was said, matches it against the code words, and if anything matches, it shows the output.

Wish Me – The assistant speaks the greeting included in the introduction, wishing good morning, good afternoon, or good evening depending on the current time. It wishes good morning from 04:00 to 11:59, good afternoon from 12:00 to 17:59, and good evening from 18:00 to 03:59.
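A minimal sketch of this greeting logic; pyttsx3 is an assumption here, since the post does not name its text-to-speech library:

```python
import datetime
import pyttsx3

engine = pyttsx3.init()   # text-to-speech engine (assumed library: pyttsx3)

def speak(text):
    engine.say(text)
    engine.runAndWait()

def wish_me():
    hour = datetime.datetime.now().hour
    if 4 <= hour < 12:        # 04:00 to 11:59
        speak("Good morning!")
    elif 12 <= hour < 18:     # 12:00 to 17:59
        speak("Good afternoon!")
    else:                     # 18:00 to 03:59
        speak("Good evening!")
```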

Take Command – The assistant takes microphone (speech) input from the user and returns string output. It is subdivided into several parts, described below. Listening – The assistant opens the microphone and tries to hear what the user wants to convey.

Recognizing – The assistant tries to recognize the input spoken by the user and then checks in the code whether the recognized word is present. If the input matches, it shows the output; otherwise it speaks "Say that again please", meaning the user should repeat the input. If the word is correctly recognized, the assistant follows the instructions assigned to it.

Wikipedia – If the word is recognized as "Wikipedia", the assistant searches Wikipedia according to the input given by the user. For example, if we say "Narendra Modi Wikipedia", the assistant speaks "Searching Wikipedia Narendra Modi", then "According to Wikipedia…" followed by the details of that particular person.

YouTube – If the word is recognized as "YouTube", the assistant opens the default web browser at the link "youtube.com".

Google – If the word is recognized as "Google", the assistant opens the default web browser at the link "google.com".

Train Information – If the word is recognized as "Train info", the assistant fetches details from a CSV file, returns the details of all the trains, and displays them on the terminal.

Stack Overflow – If the word is recognized as "Stack Overflow", the assistant opens the default web browser at the link "stackoverflow.com".

Play Music – If the word is recognized as "Play Music", the assistant searches for .mp3 or .mp4 files in the default path of the device provided by the programmer. For example, if we say "Play Music", the assistant searches a path like "D:\\Non Critical\\songs\\Favourite Songs2" and plays that particular song.

The Time – If the word is recognized as "The Time", the assistant checks the real time on the device and speaks it in the form "HH:MM:SS". For example, if the time is 08:14:21 P.M., it speaks "Sir, the time is 20HH:14MM:21SS".

Open Code – If the word is recognized as "Open Code", the assistant searches for a .java or .py file in the default path of the device provided by the programmer. For example, if we say "Open Code", the assistant searches a path like "C:\\Users\\XYZ\\AppData\\Local\\Programs\\project.py" and opens the code.

Stop – If the word is recognized as "Stop", the assistant speaks "Quitting, sir, thanks for your time" and the code terminates.

Code-Snippet

Speech Recognition Project Coding
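As a hedged sketch of the Take Command routine described above, using the SpeechRecognition package installed later in this post (the recognizer settings and language code are illustrative):

```python
import speech_recognition as sr

def take_command():
    """Listen on the microphone and return the recognized text as a string."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        recognizer.pause_threshold = 1      # seconds of silence ending a phrase
        audio = recognizer.listen(source)
    try:
        print("Recognizing...")
        query = recognizer.recognize_google(audio, language="en-in")
    except sr.UnknownValueError:
        print("Say that again please")      # could call speak() here instead
        return "None"
    return query.lower()
```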

Algorithms used in Speech Recognition

NLP (Natural Language Processing) & Tokenization
PLP
Deep Neural Networks
Discrimination training
WFST frameworks, etc.

The following must be installed-:

1. sudo pip install SpeechRecognition
2. sudo apt-get install python-pyaudio python3-pyaudio, or pip install pyaudio

These are the most important modules in the project, as they provide the main functionality of converting speech into text.

Future Scope

This area of AI is proving productive in every technical field. We have implemented a small project to demonstrate its use in various fields, for example railways, searching feeds, and so on. Just as computers learned to play chess better than humans, speech recognition will soon be mastered by computers as well. Importantly, that will yield significant insight into nature in general and the human mind in particular, so speech recognition is a significant step in our investigation of natural laws. Our project can be used by railways and other hubs to display information using speech recognition.

Used Car Price Prediction AI / Machine Learning Project using Python

Abstract

Used car price prediction using AI / Machine Learning techniques has piqued researchers' interest, since reliable valuation takes a significant amount of work and expertise on the part of a field expert. For a dependable and accurate forecast, a large number of distinct attributes are analyzed. We employed six different machine learning approaches to develop a model for forecasting the price of used automobiles.

Problem statement

With the impact of Coronavirus on the market, we have seen a lot of changes in the vehicle market. Some vehicles are now in demand, making them expensive, and some are not in demand and hence cheaper. With the market shift caused by the COVID-19 effect, people and sellers are facing issues with their previous car price valuation AI / Machine Learning models, so they are searching for new AI models built from new data. Here we are building the new car price valuation model.

The primary aim of this Used Car Price Prediction AI / Machine Learning Project is to create a dataset with the help of web scraping and to predict the price of a used car given various features.

The objective of the Project:

1. Data Collection: To scrape the data of at least 5000 used cars from various websites like Olx, cardekho, cars24, auto portal, cartrade, etc.
2. Model Building: To build a supervised machine learning model for forecasting the value of a vehicle based on multiple attributes.

Motivation Behind the Project:

There are a few major worldwide multinational participants in the automobile sector, as well as several merchants. By trade, international companies are mostly manufacturers, although the retail industry includes both new and used automobile dealers. The used automobile market has seen a huge increase in value, resulting in a bigger percentage of the entire market. In India, about 3.4 million automobiles are sold each year on the secondhand car market.

Collecting the data

We have scraped the data for over 5000 cars using a Selenium script from 4 different websites, covering different locations around the country. The websites are as follows:
1. OLX
2. Cars24
3. CarDekho
4. Autoportal

There are 9 columns:

1. ’Brand & Model’: gives us the brand of the car along with its model name and manufacturing year

2. ’Varient’: gives us the variant of the particular car model

3. ’Fuel Type’: gives us the type of fuel used by the car

4. ’Driven Kilometers’: gives us the total distance in km covered by the car

5. ’Transmission’: tells us whether the gear transmission is manual or automatic

6. ’Owner’: tells us the total number of owners the car has had previously

7. ’Location’: gives us the location of the car

8. ’Date of Posting Ad’: tells us when the advertisement for selling the car was posted online

9. ’Price (in ₹)’: gives us the price of the car.

Here ‘Price (in ₹)’ is our target variable.

Reading the dataset

Now we read the dataset into Pandas and since the target column ‘Price’ is of integer data type, we will apply regression algorithms to it.

Data Cleaning

We check for null values and find a few in the column ‘Variant’; we treat them with the mode. Since all the features are categorical, we need not check for outliers and skewness.

Exploratory Data Analysis

First, we plot the boxplot and distribution plot for the target variable, and find that the few outliers need not be treated and that the data is tightly distributed in an almost normal distribution.
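A small sketch of this cleaning step (the CSV file name is hypothetical; the column is called 'Variant' here, matching the text above):

```python
import pandas as pd

df = pd.read_csv("used_cars.csv")        # hypothetical name for scraped data
print(df.isnull().sum())                 # inspect null counts per column

# Fill missing 'Variant' values with the column mode (most frequent value).
df["Variant"] = df["Variant"].fillna(df["Variant"].mode()[0])
```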

Bar graph

Since Brands, Variants, Driven Kilometers, and Locations have a wide range of values, we will not perform bivariate analysis on them, as it would not give us any specific details. By plotting Fuel Type, Transmission, and Owner against Price, we conclude that a car that uses diesel, has automatic transmission, and has had only one owner is more likely to have a high price.

Model building

The models used on the training and testing datasets are as follows:

SVR
Linear Regression
SGD Regressor
KNeighbors Regressor
Decision Tree Regressor
Random Forest Regressor

Only the Decision Tree Regressor and Random Forest Regressor perform well, giving an accuracy of 80.2% and 87.7%, respectively.
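A hedged sketch of the comparison, assuming X and y hold the encoded features and the ‘Price (in ₹)’ target from the cleaned dataset above:

```python
# Sketch: training the six regressors and comparing held-out test scores.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# X, y: encoded features and price target (assumed prepared earlier).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

for model in (SVR(), LinearRegression(), SGDRegressor(),
              KNeighborsRegressor(), DecisionTreeRegressor(random_state=42),
              RandomForestRegressor(random_state=42)):
    model.fit(X_train, y_train)
    # score() returns R^2 for scikit-learn regressors.
    print(type(model).__name__, model.score(X_test, y_test))
```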

Final model

The accuracy of the model ‘PriceCar’ (Random Forest Regressor) after applying hyper-tuned parameters is found to be 87.79%, and the score is 0.98, which is quite good.

Conclusion

Here, we can see that all the predicted prices are equal or nearly equal to the original prices of the cars. Hence we conclude that our model ‘PriceCar’ is working very well, and we shall save it for further use.

Limitations of this work and Scope for Future Work

As a part of future work, we aim to widen the choice of algorithms beyond those used in the project. We could only explore two algorithms in depth, whereas many other algorithms exist and might be more accurate. More specifications could be added to the system to provide more accuracy in price prediction, i.e.:
1) Horsepower
2) Battery power
3) Suspension
4) Cylinder
5) Torque

As we know, technologies are improving day by day, and there are also advancements in car technology, so our next upgrade will include hybrid cars, electric cars, and driverless cars.
