Used Car Price Prediction AI / Machine Learning Project using Python

Abstract

Used car price prediction using AI / Machine Learning techniques has piqued researchers’ interest, since pricing a used car manually takes a significant amount of work and expertise on the part of a field expert. A dependable and accurate forecast requires analyzing a large number of distinct attributes. We employed 6 different machine learning approaches to develop a model for forecasting the price of used automobiles.

Problem statement

With the impact of COVID-19, we have seen many changes in the vehicle market. Some cars are now in high demand, which makes them expensive, while others are less popular and consequently cheaper. Because the market has shifted due to COVID-19, sellers are facing issues with their earlier car price valuation AI / Machine Learning models, so they are looking for new models trained on new data. Here we build a new car price valuation model.

The primary aim of this Used Car Price Prediction AI / Machine Learning Project is to create a dataset with the help of web scraping and to predict the price of a used car from its various attributes.

The objectives of the project:

1. Data Collection: To scrape the data of at least 5000 used cars from various websites like OLX, CarDekho, Cars24, Autoportal, CarTrade, etc.
2. Model Building: To build a supervised machine learning model for forecasting the value of a vehicle based on multiple attributes.

Motivation Behind the Project:

There are a few major multinational players in the automobile sector, as well as many dealers. International companies are mostly manufacturers, while the retail industry includes both new and used car dealers. The used car market has grown considerably in value and now accounts for a bigger share of the overall market. In India, about 3.4 million cars are sold each year on the secondhand car market.

Collecting the data

We scraped the data for over 5000 cars with a Selenium script from 4 different websites, covering locations across the country. The websites are as follows:
1. OLX
2. Cars24
3. CarDekho
4. Autoportal

There are 9 columns:

1. ‘Brand & Model’: It gives us the brand of the car along with its model name and manufacturing year.

2. ‘Variant’: It gives us the variant of the particular car model.

3. ‘Fuel Type’: It gives us the type of fuel used by the car.

4. ‘Driven Kilometers’: It gives us the total distance in km covered by the car.

5. ‘Transmission’: It tells us whether the gear transmission is Manual or Automatic.

6. ‘Owner’: It tells us the number of previous owners of the car.

7. ‘Location’: It gives us the location of the car.

8. ‘Date of Posting Ad’: It tells us when the advertisement for selling the car was posted online.

9. ‘Price (in ₹)’: It gives us the price of the car.

Here ‘Price (in ₹)’ is our target variable.
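The ‘Brand & Model’ column bundles the manufacturing year into the name, so it can help to pull the year out into its own numeric column. The sketch below does this with pandas and a regular expression. It assumes the year appears as a four-digit number (19xx or 20xx) somewhere in the scraped text. The helper names and the new ‘Year’ column are hypothetical and not part of the original project.

```python
import re

import pandas as pd

# Assumption: the manufacturing year appears as a four-digit number (19xx or 20xx)
# somewhere in the scraped 'Brand & Model' text, e.g. "2015 Maruti Suzuki Swift".
YEAR_PATTERN = re.compile(r"\b(19\d{2}|20\d{2})\b")


def extract_year(brand_model):
    """Return the manufacturing year found in a 'Brand & Model' string, or None."""
    match = YEAR_PATTERN.search(str(brand_model))
    return int(match.group(1)) if match else None


def add_year_column(df):
    """Return a copy of df with a numeric 'Year' column derived from 'Brand & Model'."""
    out = df.copy()
    out["Year"] = out["Brand & Model"].map(extract_year)
    return out


# Hypothetical sample rows
cars = pd.DataFrame({"Brand & Model": ["2015 Maruti Suzuki Swift", "Hyundai i20 2018"]})
print(add_year_column(cars)["Year"].tolist())  # [2015, 2018]
```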

Reading the dataset

Now we read the dataset into Pandas. Since the target column ‘Price’ is numeric (of integer data type), this is a regression problem, so we apply regression algorithms to it.

Data Cleaning

We check for null values and find a few in the column ‘Variant’. We fill them with the mode (the most frequent value). Since all the features are treated as categorical, we do not need to check them for outliers and skewness.
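Here is a minimal pandas sketch of the null check and mode imputation described above. The sample ‘Variant’ values are made up. On the real data, the same calls would run on the scraped DataFrame.

```python
import pandas as pd


def impute_with_mode(df, column):
    """Return a copy of df with missing values in `column` filled by its mode."""
    out = df.copy()
    mode_value = out[column].mode().iloc[0]  # mode() may return ties; take the first
    out[column] = out[column].fillna(mode_value)
    return out


# Hypothetical sample; the real project would use the scraped DataFrame.
cars = pd.DataFrame({"Variant": ["VXI", None, "VXI", "ZXI", None]})
print(cars.isnull().sum())        # number of missing values per column
cars = impute_with_mode(cars, "Variant")
print(cars["Variant"].tolist())   # ['VXI', 'VXI', 'VXI', 'ZXI', 'VXI']
```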

Exploratory data analysis

Firstly, we plot the boxplot and the distribution plot of the target variable. We find a few outliers, which do not need to be treated, and the data is tightly distributed with an almost normal distribution.
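A boxplot usually draws as outliers the points lying more than 1.5 × IQR (interquartile range) beyond the quartiles. This is matplotlib's default whisker length. The sketch below, which assumes that rule, runs the same check numerically with pandas and also prints the skewness. The price values are hypothetical.

```python
import pandas as pd


def boxplot_outliers(values):
    """Return the values a standard boxplot would draw as outliers (beyond 1.5 * IQR)."""
    s = pd.Series(values, dtype="float64")
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return s[(s < lower) | (s > upper)].tolist()


prices = [350000, 420000, 480000, 510000, 560000, 600000, 2500000]  # hypothetical
print(boxplot_outliers(prices))   # [2500000.0]
print(pd.Series(prices).skew())   # sample skewness; values near 0 suggest a symmetric shape
```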

Bar graph

Since Brand, Variant, Driven Kilometers, and Location have a wide range of values, we do not perform bivariate analysis on them, as it would not give us any specific insights. By plotting Fuel Type, Transmission, and Owner against Price, we conclude that a car that uses diesel, has automatic transmission, and has had only 1 owner is more likely to have a high price.
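Each bar in a category-vs-price bar graph typically shows the mean price of that group. seaborn's `barplot`, for example, uses the mean as its default estimator. The sketch below computes those bar heights directly with pandas. The rows are hypothetical, and the ‘Owner’ values are assumed to be labels like "1st".

```python
import pandas as pd


def mean_price_by(df, column, target="Price (in ₹)"):
    """Average price per category, highest first (the bar heights of the graph)."""
    return df.groupby(column)[target].mean().sort_values(ascending=False)


# Hypothetical sample rows; the real values come from the scraped dataset.
cars = pd.DataFrame({
    "Fuel Type": ["Diesel", "Petrol", "Diesel", "Petrol", "CNG"],
    "Transmission": ["Automatic", "Manual", "Manual", "Automatic", "Manual"],
    "Owner": ["1st", "2nd", "1st", "1st", "3rd"],
    "Price (in ₹)": [900000, 350000, 650000, 700000, 300000],
})

for column in ["Fuel Type", "Transmission", "Owner"]:
    print(mean_price_by(cars, column), end="\n\n")
```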

Model building

The following models were trained and tested:

1. SVR
2. Linear Regression
3. SGD Regressor
4. K-Neighbors Regressor
5. Decision Tree Regressor
6. Random Forest Regressor

Only the Decision Tree Regressor and the Random Forest Regressor perform well, with accuracies of 80.2% and 87.7%, respectively.
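The model comparison above can be sketched with scikit-learn as follows. The article does not say which metric its "accuracy" refers to. For regressors, scikit-learn's `.score()` returns R², so this sketch assumes R². Synthetic data from `make_regression` stands in for the encoded car features, so the scores will not match the article's.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded car features and prices.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "SVR": SVR(),
    "Linear Regression": LinearRegression(),
    "SGD Regressor": SGDRegressor(random_state=42),
    "K-Neighbors Regressor": KNeighborsRegressor(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, .score() returns the coefficient of determination R^2.
    scores[name] = model.score(X_test, y_test)

for name, r2 in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<25} R^2 = {r2:.3f}")
```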

Final model

After hyperparameter tuning, the accuracy of the model ‘PriceCar’ (Random Forest Regressor) is found to be 87.79%, and its score is 0.98, which is quite good.
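The text does not say which hyperparameters were tuned or how. One common approach is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on a `RandomForestRegressor` with an illustrative grid and synthetic data. It reports R² on both the training and test splits. Which of these corresponds to the article's "accuracy" and which to its "score" is left open.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A small, illustrative grid; the parameters the article actually tuned are not stated.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,            # 3-fold cross-validation on the training set
    scoring="r2",
)
search.fit(X_train, y_train)

price_car = search.best_estimator_   # refit on the full training set by default
print("best parameters:", search.best_params_)
print("train R^2:", round(price_car.score(X_train, y_train), 3))
print("test R^2:", round(price_car.score(X_test, y_test), 3))
```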

Conclusion

Here, we can see that the predicted prices are equal or nearly equal to the original prices of the cars. Hence we conclude that our model ‘PriceCar’ works well, and we save it for further use.
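A saved model can be reused later with `joblib`, which is installed alongside scikit-learn. The minimal sketch below trains a model on synthetic data and saves it as `PriceCar.joblib` in the system's temporary directory. Both the file name and the location are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the tuned PriceCar model.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=1)
model = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)

# Persist the fitted model so it can be reused later without retraining.
model_path = os.path.join(tempfile.gettempdir(), "PriceCar.joblib")
joblib.dump(model, model_path)

# Later, for example in the script that serves predictions: load it and predict.
loaded = joblib.load(model_path)
for actual, predicted in zip(y[:3], loaded.predict(X[:3])):
    print(f"actual={actual:.1f}  predicted={predicted:.1f}")
```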

Limitations of this work and Scope for Future Work

As part of future work, we aim to revisit the choice of variables and of the algorithms used in this project. We explored only a limited set of algorithms, and many other algorithms exist that might be more accurate. Adding more specifications to the system could also improve the accuracy of the predicted price, for example:
1) Horsepower
2) Battery power
3) Suspension
4) Cylinder
5) Torque

Technology improves day by day, and car technology is advancing with it. Our next upgrade will therefore include hybrid cars, electric cars, and driverless cars.

For more details about the project, feel free to contact the developer on GitHub.

Detecting Impersonators in Examination Centres using AI

 

Detecting impersonators in examination halls is important for a better examination-handling system and can help reduce malpractice in examination centres. According to recent news reports, 56 JEE candidates who were potential impersonators were detected by the National Testing Agency. An effective method that needs less manpower is required to solve this problem.

Advances in machine learning and AI technology make this problem much easier to solve. In this project, we develop an AI system that works as follows. Images of students are collected along with their names and hall ticket numbers. These images are trained in advance using the KDTree algorithm, and the model is saved. When students enter the classroom, each student looks at the camera before entering. After the given time, or once the class is full, the students’ information is stored in a video file with each student’s name and hall ticket number. In the video, each face is tagged with the student’s hall ticket number and name. If the admin finds a face tagged as unknown, the admin can recheck it and trace the impersonator.
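The matching step can be sketched with scikit-learn's `KDTree`. The approach stores one feature vector per training image and queries the nearest vector for each face seen by the camera. If even the nearest vector is too far away, the face is tagged as unknown. The sketch assumes faces have already been converted to fixed-length vectors, and random 128-dimensional vectors stand in for them here. The student labels and the distance threshold are hypothetical values that would need tuning on real data.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical registered students. A real system would compute one vector per photo
# with a face-encoding step; random vectors stand in for those encodings here.
rng = np.random.default_rng(0)
students = {"HT1001 Asha": rng.normal(size=128), "HT1002 Ravi": rng.normal(size=128)}

embeddings, labels = [], []
for label, centre in students.items():
    for _ in range(50):                               # 50 images per student
        embeddings.append(centre + rng.normal(scale=0.01, size=128))
        labels.append(label)

tree = KDTree(np.array(embeddings))
THRESHOLD = 1.0   # assumed maximum distance for a match; must be tuned on real data


def identify(face_vector):
    """Return the hall ticket number and name of the nearest student, or 'Unknown'."""
    dist, idx = tree.query(np.asarray(face_vector).reshape(1, -1), k=1)
    if dist[0, 0] > THRESHOLD:
        return "Unknown"   # no registered face is close enough: flag it for the admin
    return labels[idx[0, 0]]


print(identify(students["HT1001 Asha"] + rng.normal(scale=0.01, size=128)))  # HT1001 Asha
print(identify(rng.normal(size=128)))                                        # Unknown
```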

Problem statement:

Impersonation in examination halls leads to malpractice in examination centres. For example, 56 potential impersonators among JEE candidates were detected by the National Testing Agency. A better examination-handling system is needed to detect impersonators.

Existing system:

The information given on the hall ticket is used to verify whether a student is an impersonator. Manual security checks are not perfect, and students can sometimes even replace the photo on the hall ticket.

Disadvantages:

  • Verification is done manually, and it is not practical to check every student personally.
  • The photo on a hall ticket can be changed, and there is no method to verify it.

Proposed system:

  • In the proposed system, images of each student are first collected, and each student’s dataset consists of 50 images. These images are trained with the KDTree algorithm using image processing techniques, and the model is saved in the system. This model can then be used to automatically identify students in exam halls from live video or images.

Advantages:

  • The student verification process is fast and accurate with minimal effort, and live verification reduces the impersonation problem.
  • Prediction and processing take less time, and prediction is done automatically using the trained model.
  • The trained model can track live video, which automates the detection of students at exam centres and displays them in the video.

SOFTWARE REQUIREMENTS:

  • Operating system:  Windows XP/7/10
  • Coding Language:  Python
  • Development Kit:  Anaconda
  • Library:  Keras, OpenCV
  • Dataset:  any student dataset

Movie Character Recognition From Video And Images Project

Live tracking of characters in movies is important for automating classification in user-friendly information systems, such as online platforms where the characters of a movie can be viewed before watching it. At present, a manual method is used, which this movie character classification method can automate. The objective of this work is to collect a dataset of movie characters, train a model that captures the facial features of all the characters, and save the model for prediction.

For testing, a real-time live video can be used to track characters. The application also works on images: users can input images of trained movie characters and get the character names marked on the image as output. In this project, the KDTree algorithm is used for training. It takes images from a given folder, trains on each image, and saves the model to a dump file in the system. In the second stage, the trained model is used to predict the characters in an input image or video, and the result is shown as a video or image.
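The training and dump-file stages can be sketched as follows. The sketch assumes a folder layout of `dataset/<character name>/*.jpg` and a caller-supplied `embed` function that turns an image into a feature vector. Both are hypothetical, because the article does not specify its image-processing step. The model is a scikit-learn `KDTree` plus the list of names, saved with `pickle`. A tiny fake dataset of empty files and a stand-in embedding make the example runnable.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.neighbors import KDTree


def train_from_folder(dataset_dir, embed):
    """Build a KDTree over every image in dataset_dir/<character name>/*.jpg.

    `embed` is a caller-supplied (hypothetical) function mapping an image path
    to a fixed-length feature vector, such as a face encoding.
    """
    vectors, names = [], []
    for character_dir in sorted(Path(dataset_dir).iterdir()):
        if not character_dir.is_dir():
            continue
        for image_path in sorted(character_dir.glob("*.jpg")):
            vectors.append(embed(image_path))
            names.append(character_dir.name)   # the folder name is the character label
    return {"tree": KDTree(np.array(vectors)), "names": names}


def save_model(model, path):
    with open(path, "wb") as f:
        pickle.dump(model, f)                  # the "dump file" stored on disk


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(model, vector):
    """Return the name of the character whose training vector is nearest."""
    _, idx = model["tree"].query(np.asarray(vector).reshape(1, -1), k=1)
    return model["names"][idx[0, 0]]


def fake_embed(image_path):
    """Stand-in embedding: a fixed vector per character folder (for the demo only)."""
    return np.array([0.0, 0.0]) if image_path.parent.name == "Hero" else np.array([5.0, 5.0])


with tempfile.TemporaryDirectory() as tmp:
    # Tiny fake dataset: two characters with two empty "images" each.
    for name in ("Hero", "Villain"):
        (Path(tmp) / name).mkdir()
        for i in range(2):
            (Path(tmp) / name / f"{i}.jpg").touch()

    model_path = Path(tmp) / "characters.pkl"
    save_model(train_from_folder(tmp, fake_embed), model_path)
    result = predict(load_model(model_path), [4.8, 5.1])

print(result)  # Villain
```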

Problem statement:

Manually classifying the characters of each movie is time-consuming, and the database must be maintained.

Objective:

The objective of this project is to develop automatic classification of characters after training on the dataset. Once the model is created, it can be used for prediction at any time from images or video.

Existing system:

In the existing system, movie characters are managed in a database and displayed when required. This process depends heavily on the database, and processing takes more time.

Disadvantages:

  • Processing takes more time, and the database must be managed and integrated with the required system whenever needed.
  • This method involves manually collecting, updating, and deleting data.

Proposed system:

In the proposed system, a dataset of the respective movie characters is first collected, and each character’s dataset consists of 50 images. These images are trained with the KDTree algorithm using image processing techniques, and the model is saved in the system. This model can be used for the automatic prediction of characters from live video or images.

Advantages:

  • Prediction and processing take less time, and prediction is done automatically using the trained model.
  • The trained model can track live video, which automates the detection of characters and displays them on screen.

SOFTWARE REQUIREMENTS:

  • Operating system:  Windows XP/7/10
  • Coding Language:  Python
  • Development Kit: Anaconda
  • Library:   TensorFlow, Keras, OpenCV
  • Dataset:  Any movie dataset

Canteen Automation System using NLTK and Machine Learning

The canteen automation system lets users select food items from a web application that shows each item’s cost and cooking time, and lets them rate the items. The application helps students order food without giving orders to waiters or going to the counter. Most colleges do not have an order-taking system, so students must go to the counter to place an order, which takes time. This online order-booking system is designed to solve that problem.

Many students from different departments will be placing orders, so the web application is designed with multiple admins: each department has one admin who takes and processes requests. Another problem is knowing which food on today’s canteen menu is best. Students can find this out from the ratings given by other users and order accordingly. Students can also write reviews for each food item along with ratings. NLTK is used to calculate the sentiment of each review: machine learning and NLTK are applied to the Yelp dataset, and the sentiment of each review is calculated and stored in the database.
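The review-sentiment step can be sketched as a small text classifier. The project itself uses NLTK with the Yelp dataset. To keep this example self-contained, with no corpus downloads, the sketch swaps in scikit-learn's `CountVectorizer` and `MultinomialNB` (a Naive Bayes classifier). It trains them on a handful of made-up reviews.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled reviews standing in for the Yelp training data.
train_reviews = [
    "the food was delicious and fresh",
    "great taste, loved the dosa",
    "excellent service and tasty meals",
    "cold food and very bland",
    "terrible taste, the curry was stale",
    "awful, the rice was undercooked",
]
train_labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

# Word counts (English stop words removed) feed a Naive Bayes classifier.
sentiment_model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
sentiment_model.fit(train_reviews, train_labels)

new_reviews = ["the samosa was delicious", "stale and bland sandwich"]
for review, sentiment in zip(new_reviews, sentiment_model.predict(new_reviews)):
    print(f"{review!r}: {sentiment}")   # the label that would be stored with the review
```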

Proposed system:

  • In the proposed system, food is ordered online, and each department has its own admin who handles requests on a daily basis. Users can rate food items, which helps other students choose from the list. Sentiment analysis using the Yelp dataset, NLTK, and machine learning computes and stores the sentiment of each review given by a student.

Advantages:

  • Helps students place orders from anywhere on campus and save time by reaching the canteen according to the cooking time shown in the application.
  • Sentiment analysis is performed on reviews using NLTK and machine learning. Sentiment and ratings help students select food items.

SOFTWARE REQUIREMENTS:

  • Operating system:  Windows XP/7/10
  • Coding Language:  HTML, JavaScript
  • Development Kit:  Flask Framework
  • Database:  MySQL
  • Dataset:  YELP
  • IDE:  Anaconda prompt

Stress Detection from Sensor Data using Machine Learning

Stress is commonly defined as a feeling of strain and pressure that arises from any event or thought that makes you feel frustrated, angry, or nervous. Today, many people suffer from stress, especially adolescents and working people. Rising stress leads to many problems, such as depression, suicide, heart attack, and stroke. Current technologies use Galvanic Skin Response (GSR), Heart Rate Variability (HRV), and skin temperature individually to detect stress.

In this project, a data set is created using five features (age, gender, body temperature, heartbeat, and blood pressure), and four stages of labels are used to detect the level of stress. A decision tree algorithm is trained on the data set to create a model, and the Flask framework is used to take input data and predict the user’s stress level.

EXISTING SYSTEM:

Existing systems detect stress by taking tweets or posts from Twitter or Facebook data sets as input and applying machine learning algorithms to them.

Disadvantages:

  • Most existing work uses social networking data rather than body sensor data.
  • The stress level is calculated from the tweets posted by users.

PROPOSED SYSTEM:

The proposed system collects data from sensors and prepares a data set with features such as temperature, heartbeat, age, and gender (male or female). A Decision Tree algorithm is applied to this data set, and the model is saved. A front-end web application collects a new user’s features and passes them to the model, which predicts the user’s stress stage (one of 4 stages).

Advantages: 

  • Data is collected from real-time sensors, and a data set is created for users of different ages and of both genders.
  • A model is trained on the data using machine learning, which automates the process of stress detection.
  • The web application helps users easily check their stress level based on their features.

Data collection:

  • In this stage, data is collected from real-time sensors and stored in an Excel sheet with five features (age, gender, temperature, heartbeat, and blood pressure). This data is used for machine learning, and a model is created.

Data preprocessing:

  • Features are extracted from the data set and stored in an x_train variable, and the labels are stored in a y_train variable. The data is then preprocessed with the StandardScaler function, which produces standardized features.

Testing training:

  • In this stage, the data is passed to the train-test split function and divided into four parts: x_train, x_test, y_train, and y_test. The training variables are passed to the algorithm, whereas the test variables are used to calculate the accuracy of the algorithm.
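The preprocessing and split stages can be sketched with scikit-learn's `StandardScaler` and `train_test_split`. The sensor values and labels below are random, hypothetical stand-ins for the real data set. One design choice differs from the order described above. The scaler is fitted on the training split only and then applied to the test split, which is a common way to keep test data out of preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical sensor rows: [age, gender (0/1), temperature, heartbeat, blood pressure]
rng = np.random.default_rng(7)
x = np.column_stack([
    rng.integers(18, 60, 200),
    rng.integers(0, 2, 200),
    rng.normal(98.6, 1.0, 200),
    rng.normal(80, 12, 200),
    rng.normal(120, 15, 200),
])
y = rng.integers(0, 4, 200)   # four stress stages, labelled 0-3

# The four parts: x_train, x_test, y_train, y_test (25% held out for testing).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=7, stratify=y
)

# Fit the scaler on the training split, then apply the same transform to the test split.
scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled.shape, x_test_scaled.shape)   # (150, 5) (50, 5)
```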

Initializing Decision tree Algorithm:

  • In this stage, the decision tree algorithm is initialized and given the training values. From these, the algorithm learns which columns are features and which is the label. The trained model is then stored as a pickle file in the system so that it can be used for prediction.

Predict data:

  • In this stage, new data is taken as input and the trained model is loaded using pickle. The values are then preprocessed and passed to the predict function, and the result is shown on the web application.
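The training, pickling, and prediction stages can be sketched using scikit-learn's `DecisionTreeClassifier`, since the stress stages are class labels. The sample rows, the 0-3 stage encoding, and the file name `stress_model.pkl` are hypothetical. The scaler is pickled together with the model, so new web-form input is preprocessed exactly like the training data.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows: [age, gender (0/1), temperature, heartbeat, blood pressure]
x_train = np.array([
    [25, 0, 98.4, 70, 115],
    [32, 1, 98.6, 78, 120],
    [41, 0, 99.1, 92, 135],
    [29, 1, 99.3, 98, 140],
    [50, 0, 99.8, 110, 150],
    [36, 1, 100.2, 118, 160],
    [45, 0, 100.5, 125, 170],
    [22, 1, 98.5, 72, 118],
])
y_train = np.array([0, 0, 1, 1, 2, 2, 3, 0])   # stress stages 0-3 (assumed encoding)

scaler = StandardScaler().fit(x_train)
model = DecisionTreeClassifier(random_state=0).fit(scaler.transform(x_train), y_train)

# Save the scaler and the model together as one pickle file.
MODEL_PATH = os.path.join(tempfile.gettempdir(), "stress_model.pkl")
with open(MODEL_PATH, "wb") as f:
    pickle.dump({"scaler": scaler, "model": model}, f)


def predict_stress(features, path=MODEL_PATH):
    """Load the pickled scaler and model, preprocess one input row, and predict its stage."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    row = saved["scaler"].transform([features])
    return int(saved["model"].predict(row)[0])


print(predict_stress([25, 0, 98.4, 70, 115]))  # 0 (matches a training row)
```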

SOFTWARE REQUIREMENTS:

  • Operating system:           Windows XP/7/10
  • Coding Language:           HTML, JavaScript
  • Development Kit:        Flask Framework
  • IDE:           Anaconda prompt
  • Dataset:          Stress dataset

COVID-19 Data Analysis And Cases Prediction Using CNN Project

ABSTRACT:

Coronavirus (COVID-19) is creating panic all over the world with fast-growing case numbers. Various datasets provide information on the affected regions of the world. COVID-19 has affected all counties, with large and varying numbers of deaths, recoveries, and infections. In this project, we use a data set that has county-wise details of cases with various combined features and labels.

The COVID data analysis and case prediction project analyzes data for various counties across different time and data factors. It also creates models for survival and death cases and predicts future cases. Deep learning methods such as a Convolutional Neural Network (CNN) are used in this project to create the model and predict cases for the next few months.

PROBLEM STATEMENT:

With COVID-19 cases increasing all over the world, daily predictions and analysis are required to control the pandemic effectively.

OBJECTIVE:

Data is collected from Kaggle and New York datasets, preprocessed, and analyzed. A machine learning model is then generated to predict future cases.

EXISTING SYSTEM:

  • COVID-19 predictions were performed using different machine learning techniques on x-ray data sets collected from COVID-19 patients.
  • Disease prediction from x-ray images is done using deep learning techniques.

Disadvantages:

  • The data set used for predicting the disease differs from the one used in this project.
  • They rely on image processing techniques rather than case data.

PROPOSED SYSTEM:

The collected data set is preprocessed, and the steps for building the deep learning model are carried out to predict cases. Data analysis is then performed on various factors.
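Case prediction with a CNN first needs the case counts reshaped into supervised windows. Each input is the previous few days of counts, and the target is the next day's count. Below is a minimal NumPy sketch, assuming a single daily series and an arbitrary window length of 3. The counts are hypothetical. The CNN itself is not shown (for example, a Keras `Conv1D` model, which expects inputs shaped `(samples, timesteps, channels)`).

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D daily case series into (samples, window, 1) inputs and next-day targets."""
    values = np.asarray(series, dtype="float32")
    if len(values) <= window:
        raise ValueError("series must be longer than the window")
    x = np.stack([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]               # the count that follows each window
    return x[..., np.newaxis], y      # add a channel axis for a 1-D convolution


daily_cases = [10, 12, 15, 20, 26, 33, 41]   # hypothetical counts
x, y = make_windows(daily_cases, window=3)
print(x.shape, y.shape)    # (4, 3, 1) (4,)
print(x[0].ravel(), y[0])  # [10. 12. 15.] 20.0
```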

Advantages:

  • Data analysis and prediction are performed on textual data
  • Deep learning models are generated for predicting future cases.
  • Data analysis is performed for various factors.

SOFTWARE REQUIREMENTS

  • Operating system:  Windows XP/7/10
  • Coding Language:  Python
  • Development Kit:  Anaconda
  • IDE:  Anaconda prompt