Competitive Programming Platform for Students Project Synopsis

Introduction

Most major IT corporations now use online coding competitions to judge the fundamentals of upcoming software engineers and their ability to perform under pressure. This has led to a significant increase in the number of online judges and coding competitions. Many students are confused about which platform they should opt for and how to approach these coding contests on time, every time. This is where the Competitive Programming Platform comes into the picture.
The Competitive Programming Platform is a collection of extensions, APIs, bots, and web apps aimed at simplifying competitive programming. With this project, students can observe, compare, and shortlist online judges, track their performance on them, and compare their improvements and achievements with their peers in a healthy environment. The technologies we’ll be using in this Competitive Programming Platform project are Python, JavaScript, Node, Flask, Selenium, VueJS, and Tailwind.

Objectives

The main objective is to create a platform on which students can easily select and prepare for online coding competitions in the best possible way.
The key objectives of the Competitive Programming Platform are:
1. Viewing all competitive profiles at a glance.
2. Getting updates about the latest programming contests.
3. Receiving all updates through an email newsletter and push notifications.
4. Fetching global and local leaderboards.
5. A VS Code extension to speed up local development.
6. A Chrome extension to view upcoming contests on the go.
7. A standalone REST API (a minimal sketch follows below).
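As a rough illustration of objective 7, the sketch below shows what a minimal standalone REST API endpoint could look like using Flask (listed under facilities required). The route name and the contest fields are placeholder assumptions, not the project's actual API.

    # Hypothetical sketch of a contests endpoint; route and fields are placeholders.
    from flask import Flask, jsonify

    app = Flask(__name__)

    # In the real project this data would come from the scraped database.
    CONTESTS = [
        {"judge": "example-judge", "name": "Weekly Round 1", "starts": "2021-06-01T14:00:00Z"},
    ]

    @app.route("/api/contests")
    def list_contests():
        return jsonify(CONTESTS)

    if __name__ == "__main__":
        app.run()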

Methodology

In the first step, we will scrape the data from various resources using a crawler built in Python with Selenium. We will store this data in our database and create a pipeline with a cronjob every six hours.
We will then deliver all the extracted data through our SPA built with VueJS. We will use Workbox 6.0 to convert the SPA into a Progressive Web Application and natively support push notifications.

Web Scraping: Web scraping is an automated method of obtaining large amounts of data from websites.
Cronjobs in recurrent pipelines: A cron job is normally used to schedule a job that executes periodically. In our case, we use a cron job to run our Python script that fetches and extracts unstructured HTML data, validates it, and saves it in our database.
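A minimal sketch of such a script, assuming a hypothetical contest page; the URL and CSS selectors are placeholders rather than any real judge's markup:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def fetch_contests(url="https://example-judge.com/contests"):  # placeholder URL
        driver = webdriver.Chrome()  # requires a matching ChromeDriver on PATH
        try:
            driver.get(url)
            rows = driver.find_elements(By.CSS_SELECTOR, ".contest-row")  # placeholder selector
            # Collect (name, start time) pairs for validation and storage.
            return [(row.find_element(By.CSS_SELECTOR, ".name").text,
                     row.find_element(By.CSS_SELECTOR, ".start-time").text)
                    for row in rows]
        finally:
            driver.quit()

A crontab entry such as 0 */6 * * * python scraper.py would run the script every six hours.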
User interfaces: Building user-friendly interfaces that bring meaning to our extracted data and visualize it through various tables, charts, and graphs.

Workflow

Facilities required

• Vue, Tailwind, ChartJS, Babel, GSAP, Node
• Flask, PostgreSQL, Selenium, Python
• Git, GitHub, CodeQL, VS Code
• NGINX, PM2, Travis, Certbot

Expected Outcome

• Responsive, minimalistic user interface with a clutter-free user experience.
• Powerful REST API that can power other third-party applications.
• Healthy competitive environment with ‘friendly competition’ among peers, making competitive programming a constructive habit.

Social-Eyez: A Social Media App Minor Project Synopsis

Introduction

Social-Eyez is a social media app that facilitates the creation and sharing of information, ideas, interests, and other forms of expression through virtual communities and networks. In our project, we are working to build an app that caters to the above requirements. The app will contain many features such as user authentication, content creation (uploading images), and interactions in the form of like and comment buttons. The app will be made using the Flutter framework. Among the many apps built on this framework, Google Pay is a prime example.

In the initial phase, our target is to build the application for Android OS and then take it to other platforms like iOS. This application is going to be a live project, and our team will keep integrating new features from time to time. We will start with a basic structure and features and then scale it up according to available resources.

Rationale 

Social media is a part of our life now, and only a few platforms dominate this space. Meta is the shark in the tank: it is the parent organization of Instagram, Facebook, and WhatsApp. China has its own alternative apps like WeChat and Weibo, but India is still heavily dependent on these apps. This project is a humble effort to give Indians their own social media app and eventually realize the dream of Atmanirbhar Bharat. Apart from this, our app will include features that are lacking in Instagram, such as:

  1. An SOS (Save Our Souls) feature that can be triggered in the event of an emergency.
  2. Social and news content combined.

Objectives 

Social-Eyez will try to give its users a platform to interact and connect with each other. The main objectives of our app in the long run are:

  1. Community builder: One of the many reasons to use social media is that it acts as a community builder.
  2. Exchange of ideas: Social media has been one of the most successful and popular ways of exchanging ideas; around 80 percent of people have an account on at least one social media site. You can exchange innumerable ideas and apply them too.
  3. Engagement of users.
  4. Biggest marketing platform: Social media is now one of the largest media where you can market any product, right from a needle to the largest machine.
  5. Monetization: One of the major advantages of using social media is the facility to monetize.

Methodology

  •  User authentication. 
  •  Writing posts with image attachments. 
  •  Becoming followers of other users. 
  •  Reacting and commenting on posts. 
  •  Notification about the latest posts from followed users. 
  •  Searching for specific posts.

Facilities required for Social-Eyez

Technologies that we’ll be using in this project will be Flutter Framework, Dart Programming, Git, and Visual Studio.

The main development of the application will be done using the Flutter framework. It is a relatively new framework backed by Google. Firebase will also be used; it is a Backend-as-a-Service (BaaS) app development platform that provides hosted backend services, and it supports Flutter.

Visual Studio will be used as the code editor, and Git will be used for version control.

Hardware requirement: 

1. Laptop

2. Smartphone


Food for Life PHP Minor Project Synopsis

Introduction

“Food for Life” is a food relief web project to serve food to the needy. Nowadays, one-third of the food produced is wasted, while about 9 percent of the people in this world go to bed on an empty stomach. In this minor project, party palaces and hotels give information about their unused food on our website, and we or other organisations collect the food and distribute it to the needy.

Technologies to be used

This Food for Life project will be a web application developed in PHP using the technologies below:

  • PHP
  • HTML
  • MySQL
  • JavaScript

Technical feasibility

The technical needs of the Food for Life system may include:

Front-end and back-end selection

When we decided to develop the Food for Life project, we went through an extensive study to determine the most suitable platform that fits the needs of the organization and helps in the development of this PHP & MySQL project.

Front-end selection:

    1. Scalability and extensibility
    2. Flexibility
    3. Robustness
    4. Platform independent
    5. Easy to debug and maintain

Back-end Selection:

1. Multiple user support.
2. Efficient data handling.
3. Provide inherent features for security.
4. Efficient data retrieval and maintenance.
5. Easy to install.
6. Various drivers must be available.
7. Easy to integrate with the front-end.

Objectives 

The main objectives of this Food for Life project are:

  • The main objective of our academic minor project is to reduce food wastage, as many of us throw unused food in dustbins.
  • Another objective is to help organizations that serve food to hungry people.

Conclusion 

This “Food for Life” project aims to feed hungry people with the help of our system. Organizations such as hotels or restaurants publish the details of their unused food on our Food for Life website.

College Classroom Check and Fill Mini Project Synopsis

Introduction

The Classroom Check and Fill project is to prepare a website that tells the current status of a particular room: whether a class is going on, the room is empty, or no class is scheduled in that room. It uses technologies like PHP, Python, Java, MySQL, and more. With the help of this website, a teacher or a student can know the status of a room and act accordingly.

Objectives

The aim of our project is to help teachers and students check whether a class is going on in a room, the room is empty, or no class is scheduled in that particular room.

1. To help the HoDs and teachers check whether a venue is empty or not (in one click)
2. To help students check their timetables with ease
3. To provide a user-friendly application

Methodology/ Planning of work

Step 1: GATHERING RELEVANT INFORMATION
Our project is to help teachers and students check whether a class is going on in a room, the room is empty, or no class is scheduled in that particular room. We will take the relevant information from the CR of a particular IT class and update the status of the room accordingly.

Step 2: PLANNING

Step 3: DESIGN LAYOUT

Basically, in this step we create the front-end part of our website with the help of languages like HTML, CSS, Bootstrap, and JavaScript.

Step 4: DEVELOPMENT

Step 5: TESTING, REVIEW, AND LAUNCH

Step 6: MAINTENANCE AND UPDATION

Facilities required for proposed work

Hardware Requirements: Laptop or PC with an i3 processor or higher, 4 GB RAM or higher, and 100 GB of storage or higher
Software Requirements: Windows 7 or higher, Visual Studio, HTML, CSS, JavaScript, MySQL, PHP


Student and Faculty based University Management System C++ Project

This project, titled “Student and Faculty based University Management System”, is developed mainly for the purpose of managing all college functions such as:

  • ADMINISTRATION
  • STUDENT’S INFORMATION & PERFORMANCE
  • FACULTY DETAILS

Motivation

  • In today’s time, it is very difficult to maintain the records of thousands of students manually.
  • Moreover, finding every small detail related to the students and faculty of different departments in a university is not at all an easy task.
  • So we designed this system, which makes the work of an administrator easier and faster.

Software

  • Code::Blocks
  • Turbo C++

Flowcharts:

Finance Module Flow Chart:

Admin Module Flow Chart:

Student & Faculty Module Flow Chart:

Applications:

  • It can be implemented in each and every university in which access can be given to all the students and faculties.
  • This could help them to be updated with all the information regarding academics and fees/salaries etc.
  • Instead of finding receipts of fees or salaries, which may take hours, they can now be easily generated in a fraction of a second.

Challenges

  • It was difficult to merge different individual classes into a single large program, as the program uses inheritance, and variable scope gave us errors.
  • We also had some problems while formatting marksheets and certificates, as they include various ASCII characters.
  • File handling was our biggest challenge. For example, while reading information from a file, we had many errors, such as the number of columns not matching between the file and the program.
  • Along with that, modifying information in files, such as updating fees after pending fees were paid, was a difficult task for us.

Conclusion

  • We learnt how to make real-life applications with C++.
  • The different concepts we learned are:
  • INHERITANCE (Single & Multiple Inheritance)
  • FILE MANAGEMENT
  • LOOPING STRUCTURES (FOR and WHILE)
  • CLASSES & OBJECTS
  • Other concepts like SWITCH, GOTO, and simple IF…ELSE

Future Scope

  • We can add an attendance section, which would be helpful for teachers and students as well.
  • We can also add an academics section, after which students can access their subject-related materials and submit assignments.
  • A quiz option can also be added, which can help students improve their studies and can also be evaluated.

Download the complete Student and Faculty based University Management System C++ Project Code.

Online Shopping Management System Java Console Application

This project simulates the working of an online shopping portal where customers can buy products. Our Online Shopping Management System project is a purely console-based application implemented in the Java programming language.

This Java Console Application contains mainly two panels :

  1. Admin Panel – provides functions like managing products and customers
  2. Customer Panel – provides functions like buying products and making payments

A total of 8 class files have been created which are :

  • DatabaseConnection.java
  • Shop.java ( This is the main or the starting point of the project )
  • Admin.java
  • Customer.java
  • Products.java
  • Cart.java
  • Payment.java
  • Bills.java

Java Concepts used in the project are :

  • String manipulations
  • Collections framework in the form of ArrayList
  • JDBC
  • Exception Handling
  • Inheritance
  • Classes and Objects
  • BufferedReader for taking input

ROLE OF EACH MEMBER IN THE PROJECT

  • Designed class files – DatabaseConnection.java, Shop.java, Admin.java, and Customer.java and contributed to Debugging
  • Designed class file – products.java and contributed to Debugging
  • Designed class file – bills.java and contributed to Debugging
  • Designed class files – Payment.java and Cart.java and contributed to Debugging
  • Combining the class files at the end and making them work together; each member contributed equally.

DETAILS OF CLASS FILES

MAIN CLASS ( superclass ) = Shop.java

SUBCLASSES of Shop.java = Admin.java and Customer.java

Shop.java :

Main functions = registration of a customer or admin, and login into the system. Input is taken through BufferedReader.

An ArrayList is used in the login function to store the id, password, and user type (‘C’ for customer, ‘A’ for admin) as a list.

Database tables used are logininfo, admininfo, and custinfo:

Logininfo = stores user id, password, and type of user

Admininfo = stores all details of the admin except the password

Custinfo = stores all details of the customer except the password

The setUID() function sets the admin ID to store in the database, and setCUID() sets the customer ID to store in the database.

Admin.java 

Functions include managing products (add, delete, view, search) by calling the productsPage() function of Products.java.

Other functions include adding customers, removing customers, editing profiles, and viewing registered customers.

For registering customers, since Admin.java is a subclass of Shop.java, the registerCustomer() function of Shop is called as Shop.registerCustomer(); hence a small use of inheritance appears here, as the function need not be rewritten.

Customer.java :

The database table custinfo is accessed for the edit-profile function.

The main functions are viewing products, searching for products, adding and removing products from the cart, viewing the cart, and proceeding to the payment function.

Here, the initializeProducts() function is first called to store all product info in ArrayLists so that the database need not be accessed every time; hence the collections framework is used here in the form of ArrayList, through the ArrayList functions .add(), .get(), and .clear().

.add() = to add to ArrayList

.get(int i) = to get the element stored at index i in the ArrayList

The proceed-to-payment function calls the Payment.java class file, and functions like add to cart, remove from cart, and view cart call Cart.java. Calling is done via class objects like customerCart and p.

customerCart = object of the Cart class

p = object of the Payment class

Customer.java is also a subclass of Shop.java, and it calls the registerCustomer() function of Shop.java through Classname.methodname, i.e., Shop.registerCustomer().

Products.java :

The main functions are adding, removing, and altering product info, and viewing and searching products. The setPid() function is used to set the product id to store in the database.

The database table products is accessed to add, remove, and alter product info.

Cart.java :

This class file contains the cart functions (add to cart, view cart, remove a product from the cart, and cancel cart), which are called from the Customer.java class file via an object.

Here, the add-to-cart function gets the product details to be added from Customer.java via the constructor and adds them to the ArrayList, so that the ArrayList can be used later for displaying cart details and other functions as required.

Payment.java :

The main function of the Payment class is to display bills and pay bills by calling Bills.java; therefore it is an intermediate class between Customer.java and Bills.java. This class also stores payment details like bill and card details.

Bills.java :

The Bills class is called from the payment page through an object. This class contains the details of a bill, like the bill id, products purchased, and total amount. It also contains the details of the customer whose bill it is. It stores the purchased product details in an ArrayList for easy access later on.

It contains functions like :

Generate bill = for calculating and storing the total amount in a variable

Set bill id = for setting the bill id

Display bill = for displaying bill details

addtoDatabase = to add bill details to the database table bills.

DatabaseConnection.java :

DatabaseConnection.java is a class file that is used to establish a connection with the MySQL server and to create a database “Onlineshop” and five tables – logininfo, admininfo, bills, products, and custinfo. It uses a flag variable to check whether the database schema exists, and if it exists, it only connects Java to MySQL.

It is imported into class files like Shop.java and Products.java, where it is used to access the database and make connections.

Exceptions that are used in the project are :

  • IOException: This exception is used wherever BufferedReader has been used.
  • For handling exceptions caused by the database, like ClassNotFoundException or SQLException, a try-catch block has been added.
  • For any other type of exception, sufficient try-catch blocks have been added.

Exception Handling features that are used in the project are :

Try-catch blocks

Exception class functions like printStackTrace() have been used.

The throws keyword has been added to methods where an exception is thrown and not handled by the method itself.

Snapshot of DatabaseConnection.java where Exception handling is used.

COLLECTION FRAMEWORK USED IN THE PROJECT

The concept of the collections framework is used in the project through the implementation of ArrayList.

Use of ArrayList in the project :

ArrayList is used in the project to store fetched results from the database for easy access later on. It is used in many class files; for example, in Customer.java, available product details are fetched from the database and stored in an ArrayList, so that whenever the customer tries to access product info, it is fetched from the ArrayList and not from the database, which reduces the complexity of the code and saves time.

The wrapper classes used with ArrayList are :

  • Integer for storing integers
  • Float for storing float values
  • String as a class for storing String values

DATABASE SCHEMA USED IN THE PROJECT

Here, the concept of JDBC comes into the picture and is implemented using MySQL and Java. The database is used in the project for storing information about admins, customers, products, bills, and login details.

Database details are as follows :

Name of the database = Onlineshop

The tables used in the project are :

  • Login Info Table
  • Admin Info Table
  • Cust Info Table
  • Products Table
  • Bills Table

How connectivity to MySQL was done :

To connect to the MySQL server, we have used the JDBC concept and the SQL Driver class to connect to the MySQL database. To use the Driver class and the other classes needed for the connection, java.sql.* is imported into the Java class file.

The creation of the database and its tables is done in the DatabaseConnection.java class file; to access the database later on, the code written is:

Class.forName("com.mysql.jdbc.Driver");

Connection con = DriverManager.getConnection(url, username, password);  // url, username, password hold the MySQL connection details

SOME SNAPSHOTS OF PROJECT OUTPUT

Main Page :

Customer section :

Some important points regarding the project :

  • The connector JAR file should be added to the Java project using:
    JRE System Library > Build Path > Configure Build Path > Libraries > Add External JARs
  • For connecting to the database used in the project, enter the root password of MySQL

Download the Complete Java Console Application Project on Online Shopping Management System

Audio Classification on Cats and Dogs Python Project

Our Audio Classification project illustrates a straightforward audio classification model based on deep learning. We address the problem of classifying the type of sound based on short audio signals and their generated spectrograms, classifying dog audio versus cat audio during model training. To meet this challenge, we use a model based on a Convolutional Neural Network (CNN). The audio was processed with Mel-frequency Cepstral Coefficients (MFCC) into what are commonly called Mel spectrograms, and hence transformed into images. Our final CNN model achieved 89% accuracy on the testing dataset.

Project Overview :

The input to our model in this project is cat and dog recordings in WAV format. The task lies under the supervised machine learning category, so a dataset is present along with a target class. The intention here is to classify whether a given input WAV file is that of a cat or a dog. Dog and cat sounds are very distinguishable in their pitch and frequency level, and different recordings have different sample rates. By default, Librosa mixes all audio to mono and resamples it to 22050 Hz at load time. Librosa is an open-source Python package for music and audio analysis; it provides the audio data and the sampling rate. Audio in its raw form must be pre-processed to extract significant and meaningful features, so we implemented the MFCC (Mel-frequency Cepstral Coefficients) algorithm. After feature extraction, the dataset is split into training and test sets. After this preprocessing, a Convolutional Neural Network model is designed using TensorFlow; the Keras API was used for all code and model building, which was done in Google Colab.
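As a rough sketch of the preprocessing step described above (the file names and the number of MFCC coefficients are illustrative assumptions, not the project's exact values):

    import numpy as np
    import librosa

    def extract_mfcc(path, n_mfcc=40):
        y, sr = librosa.load(path)  # Librosa loads mono audio at 22050 Hz by default
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)    # average over time for a fixed-size feature vector

    # Hypothetical file names; label 0 = cat, 1 = dog.
    X = np.array([extract_mfcc(p) for p in ["cat_1.wav", "dog_1.wav"]])
    y = np.array([0, 1])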

Motivation

Machine learning can be used in image processing, speech understanding, musical instrument recognition, speech-to-text, environmental sound classification, and many more areas. For our project, we implemented a class of speech processing, i.e., audio classification: converting sound waves into spectrograms, which are visual representations of frequencies, with the help of functions provided by machine learning libraries.

There are many techniques to classify images, as many in-built neural network architectures under CNN already exist, especially for images. It is straightforward to extract features from pictures because images already come in the form of numbers: an image is a collection of pixels, and pixels are numbers. When we have data as text, we use sequential encoder- and decoder-based techniques to find features. But sound or audio recognition is more difficult than text because it is based on frequency and time. Therefore a proper model must be made to extract the frequency and pitch of the audio so as to make it easier to recognize later.
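A minimal sketch of what such a CNN could look like with the Keras API; the spectrogram input shape and layer sizes are assumptions, not the project's actual architecture:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 1)),       # assumed Mel-spectrogram size
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # cat vs. dog
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])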

Flow Chart:

Preliminaries and Background 

Related work

Machine learning: image classification of cats and dogs – A decade ago, many problems in computer vision had saturated in accuracy. However, the accuracy on these problems improved significantly with the growth of deep learning techniques. Image classification is defined as predicting the distinct categories an image can belong to. Hence, to achieve high precision on the given input images, a state-of-the-art approach was incorporated: a convolutional neural network built for the image classification task of dogs and cats. The dataset was taken from Kaggle, comprising a total of 25000 images of dogs and cats.

Machine learning: audio classification of different bird species – Here, the methodology and results of using deep learning to assist in the classification of birds by their sounds are presented. As birds indicate the health of an ecosystem, this topic is of high importance. Random Forest classification and six custom CNN models from the literature were evaluated on a dataset of ten bird species composed from xeno-canto.org. The highest accuracy achieved was around 65% by the Random Forest and about 58% for the CNN model.

Conclusion and future work

In this report, we first briefly explained the overview of this project and discussed some related project work already established. Then we precisely described our task, including the learning task and the performance task. After that, we explained the approach we took to classify the dataset. The model we used is a neural network, an implementation of a trainable deep network, with which we were able to classify the dog and cat audio. The highest accuracy we got was 89.6%.

  1. In the future, we will try to implement different high-level models in order to achieve much higher accuracy.
  2. We’ll build a system that can directly take in live raw audio.

College Placement Management System Java Full Stack Project

This Placement Management System Java Full Stack Project involves developing a site for recruiting eligible candidates who have already enrolled their names with the placement office of an engineering college, along with a completely interactive site for students to enhance their technical and communication skills and get placed in companies more easily.

Modules involved in this Full Stack Project are below:

1. Creating GUI Interfaces for the Recruiter, Student, and Admin and handling the inner logic of those three actors of the system.

2. Generating reports of online test marks, students eligible to write online tests, students selected for interview, and finally selected students; sending emails to selected students and uploading files.

3. Online Test module

4. Providing practice sets for students and outsiders who visit the website

5. Creating a Discussion Forum for handling student queries and other information that helps students prepare for placements.

6. Sending an SMS to the finally selected students.

Download the Complete Java Full Stack Project on Placement Management System Source Code.

Fake Disaster Tweet Detection Web-App Python Machine Learning Project

This project, “Fake Disaster Tweet Detection”, aims to help predict whether a tweet is fake or real. It uses the Multinomial Naïve Bayes approach for detecting fake or real tweets from an existing dataset available on Kaggle. The classifier is trained only on text data. Traditionally, text analysis is performed using Natural Language Processing (NLP), a field that comes under Artificial Intelligence. Its main focus is on letting computers understand and process human language. NLP helps recognize and predict diseases using speech, and it is used in sentiment analysis, cognitive assistants, spam detection, the healthcare industry, etc. In this project, the training data is pre-processed and then sent to the classifier, which predicts whether the tweet is real or fake.

This project is made on Jupyter Notebook, which is a part of Anaconda Navigator. The project ran successfully on Jupyter Notebook: the dataset was loaded into the notebook, along with all the extra Python packages required to complete the project. The model is also deployed successfully using HTML, CSS, Python, and Flask.

The accuracy score on test data is 77.977%, the average recall value is 0.775, and the average precision score is 0.775. Precision measures the number of correct positive predictions made by the model. Recall measures the number of correct positive predictions out of all the positive predictions that could have been made.

System Design

System Flowchart


Problem: To detect whether disaster tweets are fake or real using a machine learning algorithm. Here, the concept of Natural Language Processing is used.

Identification of data: In this project, I have used a dataset from a Kaggle competition based on Natural Language Processing. This project works only on text data. The dataset has five columns:

  1. Id: the unique identification of each tweet
  2. Text: the tweet in text form
  3. Location: the place from where the tweet was sent (can be blank)
  4. Keyword: a particular keyword in the tweet (can be blank)
  5. Target: the actual label of the tweet, i.e., whether it is real or fake

Data pre-processing: First, preprocessing is done on the dataset, which includes removal of punctuation, URLs, digits, non-alphabetic characters, and contractions, followed by tokenization, removal of stopwords, and removal of Unicode characters. Then lemmatization is applied. After preprocessing, CountVectorizer is used to convert the text data into numerical data, as the classifier works only with numerical data. The dataset is then split into 70% training data and 30% test data.
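A condensed sketch of this preprocessing with scikit-learn; the file name and column names follow the Kaggle disaster-tweets dataset, and the cleaning rules are simplified for illustration:

    import re
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split

    def clean(text):
        text = re.sub(r"https?://\S+", " ", text)  # remove URLs
        text = re.sub(r"[^a-zA-Z\s]", " ", text)   # remove digits, punctuation, non-alphabets
        return text.lower()

    df = pd.read_csv("train.csv")  # Kaggle training file
    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(df["text"].map(clean))
    X_train, X_test, y_train, y_test = train_test_split(
        X, df["target"], test_size=0.3, random_state=42)  # 70/30 split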

Definition of training data: The training dataset, which contains 70% of the whole dataset, is used for training the model.

Algorithm section: In this project, the Multinomial Naïve Bayes classifier algorithm is used for detecting whether disaster tweets are fake or real.

Evaluation with test set: Several text samples are passed through the model to check whether the classification algorithm gives the correct result or not.

Prediction Model

Implementation Work Details

The dataset used in this “Fake Disaster Tweet Detection” project is taken from the Kaggle competition “Natural Language Processing with Disaster Tweets”. The dataset contains 7613 samples. This project works only on text data. It has five columns:

  • Id: the unique identification of each tweet
  • Text: the tweet in text form
  • Location: the place from where the tweet was sent (can be blank)
  • Keyword: a particular keyword in the tweet (can be blank)
  • Target: the actual label of the tweet, i.e., whether it is real or fake

Step 2: Data-Preprocessing

  1. Removing punctuation: punctuation marks are removed from the text.
  2. Removing URLs, digits, non-alphabets, and underscores: True means the text contains HTTP, and False means it does not.
  3. Removing contractions: words written in short form are expanded, e.g., can’t becomes cannot and I’ll becomes I will.
  4. Lowercasing the text, tokenizing it, and removing stopwords: tokenizing means splitting the text into a list of tokens; stopwords are words that do not add meaning to the text.
  5. Lemmatizing: converts any word into its root form, e.g., running and ran become run.
  6. CountVectorizer:

Text cannot be used directly to train our model; it has to be converted into numbers that the computer can understand, so in this project CountVectorizer is used. CountVectorizer counts the number of times each word appears in a document. It works as follows:

Step 1: It first identifies the unique words in the complete dataset.

Step 2: It then creates an array of zeros for each sample, of the same length as the vocabulary above.

Step 3: It then takes each word in turn and finds its occurrences in each sample in the dataset. The number of times the word appears in a sample replaces the zero positioned at that word in the list. This repeats for every word.
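A tiny worked example of these three steps (the two sample documents are made up):

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["forest fire near la ronge", "evacuation ordered after forest fire"]
    cv = CountVectorizer()
    counts = cv.fit_transform(docs)
    print(cv.get_feature_names_out())  # Step 1: unique words in the corpus
    print(counts.toarray())            # Steps 2-3: per-document word counts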

Step 3: Model Used:

In this project, the Multinomial Naïve Bayes approach is used for detecting fake or real tweets from the existing dataset available on Kaggle. The Naïve Bayes classifier is based on Bayes’ theorem of probability and assumes conditional independence between every pair of features.
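Continuing the earlier preprocessing sketch, fitting and scoring the classifier might look like this (a sketch, not the project's exact code):

    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    clf = MultinomialNB()
    clf.fit(X_train, y_train)  # X_train/y_train come from the earlier 70/30 split
    pred = clf.predict(X_test)
    print("accuracy:", accuracy_score(y_test, pred))
    print("precision:", precision_score(y_test, pred))
    print("recall:", recall_score(y_test, pred))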

System Testing


To evaluate the machine learning model, we normally use classification accuracy, which is the number of correct predictions divided by the total number of predictions.

This accuracy measure works well when there is an equal number of samples belonging to each class in the dataset. The accuracy score on the test data is 77.977%, the average recall value is 0.775, and the average precision score is 0.775; the precision and recall formulas are given below.

  • Precision = True Positives / (True Positives + False Positives)
  • Recall = True Positives / (True Positives + False Negatives)

Conclusion

In this project, only one classification algorithm is used: Multinomial Naïve Bayes. First, preprocessing is done on the dataset, which includes removal of punctuation, URLs, digits, non-alphabets, and contractions, followed by tokenization, stopword removal, and Unicode removal. Then lemmatization is applied. After preprocessing, CountVectorizer converts the text data into numerical data, as the classifier works only on numerical data. The dataset is then split into 70% training data and 30% test data. The accuracy score on test data is 77.977%, the average recall value is 0.775, and the average F1 score is 0.775.

Future Scope

In the future, other classification algorithms can be tried on this dataset, like KNN, Support Vector Machine (SVM), and Logistic Regression; deep learning algorithms, which can give very high accuracy, can also be used. Vectorizing can be done using other methods like word2vec, the TF-IDF vectorizer, etc.

Download the Complete Project on Fake Disaster Tweet Detection Web Application Python-based Machine Learning Project.

Covid-19 Outbreak Prediction Using Machine Learning Python Project

The aim of this Covid-19 Outbreak Prediction project is to make a model that will forecast the number of confirmed cases of the Covid-19 virus in the upcoming days. Covid-19 is an infectious disease that is affecting a huge number of people all around the world.

This virus was first identified in Wuhan, China, and later spread throughout the world causing a pandemic that forced most countries to go into lockdown.

Various machine learning models and time series forecasting models are used.

The predictive model will be created using machine learning and the dataset obtained from Kaggle. Machine learning automates the building of analytical models; it is a branch of artificial intelligence based on the principle that systems can learn from data, find patterns, and make decisions.

Time series forecasting, a type of predictive modelling, will also be used: it uses a model based on earlier observed values to estimate future values.

INTRODUCTION

The aim of this project is to make a predictive model that will predict the trajectory of the Covid-19 outbreak in the upcoming days.

It was first identified in Wuhan, China, and then later spread all over the world causing a pandemic.

Since no vaccine is yet available throughout the world, we have to take preventive measures that can stop the spread of the disease. And since a lockdown cannot last forever, we have to know how fast the spread is and how many more people may be infected.


PRESENT SYSTEM

Various work related to Covid-19 is being done. Officials all over the world are using several outbreak prediction models for Covid-19 to make informed decisions and implement relevant control measures. Among the standard models for Covid-19 global pandemic prediction, simple statistical models have received the most attention from authorities. One line of work suggests using SEIR models, where SEIR stands for susceptible-exposed-infected-recovered.

Such a model aims to forecast factors like the spread of a disease, the total number of infected people, and the span of an outbreak, and to estimate epidemiological parameters like the reproduction number. These models can illustrate how the outcome of the disease can be affected by various public health measures.

PROPOSED SYSTEM 

In this project, we will first collect and evaluate the dataset. We will transform the raw data into an accessible format and visualize it using data preprocessing. Various machine learning algorithms such as linear regression, polynomial regression, SVM, Holt’s linear model, Holt’s winter model, the AR model, the ARIMA model, and the SARIMA model are used. The tools used are mainly sklearn for model selection; NumPy, which is used to work with arrays; pandas, whose key data structure, the DataFrame, allows us to store and manipulate tabular data in rows of observations and columns of variables; and matplotlib, a plotting library used to draw graphs. After implementing the models, the model with the least mean square error will be considered the best-fit model.
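A minimal sketch of the “pick the model with the lowest error” step, using linear regression as one candidate; the case series below is synthetic placeholder data, not the Kaggle dataset:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    cases = pd.Series(np.cumsum(np.random.poisson(100, size=120)))  # placeholder cumulative cases
    days = np.arange(len(cases)).reshape(-1, 1)
    split = int(len(cases) * 0.9)  # hold out the most recent 10% of days

    model = LinearRegression().fit(days[:split], cases[:split])
    pred = model.predict(days[split:])
    rmse = np.sqrt(mean_squared_error(cases[split:], pred))
    print("Linear regression RMSE:", rmse)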

System Design 

The dataset is first preprocessed and visualized so that it is in a usable format for analysis. After this, we model the data using linear regression, polynomial regression, SVM, Holt’s linear model, Holt’s winter model, the AR model, the ARIMA model, and SARIMA. Then we evaluate the models and choose the best one according to its root mean square error.

The flowchart depicts the following

Dataset 

The dataset involves the collection of data from various sources.

Data Pre-processing and visualization 

In order to obtain accurate results, data preprocessing is done to check whether there is any inconsistency in the data; if there is, it is handled accordingly. We then visualize the data to study the patterns and trends in it.

Model Building 

Various models are used in this project (a SARIMA sketch follows the list):

  • Linear Regression
  • Polynomial Regression
  • SVM
  • Holt’s Linear Model
  • Holt’s Winter Model
  • Auto Regressive Model (AR)
  • Moving Average Model (MA)
  • ARIMA Model
  • SARIMA Model
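As an illustration, a SARIMA model can be fitted with statsmodels; the (p, d, q) and seasonal orders below are illustrative guesses, not the project's tuned values:

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # `cases` is the placeholder series from the earlier sketch.
    sarima = SARIMAX(cases, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7)).fit(disp=False)
    forecast = sarima.forecast(steps=14)  # predict the next two weeks
    print(forecast)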

DATASET

In this project, the dataset taken from Kaggle is the Novel Corona Virus 2019 Dataset, and the goal is to study the effect and spread of COVID-19 in the coming days and to conduct predictions and time series forecasting.

Hardware and Software Details 

  • Software details: Python 3.7 (64-bit), Jupyter Notebook

Implementation work details  

First, the data is pre-processed, visualized, and analyzed. Afterward, various models are trained on the data, and the model with the least root mean squared error is selected as the best-fit model. Various machine learning models are used, along with time series forecasting models such as Holt’s linear model and the ARIMA model. The dataset is obtained from Kaggle.

Real-life applications 

It can be used by the government to predict the extent of the spread of the infectious disease and take action accordingly.

Data implementation and program execution 

The data is analyzed and then visualized. The data is trained on different models, and the one with the least mean square error is considered the best-fit model and can be used for forecasting. The program is executed in a Jupyter notebook.

Output Screens 

Fig: Growth of different types of cases in India

Fig: Confirmed cases Linear Regression Prediction

Fig: Polynomial Regression Prediction for confirmed cases

Fig: SVM regressor Prediction for confirmed cases

Fig: Holts Linear Model Prediction for confirmed cases

Fig: Holt’s Winter model prediction for confirmed cases

Fig: AR model prediction for confirmed cases

Fig: SARIMA model Prediction

System Testing 

In this project, the model evaluation part is very important, as through it we can identify which model best fits the problem.

Here the models are evaluated on the basis of their root mean square error (RMSE).

The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a commonly used measure of the differences between the values predicted by a model or estimator and the values observed.
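Concretely, for n observed values y_i and model predictions ŷ_i:

    RMSE = sqrt( (1/n) × Σ (y_i − ŷ_i)² )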

According to the RMSE values of all the models tested in the project, the one with the least RMSE was the SARIMA model, so it can be considered the best-fit model for this problem.

Conclusion

It is concluded that machine learning models can be used to forecast the spread of infectious diseases like Covid-19. In the project, we used various algorithms to forecast the rise in confirmed cases. It was observed that among all the algorithms used, SARIMA had the least RMSE, so it was considered the best-fit model for the available data.

Limitations

It is a new virus, so only about a year’s worth of data is available. Generally, the more data we have, the better the accuracy we get, so we have to keep updating the data.

Scope for future work

It can be implemented such that it updates its graphs and predictions according to real-time values.

Download the Complete project on Covid-19 Outbreak Prediction Using Machine Learning Python Project Code & Report.