Competitive Programming Platform for Students Project Synopsis

Introduction

Major IT corporations increasingly use online coding competitions to judge how well upcoming software engineers handle pressure and how strong their fundamentals are. This has led to a significant increase in the number of online judges and coding competitions. Many students are now unsure which platform to opt for and how to approach these coding contests consistently and on time. This is where the Competitive Programming Platform comes into the picture.
The Competitive Programming Platform is a collection of extensions, APIs, bots, and web apps aimed at simplifying competitive programming. With this project, students can observe, compare, and shortlist online judges, work to outperform their peers on them, and compare their improvements and achievements in a healthy environment. The technologies used in this project are Python, JavaScript, Node, Flask, Selenium, VueJS, and Tailwind.

Objectives

The main objective is to create a platform on which students can easily select and prepare for online coding competitions in the best possible way.
The key objectives of the Competitive Programming Platform are:
1. View all competitive profiles at a glance.
2. Get updates about the latest programming contests.
3. Receive all updates through an email newsletter and push notifications.
4. Fetch global and local leaderboards.
5. Provide a VS Code extension to speed up local development.
6. Provide a Chrome extension to view upcoming contests on the go.
7. Expose a standalone REST API.

Methodology

In the first step, we will scrape data from various sources using a crawler built in Python with Selenium. We will store this data in our database and run the pipeline with a cron job every six hours.
We will then deliver all the extracted data through a single-page application (SPA) built with VueJS. We will use Workbox 6.0 to turn the SPA into a Progressive Web Application with native support for push notifications.
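
A minimal sketch of what the crawler stage could look like, assuming headless Chrome is available; the contest page URL and the CSS selectors below are placeholders, and the real judges, selectors, and database layer would differ.

# Crawler sketch (URL and selectors are illustrative placeholders).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def scrape_contests(url="https://example.com/contests"):  # placeholder URL
    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        contests = []
        for row in driver.find_elements(By.CSS_SELECTOR, ".contest-row"):
            contests.append({
                "name": row.find_element(By.CSS_SELECTOR, ".name").text,
                "starts_at": row.find_element(By.CSS_SELECTOR, ".start-time").text,
            })
        return contests
    finally:
        driver.quit()

if __name__ == "__main__":
    print(scrape_contests())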

Web Scraping: Web scraping is an automated method of obtaining large amounts of data from websites.
Cron jobs in recurrent pipelines: A cron job is normally used to schedule a task that runs periodically. In our case, a cron job runs the Python script that fetches the unstructured HTML data, extracts and validates it, and saves it in the database (see the pipeline sketch after this list).
User interfaces: Building user-friendly interfaces that give meaning to the extracted data and visualize it through tables, charts, and graphs.
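
A sketch of how the six-hour pipeline could be wired up, assuming the crawler above lives in a module named scraper; the crontab line, file paths, and the save_to_db helper are placeholders for illustration.

# pipeline.py, invoked by cron every six hours with an entry such as:
#   0 */6 * * * /usr/bin/python3 /opt/cpp-platform/pipeline.py >> /var/log/cpp-pipeline.log 2>&1
from scraper import scrape_contests  # the crawler sketched above (assumed module name)

def save_to_db(contests):
    """Placeholder: validate the scraped rows and upsert them into the database."""
    for contest in contests:
        print("saving", contest)

if __name__ == "__main__":
    save_to_db(scrape_contests())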

Work Flow

Facilities required

• Vue, Tailwind, ChartJS, Babel, GSAP, Node
• Flask, PostgreSQL, Selenium, Python
• Git, GitHub, CodeQL, VS Code
• NGINX, PM2, Travis, Certbot

Expected Outcome

• Responsive, minimalistic user interface with a clutter-free user experience.
• Powerful REST API that can power other third-party applications.
• A healthy competitive environment with friendly competition among peers, making competitive programming a constructive habit.

Fake Disaster Tweet Detection Web-App Python Machine Learning Project

This project, “Fake Disaster Tweet Detection”, aims to predict whether a tweet is fake or real. It uses the Multinomial Naïve Bayes approach to detect fake or real tweets from an existing dataset available on Kaggle. The classifier is trained only on text data. Traditionally, text analysis is performed using Natural Language Processing (NLP), a field within Artificial Intelligence whose main focus is letting computers understand and process human language. NLP helps recognize and predict diseases from speech and is used in sentiment analysis, cognitive assistants, spam detection, the healthcare industry, and more. In this project, the training data is pre-processed and sent to the classifier, which then predicts whether a tweet is real or fake.

This project was built in a Jupyter Notebook, which is part of Anaconda Navigator, and ran successfully there. The dataset was loaded into the notebook along with all the extra Python packages required to complete the project. The model was also deployed successfully as a web app using HTML, CSS, Python, and Flask.

The accuracy score on the test data is 77.977%, the average recall is 0.775, and the average precision is 0.775. Precision measures how many of the positive predictions made by the model are correct, while recall measures how many of the actual positives the model manages to predict.

System Design

System Flowchart

Problem: To detect whether a disaster tweet is fake or real using a machine learning algorithm. The concept of Natural Language Processing is used for this.

Identification of data: This project uses a dataset from a Kaggle competition based on Natural Language Processing. The project works only on the text data. The dataset has five columns:

  1. Id: the unique identification of each tweet
  2. Text: the tweet in text form
  3. Location: the place from where the tweet was sent (can be blank)
  4. Keyword: a particular keyword associated with the tweet (can be blank)
  5. Target: the actual label of the tweet, i.e. whether it is real or fake

Data pre-processing: First, pre-processing is performed on the dataset: removal of punctuation, URLs, digits, non-alphabetic characters, and contractions; then tokenization, stop-word removal, and Unicode removal. Lemmatization is then applied. After pre-processing, CountVectorizer is used to convert the text data into numerical data, since the classifier only works with numerical data. The dataset is then split into 70% training data and 30% test data.
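
A condensed sketch of this pipeline, assuming pandas and scikit-learn are installed and that train.csv is the Kaggle file with "text" and "target" columns; the cleaning rules shown are illustrative and the notebook's exact rules may differ.

# Pre-processing and split sketch (file name, columns, and cleaning rules assumed).
import re
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

def clean(text):
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep alphabetic characters only
    return re.sub(r"\s+", " ", text).strip()

df = pd.read_csv("train.csv")
df["clean_text"] = df["text"].apply(clean)

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(df["clean_text"])  # word-count matrix
y = df["target"]

# 70% training data, 30% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)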

Definition of Training Data: The training dataset which contains 70% of the whole dataset is used for training the model.

Algorithm Section: In this project, the Multinomial Naïve Bayes classifier is used to detect whether disaster tweets are fake or real.

Evaluation with test set: Several text samples are passed through the model to check whether the classification algorithm gives the correct result or not.

Prediction Model

Implementation Work Details

The dataset used in this “Fake Disaster Tweet Detection” project is taken from the Kaggle competition “Natural Language Processing with Disaster Tweets”. The dataset contains 7,613 samples. This project works only on the text data. It has five columns:

  • Id: the unique identification of each tweet
  • Text: the tweet in text form
  • Location: the place from where the tweet was sent (can be blank)
  • Keyword: a particular keyword associated with the tweet (can be blank)
  • Target: the actual label of the tweet, i.e. whether it is real or fake

Step 2: Data-Preprocessing

  1. Removing punctuation: all punctuation marks are stripped from the tweet text.
  2. Removing URLs, digits, non-alphabetic characters, and underscores: each tweet is first checked for a URL (True if the text contains "http", False otherwise), and these substrings are then removed.
  3. Removing contractions: words written in short form are expanded, e.g. can't becomes cannot and I'll becomes I will.
  4. Lowercasing, tokenizing, and removing stopwords: tokenizing splits the text into a list of tokens; stopwords are words that do not add meaning to the text and are removed.
  5. Lemmatizing: converts each word into its root form, e.g. running and ran become run.
  6. CountVectorizer:

Text cannot be used directly to train the model; it has to be converted into numbers that the computer can understand, and in this project CountVectorizer is used for that. CountVectorizer counts the number of times each word appears in a document. It works as follows:

Step 1: It first identifies the unique words in the complete dataset.

Step 2: It then creates, for each sample, an array of zeros of the same length as that vocabulary.

Step 3: It then takes each word in turn and finds its occurrences in each sample. The number of times the word appears in a sample replaces the zero at that word's position in the array. This is repeated for every word.
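
A small illustration of these steps using scikit-learn's CountVectorizer on two made-up sentences (the sentences are only for illustration):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["forest fire near the town", "the town is safe"]  # toy samples
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # unique words (Step 1)
print(counts.toarray())                    # per-sample word counts (Steps 2 and 3)
# ['fire' 'forest' 'is' 'near' 'safe' 'the' 'town']
# [[1 1 0 1 0 1 1]
#  [0 0 1 0 1 1 1]]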

Step 3: Model Used:

In this project, the Multinomial Naïve Bayes approach is used to detect fake or real tweets from the existing dataset available on Kaggle. The Naïve Bayes classifier is based on Bayes' theorem and assumes conditional independence between every pair of features given the class.
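
Continuing the earlier sketch, training and prediction with scikit-learn's MultinomialNB could look like this (X_train, y_train, and X_test carry over from the pre-processing sketch above):

from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)           # train on the 70% split
predictions = model.predict(X_test)   # 1 = real disaster tweet, 0 = not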

System Testing

This project was built and tested in a Jupyter Notebook, which is part of Anaconda Navigator, and it ran there successfully: the dataset loaded correctly, all the extra Python packages required for the project were available in the notebook, and the model was deployed successfully as a web app using HTML, CSS, Python, and Flask.

To evaluate the machine learning model, we normally use classification accuracy, which is the number of correct predictions divided by the total number of predictions.

This accuracy measure works well when the dataset has an equal number of samples in each class. The accuracy score on the test data is 77.977%, the average recall is 0.775, and the average precision is 0.775. Precision measures how many of the positive predictions made by the model are correct, while recall measures how many of the actual positives the model correctly predicts (a short scikit-learn sketch follows the formulas below).

  • Precision = True Positives / (True Positives + False Positives)
  • Recall = True Positives / (True Positives + False Negatives)
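
Using the predictions from the sketches above, these metrics can be computed with scikit-learn; macro averaging is assumed here for the averaged values.

from sklearn.metrics import accuracy_score, precision_score, recall_score

print("accuracy :", accuracy_score(y_test, predictions))
print("precision:", precision_score(y_test, predictions, average="macro"))
print("recall   :", recall_score(y_test, predictions, average="macro"))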

Conclusion

In this project, only one classification algorithm is used: Multinomial Naïve Bayes. First, pre-processing is performed on the dataset: removal of punctuation, URLs, digits, non-alphabetic characters, and contractions; then tokenization, stop-word removal, and Unicode removal, followed by lemmatization. After pre-processing, CountVectorizer is used to convert the text data into numerical data, since the classifier only works with numerical data. The dataset is then split into 70% training data and 30% test data. The accuracy score on the test data is 77.977%, the average recall is 0.775, and the average F1 score is 0.775.

Future Scope

In the future, other classification algorithms such as KNN, Support Vector Machine (SVM), and Logistic Regression can be tried on this dataset, and even deep learning algorithms, which give very high accuracy, can be used. Vectorization can also be done with other methods such as word2vec or the TF-IDF vectorizer.

Download the Complete Project on Fake Disaster Tweet Detection Web Application Python-based Machine Learning Project.

Web Technologies Project on Car Pooling Application

A brief walkthrough of the Car Pooling project

This Car Pooling application allows users to:

  • Become a member of the carpooling community (register and login)
  • Join rides
  • Offer rides
  • See the most popular rides taken

Joining a ride

The user searches for:

  • Source (starting point)
  • Destination (drop off point)

On selecting a ride, the user chooses the number of seats (based on availability).

The cost of the ride is then displayed and the user is asked to confirm the booking.

Offering the ride

A registered user creates a ride by

  • Selecting the starting point (source)
  • Selecting the endpoint (destination)
  • Entering the car model (registration)
  • Entering the number of seats available
  • Entering the cost per kilometer
  • Entering the offered pickup points

Inclusion of features

  • Web services using RESTful APIs
  • Ajax patterns:
      • Submission throttling: used to populate the source and destination lists being searched
      • Multistage download: on loading the home page, the images are downloaded one after the other
      • Comet via SSE (server-sent events): used to view the topmost rides driven/joined

Use of framework

  • RESTful APIs
  • Flask micro-framework
  • Bootstrap for CSS
  • MongoDB (for the database)

Intelligent Component

  • Fare calculation: (distance * cost per km) / seats, where distance is calculated using the haversine formula
  • Haversine formula: takes two points (their latitude and longitude) and calculates the distance between them
  • Pickup point selection: K-nearest neighbors (KNN) is used
  • The intelligent component is trained on a dataset whose rows contain place name, latitude, and longitude
  • The generated model is used to predict the nearest neighbors of any given place
  • To decrease calculation time, a pre-computed matrix of distances from each place to all other places is used (see the sketch after the matrix list below)

Two such matrices are used:

  • Distance matrix
  • Indexes matrix
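
A minimal sketch of this component, assuming scikit-learn is available; the sample places, number of neighbors, and column layout are placeholders for illustration.

import numpy as np
from math import radians, sin, cos, asin, sqrt
from sklearn.neighbors import NearestNeighbors

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def fare(distance_km, cost_per_km, seats):
    return (distance_km * cost_per_km) / seats

# Placeholder places dataset: (name, latitude, longitude)
places = [("A", 19.07, 72.87), ("B", 19.10, 72.85), ("C", 18.99, 72.84)]
coords = np.radians([[lat, lon] for _, lat, lon in places])

# Pre-compute the distance and index matrices with KNN using the haversine metric
knn = NearestNeighbors(n_neighbors=2, metric="haversine").fit(coords)
distance_matrix, index_matrix = knn.kneighbors(coords)
distance_matrix *= 6371  # convert radians to kilometers

print(index_matrix)     # nearest-neighbor indexes for each place
print(distance_matrix)  # corresponding distances in km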

Real-Time Map-Based Pollution Monitoring and Data Management System

Title : Real-Time Map-Based Pollution Monitoring And Data Management System

Introduction: For years, pollution has been a major issue faced by mankind, and it is increasing by the day. The recent pollution disasters in major cities across the globe have taught us one thing: it is important to keep an eye on the pollution that grows day by day. Many government and global organizations have started to work on it, and almost a decade has passed since these programs began functioning. The major issue with these organizations, however, is that they try to beat pollution on every front at once, whether it is air pollution or water pollution.

These organizations are more focused on amending laws for pollution control, and their monitoring process boils down to analyzing air quality and then making changes in environmental laws. Another issue is that these bodies are controlled by the central or federal government. Pollution, however, is no longer an issue that can be tackled gradually and conventionally; it needs immediate attention, and effective monitoring is required so that the authorities can take the necessary measures to solve pollution problems.

The pollution problem is most persistent in urban metropolitan areas, but municipal corporations have very little control over the situation because they lack data to act upon. Recent developments in the smart-city sector are also encouraging cities to develop monitoring systems. The city of Ahmedabad, Gujarat, has installed digital signboards that show the real-time values of major air pollutants and the overall air quality. This data is displayed to people driving on the road so that they can take the necessary precautions to avoid or minimize the health risks due to pollution. However, this kind of pollution monitoring project requires a huge amount of funding and is not feasible everywhere.

So we are building a minimalistic model to tackle the issue of monitoring pollution. Our main goal is to provide real-time data visualization along with a database that stores all the readings of the various pollutants. The data will be visualized on a map, so it will be easy to pinpoint the exact location whenever any action is needed. We will also build a device to capture the data and feed it into a web application that can be used to monitor and visualize it.

The main aim of this Real-Time Map-Based Pollution Monitoring project is to provide a centralized repository of sensor data and an effective, centralized monitoring system. The low cost and feasibility of the project make it suitable for both smart cities and small towns. Furthermore, this kind of monitoring system will allow the development of effective countermeasures and control strategies to keep the pollution problem in check.

Process Flow:

Pollution Monitoring System Process Flow

Methodology

This Real-Time Map-Based Pollution Monitoring project is aimed at local authorities, such as municipal corporations, rather than the central government, so that they can take immediate action to control the pollution problem.

This Pollution Monitoring project can be broadly divided into three main parts:

  • Data Collection.
  • Data Monitoring.
  • Data Storage.

1. Data Collection:

Data collection is an important part of this Pollution Monitoring System project. Any kind of monitoring system is functional only because of the data that has been provided to it.

Data collection consists of reading data from sensors. From the research conducted, we have been able to deduce the main kinds of data we need. Looking at urban pollutants, we have observed that the most prominent are Particulate Matter (PM) and Suspended Particulate Matter (SPM).

Hence we have decided to use a DSM501A Particulate Matter and Suspended Particulate Matter sensor for detecting PM2.5, one of the major pollutants, which leads to various lung, carcinogenic, and skin diseases.

Particulate Matter concentrations have risen dramatically in the past decades due to the increase in the number of automobiles on urban roads. Hence we have decided that monitoring PM/SPM (Particulate Matter and Suspended Particulate Matter) will be one of the main agendas of our monitoring system.

Another major pollutant that has been identified is Carbon Monoxide (CO). CO is not just a pollutant in itself; it is also responsible for creating another harmful pollutant, Ozone (O3). Ozone is important for blocking UV radiation from the sun, but at ground level it is a dangerous gas. Carbon Monoxide is especially dangerous as it affects hemoglobin when concentrations exceed 35 ppm (parts per million).

From the research we have done, it is clear that CO is present in varying quantities, which means that we need to monitor it effectively to keep its concentrations at safe levels. We will be using an MQ-7 sensor for measuring Carbon Monoxide.

Studies have pointed out that SO2 and NO2 are also major air pollutants and contribute to the degradation of overall air quality. Several hydrocarbon compounds, although not major pollutants, also affect air quality considerably. Hence we have decided to use an MQ-135 sensor to monitor SO2 and NO2 levels as well as the overall air quality.

The sensors will be interfaced with a Raspberry Pi and their data will be recorded using the Python GPIO library. The data from these sensors will then be forwarded to the web server and to storage.
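
A rough sketch of how the Raspberry Pi script could read the DSM501A dust sensor and forward readings; the GPIO pin number, the sampling window, the polling rate, and the server URL are assumptions for illustration, not the final wiring.

import time
import requests
import RPi.GPIO as GPIO

DUST_PIN = 14        # assumed BCM pin wired to the DSM501A PM2.5 output
SAMPLE_SECONDS = 30  # the DSM501A is normally sampled over a ~30 s window

GPIO.setmode(GPIO.BCM)
GPIO.setup(DUST_PIN, GPIO.IN)

def low_pulse_occupancy():
    """Approximate fraction of the window during which the sensor output is low."""
    low_samples = 0
    total_samples = 0
    end = time.time() + SAMPLE_SECONDS
    while time.time() < end:
        if GPIO.input(DUST_PIN) == GPIO.LOW:
            low_samples += 1
        total_samples += 1
        time.sleep(0.001)  # ~1 kHz polling
    return low_samples / max(total_samples, 1)

while True:
    ratio = low_pulse_occupancy()
    # Send the raw ratio to the Flask server; conversion to a concentration happens there.
    requests.post("http://localhost:5000/api/readings",
                  json={"sensor": "DSM501A", "low_ratio": ratio})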

2. Data Monitoring:

Data Monitoring is the key component of the system. To monitor the data, we have decided to use Google Maps so that the position of our Raspberry Pi module can be pinpointed; by color-coding the markers, we can indicate the levels of pollution in the vicinity of each module.

All of this will be achieved by creating a web server in Python using the Flask framework; the main desktop app will be a web application written in HTML, CSS, Bootstrap, and JavaScript. The app will have three options (a minimal server sketch follows the list below):

  1. Map-Based Monitoring
  2. Individual Pollutant Monitoring
  3. Statistics
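
A minimal sketch of the Flask side, under the assumption that readings are posted as JSON by the Raspberry Pi script above; the routes, field names, and in-memory storage are illustrative placeholders rather than the final design.

from flask import Flask, jsonify, request

app = Flask(__name__)
readings = []  # latest sensor readings kept in memory for this sketch

@app.route("/api/readings", methods=["POST"])
def add_reading():
    readings.append(request.get_json())  # e.g. {"sensor": "MQ-7", "value": 12.3}
    return jsonify({"status": "ok"}), 201

@app.route("/api/readings/latest")
def latest_reading():
    # The map page polls this endpoint to color-code the module's marker.
    return jsonify(readings[-1] if readings else {})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)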

3. Data Storage:

Data Storage is necessary for referencing past data and developing statistics from it. The data will be stored locally on the file system and can be downloaded as Excel sheets.
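
A sketch of the export step, assuming pandas and openpyxl are installed and the readings have the same shape as in the server sketch above (the example row is made up).

import pandas as pd

# Example rows in the same shape the server sketch stores (values are illustrative).
readings = [{"sensor": "MQ-7", "value": 12.3, "timestamp": "2021-03-01 10:00"}]

df = pd.DataFrame(readings)
df.to_csv("readings.csv", index=False)     # local file-system storage
df.to_excel("readings.xlsx", index=False)  # downloadable Excel sheet (requires openpyxl)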

Timeline 

Serial Number    Tasks                                             Duration
1                Synopsis and Presentation Submission              15 days
2                Component Purchasing and Testing                  15 days
3                Interfacing sensors and writing server script     15 days
4                Writing Front-End Application                     15 days
5                Integrating Front-End and Back-End services       15 days

Components

  • Raspberry Pi model B
  • SD card and adapter
  • MQ-7 sensor
  • MQ-135
  • DSM501A


Student Coding Assignment Evaluation Using API Project

Abstract:

Data mining in educational institutions helps analyze students' details and provides an effective evaluation system in a short time. With the advancement of new technologies, the student evaluation procedure has changed from manual correction to an automated process of correction and analysis. This student coding assignment evaluation system using an API is designed to automate the evaluation of students' code.

When a student submits an answer to a question online, the faculty evaluates the code by sending it to the API and getting back results or error messages. By checking these messages, the faculty gives marks to the student. This process is carried out through a web application developed in the Python programming language.

Problem statement:

Evaluating students' assignments is a time-consuming process for faculty, as it requires manually checking each line of code and then giving marks to students.

Objective:

The coding evaluation process can be automated by using an available code-checking API, which can be integrated into the college's assignment website. With this process, the evaluation is completed in a single click and faculty can give marks based on the results.

Existing system:

  • A manual process was used for checking assignments and evaluating results.
  • Data mining techniques were used for evaluation, which rely on previous coding datasets and predict results that are not accurate.

Disadvantages:

  • Faculty must check each line of code to evaluate it and give a grade.
  • The time taken for the evaluation process is high.

Proposed system:

The student online coding evaluation system provides an automatic code-checking process: faculty can assign coding assignments, collect submissions from students, compile the code in one click, check the results, and give marks. A minimal sketch of the evaluation step is shown below.
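
The sketch below shows the evaluation step as a plain HTTP call; the endpoint URL, authorization header, request fields, and response fields are hypothetical placeholders for whichever code-checking API is eventually chosen.

import requests

def evaluate_submission(source_code, language, stdin=""):
    response = requests.post(
        "https://api.example-judge.com/submissions",  # placeholder endpoint
        json={"source_code": source_code, "language": language, "stdin": stdin},
        headers={"Authorization": "Bearer <API_KEY>"},  # placeholder key
        timeout=30,
    )
    result = response.json()
    # Assumed response fields: program output and compile/runtime error messages.
    return result.get("stdout"), result.get("stderr")

output, errors = evaluate_submission("print('hello')", "python3")
print("Output:", output)
print("Errors:", errors)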

Advantages:

  • The entire process, from assigning to evaluation, is done online, and code evaluation takes one click.
  • An API is used to check the code for errors and support grading.

System requirement: 

Programming language: Python

Framework: Flask

Database: MySQL

API: for compiling code