Corona Virus Prediction and Analysis Machine Learning Project

1.  Introduction 


Many people are currently being affected by the coronavirus. It started in China and is now spreading all over the world. At the time of writing, there is no medicine for this virus, and it is killing a large number of people. A major question for all of us is therefore how many people are going to be affected.

Problem Statement 

To our knowledge, there is currently no application that predicts the spread of the coronavirus over the next 30 days. With this project, we would like to create awareness among people by showing how the number of cases is expected to rise over the next 30 days, so that they can take preventive measures such as staying indoors.

Project Goal 

The main objectives of this Corona Virus Prediction project are:

  • Predicting the increase or decrease in the number of active coronavirus cases for the next 30 days, both for the whole world and for the United States of America. We chose the USA among all countries because it is the country most affected by the coronavirus.
  • Predicting the increase or decrease in the number of deaths due to the coronavirus for the next 30 days.
  • Predicting the increase or decrease in the number of recoveries from the coronavirus for the next 30 days.
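
To make the 30-day horizon concrete, the following minimal sketch builds the day indices and calendar dates that a forecast would cover. The function name forecast_axis, the start date, and the number of observed days are hypothetical and chosen only for illustration; they are not taken from the project code.

```python
from datetime import date, timedelta

HORIZON_DAYS = 30  # forecast length stated in the project goal


def forecast_axis(first_day, n_observed, horizon=HORIZON_DAYS):
    """Return (day indices, calendar dates) for the days to be predicted.

    Day 0 is `first_day`; the observed data covers days 0..n_observed-1,
    so the forecast starts at day index n_observed.
    """
    indices = list(range(n_observed, n_observed + horizon))
    dates = [first_day + timedelta(days=i) for i in indices]
    return indices, dates


# Hypothetical example: 60 observed days starting on 22 January 2020.
indices, dates = forecast_axis(date(2020, 1, 22), n_observed=60)
```

A regression model trained on day indices 0 to 59 would then be asked to predict the values at these 30 future indices.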

2.  Literature Review

 An outbreak of coronavirus disease began in early December 2019. It is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which belongs to the same family as the SARS virus. Many governments all over the world are issuing their own preventive measures to control the spread of the coronavirus. We therefore conducted a literature review on this virus, based on publicly available information.

Background of Literature Review:

China alerted the WHO on 31 December 2019 that many people in Wuhan City were reported to be suffering from pneumonia. The first cases were reported to have started on 8 December 2019, and an increasing number of the patients worked or lived around the Huanan Seafood Wholesale Market.

When we started working on this project at the start of February, the coronavirus was prevalent mainly in China. At the time of our project proposal in February 2020, the mortality rate among all confirmed cases in China was around 1.2%, while the mortality rate in all other countries was only around 0.2%. Among patients admitted to hospitals, the mortality rate was around 11%. COVID-19 has since been spreading rapidly, and the mortality rate is now relatively high.

A Way to Further Research:

We performed this literature review to analyze the spread of the coronavirus. After seeing how quickly it is spreading all over the world, we decided to perform our own prediction to make people aware of its spread, so that they can take preventive measures and avoid falling prey to this dangerous virus.

We had very little data when we started this project. The pandemic is a trending topic all over the world, and a large number of people are losing their lives to this virus. We were therefore curious to analyze it and took up this project.

We found several sources of data on coronavirus cases, including Kaggle and Johns Hopkins. We chose the dataset from Johns Hopkins because it is updated daily. We then collected the data and performed our own future predictions.

3. Methodology 


 We followed the approach below for our Corona Virus Prediction project:

  1. First, we researched candidate datasets and settled on the Johns Hopkins dataset, as it provides daily updated data on the coronavirus.
  2. Second, we collected the data and preprocessed it to make it ready for future predictions.
  3. Next, we chose appropriate machine learning algorithms (named in the Conclusion).
  4. Finally, we predicted the active cases, deaths, and recoveries for the next 30 days, based on the data available in the dataset and the chosen machine learning algorithms.

Figure: Approach
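
The preprocessing step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the wide CSV layout used by the Johns Hopkins time-series files (four metadata columns followed by one column per date), works on a tiny made-up sample instead of the real download, and assumes that active cases are defined as confirmed minus deaths minus recovered. The helper names daily_totals and active_cases are hypothetical.

```python
import csv
import io

# Tiny made-up sample in the assumed wide layout (NOT real data):
# one row per region, one column per date.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,US,40.0,-100.0,1,1,2
Hubei,China,30.9,112.2,444,444,549
Beijing,China,40.1,116.4,14,22,36
"""

META_COLUMNS = 4  # Province/State, Country/Region, Lat, Long


def daily_totals(csv_text, country=None):
    """Sum the per-date columns over all rows, optionally for one country."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[META_COLUMNS:]
    totals = [0] * len(dates)
    for row in reader:
        if country is not None and row[1] != country:
            continue
        for i, cell in enumerate(row[META_COLUMNS:]):
            totals[i] += int(float(cell or 0))  # treat empty cells as zero
    return dates, totals


def active_cases(confirmed, deaths, recovered):
    """Assumed definition: active = confirmed - deaths - recovered."""
    return [c - d - r for c, d, r in zip(confirmed, deaths, recovered)]


dates, world = daily_totals(SAMPLE_CSV)
_, us = daily_totals(SAMPLE_CSV, country="US")
```

The resulting lists (one value per day) are the kind of series a regression model can be trained on, with the position in the list serving as the day index.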

4.  Implications 

Benefits of the Project: 

  • This Corona Virus Prediction project helps predict coronavirus cases all over the world for the next 30 days.
  • With this, we can also predict the increase in coronavirus cases in the world.
  • It shows how fast the coronavirus is spreading all over the world.
  • We can create awareness among people.
  • We can also create awareness among governments so that they can take preventive measures to stop the spread of the coronavirus.

Lessons Learned:

Initially, I had no idea what a machine learning algorithm was. I started learning about machine learning from scratch. I bought some Udemy tutorials and learned everything step by step through them. At the start of the project, I did not even know which machine learning algorithm to use.

Doing this project was a really exciting experience. I am inspired to take a Machine Learning course next semester to learn more deeply about machine learning algorithms.

I did my best and gave 100% to this project.

I now know about machine learning, the different types of machine learning algorithms, and the differences between classification and regression algorithms (when to use which), as well as creating train and test sets, building the model, choosing appropriate parameters, and performing future predictions. In the future, I would also love to take up a project related to classification algorithms.
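
As an illustration of creating train and test sets for a forecasting task, the sketch below holds out the most recent days for testing and measures the mean absolute error on them. It is a sketch under stated assumptions: the report does not say how the split was made, so the chronological split, the 20% test fraction, the NumPy polynomial fit, and the synthetic series are all assumptions for illustration.

```python
import numpy as np


def chronological_split(values, test_fraction=0.2):
    """Split a time series so the test set is the most recent portion."""
    n_test = max(1, int(round(len(values) * test_fraction)))
    return values[:-n_test], values[-n_test:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Synthetic series (NOT real data): 50 days of hypothetical counts.
series = np.array([3.0 * d**2 + 10.0 for d in range(50)])
train, test = chronological_split(series, test_fraction=0.2)

# Fit on the training days only, then predict the held-out days.
train_days = np.arange(len(train))
test_days = np.arange(len(train), len(series))
coeffs = np.polyfit(train_days, train, deg=2)
test_pred = np.polyval(coeffs, test_days)

mae = mean_absolute_error(test, test_pred)
```

Splitting by time rather than at random keeps future days out of the training set, which matches how the model is used when predicting days that have not happened yet.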

5.  Conclusion 

  • To conclude, we performed predictions using the SVR (Support Vector Regression) and Polynomial Regression algorithms.
  • SVR is used mainly to predict the worldwide scenario, which includes confirmed, death, and recovered cases.
  • Polynomial Regression is used to predict US cases.
  • Based on the results, we believe that our predictions were close to the actual values, with only small differences.
  • This project can be scaled further to include predictions for various individual
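
The two algorithms named above can be sketched as follows. This is a minimal illustration and not the project's actual code: the report does not state which library, kernel, degree, or parameters were used, so the use of scikit-learn, the polynomial kernel, the degree of 2, the scaling step, and all hyperparameter values are assumptions. For brevity, both models are fitted to the same synthetic series rather than to separate world and US data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

HORIZON = 30  # forecast length in days, as stated in the project goal

# Synthetic cumulative counts (NOT real data): a smooth quadratic curve.
days = np.arange(60).reshape(-1, 1)          # observed day indices 0..59
cases = 5.0 * days.ravel() ** 2 + 100.0      # hypothetical case counts
future_days = np.arange(60, 60 + HORIZON).reshape(-1, 1)

# Scale inputs and targets to roughly [0, 1]; SVR is sensitive to scale.
x_scale = float(future_days.max())
y_scale = float(cases.max())

# Polynomial regression: expand the day index into polynomial terms,
# then fit ordinary least squares on those terms.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(days / x_scale, cases / y_scale)
poly_forecast = poly_model.predict(future_days / x_scale) * y_scale

# SVR with a polynomial kernel; all hyperparameters here are illustrative.
svr_model = SVR(kernel="poly", degree=2, gamma=1.0, coef0=1.0,
                C=100.0, epsilon=0.001)
svr_model.fit(days / x_scale, cases / y_scale)
svr_forecast = svr_model.predict(future_days / x_scale) * y_scale
```

A polynomial kernel is chosen here because, unlike a radial basis function kernel, it can continue a growth trend beyond the range of the training days. On real data, the degree and the SVR parameters would need to be tuned against held-out days.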

6.  Appendix 

  • We used Google Colab for our project. As our team has two members, we chose it because it lets us work on the project simultaneously from different
  • No installation is required.
  • We only need a Google account, and we can easily create a Google Colaboratory file in Google Drive, just like a Google Docs file.
  • We will provide both .py and .ipynb files along with this report, so that they can be run on Google Colab.
  • The .ipynb file can be uploaded to Google Colab directly, and the results of the project can be easily checked.

