Predict the Forest Fires Python Project using Machine Learning Techniques

Predict the Forest Fires Python Project using Machine Learning Techniques is a summer internship report. I submit this industrial training workshop report, entitled “PREDICT THE FOREST FIRES”, to the University, Hyderabad, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science Engineering.

Apart from my own effort, the success of this internship depended largely on the encouragement and guidance of many others. I take this opportunity to express my gratitude to the people who helped me complete this internship successfully.

I would like to thank the respected faculty members who helped me make this internship a success.

I would also like to thank my friends, who helped me keep my work organized and well structured until the end.

OBJECTIVE OF THE PROJECT:

This is a regression problem with clear outliers that cannot be predicted by any reasonable method. Three methods have been compared:

(a) Random Forest Regressor,
(b) Neural Network,
(c) Linear Regression

The output ‘area’ was first transformed with an ln(x+1) function.
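
As a minimal sketch of this transform, assuming NumPy is available (the sample area values below are made up for illustration and are not taken from the dataset):

```python
import numpy as np

# Hypothetical burned-area values; a value of 0 means no area burned.
area = np.array([0.0, 0.36, 6.43, 1090.84])

# ln(x + 1) transform: compresses the large outliers and maps 0 to 0.
log_area = np.log1p(area)

# expm1 computes exp(y) - 1, which inverts the transform so that
# predictions can be converted back to the original scale.
recovered = np.expm1(log_area)
```

Errors measured on the transformed output are on the ln(x+1) scale, so predictions need expm1 applied before they are read as areas.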

Two regression metrics were obtained: RMSE and the R² score. An analysis of the Regression Error Characteristic (REC) curve shows that the RFR model predicts more examples within a lower admitted error. In effect, the RFR model predicts small fires better. The R² score is obtained using Linear Regression.
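
The two metrics can be computed with scikit-learn as sketched below. The true and predicted values are hypothetical numbers chosen for illustration, not results from this project:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true and predicted outputs on the ln(area + 1) scale.
y_true = np.array([0.0, 0.5, 1.2, 2.0])
y_pred = np.array([0.1, 0.4, 1.0, 2.3])

# RMSE is the square root of the mean squared error (lower is better).
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R^2 is the fraction of variance explained (1.0 is a perfect fit).
r2 = r2_score(y_true, y_pred)
```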

Best Algorithm for the project:

The best model is the Random Forest Regressor, tuned with GridSearchCV, which achieves an RMSE of 0.628.

Scikit-learn has built-in functionality for trying a range of parameter combinations and seeing which works best: GridSearchCV. The CV stands for cross-validation.
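
A minimal sketch of such a search is shown below. The synthetic data, the parameter grid, and the number of folds are all assumptions made for illustration; they are not the settings that produced the RMSE reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 60 samples, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.log1p(np.abs(2 * X[:, 0] + rng.normal(size=60)))

# Hypothetical grid: every combination of these values is tried.
param_grid = {"n_estimators": [10, 30], "max_depth": [2, 4]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation for each combination
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, hence the sign
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
```

After fitting, search.best_estimator_ holds the model refitted on all the supplied data with the best parameter combination.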

MODEL BUILDING

PREPROCESSING OF THE DATA:

Preprocessing of the data actually involves the following steps:

GETTING THE DATASET:

We can get the data from the client or from a database. The forest fires dataset is available at:
https://archive.ics.uci.edu/ml/datasets/forest+fires

IMPORTING THE LIBRARIES:

We have to import the libraries as per the requirement of the algorithm.

IMPORTING THE DATA SET:

Pandas provides the read_csv() function, which reads an entire dataset from a comma-separated values file into a DataFrame, on which all further operations can be performed. The DataFrame gives access to every row, every column, and every individual value. Any missing or NaN values have to be cleaned.
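
A minimal sketch of loading the data is given below. To keep it self-contained it reads from an in-memory string instead of a file. The column layout is assumed to match the UCI file, and the three rows are illustrative values, not records from the dataset:

```python
import io

import pandas as pd

# Illustrative rows in the assumed column layout of the forest fires CSV.
csv_text = """X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
7,5,mar,fri,85.0,25.0,95.0,5.0,8.0,50,6.5,0.0,0.0
4,4,oct,tue,90.0,35.0,670.0,6.5,18.0,33,1.0,0.0,0.0
6,5,aug,sat,92.0,85.0,490.0,14.5,22.0,29,5.5,0.0,1.25
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

first_row = df.iloc[0]        # access a row by position
temperatures = df["temp"]     # access a column by name
one_value = df.loc[2, "area"] # access a single value
```

With a downloaded copy of the dataset, the call would instead take the path to the file, for example pd.read_csv("forestfires.csv"), where the file name is an assumption.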

HANDLING MISSING VALUES:

OBSERVATION:

As we can see, there are no missing values in the given forest fires dataset.
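
A check of this kind can be sketched as follows. The small DataFrame is hypothetical and deliberately contains one NaN so that the counts are visible; on the forest fires data every count would be zero, as observed above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in 'temp'.
df = pd.DataFrame({
    "temp": [8.2, 18.0, np.nan],
    "wind": [6.7, 0.9, 1.3],
    "area": [0.0, 0.0, 2.1],
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

# True if any value anywhere in the frame is missing.
has_missing = bool(df.isnull().values.any())

# One simple way to clean: drop the rows that contain missing values.
cleaned = df.dropna()
```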

DATA VISUALIZATION:

  • Scatterplots and distributions of the numerical features, to see how they may affect the output ‘area’
  • Boxplot showing how the categorical column ‘day’ affects the outcome
  • Boxplot showing how the categorical column ‘month’ affects the outcome
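
Plots of this kind can be sketched with matplotlib and pandas as below. The data values are made up for illustration, and the choice of ‘temp’ as the numerical feature is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample of the data.
df = pd.DataFrame({
    "temp": [8.2, 18.0, 14.6, 22.1, 19.3, 25.4],
    "day": ["fri", "tue", "sat", "fri", "sat", "tue"],
    "area": [0.0, 0.0, 0.9, 6.4, 0.0, 12.3],
})
df["log_area"] = np.log1p(df["area"])

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot of a numerical feature against the transformed output.
ax_scatter.scatter(df["temp"], df["log_area"])
ax_scatter.set_xlabel("temp")
ax_scatter.set_ylabel("ln(area + 1)")

# Boxplot of the transformed output grouped by a categorical column.
df.boxplot(column="log_area", by="day", ax=ax_box)
ax_box.set_xlabel("day")
ax_box.set_ylabel("ln(area + 1)")
```

The figure can then be written to disk with fig.savefig(...) or shown interactively with plt.show() when a display backend is used. The boxplot for ‘month’ follows the same pattern with by="month".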

CATEGORICAL DATA:

  • Machine learning models are based on equations, so we need to replace text with numbers that can be included in the equations.
  • Categorical variables are of two types: Nominal and Ordinal
  • Nominal: The categories have no ordered relationship between them. Examples: male or female, any color.
  • Ordinal: The categories have an ordering between them. Examples: Graduate is less than Post Graduate, and Post Graduate is less than Ph.D.; customer satisfaction survey ratings of low, medium, and high.
  • Categorical data can be handled by using dummy variables, which are also called indicator variables.
  • Handling categorical data using dummies: the pandas library has a function called get_dummies() which creates dummy variables for categorical data in the form of 0s and 1s.
  • Once the dummies are created, we have to concatenate the dummy set to our data frame.
  • Categorical column ‘month’
  • Dummy set for column ‘month’
  • Categorical column ‘day’
  • Dummy set for column ‘day’
  • Concatenating dummy sets to a data frame
  • Encoding categories using the label encoder from the scikit-learn package
  • Scikit-learn provides a class called LabelEncoder. We need to import it from the scikit-learn package and then fit and transform each categorical column to convert the categories into numbers.
  • With this method, the categorical data is replaced by integer codes (0, 1, 2, …) rather than by 0/1 dummy columns.
  • Importing label encoder and one hot encoder
  • Handling categorical data of column ‘month’
  • Handling categorical data of column ‘day’
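
The two approaches listed above can be sketched as follows. The small DataFrame is hypothetical, and the variable names are chosen for illustration only:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample with the two categorical columns.
df = pd.DataFrame({
    "month": ["mar", "oct", "aug", "mar"],
    "day": ["fri", "tue", "sat", "sun"],
    "temp": [8.2, 18.0, 14.6, 8.3],
})

# Approach 1: dummy (indicator) columns, one per category.
month_dummies = pd.get_dummies(df["month"], prefix="month")
day_dummies = pd.get_dummies(df["day"], prefix="day")
df_dummies = pd.concat(
    [df.drop(columns=["month", "day"]), month_dummies, day_dummies],
    axis=1,
)

# Approach 2: integer codes, one column per categorical variable.
# LabelEncoder assigns codes in sorted order of the category names.
df_labels = df.copy()
for column in ["month", "day"]:
    df_labels[column] = LabelEncoder().fit_transform(df_labels[column])
```

Note that the integer codes follow alphabetical order (for example aug = 0, mar = 1, oct = 2), which is not calendar order. Dummy columns avoid imposing such an order on nominal data.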

TRAINING THE MODEL:

  • Splitting the data: after preprocessing, the data is split into a training set and a test set.
  • In machine learning, to assess the performance of a model, you train it on a ‘training set’ and then test it on an unseen ‘test set’. An important point is that training uses only the training set. The test set must not be used during training and is only made available when testing the model.
  • Training set: a subset used to train a model. (The model learns patterns between input and output.)
  • Test set: a subset used to test the trained model. (To check whether the model has learned correctly.)
  • The split percentage can be chosen as required (e.g. train data = 75%, test data = 25%, or train data = 80%, test data = 20%).
  • First we need to identify the input and output variables and separate the input set from the output set.
  • The scikit-learn library has a module called model_selection which provides the train_test_split function. We need to import this function.
  • This function splits the input and output data into train and test sets based on the percentage specified by the user and returns four results, which we assign to four variables.
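
The steps above can be sketched as follows, assuming a 75%/25% split. The data values and the random_state are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical sample with two input columns and the output 'area'.
df = pd.DataFrame({
    "temp": [8.2, 18.0, 14.6, 22.1, 19.3, 25.4, 11.0, 16.7],
    "wind": [6.7, 0.9, 1.3, 4.0, 2.2, 3.1, 5.4, 1.8],
    "area": [0.0, 0.0, 0.9, 6.4, 0.0, 12.3, 0.0, 2.1],
})

# Separate the input set (X) from the output set (y).
X = df.drop(columns=["area"])
y = np.log1p(df["area"])

# Four results: train/test inputs and train/test outputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```

Fixing random_state makes the split reproducible, so different models are compared on the same test rows.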

EVALUATING THE CASE STUDY:

Building the model (using splitting):

First, we have to retrieve the input and output sets from the given dataset

  • Retrieving the input columns
  • Retrieving output column

MODEL BUILDING:

  • Defining Regression Error Characteristic (REC)
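
A REC curve plots, for each admitted error tolerance, the fraction of examples predicted within that tolerance. A minimal sketch is given below, assuming absolute error as the error measure. The function name rec_curve and the sample values are hypothetical, and this is not necessarily the definition used in the project code:

```python
import numpy as np


def rec_curve(y_true, y_pred, tolerances):
    """Return the fraction of predictions whose absolute error is
    within each tolerance (one accuracy value per tolerance)."""
    abs_errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.array([np.mean(abs_errors <= tol) for tol in tolerances])


# Hypothetical true and predicted outputs on the ln(area + 1) scale.
y_true = np.array([0.0, 0.5, 1.2, 2.0])
y_pred = np.array([0.1, 0.4, 1.0, 2.3])

tolerances = np.array([0.0, 0.15, 0.25, 0.5])
accuracy = rec_curve(y_true, y_pred, tolerances)
```

Plotting accuracy against tolerances for each model gives curves that can be compared directly: a curve that rises faster means more examples are predicted within a lower admitted error.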


Designing a UI/UX-based Project for a Customizable App

Project Title: Designing a UI/UX-based project for a customizable app that helps a person avoid “High Screen Time”. The project also includes “Custom Portfolios” and a banner design for an online education platform.

Project Tools: Adobe XD, Anima, Adobe Photoshop, unDraw, GradientUI, Adobe Fonts, Adobe Color.

Project Duration: 6-7 Weeks.

Project Files: The project files include numerous PNG files, browser-supported links, and a project report.

REFERENCE LINKS:

• YouTube
• Behance
• Pinterest
• Adobe Tutorials

UI & UX PROJECT FEATURES:

  • The idea was to create an app design that is minimalistic and targets the main objective of reducing screen time.
  • The title of the project is “Unplug”, which fits our goal perfectly.
  • The app includes features like a custom alarm, statistics, and one-click detox.
  • A signup page and a feedback page have been created in accordance with the app design to attract customers and improve app performance.
  • User experiences have been set up to collect ample feedback, with the goal kept in mind.

SIGN-UP PAGE AND FEEDBACK PAGE DESIGN

Banner Design & Landing Page Design Features:

  • The banner is designed according to the customer’s needs and contains the important tagline “STAY LOCAL AND WORK GLOBAL”.
  • The banner design digitally markets an online course on “Machine Learning”, with mini tags explaining the features of the course.
  • For the banner I used Adobe XD as the design tool, and the logos were imported from different resources in PNG format.
  • The landing page design is tailored to an individual and details the person’s achievements in an attractive way.
  • The landing page is designed in both “light mode” and “dark mode”. The main idea behind designing a portfolio page is to attract people to learn more about the person’s work, which can help people seeking jobs or companies looking for employees.
  • The landing page can also be used for a portfolio website.

BANNER DESIGN AND LANDING PAGE DESIGN

APP-FEATURE DESIGNS

CONCLUSION

The project took around 5-6 weeks to complete. It taught me the basics of UI & UX design and how to execute it. The goal was to address the complex issue of reducing “screen time” so that customers could focus on more important things and improve their work consistently. The project helped me understand the key principles of design, how to adapt them to our daily projects, and how to provide users with all the features required for the goal. The ad/banner design helped me understand the view of a UI designer from the digital-marketing side and how that system works. The portfolio design would further help in CV and resume building.

FUTURE WORK

  • Since I have learned some of the most basic things in UI & UX design, I would like to gain more expertise and experience in the field and learn how such designs solve real-world problems.
  • Adobe XD is updated regularly, so learning new skill sets will be a constant effort to improve myself.
  • I would go one step further by linking JavaScript code and style sheets with the designs, thus making a fully fledged front end for a website or an app.
  • Since the user-testing field is evolving continuously, I would learn various user stories and try to improve them by using them in various projects.