Decision Model for Prediction of Movie Success Rate Data Mining J Component Project

ABSTRACT

The purpose of this Movie Success Rate Prediction project is to predict the success of any upcoming movie using Data Mining Tools. For this purpose, we have proposed a method that will analyze the cast and crew of the movie to find the success rate of the film using existing knowledge. Many factors like the cast (actors, actresses, directors, producers), budget, worldwide gross, and language will be considered for the algorithm to train and test the data. Two algorithms will be tested on our dataset and their accuracy will be checked.

LITERATURE REVIEW

  • They developed a model to estimate the success of upcoming movies based on certain factors. The size of the audience plays a vital role in a movie becoming successful.
  • The Factorization Machines approach was used to predict movie success by predicting IMDb ratings for newly released movies, combining movie metadata with social media data.
  • The gross attribute was used as a training element for the model. The data are converted into .csv files after the pre-processing is done.
  • Using S-PLSA to extract sentiment information from online reviews and tweets, the ARSA model was used to predict the sales performance of movies from sentiment information and past box office performance.
  • A mathematical model is used to predict the success or failure of upcoming movies depending on certain criteria. Their work makes use of historical data in order to predict the ratings of movies to be released.
  • According to them, Twitter is a platform that can provide geographical as well as timely information, making it a perfect source for spatiotemporal models.
  • The data they collected was gathered from Box Office Mojo and Wikipedia. Their data comprised movies released in 2016.
  • Starting from a dataset of 3183 movies, they removed movies whose budget could not be found or that were missing key features. After key feature extraction was completed, a dataset of 755 movies remained.
  • They performed some useful data mining on the IMDb data and uncovered information that cannot be seen by browsing the regular web frontend to the database.
  • According to their conclusion, brand power, actors, or directors are not strong enough to affect the box office.
  • Their neural network obtained an accuracy of 36.9%; when predictions that miss by only one category are also accepted, the accuracy rises to 75.2%.
  • They divided the movies into three classes (rise, stay, and fall), finding that the support vector machine SMO can give up to 60% correct predictions.
  • The data was taken from the Internet Movie Database (IMDb) and covers the years 1945 to 2017.
  • A more accurate classifier is also well within the realm of possibility, and could even lead to an intelligent system capable of making suggestions for a movie in preproduction, such as a change to a particular director or actor, which would be likely to increase the rating of the resulting film.
  • In this study, we proposed a movie investor assurance system (MIAS) to aid movie investment decisions at the early stage of movie production. MIAS learns from freely available historical data derived from various sources and tries to predict movie success based on profitability.
  • The data they gathered from movie databases was cleaned, integrated, and transformed before the data mining techniques were applied.
  • They used feature extraction techniques and polarity scores to create a list of successful and unsuccessful movies. The data was gathered from IMDb and YouTube.

PROBLEM STATEMENT

In this Movie Success Rate Prediction project, using the ratings of the films associated with the cast and crew is an innovative and original way to address a dilemma that film producers face. Producers often have trouble casting successful actors and directors while still staying within budget. Looking at the average rating of each actor and director across all the films they have participated in should give the producer a good idea of whom to cast, and whom not to cast, in an upcoming film.
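The averaging idea described above can be illustrated with a minimal sketch. The film records, names, and the helper functions `average_rating_per_person` and `score_proposed_team` are hypothetical and are not taken from the project; they only show one plausible way to turn past ratings into a per-person score.

```python
from collections import defaultdict

# Hypothetical sample records: (title, rating, people involved in the film).
FILMS = [
    ("Film A", 8.0, ["Director X", "Actor P", "Actor Q"]),
    ("Film B", 6.0, ["Director X", "Actor Q"]),
    ("Film C", 4.0, ["Director Y", "Actor P"]),
]


def average_rating_per_person(films):
    """Return the mean rating of all films each person took part in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _title, rating, people in films:
        for person in people:
            totals[person] += rating
            counts[person] += 1
    return {person: totals[person] / counts[person] for person in totals}


def score_proposed_team(team, averages):
    """Mean of the known members' averages; None if nobody has a history."""
    known = [averages[p] for p in team if p in averages]
    return sum(known) / len(known) if known else None


averages = average_rating_per_person(FILMS)
team_score = score_proposed_team(["Director Y", "Actor Q", "Newcomer Z"], averages)
```

People without any rated films (here `Newcomer Z`) are simply ignored in the team score, which is one possible design choice among several.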

Implementation:

  • Data Preprocessing & Correlation Analysis
  • Application of Decision Tree Algorithm
  • Application of Random Forest Algorithm
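As an illustration of the preprocessing and correlation step listed above, the following sketch drops incomplete rows and computes a Pearson correlation between two attributes. The attribute names `budget` and `gross`, the sample rows, and both helper functions are assumptions made for this example, not the project's actual code.

```python
import math


def drop_incomplete(rows, required):
    """Keep only rows in which every required attribute is present."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows; one of them is missing its budget and is removed.
ROWS = [
    {"budget": 10.0, "gross": 25.0},
    {"budget": 20.0, "gross": 45.0},
    {"budget": None, "gross": 30.0},
    {"budget": 30.0, "gross": 65.0},
]
clean = drop_incomplete(ROWS, ["budget", "gross"])
r = pearson([row["budget"] for row in clean], [row["gross"] for row in clean])
```

Attributes that correlate strongly with the target are natural candidates to keep as features before the Decision Tree and Random Forest steps.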

RESULTS & CONCLUSION

After testing both algorithms (Decision Tree and Random Forest) on the IMDb dataset, we found that the Random Forest algorithm achieved better accuracy (99.6%) than the Decision Tree algorithm, with which we obtained just 60% accuracy.

Weather Forecasting from Historical Weather Data using Data Mining

Background:

A Smart Grid Center is conducting extensive research to design an Intelligent Energy Use system. This system detects the weather conditions outside and takes action accordingly (for example, turning on the air conditioner or turning off the heater). A large amount of historical weather information has been collected from weather sensor devices located across the RVR building. The information gathered covers climate readings over a period of time (June 2012 to present). The purpose of this project is to extract the patterns for day-to-day weather prediction from historical weather data using data mining.

Problem Statement:

Weather forecasting is the prediction of what the weather will be like in the future, and it has been practiced for many years. The purpose of this project is to extract the patterns for day-to-day weather prediction from historical weather data using data mining. In this project, a prototype of the system will be developed that includes the main components of the system: training, analysis, and prediction.
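The training and prediction components mentioned above could, in the simplest case, look like the sketch below. It is only an illustration under strong assumptions: the weather is reduced to a single hypothetical condition label per day, and the "pattern" is just how often one condition follows another. The project itself plans to use WEKA, which is not shown here.

```python
from collections import Counter, defaultdict


def train(history):
    """Count how often each condition is followed by each next-day condition."""
    transitions = defaultdict(Counter)
    for today, tomorrow in zip(history, history[1:]):
        transitions[today][tomorrow] += 1
    return transitions


def predict_next(transitions, today):
    """Return the most frequent follower of today's condition, or None."""
    followers = transitions.get(today)
    if not followers:
        return None  # condition never seen during training
    return followers.most_common(1)[0][0]


# Hypothetical daily observations, oldest first.
HISTORY = ["sunny", "sunny", "rainy", "rainy", "sunny",
           "sunny", "sunny", "cloudy", "rainy"]

model = train(HISTORY)
forecast = predict_next(model, "sunny")
```

A real system would use richer sensor readings (temperature, humidity, and so on) and a proper learning algorithm, but the split into a training step and a prediction step stays the same.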

A web framework will be developed using Java NetBeans to present the predicted results in a meaningful and understandable manner. The data will be presented in graphical form, which makes it much easier for a user to analyze. The graphical view will be developed using SSRS (SQL Server Reporting Services), a reporting tool developed by Microsoft that is used to prepare and deliver a variety of interactive and printed reports.

Technologies:

Database: MSSQL

Frontend: Java J2EE, SSRS

Connectivity: JDBC

Operating system: Windows/Linux

Rough Timeline:

Task Id   Task Name                                      Time Required
1         Setup and Research                             5 weeks
2         Developing the prototype using WEKA            4 weeks
3         Building a user application using NetBeans     3 weeks
4         Developing a graph view using SSRS             2 weeks
5         Integrating graph view and user application    3 weeks
6         Testing                                        2 weeks
7         Documentation                                  4 weeks

Sentimental Analysis Opinion Mining for Mobile Networks

Abstract:

Sentimental Analysis and Opinion Mining for Mobile Networks is a project that mainly focuses on sharing posts in the application more effectively and easily. In this application, users can share posts, whether they are images or any other content.

This application provides a special feature: when one user shares a post in the application, all registered users can see the post and leave a comment on it. This way, users can easily find the comments on their posts.

Existing System:

In the existing system, it takes time for users to find the comments on their posts. Every comment has to be checked to see whether it is positive or negative, which takes a lot of time. All the work in the existing system is done manually, which requires a lot of time and effort.

Proposed System:

In the proposed system, users can easily find the status (positive or negative) of the comments that other users leave on their posts. This saves the user a lot of time, since the status of each comment is available almost immediately and with little effort.
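One simple way to determine the status of a comment automatically is a word-list approach, sketched below. The word lists and function names are hypothetical and deliberately tiny; the project description does not say which classification method is actually used.

```python
import re

# Hypothetical, deliberately small word lists for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "nice", "awesome"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "ugly", "awful"}


def classify_comment(text):
    """Label a comment by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = (sum(w in POSITIVE_WORDS for w in words)
             - sum(w in NEGATIVE_WORDS for w in words))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(comments):
    """Count comments per label, e.g. as input for a graph."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for comment in comments:
        counts[classify_comment(comment)] += 1
    return counts


summary = summarize(["I love this, great photo!", "Awful and ugly.", "ok"])
```

The per-label counts returned by `summarize` are the kind of figures a graph of positive and negative comments could be drawn from.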

Modules:

User:

The user must fill in all the details in the registration form to get login details. The user must enter a unique username and password to log in to the application. The user can view their profile, add images, view their uploaded images, and change the password. The user can see the positive and negative comments on a post and also has the option to see the graph.

Admin:

The admin can log in by entering a valid username and password. The admin can view all the activities of the users and all the posts uploaded by users.

Software Requirement:

Operating System – Windows
Application Server – Tomcat
Front End – HTML, Java, JSP
Scripts – JavaScript
Server-side Script – JavaServer Pages
Database – MySQL
Database Connectivity – JDBC

Conclusion:

Our project “Sentimental Analysis and Opinion Mining for Mobile Networks” provides a quick and easy view of the opinions expressed in the comments on posts uploaded by users of the application. Our application saves users a lot of time and effort in finding the status of their posts.

Climate Data Online (CDO) Data Mining Project

OVERVIEW

Climate Data Online, or “CDO”, provides access to climate data products through a simple, searchable online web mapping service.

DATA

All the data we have taken is openly available (obtained from public/open systems). We took a dataset from WWW.DATA.GOV that provides detailed descriptions of Climate Normals, monthly climate reports, and drought information; analyses of weather and climate events, increasingly comparing recent events to expectations of future climate conditions; and information detailing extreme events such as heat waves, droughts, tornadoes, and hurricanes that have affected North America. Climate information generated from examination of the data in the archives includes record temperatures, record precipitation and snowfall, and climate extremes statistics.

Some Climate Data Online APIs are used to obtain this data for users in a variety of formats, such as CSV, XML, and JSON.

Source : WWW.DATA.GOV

RESEARCH QUESTIONS

The climate of North America varies by location and by time of year. The motivation of our Climate Data Online (CDO) Data Mining Project is to bring Climate Normals, monthly climate reports, drought information, and the many other datasets and products together under one climate section.

Users can thus easily get public access to our web service ‘CDO’ and retrieve the data in a variety of formats, such as CSV, XML, and JSON.
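To illustrate what offering the same data in several formats involves, the sketch below serializes a small list of records to CSV, JSON, and XML using only the Python standard library. The records, field names, and function names are hypothetical; no real CDO or data.gov API call is shown or implied.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical climate records; the field names are assumptions.
RECORDS = [
    {"station": "STATION_A", "date": "2020-01-01", "tmax_c": "5.0"},
    {"station": "STATION_B", "date": "2020-01-01", "tmax_c": "7.5"},
]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records to a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_xml(records):
    """Serialize records to XML, one <record> element per row."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping the records in one neutral in-memory form and converting only at the output boundary means a further format can be added without touching the data retrieval code.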

Data Mining For Automated Personality Classification

Experimental Method

We conduct a set of experiments to examine whether automatically trained models can be used to recognize the personality of unseen subjects. Our approach can be summarized in five steps:

  1. Store data related to personality in the database
  2. Collect associated personality characteristics for each participant
  3. Extract relevant features from the texts
  4. Display features relevant to the user's personality traits
  5. Personality and user behavior

The following sections describe each of these steps in more detail.

  • Store data related to personality traits in the database

The personality characteristics are stored in a database. Later, when a user enters their personality characteristics, these are compared against the large pre-existing database and the system detects the personality of the user.

  • Collect associated personality characteristics for each participant

Each user enters their personality characteristics, and the system then detects the user's personality based on the data previously stored in the database.

  • Extract relevant features from the texts

The system extracts relevant features from the text entered by the user and compares them with the data stored in the database. After the comparison, the system specifies the personality of the user.

  • Display features relevant to the user's personality traits

The system examines the personality of the user based on the personality traits the user mentioned, and provides the user with various features that are relevant to those traits.

  • Personality and user behavior

The relation between personality and user behavior is tested. The hypothesis is that conscientiousness, agreeableness, and neuroticism predict unique variance in attitudes.
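The store, extract, and compare steps described above can be sketched as a keyword-overlap match. This is a minimal illustration, not the project's implementation (which is built in C# with SQL Server): the `TRAIT_KEYWORDS` table stands in for the stored personality data, and its contents are hypothetical.

```python
# Hypothetical stand-in for the personality data stored in the database.
TRAIT_KEYWORDS = {
    "conscientiousness": {"organized", "punctual", "careful", "reliable"},
    "agreeableness": {"kind", "helpful", "friendly", "cooperative"},
    "neuroticism": {"anxious", "worried", "nervous", "moody"},
}


def extract_features(text):
    """Return the set of lower-cased words in the text, punctuation stripped."""
    return {word.strip(".,;:!?").lower() for word in text.split()} - {""}


def classify_personality(text, trait_keywords=TRAIT_KEYWORDS):
    """Score each trait by keyword overlap and return (best trait, scores)."""
    features = extract_features(text)
    scores = {trait: len(features & keywords)
              for trait, keywords in trait_keywords.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


trait, scores = classify_personality("I am organized, punctual and friendly.")
```

When no stored keyword occurs in the text, the function returns `None` for the trait rather than guessing.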

Feasibility Study

Our proposed system will provide information about the personality of the user. Based on the personality traits provided by the user, the system will match them against the data stored in the database and automatically classify the user's personality. Based on the detected personality, the system will then provide other features that are relevant to the user.

  • Economic Feasibility

This system will help advertisers market their products based on the personality of the user, which in turn provides income to the firm using the system. The system can be embedded in social networking sites, as many users buy and sell products through these networks.

  • Operational Feasibility
  • Technical Feasibility

The back end of this project is SQL Server, which stores the data related to personality traits and other project details. The application has only basic hardware requirements. The system is developed on the .NET Framework using C#. The application will be online, so it can be accessed from any device, such as personal computers, laptops, and some handheld devices.

Future Scope

  • A module could be added that provides the user with career guidance matching their personality.
  • For example, a user who speaks well and is able to convince other people is likely to do well in the marketing field.

Software Requirements:

  • Windows
  • SQL Server
  • Visual Studio 2010

Hardware Components:

  • Processor – Dual Core
  • Hard Disk – 50 GB
  • Memory – 1GB RAM
  • Internet Connection

Application

  • This system can help firms identify the personality of an interviewee based on the interviewee's personality traits.
  • This system is useful for firms marketing their products and helps them target the right customers.

Efficient Class Oriented Evaluation Of Multiclass Performance Models Project Source Code

Efficient class oriented evaluation of multiclass performance models project Description:

Efficient class oriented evaluation of multiclass performance models project is a CSE networking project implemented on the Java platform. This project implements an efficient queueing network model called COMOM that works with multiple classes in a network. In present networking systems, queueing models are mostly used for capacity planning and performance evaluation in multiclass systems.

In the existing system, queueing networks rely on the Mean Value Analysis model, which works only for a limited number of user sessions, but there are many scenarios where several classes are useful. To solve this problem, this paper proposes an efficient class-oriented evaluation of multiclass performance models.

Here we provide database files, code for the different peers, the main server, the compile file, the base paper, and the entire project source code.
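For readers unfamiliar with Mean Value Analysis, the sketch below shows the classic exact recursion for a closed network with a single job class. It is not the COMOM algorithm proposed in the paper; the function name, parameter names, and example demands are assumptions made for illustration.

```python
def mean_value_analysis(demands, population, think_time=0.0):
    """Exact single-class MVA for a closed network of queueing stations.

    demands:    service demand at each station (time per job)
    population: number of jobs circulating in the network
    think_time: delay a job spends outside the stations per cycle
    Returns (throughput, residence times, mean queue lengths).
    """
    queue = [0.0] * len(demands)      # queue lengths with zero jobs
    residence = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, population + 1):
        # An arriving job sees the queue lengths of the network with n - 1 jobs.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        # Little's law gives the new mean queue length at each station.
        queue = [throughput * r for r in residence]
    return throughput, residence, queue


# Hypothetical example: two stations with equal demand, two jobs.
x, r, q = mean_value_analysis([1.0, 1.0], population=2)
```

The recursion steps through every population level from 1 to N. In the multiclass version it has to visit every combination of per-class populations, which is the cost that motivates class-oriented methods such as the one proposed here.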

Active Learning Methods For Interactive Image Retrieval Project

Active learning methods for interactive image retrieval project Description:

Active learning methods for interactive image retrieval project is a 2008 CSE project implemented on the ASP.NET platform. In this project we cover the concept of image retrieval: searching for images in a database based on a query concept.

In this paper we propose a content-based image retrieval method that works on query-based search. When a user searches for images, this method displays the top-ranked results from the database, followed by an interactive step in which the user can refine the results according to their interest using a relevance feedback loop.

Here we provide different methods of interaction between the user and the system, most of which use binary labels to match the searched query against the data in the database.
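A relevance feedback loop with binary labels can be sketched as follows. The update rule is a simple Rocchio-style shift of the query vector, used here as a stand-in and not as the active learning method of the paper; the feature vectors, image names, and the `beta` and `gamma` weights are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, database):
    """Return image names ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: distance(query, database[name]))


def refine_query(query, database, labels, beta=0.5, gamma=0.25):
    """Move the query toward relevant images and away from irrelevant ones.

    labels maps an image name to True (relevant) or False (not relevant).
    """
    def centroid(names):
        return [sum(database[n][i] for n in names) / len(names)
                for i in range(len(query))]

    relevant = [n for n, ok in labels.items() if ok]
    irrelevant = [n for n, ok in labels.items() if not ok]
    new_query = list(query)
    if relevant:
        new_query = [q + beta * (c - q0)
                     for q, c, q0 in zip(new_query, centroid(relevant), query)]
    if irrelevant:
        new_query = [q - gamma * (c - q0)
                     for q, c, q0 in zip(new_query, centroid(irrelevant), query)]
    return new_query


# Hypothetical two-dimensional image features.
DATABASE = {"img1": [0.0, 0.0], "img2": [1.0, 0.0],
            "img3": [0.0, 1.0], "img4": [1.0, 1.0]}
query = [0.4, 0.4]
first_results = rank(query, DATABASE)
refined = refine_query(query, DATABASE, {"img4": True, "img1": False})
second_results = rank(refined, DATABASE)
```

In this example the user marks `img4` as relevant and `img1` as not relevant, and the refined query then ranks `img4` first.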

Hardware Enhanced Association Rule Mining With Hashing And Pipelining Project

Hardware enhanced association rule mining with hashing and pipelining project Description:

Hardware enhanced association rule mining with hashing and pipelining project is a 2008 CSE project implemented on the Visual Studio ASP.NET platform. In this project we study how data mining technology can be used to understand customer behavior, capturing users' interests in a database and processing that data in hardware using algorithms. We also analyze the problems that arise in existing hardware systems when the amount of data from the database does not fit in the memory available on the hardware.

In this paper we propose hashing and pipelining methods for dealing with this problem.
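The role hashing can play in association rule mining is illustrated by the software sketch below, which uses hash buckets to discard candidate item pairs before they are counted exactly. It is a generic hash-based pruning scheme written for this example, not the hardware design of the paper, and it does not model pipelining; the function names, bucket count, and transactions are assumptions.

```python
import zlib
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    """Deterministically map an item pair to a hash bucket."""
    return zlib.crc32("|".join(pair).encode()) % num_buckets


def frequent_pairs(transactions, min_support, num_buckets=7):
    """Find item pairs occurring in at least min_support transactions."""
    # Pass 1: count single items and hash every pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count a pair exactly only if both items are frequent and its
    # bucket reached the threshold; all other pairs cannot be frequent.
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction) & frequent_items)
        for pair in combinations(items, 2):
            if bucket_counts[bucket_of(pair, num_buckets)] >= min_support:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}


# Hypothetical market-basket transactions.
TRANSACTIONS = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "butter"],
]
result = frequent_pairs(TRANSACTIONS, min_support=2)
```

Because a bucket count is never smaller than the count of any pair hashed into it, the pruning never discards a truly frequent pair; it only reduces how many candidates have to be held in memory.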

A detailed explanation of this project is given in the project base paper and project report.

Watermarking Relational Databases Using Optimization Based Techniques Project

Watermarking Relational Databases Using Optimization Based Techniques project Description:

Watermarking Relational Databases Using Optimization Based Techniques project is a 2008 CSE project. This project explains the use of watermarking technology for images, text, video, and software to hide ownership information in the content and to provide security for data shared over the net. Using this method, the owner of the data can take the steps required to prove that a document is genuine. An important feature of this method is that users can't modify, delete, or update the watermarked details in the content.

With present technology, data sharing and maintaining ownership rights have become a big issue on the web. This concept is implemented to provide a solution to this problem.
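To make the idea of hiding ownership information in relational data concrete, the sketch below embeds a keyed watermark in the least significant bit of an integer attribute. This is a simple keyed-hash scheme chosen for illustration, not the optimization-based technique of the paper; the function names, the `fraction` parameter, and the sample table are hypothetical.

```python
import hashlib
import hmac


def _keyed_hash(secret_key, primary_key):
    """Derive a pseudo-random integer from the secret key and a row's key."""
    digest = hmac.new(secret_key, str(primary_key).encode(),
                      hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def embed(rows, secret_key, fraction=2):
    """Mark roughly one row in `fraction` by setting its lowest bit."""
    marked = {}
    for primary_key, value in rows.items():
        h = _keyed_hash(secret_key, primary_key)
        if h % fraction == 0:            # the key decides which rows carry a mark
            bit = (h >> 8) & 1           # the key also decides the mark bit
            value = (value & ~1) | bit
        marked[primary_key] = value
    return marked


def detect(rows, secret_key, fraction=2):
    """Return (matching bits, selected rows) for the given key."""
    selected = matches = 0
    for primary_key, value in rows.items():
        h = _keyed_hash(secret_key, primary_key)
        if h % fraction == 0:
            selected += 1
            matches += int((value & 1) == ((h >> 8) & 1))
    return matches, selected


# Hypothetical table: primary key -> non-negative integer attribute.
TABLE = {pk: 1000 + 3 * pk for pk in range(200)}
watermarked = embed(TABLE, b"owner-secret")
matches, selected = detect(watermarked, b"owner-secret")
```

With the owner's key every selected row matches, whereas a wrong key is expected to match only about half of the rows it selects. That gap is what would let the owner argue ownership, and each value changes by at most 1.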

A detailed explanation of this project is provided in the base paper and project documentation.
