Travel and Tourism Data Analysis of Hotels Using Big Data (Hadoop)

Name/Title of the Project

  • Travel and tourism data analysis of hotels using the Hadoop environment.
  • Tourist Analysis Using Big Data (Tourist Place Recommendations Dataset)
  • Tourism Behaviour Analysis Using Hadoop (Big Data Analysis)

Problem Statement

For many years, the tourism industry depended on intermediaries, who enabled interaction between suppliers and customers. The Internet age changed the complexity of tourism distribution, enabling the entry of new virtual intermediaries with a strong competitive advantage over other players in the sector.

Recommender systems fall into two main categories:

1. Content-based systems: These analyze the features of the items a user has already chosen and recommend other items with similar features. For example, if a tourist often visits hill stations, the system records "hill station" as a preference and recommends other hill stations.

2. Collaborative filtering systems: These rely on similarities between users or between items. The system uses the preferences of different users for the same items and recommends to a user the items that similar users have preferred.
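To make the two categories concrete, here is a minimal in-memory sketch in Java, the project's stated language. It is not Hadoop code and not the project's actual design: the place names, category labels, visit histories, and the method names contentBased and collaborative are all hypothetical. The content-based part mirrors the hill-station example by counting categories in a tourist's history; the collaborative part uses Jaccard similarity between users' visit sets, which is one common similarity measure among several and works without explicit ratings.

```java
import java.util.*;

/**
 * Minimal in-memory sketch of the two recommender categories.
 * All place names, categories and visit histories are hypothetical sample data.
 */
public class RecommenderSketch {

    // Content-based filtering: find the category the tourist visits most often
    // and recommend unvisited places from the catalog in that category.
    static List<String> contentBased(List<String> history, Map<String, String> categoryOf) {
        Map<String, Integer> counts = new HashMap<>();
        for (String place : history) {
            String category = categoryOf.get(place);
            if (category != null) counts.merge(category, 1, Integer::sum);
        }
        if (counts.isEmpty()) return List.of();
        String favourite = Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : categoryOf.entrySet()) {
            if (e.getValue().equals(favourite) && !history.contains(e.getKey())) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    // Jaccard similarity of two visit sets: |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) common.size() / union.size();
    }

    // User-based collaborative filtering: score each place the target has not
    // visited by the summed similarity of the other users who visited it.
    static List<String> collaborative(String target, Map<String, Set<String>> visits) {
        Set<String> mine = visits.getOrDefault(target, Set.of());
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : visits.entrySet()) {
            if (other.getKey().equals(target)) continue;
            double sim = jaccard(mine, other.getValue());
            if (sim == 0.0) continue;
            for (String place : other.getValue()) {
                if (!mine.contains(place)) score.merge(place, sim, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        // Highest score first; ties broken alphabetically for a stable order.
        ranked.sort((x, y) -> {
            int byScore = Double.compare(score.get(y), score.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, String> categoryOf = Map.of(
                "Shimla", "hill station", "Manali", "hill station", "Ooty", "hill station",
                "Goa", "beach", "Varkala", "beach");
        System.out.println(contentBased(List.of("Shimla", "Goa", "Shimla"), categoryOf)); // [Manali, Ooty]

        Map<String, Set<String>> visits = Map.of(
                "asha", Set.of("Shimla", "Goa"),
                "ravi", Set.of("Shimla", "Goa", "Manali"),
                "meera", Set.of("Varkala", "Ooty"),
                "dev", Set.of("Goa", "Varkala"));
        System.out.println(collaborative("asha", visits)); // [Manali, Varkala]
    }
}
```

Running main prints [Manali, Ooty] for the content-based case and [Manali, Varkala] for the collaborative case: "ravi" shares two of three places with "asha" and so outweighs "dev", who shares one.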

There are many challenges in designing and implementing a Personalized Tourist Travel Package Recommendation System:

1. Travel packages are usually location-based, so they are constrained by space and by the time needed to reach the destination. For example, a package contains locations that are geographically close to one another, and its contents vary by season.
2. Traditional recommendation methods depend on user ratings, but travel data may not contain such ratings.
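The first challenge can be illustrated with a small filter. This is a sketch under stated assumptions, not part of the original design: it treats "geographically near" as a great-circle (haversine) distance threshold and "season-wise" as a hypothetical set of suitable months per place. The coordinates are approximate, and the month sets, the class name PackageFilter, and the method candidates are illustrative only.

```java
import java.util.*;

/**
 * Sketch of two constraints on travel packages: places must be geographically
 * close and suited to the travel season. Coordinates are approximate; the
 * "good months" are hypothetical.
 */
public class PackageFilter {

    public static final class Place {
        final String name;
        final double lat, lon;          // degrees
        final Set<Integer> goodMonths;  // 1 = January ... 12 = December

        public Place(String name, double lat, double lon, Set<Integer> goodMonths) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.goodMonths = goodMonths;
        }
    }

    // Great-circle distance in kilometres using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        final double earthRadiusKm = 6371.0; // mean Earth radius
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * earthRadiusKm * Math.asin(Math.sqrt(a));
    }

    // Places within maxKm of the base that are suitable in the given month.
    static List<String> candidates(Place base, List<Place> all, double maxKm, int month) {
        List<String> out = new ArrayList<>();
        for (Place p : all) {
            if (p == base || !p.goodMonths.contains(month)) continue;
            if (haversineKm(base.lat, base.lon, p.lat, p.lon) <= maxKm) out.add(p.name);
        }
        return out;
    }

    public static void main(String[] args) {
        Place shimla = new Place("Shimla", 31.10, 77.17, Set.of(3, 4, 5, 6));
        Place manali = new Place("Manali", 32.24, 77.19, Set.of(3, 4, 5, 6));
        Place goa = new Place("Goa", 15.49, 73.83, Set.of(11, 12, 1, 2));
        List<Place> all = List.of(shimla, manali, goa);
        System.out.println(candidates(shimla, all, 300, 5));  // [Manali]
        System.out.println(candidates(shimla, all, 300, 12)); // []
    }
}
```

With a 300 km radius around Shimla, May yields [Manali], while December yields nothing: Manali is out of season and Goa is far outside the radius. Note that this filter uses only location and season, so it also sidesteps the second challenge, since no ratings are required.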

Introduction/Feasibility Study

According to the World Travel & Tourism Council, in 2008 travel and tourism were equivalent to 9.9% of total world GDP. The tourism industry has kept growing despite the risks it has faced in recent years: terrorism, health fears related to avian flu, and high oil prices. The World Tourism Organization (WTO) reports that in 2006 international tourist arrivals reached a record 842 million, a 4.5% increase over the previous year. This number even exceeded previous long-term forecasts. For the following years, the WTO predicts constant growth, reaching 1.6 billion international arrivals in 2020. According to the WTO, over the last four years the largest growth in tourist arrivals occurred in the Middle East, followed by Africa and Asia, and finally Europe, while a sharp decline was observed in the Americas. Europe, however, still accounts for the largest market share of international arrivals.

Tourism thus has a great influence on the world economy, and it is important for the European market to maintain its leadership position. This can only happen if the industry keeps up with the newest technology innovations and adopts them quickly. The Internet is especially relevant for the tourism industry because its worldwide coverage enables direct interaction with tourists everywhere.

The following modules are required to develop this project and to create the application's user forms:

1. Administrator authentication module: This module handles administrator login and the administrative functions of the application.
2. User registration module: Users can register themselves by entering details such as name, password, email ID, and other information.
3. Package module: Users can view the different tour packages available to tourists.
4. Testimonials module: Passengers can post feedback after the journey and share their experience.
5. Payment & Search module:
a. Pay through PayPal.
b. Pay by draft, credit or debit card, UPI, or net banking.
c. Search hotels, flights, packages, buses, trains, and events by city.
6. Routes module: This module displays route information between the source and destination locations. Users can check the best routes to their destination and view the various routes connecting a source and destination. For each route, details such as source, destination, fare, reservation details, and pick-up points are provided.
7. Reservations module: Passengers/customers can reserve their seats by making payments.
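For the Routes module, "best route" needs a precise definition. Assuming it means the lowest total fare over directed city-to-city routes, Dijkstra's algorithm is a standard way to find it (it requires non-negative fares). The sketch below is illustrative only: the class and method names RouteFinder, addRoute, cheapestPath, and totalFare, and all fares, are hypothetical.

```java
import java.util.*;

/**
 * Sketch of a "best route" lookup, assuming "best" means lowest total fare.
 * Uses Dijkstra's algorithm on a directed graph of hypothetical fares.
 */
public class RouteFinder {
    // Adjacency list: source city -> (destination city -> fare).
    private final Map<String, Map<String, Integer>> graph = new HashMap<>();

    public void addRoute(String from, String to, int fare) {
        graph.computeIfAbsent(from, k -> new HashMap<>()).put(to, fare);
    }

    // Cities on the cheapest path, or an empty list if the destination is unreachable.
    public List<String> cheapestPath(String source, String destination) {
        Map<String, Integer> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<Map.Entry<String, Integer>>(
                (a, b) -> Integer.compare(a.getValue(), b.getValue()));
        dist.put(source, 0);
        queue.add(new AbstractMap.SimpleEntry<>(source, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> current = queue.poll();
            String city = current.getKey();
            int fareSoFar = current.getValue();
            if (fareSoFar > dist.get(city)) continue; // stale queue entry
            if (city.equals(destination)) break;      // cheapest fare to destination is final
            for (Map.Entry<String, Integer> edge : graph.getOrDefault(city, Map.of()).entrySet()) {
                int candidate = fareSoFar + edge.getValue();
                if (candidate < dist.getOrDefault(edge.getKey(), Integer.MAX_VALUE)) {
                    dist.put(edge.getKey(), candidate);
                    prev.put(edge.getKey(), city);
                    queue.add(new AbstractMap.SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }
        if (!dist.containsKey(destination)) return List.of();
        LinkedList<String> path = new LinkedList<>();
        for (String c = destination; c != null; c = prev.get(c)) path.addFirst(c);
        return path;
    }

    // Sum of the fares along a path returned by cheapestPath.
    public int totalFare(List<String> path) {
        int total = 0;
        for (int i = 0; i + 1 < path.size(); i++) {
            total += graph.get(path.get(i)).get(path.get(i + 1));
        }
        return total;
    }

    public static void main(String[] args) {
        RouteFinder finder = new RouteFinder();
        finder.addRoute("Delhi", "Jaipur", 500);
        finder.addRoute("Jaipur", "Udaipur", 700);
        finder.addRoute("Delhi", "Agra", 400);
        finder.addRoute("Agra", "Udaipur", 900);
        finder.addRoute("Delhi", "Udaipur", 1500);
        List<String> path = finder.cheapestPath("Delhi", "Udaipur");
        System.out.println(path);                   // [Delhi, Jaipur, Udaipur]
        System.out.println(finder.totalFare(path)); // 1200
    }
}
```

In this sample, the direct Delhi to Udaipur fare (1500) and the route through Agra (1300) both lose to the route through Jaipur (500 + 700 = 1200). A different notion of "best", such as shortest travel time, would only change the edge weights.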

Hardware & Software Used

Software Used:

1. Hadoop environment: the tool for analysis and for generating recommendations for tourists.
2. Technology: Java
3. Web technology: HTML, JavaScript, and CSS.
4. Database: MySQL 5.0

Hardware Used:

1. A PC or laptop.
2. Minimum 8 GB RAM and a 32 GB hard disk drive.
3. Intel Core i5 processor, 8th generation (minimum).

In this project, I am developing a tourism management system. First, I will create a user login and password form using Java, with some Python frameworks also used for developing the code. The system will then recommend famous tourist places to visit in a particular area, or throughout the country (India), based on the user's search history on the system. It will be a simple, static web application with a minimal look and feel.
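The Hadoop analysis behind search-history recommendations follows the MapReduce pattern. The sketch below simulates that pattern in a single Java process, without the Hadoop API, so the logic can be read and run on its own: the map phase turns each search-log line into a (city|place, 1) pair, and the reduce phase sums the counts per key. The log format userId,city,place, the sample data, and the class and method names are hypothetical. It aggregates the searches of all users to surface popular places; a per-user variant would add the user ID to the key. On a real cluster, the same logic would be written as Hadoop Mapper and Reducer classes.

```java
import java.util.*;
import java.util.stream.Collectors;

/**
 * Single-process simulation of the MapReduce pattern applied to search logs.
 * The log format "userId,city,place" is a hypothetical assumption.
 */
public class SearchHistoryTopPlaces {

    // Map phase: one log line -> (city|place, 1).
    static Map.Entry<String, Integer> mapPhase(String line) {
        String[] f = line.split(",");
        return new AbstractMap.SimpleEntry<>(f[1].trim() + "|" + f[2].trim(), 1);
    }

    // Shuffle + reduce phase: group the pairs by key and sum their counts.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // The n most-searched places in a city; ties broken alphabetically.
    static List<String> topPlaces(List<String> logLines, String city, int n) {
        List<Map.Entry<String, Integer>> pairs = logLines.stream()
                .map(SearchHistoryTopPlaces::mapPhase)
                .collect(Collectors.toList());
        String prefix = city + "|";
        return reducePhase(pairs).entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefix))
                .sorted((a, b) -> a.getValue().equals(b.getValue())
                        ? a.getKey().compareTo(b.getKey())
                        : Integer.compare(b.getValue(), a.getValue()))
                .limit(n)
                .map(e -> e.getKey().substring(prefix.length()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "u1,Jaipur,Amber Fort", "u2,Jaipur,Hawa Mahal", "u3,Jaipur,Amber Fort",
                "u1,Agra,Taj Mahal", "u4,Jaipur,City Palace", "u5,Jaipur,Hawa Mahal",
                "u6,Jaipur,Amber Fort");
        System.out.println(topPlaces(logs, "Jaipur", 2)); // [Amber Fort, Hawa Mahal]
    }
}
```

Keeping the map and reduce steps as separate methods makes it straightforward to port them to a Hadoop job later, since each maps onto one phase of the framework.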

CSE Minor Project on Data Analysis of the IT Sector in India Using Big Data

Problem Statement

The IT industry in India is growing continuously, but there is as yet no tool that can analyze this sector's growth over such a large dataset and return results quickly. This problem can be addressed with a tool that can run any analysis query on huge datasets with fast turnaround.

Why was this topic chosen?

This topic is relevant to analyzing the growth of India's IT industry, in particular to measuring the increase in the number of IT companies in individual states and at the national level.

The tool would be able to handle very large company datasets, which are normally difficult to query quickly for relevant results.

Objective and scope of the project

Using a dataset of companies to:

  • Observe IT growth in India over the past few decades in terms of factors such as state-wise growth, to understand where further development is needed.
  • Understand the growth of private-sector and public-sector industries in India.
  • Understand the capital investment in various industry sectors, among other factors.
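One common way to quantify the state-wise growth named in the first objective is year-over-year growth and the compound annual growth rate (CAGR). The notation is introduced here for illustration: N_{s,t} is the number of IT companies in state s in year t, and t_0 < t_1 are the first and last years studied.

$$
g_{s,t} = \frac{N_{s,t} - N_{s,t-1}}{N_{s,t-1}} \times 100\%,
\qquad
\mathrm{CAGR}_s = \left( \frac{N_{s,t_1}}{N_{s,t_0}} \right)^{1/(t_1 - t_0)} - 1
$$

For example, with hypothetical figures, a state that grows from 100 to 200 companies over 10 years has $\mathrm{CAGR} = 2^{1/10} - 1 \approx 0.0718$, about 7.2% per year. Replacing $N_{s,t}$ with total capital gives the same measures for capital investment.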

Methodology/Process Description

The large dataset of companies will first be accessed through Cloudera's Hadoop distribution.

Using this platform, queries will be written against the dataset to return all the required results in minimal time.

The results will then be converted into graphs to study the growth.
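The "queries" in this methodology can be pictured as mapper/reducer pairs. The sketch below runs them in one Java process rather than on the Cloudera Hadoop cluster, and it assumes a hypothetical CSV layout (company,state,ownership,yearRegistered,capitalCrore) because the actual dataset's columns are not specified. Query 1 counts companies per state and year (state-wise growth), query 2 sums capital by private or public ownership, and toCsv produces rows that a charting tool could plot for the graphical step. All names and sample records are illustrative.

```java
import java.util.*;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/**
 * Single-process sketch of mapper/reducer-style queries. Hypothetical record layout:
 * company,state,ownership(Private|Public),yearRegistered,capitalCrore
 */
public class CompanyAggregation {

    // Generic map -> group -> reduce. Merging each pair as it arrives is valid
    // here because the reducers used (sums) are associative and commutative.
    static <K extends Comparable<K>, V> Map<K, V> mapReduce(
            List<String> records, Function<String, Map.Entry<K, V>> mapper, BinaryOperator<V> reducer) {
        Map<K, V> out = new TreeMap<>();
        for (String r : records) {
            Map.Entry<K, V> kv = mapper.apply(r);
            out.merge(kv.getKey(), kv.getValue(), reducer);
        }
        return out;
    }

    // Query 1: number of companies per "state,year", for state-wise growth.
    static Map<String, Long> companiesPerStateYear(List<String> records) {
        return mapReduce(records, r -> {
            String[] f = r.split(",");
            return Map.entry(f[1].trim() + "," + f[3].trim(), 1L);
        }, Long::sum);
    }

    // Query 2: total capital by ownership (private vs public sector).
    static Map<String, Double> capitalByOwnership(List<String> records) {
        return mapReduce(records, r -> {
            String[] f = r.split(",");
            return Map.entry(f[2].trim(), Double.parseDouble(f[4].trim()));
        }, Double::sum);
    }

    // Turn query 1's output into CSV rows that a charting tool can plot.
    static String toCsv(Map<String, Long> counts) {
        StringBuilder sb = new StringBuilder("state,year,companies\n");
        counts.forEach((key, n) -> sb.append(key).append(',').append(n).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> records = List.of(
                "A,Karnataka,Private,2010,5.0",
                "B,Karnataka,Private,2011,2.5",
                "C,Karnataka,Public,2011,10.0",
                "D,Telangana,Private,2011,1.5");
        System.out.print(toCsv(companiesPerStateYear(records)));
        System.out.println(capitalByOwnership(records)); // {Private=9.0, Public=10.0}
    }
}
```

For the sample records, main prints CSV rows with counts 1, 2, and 1 for Karnataka 2010, Karnataka 2011, and Telangana 2011, followed by the ownership totals. Adding a new analysis then means writing one more mapper and reducer pair rather than a new program.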

Required Resources:

  1. Cloudera
  2. Eclipse

What contribution would the project make?

  • It will help in studying the structure of India's IT sector.
  • The analysis can identify the parameters needed to decide future steps for improvement in various states.
  • It will help analyze growth patterns of various industries in India.
  • Ultimately, it creates a tool that can handle large industry datasets and produce statistical results much faster than conventional single-machine processing.

Project Schedule

  • Identify statistics needed (2 days)
  • Data acquisition (5 days)
  • Process/clean data (1 week)
  • Exploratory analysis (1 week)
  • Design queries (5 days)
  • Create code (5 days)
  • Implement and validate code (1 week)
  • Debug code (5 days)
  • Run code and fetch results (1 week)
  • Convert results to graphs (5 days)
  • Visualize results (5 days)