Data Mining Engineering Seminar Report

Description: The Data Mining Engineering Seminar Report gives a basic insight into the concept of data mining. It suggests that there has been a data flood, with data generated by banks, the telecom industry, scientific labs, astronomy, population statistics, and many other sources. The online world adds to this flood, with enormous volumes of data and information produced from the web, text, and e-commerce.

Why data is captured: The presentation suggests that more and more data is captured because storage and security techniques have improved. These techniques have become cheaper, and sophisticated database management systems now make it possible to store more data with greater accuracy and security.

The presentation mentions some laws. According to Moore’s law, in the form quoted in the presentation, computer speed doubles every 18 months. The presentation also states that total storage doubles every nine months.
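
To make the two doubling rates concrete, the following minimal Python sketch computes the growth they imply over a fixed period. The function name growth_factor and the three-year horizon are assumptions chosen for illustration; only the 18-month and nine-month doubling periods come from the presentation.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Return how many times a quantity multiplies over `years`
    if it doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


# Doubling periods as quoted in the presentation.
SPEED_DOUBLING_MONTHS = 18
STORAGE_DOUBLING_MONTHS = 9

years = 3  # illustrative horizon, not from the presentation
speed = growth_factor(years, SPEED_DOUBLING_MONTHS)      # two doublings
storage = growth_factor(years, STORAGE_DOUBLING_MONTHS)  # four doublings
print(f"After {years} years: speed x{speed:.0f}, storage x{storage:.0f}")
# prints: After 3 years: speed x4, storage x16
```

Under these figures, storage grows four times as much as computer speed over three years, which is one way to read the presentation's point that data accumulates faster than it can be processed.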

Data mining is defined in the presentation as a process that identifies valid, novel, potentially useful, and understandable patterns in data. Looking something up in a search engine or a telephone directory is not data mining. Because competitive pressure is strong, data mining offers an unprecedented advantage. Data is now collected and stored at enormous speed, from sources such as:

  • remote sensors on a satellite
  • telescopes scanning the skies
  • microarrays generating gene expression data
  • scientific simulations generating terabytes of data

Some major tasks of data mining (as mentioned in the presentation):

  • Classification: predicting an item class
  • Associations: e.g. A & B & C occur frequently
  • Visualization: to facilitate human discovery
  • Estimation: predicting a continuous value
  • Deviation Detection: finding changes
  • Link Analysis: finding relationships
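
As an illustration of the Associations task (items such as A, B and C occurring together frequently), here is a minimal Python sketch that counts how often each group of items appears across a set of transactions. The function name frequent_itemsets, the min_support threshold, and the sample baskets are all assumptions made up for this example; the presentation does not prescribe a particular algorithm, and this brute-force count is not an optimized method such as Apriori.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, size, min_support):
    """Count itemsets of the given size and keep those that appear
    in at least `min_support` transactions."""
    counts = Counter()
    for basket in transactions:
        # set() drops duplicates; sorted() gives each itemset one
        # canonical tuple form, so ("A", "B") and ("B", "A") match.
        for itemset in combinations(sorted(set(basket)), size):
            counts[itemset] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


# Made-up transactions, for illustration only.
transactions = [
    ["A", "B", "C"],
    ["A", "B", "C", "D"],
    ["A", "C"],
    ["B", "D"],
    ["A", "B", "C"],
]

print(frequent_itemsets(transactions, size=3, min_support=3))
# prints: {('A', 'B', 'C'): 3}
```

Here A, B and C appear together in three of the five transactions, so they pass the threshold, while every other three-item combination appears only once and is dropped.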

Data mining techniques are used in areas such as fraud detection and direct marketing.

Conclusion: The presentation ends by listing some challenges of data mining: privacy preservation, data quality, dimensionality, and data ownership and distribution.

