Web mining is the application of data mining techniques to web-based hypertext. Because a large amount of data is now available online, web mining has become increasingly popular. E-commerce, with its rapid growth, is its most important application.
Classification of web mining:
Web mining is classified into three types based on the type of data used: i) web content mining, ii) web structure mining, and iii) web usage mining.
Web content mining: web content mining is the extraction of useful data from the contents of web pages. Because the web is a vast repository of data, many extraction techniques exist. Most of them work on semi-structured HTML documents or on unstructured free text. Web content mining can be approached from two points of view: the information retrieval view and the database view.
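As a rough illustration of mining semi-structured HTML, the sketch below pulls the visible text and the link targets out of a page using Python's standard `html.parser` module. The `ContentExtractor` class, the `extract_content` helper, and the sample page are hypothetical names and data made up for this example; real content mining systems are considerably more involved.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html):
    """Return (visible_text, links) for one HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Hypothetical sample page, invented for illustration only.
sample_html = """
<html><head><title>Shop</title><style>p {color: red}</style></head>
<body><h1>Laptops</h1><p>Best <b>deals</b> today.</p>
<a href="/cart">Cart</a><script>var x = 1;</script></body></html>
"""

text, links = extract_content(sample_html)
print(text)
print(links)
```

The extracted text could then feed an information retrieval step such as indexing, while the extracted fields and links are closer to what the database view would store.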
Web structure mining: web structure mining discovers the models underlying the hyperlinks. These models describe the link structure of a web site, so that similar data can be grouped together, which makes it easier for the user to retrieve.
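One well-known model built purely from hyperlinks is PageRank. The text above does not name a specific algorithm, so PageRank is used here only as an assumed example of structure mining. The sketch below is a simplified power-iteration version; the `pagerank` function, the damping value, and the four-page `link_graph` are illustrative choices, not part of the original material.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    graph maps each page to the list of pages it links to.
    Assumption: every link target also appears as a key in graph.
    """
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in graph.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link structure of a small site.
link_graph = {
    "home": ["products", "about"],
    "products": ["home", "cart"],
    "about": ["home"],
    "cart": ["home"],
}

ranks = pagerank(link_graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, "home" ends up with the highest score because every other page links to it.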
Web usage mining: different users follow different approaches to retrieve the information they need from the web. Web usage mining discovers these access patterns in order to improve the quality of the web content delivered to the end user and to improve site design. It is also known as web log mining.
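A minimal sketch of web log mining follows, assuming the server writes its access log in the Common Log Format. It counts successful page requests and the page-to-page transitions made by each client host. The `mine_log` function and the log lines are hypothetical; a real system would also split visits into sessions and filter out robots.

```python
import re
from collections import Counter, defaultdict

# Assumed input format: Common Log Format, one request per line.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def mine_log(lines):
    """Return (page_hits, transitions) for successful requests."""
    page_hits = Counter()
    paths_by_host = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip malformed lines and anything other than status 200.
        if match is None or match["status"] != "200":
            continue
        page_hits[match["path"]] += 1
        paths_by_host[match["host"]].append(match["path"])

    # Count consecutive page pairs per host (lines assumed time-ordered).
    transitions = Counter()
    for path in paths_by_host.values():
        transitions.update(zip(path, path[1:]))
    return page_hits, transitions


# Hypothetical log excerpt, invented for illustration only.
sample_log = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:10:00:05 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2024:10:00:07 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:09 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2024:10:00:12 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.1 - - [01/Jan/2024:10:00:15 +0000] "GET /cart HTTP/1.1" 200 300',
]

page_hits, transitions = mine_log(sample_log)
print(page_hits.most_common())
print(transitions.most_common())
```

Frequent transitions such as "/home" to "/products" are the kind of pattern a site designer might use to reorganize navigation.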
Web data clustering:
Clustering web data means grouping similar data together into one class, so that whenever a user requests particular data, the information related to that topic can be retrieved. Clustering offers several benefits, such as integrating various data representations, improving web data accessibility, and improving information retrieval. The main types of clustering are hierarchical, partitional, probabilistic, graph-based, fuzzy, neural-network-based, and hybrid clustering.
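To make the idea concrete, here is a small sketch of hierarchical (agglomerative) clustering, one of the types listed above. It assumes each page is represented by its set of words and uses Jaccard similarity with single linkage; both choices, along with the `agglomerative` function and the four sample pages, are assumptions made for this illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative(docs, num_clusters):
    """Single-link agglomerative clustering of documents.

    docs maps a document name to its text. Clusters are merged,
    most similar pair first, until num_clusters remain.
    """
    tokens = {name: set(text.lower().split()) for name, text in docs.items()}
    clusters = [[name] for name in docs]

    def similarity(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(jaccard(tokens[a], tokens[b]) for a in c1 for b in c2)

    while len(clusters) > max(num_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = similarity(clusters[i], clusters[j])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical page texts, invented for illustration only.
pages = {
    "p1": "cheap laptop deals online",
    "p2": "laptop deals and discounts",
    "p3": "football match results",
    "p4": "football league results today",
}

clusters = agglomerative(pages, 2)
print(clusters)
```

With two clusters requested, the two laptop pages end up together and the two football pages end up together, so a query about one topic can be answered from a single group.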
Web caching:
Web caching refers to storing recently retrieved information for future use. It decreases web traffic by reducing bandwidth usage, and it reduces server load. Web caching is commonly evaluated by hit rate, weighted hit rate, and latency.
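The three evaluation measures can be illustrated with a small cache simulation. The sketch below assumes a least-recently-used (LRU) replacement policy, interprets weighted hit rate as hits weighted by object size in bytes, and uses made-up latencies of 5 ms for a hit and 100 ms for a miss. None of these choices come from the text above; `simulate_lru` and the request trace are hypothetical.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity, hit_ms=5.0, miss_ms=100.0):
    """Replay (url, size_in_bytes) requests through an LRU cache.

    capacity is the maximum number of cached objects.
    hit_ms and miss_ms are assumed latencies, not measured values.
    Assumption: requests is non-empty.
    """
    cache = OrderedDict()  # url -> size, least recently used first
    hits = 0
    hit_bytes = 0
    total_bytes = 0
    for url, size in requests:
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = size
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    total = len(requests)
    misses = total - hits
    return {
        "hit_rate": hits / total,
        "weighted_hit_rate": hit_bytes / total_bytes,
        "mean_latency_ms": (hits * hit_ms + misses * miss_ms) / total,
    }


# Hypothetical request trace: (url, object size in bytes).
trace = [
    ("/a", 100), ("/b", 900), ("/a", 100),
    ("/a", 100), ("/b", 900), ("/c", 400),
]

stats = simulate_lru(trace, capacity=2)
print(stats)
```

On this trace half of the requests are hits, but only 44% of the bytes are served from the cache, which shows why the plain and weighted hit rates are reported separately.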