Introduction to Text and Hypertext Categorization Using Support Vector Machines:
The paper addresses document classification, that is, assigning documents to predefined categories. A document may belong to several categories or to none at all.
Applications of text categorization include classifying news stories for online retrieval, finding information on the World Wide Web, and guiding a user's search through hypertext.
Representing text for classification involves reducing each word to its stem, removing stop words, and building a feature vector for each document.
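The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a simple suffix stripper standing in for a real stemming algorithm, not the method used in the paper.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for"}

def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def to_feature_vector(text):
    """Map raw text to a sparse term-frequency vector {stem: count}."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems)

vector = to_feature_vector("Stocks fell and the stock market is falling")
print(dict(vector))
```

Only the words that actually occur in a document are stored, which is why such vectors stay sparse even when the vocabulary is very large.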
SVMs are well suited to this application for several reasons: the input space is high-dimensional, few of the features are irrelevant, the document vectors are sparse, and most text categorization problems are linearly separable.
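To make the idea of a linear classifier over sparse document vectors concrete, here is a minimal sketch of a linear SVM trained by stochastic sub-gradient descent on the hinge loss (a Pegasos-style update). The training procedure, the parameters `lam` and `epochs`, and the toy data are assumptions chosen for illustration; the text does not say which solver the paper uses, and this sketch omits the bias term.

```python
import random

def dot(w, x):
    """Dot product of a sparse weight dict and a sparse document vector."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient descent on the regularized hinge loss.

    data: list of (sparse_vector_dict, label) pairs with label in {-1, +1}.
    """
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            margin = y * dot(w, x)
            for k in w:                           # shrink step from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1:                        # hinge loss is active: move toward y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

# Toy, linearly separable data: +1 = sports, -1 = finance (hypothetical).
train = [
    ({"goal": 2, "match": 1}, 1),
    ({"team": 1, "goal": 1}, 1),
    ({"stock": 2, "market": 1}, -1),
    ({"market": 1, "bank": 1}, -1),
]
weights = train_linear_svm(train)
print(predict(weights, {"goal": 1}), predict(weights, {"market": 1}))
```

The weight vector is stored as a dictionary, so each update touches only the features present in the current document, which fits the sparse, high-dimensional setting described above.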
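The conventional classification methods include the Bayes classifier, the Rocchio algorithm, and the decision tree classifier. Consider the Bayesian approach for a document described by n attributes $a_1, a_2, \ldots, a_n$, where the target takes a value $v$ from a finite set $V$. The approach assigns the most probable target value given the attributes:
$$v_{MAP} = \arg\max_{v \in V} P(v \mid a_1, a_2, \ldots, a_n)$$
The above formula can be written using Bayes theorem as
$$v_{MAP} = \arg\max_{v \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v)\, P(v)}{P(a_1, a_2, \ldots, a_n)} = \arg\max_{v \in V} P(a_1, a_2, \ldots, a_n \mid v)\, P(v)$$
where the denominator can be dropped because it does not depend on $v$.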
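The Bayesian rule described above, which picks the target value with the highest posterior probability, can be illustrated with a small sketch. It makes two assumptions that are not stated in the text: the attributes are words that are conditionally independent given the category (the naive Bayes assumption), and word probabilities are estimated with add-one (Laplace) smoothing. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (list_of_words, label). Returns priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    priors = {v: n / len(docs) for v, n in class_counts.items()}
    return priors, word_counts, vocab

def classify(words, priors, word_counts, vocab):
    """Return the label v maximizing log P(v) + sum_i log P(a_i | v)."""
    best_label, best_score = None, -math.inf
    for v, prior in priors.items():
        total = sum(word_counts[v].values())
        score = math.log(prior)
        for a in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[v][a] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = v, score
    return best_label

docs = [
    (["goal", "match", "team"], "sport"),
    (["goal", "team"], "sport"),
    (["stock", "market"], "finance"),
    (["bank", "market", "stock"], "finance"),
]
priors, word_counts, vocab = train_naive_bayes(docs)
print(classify(["goal", "team"], priors, word_counts, vocab))
```

Working with log probabilities avoids numerical underflow when a document contains many words.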
SVMs can therefore be applied to text categorization. Theoretical arguments indicate that SVMs are well suited to the task, and when their performance was compared with that of the other methods, they showed a consistent improvement. Hence we can conclude that the SVM is an efficient approach to text categorization.