Introduction to Distributional Features for Text Categorization Project:
Text categorization is the task of assigning predefined categories to natural language text. The usual “bag-of-words” representation gives each word a value based on whether the word appears in a document and how often it is repeated. These values, however, cannot capture all of the information a word carries, because they ignore how the word is distributed within the document.
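As a minimal sketch of the bag-of-words representation described above, the Python below counts words per document and weights them with one common tf-idf variant, tf × log(N / df). The whitespace tokenizer, the sample documents and the function names are assumptions for illustration, not part of the project. The last line shows the limitation the paragraph points out: reordering a document's words does not change its representation.

```python
import math
from collections import Counter


def bag_of_words(doc):
    """Map a document to raw term counts; word order and position are discarded."""
    return Counter(doc.lower().split())


def tfidf(docs):
    """Weight each term by tf * log(N / df), one common tf-idf variant."""
    bows = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # df[t]: number of documents containing term t at least once
    df = Counter(term for bow in bows for term in bow)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in bow.items()}
        for bow in bows
    ]


docs = [
    "the match ended in a draw",
    "the market ended the day higher",
    "the team won the match",
]
for vec in tfidf(docs):
    print(vec)

# Word order and position are lost: both sentences give the same bag.
print(bag_of_words("goal scored late") == bag_of_words("late goal scored"))  # True
```

A word that occurs in every document (here "the") gets weight 0, while rarer words such as "market" get the largest weights.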
The proposed project introduces another kind of value for a word. Beyond presence and frequency, a word has distributional features: how compactly its appearances are grouped within the document, and where in the document it first appears, which indicates how significant the word is. These distributional features are computed with a tfidf-style equation and combined with other features through ensemble learning techniques. Because they describe where and how a word appears, not only whether it appears, distributional features are useful for text categorization.
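The section does not say which ensemble method is used. The sketch below assumes one simple option: train one base classifier per feature view (for example, term-frequency values and distributional values) and average their per-category scores. The nearest-centroid base learner, the function names and the toy data are hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class CentroidClassifier:
    """Hypothetical base learner: nearest class centroid over one feature view."""

    def fit(self, vectors, labels):
        sums = defaultdict(lambda: defaultdict(float))
        counts = defaultdict(int)
        for vec, label in zip(vectors, labels):
            counts[label] += 1
            for t, w in vec.items():
                sums[label][t] += w
        # Centroid = mean vector of the training documents in each category
        self.centroids = {
            label: {t: w / counts[label] for t, w in sums[label].items()}
            for label in counts
        }
        return self

    def scores(self, vec):
        return {label: cosine(vec, c) for label, c in self.centroids.items()}


def ensemble_predict(models, views):
    """Average per-category scores over models, one model per feature view."""
    total = defaultdict(float)
    for model, vec in zip(models, views):
        for label, s in model.scores(vec).items():
            total[label] += s / len(models)
    return max(total, key=total.get)


# Two hypothetical views of the same training documents:
# term-frequency values and distributional values.
train_tf = [{"goal": 2, "match": 1}, {"stock": 2, "price": 1}]
train_dist = [{"goal": 0.9, "match": 0.4}, {"stock": 0.8, "price": 0.5}]
labels = ["sport", "finance"]

tf_model = CentroidClassifier().fit(train_tf, labels)
dist_model = CentroidClassifier().fit(train_dist, labels)

print(ensemble_predict([tf_model, dist_model], [{"goal": 1}, {"goal": 0.7}]))  # sport
```

Score averaging is used here only because it is the simplest way to let each feature view contribute; weighted voting or stacking would fit the same structure.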
The current system represents a word by values that reflect its presence and the number of times it appears in the document. It also uses a statistical representation based on sequences of consecutive words in the text, called n-grams. Its main drawbacks are:
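A small sketch of the word n-grams mentioned above, assuming whitespace tokenization: an n-gram is any run of n consecutive words, and a count-based model uses how often each n-gram occurs. The function name is illustrative. Like the bag of words, these counts say nothing about where in the document an n-gram appears.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the contiguous n-word sequences of a text, in order of appearance."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(word_ngrams("New York stock prices rose", 2))
# [('new', 'york'), ('york', 'stock'), ('stock', 'prices'), ('prices', 'rose')]

# n-gram frequencies, the statistic a count-based representation would use
bigram_counts = Counter(word_ngrams("to be or not to be", 2))
print(bigram_counts[("to", "be")])  # 2
```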
1. The current system cannot fully represent the information contained in the document.
2. The current system is inefficient.
In the proposed system, distributional features are computed with a tfidf-style equation and combined with other features through ensemble learning techniques. The distributional features are extracted using an inverted index that records not only how many times a word appears in a document but also where it appears. From the positions of a word and the length of the document, the distributional features of the word are constructed and computed. The main advantages of the proposed system are:
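A minimal sketch of the pipeline described above, under assumptions the section does not state: a document is split into sentences (the "parts" that positions refer to), a position is a sentence index, and the two distributional values are illustrative stand-ins, a first-appearance score `1 - first/n` and a spread score `(last - first + 1)/n`, each multiplied by `log(N/df)` in the style of tf-idf. Dividing by `n`, the number of sentences, is how this sketch accounts for document size. All function names are hypothetical.

```python
import math
import re
from collections import defaultdict


def split_parts(doc):
    """Split a document into parts; here, naive sentence splitting on . ! ?"""
    return [p for p in re.split(r"[.!?]+", doc.lower()) if p.strip()]


def build_positional_index(docs):
    """Inverted index: term -> {doc_id: [part index of every occurrence]}."""
    index = defaultdict(lambda: defaultdict(list))
    n_parts = []
    for doc_id, doc in enumerate(docs):
        parts = split_parts(doc)
        n_parts.append(len(parts))
        for pos, part in enumerate(parts):
            for term in part.split():
                index[term][doc_id].append(pos)  # positions stay sorted
    return index, n_parts


def distributional_features(index, n_parts, n_docs):
    """Per (doc, term): count-based and distributional values, tfidf-style weighted."""
    features = defaultdict(dict)
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))
        for doc_id, positions in postings.items():
            n = n_parts[doc_id]
            first = 1 - positions[0] / n  # earlier first appearance -> closer to 1
            spread = (positions[-1] - positions[0] + 1) / n  # share of doc covered
            features[doc_id][term] = {
                "term_frequency": len(positions) * idf,
                "first_appearance": first * idf,
                "spread": spread * idf,
            }
    return features


docs = [
    "Football is popular. The football final was close. Fans cheered.",
    "Stocks fell. Investors sold stocks. Football was not discussed.",
    "Rain is expected. Umbrellas sold out.",
]
index, n_parts = build_positional_index(docs)
feats = distributional_features(index, n_parts, len(docs))

print(dict(index["football"]))  # {0: [0, 1], 1: [2]}
# "football" opens document 0 but only appears at the end of document 1,
# so its first-appearance value is higher in document 0.
print(feats[0]["football"]["first_appearance"] > feats[1]["football"]["first_appearance"])  # True
```

Storing positions rather than only counts is what makes these values computable: the count is simply the length of each position list.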
- Distributional features for text categorization can be computed at low cost.
- The distributional approach achieves better categorization performance than the current system.
- Distributional features are especially helpful when documents are long and written in an informal style.