Text classification method review

Date published

2007-04-01

Type

Report

Abstract

With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all incoming data, let alone classify it into categories. Given this growth of information and the simultaneous growth of available computing power, automatic classification of data, particularly textual data, is of increasing importance. This paper reviews the generic text classification process, the phases of that process, and the methods used at each phase. Examples from web page classification and spam classification are provided throughout the text. The principles of operation of four main text classification engines are described: Naïve Bayesian, k Nearest Neighbours, Support Vector Machines and Perceptron Neural Networks. The paper surveys the state of the art in all of these phases, noting the methods and algorithms used, the different ways in which researchers are trying to reduce the computational complexity and improve the precision of the text classification process, and how text classification is used in practice. The paper avoids extensive use of mathematical formulae so as to be more accessible to readers with little or no background in theoretical mathematics.
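For illustration only, the sketch below shows one way the generic pipeline the abstract describes (feature extraction, feature reduction, then a classification engine such as Naïve Bayes) can be realised in Python with scikit-learn. It is not the report's own implementation: the tiny spam/ham dataset, the TF-IDF weighting, and the choice of k=8 retained features are all invented for this example.

# A minimal sketch (not the report's code) of the generic text classification
# pipeline: feature extraction, feature reduction, and a Naive Bayesian
# classification engine, built with scikit-learn on an invented toy dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical training data: short messages labelled spam (1) or ham (0).
texts = [
    "win a free prize now",
    "free cheap meds, limited offer",
    "meeting agenda for Monday",
    "shall we have lunch at noon tomorrow?",
]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    # Feature extraction: bag of words weighted by TF-IDF.
    ("extract", TfidfVectorizer()),
    # Feature reduction: keep only the 8 terms most correlated with the labels.
    ("reduce", SelectKBest(chi2, k=8)),
    # Classification engine: multinomial Naive Bayes.
    ("classify", MultinomialNB()),
])
pipeline.fit(texts, labels)

# Classify unseen messages; on this toy data the first leans spam, the second ham.
print(pipeline.predict(["free offer, win a prize", "agenda for the lunch meeting"]))

The same pipeline structure accommodates the other engines the paper reviews: swapping MultinomialNB for, say, a k Nearest Neighbours or linear SVM classifier changes only the final stage, which is one reason the phase-by-phase view of the process is useful.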

Keywords

Text classification, Bayes, kNN, SVM, Neural network, Feature extraction, Feature reduction, Web page classification
