Text classification method review

Date

2007-04-01

Type

Report

Abstract

With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all the incoming data, let alone classify it into categories. Given this growth of information and the simultaneous growth of available computing power, automatic classification of data, particularly textual data, is becoming increasingly important. This paper provides a review of the generic text classification process, the phases of that process and the methods used at each phase. Examples from web page classification and spam classification are provided throughout the text. The principles of operation of four main text classification engines are described: Naïve Bayesian, k Nearest Neighbours, Support Vector Machines and Perceptron Neural Networks. The paper surveys the state of the art in all these phases, noting the methods and algorithms used and the different ways in which researchers are trying to reduce computational complexity and improve the precision of the text classification process, as well as how text classification is used in practice. The paper avoids extensive use of mathematical formulae so as to be accessible to readers with little or no background in theoretical mathematics.
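To make the process the abstract describes concrete, here is a minimal sketch of the three phases it names, chained into one pipeline: feature extraction (bag of words), feature reduction (chi-squared selection) and a Naïve Bayesian classification engine. The toy spam/ham corpus, the use of scikit-learn, and the parameter k=12 are illustrative assumptions, not details taken from the paper.

# A minimal sketch of the generic text classification process the paper
# reviews, under the assumptions stated above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy training corpus for the spam-classification example.
documents = [
    "win a free prize now",
    "limited offer click here",
    "meeting agenda for monday",
    "project report attached",
]
labels = ["spam", "spam", "ham", "ham"]

# The three phases, chained in order:
pipeline = Pipeline([
    ("extract", CountVectorizer()),       # feature extraction: bag of words
    ("reduce", SelectKBest(chi2, k=12)),  # feature reduction: chi-squared selection
    ("classify", MultinomialNB()),        # classification engine: Naive Bayesian
])
pipeline.fit(documents, labels)

print(pipeline.predict(["free prize offer", "monday project meeting"]))
# prints: ['spam' 'ham']

The same pipeline structure accommodates the paper's other engines: swapping MultinomialNB for, say, a k Nearest Neighbours or linear SVM classifier changes only the final step, which is what makes the phase-by-phase view of the process useful.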

Keywords

Text classification, Bayes, kNN, SVM, Neural network, Feature extraction, Feature reduction, Web page classification
