Author

Amir Tavasoli

Date of Award

9-2010

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Supervisor

Norm Archer

Language

English

Abstract

Classification, or categorization, is a text mining technique in which text documents are assigned to predefined categories. Techniques for classifying messages range from simple k-nearest neighbours to more complex support vector machines. These classifiers have proven effective when the documents in each category overlap little with documents in other categories. Designing a classifier that remains effective where such overlap is unavoidable, as in emails, text messages, or user opinions and comments, remains a continuing challenge. This work proposes a system that classifies such documents based on their content so that they can be sorted by semantic significance. This has several real-world applications, such as triaging patient messages to physicians in healthcare or sorting user opinions on a product webpage. We have combined and tailored different classifiers to build a high-performance classifier that supports this type of classification. The system has been tested on real-world messages exchanged between patients and physicians during a hypertension prevention study and was shown to perform well.

McMaster University Library
