On Feature Extraction and Classification in Speech and Image Processing

Document type: Dissertations
Author(s): Mikael Nilsson
Title: On Feature Extraction and Classification in Speech and Image Processing
Series: Blekinge Institute of Technology Doctoral Dissertation Series
Year: 2007
Issue: 15
Pagination: 166
ISBN: 978-91-7295-123-5
ISSN: 1653-2090
Publisher: Blekinge Institute of Technology
City: Karlskrona
Organization: Blekinge Institute of Technology
Department: School of Engineering - Dept. of Signal Processing (Sektionen för teknik – avd. för signalbehandling)
School of Engineering, S-372 25 Ronneby
+46 455 38 50 00
Author's e-mail: mikael.nilsson@bth.se
Language: English
Abstract: The natural world is home to innumerable patterns in various forms, which humans are able to locate and interpret by means of the senses. This thesis presents and explores techniques that mimic such behavior through the use of artificial sensors and computational power, that is, aspects of machine learning with particular emphasis on pattern recognition. Theory and practical issues are explored with respect to two main operations: feature extraction and classification.
On the topic of feature extraction, this thesis introduces a new signal processing transform, denoted the Successive Mean Quantization Transform (SMQT). The relevant theory, extensions and numerical transformations are presented, along with possible usage of this transform in various situations.
Two different classifiers are investigated: the hidden Markov model and the sparse network of winnows. The hidden Markov model is a stochastic model that has been used successfully in various pattern recognition applications. A number of numerical issues can arise when implementing a complete system based on the hidden Markov model. The relevant theory behind these issues is presented, along with a number of possible solutions. The sparse network of winnows is a general-purpose classifier. In this thesis, it is tailored for fast binary classification using lookup tables. Further, a scheme is proposed to split up this classifier in order to perform faster classification; this scheme is denoted the split up sparse network of winnows.
The sections of this thesis dedicated to feature extraction and classification present a number of tools that are utilized further in three applications.
The first application is concerned with the enhancement of noise-degraded speech. Specifically, it addresses the task of reducing non-stationary noise in speech using the hidden Markov model. The second application addresses the task of automatic image enhancement, for which the Successive Mean Quantization Transform is investigated. The final application is concerned with face detection. For this task, illumination problems and speed issues are discussed, along with proposed solutions.
Subject: Signal Processing\General
URN: urn:nbn:se:bth-00380