Sarath Chandra Uppalapati; Deepak Penugonda, pp. 80. ING/School of Engineering, 2012.
In current conference environments where video recording is required, a set of
cameras operated by a person is needed to track the active speaker during the
discussion. To automate this procedure, various acoustic and visual tracking
methods have been developed.
In this thesis work, a robust speaker tracking system is developed using two methods,
Steered Response Power PHase Alignment Transform (SRP-PHAT) and Steered Response Kurtosis
PHase Alignment Transform (SRK-PHAT), which compute the likelihood of each candidate source
position from the generalized cross-correlation estimates between each pair of
microphones. In hands-free speech applications in a smart room environment, the speech
source is located at a distance from the microphones, so noise and reverberation strongly
affect the estimate of the speech source location.
The accuracy of the SRP-PHAT and SRK-PHAT methods in estimating the source location is
limited by the time resolution of the weighted PHAT function. In this thesis work, SRP-PHAT
and SRK-PHAT have been implemented using a 2-element microphone array and a 4-element
microphone array, and to compare the two methods in detail, their performance
has been analyzed for 64, 128 and 256 subbands in a weighted overlap-add (WOLA) filter bank. The
estimated time differences of arrival (TDOAs) and directions of arrival (DOAs) of SRP-PHAT
[...] the speech source location. The mean estimation error and standard deviation are
calculated to determine the accuracy of the estimated TDOAs.
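As an illustration of the generalized cross-correlation step described above, the following is a minimal sketch of PHAT-weighted TDOA estimation for a single microphone pair, followed by a far-field conversion from TDOA to DOA. It is not the implementation used in the thesis: it works on one full-band frame rather than on WOLA subbands, it uses a naive DFT to stay dependency-free, and the function names, the 16 kHz sample rate, the 0.1 m microphone spacing and the 343 m/s speed of sound are all assumptions chosen for the example. The angle is measured from the broadside of the pair, which is also an assumption.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def dft(x, inverse=False):
    """Naive O(N^2) DFT, adequate for one short illustrative frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_tdoa(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a (hypothetical helper)."""
    n = len(sig_a) + len(sig_b)  # zero-pad so the correlation does not wrap around
    spec_a = dft(list(sig_a) + [0.0] * (n - len(sig_a)))
    spec_b = dft(list(sig_b) + [0.0] * (n - len(sig_b)))
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    # PHAT weighting: discard the magnitude and keep only the phase of each bin
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = [v.real for v in dft(phat, inverse=True)]
    # Negative lags sit at the end of the circular correlation buffer
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n])


def tdoa_to_doa_deg(lag, sample_rate, mic_spacing):
    """Far-field DOA in degrees from broadside for a two-microphone pair."""
    s = SPEED_OF_SOUND * lag / (sample_rate * mic_spacing)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))


# Synthetic check: the second channel is the first one delayed by 3 samples.
random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(64)]
delayed = [0.0] * 3 + reference
lag = gcc_phat_tdoa(reference, delayed, max_lag=4)
angle = tdoa_to_doa_deg(lag, sample_rate=16000, mic_spacing=0.1)
print(lag, round(angle, 1))
```

Because the estimated lag is an integer number of samples, the smallest resolvable TDOA step in this sketch is one sample period, which is one way to see why the time resolution of the weighted function limits the localization accuracy.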
In this thesis work, Wiener beamforming is also implemented to remove noise and
reverberation in a room environment using a 2-element microphone array. The performance
of the method is analyzed using Signal-to-Noise Ratio (SNR) and Perceptual Evaluation of
Speech Quality (PESQ). To improve the results obtained, a dereverberation
procedure is also included in the Wiener beamforming method, and the improvement in
PESQ values is discussed in Chapter 4. The performance of the Wiener beamforming method
is tested for brown noise, babble noise, fan noise and white noise.
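For reference, the SNR measure mentioned above can be sketched as follows when a clean reference signal is available. This is only one common definition, assumed here for illustration; the exact SNR definition used in the thesis is not stated in this abstract, and the function name snr_db is hypothetical. PESQ is not sketched because it requires a full implementation of the standardized algorithm.

```python
import math


def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise component."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, observed))
    if noise_power == 0:
        return math.inf  # no residual noise at all
    return 10.0 * math.log10(signal_power / noise_power)


# Toy example: unit-amplitude signal with a constant 0.1 offset as "noise".
clean = [1.0, -1.0, 1.0, -1.0]
observed = [s + 0.1 for s in clean]
value = snr_db(clean, observed)
```

Evaluating the same function on the beamformer input and on its output gives an SNR improvement figure for each noise type.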