\documentclass[12pt]{report}
\usepackage[textures]{graphicx}
\raggedbottom
%%\renewcommand{\bottomfraction}{0.89}
%%\renewcommand{\topfraction}{0.4}
%%\renewcommand{\textfraction}{0.01}
%%\renewcommand{\floatpagefraction}{0.6}
%%\setcounter{totalnumber}{4}
\setlength{\unitlength}{1cm}
\textwidth 158mm
\oddsidemargin 0mm
\evensidemargin 0mm
\topmargin -20mm
\textheight 253 mm
\input{picdefe}
\input{picdefq}
\input{option_keys}
\begin{document}
\title{Sub-band Acoustic Echo Cancellation}
\author{Sven Nordholm \and J\"orgen Nordberg \and Sven Nordebo}
\date{August 19, 1997}
%\thanks{The
%results described
%in this paper have been developed as a part of NUTEK:s
%telecommunications
%program, Email address Sven.Nordholm@isb.hk-r.se}}
\maketitle
\ \\
\ \\
\ \\
\ \\
\ \\
\ \\
\ \\
\ \\
\ \\
\ \\
\section*{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Summary}
Echo cancellation is an important part of today's modern communication systems. The use of hands-free equipment in cars,
computer applications, video conferencing, etc., has created a growing need for high-quality acoustic echo cancellation.
In these application areas, typical lengths of the impulse response of the acoustic channel are about 500-1500 filter coefficients at a sampling frequency of 12 kHz.
By letting the echo canceller work with filter structures that divide the signals into different frequency bands,
a reduced computational complexity and a faster convergence of the echo canceller are obtained. This study shows that with this
type of filter an echo suppression of 30 dB can be achieved, together with an improved convergence rate compared
with a traditional echo canceller. In this report the emphasis is on the design of the filter bank that divides the signals into sub-bands
and on the introduction of a simple speech detector that noticeably increases the effectiveness of the echo canceller.
\pagestyle{empty}
\begin{abstract}
Echo suppression is a vital part of every communications system.
The use of hands-free communication in cars, computer applications
and video conferencing has created further demands for high-quality
acoustic echo
cancellation. In these applications the acoustic channel typically
has a long
impulse response, on the order of 100 ms. Typical lengths of adaptive
FIR filters are 500-1500 taps, assuming a 12 kHz sampling frequency.
In order to reduce the computational load and also to improve the
convergence rate, sub-band
processing schemes have been suggested. This paper presents a
study of a delayless sub-band adaptive filter. The study shows a
possible
echo suppression of about 30 dB and also an improved convergence
rate when compared to a fullband
LMS-filter. The main issues discussed are filter bank design and a
simple speech detector
that gives a drastic performance improvement.
\end{abstract}
\pagestyle{plain}
\chapter{Introduction}
In modern hands-free communication systems such as hands-free car
phones, loudspeaker phones and
video conference systems, it is necessary to perform an acoustic echo
cancellation of the far-end
speaker\cite{Wst85, Son92, Mo95}. In order to track variations in
the acoustic channel, the echo
cancellation is made adaptive. The filter length of the acoustic
canceller is typically 500-1500 FIR taps for normal sampling
frequencies. Long filters imply
a large computational burden and slow convergence rate. The slow
convergence rate is especially
obvious in signals with a large spectral dynamic range such as
speech
signals. A sub-band echo canceller \cite{Ono94,MThi95} offers several
advantages over a full-band echo canceller:
\begin{enumerate}
\item The computational burden is reduced, essentially in proportion
to the number of sub-bands, owing to decimation.
\item Faster convergence, since the spectral dynamic range in each
sub-band is smaller.
\item Signal-controlled adaptation can be performed in each
sub-band individually, which enhances performance.
\item A well-separated structure suited to parallel implementation.
\end{enumerate}
This paper presents an improved version of a delayless sub-band
adaptive filter (DSAF), presented by
Morgan and Thi \cite{MThi95}. This adaptive filter structure employs
the benefits of adaptive sub-band
filtering, but does not suffer from the inherent delay usually found
in sub-band schemes. This is due
to the fact that the FIR filtering is performed without delay
directly on the full-band signal.
The following improvements are presented in this paper:
\begin{enumerate}
\item Improved filter bank design which makes it possible to
improve the convergence rate.
\item A signal detection scheme operating in each sub-band, thereby
improving
the convergence rate for signals with a highly varying spectral
content.
\item A detailed analysis of the polyphase decomposition.
\end{enumerate}
The outline of the paper is as follows:
\begin{itemize}
\item In chapter \ref{sec2} the sub-band adaptive filter is
presented.
\item Chapter \ref{sec6} shows how
different prototype filters in the filter bank affect the
performance of the echo-canceller.
\item Chapter \ref{sec7} presents improvements to the standard
scheme. These improvements give better
performance and convergence rates when using speech signals for
identification of the acoustic channel.
\item In chapter \ref{sec4} the results derived from real signals
(gathered
in a car environment) are presented.
\item Chapter \ref{sec8} concludes the paper and suggests further
improvements.
\end{itemize}
\chapter{ Sub-band Adaptive Filters}\label{sec2}
An acoustic echo canceller, see Fig. \ref{fig:AEC}, identifies the
channel between the
loudspeaker and the hands-free microphone. This identified impulse
response is then employed to
achieve an inhibition of the echo. One of the fundamental
characteristics of this channel is the bulk
delay. A typical distance between loudspeaker and microphone is 1 m.
This corresponds to a delay of about
3 ms, which at a sampling frequency of 8-12 kHz amounts to
about 25-35 samples. However, a 50-tap
FIR filter will only characterize the direct wave and give a
suppression of about 5-10 dB. In
order to reach the suppression goal of 30-40
dB, filter lengths of 500-1000
FIR taps become necessary. The filter should also be able to track
variations in the acoustic
environment. An appealing approach is to use a multirate technique
since this technique reduces the
computational burden and also gives a faster convergence rate. The
latter
is due to the reduction of
spectral dynamic range in each sub-band. A major drawback is the
delay
that is introduced by the filter
bank. This can, however, be circumvented by using a modified
structure
for the sub-band adaptive filter
\cite{MThi95}.
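As a rough check of the bulk delay discussed above, the following sketch assumes a speed of sound of $c \approx 343$ m/s, a value that is not stated in this report; the symbols $d$, $c$, $\tau$ and $f_s$ are introduced here only for illustration:
\[
\tau = \frac{d}{c} \approx \frac{1\ \mathrm{m}}{343\ \mathrm{m/s}} \approx 2.9\ \mathrm{ms},
\qquad
\tau f_s \approx \left\{ \begin{array}{ll}
23 \ \textrm{samples} & f_s = 8\ \mathrm{kHz} \\
35 \ \textrm{samples} & f_s = 12\ \mathrm{kHz}
\end{array} \right.
\]
Under this assumption the direct path spans only a few tens of samples, which is small compared with the 500-1000 taps needed to model the full impulse response.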
\section{The Delayless Sub-band Adaptive Filter}
The delayless attribute of this technique comes from the fact that
the new adaptive weights are computed in sub-bands and then
transformed to an
equivalent full-band filter by means of an inverse FFT, see Fig.
\ref{fig:AEC}.
The filter works in real time on the loudspeaker signal. The
coefficients are calculated separately in each band. They can be
calculated either by
employing the error signal $e(k)$ (closed loop case) or the
microphone input signal
$d(k)$ (open loop case). If the signal $d(k)$ is used, a local error
signal in each
band is created and the calculations do not need to be performed in
real time. This approach
will, however, give less suppression since the algorithm is working
blind with
respect to the real error signal. The full-band signal is divided
into several
sub-band signals by using a polyphase FFT technique \cite{Vai93}.
\begin{figure}[htb]
\centerline{\includegraphics[width=10cm]{metodsw}}
\caption{Delayless sub-band acoustic echo canceller; position A open
loop configuration and position B
closed loop configuration}
\label{fig:AEC}
\end{figure}
\section{Polyphase FFT Filter Banks}
A set of $M$ filters is said to be a uniform DFT filter bank if they
are related as
\begin {equation}
H_l(z)=H_0(zW^l)=\sum_{n=-\infty}^{\infty}h_0(n)(zW^l)^{-n},
\label{eq:hl}
\end{equation}
where $W=e^{-j2\pi /M} $ and $l \in [0,M-1]$. The polyphase
decomposition can be used to implement such a
filter bank in a very efficient manner \cite{Vai93}.
The number of filters in the filter bank is $M$, thus the passband
edge frequency of the prototype filter is set to $\frac{1}{2M}$. Since only
full-band filters with real coefficients are considered, it is
enough to calculate $\frac{M}{2}+1$ complex
sub-band signals and, in order to avoid aliasing, the signals in
the
filter bank are decimated by a factor of only $\frac{M}{2}$.
The polyphase decomposition of the DFT filter bank is
performed accordingly.
After decimation, the even sub-bands are
centered at dc, while the odd sub-bands are
centered at $\frac{1}{2}$ (half the decimated sampling frequency), see Fig.
\ref{fig:DSEC}.
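The placement of the decimated sub-bands can be verified with a short calculation. The symbols $f$ (frequency normalised to the sampling frequency), $f_l$ (centre frequency of sub-band $l$) and $D$ (decimation factor) are notation assumed here for illustration. Evaluating Eq. (\ref{eq:hl}) on the unit circle, $z=e^{j2\pi f}$, gives
\[
H_l(e^{j2\pi f}) = H_0(e^{j2\pi (f - l/M)}), \qquad f_l = \frac{l}{M}.
\]
Decimation by $D=\frac{M}{2}$ scales the frequency axis by $D$ and wraps it modulo 1, so that
\[
D f_l \bmod 1 = \frac{l}{2} \bmod 1 =
\left\{ \begin{array}{ll}
0 & l \ \textrm{even} \\
\frac{1}{2} & l \ \textrm{odd}
\end{array} \right.
\]
which agrees with the band positions stated above. In the same way a passband edge of $\frac{1}{2M}$ maps to $\frac{1}{2M}\cdot\frac{M}{2}=\frac{1}{4}$, i.e. each decimated sub-band signal is excited over roughly half of its band.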
\begin{figure}[htb]
\centerline{\includegraphics[width=7cm]{subfiltps1}}
\caption{Filter-bank response for odd and even sub-bands after
decimation.}
\label{fig:DSEC}
\end{figure}
The prototype filter $H_0(z)$ is polyphase decomposed as
\begin {equation}
\label{eq:h0}
H_0(z)=
\sum_{n=-\infty}^{\infty}h_0(n)z^{-n}=\sum_{m=0}^{M/2-1}z^{-m}\sum_{n=-\infty}^{\infty}h_0(n\frac{M}{2}+m)z^{-nM/2}.
\end {equation}
For an arbitrary filter in the filter bank, Eqs. (\ref{eq:hl}) and
(\ref{eq:h0}) yield
\begin {equation}
\label{eq:hl1}
H_l(z) = \sum_{n=-\infty}^{\infty}h_0(n)(W^lz)^{-n} =
\sum_{m=0}^{M/2-1}(W^lz)^{-m}\sum_{n=-\infty}^{\infty}h_0(n
\frac{M}{2}+m)(W^lz)^{-nM/2},
\end {equation}
where
\begin {equation}
W^{-lnM/2}=(e^{j \pi l})^n
= \left\{ \begin{array}{ll}
(-1)^n & l \ \textrm{odd} \\
1 & l \ \textrm{even} \\
\end{array} \right.
\label{eq:W}
\end {equation}
Eq. (\ref{eq:W}) indicates that odd and even sub-bands must be
treated separately, see Fig. \ref{fig:pfft}.\\ For
odd $l$ Eq. (\ref{eq:hl1}) yields, \newline
\begin{equation} \label{eq:odd}
H_l(z) = \sum_{m=0}^{M/2-1}(W^lz)^{-m}\sum_{n=-\infty}^{\infty}h_0(n
\frac{M}{2}+m)(-1)^nz^{-nM/2}
\end{equation}
defining $E'_m(z)$ as
\begin{equation} \label{eq:odd2}
E'_m(z) = \sum_{n=-\infty}^{\infty}h_0(n \frac{M}{2}+m)(-1)^nz^{-n} ,
\end{equation}
then Eq. (\ref{eq:odd}) can be rewritten as
\begin{equation}
H_l(z) = \sum_{m=0}^{M/2-1}(W^lz)^{-m}E'_m(z^{M/2}) .
\end{equation}
For even $l$, Eq. (\ref{eq:hl1}) yields
\begin{equation} \label{eq:even}
H_l(z) = \sum_{m=0}^{M/2-1}(W^lz)^{-m}\sum_{n=-\infty}^{\infty}h_0(n
\frac{M}{2}+m)z^{-nM/2}
\end{equation}
defining $E_m(z)$ as
\begin{equation} \label{eq:even2}
E_m(z) = \sum_{n=-\infty}^{\infty}h_0(n \frac{M}{2}+m)z^{-n}
\end{equation}
then Eq. (\ref{eq:even}) can be rewritten as
\begin{equation}
H_l(z) = \sum_{m=0}^{M/2-1}(W^lz)^{-m}E_m(z^{M/2})
\end{equation}
This means that the polyphase filter bank is divided into two filter
structures: one for even sub-bands and one
for odd sub-bands, see Fig. \ref{fig:pfft}.
\begin{figure}[htb]
\centerline{\includegraphics[width=10cm]{pfft.ps}}
\caption{A filter bank design with polyphase FFT where even and
odd sub-bands are calculated separately.}
\label{fig:pfft}
\end{figure}
\section{Transformation from Sub-band Coefficients to Full-band
Coefficients}
If the full-band filter has $N$ taps, the filter length in each
sub-band will be
$\frac{N}{D}$, where $D=\frac{M}{2}$ is the decimation factor. An $\frac{N}{D}$-point FFT is
calculated on the adaptive weights in
each sub-band. The resulting bins are subsequently stacked to form an array with elements
$[0,\ldots,(\frac{N}{2}-1)]$. The array is then
completed by setting element $N/2$ to zero and using the complex
conjugate of elements $[1,\ldots,(\frac{N}{2}-1)]$ in
reverse order. Finally, the $N$-element array is transformed by an $N$-point
inverse FFT to obtain the
full-band filter weights. The method is best described by an
example; in this example $N=512$ and
$M=32$. The correspondence between the FFT bins of the sub-band
filters and those of the full-band
filter is given in Fig. \ref{fig:stackning}.
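For the example values $N=512$ and $M=32$, the sizes involved work out as follows. The symbol $G(k)$ for the stacked full-band FFT bins is introduced here only for illustration, and the assignment of individual sub-band bins to full-band bins is not reproduced:
\[
D = \frac{M}{2} = 16, \qquad \frac{N}{D} = 32, \qquad \frac{M}{2}+1 = 17, \qquad \frac{N}{2} = 256 .
\]
Thus each of the 17 complex sub-band filters has 32 taps and is transformed by a 32-point FFT, and the stacked array holds the elements $G(0),\ldots,G(255)$. The completion step then reads
\[
G(256) = 0, \qquad G(512-k) = G^*(k), \quad k = 1,\ldots,255,
\]
which, provided $G(0)$ is real, makes the result of the 512-point inverse FFT real-valued.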
\begin{figure}[htb]
\centerline{\includegraphics[width=15cm]{stackningps}}
\caption{Frequency mapping from sub-band FFT bin numbers to wideband
FFT bin numbers for a 32-sub-band polyphase FFT implementation
with 512-point impulse responses and 32 taps per sub-band.}
\label{fig:stackning}
\end{figure}
\chapter{The Design of Prototype Filter for Polyphase DFT Filter
Bank}\label{sec6}
In this chapter a method to design general prototype filters is
outlined. This method allows
filter design parameters, such as magnitude response, length, phase
linearity and group delay of the
filter to be included in the design. These parameters will affect the
convergence rate of the echo
canceller as well as the suppression if not properly chosen.
\section{Parameters and Convergence}
It is well known that the convergence rate of the LMS algorithm is
heavily dependent on the
eigenvalue spread of the covariance matrix, such eigenvalues being
related to the spectral dynamics
of the input signal \cite{gray}. The DSAF employs an $M$-band filter
bank,
but the signals are only
decimated by a factor of $\frac{M}{2}$, so that the
filter function in each band is as shown
in Fig. \ref{fig:DSEC}. The consequence of this is that the
adaptive filter in each band has only
an excitation signal over half the band. This will yield a slow
convergence if one is interested in the adaptive filter parameters
outside that band \cite{icl92}. In this case, it is only necessary
to obtain the adaptive filter parameters over half the band (only half
of the DFT bins are used in the conversion to the full band filter).
It is, however, essential that the prototype filter has low ripple in
the passband.
A parameter of essential interest is the filter bank delay,
as any delay will affect the speed with which
the adaptive filters respond to any sudden change in the acoustic
channel. This is especially
important for a closed loop type of implementation (the method using
the error signal
$e(k)$). A linear phase FIR filter of length $N$ has a group delay of
about $T_d =
\frac{N}{2}$ samples. A prototype filter can be
designed with less group delay than a linear phase FIR
filter, whilst retaining a similar
magnitude response. This type of filter creates a system which
can detect changes in the environment faster. It is essential
that the filter has good phase linearity in the passband; otherwise
the adaptive filters must compensate for the phase distortion (this
argument applies to the
closed loop case), which will slow down the convergence rate.
\section{Filter Design Procedure}
In this section a filter design procedure is presented which
allows the designer to design filters with an arbitrary
group delay using the ordinary filter
types LP, BP, BS and HP. The specification in this context is such
that the linear phase requirement
is set only in the passband. Other filter types can also be
designed, but these require a change in the desired
filter specification, Eq. (\ref{eq:hd}). The
designed filter is the result of a minimisation of the mean square
error between the specified desired
filter and the designed filter. The result will be a compromise
between the two design
parameters, magnitude response and group delay. The designer can
choose to emphasize either of the design parameters. This is done by
employing two
different weighting matrices (one for magnitude and one for group
delay).
\subsection{Mathematical Outline}
The frequency function of an $N$-tap (causal) FIR filter is given by
\begin{equation} \label{eq:fd1}
H(\omega) = \sum_{n=0}^{N-1}h(n)e^{-j \omega n}
\end{equation}
where the impulse response is assumed real. Eq. (\ref{eq:fd1})
can also be written as
\normalsize
\begin{equation} \label{eq:fd2}
%H(\omega) = \mbox{ \boldmath $ \phi$}^H(\omega)\mathbf{h}
H(\omega) = \mathbf{\phi}^H(\omega)\mathbf{h}
%H(\omega) = \mbox{ \boldmath $ \phi$}
\end{equation}
where
\begin{equation}
\mathbf{\phi}
% \mbox{ \boldmath $ \phi$}
(\omega) =
\left( \begin{array}{c}
1 \\
e^{j\omega} \\
\vdots \\
e^{j\omega(N-1)}
\end{array} \right)
\textrm{and} \quad
\mathbf{h} =
\left( \begin{array}{c}
h(0) \\
h(1) \\
\vdots \\
h(N-1)
\end{array} \right)
\end{equation} \\
$H_d(\omega)$ is the desired complex filter specification, Eq.
(\ref{eq:hd}), where $T_d$ is the desired group
delay,
\begin{equation} \label{eq:hd}
H_{d}(\omega) = e^{-j\omega T_d}
\end{equation}
Let $\omega_i$, $i \in [1,\ldots ,I]$, be discrete frequency points in $[0,\pi]$,
and let $H_{di} = H_d(\omega_i)$. If
$\mathbf{\phi}_i = \mathbf{\phi}{(\omega _i)}$ and
$\mathbf{\Phi} = [ \mathbf{\phi} _1 \ldots
\mathbf{\phi} _I]$,
then Eq. (\ref{eq:fd2}) can be written for all $I$ frequency points as
\begin{equation}
\mathbf{H} = \mathbf{\Phi}^H \mathbf{h}.
\end{equation}
Define $\mathbf{H}_d$, $\mathbf{W}_m$ (the magnitude weighting
matrix) and $\mathbf{W}_g$ (the group delay
weighting matrix) according to Eq.
(\ref{eq:fd3}),
\begin{equation} \label{eq:fd3}
\mathbf{H}_d=
\left( \begin{array}{c}
H_{d1} \\
\vdots \\
H_{dI} \\
\end{array} \right) \ \
\mathbf{W}_m =
\left( \begin{array}{ccc}
v^2_{m1} & & \\
& \ddots & \\
& & v^2_{mI}\\
\end{array} \right) \ \
\mathbf{W}_g =
\left( \begin{array}{ccc}
v^2_{g1} & & \\
& \ddots & \\
& & v^2_{gI}\\
\end{array} \right)
\end{equation}
\\
where $v_{mi}$ and $v_{gi}$ are positive frequency weights.
The derivation is split into two parts; a magnitude and a group delay
solution. These two will later be combined to create a total
solution. \\
The mean square solution to the magnitude specification is the
impulse response $\mathbf{h}$, which minimises the
cost function, Eq. (\ref{eq:fd5}).
\begin{equation} \label{eq:fd5}
J = \sum_{i=1}^Iv^2_{mi} \vert H_{di}- H(\omega _i)
\vert ^2 = (\mathbf{H}_d -
\mathbf{H})^H \mathbf{W}_m(\mathbf{H}_d - \mathbf{H})
\end{equation}
Inserting Eq. (\ref{eq:fd2}) into Eq. (\ref{eq:fd5}), it can be
rewritten as,
\begin{equation} \label{eq:fd7}
J = Re \{ ( \mathbf{\Phi}^H\mathbf{h} -
\mathbf{H}_d)^H\mathbf{W}_m( \mathbf{\Phi}^H\mathbf{h}
-\mathbf{H}_d)\}.
\end{equation}
Eq. (\ref{eq:fd7}) yields
\begin{equation} \label{eq:fd8}
J=\mathbf{h}^T
\mathbf{R}\mathbf{h}-2\mathbf{h}^{T}\mathbf{P}+\mathbf{H}^H_d
\mathbf{W}_m\mathbf{H}_d,
\end{equation}
where
\begin{equation} \label{eq:fd9}
\mathbf{R}= Re \{ \mathbf{\Phi}\mathbf{W}_m\mathbf{\Phi}^H\}
\textrm{and} \quad \mathbf{P} = Re
\{\mathbf{\Phi}\mathbf{W}_m\mathbf{H}_d\} .
\end{equation}
Completing the square in Eq. (\ref{eq:fd8}), with $\mathbf{R}$ and
$\mathbf{P}$ given by Eq. (\ref{eq:fd9}), gives
\begin{equation} \label{eq:fd10sq}
J=(\mathbf{h}-\mathbf{R}^{-1}\mathbf{P})^T\mathbf{R}(\mathbf{h}-\mathbf{R}^{-1}\mathbf{P})-\mathbf{P}^T\mathbf{R}^{-1}\mathbf{P}+\mathbf{H}^H_d\mathbf{W}_m\mathbf{H}_d.
\end{equation}
The fact that $\mathbf{R}$ is a positive definite matrix yields the
solution,
\begin{equation} \label{eq:fd10}
\mathbf{h}_*=\mathbf{R}^{-1}\mathbf{P}
\end{equation}
where $\mathbf{h}_*$ is the impulse response that minimises the cost
function $J$.\\
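The minimiser can also be checked directly from Eq. (\ref{eq:fd8}) by setting the gradient with respect to $\mathbf{h}$ to zero. This is a standard least-squares step, sketched here using only the quantities defined above and the symmetry of $\mathbf{R}$:
\[
\nabla_{\mathbf{h}} J = 2\mathbf{R}\mathbf{h} - 2\mathbf{P} = \mathbf{0}
\ \Rightarrow\
\mathbf{h}_* = \mathbf{R}^{-1}\mathbf{P},
\qquad
J(\mathbf{h}_*) = \mathbf{H}_d^H\mathbf{W}_m\mathbf{H}_d - \mathbf{P}^T\mathbf{R}^{-1}\mathbf{P}.
\]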
An approximate expression for the group delay error in the passband
was derived in \cite{ChPa},
\begin{equation} \label{eq:td1}
e_\tau(\omega)\approx \sum_{n=0}^{N-1}(n-T_d)h(n)\cos(\omega(n-T_d)),
\end{equation}
where $T_d$ is the desired group delay.
Eq. (\ref{eq:td1}) can be rewritten as
\begin{equation} \label{eq:td2}
e_\tau(\omega)=\mathbf{\psi}^T(\omega)\mathbf{h},
\end{equation}
where
\begin{equation}
\mathbf{\psi}
(\omega) =
\left( \begin{array}{c}
(0-T_d)\cos(\omega(0-T_d)) \\
(1-T_d)\cos(\omega(1-T_d)) \\
\vdots \\
(N-1-T_d)\cos(\omega(N-1-T_d))
\end{array} \right)
\end{equation} \\
Let $\omega_k$, $k \in [1,\ldots ,K]$, be discrete frequency points in $[0,\pi]$.
If $\mathbf{\psi}_k =
\mathbf{\psi}(\omega_k)$ and $\mathbf{\Psi} = [\mathbf{\psi}_1
\ldots \mathbf{\psi}_K]$, a cost function that can be used
to minimise the group
delay error can be expressed as follows
\begin{equation} \label{eq:td3}
J_{T_d}=\mathbf{h}^T\mathbf{S}\mathbf{h},
\end{equation}
where
\begin{equation} \label{eq:td4}
\mathbf{S}=\mathbf{\Psi}\mathbf{W}_g\mathbf{\Psi}^T.
\end{equation}
The magnitude cost function, Eq. (\ref{eq:fd8}), and the group delay
cost function, Eq. (\ref{eq:td3}), are added to form a
common cost function. By minimising this cost
function,
$\mathbf{h}_{opt}$ is obtained. This solution is a compromise between the
magnitude
and the group delay requirements,
\begin{equation} \label{eq:td5}
J_{tot}=(\mathbf{h}-\mathbf{R}^{-1}\mathbf{P})^T\mathbf{R}(\mathbf{h}-\mathbf{R}^{-1}\mathbf{P})-\mathbf{P}^T\mathbf{R}^{-1}\mathbf{P}+\mathbf{H}^H_d\mathbf{W}_m\mathbf{H}_d+\mathbf{h}^T\mathbf{S}\mathbf{h}.
\end{equation}
Using completion of squares in Eq. (\ref{eq:td5}) yields,
\begin{equation} \label{eq:td6}
J_{tot}=(\mathbf{h}-(\mathbf{R+S})^{-1}\mathbf{P})^T(\mathbf{R+S})(\mathbf{h}-(\mathbf{R+S})^{-1}\mathbf{P})-\mathbf{P}^T(\mathbf{R+S})^{-1}\mathbf{P}+\mathbf{H}^H_d\mathbf{W}_m\mathbf{H}_d
\end{equation}
and the optimal solution is given as
\begin{equation} \label{eq:t7}
\mathbf{h}_{opt}=(\mathbf{R+S})^{-1}\mathbf{P}
\end{equation}
\\
Utilizing this solution, a filter with a specified group delay can be
obtained. The filter designer
only needs to determine a magnitude response ($\mathbf{H}_d$), a
filter length ($N$) and a group delay
(${T}_d$) and then calculate
$\mathbf{h}_{opt}$ from Eq. (\ref{eq:t7}).
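To illustrate how the two weighting matrices trade magnitude accuracy against group delay accuracy, one may scale the group delay weights by a scalar $\alpha \geq 0$. This parameter is hypothetical and is used here only as a sketch; it is not part of the design procedure described above:
\[
J_{\alpha} = \mathbf{h}^T\mathbf{R}\mathbf{h} - 2\mathbf{h}^T\mathbf{P} + \mathbf{H}_d^H\mathbf{W}_m\mathbf{H}_d + \alpha\,\mathbf{h}^T\mathbf{S}\mathbf{h},
\]
\[
\nabla_{\mathbf{h}} J_{\alpha} = 2(\mathbf{R}+\alpha\mathbf{S})\mathbf{h} - 2\mathbf{P} = \mathbf{0}
\ \Rightarrow\
\mathbf{h}(\alpha) = (\mathbf{R}+\alpha\mathbf{S})^{-1}\mathbf{P}.
\]
For $\alpha = 0$ this reduces to the magnitude-only solution $\mathbf{R}^{-1}\mathbf{P}$, for $\alpha=1$ it coincides with Eq. (\ref{eq:t7}), and larger values of $\alpha$ put more emphasis on the group delay error.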
\chapter{Echo Cancellation Employing Speech Signals}\label{sec7}
The echo cancellation problem is an identification problem and as
such it requires a persistently exciting \cite{sod} input
signal.
A straightforward approach is to use self-generated
noise as the input signal. This gives a very efficient
scheme, but it is desirable to be able to identify the channel directly from the
speech signal. The speech signal
is non-stationary but
can be characterized as short-time stationary.
The typical length of a
short-time frame is 20 ms. The speech signal \cite{del} is a
combination of
unvoiced noise-like
sounds (broadband), for example the /S/ sound; voiced
quasi-periodic sounds (narrowband),
for example the vowel /I/ sound; and long periods of silence.
The acoustic echo canceller must be adaptive. Therefore, an echo
canceller that can adapt effectively to the
speech signal itself must be developed. When the input signal is a
speech signal there will be frequency bands
that are not excited, so using an
NLMS algorithm directly will give
slow convergence. Some sort of speech detector is therefore
needed.
\section{Speech Detector}
We have found that a speech detector that works as a simple
signal energy detector with a threshold in each sub-band works
effectively. This simple detector
controls the adaptive algorithm individually in
each sub-band, see Fig. \ref{fig:AECv2}, and enhances the
convergence
rate. The weights are only updated in frequency bands where the
signal energy is above the threshold,
whilst in the others the old weights
remain. The approach is especially good when the
signal energy is concentrated in one or a few frequency bands or is
of highly varying spectral content, such as a speech signal.
When some of the sub-bands contain little signal energy, their
contribution to the overall solution is
detrimental because
the NLMS algorithm has too noisy a gradient estimate. Noisy gradient
estimates or poor signal excitation in certain
frequency bands are especially critical in fixed point implementations
\cite{gitl}.
This type of adaptive system will therefore utilize the speech signal
much better than a full-band echo
suppressor or a system with a full-band speech detector.
The signal detector used is very
rudimentary but effective. It works in each sub-band individually,
takes the absolute value of the signal $x$ used in
the NLMS and averages it. The obtained
value is compared to a threshold. If the value is below the threshold,
the
adaptation in that sub-band is switched
off, see Fig. \ref{fig:sdlms}. This simple signal detector worked
very well and made it possible to achieve good
suppression results (over 20 dB) on speech signals.
In the start-up phase it is advantageous to use
a broadband self-generated noise sequence to get a good initial
channel estimate.
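The detector described above can be written compactly as follows. This is only a sketch under stated assumptions: the exponential averaging with constant $\lambda$, the threshold $T_j$, the gate $g_j(k)$, the step size $\mu$ and the regularisation constant $\delta$ are notation and choices introduced here for illustration, since the text specifies only that the absolute value of the sub-band signal is averaged and compared with a threshold. With $x_j(k)$ the reference signal in sub-band $j$ and $\mathbf{x}_j(k)$ the corresponding regressor vector,
\[
p_j(k) = (1-\lambda)\,p_j(k-1) + \lambda\,|x_j(k)|,
\qquad
g_j(k) = \left\{ \begin{array}{ll}
1 & p_j(k) > T_j \\
0 & \textrm{otherwise}
\end{array} \right.
\]
and a gated NLMS update for the open loop case, with local error $e_j(k) = d_j(k) - \mathbf{w}_j^H(k)\mathbf{x}_j(k)$, becomes
\[
\mathbf{w}_j(k+1) = \mathbf{w}_j(k) + g_j(k)\,\frac{\mu}{\delta + \mathbf{x}_j^H(k)\mathbf{x}_j(k)}\,\mathbf{x}_j(k)\,e_j^*(k).
\]
When $g_j(k)=0$ the weights of sub-band $j$ are left unchanged, which corresponds to switching off the adaptation in that sub-band.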
\begin{figure}[htb]
\centerline{\includegraphics[width=10cm]{metsdlms}}
\caption{Delayless sub-band acoustic echo canceller with a signal
detector in each sub-band.}
\label{fig:AECv2}
\end{figure}
\begin{figure}[htb]
\centerline{\includegraphics[width=10cm]{sdlms.eps}}
\caption{The SDLMS configuration for sub-band $j$ in the open loop
case}
\label{fig:sdlms}
\end{figure}
\chapter{Simulation Examples}\label{sec4}
In this chapter results from the study of the acoustic echo
canceller are presented. In
this study acoustic signals gathered in a car as well as
computer-simulated signals have been employed. The adaptation
has been evaluated using band-limited flat noise and speech
signals. The simulations are performed using
a 512-tap full-band filter and a 16-sub-band echo canceller.
The suppression ratio, $SPR$, is defined as
\begin{equation}
SPR=10\log_{10} \left( \frac{\sum\limits_{n=1}^N\left(x[n]\right) ^2}
{\sum\limits_{n=1}^N\left( e[n]\right) ^2}\right) .
\end{equation}
Here $N=400$ samples, corresponding to about 33 ms at a 12 kHz sampling frequency.
We have assumed no near-end speech.
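For orientation, the following arithmetic relates the quantities above. The sampling frequency $f_s = 12$ kHz is taken from the abstract, the logarithm is assumed to be to base 10, and the symbols $E_x=\sum_{n=1}^N x[n]^2$ and $E_e=\sum_{n=1}^N e[n]^2$ are introduced here only for illustration:
\[
\frac{N}{f_s} = \frac{400}{12000\ \mathrm{Hz}} \approx 33\ \mathrm{ms},
\qquad
SPR = 30\ \mathrm{dB} \ \Leftrightarrow\ \frac{E_x}{E_e} = 10^{3},
\qquad
SPR = 20\ \mathrm{dB} \ \Leftrightarrow\ \frac{E_x}{E_e} = 10^{2}.
\]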
\section{Full-band vs. Sub-band}
The convergence rate for the DSAF (open loop
type scheme) shows an improvement when compared to the full-band
filter, see Fig.
\ref{fig:adapt}. The filter weight error norm
$||{\bf h}-{\bf w}||$ is used, where
${\bf h}$ is the true impulse response of a known channel and ${\bf
w}$ is the vector of
adaptive FIR filter weights.
In this comparison, noise with a frequency content in the range $[80,
5000]$ Hz has been used
as the input signal, which means that all bands with the exception of the
uppermost sub-band have been excited. The open loop type of
DSAF has a larger distance to the true weights than the full-band
scheme.
This is due
to the fact that it is blind to the actual error. The error signals
are created in each sub-band and they
overlap spectrally between the bands, and thus they do not give the
same solution. This can be improved by
using different choices of filter banks. In this paper we have not
made such an evaluation. In a practical situation
the echo suppression will be the same but the convergence rate will
be faster. Another advantage of the DSAF is
that the number of floating point operations is reduced by
approximately 30\% in the studied case.
\section{Closed Loop vs. Open Loop}
In this section suppression results for the open loop DSAF and the
closed loop DSAF are compared. The first
results presented are those which reflect the difference between the
open loop
and closed loop DSAF, see Fig.
\ref{fig:rvk3}. In order to clearly show these differences, a
measured room
acoustic impulse response was used in combination
with self generated noise. This means that the input signal and
output signals are perfectly known and the output
signal is a linearly filtered version of the input signal. We
therefore have a coherence factor of 1 for every
frequency. Ideally the output signal can be totally canceled. The
closed loop algorithm resulted in
a 70 dB suppression and the open loop in a 40 dB suppression. For a
real situation, i.e., signals which have been
gathered in a car before the loudspeaker and after the hands-free
microphone,
there will be less difference between the two methods, see Fig.
\ref{fig:rvk5}. In this situation there will be noise in the
microphone and
nonlinearities in the amplifiers and loudspeakers that will limit the
possible
suppression level. This is advantageous to the open loop DSAF. Both
methods achieve a 30 dB echo
suppression. The open loop method will converge faster than the
closed loop method, which will
give slightly better suppression once it has converged.
\section{Closed Loop DSAF Study}
In the sequel, the experiments concentrate on the closed loop DSAF
configuration. This type of scheme has shown
better suppression. The convergence rate is, however, initially a bit
slower than for the open loop. This is due
to the extra delay in the filter bank that is apparent in the error
signal (for the open loop situation the
error is formed locally in each sub-band). In this section some
results from experiments using alternative
prototype filters in the filter bank are presented. Results are also
presented for the modified scheme which
operates directly on the speech signal.
\subsection{The Filter Bank's Importance in the Overall Performance
of
the DSAF Scheme}
The main objectives in the prototype filter design are:
\begin{itemize}
\item a low passband ripple
\item a short group delay
\item a stopband suppression that is sufficient to avoid aliasing
\item a linear phase in passband
\end{itemize}
Figs. \ref{fig:gl1ps}--\ref{fig:gpl2b} show the importance of
choosing a prototype filter with an appropriate group delay. Fig.
\ref{fig:gl1ps} shows the difference in convergence rates for
prototype filters with the same amplitude function,
see Fig. \ref{fig:h3220}, but different group delays (self-generated
noise is
filtered through a measured channel). Fig. \ref{fig:gl1bps} shows
the same comparison as
Fig. \ref{fig:gl1ps}, but for real signals. When a prototype filter
with a reduced
group delay is used, the canceller responds more quickly to sudden
changes
in the channel. This is shown in
Figs. \ref{fig:gpl2ps}--\ref{fig:gpl2b}, both for self-generated
signals and real signals.
\\
The importance of having a linear phase in the passband of the
prototype filter
is shown in Figs. \ref{fig:olinsim}--\ref{fig:olinreal}. The
nonlinear phase prototype filter used
in this comparison between linear and nonlinear phase is a
minimum phase version of
the ordinary linear phase prototype filter. A minimum phase filter is
constructed in the following
way:
\begin{enumerate}
\item Design an FIR filter.
\item Determine the zeros of the filter in the z-domain.
\item {Take those zeros, $Z_k$, that are outside the unit
circle and
mirror them inside by the following
transformation, $Z'_k=\frac{1}{Z_k}.$}
\end{enumerate}
The new nonlinear phase filter will have the same magnitude response as the
linear phase FIR filter, see Fig. \ref{fig:h20olin}, but it will have all
its zeros inside the unit circle.
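That mirroring a zero leaves the magnitude response unchanged up to a constant gain can be seen from a single first-order factor. The sketch below uses the conjugate reciprocal $1/Z_k^*$; for a filter with real coefficients the zeros occur in conjugate pairs, so the resulting set of mirrored zeros is the same as with $1/Z_k$. On the unit circle,
\[
\left|1 - Z_k e^{-j\omega}\right| = |Z_k|\,\left|1 - \frac{1}{Z_k^*}e^{-j\omega}\right|,
\]
so replacing each zero outside the unit circle by its mirror image and scaling the filter by $|Z_k|$ preserves $|H(\omega)|$ while changing the phase.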
\subsection{Results from Employing the DSAF Scheme Directly on Speech
Signals}
In order to obtain a good suppression result (20-35 dB) using
speech
signals, the echo canceller has been fitted
with a signal detector in each sub-band. Fig. \ref{fig:aec31} shows
that when the signal detector is included, the
filter coefficients converge faster and the resulting echo
suppression is increased. When the detector in a sub-band switches the adaptation off
because of a low energy level, the adaptation is frozen and
the last set of established coefficients is used.
%Since, the detector
%switch off the adaptation in the subbands which contain low energy
%the last coefficients are saved.
The next time
there is a signal in that sub-band, a good start value is at
hand, see Fig. \ref{fig:aec33}. This learning feature
makes it possible to use a self-generated training sequence to
create a good initial estimate of the channel.
Fig. \ref{fig:aec32} shows the resulting suppression when the
echo canceller has been trained with a sequence of
white noise, and has thus been given good values to work from.
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{konvergenst.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(1.0,4.9){\makebox(0,0)[lb]{$||{\bf h}-{\bf w}||$ [dB]}}
\put(12.6,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Learning curves for a full-band normalized
LMS (dashed line) and a delayless adaptive acoustic
echo canceller (solid line).}
\label{fig:adapt}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{rvk3t.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression obtained in computer simulations using a measured
room acoustic impulse response:
the dashed line shows the open loop case and the solid line shows the
closed loop case.}
\label{fig:rvk3}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{rvk5t.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{The suppression results in a simulation with ``real signals'':
the dashed line shows the open loop
case and the solid line shows the closed loop case.}
\label{fig:rvk5}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{h642032.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Magnitude [dB]}}
\put(0.6,2.2){\makebox(0,0)[lb]{Phase [Degrees]}}
\put(11.9,-0.2){\makebox(0,0)[lb]{ $\frac{f}{F}$}}
\end{picture}
\caption{The magnitude and phase responses of two 64-tap prototype
filters, one with a group delay of 32
samples (solid line) and the other with a group delay of 20
samples (dashed line).}
\label{fig:h3220}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{tdssigt.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case, obtained in computer
simulations with a measured room acoustic impulse response:
the dashed line shows the result
with a group delay of 20 samples and the
solid line shows the result with a group delay of 32 samples. A
64-tap prototype filter was used in the simulations.}
\label{fig:gl1ps}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{tdrsigt.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case, obtained in computer
simulations on real signals:
the dashed line shows the result
with a group delay of 20 samples and the
solid line shows the result with a group delay of 32 samples. A
64-tap prototype filter was used in the simulations.}
\label{fig:gl1bps}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{dHssigt.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case when a sudden change
occurs in the channel, obtained in computer simulations with a
measured room acoustic impulse response: the dashed line shows the
result with a group delay of 20 samples and the solid line shows the
result with a group delay of 32 samples. A
64-tap prototype filter and self-generated bandlimited flat noise
were used in the simulations.}
\label{fig:gpl2ps}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{dHrsigt.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case when a sudden change
occurs in the channel, obtained in computer simulations with a
measured room acoustic impulse response: the dashed line shows the
result with a group delay of 20 samples and the solid line shows the
result with a group delay of 32 samples. A 64-tap
prototype filter and acoustic bandlimited flat noise were used in the
simulations.}
\label{fig:gpl2b}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{fasolt.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,5.0){\makebox(0,0)[lb]{Magnitude [dB]}}
\put(0.6,2.4){\makebox(0,0)[lb]{Phase [Degrees]}}
\put(11.5,0.0){\makebox(0,0)[lb]{ $\omega \ [rad/s]$}}
\end{picture}
\caption{The magnitude and phase responses of two 48-tap prototype
filters: one with nonlinear (minimum) phase (dashed line)
and the other with linear phase (solid line).}
\label{fig:h20olin}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{supOlt.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case, obtained in computer
simulations with a Chebyshev filter as the channel: the dashed
line shows the result with a nonlinear phase
prototype filter and the solid line shows the result with a linear
phase prototype filter. A 48-tap
prototype filter and self-generated bandlimited flat noise were used
in the simulations.}
\label{fig:olinsim}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{olinrealt.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case, obtained in computer
simulations on real signals: the dashed line shows the result
with a nonlinear phase prototype filter, and the solid line
shows the result with a linear phase prototype filter. A 64-tap
prototype filter was used in the simulations.}
\label{fig:olinreal}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{aec31t.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case, obtained in computer
simulations on real signals: the dashed line shows the result
when signal detectors were used, and the
solid line shows the result without signal detectors.}
\label{fig:aec31}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{aec33t.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case, obtained in computer
simulations on real signals. The result illustrates the learning
feature that the signal detector introduces: the echo canceller
performs much better on the second speech sequence than on the
first (the two sequences are different).}
\label{fig:aec33}
\end{figure}
\begin{figure}[htbp]
\centerline{\includegraphics[width=10cm,height=5cm]{aec32t.eps}}
\begin{picture}(15.8,0)(0.5,0)
\put(0.6,4.9){\makebox(0,0)[lb]{Suppression}}
\put(1.2,4.2){\makebox(0,0)[lb]{[dB]}}
\put(12.4,0.0){\makebox(0,0)[lb]{ $t\ [s]$}}
\end{picture}
\caption{Suppression in the closed loop case, obtained in computer
simulations on real signals with a calibrated
echo canceller; compare with the result
in Fig. \ref{fig:aec31}. Bandlimited flat noise was used to calibrate
the echo canceller.}
\label{fig:aec32}
\end{figure}
\chapter{Summary and Conclusions}\label{sec8}
In this paper a sub-band adaptive filter scheme, the
Delayless Sub-band Adaptive Filter (DSAF) scheme, has been studied.
Its main features are that the filtering is performed directly on the
full-band signal, thereby avoiding the delay associated with sub-band
filtering, and that the adaptive process is separated from the
filtering operation.
The scheme has been used as an adaptive echo canceller with
good results: it converges faster
and demands fewer operations than a full-band
scheme. The original scheme \cite{MThi95} has been
improved to get faster convergence and enhanced for adaptation
directly on speech signals, where it achieves
20--35 dB suppression. A method for designing filters with very
general specifications has been presented, and filters designed with
this method have improved the convergence
rate of the acoustic echo canceller.
The biggest improvement has been achieved by using a simple
speech detector in each sub-band.
\newpage
\begin{thebibliography}{99}
\bibitem{Wst85} B. Widrow, S. D. Stearns\\
{\em Adaptive Signal Processing}\\
Prentice Hall, 1985
\bibitem{Son92} M. M. Sondhi, W. Kellermann\\
``Adaptive Echo Cancellation for Speech Signals''\\
in {\em Advances in Speech Signal Processing}, New York: Marcel
Dekker, 1992, ch. 11
\bibitem{Mo95} D. R. Morgan\\
``Slow Asymptotic Convergence of LMS Acoustic Echo Cancelers''\\
{\em IEEE Trans. on Speech and Audio Processing}, vol. 3, no. 2, pp.
126--136, March 1995
\bibitem{Ono94} Y. Ono, H. Kiya\\
``Performance Analysis of Sub-band Adaptive Systems using an
Equivalent Model''\\
{\em IEEE Proc. ICASSP'94} (Adelaide, Australia), part III, pp. 53--56,
1994
\bibitem{MThi95} D. R. Morgan, J. C. Thi\\
``A Delayless Sub-band Adaptive Filter Architecture''\\
{\em IEEE Trans. on Signal Processing}, vol. 43, no. 8, pp.
1819--1830, Aug. 1995
\bibitem{Vai93} P. P. Vaidyanathan\\
{\em Multirate Systems and Filter Banks}\\
Prentice Hall, 1993
\bibitem{ChPa} X. Chen, T. W. Parks\\
``Design of FIR Filters in the Complex Domain''\\
in {\em IEEE Trans. on Acoustics, Speech, and Signal Processing}, vol.
ASSP-35, no. 2, Feb. 1987
\bibitem{gray} R. M. Gray\\
``On the Asymptotic Eigenvalue Distribution of Toeplitz Matrices''\\
in {\em IEEE Trans. on Information Theory}, vol. IT-18, pp. 725--730,
1972
\bibitem{sod} T. S\"{o}derstr\"{o}m, P. Stoica\\
{\em System Identification}\\
Prentice Hall International, 1989
\bibitem{gitl} R. D. Gitlin, H. C. Meadors, S. B. Weinstein\\
``The Tap-Leakage Algorithm: An Algorithm for the Stable Operation of
a Digitally Implemented, Fractionally Spaced
Adaptive Equalizer''\\ in {\em Bell Syst. Tech. J.}, vol. 61, no. 8,
Oct. 1982
\bibitem{del} J. R. Deller, J. G. Proakis, J. H. L. Hansen\\
{\em Discrete-Time Processing of Speech Signals}\\
Macmillan, 1993
\bibitem{icl92} I. Claesson, S. Nordholm, P. Eriksson\\
``Noise Cancelling Convergence Rates for the LMS Algorithm''\\
in {\em Mechanical Systems and Signal Processing}, vol. 5, pp.
375--388, 1991
\end{thebibliography}
%\setcounter{totalnumber}{4}
\end{document}