Convolution, correlation, and the Wiener-Hopf equations

Sebastian Seung

9.29 Lecture 2: February 7, 2002

In this lecture, we'll learn about two mathematical operations that are commonly used in signal processing, convolution and correlation. The convolution is used to linearly filter a signal, for example to smooth a spike train to estimate probability of firing. The correlation is used to characterize the statistical dependencies between two signals.

When analyzing neural data, the firing rate of a neuron is sometimes modeled as a linear filtering of the stimulus. Alternatively, the stimulus is modeled as a linear filtering of the spike train. To construct such a model, the optimal filter must be determined from the data. This problem was studied by the famous mathematician Norbert Wiener in the 1940s. It requires the solution of the famous Wiener-Hopf equations.

1  Convolution

Let's consider two time series, gi and hi, where the index i runs from -¥ to ¥. The convolution of these two time series is defined as
(g*h)i = ¥
å
j = -¥ 
gi-j hj
(1)
This definition is applicable to time series of infinite length. If g and h are finite, they can be extended to infinite length by adding zeros at both ends. After this trick, called zero padding, the definition in Eq. (1) becomes applicable. For example, the sum in Eq. (1) becomes
(g*h)i = n-1
å
j = 0 
gi-jhj
(2)
for the finite time series h0,¼,hn-1. Another trick for turning a finite time series into an infinite one is to repeat it over and over. This is sometimes called periodic boundary conditions, and will be encountered later in our study of Fourier analysis.

The convolution operation, like ordinary scalar multiplication, is both commutative g*h = h*g and associative f*(g*h) = (f*g)*h. Although g and h are treated symmetrically by the convolution, they generally have very different natures. Typically, one is a signal that goes on indefinitely in time. The other is concentrated near time zero, and is called a filter or convolution kernel. The output of the convolution is also a signal, a filtered version of the input signal.

In Eq. (2), we chose hi to be zero for all negative i. This is called a causal filter, because g*h is affected by h in the present and past, but not in the future. In some contexts, the causality constraint is not important, and one can take h-M,¼,hM to be nonzero, for example.

Formulas are nice and compact, but now let's draw some diagrams to see how this works. Let m and n be the dimensions of g and h respectively. For simplicity, assume zero-offset indexing, so that the first components of g and h are g0 and h0 (not g1 and h1 as in MATLAB). Then (g*h)0 is given by summing g-jhj over j, which can be visualized as

¼
gm-1
¼
g1
g0
0
0
0
0
¼
¼
0
¼
0
h0
h1
¼
hn-1
0
¼
Next, (g*h)1 is found by summing g1-jhj over j, which can be visualized as
¼
0
gm-1
¼
g1
g0
0
0
0
¼
¼
0
¼
0
h0
h1
¼
hn-1
0
¼
The rest of the components of g*h are generated by sliding the g vector to the right. The last nonzero component (g*h)m+n-2 can be visualized as
¼
0
0
¼
gm-1
¼
g1
g0
0
¼
¼
h0
h1
¼
hn-1
¼
0
0
0
¼
Therefore g*h has m+n-1 nonvanishing components, which is why the MATLAB function conv returns an m+n-1 dimensional vector.

2  Probability of firing

The spike train ri is a binary-valued time series. Since linear models are best suited for analog variables, it is helpful to replace ri with a probability pi of firing per time bin. Many methods for doing this can be expressed in the convolutional form
pi =
å
j 
ri-jwj
where w satisfies the constraint åj wj = 1. According to this formula, pi is the weighted average of ri and its neighbors, so that 0 £ pi £ 1.

There are many different ways to choose w, depending on the particulars of the application. For example, w could be chosen to be of length n, with nonzero values equal to 1/n. This is sometimes called a ``boxcar'' filter. MATLAB comes with a lot of other filter shapes. Try typing help bartlett, and you'll find more information about the Bartlett and other types of windows that are good for smoothing. Depending on the context, you might want a causal or a noncausal filter for estimating probability of firing.

Another option is to choose the kernel to be a decaying exponential,

wj = ì
í
î
0,
j < 0
g(1-g)j,
j ³ 0
This is causal, but has infinite duration. As an exercise, you could try proving that this is equivalent to
pi = (1-g) pi-1 + gri

The probability p of firing in a time bin is closely related to frequency n of firing by p = nDt, where Dt is the sampling interval. Probabilistic models of neural activity will be treated more formally in a later lecture.

3  Correlation

The correlation of two time series is
Corr
[g,h]j = ¥
å
i = -¥ 
gi hi+j
The case j = 0 corresponds to the correlation that was defined in the first lecture. The difference here is that g and h are correlated at times separated by the lag j. 1. As with the convolution, this definition can be applied to finite time series by using zero padding. Note that Corr
[g,h]j = Corr
[h,g]-j, so that the correlation operation is not commutative. Typically, the correlation is applied to two signals, while its output is concentrated near zero.

If g and h are n-dimensional vectors, then the MATLAB command xcorr(g,h) returns a 2n-1 dimensional vector, corresponding to the lags j = -n to n. Lags beyond this range are not included, as the correlation vanishes. The zero lag case looks like

¼
0
g1
g2
¼
gn
0
¼
¼
0
h1
h2
¼
hn
0
¼
and the other lags correspond to sliding h right or left. A maximum lag can also be given, xcorr(g,h,maxlag), restrict the range of lags computed to -maxlag to maxlag. The default is the unnormalized correlation given above, but there are other options too.

The autocorrelation is a special case of the correlation, with g = h. If g ¹ h, the correlation is sometimes called the crosscorrelation to distinguish it from the autocorrelation. In the first lecture, we distinguished between correlation and covariance. The covariance was defined as the correlation with the means subtracted out. Similarly, the cross-covariance can be defined as the correlation left between two time series after subtracting out the means. The auto-covariance is a special case. The command xcov can be used for this purpose.

4  Spike-triggered average

Demonstration of these ideas:

5  The Wiener-Hopf equations

Suppose that we'd like to model the time series yi as a filtered version of xi, i.e. find the h that optimizes the approximation
yi »
å
j 
hj xi-j
We assume that both x and y have had their means subtracted out, so that no additive constant is needed in the model. Also, hj is assumed to be zero for j < M1 or j > M2. This constrains how far forward or backward in time the kernel extends. For example, M1 = 0 corresponds to the case of a causal filter.

The best approximation in the least squares sense is obtained by minimizing the squared error

E = 1
2

å
i 
æ
è
yi - M2
å
j = M1 
hj xi-j ö
ø
2
 
relative to hj for j = M1 to M2. This is analogous to the squared error function for linear regression, which we saw in the first lecture.

The minimum is given by the equations, E/hk = 0, for k = M1 to M2. These are the famous Wiener-Hopf equations,

Cxyk = M2
å
j = M1 
hj Cxxk-j        k = M1,¼,M2
(3)
where the shorthand notation
Cxyk =
å
i 
xi yi+k       Cxxl =
å
i 
xi xi+l
has been used for the cross-covariance and auto-covariance. You'll be asked to prove this in the homework. This is a set of M2-M1+1 linear equations in M2-M1+1 unknowns, so it typically has a unique solution. For our purposes, it will be sufficient to solve them using the backslash (\) and toeplitz commands in MATLAB. If you're worried about minimizing computation time, there are more efficient methods, like Levinson-Durbin recursion.

Recall that in simple linear regression, the slope of the optimal line times the variance of x is equal to the covariance of x and y. This is a special case of the Wiener-Hopf equations. In particular, linear regression corresponds to the case M1 = M2 = 0, for which

h0 = Cxy0/Cxx0

6  White noise analysis

If the input x is Gaussian white noise, then the solution of the Wiener-Hopf equation is trivial, because Cxxk-j = Cxx0dkj. Therefore
hk = Cxyk
Cxx0
(4)
So a simple way to model a linear system is to stimulate it with white noise, and correlate the input with the output. This method is called reverse correlation or white noise analysis.

If the input x is not white noise, then you must actually do some work to solve the Wiener-Hopf equations. But if the input x is close to being white noise, you might get away with being lazy. Just choose the filter to be proportional to the xy cross-correlation, hk = Cxyk/g, as in the formula (4). The optimal choice of the normalization factor g is

g =

å
jl 
Cxyj Cxxj-l Cxyl


å
m 
Cxym Cxym
where the summations run from M1 to M2. Note this reduces to g = Cxx0 in the case of white noise, as in Eq. (4).


Footnotes:

1 Warning: This is the convention followed by Dayan and Abbott, and by MATLAB. Some other books, like Numerical Recipes, call the above sum Corr
[h,g]j


File translated from TEX by TTH, version 2.34.
On 13 Feb 2002, 00:00.