More about convolution and correlation

Sebastian Seung

9.29 Lecture 3: February 12, 2002

1  Some odds and ends

Consider a spike train r1,¼,rN. One estimate of the probability of firing is
p = 1
N

å
i 
ri
(1)
This estimate is satisfactory, as long as it makes sense to describe the whole spike train by a single probability that does not vary with time. This is an assumption of statistical stationarity.

More commonly, it's a better model to assume that the probability varies slowly with time (is nonstationary). Then it's better to apply something like Eq. (1) to small segments of the spike train, rather than to the whole spike train. For example, the formula

pi = (ri+1+ri+ri-1)/3
(2)
estimates the probability at time i by counting the number of spikes in three time bins, and then dividing by three. In the first problem set, you were instructed to smooth the spike train like this, but to use a much wider window. In general, choosing the size of window involves a tradeoff. A larger window minimizes the effects of statistical sampling error (like flipping a coin many times to more accurately determine its probability of coming up heads). But a larger window also reduces the ability to follow more rapid changes in the probability as a function of time.

Note that the formula (2) isn't to be trusted near the edges of the signal, as the filter operates on the zeros that surround the signal.

In the last lecture, we defined the unnormalized correlation. There is also a normalized version that looks like

Qxyj = 1
m
m
å
i = 1 
xi yi+j
To compensate for boundary effects, the form
Qxyj = 1
m-|j|
m
å
i = 1 
xi yi+j
is sometimes preferred. Both forms can be obtained through the appropriate options to the xcorr command.

A signal is called white noise if the correlation vanishes, except at lag zero.

2  Using the conv function

We learned last time that if g0,g1,¼,gM-1 and h0,h1¼,hN-1 are given as arguments to the conv function, then the output is f0,f1,¼,fM+N-2, where we denote f = g *h.

Let's generalize this: if gM1,¼,gM2 and hN1,¼,hN2 are given as arguments to the conv function, then the output is fM1+N1,¼,fM2+N2.

For example, suppose that g is a signal, and h represents an acausal filter, with N1 < 0 and N2 > 0. Throwing out the first |N1| and last N2 elements of f leaves us with fM1,¼,fM2, which are at the same times as the signal g.

Note that this prescription for discarding the elements is intended for time aligning the result of the convolution with the input signal, and for producing a result that is the same length.

A different motivation for discarding elements at the beginning and end is that they may be corrupted by edge effects. If you are really worried about this, you may have to discard more than was prescribed above.

3  Impulse response

Consider the signal consisting of a single impulse at time zero,
dj = ì
í
î
1,
j = 0
0,
j ¹ 0
The convolution of this signal with a filter h is
(d*h)i =
å
k 
dj-khk = hj
which is just the filter h again. In other words h, is the response of the filter to an impulse, or the impulse response function. If the impulse is displaced from time 0 to time i, then the result of the convolution is the filter h, displaced by i time steps.

A spike train is just a superposition of impulses at different times. Therefore, convolving a spike train with a filter gives a superposition of filters at different times.

The ``Kronecker delta'' notation dij is equivalent to di-j.

4  Matrix form of convolution

The convolution of g0,g1,g2 and h0,h1,h2 can be written as

g *h = G h
where the matrix G is defined by
G = æ
ç
ç
ç
ç
ç
ç
è
g0
0
0
g1
g0
0
g2
g1
g0
0
g2
g1
0
0
g2
ö
÷
÷
÷
÷
÷
÷
ø
(3)
and g*h and h are treated as column vectors. Each column of G is the same time series, but shifted by a different amount. You can use the MATLAB function convmtx to create matrices like G from time series like g. This function is found in the Signal Processing Toolbox.

If you don't have this toolbox installed, you can make use of the fact that Eq. (3) is a Toeplitz matrix, and can be constructed by giving its first column and first row to the toeplitz command in MATLAB.

5  Convolution as multiplication of polynomials

If the second degree polynomials g0 + g1 z + g2 z2 and h0 +h1 z + h2 z2 are multiplied together, the result is a fourth degree polynomial. Let's call this polynomial f0 + f1 z + f2 z2+ f3 z3 + f4 z4. This is equivalent to f = g*h.

6  Discrete versus continuous time

In the previous lecture, the convolution, correlation, and the Wiener-Hopf equations were defined for data sampled at discrete time points. In the remainder of this lecture, the parallel definitions will be given for continuous time.

Before the advent of the digital computer, the continuous time formulation was more important, because of its convenience for symbolic calculations. But for numerical analysis of experimental data, it is the discrete time formulation that is essential.

7  Convolution

Consider two functions g and h defined on the real line. Their convolution g*h is defined as
(g*h)(t) = ó
õ
¥

-¥ 
dt¢g(t-t¢)h(t¢)
The continuous variables t and t¢ have taken the place of the discrete indices i and j. Again, you should verify commutativity and associativity.

If g and h are only defined on finite intervals, they can be extended to the entire real line using the zero padding trick. For example, if h vanishes outside the interval [0,T], then

(g*h)(t) = ó
õ
T

0 
dt¢g(t-t¢)h(t¢)

8  Firing rate

To define the continuous-time representation of a spike train, we need to make use of a mathematical construct called the Dirac delta function. The delta function is zero everywhere, except at the origin, where it is infinite. You can imagine it as a box of width Dt and height 1/Dt centered around the origin, with the limit Dt ® 0. The delta function is defined by the identity
h(t) = ó
õ
¥

-¥ 
dt¢d(t-t¢) h(t¢)
In other words, when the delta function is convolved with a function, the result is the same function, or h = d*h. A special case of this formula is the normalization condition
1 = ó
õ
¥

-¥ 
dt¢d(t-t¢)
Note that the delta function has dimensions of inverse time.

The delta function represents a single spike at the origin. A spike train with spikes at times ta can be written as a sum of delta functions,

r(t) =
å
a 
d(t-ta)

A standard way of estimating firing rate from a spike train is to convolve it with a response function w

n(t)
=
ó
õ
dt w(t-t¢)r(t¢)
(4)
=
ó
õ
dt w(t-t¢)
å
a 
d(t¢-ta)
(5)
=

å
a 
ó
õ
dt w(t-t¢) d(t¢-ta)
(6)
=

å
a 
w(t-ta)
(7)
So the convolution simply adds up copies of the response function centered around the spike times. Note that it's important to choose a kernel satisfying
ó
õ
dt w(t) = 1
so that
ó
õ
dt n(t) = ó
õ
dt r(t)
Since the Dirac delta function has dimensions of inverse time, smoothing r(t) results in an estimate of firing rate. In contrast, the discrete spike train ri is dimensionless, so smoothing it results in an estimate of probability of firing. You can think of r(t) as the Dt® 0 limit of ri/Dt.

9  Low-pass filter

To see the convolution in action, consider the differential equation
t dx
dt
+ x = h
This is an equation for a low-pass filter with time constant t. Given a signal h, the output of the filter is a signal x that is smoothed over the time scale t. The solution can be written as the convolution x = g*h, where the ``impulse response function'' g is defined as
g(t) = 1
t
e-t/tq(t)
and we have defined the Heaviside step function q(t), which is zero for all negative time and one for all positive time. The response function g is zero for all negative time, jumps to a nonzero value at time zero, and then decays exponentially for positive time.

To construct the function x, the convolution places a copy of the response function g(t-t¢) at every time t¢. Each copy gets weighted by h(t¢), and they are all summed to obtain x(t). The response function is sometimes called the kernel of the convolution.

To see another application of the delta function, note that the impulse response function for the low-pass filter satisfies the differential equation

t dg
dt
+ g = d(t)
In other words, g is the response to driving the low-pass filter with an ``impulse'' d(t), which is why it's called the impulse response.

10  Correlation

The correlation is defined as
Corr
[g,h](t) = ó
õ
¥

-¥ 
dt¢g(t¢)h(t+t¢)
This compares g and h at times separated by the lag t.1 Note that Corr
[g,h](t) = Corr
[h,g](-t), so that the correlation operation is not commutative.

As before, if g and h are only defined on the interval [0,T], they can be extended by defining them to be zero outside the interval. Then the above definition is equivalent to

Corr
[g,h](t) = ó
õ
T

0 
dt¢g(t¢)h(t+t¢)
This is the unnormalized version of the correlation. In the Dayan and Abbott textbook, Qgh(t) = (1/T) Corr
[g,h](t), which is the normalized correlation.

11  The spike-triggered average

Dayan and Abbott define the spike-triggered average of the stimulus as the average value of the stimulus at time t before a spike,
C(t) = 1
n

å
a 
s(ta-t)
where n is the number of spikes. Then in Figure 1.9 they plot C(t) with the positive t axis pointing left. This sign convention may be standard, but it is certainly confusing.

Exactly the same graph would be produced by the alternative convention of taking C(t) to be the average value of the stimulus at time t after a spike, and plotting it with the positive t axis pointing right. Note that in this convention, C(t) would have the same shape as the cross-correlation of r and s,

Corr
[r,s](t)
=
ó
õ
dt r(t)s(t+t)
(8)
=
ó
õ
dt
å
a 
d(t-ta) s(t+t)
(9)
=

å
a 
s(ta+t)
(10)

12  Visual images

So far we've discussed situations where the neural response encodes a single time-varying scalar variable. In the case of visual images, the stimulus is a function of space as well as time. This means that a more complex linear model is necessary for modeling the relationship between stimulus and response. Let the stimulus be denoted by siab, where the indices a and b specify pixel location in the two-dimensional image. Construct xiab = siab-ásabñ by subtracting out the pixel means. Similarly, let yi denote the neural response with the mean subtracted out. Then consider the linear model
yi »
å
jab 
hjab xi-jab
We won't derive the Wiener-Hopf equations for this case, as the indices get messy.

But for white noise the optimal filter is given by the cross correlation

hjab µ
å
i 
xiabyi+j

definition of white noise


Footnotes:

1 The expression above is the definition used in the Dayan and Abbott book, but take note that the opposite convention is used in other books like Numerical Recipes, which call the above integral Corr
[h,g](t).


File translated from TEX by TTH, version 2.34.
On 13 Feb 2002, 00:01.