Bird Song Page
by Andrew Wong
April 1, 2003
Introduction:
In this project, I use a data set of the calls of a single Zebra Finch recorded over a period of 3 months (Tchernichovski et al., 2001). The bird is tutored by a model voice during this time. For details on the experimental setup and background, visit Simon Overduin's site. On this page I discuss one method of analyzing the evolution of the call as the bird matures.
2. Choosing a syllable of interest
3. Visualization techniques and organizing data
4. Qualitative assessment of syllable B evolution
5. Developing pitch detection algorithm
6. Mean pitch estimation and variance
Unfortunately, it was unclear whether the recordings in each folder were of the tutor song or of the subject bird. To weed out the files that contained the tutor, the tutor song used for the experiment was obtained and its correlation with each file was calculated. Files containing the tutor song yielded a very large maximum correlation peak (about 60-80), so the threshold was set at 30.
Careful analysis of the Oct. 27 files showed that they were true recordings of the subject bird and not the tutor song. In fact, recordings of the tutor song were present in only 8 files, dating from Aug. 11 to Aug. 16.
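The screening step described above can be sketched in Python. This is not the author's MATLAB code: the function names are hypothetical, the correlation is taken to be the raw (unnormalized) cross-correlation maximized over all lags, and the default threshold of 30 only makes sense for signals scaled the way the original recordings were, which is not stated here.

```python
def max_cross_correlation(x, y):
    """Largest value of the raw cross-correlation of x and y over all lags."""
    n, m = len(x), len(y)
    best = float("-inf")
    for lag in range(-(m - 1), n):
        # Overlap region: x[i] is paired with y[i - lag].
        start = max(0, lag)
        stop = min(n, lag + m)
        total = 0.0
        for i in range(start, stop):
            total += x[i] * y[i - lag]
        best = max(best, total)
    return best


def contains_tutor(recording, tutor, threshold=30.0):
    """Flag a recording whose correlation peak with the tutor exceeds the threshold.

    The threshold value is an assumption tied to the signal scaling.
    """
    return max_cross_correlation(recording, tutor) > threshold
```

A file is treated as a tutor recording when `contains_tutor` returns True. For full-length recordings a direct double loop like this is slow, so an FFT-based correlation would be the practical choice.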
2. Choosing a syllable of interest
Bird calls from Oct. 26, Aug. 31, and Aug. 11 were chosen, and their spectrograms were analyzed for regions of interest.
As the figure shows, mature calls (Oct. 28 and Aug. 31) had an extra feature, denoted as syllable “B”, in addition to the region “A” that is common to all three samples. Syllable B was chosen as the focus. Listen to the syllable here: clip.
The research that follows aimed to answer these questions:
a) When and how does region B appear in the bird’s call?
b) How does the sound of the syllable change as the bird matures?
c) Is the variability of the pitch different as the bird matures?
3. Visualization techniques and organizing data
The large number of files required an efficient method of gathering the regions of interest, in this case syllable “B”. A few essential MATLAB functions were written for this task: bsplay.m, which displays the spectrogram of the birdsong at a suitable frequency/time resolution; pback.m, which lets the user play back the selected region of the spectrogram at varying speeds; and getclip.m, a parsing function that outputs a separate .wav file for each syllable selected. In gathering the data, 5-15 examples of syllable “B” were extracted from each day. (See the list of MATLAB files at the end of this page.)
4. Qualitative assessment of syllable B evolution
From Aug. 8 to Aug. 11 there was no clear distinction between syllables A and B; all calls tended to be of type A.
Here is an example:
Hear clip.
In some cases, the call’s length was equivalent to A+B in the tutor song (approx. 0.5 sec), in which case an estimated region of syllable B was determined.
From Aug. 10 to Aug. 11, however, the fine structure of A began to develop, and
a trace signal after A (denoted B’), similar in structure to B, would appear
sporadically. B’ is separated from A by a greater time interval than B is
from A in a mature call.
Here is another example of B'
Later calls on Aug. 11 exhibit this phenomenon again, with a slightly smaller
gap between A and B’. Still, B’ occurs only sporadically. The gap between A
and B’ would seem an interesting quantity to measure; however, the data set
is not entirely complete, and the next occurrence of B’ is on Aug. 20, where
B’ immediately follows A twice in a row. By Aug. 22, all occurrences of A
were followed by B’, which should from then on be denoted B.
5. Developing pitch detection algorithm
In an attempt to characterize syllable “B”, two pitch detection algorithms were written.
Autocorrelation of time signal (AT)
This pitch detector divided the input signal into small overlapping windows
(width = 600 samples, overlap = 30 samples). For each window, the autocorrelation
was calculated and the second-highest peak (the highest peak other than the
center, zero-lag peak) was identified. The lag difference between this peak and
the center peak determined the pitch frequency (pitch = sampling rate / lag).
Given the pitch estimates for syllable B (650-750 Hz), the frequency resolution
was approx. 20 Hz.
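A minimal Python sketch of the AT idea follows. It is an illustration only, not a port of pitchdetat.m: the function names, the `max_lag` search limit, and the local-maximum test are assumptions, and the sampling rate of the recordings is not given on this page. Because the lag is an integer number of samples, neighboring pitch estimates near a frequency f are spaced roughly f² / (sampling rate) apart, which is why the resolution of this method is coarse.

```python
import math


def pure_tone(freq_hz, sample_rate, n_samples):
    """Test signal, in the spirit of puretone.m (hypothetical helper)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def autocorrelation(frame, max_lag):
    """Raw autocorrelation of one window for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def pitch_from_frame(frame, sample_rate, max_lag):
    """Pitch from the highest autocorrelation peak away from zero lag."""
    r = autocorrelation(frame, max_lag)
    best_lag = None
    for k in range(1, max_lag):
        is_peak = r[k] > r[k - 1] and r[k] >= r[k + 1]
        if is_peak and (best_lag is None or r[k] > r[best_lag]):
            best_lag = k
    if best_lag is None:
        return None  # no peak found in the searched lag range
    return sample_rate / best_lag


def pitch_track_at(signal, sample_rate, width=600, overlap=30, max_lag=300):
    """One pitch estimate per window; width and overlap follow the text."""
    hop = width - overlap
    return [pitch_from_frame(signal[s:s + width], sample_rate, max_lag)
            for s in range(0, len(signal) - width + 1, hop)]
```

A quick check is to run `pitch_track_at(pure_tone(f, fs, n), fs)` for a known frequency `f` and an assumed sampling rate `fs`, and compare the estimates with `f`.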
Autocorrelation of spectrogram (AF)
An alternative approach attempted to extract the pitch from the spectrogram directly, in the expectation that this would give a more stable estimate. For each time bin in the spectrogram, the autocorrelation of the signal in frequency space was calculated. Unlike in the previous detector, the peak closest to the center peak was chosen, since unequal weighting in the frequency domain could easily cause integer multiples of the fundamental frequency to dominate. This procedure resulted in a stable estimate of pitch, but the tradeoff was the frequency resolution (88 Hz for a window size of 400 samples and 0.01% overlap).
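The AF idea can be sketched for a single time bin as follows. This is a hedged Python illustration, not pitchdetaf.m: the function names are hypothetical, the spectrum comes from a plain DFT with no window function, and the `min_rel_height` guard (which ignores tiny noise-level bumps when looking for the peak nearest the center) is an added assumption. The pitch estimate is the lag, in frequency bins, times the bin width, so the resolution is one bin.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one window, bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * i / n)
                    for i, x in enumerate(frame)))
            for b in range(n // 2 + 1)]


def pitch_from_spectrum(mag, bin_width_hz, min_rel_height=0.3):
    """Pitch from the autocorrelation peak closest to zero lag.

    The autocorrelation runs across frequency bins, so harmonics spaced
    f0 apart produce peaks at lags that are multiples of f0.
    """
    n = len(mag)
    r = [sum(mag[b] * mag[b + k] for b in range(n - k)) for k in range(n)]
    for k in range(1, n - 1):
        is_peak = r[k] > r[k - 1] and r[k] >= r[k + 1]
        if is_peak and r[k] >= min_rel_height * r[0]:
            return k * bin_width_hz
    return None  # no sufficiently strong peak found
```

For a window of N samples at sampling rate fs, `bin_width_hz` is fs / N. Choosing the nearest peak rather than the highest one means a strong second or third harmonic does not pull the estimate to a multiple of the fundamental.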
A sample clip of syllable B
Pitch calculated by AT and AF algorithms.
Examples of problems with AT:
In this example, the spectrogram of the signal is at the top. The AT pitch detector fails on the last part of the signal, marked in red. The failure is caused by small fluctuations in the windowed portion of the signal, which give the autocorrelation secondary peaks that are greater than the peak of the fundamental frequency. The AF pitch detector does not have this problem, but it is completely flat in this region because of its 88 Hz frequency resolution.
6. Mean pitch estimation and variance
The pitch of the final “wail” in each syllable B was estimated
using the AT algorithm. The mean and variance of this pitch estimate were calculated
for each day. Note: the variance of the pitch within one sample could not be
determined accurately with the algorithms developed; a more accurate pitch detector
is needed to perform such an analysis.
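The per-day summary described above amounts to grouping the pitch estimates by recording day and taking the mean and variance of each group. A small Python sketch, with a hypothetical function name and input layout, and assuming the sample variance (n - 1 denominator), since the page does not say which variance was used:

```python
from statistics import mean, variance


def daily_pitch_stats(estimates):
    """Group (day_label, pitch_hz) pairs by day; return {day: (mean, variance)}.

    Uses the sample variance (an assumption); a day with a single
    estimate gets a variance of 0.0.
    """
    by_day = {}
    for day, pitch in estimates:
        by_day.setdefault(day, []).append(pitch)
    return {day: (mean(p), variance(p) if len(p) > 1 else 0.0)
            for day, p in by_day.items()}
```

Each input pair would hold one syllable B clip's day label and its AT pitch estimate for the final part of the syllable.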
The mean pitch increased dramatically toward the target value of 722 Hz in the first 20 days. By Sept. 8, the mean value began to stabilize around the target frequency. Correspondingly, the variance decreased dramatically, from 1200 to 100 in about one month. From Sept. 14 to Oct. 28, the mean value was much more stable than from Aug. 8 to Sept. 14. The rapid increase in pitch accuracy agrees with the findings of Tchernichovski et al. (2001).
Zebra Finch Song Archive
Includes recordings of father and sons in clutches. Listen to what happens when real birds learn from real birds!
pback.m: plays back the selected region of the spectrogram
bsplay.m: displays spectrogram of birdsong
peakdet.m: finds locations of local maxima for any signal
pitchdetat.m: finds pitch of song using autocorrelation of time signal
pitchdetaf.m: finds pitch of song using autocorrelation of spectrogram (w/ Simon Overduin)
pitchrecon.m: reconstructs a song with the given pitch vector
puretone.m: simple code that outputs one second of a pure tone at a given frequency, handy for checking pitch estimation results
getclip.m: code that parses a spectrogram and outputs the regions as wav files
Tchernichovski, O.; Mitra, P.P.; Lints, T.; Nottebohm, F. (2001). Dynamics of the vocal imitation process: How a zebra finch learns its song. Science 291: 2564-2569.