Bird Song Page

by Andrew Wong

April 1, 2003

Introduction:

In this project, I use a data set of the calls of a single Zebra Finch recorded over a period of 3 months (Tchernichovski et al., 2001). The bird is tutored by a model voice during this time. For details on the experimental setup and background, visit Simon Overduin's site. On this page I discuss one method of analyzing the evolution of the call as the bird matures.

1. Removing the tutored song.

2. Choosing a syllable of interest

3. Visualization techniques and organizing data

4. Qualitative assessment of syllable B evolution

5. Developing pitch detection algorithm

6. Mean pitch estimation and variance

7. Conclusion

8. Links

9. MATLAB code written:

10. References

 

 

1. Removing the tutored song.

Unfortunately, it was not clear whether the files in each folder were recordings of the tutor song or of the subject bird itself. To weed out the files that contained the tutor, the tutor song used for the experiment was obtained and its cross-correlation with each file was calculated. Files containing the tutor song yielded a very large maximum correlation peak (about 60-80), so the threshold was set at 30.
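The screening step described above can be sketched roughly as follows. This is a Python illustration, not the MATLAB code used for the project; the function names, the toy signals, and the threshold of 10 are all hypothetical. The correlation here is an unnormalized sliding dot product, so its scale differs from the 60-80 peaks quoted above, whose normalization is not stated.

```python
import math

def max_cross_correlation(recording, template):
    """Largest sliding dot product of `template` against `recording`."""
    n, m = len(recording), len(template)
    best = float("-inf")
    for start in range(n - m + 1):
        score = sum(recording[start + k] * template[k] for k in range(m))
        best = max(best, score)
    return best

def contains_tutor(recording, template, threshold):
    """Flag a recording whose correlation peak exceeds the chosen threshold."""
    return max_cross_correlation(recording, template) > threshold

# Toy data (assumption): a short sine burst stands in for the tutor song.
template = [math.sin(2 * math.pi * 5 * k / 50) for k in range(50)]
with_tutor = [0.0] * 30 + template + [0.0] * 30
without_tutor = [0.1 * math.sin(2 * math.pi * 0.23 * k) for k in range(110)]

print(contains_tutor(with_tutor, template, threshold=10))
print(contains_tutor(without_tutor, template, threshold=10))
```

In practice the threshold would be picked, as above, by looking at the peak values of files known to contain the tutor and files known not to.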

 

 

Careful analysis of the Oct 27 files showed that the files were true recordings of the subject bird and not the tutor song. In fact, the recordings of the tutor song were only present in 8 files dating from Aug. 11 – Aug. 16.

2. Choosing a syllable of interest

Bird calls were chosen from Oct. 26, Aug. 31, and Aug. 11, and their spectrograms were examined for regions of interest.

As the figure shows, the mature calls (Oct. 28 and Aug. 31) had an extra feature, denoted syllable “B”, in addition to the region “A” that is common to all three samples. Syllable B was chosen as the focus. Listen to a clip of the syllable.

The research that follows aimed to answer these questions:

a) When and how does region B appear in the bird’s call?

b) How does the sound of the syllable change as the bird matures?

c) Does the variability of the pitch change as the bird matures?


3. Visualization techniques and organizing data

The large number of files required an efficient method of gathering the regions of interest, in this case syllable “B”. A few essential MATLAB functions were written for this task: bsplay.m, which displays the spectrogram of the birdsong at a suitable frequency/time resolution; pback.m, which lets the user play back a selected region of the spectrogram at varying speeds; and getclip.m, a parsing function that outputs a separate .wav file for each syllable selected. In gathering the data, 5-15 instances of syllable “B” were extracted from each day. (See section 9, MATLAB code written.)
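As a rough illustration of the clip-extraction step, here is a minimal Python sketch. It is not getclip.m; the function names and the example regions are hypothetical. It cuts the selected time regions out of a recording and can write each one as a 16-bit mono .wav file.

```python
import struct
import wave

def extract_clips(samples, sample_rate, regions):
    """Return one list of samples per (start_sec, end_sec) region."""
    clips = []
    for start_sec, end_sec in regions:
        start = max(0, int(round(start_sec * sample_rate)))
        end = min(len(samples), int(round(end_sec * sample_rate)))
        clips.append(samples[start:end])
    return clips

def write_wav(path, samples, sample_rate):
    """Write floats in [-1, 1] as 16-bit mono PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)

# Toy example (assumption): one second of silence at 1000 samples/sec,
# with two hand-picked regions standing in for selected syllables.
samples = [0.0] * 1000
clips = extract_clips(samples, 1000, [(0.1, 0.3), (0.5, 0.9)])
print([len(clip) for clip in clips])
# To save the clips: write_wav("syllable_1.wav", clips[0], 1000), and so on.
```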

4. Qualitative assessment of syllable B evolution

From Aug. 8 to Aug. 11 there was no clear distinction between syllables A and B; all calls tended to be of type A.

Here is an example:

Hear clip.

In some cases, the call’s length was equivalent to that of A+B in the tutor song (approx. 0.5 sec), in which case an estimated region for syllable B was determined.


Hear this clip


From Aug. 10 to Aug. 11, however, the fine structure of A began to develop, and a trace signal similar in structure to B (denoted B’) appeared sporadically after A. B’ is separated from A by a greater time interval than B is from A in a mature call.

Here is another example of B’:


Hear clip.


Later calls on Aug. 11 exhibit this phenomenon again, with a slightly smaller gap between A and B’. Still, B’ occurs only sporadically.
The gap between A and B’ would seem an interesting quantity to measure; however, the data set is not entirely complete, and the next occurrence of B’ is on Aug. 20, where B’ immediately follows A twice in a row. By Aug. 22, all occurrences of A were followed by B’, which should from then on be denoted B.


5. Developing pitch detection algorithm

In an attempt to characterize syllable “B”, two pitch detection algorithms were written.

Autocorrelation of time signal (AT)
This pitch detector divided the input signal into small overlapping windows (width = 600 samples, overlap = 30 samples). For each window, the autocorrelation was calculated and the second-highest peak was identified. The lag between this peak and the center (zero-lag) peak determined the pitch frequency (pitch = sampling rate / lag). Because the lag is a whole number of samples, the pitch can only take discrete values; over the pitch range of syllable B (650-750Hz), the resulting frequency resolution was approx. 20Hz.
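The AT procedure can be sketched in Python as follows. This is only an illustration under stated assumptions, not pitchdetat.m: the function names are hypothetical, the 22050 Hz sampling rate is assumed (the page does not state it), and the test signal is a synthetic pure tone.

```python
import math

def autocorrelation(frame):
    """Unnormalized autocorrelation for lags 0 .. len(frame) - 1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(n)]

def pitch_from_frame(frame, sample_rate):
    """Pitch = sample_rate / lag of the highest peak away from zero lag."""
    acf = autocorrelation(frame)
    best_lag, best_val = None, float("-inf")
    for lag in range(1, len(acf) - 1):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > best_val:
            best_lag, best_val = lag, acf[lag]
    return sample_rate / best_lag if best_lag else None

def pitch_track(signal, sample_rate, width=600, overlap=30):
    """One pitch estimate per window; windows share `overlap` samples."""
    hop = width - overlap
    return [pitch_from_frame(signal[s:s + width], sample_rate)
            for s in range(0, len(signal) - width + 1, hop)]

# Synthetic check (assumption): a 735 Hz tone sampled at 22050 Hz,
# i.e. exactly 30 samples per period.
RATE = 22050
tone = [math.sin(2 * math.pi * 735 * k / RATE) for k in range(2000)]
print(pitch_track(tone, RATE))
```

Neighbouring integer lags near a pitch f give estimates spaced roughly f^2 / (sampling rate) apart. At the assumed 22050 Hz this is about 22Hz around 700Hz, which is in line with the resolution quoted above, but the sampling rate remains an assumption.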

 

Autocorrelation of spectrogram (AF)

An alternative approach attempted to extract the pitch directly from the spectrogram, in the hope of obtaining a more stable estimate. For each time bin in the spectrogram, the autocorrelation of the spectrum along the frequency axis was calculated. Unlike in the previous detector, the peak closest to the center peak was chosen, since unequal weighting in the frequency domain could easily cause integer multiples of the fundamental frequency to dominate. This procedure gave a stable estimate of pitch, but at the cost of frequency resolution (88Hz for a window size of 400 samples, and .01% overlap).
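For a single time bin, the AF idea can be sketched like this in Python. Again this is a hedged illustration rather than pitchdetaf.m: the function names and the synthetic spectrum are hypothetical. The magnitude spectrum is autocorrelated along the frequency axis, and the peak nearest zero lag gives the harmonic spacing in bins.

```python
def closest_peak_lag(values):
    """Smallest lag >= 1 at which `values` has a local maximum, or None."""
    for lag in range(1, len(values) - 1):
        if values[lag] > values[lag - 1] and values[lag] >= values[lag + 1]:
            return lag
    return None

def pitch_from_spectrum(magnitudes, bin_width_hz):
    """Estimate pitch as the harmonic spacing found by autocorrelating
    one spectrogram column along frequency."""
    n = len(magnitudes)
    acf = [sum(magnitudes[i] * magnitudes[i + lag] for i in range(n - lag))
           for lag in range(n)]
    lag = closest_peak_lag(acf)
    return lag * bin_width_hz if lag else None

# Synthetic column (assumption): harmonics every 8 bins, with the second
# harmonic stronger than the fundamental, and 88 Hz per frequency bin.
column = [0.0] * 64
column[8], column[16], column[24], column[32] = 0.5, 1.0, 0.7, 0.3
print(pitch_from_spectrum(column, 88.0))
```

Taking the nearest peak rather than the largest keeps a strong second or third harmonic from being mistaken for the fundamental. The estimate can only move in steps of one frequency bin, which is the resolution limit noted above. On real spectra, small noise peaks near zero lag may need to be suppressed first; this sketch does not attempt that.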

 

A sample clip of syllable B

Pitch calculated by AT and AF algorithms.

Examples of problems with AT:

In this example, the spectrogram of the signal is at the top. The AT pitch detector fails on the last part of the signal, marked in red. Small fluctuations in the windowed portion of the signal give the autocorrelation secondary peaks that are larger than the peak corresponding to the fundamental frequency. The AF pitch detector does not have this problem, but its estimate is completely flat in this region because of its 88Hz frequency resolution.

6. Mean pitch estimation and variance

The pitch of the final “wail” in each syllable B was estimated using the AT algorithm. The mean and variance of these pitch estimates were calculated for each day. Note: the variance of the pitch within a single sample could not be determined accurately with the algorithms developed; a more accurate pitch detector is needed for such an analysis.
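The per-day summary amounts to grouping the pitch estimates by recording day and taking the mean and variance of each group. A small Python sketch follows; the function name, the day labels, and the pitch values are made up for illustration, and the use of the sample variance (dividing by n - 1) is an assumption, since the page does not say which form was used.

```python
from statistics import mean, variance

def daily_pitch_stats(estimates_by_day):
    """Map each day to (mean pitch, sample variance of pitch)."""
    stats = {}
    for day, pitches in estimates_by_day.items():
        # The sample variance needs at least two values.
        spread = variance(pitches) if len(pitches) > 1 else 0.0
        stats[day] = (mean(pitches), spread)
    return stats

# Hypothetical pitch estimates in Hz, one list per recording day.
example = {
    "early day": [600.0, 650.0, 700.0],
    "late day": [720.0, 722.0, 724.0],
}
for day, (avg, var) in daily_pitch_stats(example).items():
    print(day, avg, var)
```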


7. Conclusion:

The mean pitch rose dramatically towards the target value of 722Hz in the first 20 days. By Sept. 8, the mean value began to stabilize around the target frequency. Similarly, the variance dropped dramatically, from 1200 to 100, in about one month. From Sept. 14 to Oct. 28, the mean value was much more stable than from Aug. 8 to Sept. 14. The rapid increase in pitch accuracy agrees with the findings of Tchernichovski et al. (2001).


8. Links:

Zebra Finch Song Archive
Includes recordings of fathers and sons in clutches. Listen to what happens when real birds learn from real birds!


9. MATLAB code written:

pback.m: plays back the selected region of the spectrogram

bsplay.m: displays spectrogram of birdsong

peakdet.m: finds locations of local maxima for any signal

pitchdetat.m: finds pitch of song using autocorrelation of time signal

pitchdetaf.m: finds pitch of song using autocorrelation of spectrogram (w/ Simon Overduin)

pitchrecon.m: reconstructs a song with the given pitch vector

puretone.m: simple code that outputs one second of a pure tone at a given frequency, handy for checking pitch estimation results

getclip.m: code that parses a spectrogram and outputs the regions as wav files

 

10. References:

Tchernichovski, O.; Mitra, P.P.; Lints, T.; Nottebohm, F. (2001). Dynamics of the vocal imitation process: How a zebra finch learns its song. Science 291: 2564-2569.