Hedonistic Synapses Final Project

Andrew Wong

May 23, 2003

 

Introduction:

 

This project attempts to provide a model for synaptic plasticity that is based on the principles of operant conditioning. Operant, or Skinnerian, conditioning states that a subject becomes more likely to perform an action if it is rewarded immediately after that action. In the case of neurons, the actions are the release of vesicles of neurotransmitter into the synaptic cleft, and a reward corresponds to some chemical signal such as dopamine. Vesicle release from a presynaptic neuron is a stochastic process: when an action potential arrives, there is some probability that the neuron releases vesicles and some probability that it fails to release. This probability depends on the calcium concentration in the synaptic cleft, as the uptake of calcium into the cell and its subsequent binding to a calcium receptor trigger the release of vesicles.

 

In order to model the memory of a presynaptic neuron, a mathematical function called the eligibility trace e(t) is introduced, which retains information about previous successes and failures to release a vesicle. The biological motivation for this function could partially be explained by differences in cell activity during calcium uptake, which make each presynaptic cell different from the others. The eligibility trace and the calcium concentration can be thought of as related. This function is updated as follows:

 

 

when there is a presynaptic spike and decays exponentially with time constant τe. This allows the presynaptic neuron to remember for a period of time the recent activity of vesicle release.
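The behavior of the eligibility trace described above can be sketched in a few lines. This is only an illustration under stated assumptions: the increment rule (add 1 - p on a release, subtract p on a failure) is taken from Seung's formulation rather than from this report, and the value of TAU_E and the function names are hypothetical.

```python
import math
import random

TAU_E = 20.0  # ms; hypothetical value, the report does not state tau_e
DT = 0.1      # ms; same step size the report uses in the temporal training part


def decay_trace(e, dt=DT, tau_e=TAU_E):
    """Exponential decay of the eligibility trace over one time step."""
    return e * math.exp(-dt / tau_e)


def update_trace_on_spike(e, p, rng=random):
    """At a presynaptic spike, draw the release variable r and update e.

    Assumed rule: e increases by (1 - p) on a release and decreases by p
    on a failure, so the expected change of e at a spike is zero.
    """
    r = 1 if rng.random() < p else 0
    e += (1.0 - p) if r else -p
    return e, r
```

With this rule a release at a low-probability synapse leaves a large positive trace, while a failure at a high-probability synapse leaves a large negative one, so unexpected events are remembered most strongly.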

 

 

In this model, vesicle release at each synapse is governed by a release probability p. To make this probability simpler to update, p was expressed in terms of a parameter q through the following relationship:

 

 

This yields a sigmoidal relationship as shown below:
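A minimal sketch of such a mapping, assuming the logistic form p = 1 / (1 + exp(-q)). The logistic form is an assumption consistent with the sigmoidal shape and with Seung's paper; the function name is hypothetical.

```python
import math


def release_probability(q):
    """Map the unbounded parameter q to a release probability in (0, 1).

    Assumed logistic form: p = 1 / (1 + exp(-q)).
    """
    return 1.0 / (1.0 + math.exp(-q))


# q = 0 corresponds to p = 0.5; large positive q pushes p toward 1,
# large negative q pushes p toward 0.
for q in (-4.0, 0.0, 4.0):
    print(q, release_probability(q))
```

The advantage of this parameterization is that q can be changed by any amount without p leaving the valid range.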

 

 

In order to change q, the following scheme was used:

 

Here η is the learning rate, and h(t) is the reward signal. In all cases in this project, the reward signal was either -1, 1, or 0, signifying punishment, reward, and no reward respectively.
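One way to write the update of q is sketched here. It assumes the rule dq/dt = eta * h(t) * e(t) from Seung's paper, integrated with a plain Euler step; the function name, the step size default, and the example values are hypothetical.

```python
def update_q(q, e, h, eta, dt=0.1):
    """One Euler step of the assumed learning rule dq/dt = eta * h(t) * e(t).

    q   : release parameter of one synapse
    e   : current eligibility trace of that synapse
    h   : reward signal, one of -1, 0, 1
    eta : learning rate
    dt  : time step in ms (assumed)
    """
    return q + eta * h * e * dt


# A synapse with a positive trace (recent release) is strengthened by
# reward and weakened by punishment; with h = 0 nothing changes.
q_rewarded = update_q(0.0, e=0.5, h=1, eta=0.1)
q_punished = update_q(0.0, e=0.5, h=-1, eta=0.1)
q_neutral = update_q(0.0, e=0.5, h=0, eta=0.1)
```

Because the change is proportional to the product of reward and trace, a synapse whose trace has decayed to zero is unaffected by the reward signal.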

 

Part I: Training a network to respond to visual stimuli

 

The first part consisted of duplicating results from Seung’s paper, i.e., training 900 synapses to respond to a horizontal bar rather than a vertical bar. The visual stimuli corresponding to:

 

were produced by dividing each image into 900 squares corresponding to 900 inputs into visual cells. Dark regions corresponded to 20 Hz Poisson spike trains and white regions corresponded to no stimulus. The release of synaptic vesicles at each synapse was determined by the release variable r, which was 1 with probability p and 0 with probability 1-p. Release of a vesicle produced an increase in the synaptic conductance of the postsynaptic cell, which in this case was the same output neuron for all 900 presynaptic cells. The output neuron was modeled with the simple integrate-and-fire model

with VL=-74 mV, gL=25 nS, and C = 500 pF. All integration was performed with an exponential Euler update.
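The exponential Euler update for such a neuron can be sketched as follows. The leak parameters come from the text; the synaptic reversal potential, spike threshold, and reset potential are not given in this report and are assumptions, as are all names.

```python
import math

V_L = -74.0       # mV, leak reversal potential (from the text)
G_L = 25.0        # nS, leak conductance (from the text)
C_M = 500.0       # pF, membrane capacitance (from the text)
E_SYN = 0.0       # mV, synaptic reversal potential (assumed)
V_THRESH = -54.0  # mV, spike threshold (assumed)
V_RESET = -60.0   # mV, reset potential after a spike (assumed)


def step_membrane(v, g_syn, dt=0.1):
    """Advance the membrane potential by dt ms with an exponential Euler step.

    The conductances are treated as constant during the step, so
    C dV/dt = -g_L (V - V_L) - g_syn (V - E_syn) is solved exactly:
    V relaxes toward v_inf with time constant tau (pF / nS = ms).
    Returns (new_v, spiked).
    """
    g_total = G_L + g_syn
    v_inf = (G_L * V_L + g_syn * E_SYN) / g_total
    tau = C_M / g_total
    v_new = v_inf + (v - v_inf) * math.exp(-dt / tau)
    if v_new >= V_THRESH:
        return V_RESET, True
    return v_new, False
```

With no synaptic input the membrane time constant is C / gL = 20 ms, and a neuron at the leak potential stays there. The exponential form stays stable for any step size, which a forward Euler step does not guarantee when the total conductance is large.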

 

 

 

This shows the response to the first 500 ms of the horizontal bar, followed by 500 ms of the vertical bar. In the beginning there is no preference for either stimulus.
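The stimulus protocol described above (900 inputs, dark squares driven by 20 Hz Poisson trains) could be generated along the following lines. The 30 x 30 layout follows from the 900 squares, but the bar width, its centered placement, the time step, and all names are illustrative assumptions.

```python
import random

GRID = 30        # 30 x 30 = 900 inputs
RATE_HZ = 20.0   # firing rate of inputs under dark squares (from the text)
DT_MS = 0.1      # time step in ms (assumed)


def bar_mask(orientation, width=6):
    """Return 900 booleans marking the dark squares of a centered bar.

    orientation "horizontal" darkens a band of rows; any other value
    darkens a band of columns. Width and placement are assumptions.
    """
    lo = (GRID - width) // 2
    hi = lo + width
    mask = []
    for row in range(GRID):
        for col in range(GRID):
            index = row if orientation == "horizontal" else col
            mask.append(lo <= index < hi)
    return mask


def poisson_spikes(mask, rng):
    """One time step of input spikes.

    Each dark square spikes with probability rate * dt; white squares
    never spike.
    """
    p_spike = RATE_HZ * DT_MS / 1000.0
    return [1 if dark and rng.random() < p_spike else 0 for dark in mask]


rng = random.Random(0)
horizontal = bar_mask("horizontal")
vertical = bar_mask("vertical")
steps_per_stimulus = int(500 / DT_MS)  # 500 ms of each bar, as in the text
first_step = poisson_spikes(horizontal, rng)
```

Note that the two bars overlap in the center of the image, so some inputs are active for both stimuli.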

 

 

 

By the 50th trial there is a large increase in firing rate for the horizontal bar and almost none for the vertical bar.


Simulation of calcium concentration

 

In order to make the variable p respond to calcium concentrations in the synaptic cleft, a second dynamical variable c was added with the following relationship:

 

c was incremented by a value crate for every presynaptic spike and then decayed with a time constant of 50 ms. Thus all values of p increased with each presynaptic spike regardless of the starting value of q.
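The calcium dynamics and their effect on p can be sketched as below. The 50 ms time constant is from the text; the additive form p = 1 / (1 + exp(-(q + c))) is an assumption taken from Seung's paper, and the value of C_RATE and the function names are hypothetical.

```python
import math

TAU_C = 50.0  # ms, calcium decay time constant (from the text)
C_RATE = 0.5  # increment per presynaptic spike; hypothetical value


def step_calcium(c, presynaptic_spike, dt=0.1):
    """Decay c over one time step, then add C_RATE if a spike arrived."""
    c *= math.exp(-dt / TAU_C)
    if presynaptic_spike:
        c += C_RATE
    return c


def release_probability(q, c):
    """Assumed form: calcium shifts the logistic curve, p = 1/(1+exp(-(q+c)))."""
    return 1.0 / (1.0 + math.exp(-(q + c)))


# Each presynaptic spike raises p for any starting q (facilitation).
c = step_calcium(0.0, presynaptic_spike=True)
p_before = release_probability(-2.0, 0.0)
p_after = release_probability(-2.0, c)
```

Because c enters additively inside the sigmoid, learning acts on the baseline q while calcium provides a transient boost on top of it.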

 

Here 20 input neurons with 50 ms spike trains were connected to a single output neuron. As the stimulus progressed, the calcium concentrations for all synapses increased, facilitating the connection to the output neuron and causing the ramping-up effect seen below. Finally, the output neuron reached threshold and spiked.

 

Below is the rising calcium level for each synapse:

 

 

Temporal training

 

Because the output neuron responds to the increasing calcium concentration with a time-delayed spike, it is possible to train a system to respond to a stimulus after a certain period of time.

Three methods were attempted to produce this effect, each having a different reward system. In all cases, input neurons were given a periodic stimulus train with period 50 ms and time to the first output spike was measured from the onset of stimulus.

 

Method 1

 

The chosen goal for a delay time was 5000 time steps where each time step was 0.1 ms. A reward was given for every output spike produced after this goal time and punishment for every output spike before this goal time. A trial consisted of 10000 time steps and always began with e(t) and c initialized to 0 for all synapses, thus simulating a suitable time interval between trials to let the system return to initial conditions.
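The reward schedule of Method 1 can be written down compactly. This is a sketch only: the treatment of a spike exactly at the goal step (counted as punishment here) is an assumption, and the function names are hypothetical.

```python
GOAL_STEP = 5000     # goal delay in time steps of 0.1 ms (from the text)
TRIAL_STEPS = 10000  # trial length in time steps (from the text)


def reward_for_spike(spike_step, goal_step=GOAL_STEP):
    """+1 for an output spike after the goal time, -1 otherwise."""
    return 1 if spike_step > goal_step else -1


def reward_signal(output_spike_steps, trial_steps=TRIAL_STEPS):
    """Build h(t) for one trial: nonzero only at output spike times."""
    h = [0] * trial_steps
    for step in output_spike_steps:
        h[step] = reward_for_spike(step)
    return h


# Example trial with two early output spikes and one late one.
h = reward_signal([1200, 4800, 6100])
```

In this scheme a single trial can deliver both punishments and rewards, since every output spike in the 10000 steps is scored.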

 

This method did not produce any noticeable learning over 100 trials. The reason for this is because the

 

Method 2

 

In this method, the chosen goal was again fixed at 5000 time steps, but this time the trial ended after the first output spike occurred. If the output spike occurred after the goal time, a reward was administered, and if before, a punishment. This prevented the system from unlearning behavior in the right direction and also saved computation time.
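The trial structure of Method 2 can be sketched as a loop that stops at the first output spike. The callback output_spiked stands in for one step of the network simulation and is hypothetical, as are the cap of max_steps on the trial length and the choice of no reward when no spike occurs.

```python
def run_trial_method2(output_spiked, goal_step=5000, max_steps=10000):
    """Run one trial until the first output spike.

    output_spiked(step) is a hypothetical callback that advances the
    network by one time step and returns True if the output neuron fired.
    Returns (first_spike_step, reward); reward is +1 if the spike came
    after the goal step and -1 otherwise.
    """
    for step in range(max_steps):
        if output_spiked(step):
            reward = 1 if step > goal_step else -1
            return step, reward
    return None, 0  # no spike within max_steps: no reward (assumed)


# Stand-in callbacks that fire at a fixed step, for illustration only.
late = run_trial_method2(lambda step: step == 6000)
early = run_trial_method2(lambda step: step == 100)
```

Each trial therefore delivers exactly one reward or punishment, tied to the single quantity being trained.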

 

Learning was seen as shown below:

 

 

Because the time of the first output spike is determined by a probabilistic mechanism, the first-spike times continued to fluctuate around 5000 time steps after reaching that value.

 

The training was performed again with different values of the calcium increment crate:

 

As well as different learning rates:


 

 

 

 


 

The results show that increasing the learning rate eta slightly improved learning performance, whereas increasing the calcium increment disrupted learning at every synapse, because the facilitation became too strong to allow a longer time delay.

 

 

Method 3

 

The final method kept track of the average time of the first output spike and rewarded the system whenever it performed better than that average.
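A sketch of this bookkeeping is given below. It assumes that "better" means a first spike later than the running average of all previous trials, that no reward (0) is given otherwise, and that the first trial is never rewarded; none of these details are stated in the report, and the class name is hypothetical.

```python
class RunningAverageReward:
    """Reward a trial when its first spike is later than the running mean."""

    def __init__(self):
        self.count = 0   # number of trials seen so far
        self.mean = 0.0  # mean first-spike step over those trials

    def reward(self, first_spike_step):
        """Return 1 if this trial beats the mean of earlier trials, else 0.

        The mean is updated incrementally after the comparison, so every
        past trial keeps equal weight in the average.
        """
        beat_average = self.count > 0 and first_spike_step > self.mean
        self.count += 1
        self.mean += (first_spike_step - self.mean) / self.count
        return 1 if beat_average else 0


tracker = RunningAverageReward()
rewards = [tracker.reward(t) for t in (1000, 2000, 1200)]
```

Because every trial is weighted equally, the average moves more slowly as the number of trials grows.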

 

 

 

 

This method performed very poorly in reaching the goal of 5000 time steps. The probable reason is that the average first-spike time was heavily weighted by the early spikes of the initial trials, which caused learning to die down even though the system was rewarded each time.

 

Effects of increasing the number of input neurons

 

Finally, the effect of increasing the number of input neurons was investigated. Because more neurons had to learn to cooperate, learning slowed as the number of input neurons increased.

 

 

 

References:

 

Seung, H. Sebastian. Learning in spiking neural networks by reinforcement of stochastic synaptic transmission [unpublished]

 

Code:

 

Method 1

Method 2

Method 3

Part I demo