9.031 Neural Basis of Learning and Memory: Lecture 9
Rescorla-Wagner reinforcement learning theory
Basic ideas
CS and US must occur at the same time
Delta V = (level of US processing) x (level of CS processing)
Level of US processing -> Reinforcement
Level of CS processing -> Eligibility for change (e.g. attention, salience, rehearsal)
Overshadowing
Pairing a compound of two CSs, A and B (e.g. light and tone), with a US leads to a stronger association with the US for one CS than for the other. That is, one of the CSs will overshadow the other.
train
AB-US
test
A -> weak CR
B -> strong CR
Note that this is not a simple consequence of differential stimulus associability (see association bias), since either A or B, paired alone with the US, can lead to a strong association:
A-US: A -> strong CR
or
B-US: B -> strong CR
As A and B become associated with the US, the US becomes less unexpected. Once one of the CSs begins to predict the US, less US processing is available for the other CS.
Blocking
train
A-US
AB-US
test
A->strong CR
B->weak CR
Learning occurs whenever events violate expectations, i.e. when the actual US level (lambda) differs from the expected level (Vbar)
Vbar is the prediction based on total association strength of CSs
If US is present lambda = US value
If US is absent lambda = 0
With training Vbar will approach lambda
Delta V = reinforcement(+,-) x eligibility(+,0)
Delta V is proportional to (lambda - Vbar)
Delta V = beta (lambda - Vbar)
With (lambda - Vbar) as the level of reinforcement and beta a coefficient of learning
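The approach of Vbar to lambda can be checked numerically. The following is a minimal sketch for a single CS that is present on every trial; the values of beta and lambda and the number of trials are illustrative assumptions, not values from the lecture.

```python
lam = 1.0     # US present on every trial (assumed US value)
beta = 0.3    # assumed learning coefficient
v_bar = 0.0   # no prediction before training

history = []
for trial in range(20):
    delta_v = beta * (lam - v_bar)   # Delta V = beta (lambda - Vbar)
    v_bar += delta_v
    history.append(v_bar)

# The prediction error shrinks on every trial, so Vbar climbs toward lambda.
print([round(v, 3) for v in history[:5]], round(history[-1], 3))
```

Each step removes a fixed fraction beta of the remaining error, so the learning curve is negatively accelerated: large changes early, small changes late.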
With a dependence on the presence of the CS
Each CS defined as Xi
If CS is present Xi = 1
If CS is absent Xi = 0
With alphai a coefficient describing the salience or strength of the CS
Alphai Xi is the eligibility
DeltaVi = beta (lambda - Vbar) alphai Xi
With Vbar = SUM(Vi Xi)
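As a rough illustration of the full rule, the sketch below applies it to overshadowing. The function name rw_trial, the salience values and the trial counts are assumptions made for this example only.

```python
def rw_trial(V, present, lam, alpha, beta=0.2):
    """One trial of DeltaVi = beta (lambda - Vbar) alphai Xi, applied in place."""
    v_bar = sum(V[cs] for cs in present)      # Vbar = SUM(Vi Xi)
    error = lam - v_bar                       # level of reinforcement
    for cs in present:                        # Xi = 1 only for CSs that are present
        V[cs] += beta * error * alpha[cs]
    return V

alpha = {"A": 0.1, "B": 0.4}                  # assumed saliences: B is more salient

# Compound training: AB-US
V_compound = {"A": 0.0, "B": 0.0}
for _ in range(200):
    rw_trial(V_compound, ["A", "B"], lam=1.0, alpha=alpha)

# Control: A-US alone, with the same low salience
V_alone = {"A": 0.0, "B": 0.0}
for _ in range(500):
    rw_trial(V_alone, ["A"], lam=1.0, alpha=alpha)

print(V_compound, V_alone)
```

Under these assumptions A and B share the available lambda in proportion to their saliences when trained in compound, whereas A trained alone still approaches lambda. This is the sense in which overshadowing is not just a matter of A being hard to condition.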
[walkthrough of an example of blocking]
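The blocking design can be run through the same rule. This is a sketch under assumed parameters (equal salience for A and B, 50 trials per phase), with a compound-only control group added for comparison.

```python
def rw_trial(V, present, lam, alpha=0.3, beta=0.5):
    """One Rescorla-Wagner trial with equal (assumed) salience for every CS."""
    v_bar = sum(V[cs] for cs in present)          # Vbar = SUM(Vi Xi)
    for cs in present:
        V[cs] += beta * (lam - v_bar) * alpha
    return V

# Blocking group: A-US first, then AB-US
blocked = {"A": 0.0, "B": 0.0}
for _ in range(50):
    rw_trial(blocked, ["A"], lam=1.0)
for _ in range(50):
    rw_trial(blocked, ["A", "B"], lam=1.0)

# Control group: AB-US only
control = {"A": 0.0, "B": 0.0}
for _ in range(50):
    rw_trial(control, ["A", "B"], lam=1.0)

print("blocked:", blocked)
print("control:", control)
```

By the time B is introduced in the blocking group, A already predicts the US, so (lambda - Vbar) is close to zero and B gains almost nothing. In the control group B ends up with roughly half of lambda.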
train
A - US
B - A
Test
B -> CR
[walkthrough of an example of second order conditioning]
During training with A-US, A comes to predict the US by strengthening Vi for A.
During training with B-A, there is no US so lambda = 0
Since deltaVi for B is beta (lambda - Vbar) alphai Xi
With lambda = 0, deltaVi is at best 0, i.e. no learning should occur for B since there is no reinforcement being given and hence nothing to predict
To overcome this, lambda must be proportional to the association strength of A; that is,
the CS must produce reinforcement as well as a CR
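The failure of the standard rule on this design can be seen in a short simulation. The sketch below uses assumed parameters and trial counts; it shows only what the unmodified Rescorla-Wagner rule predicts, not the behavioural result.

```python
def rw_trial(V, present, lam, alpha=0.3, beta=0.5):
    """One Rescorla-Wagner trial: DeltaVi = beta (lambda - Vbar) alphai Xi."""
    v_bar = sum(V[cs] for cs in present)
    for cs in present:
        V[cs] += beta * (lam - v_bar) * alpha
    return V

V = {"A": 0.0, "B": 0.0}

# Phase 1: A-US, lambda = US value
for _ in range(50):
    rw_trial(V, ["A"], lam=1.0)

# Phase 2: B-A with no US, so lambda = 0
v_b_history = []
for _ in range(10):
    rw_trial(V, ["A", "B"], lam=0.0)
    v_b_history.append(V["B"])

print([round(v, 3) for v in v_b_history])
```

With lambda = 0 and Vbar positive because of A, the error term is negative on every B-A trial, so V for B never rises above zero. The rule as written therefore cannot produce the CR to B seen at test.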
train
[Timing diagram: CS1 comes on before the US; CS2 comes on after the US.]
test
CS1 -> positive association with US (e.g. shock)
CS2 -> negative association with US
Which suggests that
Changes in US level determine reinforcement rather than the level itself
Assume that the reinforcement of each CS occurs at both onset and offset with values
+Vi at onset
-Vi at offset
and the US has values
+Vus at onset
-Vus at offset
let Y = sum of association strengths of ALL stimuli (CSs and USs)
and Ydot = Y(t) - Y(t - deltat)
Ydot is therefore the time-dependent reinforcement value
Replace (lambda - Vbar) in the standard Rescorla-Wagner rule with Ydot to get the Sutton-Barto model
DeltaVi = beta Ydot alphai Xi
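A discrete-time sketch of this rule, applied to the CS1 / US / CS2 design above, is given below. Several details are assumptions made only for illustration: the step-by-step stimulus schedule, CS1 overlapping US onset and CS2 overlapping US offset, the parameter values, and the use of the previous time step's Xi as the eligibility (a one-step lag, so that a CS is not reinforced by its own onset). The function names are hypothetical.

```python
def total_strength(V, v_us, stimuli, us, t):
    """Y(t): summed strength of every stimulus (CSs and US) present at step t."""
    return sum(V[cs] * stimuli[cs][t] for cs in V) + v_us * us[t]

def run_trial(V, v_us, stimuli, us, beta=0.5, alpha=0.5):
    """Apply DeltaVi = beta Ydot alphai Xi at each time step of one trial."""
    y_prev = 0.0
    x_prev = {cs: 0 for cs in V}      # assumption: eligibility lags by one step
    for t in range(len(us)):
        y_dot = total_strength(V, v_us, stimuli, us, t) - y_prev
        for cs in V:
            V[cs] += beta * y_dot * alpha * x_prev[cs]
        x_prev = {cs: stimuli[cs][t] for cs in V}
        # Recompute Y(t) with the updated strengths so that a change in V
        # is not itself counted as a change in stimulus level.
        y_prev = total_strength(V, v_us, stimuli, us, t)
    return V

# Assumed schedule (1 = present): CS1 spans US onset, CS2 spans US offset.
stimuli = {
    "CS1": [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "CS2": [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
}
us =       [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

V = {"CS1": 0.0, "CS2": 0.0}
for _ in range(100):
    run_trial(V, v_us=1.0, stimuli=stimuli, us=us)

print(V)
```

In this sketch CS1 is eligible when the US comes on (Ydot positive) and ends up with a positive strength, while CS2 is eligible when the US goes off (Ydot negative) and ends up with a negative strength. With the assumed parameters the two values settle at approximately +0.75 and -0.75; the offset of each CS pulls its own strength back toward zero, which keeps the values bounded.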