9.031 Neural Basis of Learning and Memory: Lecture 9

Rescorla-Wagner reinforcement learning theory

Basic ideas

CS and US must occur at the same time

Delta V = (level of US processing) x (level of CS processing)

Level of US processing -> Reinforcement

Level of CS processing -> Eligibility for change (e.g. attention, salience, rehearsal)

Overshadowing

Pairing a compound of two CSs, A and B (e.g. light and tone), with a US leads to a stronger association with the US for one CS than for the other. That is, one of the CSs will overshadow the other.

train

AB – US

test

A -> weak CR

B -> strong CR

Note that this is not a simple consequence of differential stimulus associability (see association bias), since either A or B paired alone with the US can lead to a strong association:

A-US: A -> strong CR

or

B-US: B -> strong CR

As A and B become associated with the US, the US becomes less unexpected. Once one of the CSs begins to predict the US, less US processing is available for the other CS.
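
A minimal sketch of overshadowing, assuming the Rescorla-Wagner update stated later in these notes (DeltaVi = beta(lambda - Vbar) alphai Xi). The salience values, beta, lambda and the function name rw_train are hypothetical, chosen only for illustration.

```python
def rw_train(alphas, present, lam, beta=0.5, trials=200):
    """Run Rescorla-Wagner trials and return the association strength of each CS."""
    V = dict.fromkeys(alphas, 0.0)
    for _ in range(trials):
        vbar = sum(V[cs] for cs in present)  # prediction from the CSs present
        error = lam - vbar                   # level of US processing
        for cs in present:
            V[cs] += beta * error * alphas[cs]
    return V

alphas = {"A": 0.1, "B": 0.5}  # hypothetical saliences: B is more salient

compound = rw_train(alphas, present=["A", "B"], lam=1.0)  # AB-US
a_alone = rw_train(alphas, present=["A"], lam=1.0)        # A-US

print(compound)  # A ends weak, B ends strong
print(a_alone)   # A alone still reaches a strong association
```

Under these assumptions A and B share a single prediction error on each compound trial, so the more salient CS takes most of the available association strength, while A trained alone still approaches lambda.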

Blocking

train

A-US

AB-US

test

A->strong CR

B->weak CR

Learning occurs whenever events violate expectations, i.e. when the actual US level (lambda) differs from the expected level (Vbar)

Vbar is the prediction based on total association strength of CSs

If US is present lambda = US value

If US is absent lambda = 0

With training Vbar will approach lambda

Delta V = reinforcement(+,-) x eligibility(+,0)

Delta V is proportional to (lambda – Vbar)

Delta V = beta(lambda – Vbar)

With (lambda – Vbar) as the level of reinforcement and beta a coefficient of learning
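
A short worked example of Delta V = beta(lambda - Vbar) for a single CS, assuming the illustrative values beta = 0.5 and lambda = 1 (neither is given in the notes). The function name acquisition is hypothetical.

```python
def acquisition(lam, beta, trials):
    """Return Vbar after each trial under Delta V = beta * (lam - Vbar)."""
    vbar = 0.0
    history = []
    for _ in range(trials):
        vbar += beta * (lam - vbar)  # the step shrinks as Vbar nears lambda
        history.append(vbar)
    return history

curve = acquisition(lam=1.0, beta=0.5, trials=4)
print(curve)  # [0.5, 0.75, 0.875, 0.9375]
```

Each trial closes a fixed fraction (beta) of the remaining gap, which is why Vbar approaches lambda with training.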

With a dependence on the presence of the CS

Each CS defined as Xi

If CS is present Xi = 1

If CS is absent Xi = 0

With alphai a coefficient describing the salience or strength of the CS

Alphai Xi is the eligibility

DeltaVi = beta(lambda – Vbar) alphai Xi

With Vbar = SUM(Vi Xi)

[walkthrough of an example of blocking]
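
The lecture walkthrough itself is not recorded in these notes. As a stand-in, here is a hedged sketch of blocking under DeltaVi = beta(lambda - Vbar) alphai Xi, with hypothetical equal saliences and made-up values for beta, lambda and the number of trials. The helper name rw_trial is an assumption.

```python
def rw_trial(V, present, lam, alpha, beta):
    """Apply one Rescorla-Wagner update in place to the CSs that are present."""
    error = lam - sum(V[cs] for cs in present)  # lambda - Vbar
    for cs in present:
        V[cs] += beta * error * alpha[cs]

alpha = {"A": 0.3, "B": 0.3}  # hypothetical, equal saliences
beta, lam = 0.5, 1.0

# Blocking group: A-US first, then AB-US
V = {"A": 0.0, "B": 0.0}
for _ in range(100):
    rw_trial(V, ["A"], lam, alpha, beta)
for _ in range(100):
    rw_trial(V, ["A", "B"], lam, alpha, beta)

# Control group: AB-US only
C = {"A": 0.0, "B": 0.0}
for _ in range(100):
    rw_trial(C, ["A", "B"], lam, alpha, beta)

print(V)  # A strong, B near zero (blocked)
print(C)  # A and B share the association equally
```

By the time B is introduced, A already predicts the US, so (lambda - Vbar) is close to 0 and B gains almost nothing. In the control group B reaches about half of lambda.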

train

A - US

B - A

Test

B -> CR

[walkthrough of an example of second order conditioning]

During training with A – US, A comes to predict the US by strengthening Vi for A.

During training with B – A, there is no US so lambda = 0

Since deltaVi for B is beta(lambda – Vbar) alphai Xi

With lambda = 0, deltaVi is at best 0, i.e. no learning should occur for B since there is no reinforcement being given and hence nothing to predict
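
A minimal sketch of this failure, assuming A-US training has already driven V for A to lambda = 1. The saliences, beta and the number of trials are hypothetical.

```python
alpha = {"A": 0.3, "B": 0.3}  # hypothetical saliences
beta = 0.5
V = {"A": 1.0, "B": 0.0}      # assumed state after A-US training

lam = 0.0                     # B-A trials: no US is delivered
deltas_B = []
for _ in range(10):
    vbar = V["A"] + V["B"]    # both CSs are present (Xi = 1)
    error = lam - vbar        # never positive while Vbar >= 0
    dB = beta * error * alpha["B"]
    V["A"] += beta * error * alpha["A"]
    V["B"] += dB
    deltas_B.append(dB)

print(deltas_B)  # every change to B is negative
print(V)         # V for B ends below zero
```

Under these assumptions the standard model predicts that B becomes, if anything, inhibitory, which is the opposite of the CR to B seen in second order conditioning.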

To overcome this, lambda must be proportional to the association strength of A, that is

The CS must produce reinforcement as well as a CR

train

                 ________________________________
US  ____________|                                |___________
            ____
CS1 _______|    |____________________________________________
                                             ____
CS2 ________________________________________|    |___________

test

CS1 -> positive association with US (e.g. shock)

CS2 -> negative association with US

Which suggests that

Changes in US level determine reinforcement, rather than the level itself

Assume that the reinforcement of each CS occurs at both onset and offset with values

+Vi at onset

-Vi at offset

and the US has values

+Vus at onset

-Vus at offset

let Y = sum of association strengths of ALL stimuli (CSs and USs)

and Ydot = Y(t) – Y(t-deltat)

Ydot is therefore the time-dependent reinforcement value

Replace lambda – Vbar in the standard Rescorla-Wagner model with Ydot to get the Sutton-Barto model

DeltaVi = beta Ydot alphai Xi
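
A hedged sketch of DeltaVi = beta Ydot alphai Xi in discrete time, applied to a schedule like the training diagram above (CS1 just before US onset, CS2 just before US offset). Assumptions not in the notes: Xi is taken as the CS's presence on the previous time step, so a CS is eligible at the moment it ends. A single alpha is shared by both CSs. The step counts, Vus = 1, and the name sutton_barto_trial are illustrative.

```python
def sutton_barto_trial(V, schedule, v_us, alpha, beta):
    """Run one trial of DeltaVi = beta * Ydot * alpha * Xi, updating V in place.

    schedule maps each CS name and "US" to a list of 0/1 presence values.
    Assumption: Xi is the CS's presence on the previous time step.
    """
    y_prev = 0.0                  # nothing is present before the trial starts
    x_prev = dict.fromkeys(V, 0)
    for t in range(len(schedule["US"])):
        # Y(t): summed strength of all stimuli currently present
        y = v_us * schedule["US"][t] + sum(V[cs] * schedule[cs][t] for cs in V)
        y_dot = y - y_prev        # Ydot = Y(t) - Y(t - deltat)
        for cs in V:
            V[cs] += beta * y_dot * alpha * x_prev[cs]
        y_prev = y
        x_prev = {cs: schedule[cs][t] for cs in V}

schedule = {
    "US":  [0] * 4 + [1] * 10 + [0] * 4,   # on for steps 4..13
    "CS1": [0] * 2 + [1] * 2 + [0] * 14,   # ends as the US starts
    "CS2": [0] * 12 + [1] * 2 + [0] * 4,   # ends as the US ends
}

V = {"CS1": 0.0, "CS2": 0.0}
for _ in range(50):
    sutton_barto_trial(V, schedule, v_us=1.0, alpha=0.5, beta=0.5)

print(V)  # CS1 positive, CS2 negative
```

With these choices CS1 is eligible when Y rises at US onset and acquires a positive association, while CS2 is eligible when Y falls at US offset and acquires a negative one, matching the test result stated in the notes.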