Learning Theory
The psychological construct of learning refers to the development of a relatively lasting change in
behaviour as the result of single or repeated experience.
Non-associative learning: These are simple forms of learning, demonstrated in lower animals, where only
single events are used in the learning paradigm – no pairing or ‘operation’ on the environment is required.
- Habituation is a form of non-associative learning in which repeated stimulation leads to a reduction in
response over time as the organism ‘learns’ the stimulus.
- Sensitization is an increase in response to a stimulus as a function of repeated presentations of
that stimulus. Similar to habituation, repetition of exposure is required to elicit the learning
effect, but the response rates go up, not down (i.e. opposite to the effect seen in habituation).
- Pseudoconditioning (cross-sensitization): The emergence of a response to a previously neutral
stimulus simply as a result of exposures to a different but powerful stimulus.
Associative learning: Here learning occurs through the association of two events.
- Classical conditioning: learning takes place through repeated temporal association of two
events. The learning organism is passive and respondent (i.e. shows an innate, reflexive response
such as salivation) but not instrumental (i.e. does not actively operate on its environment).
- Operant conditioning: learning results from consequences of one’s actions – operations. The
learning organism actively operates (instrumental) on the environment.
- Social learning theory: combines both classical and operant models of learning, and holds
cognitive processes and social interaction to be relevant in human learning.
Classical conditioning is produced by repeatedly pairing a neutral conditioned stimulus (CS e.g. bell)
with an unconditioned stimulus (UCS e.g. food) that naturally evokes an unconditioned response (UCR
e.g. salivation). Eventually the neutral stimulus alone evokes the desired response (salivation –
now called conditioned response, CR). It is a relatively rapid process and depends upon the nature of the
unconditioned stimulus. Pavlov first demonstrated this paradigm in dogs.
The development of the association between the CS and the UCS, resulting in a CR, is called acquisition.
For animals this takes around 3 to 15 pairings; if sufficient emotional involvement is present, acquisition
can occur with even a single pairing.
Types of conditioning by pairing procedure:
- Delayed or forward conditioning: CS (bell) presented before UCS (food); the CS + UCS pairing is continued till the UCR (saliva) appears.
- Backward conditioning: UCS (food) presented before CS (bell) – not useful in animals; used in advertising.
- Simultaneous conditioning: UCS + CS presented together – often the case in real-life learning situations.
- Trace conditioning: CS presented and removed before the UCS is presented – conditioning depends on the memory trace.
A delay of less than about 0.5 seconds is proposed to be the optimum for trace conditioning.
© SPMM Course 3
Temporal contiguity (the time between the two stimuli) is important for conditioning according to
Pavlov. But Rescorla showed that predictability is more important than temporal contiguity in humans,
i.e. if one can predict a painful tooth extraction on hearing the dentist’s drill, then the noise becomes
conditioned to elicit a fear response better than two unconnected, unpredictable events that merely have
temporal contiguity. Note that for classical conditioning it is not necessary that the organism understands
the association in cognitive terms, but such awareness facilitates the learning.
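Rescorla's point – that learning tracks how well the CS predicts the UCS rather than mere contiguity – is often formalised as the Rescorla–Wagner model, in which prediction error drives each change in associative strength. The sketch below is illustrative only: the learning rate, trial counts, and function name are arbitrary assumptions, not values from these notes.

```python
# Illustrative sketch of the Rescorla-Wagner learning rule: on each
# CS-UCS pairing, the associative strength V of the CS moves toward
# the maximum strength the UCS can support (lambda) in proportion to
# the prediction error (lambda - V). The learning rate alpha and the
# trial counts below are arbitrary illustrative choices.

def rescorla_wagner(trials, v=0.0, alpha=0.3, lam=1.0):
    """Return the associative strength of the CS after each trial."""
    history = []
    for _ in range(trials):
        v += alpha * (lam - v)  # prediction error drives the change
        history.append(v)
    return history

# Acquisition: CS repeatedly paired with the UCS (lambda = 1)
acquisition = rescorla_wagner(10)
# Extinction: UCS withheld (lambda = 0), starting from the acquired strength
extinction = rescorla_wagner(10, v=acquisition[-1], lam=0.0)
```

The acquisition curve rises steeply at first and levels off as the CS comes to fully predict the UCS; setting lambda to zero reproduces extinction as a decline in associative strength, consistent with extinction being a change in responding rather than an erasure of the learning mechanism.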
Higher-order conditioning refers to the use of an already conditioned stimulus (CS1) as the UCS for the
next level of conditioning, eliciting a CR to another stimulus (CS2). In this way second-order and
subsequently higher-order conditioning are possible. Animals do not usually respond beyond fourth-order
conditioning.
Pavlov’s experiments were extended to human subjects by Watson & Rayner, who produced a
‘phobia’ in an infant called Little Albert. By exposing him to a loud, frightening noise whenever he was
shown a white rat, Albert eventually became fearful of the white rat even when he heard no loud noise. A
similar fear response was seen when any furry white object was shown to Albert. This ‘spread’ of
associative learning from one stimulus to another is called stimulus generalisation.
Discrimination is a process diametrically opposite to generalization; in many situations associative
learning can be very selective. In such cases, learned responses are made only to specific stimuli and not to
other similar stimuli e.g. a child may be afraid of dogs but not all four-legged animals.
Extinction: reduction/disappearance of a learned response when the UCS – CS pairing (or the reinforcer in
operant conditioning; see below) is not available anymore. Faster extinction may mean weaker learning.
Extinction does not mean loss of learning, but only a suppression of behavioural response. Spontaneous
recovery refers to regaining a previously extinguished learned response after a period of time.
Counter conditioning is a form of classical conditioning where a previously conditioned response is
replaced by a new response that may be more desirable. Utilised in behavioural therapy - systematic
desensitisation, aversion therapy.
Latent inhibition: a delay in learning the association between the UCS and the CS is seen if the CS has
previously been presented on its own.
An organism learns an appropriate behaviour after many trials because the right behaviour is followed by
appropriate (desirable) consequence. This forms the basis of the concept of operant conditioning; this
phenomenon is termed the law of effect and is often demonstrated using trial-and-error learning
experiments originally described by Thorndike.
A conditioning that leads to increase in the frequency of behaviour following learning is called
reinforcement. A conditioning that leads to decrease in the frequency of behaviour following learning is
called punishment. Both reinforcement and punishment can be positive (i.e. something is given) or
negative (something is taken away).
Positive Reinforcer Food for pressing a lever (given)
Negative Reinforcer Ceasing of electric shock on pressing a lever (taken away)
Positive Punishment Points on your driving license for speeding (given)
Negative Punishment A monetary fine from a parking ticket (taken away)
Primary Reinforcer Stimulus affecting biological needs (such as food)
Secondary Reinforcer Stimulus reinforcing behaviour associated with primary reinforcers
(money, praise)
Both positive and negative reinforcement increase the desired response.
For example, a “star chart” run on a variable interval schedule, so that around 2 or 3 stars are
administered per day depending on good behaviour and none for bad behaviour, is positive
reinforcement: something additional is given to increase the desired response.
In a patient with OCD, compulsions provide short-term relief of obsessional anxiety via negative
reinforcement. When compulsive rituals are carried out, anxiety is acutely reduced. The termination of
the aversive anxiety cued by the obsessions thus increases the compulsive behaviour that removed the
anxiety, reinforcing repeated engagement in compulsions without addressing the obsessions themselves.
Reinforcement Schedules
A reinforcement schedule refers to how and when behaviour is reinforced, on the basis of either the
number of responses or the passage of time.
- Continuous (aka contingency reinforcement): reinforcement every time the positive response occurs, e.g. a food pellet every time a rat presses a lever in an experiment.
- Partial: only some of the positive responses result in positive reinforcement – the reinforcement is determined by the number of responses (ratio) or by time (interval).
- Fixed interval: reward occurs after a specific period of time regardless of the number of responses, e.g. a monthly salary irrespective of your level of performance!
- Variable interval: reward occurs after a variable (unpredictable) period of time, regardless of the number of responses, e.g. an angler catching a fish – the first may be after 10 minutes, the next after 45, then 5 minutes, etc.
- Fixed ratio: reward occurs after a specific number of responses, e.g. after completing 20 MCQs, you give yourself a coffee (or chocolate) break.
- Variable ratio: reward occurs after a random number of responses, e.g. gambling slot machines. Your first win of £20 may occur after 3 tries; the next win may not occur even if you play 30 times, while the third win may follow in quick succession after the second.
Important points to note:
In fixed schedules, a pause in responding is seen after each reinforcement, as the organism learns
that reinforcement will not occur again for some time or number of attempts. The pause in a fixed
interval schedule is greater than the pause in a fixed ratio schedule. When we interpret an
operation to be under our control (as in fixed schedules) we learn more quickly.
Variable schedules generate a constant rate of response, as the chance of obtaining a reward stays
the same at any time and for any instance of behaviour. In general, partial schedules are more
resistant to extinction than continuous schedules, though they take longer to learn. Variable ratio
schedules are the most resistant to extinction, which may explain why gambling is such a difficult
habit to eradicate.
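The four partial schedules can be sketched as simple decision rules for when a response earns a reward. This is a toy illustration: the particular ratio (20 responses), interval (60 seconds), and function names are assumptions made for the example, not standard parameters.

```python
import random

# Toy decision rules for the four partial reinforcement schedules.
# The ratio of 20 responses and interval of 60 seconds are arbitrary
# illustrative choices, as are the function names.

def fixed_ratio(response_count, ratio=20):
    """Reward after every `ratio`-th response (e.g. a break per 20 MCQs)."""
    return response_count > 0 and response_count % ratio == 0

def variable_ratio(mean_ratio=20, rng=random):
    """Reward any given response with probability 1/mean_ratio (slot machine)."""
    return rng.random() < 1 / mean_ratio

def fixed_interval(seconds_since_last_reward, interval=60):
    """Reward the first response once `interval` seconds have passed (salary)."""
    return seconds_since_last_reward >= interval

def variable_interval(seconds_since_last_reward, mean_interval=60, rng=random):
    """Reward the first response after an unpredictable wait (angling)."""
    return seconds_since_last_reward >= rng.expovariate(1 / mean_interval)
```

Note how the ratio rules depend only on the response count while the interval rules depend only on elapsed time, and how the two variable rules make the next reward unpredictable – the property that underlies their resistance to extinction.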
Another important determinant of operant conditioning is contingency - learning the probability
of an event.
Premack’s principle (a.k.a. Grandma’s rule): high-frequency behaviour can be used to reinforce
low-frequency behaviour, e.g. “eat your greens and you can have dessert”. An existing high-frequency
behaviour (eating dessert) is used to reward a low-frequency behaviour (eating greens).
Avoidance learning: an operant conditioning in which an organism learns to avoid certain responses or
situations. Avoidance is a powerful reinforcer and often difficult to extinguish. A special form of
avoidance is escape conditioning, seen in agoraphobia, where places in which panic occurs are avoided
or escaped from, eventually leading to a housebound state.
Aversive conditioning: This is an operant conditioning where punishment is used to reduce the
frequency of target behaviour e.g. the use of disulfiram (noxious stimuli) to reduce the frequency of
drinking alcohol.
Covert reinforcement: In covert reinforcement schedules, the reinforcer is an imagined pleasant event
rather than any material pleasure, e.g. imagining the MRCPsych graduation event to reinforce the
behaviour of practising MCQs.
Covert sensitization: The reinforcer is the imagination of unpleasant consequences to reduce the
frequency of an undesired behaviour e.g. an alcoholic may be deterred from continuing to spend on
alcohol by imagining his wife leaving him, being unable to support himself and ending up broke and
homeless.
Flooding: An operant conditioning technique where exposure to the feared stimulus takes place for a
substantial amount of time, so that the accompanying anxiety response fades away while the stimulus is
continuously present, e.g. a man with a phobia of heights standing on top of the Burj Khalifa or the Shard.
This leads to the extinction of fear. When a similar technique is attempted with imagined rather than
actual exposure, it is called implosion.
Shaping (a.k.a. successive approximation): This is a form of operant conditioning where a desirable
behaviour pattern is learnt by the successive reinforcement of behaviours closer to the desired one. Note
that shaping is used when the target behaviour is yet to appear (i.e. it is novel and does not exist already).
Shaping in a circus dog: runs towards the wheel but doesn’t jump → gets a bone; runs and makes a jump
close to the wheel → gets a bone; runs and jumps through the wheel → gets a bone; runs and jumps
through the wheel on fire → gets a bone; behaviour is shaped – circus on show.
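The successive-approximation idea in the circus-dog sequence can be sketched as a loop in which the criterion for reinforcement tightens after each success. The stage names follow the example above; the "distance from target" scores are illustrative assumptions added for the sketch.

```python
# Toy sketch of shaping: each behaviour is reinforced only if it is at
# least as close to the target behaviour as the current criterion allows,
# and each reinforcement tightens the criterion (successive approximation).
# The distance-from-target scores are illustrative assumptions.

stages = [
    ("runs towards the wheel", 3),
    ("jumps close to the wheel", 2),
    ("jumps through the wheel", 1),
    ("jumps through the wheel on fire", 0),  # target behaviour
]

criterion = 3      # how far from the target behaviour still earns a bone
reinforced = []
for behaviour, distance_from_target in stages:
    if distance_from_target <= criterion:
        reinforced.append(behaviour)          # give a bone
        criterion = distance_from_target - 1  # demand a closer approximation
```

Each reinforced behaviour raises the bar for the next one, so every stage in the sequence earns a bone exactly once and the final reinforced behaviour is the target itself.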
Chaining: This refers to reinforcing a series of related behaviours, each of which provides the cue for the
next to obtain a reinforcer. Chaining is used when the target behaviour is already notable in some form
but not in the fully formed sequence. An example is teaching a child to write his name. The shapes of the
individual letters are first taught using reinforcers, and forward chaining can be used to link each letter
in the correct order, finally reinforcing the completed name. Backward chaining starts at the end, e.g.
when making cupcakes the child is first taught how to sprinkle over a finished cupcake; the next time,
icing the cake and then sprinkling; the next time, placing the prepared cake mixture into cupcake
wrappers, then icing, then sprinkling, and so on.
Incubation: An emotional response increases in strength with brief but repeated exposure to the stimulus.
Rumination on anxiety-provoking stimuli can serve to increase the anxiety via incubation. This is a
powerful mechanism that maintains phobic anxiety and PTSD.
Stimulus preparedness (Seligman) explains why snake and spider phobias are commoner than ‘shoe
phobia’ or ‘watch phobia’. In evolutionary terms, stimuli that were threatening to our hunter-gatherer
ancestors have been hard-wired into our system, reflexively eliciting immediate responses – and phobia
develops more readily for such ‘prepared stimuli’.
Learned helplessness (Seligman): initially put forward as a behavioural model for depression. When
confronted with aversive stimuli from which escape is impossible, an animal stops making attempts to
escape. This was shown experimentally with a dog on an electrified floor unable to escape. After a while,
the dog stopped trying, as if accepting its fate. This paradigm is frequently invoked to explain the
dependence seen in victims of domestic abuse.
Reciprocal inhibition (Wolpe): If a stimulus eliciting a desired response and a stimulus eliciting an
undesired response are presented together repeatedly, the incompatibility leads to a reduction in the
frequency of the undesired response. This is evident when your dog barks at your friend; try hugging her in front of
your dog every time the dog barks and slowly the dog will stop barking at your friend. This is used in
relaxation therapy for anxiety and in systematic desensitisation.
Cueing (a.k.a. prompting): specific cues can be used to elicit specific behaviours – e.g. in a classroom a
teacher puts her finger on her lips to reduce chatter and elicit the response of silence. The gradual
withdrawal of such cues once the behaviour is established is called fading.
Bandura’s social learning theory: Bandura believed that not all learning occurred due to direct
reinforcement, and proposed that people could learn simply by observing the behaviour of others and the
outcomes. According to behaviourists, learning is defined as a relatively permanent change in behaviour
but social learning theorists differentiate actual performance from learning a potential behaviour.
Social learning theorists emphasise the role of cognition in learning; awareness and expectations, rather
than the actual experience of reinforcements or punishments, are sufficient to have a major effect on the
behaviours that people exhibit.
Cognitive processing during social learning:
1. Attention to the observed behaviour is the basic element in learning.
2. Visual imagery and semantic encoding of the observed behaviour into memory.
3. Memory permanence via retention and rehearsal.
4. Motor copying of the behaviour and imitative reproduction.
5. Motivation to act.
Reciprocal causation: Bandura proposed that behaviour can influence both the environment and the
individual, and each of these three variables – the person, the behaviour, and the environment – can have
an influence on each other. The most commonly discussed experiment illustrating Bandura’s theory is the
Bobo doll experiment: children watching a model showing aggression against a Bobo doll learnt to
display the aggression without any reinforcement schedules.
Cognitive learning (Tolman): reinforcement may be necessary for the performance of a learned response
but is not necessary for the learning itself to occur (latent learning). He inferred that rats can make
cognitive maps of mazes – called place learning – which consist of cognitive expectations as to what
comes next.
Insight learning (Kohler) is diametrically opposite to associative learning and views learning as purely
cognitive and not based on S-R mechanism - a sudden idea occurs and the solution is learnt.
Hierarchy of learning: Gagné’s hierarchy of learning (see the attached table) describes how simple or
basic learning steps are prerequisites for later, more complex learning. This pattern of learning can also be
seen during human development and in the hierarchy of evolution.