The power of associative learning and the ontogeny of optimal behaviour

Abstract

Behaving efficiently (optimally or near-optimally) is central to animals' adaptation to their environment. Much evolutionary biology assumes, implicitly or explicitly, that optimal behavioural strategies are genetically inherited, yet the behaviour of many animals depends crucially on learning. The question of how learning contributes to optimal behaviour is largely open.

Here we propose an associative learning model that can learn optimal behaviour in a wide variety of ecologically relevant circumstances. The model learns through chaining, a term introduced by Skinner to indicate learning of behaviour sequences by linking together shorter sequences or single behaviours.

Our model formalizes the concept of conditioned reinforcement, the learning process that underlies chaining, and is closely related to optimization algorithms from machine learning.

Furthermore, the model readily accounts for both instinctual and learned aspects of behaviour, clarifying how genetic evolution and individual learning complement each other, and bridging a long-standing divide between ethology and psychology. We conclude that associative learning, supported by genetic predispositions and including the oft-neglected phenomenon of conditioned reinforcement, may suffice to explain the ontogeny of optimal behaviour in most, if not all, non-human animals. Our results establish associative learning as a more powerful optimizing mechanism than acknowledged by current opinion.

1. Introduction

We often marvel at animals performing long sequences of behaviour efficiently, and theoretical and empirical studies confirm that animals behave optimally or near-optimally under many circumstances [1–3]. Typically, optimal behaviour has been assumed to result from natural selection of genetically determined behaviour strategies [4], yet in many species behaviour is crucially shaped by individual experiences and learning [5–7].

Existing work has considered how learning can optimize single responses [8–13] or specific sequences of two or three behaviours [14,15]. However, the question of how, and how much, learning contributes to optimal behaviour is still largely open. Here we analyse in general the conditions under which associative learning can optimize sequences of behaviour of arbitrary complexity. Associative learning, however, is also considered mindless, outdated and too limited to learn complex behaviour such as tool use, foraging strategies or any behaviour that requires coordinating actions over a span of time.

Associative learning, however, has not been evaluated rigorously as a potential route to optimal behaviour [25,26]. Instead, claims about its limitations have rested on intuition rather than formal analysis and proof. In this paper, we develop an associative learning model that can be proved to closely approximate optimal behaviour in many ecologically relevant circumstances.

The model has two key features: conditioned reinforcement and genetically determined predispositions that guide learning. The latter aspect is discussed later; in this introduction, we focus on conditioned reinforcement.

Conditioned reinforcement (also referred to as secondary reinforcement) is a learning process whereby initially neutral stimuli that predict primary reinforcers can themselves become reinforcers [27–30]. For example, a dog that repeatedly hears a click before receiving food will eventually consider the click rewarding in itself, after which it will learn to perform behaviour whose sole outcome is to hear the click [31].

Conditioned reinforcement was a prominent topic in behaviourist psychology [27,29,32–34], but interest in it waned with behaviourism [35]. As a result, conditioned reinforcement was left out of the mathematical models of the 1970s and 1980s that still form the core of animal learning theory [36–40].

There are two fields, however, that have carried on the legacy of conditioned reinforcement research. The first is animal training, in which methods that rely on conditioned reinforcement are the primary tool to train behaviour sequences (see below and [31]). The second is the field of reinforcement learning, a branch of artificial intelligence that blends ideas from optimization theory and experimental psychology [41,42], and which has also become influential in computational neuroscience.

A key element of reinforcement learning algorithms, referred to as learning based on temporal differences, is closely related to conditioned reinforcement [45–49]. A remarkable result of reinforcement learning research is that conditioned reinforcement implements a form of dynamic programming. The latter is an optimization technique used extensively by biologists to find optimal behavioural strategies, and therefore to assess whether animals behave optimally [1,2].

Dynamic programming is not, however, a realistic model of how animals can learn to behave optimally, as it requires perfect knowledge of the environment and extensive computation. Conditioned reinforcement, on the other hand, is computationally simple as well as taxonomically widespread, suggesting that optimal behaviour may be learned rather than inherited [47].
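To make the contrast concrete, here is a minimal value-iteration sketch in Python, with states and numbers invented by us for illustration. It shows why dynamic programming is demanding: the complete transition structure and every primary reinforcement value must be supplied before any computation can begin, whereas conditioned reinforcement estimates values from experience alone.

    # Value iteration, a simple form of dynamic programming. Note that the
    # full transition table T and all primary reinforcement values u must
    # be known in advance; nothing is learned from experience.
    # Toy states: 0 = searching, 1 = fruit found, 2 = fruit eaten (rewarding).
    u = {0: 0.0, 1: 0.0, 2: 1.0}              # primary reinforcement values
    T = {0: {"search": 1}, 1: {"peck": 2}}    # T[state][behaviour] -> next state

    V = {s: 0.0 for s in u}                   # state values to be computed
    for _ in range(100):                      # sweep until values stabilize
        for s, actions in T.items():
            # value of s = best attainable (reinforcement + value) of a successor
            V[s] = max(u[s2] + V[s2] for s2 in actions.values())
    # V[0] and V[1] both converge to 1.0: each state leads to the food reward.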

The conceptual connections that we just summarized have been noted previously, but conditioned reinforcement has not been systematically integrated with animal learning theory, nor with knowledge about instinctual behaviour from ethology, nor with the study of optimal behaviour in behavioural ecology.

Our goal is to sketch a first such synthesis. We first consider a standard model of associative learning without conditioned reinforcement. This model can optimize single behaviours but not behaviour sequences. We then add conditioned reinforcement, obtaining our chaining model. Lastly, using ideas from reinforcement learning, we show that chaining can optimize sequences of behaviour in a similar way to dynamic programming.

Our general framework is as follows. We consider an animal that can find itself in a finite (albeit arbitrarily large) number of environmental states, among which transitions are possible. For example, states may represent spatial locations, and state transitions represent movement from one location to another.

By choosing its behaviour, the animal can influence transitions from one state to the next. Transitions can be deterministic (in each state, each behaviour always leads to the same next state) or stochastic (in each state, a behaviour may lead to different states, with fixed probabilities). Each state S has a primary reinforcement value, u_S, which is genetically determined and serves to guide learning towards behaviour that promotes survival and reproduction.

For example, a state corresponding to the ingestion of food would typically have a positive value, while a state representing harm to the body would have a negative value. States that describe neutral conditions have a value of zero. The animal's goal is to choose its behaviour to maximize the total value collected.
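As a concrete illustration, the framework could be written down in Python as below. The states, behaviours and probabilities are hypothetical choices of ours; the point is only the structure: a finite state set, behaviour-dependent stochastic transitions and a genetically fixed u_S for every state.

    import random

    # Hypothetical primary reinforcement value u_S of each state.
    u = {"neutral": 0.0, "food": 1.0, "harm": -1.0}

    # transitions[state][behaviour] = list of (next_state, probability):
    # the stochastic transition structure described in the text.
    transitions = {
        "neutral": {
            "forage": [("food", 0.6), ("harm", 0.1), ("neutral", 0.3)],
            "rest":   [("neutral", 1.0)],
        },
    }

    def step(state, behaviour):
        """Sample the next state given the current state and chosen behaviour."""
        outcomes = transitions[state][behaviour]
        states, probs = zip(*outcomes)
        return random.choices(states, weights=probs)[0]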

To begin with, we do not assume any innate knowledge of the environment beyond the ability to recognize a number of biologically relevant situations, such as pain and the ingestion of food, which are assumed to have suitable u_S values. Hence, the appropriate behaviour must be learned.

Learning a single behaviour

Consider first the optimization of a single behavioural choice. For example, we may consider a bird that finds a fruit and can choose out of a repertoire of m behaviours (peck, fly, sit, preen, etc.).

One behaviour (peck) leads to a food reward (tasting the fruit's sweet juice); all others have no meaningful consequences. We can imagine the animal as attempting to estimate the value of each behaviour, in order to then choose the one with the highest value (this notion will be made precise below). Writing v(S → B) for the estimated value of performing behaviour B in state S, and S′ for the state that performing B leads to, the estimate is updated after each experience according to

Δv(S → B) = α (u_S′ − v(S → B)),    (2.1)

where α (with 0 < α < 1) is a learning rate. The meaning of equation (2.1) is that the estimate moves a fraction α of the way towards the primary reinforcement value actually experienced. Over repeated experiences, equation (2.1) causes v(S → B) to converge to the primary reinforcement value that choosing B in S yields on average. Thus, the value of choosing B in state S is equated with the primary reinforcement value that can be obtained by such a choice.
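The sketch below simulates this delta rule on the bird-and-fruit example; the repertoire, the reward value of 1.0 for pecking and the learning rate are our own illustrative assumptions, not values from the article.

    import random

    behaviours = ["peck", "fly", "sit", "preen"]
    u_next = {"peck": 1.0, "fly": 0.0, "sit": 0.0, "preen": 0.0}  # u of resulting state
    alpha = 0.2                           # learning rate, 0 < alpha < 1
    v = {b: 0.0 for b in behaviours}      # estimated value v(S -> B) of each behaviour

    for trial in range(200):
        b = random.choice(behaviours)     # explore uniformly for now
        # equation (2.1): move the estimate a fraction alpha towards the
        # primary reinforcement value actually obtained
        v[b] += alpha * (u_next[b] - v[b])

    # v["peck"] approaches 1.0; the other estimates remain at 0.0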

To complete our model, we need to specify how behaviours are chosen. The basic requirement for a viable decision rule is that it should preferentially choose behaviours that have a higher estimated value (so that rewards can be collected), while at the same time leaving some room for exploring alternative behaviours (so that accurate value estimates can be learned).
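One standard rule that meets both requirements (a common choice in learning models, though we do not claim it is the article's exact rule) is a softmax: a behaviour's probability grows with its estimated value, yet never drops to zero. The temperature parameter below is an illustrative assumption; smaller values make choices greedier.

    import math
    import random

    def choose(v, temperature=0.5):
        """Sample a behaviour with probability increasing in its estimated value."""
        behaviours = list(v)
        weights = [math.exp(v[b] / temperature) for b in behaviours]
        return random.choices(behaviours, weights=weights)[0]

    # With the values learned above, "peck" is chosen most often,
    # but the alternatives are still explored occasionally.
    v = {"peck": 1.0, "fly": 0.0, "sit": 0.0, "preen": 0.0}
    print(choose(v))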