Publication highlights detail

Observational Learning of Exploration-Exploitation Strategies in Bandit Tasks

Ludwig Danwitz, Bettina von Helversen

Cognition 259 (2025) 106124

doi: https://doi.org/10.1016/j.cognition.2025.106124

In decision-making scenarios, individuals often face the challenge of balancing the exploration of new options against the exploitation of known ones, a dynamic known as the exploration-exploitation trade-off. In such situations, people frequently have the opportunity to observe others' actions. Yet little is known about when, how, and from whom individuals use observational learning in the exploration-exploitation dilemma. In two experiments, participants completed multiple nine-armed bandit tasks, either independently or while observing a fictitious agent using either an explorative or an equally successful exploitative strategy. To analyze participants' behavior, we used a reinforcement learning model (a simplified Kalman filter) to extract parameters for both copying and exploration at the individual level. Results showed that participants copied the observed agents' choices by adding a bonus to the individually estimated value of the observed action. While most participants appeared to use an unconditional copying approach, a subset of participants adopted a copy-when-uncertain approach, that is, copying more when uncertain about the optimal action based on their individually acquired knowledge. Further, participants adjusted their exploration strategies in alignment with those observed. We discuss to what extent this can be understood as a form of emulation. Results on participants' preferences to copy from explorative versus exploitative agents are ambiguous. Contrary to expectations, similarity or dissimilarity between participants' and agents' exploration tendencies had no impact on observational learning. These results shed light on humans' processing of social and non-social information in exploration scenarios and on the conditions of observational learning.
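The copying mechanism described in the abstract, a bonus added to the individually estimated value of the observed action, can be illustrated with a minimal sketch. This is not the authors' model code. The function names, the softmax choice rule, and all parameter names (`obs_noise_var`, `copy_bonus`, `inverse_temp`) are assumptions made for illustration, and the paper's simplified Kalman filter may differ in its details.

```python
import math


def kalman_update(mean, var, reward, obs_noise_var):
    """Standard scalar Kalman update of one arm's value estimate.

    obs_noise_var is an assumed reward-noise variance (hypothetical name).
    """
    gain = var / (var + obs_noise_var)      # Kalman gain in [0, 1]
    new_mean = mean + gain * (reward - mean)
    new_var = (1.0 - gain) * var            # uncertainty shrinks after sampling
    return new_mean, new_var


def choice_probabilities(means, observed_arm, copy_bonus, inverse_temp):
    """Softmax over arm values, with a bonus on the arm the agent chose.

    The softmax rule is an assumption; the source only states that a bonus
    is added to the estimated value of the observed action.
    """
    values = list(means)
    if observed_arm is not None:
        values[observed_arm] += copy_bonus  # unconditional copying bonus
    scaled = [inverse_temp * v for v in values]
    top = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Nine arms with identical estimates; the observed agent chose arm 3.
probs = choice_probabilities([0.0] * 9, observed_arm=3,
                             copy_bonus=1.0, inverse_temp=1.0)
```

With equal value estimates, the bonus alone makes the observed arm the most likely choice. A copy-when-uncertain variant would additionally scale the bonus by the learner's own uncertainty, which this sketch does not attempt.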

© 2025, Attribution 4.0 International (CC BY 4.0)
