
Multi-Armed Bandit Problem

[ˈmʌlti ɑːmd ˈbændɪt ˈprɒbləm]
Machine Learning
Last updated: December 9, 2024

Definition

A classic reinforcement learning problem in which an agent repeatedly chooses among several actions, each with an unknown reward distribution.

Detailed Explanation

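The multi-armed bandit problem involves repeatedly selecting from a set of actions (arms), each of which yields rewards drawn from an unknown distribution. The goal is to maximize cumulative reward by balancing exploration (trying arms whose rewards are still uncertain) with exploitation (choosing the arm that currently looks best). It is a simplified reinforcement learning setting with no state transitions: each choice affects only the immediate reward, not the situation the agent faces next.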
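One common way to strike the exploration/exploitation balance is the epsilon-greedy strategy: with probability epsilon the agent pulls a random arm (explore), otherwise it pulls the arm with the highest estimated mean reward (exploit). The sketch below is a minimal illustration, not a reference implementation. It assumes Bernoulli (0/1) rewards, and the function name `epsilon_greedy`, the success probabilities, and the parameter values are all hypothetical choices made for the example.

```python
import random

def epsilon_greedy(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Play a simulated Bernoulli bandit with epsilon-greedy.

    true_probs are hypothetical success probabilities, hidden from the agent.
    Returns (total reward, pull count per arm, estimated mean per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running sample-mean reward per arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Draw a 0/1 reward from the arm's hidden distribution.
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts, estimates

total, counts, estimates = epsilon_greedy([0.2, 0.5, 0.7])
print(total, counts)
```

Raising `epsilon` spends more pulls on exploration; lowering it exploits sooner but risks locking onto a suboptimal arm. Other well-known strategies, such as upper confidence bound (UCB) methods and Thompson sampling, handle the same trade-off differently.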

Use Cases

Online advertising, clinical trials, website optimization, content recommendation
