Definition
A model-free reinforcement learning algorithm that learns an optimal action-selection policy for any finite Markov Decision Process (MDP).
Detailed Explanation
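Q-Learning is a model-free, off-policy reinforcement learning algorithm that learns a quality value (Q-value) for each state-action pair: an estimate of the expected cumulative discounted reward for taking that action in that state and acting optimally afterward. After each step, it moves the current estimate toward the reward received plus the discounted maximum Q-value available from the next state, an update derived from the Bellman optimality equation. In the tabular setting, the estimates converge to the optimal Q-values, so acting greedily on them yields an optimal policy, provided every state-action pair continues to be visited and the learning rate decays appropriately. No model of the environment's transitions or rewards is required.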
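The update described above is commonly written as Q(s, a) <- Q(s, a) + alpha * [r + gamma * max over a' of Q(s', a') - Q(s, a)], where alpha is the learning rate and gamma the discount factor. The following is a minimal tabular sketch of that rule, not a definitive implementation. The corridor environment, the function names (step, q_learning, greedy_policy), the epsilon-greedy exploration scheme, and all hyperparameter values are assumptions chosen for illustration.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of N_STATES cells. Action 0 moves left, action 1 moves right.
# Reaching the last cell yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # One Q-value per state-action pair, initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-known action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: reward plus discounted best Q-value of the next state
            # (no bootstrap term once the episode has ended).
            target = reward if done else reward + gamma * max(q[next_state])
            # Move the current estimate a fraction alpha toward the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

The sketch keeps alpha and epsilon fixed for brevity. The convergence conditions mentioned above would call for a decaying learning rate, so treat the fixed values as a simplification. Note that the agent only ever calls step to sample transitions and never inspects how the environment computes them, which is what "model-free" refers to.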
Use Cases
Game-playing agents, robotic control systems, traffic light control, and energy management systems.