Definition
An on-policy learning algorithm that updates Q-values based on state-action-reward-state-action transitions.
Detailed Explanation
SARSA (State-Action-Reward-State-Action) is an on-policy temporal difference learning algorithm that learns Q-values. Its update rule is Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)], where a' is the next action actually selected by the current policy. Unlike Q-learning, which bootstraps from the maximum Q-value over next actions, SARSA bootstraps from the action the policy will really take, so the learned values account for the policy's own exploration. This makes it more conservative in situations where exploratory actions are costly, such as navigating near a cliff or other hazardous states.
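The on-policy update can be sketched as a short Python function. This is a minimal illustration, not a full training loop; the state and action names, the epsilon-greedy helper, and the default hyperparameters (alpha=0.5, gamma=0.9) are illustrative assumptions.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon, rng):
    # Behavior policy: explore with probability epsilon, else act greedily.
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # On-policy TD target: bootstraps from the action a_next that the
    # policy actually chose, not max_a Q(s_next, a) as Q-learning would.
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

# One illustrative transition on a hypothetical two-state task.
Q = defaultdict(float)
rng = random.Random(0)
a_next = epsilon_greedy(Q, "s1", ["left", "right"], epsilon=0.1, rng=rng)
sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next=a_next)
print(Q[("s0", "right")])  # 0.5 * (1.0 + 0.9 * 0.0 - 0.0) = 0.5
```

In a full agent, the same epsilon-greedy policy both selects actions and supplies a_next to the update, which is precisely what makes SARSA on-policy.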
Use Cases
Robot navigation, game AI, process control, resource management
