Definition
A class of reinforcement learning methods that directly optimize the policy by following the gradient of expected reward.
Detailed Explanation
Policy gradient methods work by computing an estimator of the gradient of the expected reward with respect to the policy parameters. Unlike value-based methods, they directly parametrize the policy and update it using gradient ascent. This approach can handle continuous action spaces and naturally outputs action probabilities.
Use Cases
Robot motion control, game AI agents, autonomous vehicle control, financial trading systems