Definition
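A reinforcement learning method that combines ideas from Monte Carlo methods and dynamic programming: it learns from sampled experience, and it updates each value estimate using other learned estimates rather than waiting for a final outcome.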
Detailed Explanation
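Temporal difference (TD) learning updates value estimates based on the difference between temporally successive predictions, known as the TD error. It bootstraps from its own estimates, so it can learn online, adjusting after every step instead of waiting for the final outcome of an episode. Monte Carlo methods, by contrast, must wait until an episode ends before updating, and dynamic programming requires a complete model of the environment's transitions and rewards. TD learning needs neither: it learns directly from sampled experience, one step at a time.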
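As a rough illustration of the update described above, the following is a minimal sketch of tabular TD(0) prediction. The environment is an assumption chosen for brevity (a small random walk with a reward of 1 for exiting on the right), and the function name td0_random_walk and its parameters alpha (step size) and gamma (discount factor) are hypothetical, not taken from the entry. Each step nudges V(s) toward reward + gamma * V(s').

```python
import random


def td0_random_walk(num_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a random walk with tabular TD(0).

    Assumed toy setup (not from the glossary entry): non-terminal states
    1..num_states sit in a row between two terminal states. Each step moves
    left or right with equal probability. Exiting on the right gives
    reward 1; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    right_terminal = num_states + 1
    # Index 0 and right_terminal are terminal states whose value stays 0.
    values = [0.0] + [0.5] * num_states + [0.0]

    for _ in range(episodes):
        state = (num_states + 1) // 2  # start each episode in the middle
        while 0 < state < right_terminal:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == right_terminal else 0.0
            # TD error: difference between successive predictions.
            td_error = reward + gamma * values[next_state] - values[state]
            # Online update, applied immediately after this single step.
            values[state] += alpha * td_error
            state = next_state

    return values[1:-1]  # estimates for the non-terminal states only


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

The update happens inside the inner loop, before the episode finishes, which is the online property noted above. A Monte Carlo version of the same sketch would have to store the whole trajectory and update only after reaching a terminal state.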
Use Cases
Game-playing AI, robot learning, predictive maintenance, dynamic pricing systems