
Data Leakage

[ˈdeɪtə ˈlikɪdʒ]
Machine Learning
Last updated: December 9, 2024

Definition

The inadvertent use, during model training, of information that would not be available at real-world prediction time.

Detailed Explanation

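Data leakage occurs when a model is trained using information that would not be available at prediction time, leading to overly optimistic performance estimates. Common sources include temporal leakage (training on data from after the moment being predicted), target leakage (features that encode or are derived from the label), and train-test contamination (test data influencing training or preprocessing). Preventing leakage requires careful data pipeline design and validation procedures.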
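As a rough illustration of two of these sources, the sketch below splits a toy time series chronologically (to avoid temporal leakage) and fits standardization statistics on the training portion only (to avoid train-test contamination). The data and the helper names `chronological_split`, `fit_scaler`, and `transform` are hypothetical, made up for this example; it is one possible way to structure such a pipeline, not a prescribed method.

```python
from statistics import mean, pstdev

def chronological_split(rows, train_fraction=0.8):
    """Split rows so that every training row precedes every test row in time."""
    ordered = sorted(rows, key=lambda r: r["t"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def fit_scaler(values):
    """Compute standardization statistics from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero variance
    return mu, sigma

def transform(values, scaler):
    """Standardize values using previously fitted statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]

# Hypothetical toy series: timestamp t and a single feature x.
rows = [{"t": t, "x": float(t * t)} for t in range(10)]

train, test = chronological_split(rows)
train_x = [r["x"] for r in train]
test_x = [r["x"] for r in test]

# Leaky: statistics computed over train AND test, so test data shapes preprocessing.
leaky_scaler = fit_scaler(train_x + test_x)

# Clean: statistics computed from the training portion only, then reused on test.
clean_scaler = fit_scaler(train_x)
train_scaled = transform(train_x, clean_scaler)
test_scaled = transform(test_x, clean_scaler)
```

In this toy case the leaky and clean scalers differ because the later (test) values are larger than the earlier ones, which is the kind of information a deployed model would not have had when it was trained.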

Use Cases

Financial forecasting, clinical trials analysis, predictive maintenance

Related Terms