Definition
The practice of tracking and managing different versions of datasets used in ML systems.
Detailed Explanation
Manages changes to training data validation sets and test data over time. Includes tracking of data transformations feature engineering steps and dataset lineage. Enables reproducibility and auditability of model training processes.
Use Cases
Dataset management Training data tracking Feature evolution monitoring Data lineage tracking