Definition
The process of detecting and correcting corrupt, inaccurate, or irrelevant data from a dataset.
Detailed Explanation
Data cleaning involves identifying and fixing data quality issues including missing values, outliers, inconsistencies, and duplicates. It employs statistical methods, rule-based systems, and machine learning techniques to standardize and validate data, ensuring it meets quality requirements for analysis.
Use Cases
Training data preparation, Customer database maintenance, Financial record standardization, Research data validation