# data-quality
3 tools tagged
Great Expectations
Data quality validation framework for Python
Great Expectations is an open-source Python framework for validating, documenting, and profiling data. Teams define expectations, which are declarative assertions that act like unit tests for data, then validate datasets against those rules in CI/CD pipelines or production workflows. It connects to pandas, Spark, and SQL sources, generates data documentation automatically, and integrates with orchestrators such as Airflow and Prefect for continuous data quality monitoring.
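To make the "unit tests for data" idea concrete, here is a minimal plain-Python sketch. It is not the Great Expectations API: the function names only borrow the library's `expect_column_values_to_...` naming style, and `ExpectationResult`, `validate`, and the sample rows are hypothetical helpers invented for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class ExpectationResult:
    """Outcome of one expectation (hypothetical type, not a GX class)."""
    name: str
    success: bool
    unexpected_count: int


Expectation = Callable[[List[Row]], ExpectationResult]


def expect_column_values_to_not_be_null(column: str) -> Expectation:
    """Build a check that fails if any row has None in `column`."""
    def check(rows: List[Row]) -> ExpectationResult:
        unexpected = sum(1 for row in rows if row.get(column) is None)
        return ExpectationResult(f"not_null({column})", unexpected == 0, unexpected)
    return check


def expect_column_values_to_be_between(column: str, min_value: float, max_value: float) -> Expectation:
    """Build a check that fails if any non-null value falls outside the range."""
    def check(rows: List[Row]) -> ExpectationResult:
        unexpected = sum(
            1 for row in rows
            if row.get(column) is not None
            and not (min_value <= row[column] <= max_value)
        )
        return ExpectationResult(
            f"between({column}, {min_value}, {max_value})", unexpected == 0, unexpected
        )
    return check


def validate(rows: List[Row], expectations: List[Expectation]) -> List[ExpectationResult]:
    """Run every expectation against the dataset and collect the results."""
    return [expectation(rows) for expectation in expectations]


# Made-up sample data: one negative amount and one missing order_id.
rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": None, "amount": 40.0},
]

suite = [
    expect_column_values_to_not_be_null("order_id"),
    expect_column_values_to_be_between("amount", 0, 1000),
]

results = validate(rows, suite)
for result in results:
    status = "PASS" if result.success else "FAIL"
    print(f"{status} {result.name} (unexpected rows: {result.unexpected_count})")
```

In a CI/CD job, a step like this would typically exit non-zero when any result has `success` set to `False`, which is roughly how a failed validation blocks a pipeline.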
Cleanlab
AI-powered data quality for ML datasets
Cleanlab is a data-centric AI library that automatically detects and fixes label errors, outliers, and other data quality issues in machine learning datasets. It works with any ML model and any data type, including text, image, tabular, and audio data, and analyzes model predictions to identify mislabeled examples, near-duplicates, and ambiguous data points. Cleanlab helps teams improve model accuracy by cleaning training data rather than tuning model architecture.
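The idea of finding mislabeled examples from model predictions can be sketched in a few lines of plain Python. This is a simplified illustration in the spirit of confident learning, not Cleanlab's implementation; the library exposes this kind of functionality through `cleanlab.filter.find_label_issues(labels, pred_probs)`. The function name `find_likely_label_issues`, the thresholding rule, and the sample data below are assumptions made for this sketch.

```python
from typing import List


def find_likely_label_issues(labels: List[int], pred_probs: List[List[float]]) -> List[int]:
    """Return indices of examples whose given label disagrees with a confident prediction.

    Simplified rule (an assumption of this sketch):
    1. The threshold for class k is the mean predicted probability of k
       over the examples currently labeled k.
    2. An example is flagged when the most probable class among those that
       clear their threshold differs from the given label.
    """
    num_classes = len(pred_probs[0])

    # Step 1: per-class confidence thresholds.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        # A class with no labeled examples gets a threshold of 1.0,
        # so it is only "confident" at probability 1.
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    # Step 2: flag examples confidently predicted as a different class.
    issues = []
    for i, (given_label, probs) in enumerate(zip(labels, pred_probs)):
        confident = [k for k in range(num_classes) if probs[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: probs[k])
            if best != given_label:
                issues.append(i)
    return issues


# Made-up example: index 2 is labeled 0, but the model is confident it is class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

issue_indices = find_likely_label_issues(labels, pred_probs)
print(issue_indices)  # -> [2]
```

The predicted probabilities should come from a model that did not train on the examples being scored (for instance, via cross-validation). Otherwise the model may simply memorize the noisy labels and flag nothing.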
Gretel
Synthetic data generation platform for privacy and ML
Gretel is a synthetic data platform that generates realistic, privacy-preserving datasets for ML training, testing, and data sharing. It supports tabular, text, and time-series data with configurable privacy guarantees including differential privacy. Features include data augmentation for imbalanced datasets, PII detection and anonymization, and API/SDK access for pipeline integration with BigQuery, Snowflake, and Databricks.