# Data Quality

3 tools tagged

Great Expectations

Data quality validation framework for Python

Great Expectations is an open-source Python framework for validating, documenting, and profiling data. Teams define expectations, which act as unit tests for data, through its Python API, then validate datasets against those rules in CI/CD pipelines or production workflows. It connects to pandas, Spark, and SQL sources, generates data documentation automatically, and integrates with orchestrators like Airflow and Prefect for continuous data quality monitoring.
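
The "expectations as unit tests" idea can be illustrated with a minimal plain-Python sketch. This is not the Great Expectations API: the helpers `not_null`, `between`, and `validate`, and the sample order rows, are hypothetical names and data chosen only for illustration.

```python
from typing import Callable

Row = dict[str, object]
Expectation = Callable[[list[Row]], bool]


def not_null(column: str) -> Expectation:
    """Hypothetical expectation: every row has a non-null value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def between(column: str, low: float, high: float) -> Expectation:
    """Hypothetical expectation: every value in `column` lies in [low, high]."""
    return lambda rows: all(
        row.get(column) is not None and low <= row[column] <= high
        for row in rows
    )


def validate(rows: list[Row], suite: dict[str, Expectation]) -> dict[str, bool]:
    """Run each named expectation against the dataset and report pass/fail."""
    return {name: check(rows) for name, check in suite.items()}


# Made-up sample data with two deliberate problems:
# a negative amount and a missing order_id.
rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},
    {"order_id": None, "amount": 42.00},
]

suite = {
    "order_id is never null": not_null("order_id"),
    "amount is never null": not_null("amount"),
    "amount is between 0 and 1000": between("amount", 0, 1000),
}

results = validate(rows, suite)
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

In a CI/CD setting, a failing entry in `results` would typically fail the pipeline step, in the same way a failing unit test fails a build.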

freemium, Open Source

Cleanlab

AI-powered data quality for ML datasets

Cleanlab is a data-centric AI library that automatically detects and fixes label errors, outliers, and other data quality issues in machine learning datasets. It analyzes a model's predictions to identify mislabeled examples, near-duplicates, and ambiguous data points, so it works with any ML model and with text, image, tabular, and audio data. Cleanlab helps teams improve model accuracy by cleaning training data rather than tuning model architecture.
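
As a rough illustration of how model predictions can expose mislabeled examples, the sketch below flags a label when the model is confidently predicting a different class. It is a simplified, plain-Python approximation of the general idea, not Cleanlab's implementation or API; the function names and the toy probabilities are assumptions made for this example.

```python
def class_thresholds(labels, pred_probs, num_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples that are labeled k."""
    thresholds = []
    for k in range(num_classes):
        confs = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # A class with no labeled examples can never be "confident".
        thresholds.append(sum(confs) / len(confs) if confs else float("inf"))
    return thresholds


def find_suspect_labels(labels, pred_probs):
    """Return indices whose given label disagrees with a class the
    model predicts at or above that class's threshold."""
    num_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, num_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # model is not confident about any class
        likely = max(confident, key=lambda k: p[k])
        if likely != y:
            suspects.append(i)
    return suspects


# Toy data: example 2 is labeled 0, but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]

suspects = find_suspect_labels(labels, pred_probs)
print(suspects)  # [2]
```

Because the check only needs predicted probabilities and the given labels, it does not depend on which model produced the predictions or on the type of the underlying data.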

freemium, Open Source

Gretel

Synthetic data generation platform for privacy and ML

Gretel is a synthetic data platform that generates realistic, privacy-preserving datasets for ML training, testing, and data sharing. It supports tabular, text, and time-series data with configurable privacy guarantees including differential privacy. Features include data augmentation for imbalanced datasets, PII detection and anonymization, and API/SDK access for pipeline integration with BigQuery, Snowflake, and Databricks.
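
To make "PII detection and anonymization" concrete, here is a minimal sketch that finds email addresses and US-style phone numbers with regular expressions and replaces them with placeholders. It does not use or represent Gretel's API or detection method; the patterns, the `redact` helper, and the sample record are assumptions for illustration only.

```python
import re

# Deliberately simple patterns; real-world PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace detected emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    text = PHONE.sub("<PHONE>", text)
    return text


# Made-up record for demonstration.
record = "Contact Jane at jane.doe@example.com or 555-123-4567."
redacted = redact(record)
print(redacted)  # Contact Jane at <EMAIL> or <PHONE>.
```

Note that the name "Jane" passes through untouched: pattern matching only catches identifiers with a regular shape, which is one reason dedicated PII tooling goes beyond regular expressions.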

freemium