CLI Interactive Demos
These CLI demos showcase practical data quality workflows you can apply to your own data:
- Essential validations for everyday data quality checks
- Data exploration tools that require no Python knowledge
- CI/CD integration patterns for automated data quality
- Complete pipelines from exploration to production validation
To follow along with these demonstrations:
pip install pointblank
pb --help # Verify installation
Getting Started with the CLI
Learn the basics of Pointblank’s CLI and run your first validation:
CLI overview and your first data quality validation
Essential Data Quality Validations
See the most commonly used validation checks that catch critical data issues:
Duplicate detection, null checks, and debugging with data extracts
Data Exploration Tools
Discover how to profile and explore data with quick, easy-to-use CLI tools:
Preview data, find missing values, and generate column summaries
CI/CD Integration & Automation
Learn how to integrate data quality checks into automated pipelines:
Exit codes, pipeline integration, and automated quality gates
Complete Data Quality Workflow
Follow an end-to-end data quality pipeline combining exploration, validation, and profiling:
Full pipeline: explore → validate → automate
Getting Started
Ready to implement data quality workflows? Here’s how to get started:
1. Install and Verify
pip install pointblank
pb --help
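If you run these commands from a script (for example, in a CI job), it can help to fail early when the `pb` executable is not on `PATH`. Below is a minimal POSIX shell sketch. The helper name `require_cmd` is hypothetical and is not part of Pointblank.

```shell
#!/bin/sh
# require_cmd: hypothetical helper (not part of Pointblank).
# Prints "found: <cmd>" when the command is on PATH; otherwise prints
# a message to stderr and returns 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

For example, `require_cmd pb || { echo "Install with: pip install pointblank" >&2; exit 1; }` stops the script before any `pb` command runs.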
2. Explore Various Data Sources
# Try previewing a built-in dataset
pb preview small_table
# Access local files (use glob patterns to combine multiple Parquet files)
pb preview sales_data.csv
pb scan "data/*.parquet"
# Inspect datasets in GitHub repositories (no need to download the data!)
pb preview "https://github.com/user/repo/blob/main/data.csv"
pb missing "https://raw.githubusercontent.com/user/repo/main/sales.parquet"
# Work with DB tables through connection strings
pb info "duckdb:///warehouse/analytics.ddb::customers"
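The exploration commands can be chained to get a quick first look at a single source. The sketch below runs `pb preview`, `pb missing`, and `pb scan` in turn. It assumes each subcommand accepts the same source argument, as in the examples above. The function name `explore` and the `PB` override variable (set `PB=echo` for a dry run) are illustrative assumptions, not Pointblank features.

```shell
#!/bin/sh
# explore: hypothetical helper that runs three exploration subcommands
# (preview, missing, scan) against one data source.
# PB defaults to `pb`; set PB=echo to print the commands instead of running them.
explore() {
  src=$1
  for sub in preview missing scan; do
    echo "== $sub: $src"
    # Report a failing subcommand but keep going with the others
    ${PB:-pb} "$sub" "$src" || echo "pb $sub exited with status $?" >&2
  done
}
```

Usage: `explore small_table` or `explore sales_data.csv`.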
3. Run Essential Validations
# Check for duplicate rows
pb validate small_table --check rows-distinct
# Validate data from multiple sources
pb validate "data/*.parquet" --check col-vals-not-null --column customer_id
pb validate "https://github.com/user/repo/blob/main/sales.csv" --check rows-distinct
# Extract failing data for debugging
pb validate small_table --check col-vals-gt --column a --value 5 --show-extract
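To apply one check across several sources and count how many fail, you can loop over `pb validate` with the `--exit-code` flag (covered in the CI/CD step below). This is a sketch under stated assumptions: `validate_sources` and the `PB` override variable are hypothetical, and the loop relies only on the exit-code behavior described in this guide (0 = pass, 1 = fail).

```shell
#!/bin/sh
# validate_sources: hypothetical helper (not part of Pointblank).
# Usage: validate_sources <check> <source>...
# Runs `pb validate <source> --check <check> --exit-code` per source,
# prints ok/fail lines and a summary, and returns 1 if any source failed.
# PB defaults to `pb`; override it (e.g. with a stub) for testing.
validate_sources() {
  check=$1; shift
  failures=0
  for src in "$@"; do
    if ${PB:-pb} validate "$src" --check "$check" --exit-code; then
      echo "ok:   $src"
    else
      echo "fail: $src"
      failures=$((failures + 1))
    fi
  done
  echo "$failures source(s) failed $check"
  [ "$failures" -eq 0 ]
}
```

Usage: `validate_sources rows-distinct small_table "data/*.parquet"`.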
4. Integrate with CI/CD
# Use exit codes for automation (0 = pass, 1 = fail)
pb validate small_table --check rows-distinct --exit-code
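In a CI job, the command's exit status is what fails the build. The sketch below wraps any command, prints a PASS/FAIL label, and returns the command's own exit status so the job still fails when a check fails. The name `quality_gate` is hypothetical and is not a Pointblank feature.

```shell
#!/bin/sh
# quality_gate: hypothetical wrapper (not part of Pointblank).
# Runs the given command, prints PASS or FAIL, and returns the
# command's own exit status so a CI step fails when the check fails.
quality_gate() {
  if "$@"; then
    status=0
    echo "PASS: $*"
  else
    status=$?  # exit status of the command in the if-condition
    echo "FAIL (exit $status): $*"
  fi
  return "$status"
}
```

Usage: `quality_gate pb validate small_table --check rows-distinct --exit-code`. Keep `--exit-code` on the wrapped command. That flag is what makes `pb validate` return 1 on failure.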