Case study 01
NYC Yellow Taxi
48.7M records
26.6% anomaly rate. Vendor-specific financial risk isolated: 5.8% of Vendor 2 trips contained negative monetary values, which points to a structural issue rather than random noise. DST-aware timestamp logic prevents valid records from being deleted by mistake.
Open on GitHub →
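To illustrate the DST point, here is a minimal sketch of how a trip that looks negative on the wall clock can still be valid. It is not the code from the repository: the function name `trip_duration`, the choice of `America/New_York`, and the rule "retry the drop-off as the second pass through the repeated hour" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tz data

# Assumption: pickup/dropoff are naive local timestamps in New York time.
NYC = ZoneInfo("America/New_York")


def _to_utc(naive: datetime, fold: int = 0) -> datetime:
    """Attach the NYC zone (with the given fold) and convert to UTC."""
    return naive.replace(tzinfo=NYC, fold=fold).astimezone(timezone.utc)


def trip_duration(pickup: datetime, dropoff: datetime) -> timedelta:
    """Elapsed time between two naive local timestamps (hypothetical helper).

    If the duration is negative, retry with the drop-off read as the
    second occurrence of the repeated fall-back hour (fold=1). Outside
    that hour, fold has no effect, so real negatives stay negative.
    """
    start = _to_utc(pickup)
    duration = _to_utc(dropoff) - start
    if duration < timedelta(0):
        retry = _to_utc(dropoff, fold=1) - start
        if retry >= timedelta(0):
            return retry
    return duration


# Fall-back night, 3 Nov 2024: 01:50 EDT pickup, 01:10 EST drop-off.
fall_back = trip_duration(datetime(2024, 11, 3, 1, 50),
                          datetime(2024, 11, 3, 1, 10))

# An ordinary day: drop-off before pickup is a true anomaly.
anomaly = trip_duration(datetime(2024, 6, 1, 10, 30),
                        datetime(2024, 6, 1, 10, 0))
```

A naive subtraction would give the fall-back trip a duration of minus 40 minutes and mark it for deletion. Comparing in UTC keeps it as a 20-minute trip, while the June record is still flagged.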
Case study 02
Yelp Business Listings
1M records
Placeholder ratings (~259K records) and invalid geographic values isolated before category benchmarking and location comparisons. Demonstrates how data can look complete while producing systematically wrong conclusions.
Open on GitHub →
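As a rough sketch of the kind of check described above, the snippet below flags placeholder ratings and out-of-range coordinates on a single record. The field names (`rating`, `latitude`, `longitude`), the sentinel value `0.0`, and the helper `quality_flags` are hypothetical; the actual placeholder value and schema used in the project may differ.

```python
from math import isfinite

# Assumption: a rating of 0.0 stands in for "no real rating".
PLACEHOLDER_RATINGS = {0.0}


def _valid_coord(value, limit: float) -> bool:
    """True if value is a finite number within [-limit, limit]."""
    return (
        isinstance(value, (int, float))
        and isfinite(value)
        and -limit <= value <= limit
    )


def quality_flags(row: dict) -> list:
    """Return the list of quality issues found in one record."""
    flags = []
    rating = row.get("rating")
    if rating is None or rating in PLACEHOLDER_RATINGS:
        flags.append("placeholder_rating")
    if not (_valid_coord(row.get("latitude"), 90.0)
            and _valid_coord(row.get("longitude"), 180.0)):
        flags.append("invalid_geo")
    return flags


clean = quality_flags({"rating": 4.5, "latitude": 36.1, "longitude": -115.2})
dirty = quality_flags({"rating": 0.0, "latitude": 120.0, "longitude": -115.2})
```

Flagging instead of dropping keeps the row count intact, so the share of affected records can be reported before any benchmarking excludes them.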
Case study 03
Household Power Consumption
2M+ records
Time-series cleaning: datetime reconstruction from split fields, identification and removal of incomplete rows, and preparation of a validated dataset for reliable hourly, daily, and seasonal usage analysis.
Open on GitHub →
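The datetime reconstruction step can be sketched as follows. This assumes the date and time arrive as separate `dd/mm/yyyy` and `hh:mm:ss` strings and that missing readings are marked with `?` or left empty. Those conventions and the helper name `parse_row` are assumptions for illustration, not a description of the repository's code.

```python
from datetime import datetime

# Assumption: these markers denote a missing reading.
MISSING = {"", "?"}


def parse_row(date_str: str, time_str: str, readings: list):
    """Combine split date/time fields and parse the readings.

    Returns (timestamp, [float, ...]) for a complete row, or None when
    any reading is missing so the caller can drop or count the row.
    """
    timestamp = datetime.strptime(
        f"{date_str} {time_str}", "%d/%m/%Y %H:%M:%S"
    )
    if any(value.strip() in MISSING for value in readings):
        return None
    return timestamp, [float(value) for value in readings]


complete = parse_row("16/12/2006", "17:24:00", ["4.216", "0.418"])
incomplete = parse_row("28/04/2007", "00:21:00", ["?", "0.418"])
```

With a single timestamp per row, grouping by hour, day, or season is a matter of truncating `timestamp` to the desired resolution.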
Case study 04 (coming soon)
SQL Data Audit
In review
Relational data quality audit using SQL: validation logic, anomaly detection, and structured output for downstream reporting. Publishing shortly.
GitHub →