Async · Remote · Worldwide

darkdatastream

Data quality engineering — audit, cleanup, import preparation.

I find what's wrong in your data before it reaches your dashboard.

What I fix

Common problems I solve

Dashboards nobody trusts

Numbers look off, but nobody has time to investigate. I find the root cause and document it.

Import files that break in production

CSV/Excel files that fail on import into Odoo, WooCommerce, HubSpot, or other CRMs. I clean and validate them before they go in.

Vendor data you don't fully trust

External data sources delivering files with structural anomalies. I segment by vendor and surface the patterns.

Data audits that keep getting postponed

Datasets too large for the team to audit manually. I handle scale: 1M to 50M+ records.
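
Segmenting by vendor usually means computing the same anomaly rate per source instead of once for the whole file, so a problem concentrated in one feed stands out. A minimal plain-Python sketch, assuming each record is a dict with hypothetical `VendorID` and `total_amount` fields (modeled on NYC taxi trip-record columns) and using a negative amount as the example anomaly; the sample records are made up:

```python
from collections import Counter
from typing import Callable, Iterable

def anomaly_rate_by_vendor(
    records: Iterable[dict],
    is_anomalous: Callable[[dict], bool],
    vendor_key: str = "VendorID",  # hypothetical column name
) -> dict:
    """Return {vendor: share of that vendor's records flagged by is_anomalous}."""
    total: Counter = Counter()
    flagged: Counter = Counter()
    for rec in records:
        vendor = rec[vendor_key]
        total[vendor] += 1
        if is_anomalous(rec):
            flagged[vendor] += 1
    # Counter returns 0 for vendors with no flagged records.
    return {vendor: flagged[vendor] / total[vendor] for vendor in total}

# Example rule: a negative total amount (illustrative, not a complete check).
def negative_amount(rec: dict) -> bool:
    return rec["total_amount"] < 0

trips = [  # made-up sample records
    {"VendorID": 1, "total_amount": 14.3},
    {"VendorID": 1, "total_amount": 9.8},
    {"VendorID": 2, "total_amount": -5.0},
    {"VendorID": 2, "total_amount": 22.1},
]
print(anomaly_rate_by_vendor(trips, negative_amount))  # {1: 0.0, 2: 0.5}
```

A vendor whose rate sits far above the others points to a problem in that source rather than random noise.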

Case studies

Case study 01

NYC Yellow Taxi

48.7M records

26.6% anomaly rate. Vendor-specific financial risk isolated: 5.8% of Vendor 2 trips contained negative monetary values, which points to a structural issue rather than random noise. DST-aware timestamp logic prevents valid records from being falsely deleted.

Open on GitHub →
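
The DST point is easiest to see at the autumn clock change, when the hour from 01:00 to 02:00 happens twice. A trip that starts in the first pass through that hour and ends in the second has naive local timestamps that look like a negative duration, so a simple "dropoff before pickup" rule would delete it. A minimal sketch using the standard-library `zoneinfo` module, assuming the timestamps are naive local times in `America/New_York`; the helper names are hypothetical, not taken from the case study code:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: trip timestamps are naive wall-clock times in New York.
NYC = ZoneInfo("America/New_York")

def to_utc(local: datetime, fold: int = 0) -> datetime:
    """Interpret a naive NYC wall time and convert it to UTC.

    fold=0 picks the first occurrence of a repeated wall time, fold=1 the second.
    """
    return local.replace(tzinfo=NYC, fold=fold).astimezone(timezone.utc)

def is_ambiguous(local: datetime) -> bool:
    """True if this wall time occurs twice (the hour repeated when clocks fall back)."""
    first = local.replace(tzinfo=NYC, fold=0).utcoffset()
    second = local.replace(tzinfo=NYC, fold=1).utcoffset()
    # In a fall-back overlap the first occurrence has the larger (DST) offset;
    # in a spring-forward gap the comparison goes the other way.
    return first > second

def trip_duration(pickup: datetime, dropoff: datetime) -> timedelta:
    """Elapsed time between two naive NYC timestamps, computed in UTC."""
    duration = to_utc(dropoff) - to_utc(pickup)
    if duration < timedelta(0) and is_ambiguous(dropoff):
        # The dropoff may be the second occurrence of the repeated hour:
        # retry with fold=1 before treating the trip as invalid.
        duration = to_utc(dropoff, fold=1) - to_utc(pickup)
    return duration

pickup = datetime(2023, 11, 5, 1, 50)   # 01:50 EDT, before clocks fall back
dropoff = datetime(2023, 11, 5, 1, 5)   # 01:05 EST, after clocks fall back
print(dropoff - pickup)                 # -1 day, 23:15:00 (naive, looks invalid)
print(trip_duration(pickup, dropoff))   # 0:15:00
```

This only resolves a dropoff that falls in the repeated hour; the same UTC conversion also corrects spring-forward trips, whose naive duration is an hour too long. `zoneinfo` needs a time zone database (the system one, or the `tzdata` package on Windows).
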
Case study 02

Yelp Business Listings

1M records

Placeholder ratings (~259K records) and invalid geographic values isolated before category benchmarking and location comparisons. Demonstrates how data can look complete while producing systematically wrong conclusions.

Open on GitHub →
Case study 03

Household Power Consumption

2M+ records

Time-series cleaning — datetime reconstruction from split fields, incomplete row identification and removal, and preparation of a validated dataset for reliable hourly, daily, and seasonal usage analysis.

Open on GitHub →
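
Datetime reconstruction from split fields means parsing separate date and time columns into one timestamp, and treating a row with any missing reading as incomplete. A minimal sketch written with the standard library only (the same steps map onto Pandas or Polars from the stack), assuming a `;`-separated file with `Date` as dd/mm/yyyy, `Time` as hh:mm:ss, and `?` marking missing values, as in the commonly used UCI household power file. The column subset and sample rows are illustrative only:

```python
import csv
import io
from datetime import datetime

# Illustrative input in the assumed layout; not taken from the real dataset.
SAMPLE = """Date;Time;Global_active_power;Voltage
16/12/2006;17:24:00;4.216;234.840
16/12/2006;17:25:00;?;?
16/12/2006;17:26:00;5.374;233.290
"""

def load_readings(text: str) -> tuple[list[dict], list[dict]]:
    """Return (clean_rows, dropped_rows), each with a reconstructed 'timestamp'."""
    clean, dropped = [], []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        # Merge the split Date and Time fields into a single datetime.
        ts = datetime.strptime(f"{row['Date']} {row['Time']}", "%d/%m/%Y %H:%M:%S")
        values = {k: v for k, v in row.items() if k not in ("Date", "Time")}
        if any(v in ("?", "") for v in values.values()):
            # Incomplete row: set aside (not silently lost) so it can be logged.
            dropped.append({"timestamp": ts, **values})
            continue
        clean.append({"timestamp": ts, **{k: float(v) for k, v in values.items()}})
    return clean, dropped

clean, dropped = load_readings(SAMPLE)
print(len(clean), len(dropped))  # 2 1
```

With a real timestamp column in place, hourly, daily, and seasonal aggregation becomes a straightforward group-by.
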
Case study 04 (coming soon)

SQL Data Audit

In review

Relational data quality audit using SQL — validation logic, anomaly detection, and structured output for downstream reporting. Publishing shortly.

GitHub →
Stack

Built for messy data at scale

Python · Polars · Pandas · SQL · Parquet · Jupyter · CSV / Excel
Workflow

No meetings needed

Send a sample file or describe the dataset. I reply within 24 hours with a direct answer on scope and timeline.

Every flagged record stays traceable: nothing disappears without an audit trail. You get a cleaned file and a full change log explaining every decision.
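
One way to keep every flagged record traceable is to have each validation rule return a reason instead of silently filtering, and to write one log entry per failure. A minimal sketch; the rule names, fields, and `LogEntry` layout are hypothetical, chosen only to illustrate the idea:

```python
import csv
import io
from dataclasses import dataclass, asdict
from typing import Callable, Optional

# A rule returns None when the row passes, or a human-readable reason when it fails.
Rule = Callable[[dict], Optional[str]]

@dataclass
class LogEntry:
    row_id: int   # position of the row in the input
    rule: str     # name of the rule that flagged it
    reason: str   # why it was flagged
    action: str   # what happened to the row

def clean_with_log(rows: list[dict], rules: dict[str, Rule]) -> tuple[list[dict], list[LogEntry]]:
    """Return (cleaned_rows, change_log); no row is excluded without a log entry."""
    cleaned, log = [], []
    for row_id, row in enumerate(rows):
        failed = False
        for name, check in rules.items():
            reason = check(row)
            if reason is not None:
                log.append(LogEntry(row_id, name, reason, "excluded"))
                failed = True
        if not failed:
            cleaned.append(row)
    return cleaned, log

rules = {
    "non_negative_amount": lambda r: None if r["amount"] >= 0 else f"amount={r['amount']}",
    "email_present": lambda r: None if r.get("email") else "email is empty",
}
rows = [  # made-up sample rows
    {"amount": 12.5, "email": "a@example.com"},
    {"amount": -3.0, "email": "b@example.com"},
    {"amount": 7.0, "email": ""},
]
cleaned, log = clean_with_log(rows, rules)

# Write the change log as CSV to ship alongside the cleaned file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["row_id", "rule", "reason", "action"])
writer.writeheader()
writer.writerows(asdict(entry) for entry in log)
print(buf.getvalue())
```

Logging every failed rule, not just the first, means a record with several problems shows all of them in the change log.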

Live project

kodyobd.com.pl

A working public tool: an OBD2 fault code reference for Polish-speaking car owners. Built from scratch, with 100/100 Lighthouse scores across performance, SEO, accessibility, and best practices. Zero cookies, zero tracking, zero ads.

Not data engineering, but it shows what "built properly" looks like end to end.

Open site →
Contact

Send a sample file or describe the problem

data.audit.contact@proton.me

Best fit: data audits, cleanup pipelines, import preparation, anomaly isolation, validation before dashboarding or reporting. Async. No calls required to get a quote.