Posts
All the articles I've posted.
-
Data This Week #15
Spark Declarative Pipelines for financial lakehouses, ten AWS Glue & Iceberg fixes, MOR as an architectural shift, DuckDB's Quack protocol, SQL fraud patterns, Kafka checkpoint patterns, and the LLM-for-validation debate.
-
Data This Week #14
Flink CDC streaming ELT from MySQL to Kafka, the LLM engineer's stack map, Ursa's diskless Kafka fork, Iceberg write mechanics, Instacart's billion-product search, Jikkou 1.0, and the AI knowledge-base debate.
-
Data This Week #13
Spark memory tuning, row-level validation tiers, Postgres RLS pitfalls, Stripe's sharding at 5M QPS, Aurora DSQL vs Postgres, Velero joins CNCF, and SQLGlot 5x faster with mypyc.
-
Data This Week #12
Cold Postgres data to an S3 lakehouse, Databricks Lakeflow Designer, vector databases & HNSW indexing, Salesforce migration best practices, SwiftLake for Iceberg, and data observability lessons.
-
Data This Week #11
Iceberg cross-account migrations, DuckLake 1.0 metadata, IaC for data engineers, Redshift Iceberg writes, agent-data patterns, LARQL for LLM graph queries, and the Dagster pricing debate.
-
Data This Week #10
Data product lifecycle, semantic context layer for LLM agents, Netflix's Druid interval caching, Ursa Kafka storage engine, Iceberg v3 VARIANT type, and Ministack vs LocalStack.
-
Data This Week #9
DuckLake's 926x Iceberg speedup, Expedia's Trino Gateway for workload routing, Ontul unified SQL engine, PostgreSQL memory myths, and a 6-tier FFLIIP streaming lakehouse deep-dive.
-
Data This Week #8
Pydantic for schema contracts, Databricks Vector Search pitfalls, stateless Kafka broker Tansu, Capital One's GenAI agent, RAG as a DE problem, and testing culture in data teams.