Archives
All the articles I've archived.
-
Data This Week #15
Spark Declarative Pipelines for financial lakehouses, ten AWS Glue & Iceberg fixes, MOR as an architectural shift, DuckDB's Quack protocol, SQL fraud patterns, Kafka checkpoint patterns, and the LLM-for-validation debate.
-
Data This Week #14
Flink CDC streaming ELT from MySQL to Kafka, the LLM engineer's stack map, Ursa's diskless Kafka fork, Iceberg write mechanics, Instacart's billion-product search, Jikkou 1.0, and the AI knowledge-base debate.
-
Data This Week #13
Spark memory tuning, row-level validation tiers, Postgres RLS pitfalls, Stripe's sharding at 5M QPS, Aurora DSQL vs Postgres, Velero joins CNCF, and SQLGlot 5x faster with mypyc.
-
Data This Week #12
Cold Postgres data to S3 lakehouse, Databricks Lakeflow Designer, vector databases & HNSW indexing, Salesforce migration best practices, SwiftLake for Iceberg, and data observability lessons.
-
Data This Week #11
Iceberg cross-account migrations, DuckLake 1.0 metadata, IaC for data engineers, Redshift Iceberg writes, agent-data patterns, LARQL for LLM graph queries, and the Dagster pricing debate.
-
Data This Week #10
Data product lifecycle, semantic context layer for LLM agents, Netflix's Druid interval caching, Ursa Kafka storage engine, Iceberg v3 VARIANT type, and Ministack vs LocalStack.
-
Data This Week #9
DuckLake's 926x Iceberg speedup, Expedia's Trino Gateway for workload routing, Ontul unified SQL engine, PostgreSQL memory myths, and a 6-tier FFLIIP streaming lakehouse deep-dive.
-
Data This Week #8
Pydantic for schema contracts, Databricks Vector Search pitfalls, stateless Kafka broker Tansu, Capital One's GenAI agent, RAG as a DE problem, and testing culture in data teams.
-
Data This Week #7
Netflix's RDS-to-Aurora PostgreSQL migration, DuckDB cost optimization, real-time dashboards with LISTEN/NOTIFY, Airflow on Minikube, and the dbt vs. SQLMesh debate in 2026.
-
Data This Week #6
Xiaomi's unified lakehouse with Doris & Paimon, Top-K in Postgres, dbt run monitoring, PostgreSQL internals, Netflix's DataJunction semantic layer, and schema evolution debates.
-
Data This Week #5
Spark DAG compilation deep dive, query federation with StarRocks, Pinterest's CDC migration, CyberArk AI with Iceberg, Databricks Zerobus Ingest, and data quality tooling debates.
-
Data This Week #4
How OpenAI scales PostgreSQL for ChatGPT, Dropbox's enterprise RAG, 3x faster Spark on Iceberg, dbt with DuckDB, local AWS Lakehouse setups, and the new tool Alibaba ZVec.
-
Data This Week #3
BigQuery cost optimization, Apache Iceberg updates, MinIO alternatives, AWS SageMaker governance, and new tools like Nao — curated for data engineers.
-
Data This Week #2
RisingWave HTTP streaming to Iceberg, CedarDB string compression, Alibaba open-sources AliSQL (MySQL + DuckDB), Databricks Lakebase GA, and AI-powered data quality monitoring.
-
Data This Week #1
PostgreSQL dominance in 2025, Arrow-based database connectivity, Uber's petabyte-scale replication, Netflix AI graph search, and new tools OpenEverest and Pandas 3.0.