Archives

All the articles I've archived.

2026 ²⁶

August ¹

Data This Week #26

2 Aug, 2026

Amazon MSK delivers Kafka data directly to Iceberg streaming tables, Iceberg v3 introduces native row-level lineage for CDC, Spotify builds a custom indexing layer for online point queries on the data lake, and a Rust terminal database GUI called Rainfrog.

July ⁴

Data This Week #25

26 Jul, 2026

S3 Tables compaction myths debunked, DuckDB's vectorized execution internals, Snowflake's managed Iceberg at scale, NOT EXISTS rewritten as anti-joins, partition affinity over Redis, AWS Glue view automation, and Supabase Pipelines in public alpha.

Data This Week #24

19 Jul, 2026

Databricks bakes native AI and semantic primitives into Spark 4.2, Netflix rebuilds LLM serving with vLLM on Triton, multi-cloud lakehouse architecture on AWS for agentic AI, Kubernetes internals deep dive, and Debezium's log-based CDC architecture.

Data This Week #23

12 Jul, 2026

Apache OSSIE enters ASF incubation to standardize semantic layers, HubSpot scales to 20B vectors with Qdrant, cloud-native financial search with Iceberg and Turbopuffer, Lakekeeper's Generic Table API for multi-format lakehouses, and versioning Power BI with Git.

Data This Week #22

5 Jul, 2026

AWS S3 Annotations for business context, Cloudflare's Town Lake lakehouse and Skipper AI agent, Databricks LTAP rethinking database storage, Iceberg native views in Hive, Snowflake pipeline scaling pitfalls, and CRED's zero-data-loss RDS Blue/Green deployments at scale.

June ⁴

May ⁵

April ⁴

March ⁴

February ⁴