Posts

All the articles I've posted.

Data This Week #26

2 Aug, 2026

Amazon MSK delivers Kafka data directly to Iceberg streaming tables, Iceberg v3 introduces native row-level lineage for CDC, Spotify builds a custom indexing layer for online point queries on the data lake, and a Rust terminal database GUI called Rainfrog.

Data This Week #25

26 Jul, 2026

S3 Tables compaction myths debunked, DuckDB's vectorized execution internals, Snowflake's managed Iceberg at scale, NOT EXISTS rewritten as anti-joins, partition affinity over Redis, AWS Glue view automation, and Supabase Pipelines in public alpha.

Data This Week #24

19 Jul, 2026

Databricks bakes native AI and semantic primitives into Spark 4.2, Netflix rebuilds LLM serving with vLLM on Triton, multi-cloud lakehouse architecture on AWS for agentic AI, Kubernetes internals deep dive, and Debezium's log-based CDC architecture.

Data This Week #23

12 Jul, 2026

Apache OSSIE enters ASF incubation to standardize semantic layers, HubSpot scales to 20B vectors with Qdrant, cloud-native financial search with Iceberg and Turbopuffer, Lakekeeper's Generic Table API for multi-format lakehouses, and versioning Power BI with Git.

Data This Week #22

5 Jul, 2026

AWS S3 Annotations for business context, Cloudflare's Town Lake lakehouse and Skipper AI agent, Databricks LTAP rethinking database storage, Iceberg native views in Hive, Snowflake pipeline scaling pitfalls, and CRED's zero-data-loss RDS Blue/Green deployments at scale.

Data This Week #21

28 Jun, 2026

Postgres 19 pg_plan_advice for query plan control, Flink's Hadoop-free native S3 filesystem, incremental model pitfalls in dbt, Trino's summer SQL standard upgrades, PgBouncer's pooling mechanics, Razorpay's CDP architecture, and pgEdge ColdFront for Iceberg tiering.

Data This Week #20

21 Jun, 2026

Kafka's nine-layer architecture breakdown, self-healing pipeline barriers, ClickHouse full-text search on object storage, Google Cloud Next '26 data infrastructure, watsonx.data semantic layer, Neo4j on Databricks, DuckDB internals, and handling messy Excel ETL.

Data This Week #19

14 Jun, 2026

Slack's SSH-to-REST EMR migration, clusterless Iceberg Lakehouse with DuckDB, Iceberg v4 metadata proposals, Spark 4.0 on EMR GA, Apache Gravitino unified catalog, and Databricks Omnigent for AI agent orchestration.