Archives
All the articles I've archived.
-
Data This Week #8
Pydantic as a schema contract layer for Python pipelines, hidden complexities in Databricks Vector Search, a stateless Kafka-compatible broker called Tansu, Capital One's GenAI Cost Supervisor agent, RAG as a data engineering problem, and the community's take on testing in DE vs. traditional SWE.
-
Data This Week #7
Netflix's massive RDS-to-Aurora PostgreSQL migration, cutting cloud warehouse costs with DuckDB transpilation, real-time dashboards using PostgreSQL LISTEN/NOTIFY, deploying Airflow on Minikube with Helm, and the community's take on dbt vs. SQLMesh in 2026.
-
Data This Week #6
Deep dive into Xiaomi's unified lakehouse with Apache Doris & Paimon, optimizing Top-K in Postgres, monitoring dbt runs, PostgreSQL internals, Netflix's DataJunction semantic layer, and community debates on schema evolution and the data engineering job market.
-
Data This Week #5
Deep dive into Spark's DAG compilation, query federation with StarRocks, Pinterest's CDC migration, CyberArk's AI-powered support with Iceberg, Databricks Zerobus Ingest, and community debates on data quality tooling.
-
Data This Week #4
Explore how OpenAI scales PostgreSQL for ChatGPT and how Dropbox builds enterprise RAG. Plus, 3x faster Spark on Apache Iceberg, dbt with DuckDB, local AWS Lakehouse setups, and new tools like Alibaba ZVec.
-
Data This Week #3
This week's roundup covers BigQuery cost optimization, Apache Iceberg updates, MinIO alternatives, AWS SageMaker governance, and new tools like Nao.
-
Data This Week #2
This week's roundup covers RisingWave's HTTP-based streaming to Iceberg, CedarDB's string compression breakthrough, Alibaba's open-sourcing of AliSQL (MySQL + DuckDB), Databricks Lakebase GA, and AI-powered data quality monitoring.
-
Data This Week #1
This week's roundup covers PostgreSQL's dominance, Arrow-based database connectivity, Uber's petabyte-scale replication, Netflix's AI-powered graph search, and new tools like OpenEverest and Pandas 3.0.