Welcome to the third edition of Data This Week!
📖 Blogs to read
Why your 5-second BigQuery query isn’t cheap
A critical read for FinOps and data leads working on BigQuery. This post explains why query speed is a misleading proxy for cost. It walks through BigQuery’s distributed execution engine and shows that slots (BigQuery’s basic unit of compute) are the true unit of resource consumption. On-Demand billing is driven by columnar storage reads, which makes LIMIT clauses ineffective for cost control, while Flat-Rate environments suffer from slot contention, where queries with high total_slot_ms starve concurrent workloads.
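To make the two cost drivers concrete, here is a rough back-of-the-envelope sketch. It is not taken from the post: the price per TiB, the query sizes, and the helper names are all assumptions chosen for illustration, so plug in your own region's pricing and job statistics.

```python
# Illustrative numbers only; check current BigQuery pricing for your region.
ASSUMED_USD_PER_TIB = 6.25  # assumed on-demand price, not a figure from the post
TIB = 2**40


def on_demand_cost_usd(bytes_billed: int, usd_per_tib: float = ASSUMED_USD_PER_TIB) -> float:
    """On-demand cost depends on bytes billed, not on how long the query ran."""
    return bytes_billed / TIB * usd_per_tib


def average_slots(total_slot_ms: int, elapsed_ms: int) -> float:
    """Approximate average number of slots a query held while it ran."""
    return total_slot_ms / elapsed_ms


# Hypothetical query: finishes in 5 seconds but reads 2 TiB of column data.
# Per the post, adding a LIMIT clause would not shrink the bytes read here.
fast_query_cost = on_demand_cost_usd(bytes_billed=2 * TIB)

# Same hypothetical query on a reservation: 5 s wall clock, 10,000,000 slot-ms.
fast_query_slots = average_slots(total_slot_ms=10_000_000, elapsed_ms=5_000)

print(f"on-demand cost: ${fast_query_cost:.2f}")       # on-demand cost: $12.50
print(f"average slots held: {fast_query_slots:.0f}")   # average slots held: 2000
```

Under these assumed numbers, a 5-second query still costs $12.50 on demand, and on a reservation it holds roughly 2,000 slots for its whole run, which is the contention problem the post describes.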
A 2026 Introduction to Apache Iceberg
This comprehensive guide by Alex Merced revisits Apache Iceberg’s role as the backbone of the modern data lakehouse. Beyond the basics of ACID transactions and hidden partitioning, it dives into the evolution of the spec—from v1 to v3—and explores critical proposals for Version 4 (2025–2026), including single-file commits and Parquet-based metadata to further reduce I/O overhead.
Alternatives to MinIO for single-node local S3
For years, MinIO was the default for local S3 development. With recent licensing and architectural changes, the community is exploring lighter alternatives. Robin evaluates S3Proxy, SeaweedFS, RustFS, Zenko CloudServer, Garage, and Apache Ozone to determine the best post-MinIO options for local dev stacks.
Using Amazon SageMaker Unified Studio with Identity Center
This technical deep-dive solves a common enterprise headache: balancing rigorous governance with developer velocity. It details how to bridge Identity Center (IdC) domains, which suit publish/subscribe data sharing and security, with IAM-based domains that unlock developer-centric features like Serverless Notebooks and Athena Spark. The post outlines using ABAC tagging to automatically propagate permissions across this hybrid setup.
🛠️ Tools
Nao: The Open Source Analytics Agent
Nao is a framework for building and deploying analytics agents that let business users ask questions in plain English and receive instant insights. It features an open context builder for defining data and metadata and a reliability framework for testing agent performance. It is fully self-hosted, so your data stays within your own infrastructure.
Community Sentiments
The 2026 State of Data Engineering Survey
Joe Reis has released the results of a survey of over 1,000 data practitioners, and the findings are telling. While 82% use AI daily, organizational maturity lags behind. The survey highlights that the biggest bottlenecks aren’t technical: they’re legacy systems, lack of leadership, and poor requirements.
Why read it? Instead of a static PDF, Joe built an interactive explorer where you can slice the data by role, region, or industry to see where you stand against the industry average.
💼 Jobs
Sifflet - Multiple Engineering Roles
Locations: Remote (India), Paris (Remote/Hybrid)
Sifflet is building the “Data Trust” layer (Observability) and is looking for:
- Backend Engineer - Integration (India)
- Senior Backend Engineer - Monitoring (India)
- Staff Engineer
- Backend Engineer - Monitoring
- Applied ML/AI Engineer - Monitoring
- Senior Backend Engineer - Integration
That’s all for this week! See you in the next edition.