Data This Week #3

Welcome to the third edition of Data This Week!

📖 Blogs to read

Why your 5-second BigQuery query isn’t cheap

A critical read for FinOps and data leads working on BigQuery. This post explains why query speed is a misleading proxy for cost in BigQuery. It exposes the mechanics of BigQuery’s distributed execution engine and shows that slots (BigQuery’s basic unit of compute) are the true unit of resource consumption. Under On-Demand billing, cost is driven by the bytes read from columnar storage, which makes LIMIT clauses ineffective for cost control. In Flat-Rate environments the problem becomes slot contention: queries with a high total_slot_ms starve concurrent workloads.

Read more →
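The toy Python model below makes these cost mechanics concrete. It is a sketch, not BigQuery’s billing logic. It assumes a US on-demand list price of 6.25 USD per TiB (check current pricing for your region), ignores minimum-billing rules, and uses a hypothetical `events` table. The first function computes on-demand cost from the sizes of the referenced columns, and LIMIT has no term in it. The second computes average slot usage as total_slot_ms divided by wall-clock time. That calculation shows how a "fast" 5-second query can still hold hundreds of slots.

```python
# Toy model of BigQuery cost drivers; illustrative only, not an official formula.
# ASSUMPTION: 6.25 USD per TiB scanned, a US on-demand list price; check
# current pricing for your region. Minimum-billing rules are ignored here.
TIB = 2**40
ON_DEMAND_USD_PER_TIB = 6.25


def on_demand_cost_usd(column_bytes: dict[str, int], columns_read: list[str]) -> float:
    """On-demand cost depends on the full size of every column referenced.

    LIMIT does not appear here: on a non-clustered table the engine still
    reads each referenced column in full before truncating the result.
    """
    scanned = sum(column_bytes[name] for name in columns_read)
    return scanned / TIB * ON_DEMAND_USD_PER_TIB


def average_slots(total_slot_ms: int, elapsed_ms: int) -> float:
    """Average slots held over the job's lifetime (total_slot_ms / wall time in ms)."""
    return total_slot_ms / elapsed_ms


# Hypothetical table: per-column storage sizes in bytes.
events = {"user_id": 200 * 2**30, "event_ts": 150 * 2**30, "payload": 2 * TIB}

print(round(on_demand_cost_usd(events, list(events)), 2))  # SELECT * ... LIMIT 10 -> 14.64
print(round(on_demand_cost_usd(events, ["user_id"]), 2))   # SELECT user_id -> 1.22
print(average_slots(total_slot_ms=2_000_000, elapsed_ms=5_000))  # 5-second query -> 400.0
```

Under this model, trimming the column list cuts the bill far more than adding LIMIT does. On clustered tables BigQuery can sometimes read less when a LIMIT is present, and this sketch does not model that case.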


A 2026 Introduction to Apache Iceberg

This comprehensive guide by Alex Merced revisits Apache Iceberg’s role as the backbone of the modern data lakehouse. Beyond the basics of ACID transactions and hidden partitioning, it traces the evolution of the spec from v1 to v3. It also explores key proposals for Version 4 (2025–2026), including single-file commits and Parquet-based metadata that would further reduce I/O overhead.

Read more →
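Hidden partitioning is easiest to see in miniature. The Python model below is a toy, not Iceberg’s implementation. It assumes a table partitioned with the spec’s `day` transform (whole days since 1970-01-01) and a hypothetical set of data files, each tagged with its partition value. Users filter only on the raw `event_ts` column. The engine, not the user, converts that predicate into a range of partition values so it can skip files. Real engines also prune using column statistics stored in manifests, and this sketch leaves that out.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Iceberg-style day(ts): whole days since 1970-01-01, computed in UTC."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


# Hypothetical data files, each tagged at write time with day(event_ts) of its rows.
files = {
    "f1.parquet": day_transform(datetime(2026, 1, 1, 8, tzinfo=timezone.utc)),
    "f2.parquet": day_transform(datetime(2026, 1, 2, 23, tzinfo=timezone.utc)),
    "f3.parquet": day_transform(datetime(2026, 1, 5, 12, tzinfo=timezone.utc)),
}


def prune(files: dict[str, int], start: datetime, end: datetime) -> list[str]:
    """Return files that may hold rows with start <= event_ts < end.

    The user's predicate is on event_ts only; it is mapped onto partition
    values here. The upper bound is kept inclusive to stay conservative:
    scanning an extra file is safe, skipping a needed one is not.
    """
    lo, hi = day_transform(start), day_transform(end)
    return sorted(name for name, day in files.items() if lo <= day <= hi)


# WHERE event_ts >= '2026-01-01' AND event_ts < '2026-01-03'
print(prune(files,
            datetime(2026, 1, 1, tzinfo=timezone.utc),
            datetime(2026, 1, 3, tzinfo=timezone.utc)))
# ['f1.parquet', 'f2.parquet']
```

Because the partition value is derived from the column, the partition spec can change later without forcing users to rewrite their queries.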


Alternatives to MinIO for single-node local S3

For years, MinIO was the default for local S3 development. With recent licensing and architectural changes, the community is exploring lighter alternatives. Robin evaluates S3Proxy, SeaweedFS, RustFS, Zenko CloudServer, Garage, and Apache Ozone to determine the best post-MinIO options for local dev stacks.

Read more →


Using Amazon SageMaker Unified Studio with Identity Center

This technical deep-dive tackles a common enterprise headache: balancing rigorous governance with developer velocity. It details how to bridge Identity Center (IdC) domains, which are well suited to publish/subscribe data sharing and security, with IAM-based domains that unlock developer-centric features such as Serverless Notebooks and Athena Spark. The post also shows how ABAC (attribute-based access control) tagging can propagate permissions automatically across this hybrid setup.

Read more →
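The core ABAC idea fits in a few lines of Python. This is a toy model, not the mechanism the post configures. It assumes access is granted when the principal and the resource carry matching values for every governing tag key. That is similar in spirit to an IAM condition comparing `aws:ResourceTag/<key>` with `${aws:PrincipalTag/<key>}`. The function name and all tag names and values are hypothetical.

```python
# Toy attribute-based access control (ABAC) check; illustrative only.
# ASSUMPTION: access requires the principal and the resource to carry the
# same value for every governing tag key (all names here are hypothetical).


def abac_allows(principal_tags: dict[str, str],
                resource_tags: dict[str, str],
                keys: list[str]) -> bool:
    """Return True only if each key is set on both sides with equal values."""
    return all(
        key in principal_tags
        and key in resource_tags
        and principal_tags[key] == resource_tags[key]
        for key in keys
    )


# Hypothetical tags: one developer and two resources.
developer = {"project": "churn"}
notebook = {"project": "churn"}   # e.g. a resource created in an IAM-based domain
table = {"project": "pricing"}

GOVERNING_KEYS = ["project"]
print(abac_allows(developer, notebook, GOVERNING_KEYS))  # True
print(abac_allows(developer, table, GOVERNING_KEYS))     # False
```

The appeal of this design is that a newly created resource becomes accessible to the right people as soon as it is tagged, with no per-resource grant to maintain.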


🛠️ Tools

Nao: The Open Source Analytics Agent

Nao is a framework for building and deploying analytics agents that let business users ask questions in plain English and get instant insights. It includes an open context builder for defining data and metadata and a reliability framework for testing agent performance. It is also fully self-hosted, so your data stays on your own infrastructure.

Check it out on GitHub →


💭 Community Sentiments

The 2026 State of Data Engineering Survey

Joe Reis has released the results of a survey of over 1,000 data practitioners, and the findings are telling. While 82% use AI daily, organizational maturity lags behind. The survey finds that the biggest bottlenecks aren’t technical. They are legacy systems, a lack of leadership, and poor requirements.

Why read it? Instead of a static PDF, Joe built an interactive explorer where you can slice the data by role, region, or industry to see where you stand against the industry average.

Explore the data →


💼 Jobs

Sifflet - Multiple Engineering Roles

Locations: Remote (India), Paris (Remote/Hybrid)

Sifflet is building the “Data Trust” layer (data observability) and is hiring for:

  • Backend Engineer - Integration (India)
  • Senior Backend Engineer - Monitoring (India)
  • Staff Engineer
  • Backend Engineer - Monitoring
  • Applied ML/AI Engineer - Monitoring
  • Senior Backend Engineer - Integration

View openings →


That’s all for this week! See you in the next edition.