How Nubank Built an Internal Logging Platform to Process 1 Trillion Logs per Day

Crane Worldwide Logistics Joins Streamfest 2025 — Hear Their Story (Sponsored)

Read the full announcement here

Crane Worldwide at Streamfest

---

Event Overview

Join us on November 5–6 for Redpanda Streamfest, a two-day online event exploring the future of streaming data and real-time AI.

Featured Speaker:

Jared Noynaert, VP of Engineering at Crane Worldwide Logistics, will discuss:

  • Fundamentals of modern data infrastructure.
  • Key topics: isolation, auto-scaling, branching, and serverless models.
  • How Redpanda and the broader Kafka ecosystem fit into next-gen architectures.

Event Highlights:

  • Forward-looking keynotes.
  • Live demos.
  • Real-world case studies.

Sign Up Now

---

> Disclaimer: The technical analysis in this article is based on publicly shared information from the Nubank Engineering Team. All credit goes to them. If you spot inaccuracies, please leave a comment so we can correct them.

---

Why Logging Infrastructure Struggles During Rapid Growth

When companies scale quickly, systems often hit operational limits.

Nubank — one of the largest digital banks in the world — faced exactly that scenario with its logging platform.

Logging Challenges at Nubank

  • Relied on an external vendor for log ingestion and storage.
  • Limited visibility into how logs were collected and stored.
  • Rising costs with unpredictable future spending.
  • Alerting and dashboards tied tightly to the vendor’s ecosystem.
  • Spikes in log ingestion slowed query performance, impacting incident response.

These factors pushed Nubank to build an in-house logging platform to regain control, cut costs, and improve reliability.

---

The Initial Logging Architecture

Initially, every application sent logs directly to the vendor’s API or forwarder.

Old Vendor-Based Architecture

Problems:

  • No filtering or routing → low-value logs were ingested anyway, driving up processing costs.
  • Blind spots → limited troubleshooting visibility when the vendor's pipeline failed.
  • Rapid cost growth → every increase in log volume meant paying proportionally more.
  • Vendor lock-in → ingestion, storage, and querying were difficult to change.

---

Nubank’s Two-Phase Platform Strategy

Instead of an all-at-once rebuild, Nubank opted for a two-phase approach:

  1. Observability Stream (Ingestion Pipeline)
     • Focused on collecting, buffering, and processing logs.
     • Enabled filtering, transformations, and metrics collection.
  2. Query & Storage Platform
     • Designed for fast searches across petabytes of logs.
     • Optimized for cost-effective, scalable storage.

Guiding Principles:

  • Reliability — handle spikes without failure.
  • Scalability — support bursts and sustained growth.
  • Cost efficiency — cheaper than vendor solutions with full transparency.

This strategy mirrors a common pattern in large-scale data platforms: decoupling ingestion from querying so each side can scale independently.

---

Phase One: Ingestion Pipeline

Ingestion Architecture

Components:

  • Fluent Bit (Open Source) — lightweight, configurable log forwarder.
  • Data Buffer Service (In-House) — micro-batches logs to absorb spikes.
  • Filter & Process Service (In-House) — scalable, extensible layer for data enrichment and health metrics.
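
The Data Buffer and Filter & Process services are in-house, so their code is not public. Below is a minimal, hypothetical Python sketch of the micro-batching idea they are described as using: logs accumulate into a batch that is flushed downstream when it fills up or when a time window expires, which smooths ingestion spikes, with a simple filtering rule applied before buffering. All names and thresholds are assumptions for illustration.

```python
import time
from typing import Callable

class MicroBatchBuffer:
    """Hypothetical micro-batching buffer: flush when the batch is full
    or when the flush interval elapses, whichever comes first."""

    def __init__(self, flush_fn: Callable[[list[dict]], None],
                 max_batch_size: int = 5_000, max_wait_seconds: float = 2.0):
        self.flush_fn = flush_fn                   # downstream sink (e.g. filter/process stage)
        self.max_batch_size = max_batch_size       # size-based flush trigger
        self.max_wait_seconds = max_wait_seconds   # time-based flush trigger
        self.batch: list[dict] = []
        self.last_flush = time.monotonic()

    def add(self, log_record: dict) -> None:
        self.batch.append(log_record)
        if (len(self.batch) >= self.max_batch_size or
                time.monotonic() - self.last_flush >= self.max_wait_seconds):
            self.flush()

    def flush(self) -> None:
        if self.batch:
            self.flush_fn(self.batch)   # hand the micro-batch to the next stage
            self.batch = []
        self.last_flush = time.monotonic()

# Example: drop DEBUG logs before buffering, mimicking a simple filtering rule.
buffer = MicroBatchBuffer(
    flush_fn=lambda batch: print(f"forwarding {len(batch)} logs"),
    max_batch_size=3,
)
for level in ["INFO", "DEBUG", "ERROR", "INFO"]:
    if level != "DEBUG":
        buffer.add({"level": level, "msg": "example"})
buffer.flush()
```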

Benefits:

  • Decouples ingestion from querying.
  • Handles surges gracefully.
  • Adds system visibility through real-time metrics.

---

Phase Two: Query & Storage Platform

Query Engine — Trino

  • Distributed SQL engine.
  • Supports partitioning → faster queries by scanning only relevant data.
  • Flexible integrations with multiple backends.
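
Nubank has not published its schema, but the partition-pruning benefit can be illustrated with a hypothetical query through the open-source Trino Python client: filtering on partition columns (here an assumed `log_date` and `service` on an assumed `application_logs` table) lets Trino scan only the matching Parquet files rather than the full dataset. Host, catalog, and table names below are placeholders.

```python
import trino  # pip install trino

# Connect to a Trino coordinator (host, catalog, and schema are placeholders).
conn = trino.dbapi.connect(
    host="trino.internal.example.com",
    port=443,
    user="observability",
    catalog="hive",
    schema="logs",
    http_scheme="https",
)

# The WHERE clause on partition columns means only those partitions'
# Parquet files are scanned, which is what makes the query fast.
query = """
SELECT log_timestamp, level, message
FROM application_logs
WHERE log_date = DATE '2024-06-01'
  AND service = 'payments-api'
  AND level = 'ERROR'
LIMIT 100
"""

cursor = conn.cursor()
cursor.execute(query)
for row in cursor.fetchall():
    print(row)
```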

Storage Layer — AWS S3

  • High durability and availability.
  • Petabyte-scale capacity.
  • Cost-effective for long-term retention.

Query & Storage Architecture

---

Data Format — Parquet

  • Columnar storage for efficient querying.
  • ~95% size reduction from compaction and compression → lower storage requirements.
  • Excellent scan performance with compression benefits.
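
As a rough illustration of the columnar format (not Nubank's actual pipeline), the widely used pyarrow library can write a batch of log records to a compressed Parquet file. Repetitive columns such as service or level compress extremely well, and readers can pull back only the columns a query needs. The sample data is invented for the example.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A small batch of structured log records (illustrative data only).
logs = pa.table({
    "timestamp": ["2024-06-01T12:00:00Z"] * 4,
    "service":   ["payments-api", "payments-api", "auth", "auth"],
    "level":     ["INFO", "ERROR", "INFO", "INFO"],
    "message":   ["request ok", "timeout", "login ok", "login ok"],
})

# Columnar layout plus a codec such as ZSTD is what drives the large
# size reduction on repetitive log fields.
pq.write_table(logs, "logs.parquet", compression="zstd")

# Reading back only the needed columns avoids scanning the rest.
subset = pq.read_table("logs.parquet", columns=["level", "message"])
print(subset.to_pydict())
```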

Parquet Generator (In-House)

Parquet Conversion Service

  • High-throughput transformation of log batches into Parquet.
  • Fully controlled for cost optimization.
  • Scalable and extensible for future needs.
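
The Parquet Generator itself is internal, but a minimal sketch of what such a conversion step might look like is shown below: take a micro-batch, write it as Parquet in memory, and put the object under an S3 prefix that matches a date/service partitioning scheme so Trino can prune partitions later. The bucket name, key layout, and function are assumptions, not Nubank's implementation.

```python
import boto3
import pyarrow as pa
import pyarrow.parquet as pq

def convert_and_upload(batch: list[dict], bucket: str, service: str, log_date: str) -> str:
    """Hypothetical conversion step: micro-batch of logs -> Parquet -> S3."""
    table = pa.Table.from_pylist(batch)

    # Write Parquet into an in-memory buffer instead of a local file.
    sink = pa.BufferOutputStream()
    pq.write_table(table, sink, compression="zstd")

    # Key layout mirrors a date/service partitioning scheme.
    key = f"logs/log_date={log_date}/service={service}/part-000.parquet"
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=sink.getvalue().to_pybytes()
    )
    return key

# Example micro-batch (placeholder bucket name).
batch = [
    {"timestamp": "2024-06-01T12:00:00Z", "level": "INFO",  "message": "request ok"},
    {"timestamp": "2024-06-01T12:00:01Z", "level": "ERROR", "message": "timeout"},
]
print(convert_and_upload(batch, "example-log-bucket", "payments-api", "2024-06-01"))
```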

---

Performance & Scale Metrics (Mid-2024)

  • 1 trillion logs/day ingested.
  • 1 petabyte/day processed.
  • 45-day retention → ~45 PB stored.
  • 15,000 queries/day scanning ~150 PB of data.
  • 50% lower costs compared to vendor solution.
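
To put the daily figures in perspective, here is a quick back-of-envelope conversion of the published numbers into per-second rates (simple arithmetic on the figures above, not additional published data):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

logs_per_day = 1_000_000_000_000   # 1 trillion logs/day
bytes_per_day = 1 * 1024**5        # ~1 PB/day processed

print(f"{logs_per_day / SECONDS_PER_DAY:,.0f} logs/second")            # ~11.6 million
print(f"{bytes_per_day / SECONDS_PER_DAY / 1024**3:,.1f} GiB/second")  # ~12 GiB/s
print(f"{1 * 45} PB stored at 45-day retention")                        # matches ~45 PB
```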

---

Key Takeaways

Nubank’s success came from:

  • Decoupling ingestion from querying.
  • Micro-batching for resilience.
  • Using Parquet + AWS S3 for efficient, scalable storage.
  • Leveraging Trino for fast distributed queries.
  • Building in-house services for full operational control.

This approach gives Nubank predictable costs, high scalability, and strong visibility into its logging infrastructure.

---


Sponsorship Opportunity

Reach 1M+ tech professionals — including senior engineers and decision-makers.

Reserve your space today:

Email `sponsorship@bytebytego.com` (slots sell out ~4 weeks in advance).


---

