How Uber Built Temporal (And Why It's the Future of Distributed Systems)

Lately, I've been diving deep into how companies like Uber solved one of the hardest problems in distributed systems: managing long-running, stateful workflows across dozens of microservices. What I discovered isn't just a technical solution—it's a fundamental shift in how we think about reliability.

The problem that Temporal solves isn't new. But the way it solves it? That's what caught my attention.

The Part That Made Sense

What struck me was this: to a user, an Uber ride is simple. Request, match, pickup, trip, complete. Five steps.

But architecturally? It's a 15+ state distributed state machine that coordinates the rider, driver, and backend platform across six distinct event streams. A single trip can span 30 minutes, involve multiple services, and must survive crashes, network failures, and app restarts.
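To make "distributed state machine" concrete, here is a minimal sketch of a ride as an explicit transition table. The state and event names are my own placeholders built from the five user-visible steps plus a cancellation path; they are not Uber's actual states, which reportedly number 15 or more.

```python
from enum import Enum, auto


class RideState(Enum):
    # Hypothetical states for illustration only.
    REQUESTED = auto()
    MATCHED = auto()
    DRIVER_ARRIVED = auto()
    IN_TRIP = auto()
    COMPLETED = auto()
    CANCELLED = auto()


# Allowed transitions, keyed by (current state, event name).
TRANSITIONS = {
    (RideState.REQUESTED, "driver_matched"): RideState.MATCHED,
    (RideState.REQUESTED, "rider_cancelled"): RideState.CANCELLED,
    (RideState.MATCHED, "driver_arrived"): RideState.DRIVER_ARRIVED,
    (RideState.MATCHED, "rider_cancelled"): RideState.CANCELLED,
    (RideState.DRIVER_ARRIVED, "trip_started"): RideState.IN_TRIP,
    (RideState.IN_TRIP, "trip_ended"): RideState.COMPLETED,
}


def apply_event(state: RideState, event: str) -> RideState:
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.name}"
        ) from None


def run(events):
    """Fold a sequence of events over the machine, starting at REQUESTED."""
    state = RideState.REQUESTED
    for event in events:
        state = apply_event(state, event)
    return state
```

Even this toy version hints at the real difficulty: the table lives in one process's memory, so a crash between two events loses the current state unless something persists it.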

The fundamental challenge? It's not just about communication between services. It's about state. How do you reliably execute a 30-minute business process that involves asynchronous events, timers, and potential failures across a dozen microservices?

The Quagmire of Callbacks

Maxim Fateev, a former Uber engineer who co-created Cadence there and later co-founded Temporal, described Uber's pre-existing architecture as a "quagmire of callbacks."

At one point, 80% of requests to Uber's backend API gateway were polling calls. Even after Uber moved to gRPC-based bi-directional streaming, the state management problem persisted.

The issue wasn't state on its own (databases can store that) or messages on their own (queues can transmit those). As Fateev put it, it's the indivisible combination of "persistence, queue, and timers" that creates the engineering nightmare.
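Here is a rough sketch of what "indivisible" means in practice, assuming a hand-rolled design where state, outgoing messages, and timers all sit in one database so they can commit together. The table names, the `record_match` helper, and the 300-second timeout are hypothetical; this is one common do-it-yourself pattern, not a description of Uber's code.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with the three pieces side by side."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ride_state (ride_id TEXT PRIMARY KEY, state TEXT NOT NULL);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY, ride_id TEXT, message TEXT);
        CREATE TABLE timers (id INTEGER PRIMARY KEY, ride_id TEXT, fire_at REAL, kind TEXT);
        """
    )
    return conn


def record_match(conn, ride_id, now, pickup_timeout_s=300.0):
    """Persist the new state, queue a notification, and arm a timeout together."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so a failure cannot leave the state updated without its message or timer.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO ride_state (ride_id, state) VALUES (?, ?)",
            (ride_id, "MATCHED"),
        )
        conn.execute(
            "INSERT INTO outbox (ride_id, message) VALUES (?, ?)",
            (ride_id, "notify_rider_of_match"),
        )
        conn.execute(
            "INSERT INTO timers (ride_id, fire_at, kind) VALUES (?, ?, ?)",
            (ride_id, now + pickup_timeout_s, "pickup_timeout"),
        )


def count(conn, table):
    # Table names come from this sketch only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

The single transaction is the easy part. Something still has to drain the outbox, poll the timers table, retry failures, and do all of it across many services, which is where the hand-built version tends to turn into the quagmire.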

The Impossible Requirements

Uber's dispatch system (DISCO) faces two conflicting demands:

Ultra-Low Latency: The matching component has a strict sub-100ms requirement. This is the "hot path" that must be nearly instantaneous.
Long-Running Durability: The overall ride workflow must survive service crashes, network failures, and worker failures over minutes, hours, or even days.

The two pull in opposite directions. The match has to finish in milliseconds, while the ride wrapped around it has to stay alive, and recoverable, for as long as the trip lasts.

The Genesis: From AWS to Microsoft to Uber

What's fascinating? Temporal wasn't born at Uber. It's the fourth iteration of a solution to a problem its founders encountered repeatedly. The first three:

Amazon (AWS): While working on AWS Simple Workflow Service (SWF), the founders saw developers spending significant time building resiliency using "low level primitives" like queues, databases, retry mechanisms, and durable timers.
Microsoft (Azure): Samar Abbas built the Durable Task Framework (DTFx), which Azure Durable Functions was later built on. Same problem: developers still working with low-level primitives and building expensive, complex architectures.
Uber: They built Cadence as an open-source, multi-tenant service. Its success was proven by organic adoption, growing to over 100 use cases within Uber. Temporal began as a fork of Cadence.

This origin story suggests that the "quagmire" is the default emergent behavior of microservices at scale: the same problem surfaced at three different companies. The "build-it-yourself" approach is the industry anti-pattern.

The Core Abstraction: Durable Execution

Durable Execution is Temporal's central concept. It's defined as "crash-proof execution"—not preventing crashes, but rendering them irrelevant.
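A toy sketch of how crashes can be made irrelevant, assuming the record-and-replay idea: every completed step's result is journaled, and after a crash the workflow function simply runs again from the top, reading journaled results instead of redoing the work. The `Journal`, `Context`, and `ride_workflow` names are hypothetical and this is not the Temporal SDK API; a real system would keep the journal in durable storage rather than in memory.

```python
class Journal:
    """Stands in for durable storage of completed step results."""

    def __init__(self):
        self.results = []


class Context:
    """Runs steps, replaying journaled results before executing new ones."""

    def __init__(self, journal):
        self._journal = journal
        self._cursor = 0

    def step(self, fn, *args):
        if self._cursor < len(self._journal.results):
            # This step finished in an earlier run: reuse its result.
            result = self._journal.results[self._cursor]
        else:
            # First time reaching this step: execute it and record the result.
            result = fn(*args)
            self._journal.results.append(result)
        self._cursor += 1
        return result


def ride_workflow(ctx, rider, calls, crash_before_charge=False):
    def match_driver(name):
        calls.append("match_driver")  # track real executions
        return f"driver-for-{name}"

    def charge(name):
        calls.append("charge")
        return f"receipt-for-{name}"

    driver = ctx.step(match_driver, rider)
    if crash_before_charge:
        raise RuntimeError("simulated worker crash")
    receipt = ctx.step(charge, rider)
    return driver, receipt


def demo():
    journal, calls = Journal(), []
    try:
        ride_workflow(Context(journal), "alice", calls, crash_before_charge=True)
    except RuntimeError:
        pass  # the "worker" died after matching but before charging
    # A fresh worker re-runs the same function against the same journal.
    result = ride_workflow(Context(journal), "alice", calls)
    return result, calls
```

In `demo()`, the second run returns the full result while `match_driver` executes only once. The sketch leans on one assumption worth stating: the workflow must issue its steps in the same order on every run, or the journal positions stop lining up.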

This paradigm has four key characteristics:
