Distributed Systems

How does Google achieve very large-scale distributed systems tracing using Dapper?


What is Dapper

  • Dapper is Google’s production distributed systems tracing infrastructure.
  • Dapper began as a self-contained tracing tool but evolved into a monitoring platform which has enabled the creation of many different tools, some of which were not anticipated by its designers.

Typical Distributed System

Design goals of the Dapper tracing system

  • Low overhead: the tracing system should have negligible performance impact on running services.
  • Application-level transparency: programmers should not need to be aware of the tracing system.
  • Scalability: the tracing system must handle the full scale of the services and clusters it monitors.

Technical Design Details of Dapper

  • A Dapper trace is usually thought of as a tree of nested RPCs, but the data model is not limited to RPCs: Dapper also traces activities such as SMTP sessions in Gmail, HTTP requests from the outside world, and outbound queries to SQL servers. Dapper traces are modeled using trees, spans, and annotations.
  • In a Dapper trace tree, the nodes are basic units of work called spans. The edges indicate a causal relationship between a span and its parent span.
  • When a thread handles a traced control path, Dapper attaches a trace context to thread-local storage. A trace context is a small and easily copyable container of span attributes such as trace and span ids.
  • When computation is deferred or made asynchronous, most Google developers use a common control flow library to construct callbacks and schedule them in a thread pool or other executor. Dapper ensures that all such callbacks store the trace context of their creator, and this trace context is associated with the appropriate thread when the callback is invoked.
  • Nearly all of Google’s inter-process communication is built around a single RPC framework with bindings in both C++ and Java. Google has instrumented that framework to define spans around all RPCs. For traced RPCs, the span and trace ids are transmitted from client to server.
  • Dapper trace data is language-independent, and many traces in production combine data from processes written in both C++ and Java.
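
The ideas in the bullets above (spans forming a tree, a trace context kept in thread-local storage, and callbacks that carry their creator's context) can be illustrated with a small sketch. This is not Dapper's code or API: the names `Span`, `span`, `current_context`, `with_context`, and `handle_request` are hypothetical, ids come from a simple counter, and a thread pool stands in for Google's control flow library.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

_ids = itertools.count(1)     # hypothetical id source; a real system would use random ids
_lock = threading.Lock()      # guards the id counter and the span list
_local = threading.local()    # thread-local storage for the current trace context
SPANS: List["Span"] = []      # hypothetical in-memory collector of recorded spans


@dataclass
class Span:
    trace_id: int                 # shared by every span in one trace tree
    span_id: int                  # identifies this node in the tree
    parent_id: Optional[int]      # None marks the root span
    name: str
    annotations: List[str] = field(default_factory=list)


def _next_id() -> int:
    with _lock:
        return next(_ids)


def current_context() -> Optional[Tuple[int, int]]:
    """Return (trace_id, span_id) for the calling thread, or None if untraced."""
    return getattr(_local, "ctx", None)


@contextmanager
def span(name: str):
    """Open a span as a child of the thread's current span, then restore the old context."""
    prev = current_context()
    if prev is None:
        trace_id, parent_id = _next_id(), None   # no context: start a new trace
    else:
        trace_id, parent_id = prev               # existing context: become its child
    s = Span(trace_id, _next_id(), parent_id, name)
    with _lock:
        SPANS.append(s)
    _local.ctx = (trace_id, s.span_id)
    try:
        yield s
    finally:
        _local.ctx = prev


def with_context(fn):
    """Capture the creator's trace context now; install it on whichever thread runs fn."""
    captured = current_context()

    def wrapper(*args, **kwargs):
        prev = current_context()
        _local.ctx = captured
        try:
            return fn(*args, **kwargs)
        finally:
            _local.ctx = prev

    return wrapper


def handle_request() -> Tuple[Span, Span]:
    """Trace a request whose backend work is deferred to a worker thread."""
    def backend_call() -> Span:
        with span("backend.Call") as child:
            return child

    with span("frontend.Request") as root:
        root.annotations.append("request received")
        with ThreadPoolExecutor(max_workers=1) as pool:
            child = pool.submit(with_context(backend_call)).result()
    return root, child
```

Calling `handle_request()` returns a root span and a child span that share a trace id, with the child's `parent_id` equal to the root's `span_id`, even though the child was created on a different thread. Without the `with_context` wrapper, the worker thread would see no trace context and would start an unrelated trace.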

Learnings from designing Dapper

  • Dapper is deployed across virtually all of Google’s systems. It has allowed the vast majority of Google’s largest workloads to be traced without the need for any application-level modifications and with no noticeable performance impact.
  • The decision to combine minimal, application-transparent tracing functionality with a simple API for programmers to enhance traces has been worthwhile.
