Distributed Systems, System Design



What is BigTable

  • Bigtable is a distributed storage system for managing data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers
  • Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability.
  • Bigtable resembles a database: it shares many implementation strategies with databases.
  • Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage.
  • Data is indexed using row and column names that can be arbitrary strings.

BigTable Data Model

  • A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
  • The row keys in a table are arbitrary strings (currently up to 64KB in size, although 10-100 bytes is a typical size for most of our users).
  • Column keys are grouped into sets called column families, which form the basic unit of access control. All data stored in a column family is usually of the same type (we compress data in the same column family together)
  • Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp. Bigtable timestamps are 64-bit integers.
  • The Bigtable API provides functions for creating and deleting tables and column families. It also provides functions for changing cluster, table, and column family metadata, such as access control rights.

Building blocks of BigTable

  • Bigtable uses the distributed Google File System (GFS) to store log and data files.
  • The Google SSTable file format is used internally to store Bigtable data. An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings.
  • Bigtable relies on a highly-available and persistent distributed lock service called Chubby [8]. A Chubby service consists of five active replicas, one of which is elected to be the master and actively serve requests

Lessons Learnt from designing BigTable

  • Large distributed systems are vulnerable to many types of failures, not just the standard network partitions and fail-stop failures assumed in many distributed protocols.
  • It is important to delay adding new features until it is clear how the new features will be used.
  • It is very important to do proper system-level monitoring (i.e., monitoring both Bigtable itself, as well a the client processes using Bigtable).
  • The most important lesson we learned is the value of simple designs. Given both the size of our system (about 100,000 lines of non-test code in Bigtable), as well as the fact that code evolves over time in unexpected ways, we have found that code and design clarity are of immense help in code maintenance and debugging.

Key things to know about Google BigTable

  • Bigtable clusters have been in production use since April 2005, and we spent roughly seven person-years on design and implementation before that date.
  • BigTable users like the performance and high availability provided by the Bigtable implementation, and that they can scale the capacity of their clusters by simply adding more machines to the system as their resource demands change over time.

How useful was this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

0 0 votes
Article Rating
Notify of
Inline Feedbacks
View all comments