Distributed Systems, NoSQL, System Design

Why are document based databases so fast in writes

5
(1)

I did cover in one of the articles about use cases of DynamoDB. Let’s have a look at document based database internal implementation at high level.

Document based databases use Memtables as first point of entry while writing a record. Memtable is an in memory store of  key value pairs where value can be a Json document. And the data in Memtable is sorted by keys. Memtable uses Red Black Tree or AVL tree data structures to achieve this. Red Black Tree or AVL trees are balanced binary trees so that any insert, update, delete or find operation is Log N.

When MemTable memory is full, it creates a new MemTable and writes complete data in a persistent space (disk) called SSTable. (Sorted String Table). Again the data written here is in sorted order by keys. Not all but some keys are written in memory with offsets of those keys in SSTable. We can call them sparse sorted keys. Once one SSTable is full, another SSTable is created and data written to it and so on. At a time there can be multiple SSTables.

While reading the data, we try to find the keys in sparse sorted keys memory store. Say we are looking for a key 155, so we find there are two sparse keys in sparse sorted keys store 100 and 200. So we find the offsets of index 100 and index 200 in most recent SSTable. And then try to find key 155 there. If we are unable to find, then we go to next SSTable and so on till we reach MemTable and then return the data. This way the read of data has to go through multiple hops before it is able to find the data. So the reads could be slightly slower than relational databases who use BTree data structures.

Multiple SSTables do get merged using Merge Sort like algorithm with data in latest SSTable getting priority in case of clashes. These operations ensure that memory utilisation is  optimal.

Since while writing the data, the data is directly written in memory store (MemTable) and not on disks immediately, the writes are extremely fast. So if your use case is where your data is primarily primary key based (Hierarchical) and you need extremely fast writes handling millions and billions of transactions, Document based data stores could be the right way to go. MongoDB, Cassandra, DynamoDB,

 

 

How useful was this post?

Click on a star to rate it!

Average rating 5 / 5. Vote count: 1

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments