D   A   T   A   W   O   K





Creation: December 28 2017
Modified: February 05 2022

Mongo DB

MongoDB has a flexible storage system, which means stored objects are not necessarily required to have the same structure or fields. MongoDB also has some optimization features, which distributes the data collections across, being overall a more balanced and performance focused system.

MongoDB is a schema-free, document-oriented database written in C++. As a document store based, it stores values (referred to as documents) in the form of encoded data.

The choice of encoded format in MongoDB is JSON. This means that even if the data is nested inside JSON documents, it will still be queryable and indexable.


Architecture

Shards (mongod)

Sharding is the partitioning and distributing of data across multiple nodes. A shard is a collection of MongoDB nodes. Using shards also means the ability to make an horizontally scalation across multiple nodes. In the case that there is an application using a single database server, it can be converted to sharded cluster with very few changes to the original application. Software is almost completely decoupled from the public APIs exposed to the client side.

Configuration servers (mongod)

Each one holds a copy of the metadata indicating which shard contains what data.

Routers (mongos)

A group of servers acting as a main interface for one or more clients. A router dipatch the client requests to the appropriate shards using the configuration servers to know who owns the required information.

Global cluster

Read and write actions are sent from the clients to one of the router servers in the cluster, and are automatically routed by that server to the appropriate shards that contain the data with the help of the configuration servers.

A shard in MongoDB has a data replication scheme, which creates a copy set of each shard that holds exactly the same data. There are two types of replication schemes in MongoDB: Master-Slave replication and Replica-Set replication. Replica-Set provides more automation and better failure handling, while Master-Slave requires the administrator intervention more often. Regardless of the replication scheme, at any point in time in a replica set, only one shard acts as the primary shard, all other replica shards are secondary shards. All write and read operations go to the primary shard, and are then distributed evenly (if needed) to the other secondary shards in the set.

In the graphic below, we see the MongoDB architecture explained, showing the router servers in green, the configuration servers in yellow, and the shards that contain the blue MongoDB nodes.

mongo1

It should be noted that sharding (or sharing the data between shards) in MongoDB is completely automatic, which reduces the failure rate and makes it a highly scalable database management system.


Characteristics Overview

Programming Languages Libraries

Systems

Linux, Unix, BSD, Windows, Mac OSX, Android

Licensing

Storage

Document based: BSON (Binary JSON)

Storage engines

https://docs.mongodb.com/manual/core/storage-engines/

Query method and Interface

HTTP verbs mapping:

Some of the common HTTP result codes are often used inside REST APIs:

Security features

Authentication

Authorization

Encryption

Other

High availability

Partitioning and distributing data across multiple nodes (sharding). Data can be duplicated across shards (replica sets). Data is written in a single shard instance and is successivelly distributed in the replica set.

Scalability

Very good horizontal scalability. Shards can be added easily on running systems. Safer, easier and less expensive than vertical scaling (cpu,storage,ram).

https://docs.mongodb.com/manual/sharding/

Transaction model

When a single write operation modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic and other operations may interleave. For cases where a sequence of write operations must operate as if in a single transaction, you can implement a two-phase commit in your application. Using two-phase commit ensures data consistency.

Tools

Cloud

https://www.mongodb.com/cloud/atlas

References

Proudly self-hosted on a cheap Raspberry Pi