MongoDB from fundamentals

Aviral Srivastava
3 min read · Apr 23, 2020

After working on MongoDB for about six years, I am sharing certain practices that have worked well for me.

Introduction to MongoDB

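In relational database designs, the schema is statically defined. In document-based databases such as MongoDB, the schema is dynamic and is based on the document structure. MongoDB is often described as schemaless: by default a collection does not enforce a fixed structure on its documents, although optional validation rules can be added.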

Designing MongoDB Document Stores

The structure of documents should be application-driven, and the design should focus on the access pattern. Prioritizing access patterns often means duplicating data, which can lead to inconsistent data. Linking documents instead, so that each fact is stored once and referenced from elsewhere, avoids these consistency issues.

A rule of thumb that I have encountered in multiple resources is that if you are designing your document store the way you would design a relational database, you are probably not making full use of the document database.

MongoDB can model one-to-one, one-to-many, and many-to-many relationships.

JOIN-style operations are expensive in MongoDB, so to avoid them we can lean on BSON (Binary JSON) documents. These documents let you favor embedded designs by organizing logically related information in a single document, which removes the need for a JOIN. You should also avoid performing JOINs at the application level where possible. This is a trade-off between performance and consistency, and here we prioritize performance.

Understanding Relationships

Our objective is to model the different types of relations from the relational model. We shall cover Linking Representation and Embedding Representation.

Linking Relationships

Linking mimics the primary and foreign keys of the relational model: one document stores the _id of another. In the snippet below, we can observe that authors is a list of IDs in the Book collection. This representation is an example of a one-to-many relationship.

There is an anomaly in the Book collection below. Can you spot it?

As with any design, there is a trade-off between performance and consistency, which the application logic may have to mitigate.

Book
{
    _id: "10",
    name: "Database Systems ",
    authors: [1, 2, 3],
    language: "ENG"
}
{
    _id: "20",
    name: "Time-Constrained Transaction Management",
    authors: [4, 2, 1],
    language: "english"
}

Authors
{ _id: 1, name: "Avi Silberschatz" }
{ _id: 2, name: "Henry F. Korth" }
{ _id: 3, name: "S. Sudarshan" }
{ _id: 4, name: "N. Soparkar" }

In the snippet above, we see two collections: Authors and Book. We assume that Book is queried more often than Authors.
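To make the cost of linking concrete, here is a minimal sketch of how the application would resolve the author IDs stored on a book. It assumes plain Python lists standing in for the two collections, and the helper name author_names_for is hypothetical; it is not a MongoDB API.

```python
# Plain Python lists stand in for the Book and Authors collections;
# no MongoDB server or driver is involved in this sketch.
books = [
    {"_id": "10", "name": "Database Systems ", "authors": [1, 2, 3],
     "language": "ENG"},
    {"_id": "20", "name": "Time-Constrained Transaction Management",
     "authors": [4, 2, 1], "language": "english"},
]
authors = [
    {"_id": 1, "name": "Avi Silberschatz"},
    {"_id": 2, "name": "Henry F. Korth"},
    {"_id": 3, "name": "S. Sudarshan"},
    {"_id": 4, "name": "N. Soparkar"},
]


def author_names_for(book_id):
    """Resolve the author IDs on a book into names (hypothetical helper)."""
    # First lookup: find the book itself.
    book = next(b for b in books if b["_id"] == book_id)
    # Second lookup: follow each linked ID into the Authors collection.
    names_by_id = {a["_id"]: a["name"] for a in authors}
    return [names_by_id[author_id] for author_id in book["authors"]]


print(author_names_for("20"))
```

Each author name is stored exactly once, so it cannot drift out of sync, but reading a book together with its authors takes two lookups instead of one.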

Embedding Representation

Let’s organize our data in a different way, from the perspective of Authors, and embed a list of books in each author document.

In this example, the assumption is that we often query our data by Authors.

Observe the possible anomalies in both the book names and the language values.

Authors
{
    _id: 1,
    name: "Avi Silberschatz",
    books: [
        { name: "Database System Concepts", language: "english" },
        { name: "Operating System Concepts", language: "english" }
    ]
}
{
    _id: 2,
    name: "Henry F. Korth",
    books: [
        { name: "Database Systems Concepts", language: "ENG" },
        { name: "Time-Constrained Transaction Management", language: "ENG" }
    ]
}
{
    _id: 3,
    name: "S. Sudarshan",
    books: [
        { name: "Database System Concepts", language: "english" }
    ]
}
{
    _id: 4,
    Name: "Abraham Silberschatz",
    books: [
        { name: "Database System Concepts", language: "German" }
    ]
}
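The embedded layout can be sketched in the same spirit. The example below assumes a plain Python list standing in for the Authors collection, and the helpers books_by and conflicting_titles are hypothetical names. It shows the single-lookup read that embedding buys, and one way an application might surface the duplicated-data anomalies.

```python
from collections import defaultdict

# A plain list stands in for the Authors collection with embedded books.
authors = [
    {"_id": 1, "name": "Avi Silberschatz", "books": [
        {"name": "Database System Concepts", "language": "english"},
        {"name": "Operating System Concepts", "language": "english"}]},
    {"_id": 2, "name": "Henry F. Korth", "books": [
        {"name": "Database Systems Concepts", "language": "ENG"},
        {"name": "Time-Constrained Transaction Management",
         "language": "ENG"}]},
    {"_id": 3, "name": "S. Sudarshan", "books": [
        {"name": "Database System Concepts", "language": "english"}]},
    {"_id": 4, "Name": "Abraham Silberschatz", "books": [
        {"name": "Database System Concepts", "language": "German"}]},
]


def books_by(author_name):
    """One lookup returns the author along with every embedded book."""
    # .get() is used because document 4 spells the field "Name".
    author = next(a for a in authors if a.get("name") == author_name)
    return [book["name"] for book in author["books"]]


def conflicting_titles():
    """Return titles that are recorded with more than one language value."""
    languages = defaultdict(set)
    for author in authors:
        for book in author["books"]:
            languages[book["name"]].add(book["language"])
    return {title: langs for title, langs in languages.items()
            if len(langs) > 1}


print(books_by("Henry F. Korth"))
print(conflicting_titles())
```

Because each author carries a private copy of the book details, nothing stops the copies from disagreeing. A check like conflicting_titles only catches copies that share the exact same title, so the misspelled "Database Systems Concepts" slips through as if it were a different book.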

Modeling Relationships

As a general guideline, use linking for true 1:Many relationships and embedding for 1:Few relationships.

For example, College: Students is 1:Many, so we could use linking, whereas Tweet: Replies on a typical Twitter account is 1:Few, so we could simply embed the “few”.
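The two cases might look like this as documents. This is only a sketch: every field name and value below is made up for illustration, and storing the reference on the student (rather than a list of student IDs on the college) is an assumption, chosen so the college document does not grow with enrollment.

```python
# 1:Few -> embed: the replies live inside the tweet document.
tweet = {
    "_id": "t1",
    "text": "Hello MongoDB",
    "replies": [
        {"user": "alice", "text": "Welcome!"},
        {"user": "bob", "text": "Nice post"},
    ],
}

# 1:Many -> link: each student document points back to its college.
college = {"_id": "c1", "name": "Example College"}
students = [
    {"_id": "s1", "name": "Student A", "college_id": "c1"},
    {"_id": "s2", "name": "Student B", "college_id": "c1"},
    {"_id": "s3", "name": "Student C", "college_id": "c2"},
]

# Embedded data is available as soon as the tweet has been read.
reply_count = len(tweet["replies"])

# Linked data needs a second pass over the other collection.
enrolled = [s["name"] for s in students if s["college_id"] == college["_id"]]

print(reply_count, enrolled)
```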

Many: Many Relationships

Book
{
    _id: "10",
    name: "Database System Concepts",
    authors: [1, 2, 3],
    language: "english"
}

Publisher
{
    _id: "10",
    name: "McGraw Hill"
}

Published_Book
{
    book_id: "10",
    Publisher_id: "10",
    year: "2011"
}

In the snippet above, we have modeled a many: many relationship in Published_Book. It links the two collections, Book and Publisher, through their _id values.
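Reading through the linking collection could be sketched as follows. Plain Python lists again stand in for the collections, and publishers_of is a hypothetical helper rather than anything MongoDB provides.

```python
# Plain lists stand in for the three collections in the snippet.
books = [
    {"_id": "10", "name": "Database System Concepts",
     "authors": [1, 2, 3], "language": "english"},
]
publishers = [
    {"_id": "10", "name": "McGraw Hill"},
]
published_books = [
    {"book_id": "10", "Publisher_id": "10", "year": "2011"},
]


def publishers_of(book_id):
    """List (publisher name, year) pairs for a book (hypothetical helper)."""
    names_by_id = {p["_id"]: p["name"] for p in publishers}
    return [
        (names_by_id[link["Publisher_id"]], link["year"])
        for link in published_books
        if link["book_id"] == book_id
    ]


print(publishers_of("10"))
```

A book with several publishers, or a publisher with several books, simply adds more entries to published_books; neither the Book nor the Publisher document has to change.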

Cardinality drives relationship design in document databases. In the earlier example, Book: Author could turn out to be Few: Few. We can model such relationships by embedding, but we must be aware of the query direction: whether we will query books or authors more often.

One: One Relationships

A One: One relationship can be modeled the same way as Few: Few. Embedding is the preferred option, with the direction chosen by the access pattern, and linking is used when bidirectional access is needed.

Final Takeaway

For performance, model your data so that entities accessed together are part of the same document (this is the embedding example we explored earlier). If documents keep growing in size, use linking instead of embedding, since MongoDB limits each document to 16 MB.
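As a rough way to keep an eye on that limit, the sketch below estimates a document's size from its JSON encoding. This is only an approximation, since MongoDB measures the BSON encoding, and the 50% threshold and the helper names are arbitrary assumptions for illustration.

```python
import json

# MongoDB's per-document size limit, expressed in bytes.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_size_bytes(document):
    """Estimate size via JSON; the real BSON size will differ somewhat."""
    return len(json.dumps(document).encode("utf-8"))


def nearing_limit(document, fraction=0.5):
    """True once the estimate passes the given fraction of the limit.

    The default fraction is an arbitrary choice for this sketch.
    """
    return approx_size_bytes(document) > fraction * MAX_BSON_DOCUMENT_BYTES


small = {"_id": 1, "name": "Avi Silberschatz",
         "books": [{"name": "Database System Concepts"}]}
# Ten thousand embedded entries of about 1 KB each, roughly 10 MB in total.
large = {"_id": 2, "books": [{"name": "x" * 1000}] * 10000}

print(nearing_limit(small), nearing_limit(large))
```

A document that trips a check like this is a candidate for moving its embedded list into a separate, linked collection.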