Document Database Data Modeling Techniques

Kinanee Samson - Jun 25 '21 - Dev Community

Document databases are becoming increasingly popular because of their simplicity and ease of use. A document database gives you total control over how your data is structured. This is cool, but we all know that too much freedom can be intoxicating: since we are not limited to any format for modeling our data, it raises the big question of how we accurately model our data when working with document databases. In this article I am going to talk about five techniques we can employ to model our data when using document-oriented databases. Let me point out that these techniques aren't written in stone, and you don't have to follow them all the time. They are techniques that I use quite often when modeling data, and I think they will work for you too.

Choosing the right data structure

Document databases offer similar data structures: collections, documents and sometimes sub collections. This is where things can start getting messy. A collection groups documents with similar attributes together; think of it as an array of documents. A document is a representation of a single unit of our data. Use collections to group related documents: documents whose keys, and the data types of the values on those keys, are consistent should go into one collection. We might have a collection for users, where we store all the users' info, and another collection for posts. A user and a post will not have the same structure, so it makes sense to split them into different collections. I don't really use sub collections, but you could use them if you expect to have different types of users on your platform. Getting the right data structure is the first step to achieving a good model for your data, so you have to get this part spot on.

// IMAGINARY DOCUMENT DATABASE 
// COLLECTION
const usersCollection = [
    // documents are object-like
    {
        name: 'John Doe',
        email: 'john.doe@gmail.com',
        phone: '234703635235'
    },
    // ...other documents
]
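As a rough illustration of the grouping rule above, the sketch below models two collections as plain arrays and lists the keys that every document in a collection has in common. The `db` object, the second user, the post fields, and the `sharedKeys` helper are all assumptions made up for this example; they are not part of any real database API.

```javascript
// IMAGINARY DOCUMENT DATABASE (plain in-memory arrays, sample data is made up)
const db = {
    users: [
        { name: 'John Doe', email: 'john.doe@gmail.com', phone: '234703635235' },
        { name: 'Jane Doe', email: 'jane.doe@gmail.com', phone: '234703635236' }
    ],
    posts: [
        { title: 'Hello world', body: 'My first post' },
        { title: 'Second post', body: 'Still writing' }
    ]
}

// Hypothetical helper: returns the keys present on every document in a collection.
function sharedKeys(collection) {
    if (collection.length === 0) return []
    return Object.keys(collection[0]).filter(key =>
        collection.every(doc => key in doc)
    )
}

console.log(sharedKeys(db.users)) // [ 'name', 'email', 'phone' ]
console.log(sharedKeys(db.posts)) // [ 'title', 'body' ]
// Mixing users and posts leaves no key in common, a hint they belong apart.
console.log(sharedKeys([...db.users, ...db.posts])) // []
```

If two kinds of documents share almost no keys, that is usually a sign they deserve separate collections.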

Denormalization

Denormalization is a principle that document databases apply by default; it is the opposite of normalization. Normalization involves breaking complex data down into different tables and linking them with keys. When working with document databases you might want to avoid this kind of approach, because they are not built from the ground up to support normalization. How do we achieve denormalization? Simple: we put everything that needs to be together in one place. Following on from our user example, rather than creating another collection to handle, say, the user's address or social media links, you store the address and the social media links on each user document itself. This way we are denormalizing our data: everything is kept in one place, so we don't need to use keys or look up another collection for that information.

// IMAGINARY DOCUMENT DATABASE 
// COLLECTION
const usersCollection = [
    {
        name: 'John Doe',
        email: 'john.doe@gmail.com',
        address: {
            city: 'port harcourt',
            zip: 500102,
            state: 'rivers',
            country: 'nigeria'
        }
    },
    // ...other documents
]
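To make the contrast concrete, here is a minimal sketch that starts from a normalized shape (users and addresses in separate collections, linked by a key) and folds it into the denormalized shape described above. The `id` and `userId` values and the `embedAddresses` helper are hypothetical and exist only for this illustration.

```javascript
// Normalized shape: two collections linked by a key (what we want to avoid).
const normalizedUsers = [
    { id: 'u1', name: 'John Doe', email: 'john.doe@gmail.com' }
]
const normalizedAddresses = [
    { userId: 'u1', city: 'port harcourt', zip: 500102, state: 'rivers', country: 'nigeria' }
]

// Hypothetical helper: embed each user's address on the user document itself.
function embedAddresses(users, addresses) {
    return users.map(user => {
        const match = addresses.find(address => address.userId === user.id)
        if (!match) return { ...user }
        // Drop the linking key, it is no longer needed once the address is embedded.
        const { userId, ...address } = match
        return { ...user, address }
    })
}

const usersCollection = embedAddresses(normalizedUsers, normalizedAddresses)
console.log(usersCollection[0].address.city) // port harcourt
```

After the transformation, reading a user's city is a single property lookup with no second collection involved.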

When we denormalize our data, we use nested entities, which we will talk about next.

Using Nested Entities

A nested entity is simply a data structure that holds multiple key-value pairs or multiple items inside one value. A good example of a nested entity is an object inside a document, an array of objects, or a map. Nested entities help us achieve a good model for our data, and they are also how we denormalize our data, as we said earlier. So when should you use nested entities? Use them when you want to group a section of data that is related or quite complex. In the example above, the address key on the user is a nested entity because it is itself an object embedded inside each user in our database. We know that an address belongs to a user, so it makes sense to nest an object that models the user's address on each user document. If we wanted to add a list of hobbies that belong to a user, we might use an array. The cool thing about nested entities is that we can use them in almost any format we see fit: we might have an array of objects, or an object with a property that's an array. If you would rather use maps, you are welcome to use maps too. Below is a good case of using nested entities. There is no limit to the level of nesting we can apply, as long as it makes logical sense and is a good representation of our data.

// IMAGINARY DOCUMENT DATABASE 
// COLLECTION
const usersCollection = [
    {
        name: 'John Doe',
        address: {
            city: 'port harcourt',
            zip: 500102,
            state: 'rivers',
            country: 'nigeria'
        },
        hobbies: ['swimming', 'reading', 'singing'],
        contact: {
            socialMedia: {
                facebook: 'link-to-facebook',
                linkedIn: 'link-to-linkedin'
            },
            normal: {
                email: 'john.doe@gmail.com',
                phone: '234703635235'
            }
        }
    },
    // ...other documents
]
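Deep nesting is only useful if it stays easy to read and update. The sketch below, which is one possible approach rather than anything a particular database requires, reads nested fields safely with optional chaining and updates one nested field without mutating the original document. The `twitter` key and the new city value are made up for the example.

```javascript
const user = {
    name: 'John Doe',
    address: { city: 'port harcourt', zip: 500102, state: 'rivers', country: 'nigeria' },
    hobbies: ['swimming', 'reading', 'singing'],
    contact: {
        socialMedia: { facebook: 'link-to-facebook', linkedIn: 'link-to-linkedin' },
        normal: { email: 'john.doe@gmail.com', phone: '234703635235' }
    }
}

// Optional chaining returns undefined instead of throwing when a level is missing.
const email = user.contact?.normal?.email
const twitter = user.contact?.socialMedia?.twitter

// Update nested fields by copying each level, so the original stays untouched.
const movedUser = {
    ...user,
    address: { ...user.address, city: 'lagos', state: 'lagos' }
}

console.log(email) // john.doe@gmail.com
console.log(twitter) // undefined
console.log(user.address.city, movedUser.address.city) // port harcourt lagos
```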

Referencing

Referencing is another cool technique that we can use to model our data. When working with document databases, each document inside a collection is usually assigned a unique ID, and this ID helps when we want to perform read or write operations on a single document. Referencing involves storing a reference to a document, typically its ID, on a key in another document. This helps us avoid repeating data, because the ID of each document is unique. Say we have a user in our collection and we know that each user can have one or more friends. We could use nested entities and store some properties of one user in a friends array on another user that they are related to. Or, better, we can just store a reference to that user inside the friends array. Referencing helps to keep our data compact and concise; it wouldn't make sense to store the data of over 500 users inside another user.

// IMAGINARY DOCUMENT DATABASE 
// COLLECTION
const usersCollection = [
    {
        // IDs are strings, so they must be quoted
        id: '12AM8H12HYTRS6F24WBVT',
        name: 'John Doe',
        address: {
            city: 'port harcourt',
            zip: 500102,
            state: 'rivers',
            country: 'nigeria'
        },
        hobbies: ['swimming', 'reading', 'singing'],
        contact: {
            socialMedia: {
                facebook: 'link-to-facebook',
                linkedIn: 'link-to-linkedin'
            },
            normal: {
                email: 'john.doe@gmail.com',
                phone: '234703635235'
            }
        },
        // each array stores only the IDs of the referenced documents
        friends: ['LK0G678YUOPQZXOTVU', 'WE19BC67UIL0QA17LJH' /* ...other friends */],
        following: ['LK0G678YUOPQZXOTVU', 'WE19BC67UIL0QA17LJH' /* ...other people the user follows */],
        followers: ['LK0G678YUOPQZXOTVU', 'WE19BC67UIL0QA17LJH' /* ...other followers */],
        posts: ['LK0G678YUOPQZXOTVU', 'WE19BC67UIL0QA17LJH' /* ...other posts */]
    },
    // ...other documents
]
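A stored reference is only useful once it is turned back into a document. Here is a minimal in-memory sketch of that step, assuming the whole collection is available as an array. The extra users (`Jane Doe`, `Sam Smith`) and the `resolveReferences` helper are hypothetical and only serve the illustration.

```javascript
const usersCollection = [
    { id: '12AM8H12HYTRS6F24WBVT', name: 'John Doe', friends: ['LK0G678YUOPQZXOTVU', 'WE19BC67UIL0QA17LJH'] },
    { id: 'LK0G678YUOPQZXOTVU', name: 'Jane Doe', friends: ['12AM8H12HYTRS6F24WBVT'] },
    { id: 'WE19BC67UIL0QA17LJH', name: 'Sam Smith', friends: [] }
]

// Index documents by id once so each reference resolves with a single lookup.
const usersById = new Map(usersCollection.map(user => [user.id, user]))

// Hypothetical helper: turn an array of ids into the documents they point to,
// skipping references to documents that no longer exist.
function resolveReferences(ids, index) {
    return ids.map(id => index.get(id)).filter(doc => doc !== undefined)
}

const john = usersById.get('12AM8H12HYTRS6F24WBVT')
console.log(resolveReferences(john.friends, usersById).map(friend => friend.name))
// [ 'Jane Doe', 'Sam Smith' ]
```

Each user's data lives in exactly one place, so updating Jane's name changes what every one of her friends sees.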

App Side Joins

One of the characteristics of document databases is the lack of the joins we get with SQL databases; there is often no built-in feature like that on document databases. You might wonder why. This is because performing joins can be quite an expensive computation, and a common way to work around this is to perform the joins on the frontend. We can loop through the array that contains the references and query the database for each document whose ID matches a reference in the array.

// IMAGINARY FRONTEND
// get the user somehow
const friends = user.friends.map(id => getEachDocumentSomehow(id))
const posts = user.posts.map(id => getEachPostSomehow(id))
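In practice a database lookup is asynchronous, so mapping over the references gives back an array of promises rather than documents. The sketch below shows one way to handle that with `Promise.all`. The `fakeUsers` data and the `getDocumentById` function are stand-ins invented for this example; a real client library will have its own API.

```javascript
// Stand-in for a real database client; the data and the function are made up.
const fakeUsers = new Map([
    ['LK0G678YUOPQZXOTVU', { id: 'LK0G678YUOPQZXOTVU', name: 'Jane Doe' }],
    ['WE19BC67UIL0QA17LJH', { id: 'WE19BC67UIL0QA17LJH', name: 'Sam Smith' }]
])

// Hypothetical async lookup: resolves with the document, or null when not found.
async function getDocumentById(id) {
    return fakeUsers.get(id) ?? null
}

// App side join: start all lookups at once and wait for every one to finish.
async function joinFriends(user) {
    const friends = await Promise.all(user.friends.map(id => getDocumentById(id)))
    // Drop references that point at documents that no longer exist.
    return { ...user, friends: friends.filter(friend => friend !== null) }
}

const user = {
    name: 'John Doe',
    friends: ['LK0G678YUOPQZXOTVU', 'WE19BC67UIL0QA17LJH', 'MISSING']
}

joinFriends(user).then(joined => console.log(joined.friends.map(friend => friend.name)))
// [ 'Jane Doe', 'Sam Smith' ]
```

`Promise.all` keeps the results in the same order as the input ids, so the joined list lines up with the stored references.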

One of the drawbacks of app side joins is the number of requests we have to make to the server to fetch each list we need. We can avoid this by making the joins on the server instead, using GraphQL. GraphQL can handle situations like this effortlessly and in a scalable manner; you can scan through this article to get a basic intro to GraphQL.
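To see why the request count matters, here is a small sketch that counts round trips for both approaches. It does not use GraphQL itself; `fetchUserWithFriends` is a plain function standing in for what a server-side resolver would do, and the ids, data, and function names are all assumptions made for the illustration.

```javascript
// Made-up in-memory data standing in for the database on the server.
const usersById = new Map([
    ['U1', { id: 'U1', name: 'John Doe', friends: ['U2', 'U3'] }],
    ['U2', { id: 'U2', name: 'Jane Doe', friends: [] }],
    ['U3', { id: 'U3', name: 'Sam Smith', friends: [] }]
])

let requestCount = 0 // counts client-to-server round trips

// Hypothetical endpoint 1: returns a single document per request.
function fetchUser(id) {
    requestCount += 1
    return usersById.get(id)
}

// Hypothetical endpoint 2: the server resolves the references itself,
// so the client gets the user and their friends in one round trip.
function fetchUserWithFriends(id) {
    requestCount += 1
    const user = usersById.get(id)
    return { ...user, friends: user.friends.map(friendId => usersById.get(friendId)) }
}

// App side join: one request for the user plus one per friend.
const user = fetchUser('U1')
const friends = user.friends.map(friendId => fetchUser(friendId))
const appSideRequests = requestCount

// Server side join: a single request.
requestCount = 0
const joined = fetchUserWithFriends('U1')
const serverSideRequests = requestCount

console.log(friends.length, joined.friends.length) // 2 2
console.log(appSideRequests, serverSideRequests) // 3 1
```

With a list of 500 friends, the app side join would make 501 round trips in this model, while the server side version still makes one.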

That's it for this article. I hope this helps you model your data appropriately when using document databases. Speaking of document databases, you should try using FaunaDB as your database solution for your next project. I hope you found this useful.
