Boost Your MongoDB Performance: Indexing, Embedding, and Sharding Techniques

Jacky - Oct 11 '23 - Dev Community

MongoDB is a popular document-oriented NoSQL database known for its flexibility, scalability, and high performance. However, to achieve optimal performance from MongoDB, you need to follow some key optimization strategies. In this article, we will explore some tips for optimizing MongoDB.

Index appropriately

Indexes let MongoDB execute queries efficiently by searching only the indexed fields. Without an index, MongoDB must scan every document in a collection to find the matching ones.

Proper indexes are critical for fast query performance. You should create indexes on fields that are frequently queried.

// Create index on `name` field 
db.products.createIndex({name: 1})

// Create compound index on `name` and `price`
db.products.createIndex({name: 1, price: -1})
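
As a rule of thumb, a compound index supports queries on its prefix fields. A quick sketch using the indexes above:

// Supported by {name: 1, price: -1}: filter on the prefix field `name`
db.products.find({name: "Product 1"})

// Also supported: filter on `name` and `price` together
db.products.find({name: "Product 1", price: {$lt: 200}})

// Not supported efficiently: `price` alone is not a prefix of that index
db.products.find({price: {$lt: 200}})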

Use covered queries

A covered query is one where a single index contains every field used in the query filter and returned in the projection. MongoDB can then answer the query entirely from the index, without fetching any documents at all.

// Create an index that can cover the query
db.products.createIndex({price: 1, name: 1})

// Covered query: filter and projection use indexed fields only,
// and _id is excluded so the document never has to be fetched
db.products.find(
    {price: {$gt: 30}}, 
    {_id: 0, name: 1, price: 1} // projection
)

Here the {price: 1, name: 1} index contains every field the query filters on and returns, and _id is excluded from the projection, so no document lookups are needed.
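
You can verify that a query is actually covered with explain(): for a covered query, executionStats.totalDocsExamined is 0 because every result comes straight from the index.

// A covered query examines index keys only, no documents
db.products.find(
    {price: {$gt: 30}},
    {_id: 0, name: 1, price: 1}
).explain("executionStats")
// -> executionStats.totalDocsExamined: 0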

Embed related data

MongoDB's flexible schema lets you embed related data directly inside a document. Embedding related data means fewer queries and avoids application-level joins ($lookup).

// Embed 'comments' array in product document
{
   name: "Product 1",
   price: 100,
   comments: [
      {user: "user1", text: "Nice!"},
      {user: "user2", text: "Lovely"}
   ]
}

Now retrieving a product's comments takes a single query on the products collection, with no $lookup needed.
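
For example, a single findOne returns the product together with its embedded comments (a minimal sketch against the document above):

// One query returns the product and all of its comments
db.products.findOne(
    {name: "Product 1"},
    {name: 1, comments: 1}
)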

Use sharding for horizontal scaling

Sharding distributes data across multiple servers called shards. This provides horizontal scalability and improves read/write throughput.

// Enable sharding on the 'mydb' database
sh.enableSharding("mydb")  

// Shard the 'products' collection on a hashed `name` key
sh.shardCollection("mydb.products", {name: "hashed"})

The router directs reads and writes to the appropriate shards based on the shard key: queries that include the shard key are targeted at a single shard, while queries that don't are broadcast to all shards.
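
A rough illustration, assuming the hashed `name` shard key above:

// Targeted query: includes the shard key, routed to a single shard
db.products.find({name: "Product 1"})

// Scatter-gather query: no shard key in the filter, broadcast to all shards
db.products.find({price: {$gt: 30}})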

Proper indexing, covered queries, embedded data, and sharding cover the query and data-model side of MongoDB performance. The next two techniques, connection pooling and replication, address the operational side.

Use connection pooling

MongoDB connections can be expensive to establish and tear down repeatedly. Opening a new connection for every database operation can result in significant performance overhead.

Connection pooling helps mitigate this by maintaining a pool of connections that can be reused, rather than opening and closing connections constantly.

In MongoDB, connection pooling is handled by the driver. Here is example code using the official Node.js driver:

// Create MongoClient with connection pooling (Node.js driver 4.x+)
const { MongoClient } = require('mongodb');

const client = new MongoClient(uri, {
  maxPoolSize: 10, // maintain up to 10 connections in the pool
  tls: true,
  auth: {
    username: 'user',
    password: 'pass'
  }
});

async function run() {
  // Establish the initial connection; the driver manages the pool
  await client.connect();

  const db = client.db('mydb');

  // These operations reuse connections from the pool
  await db.collection('customers').insertOne({name: 'John'});
  await db.collection('orders').find({}).toArray();

  // Close the client (and its pool) when the application shuts down
  await client.close();
}

run().catch(console.error);

The key points are:

  • Set maxPoolSize to cap the number of pooled connections
  • Use client.db() to get database handles that share the pool
  • The driver handles reusing pooled connections across operations
  • Call client.close() when the application shuts down to release the pool

In this way, connection pooling reduces connection overhead and improves MongoDB performance.
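
In practice, a common pattern is to create one MongoClient for the whole application and share it, so every module draws from the same pool. A minimal sketch (the db.js module and MONGODB_URI environment variable are just illustrative names):

// db.js - one pooled client shared across the application
const { MongoClient } = require('mongodb');

const client = new MongoClient(process.env.MONGODB_URI, { maxPoolSize: 10 });

let db;

// Connect once; subsequent callers reuse the same database handle
async function getDb() {
  if (!db) {
    await client.connect();
    db = client.db('mydb');
  }
  return db;
}

module.exports = { getDb, client };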

Use replication for redundancy

Replication provides redundancy and high availability by maintaining multiple copies of your data. MongoDB replicates data within a replica set, which contains one primary and one or more secondary nodes.

Here is an example replica set configuration, run from the mongo shell on one of the members:

// Initiate a three-member replica set
rs.initiate({
  _id: "replSet",
  members: [
    { _id: 0, host: "mongodb1.example.com", priority: 2 },
    { _id: 1, host: "mongodb2.example.com" },
    { _id: 2, host: "mongodb3.example.com" }
  ]
})

This defines a replica set with:

  • A preferred primary (priority 2) at mongodb1.example.com
  • Two secondaries at mongodb2 and mongodb3
  • Member _id values 0, 1, and 2

To use the replica set from a Node.js app:
// Connect to the replica set
const { MongoClient } = require('mongodb');
const uri = "mongodb://mongodb1.example.com,mongodb2.example.com,mongodb3.example.com/?replicaSet=replSet";

const client = new MongoClient(uri);

async function main() {
  await client.connect();
  const db = client.db("mydb");

  // Reads and writes are routed to the current primary
  await db.collection("customers").find({}).toArray();
  await db.collection("orders").insertOne({item: "book", qty: 1});

  await client.close();
}

main().catch(console.error);

This connects to the replica set and directs reads/writes automatically to the primary node. The driver handles failover if the primary goes down.
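
If slightly stale reads are acceptable, reads can also be offloaded to the secondaries by setting a read preference in the connection string (secondaryPreferred is one of the standard read preference modes):

// Prefer secondaries for reads, fall back to the primary if none are available
const uri = "mongodb://mongodb1.example.com,mongodb2.example.com,mongodb3.example.com/?replicaSet=replSet&readPreference=secondaryPreferred";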

Read more: Mongo indexes you should know

Happy coding!
