MongoDB Bulk Update, Insert and Upsert Using Java


MongoDB Bulk Update

There are use cases where you want to send a bulk update command to MongoDB to eliminate the to-and-fro between the client and the database.

The new Bulk Update API for MongoDB opens the way for performance improvements when using the latest MongoDB shell or drivers, as it allows the database to consume commands much faster.


I updated my MongoDB Java driver to version 2.12.2 to get support for bulk operations. Since the MongoDB Java driver is backward compatible, the upgrade was as easy as replacing the jar file.
MongoDB supports both ordered and unordered bulk write operations.
Ordered bulk operations are executed in order (thus the name), halting when there is an error.
Unordered bulk operations are executed in no particular order (potentially in parallel), and they do not stop when an error occurs.
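The two modes above differ only in how the operation is initialized on the collection. A minimal sketch, assuming a local MongoDB instance and the 2.12.x driver (the `localhost`/`test`/`mycol` names are just placeholders):

```java
import com.mongodb.BulkWriteOperation;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class BulkModes {
    public static void main(String[] args) throws Exception {
        // Assumed: a MongoDB instance reachable on localhost
        DB db = new MongoClient("localhost").getDB("test");
        DBCollection collection = db.getCollection("mycol");

        // Ordered: writes are executed sequentially and stop at the first error
        BulkWriteOperation ordered = collection.initializeOrderedBulkOperation();

        // Unordered: writes may be reordered or parallelized; all errors are
        // reported together at the end instead of halting the batch
        BulkWriteOperation unordered = collection.initializeUnorderedBulkOperation();
    }
}
```

Everything else in the examples below is identical for both modes; only the initializer call changes.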

My team was trying to find an example of a bulk upsert but was not able to find one. After going through the API documentation and some trial and error, here are examples of bulk operations with MongoDB using Java.

Bulk Insert
//necessary imports
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteResult;
import com.mongodb.BulkWriteOperation;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.BulkWriteException;


// Sample code
DBCollection collection = db.getCollection("mycol");

// Get a BulkWriteOperation from the DBCollection for the mycol collection
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();

// Queue the insert operations in a loop for bulk execution
for (int i = 0; i < 100; i++) {
    bulkWriteOperation.insert(new BasicDBObject("_id", Integer.valueOf(i)));
}

// Execute the bulk operation on the mycol collection
BulkWriteResult result = bulkWriteOperation.execute();
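The `BulkWriteException` import above hints at error handling: with an unordered bulk, writes that fail (for example, duplicate `_id` values) do not stop the batch, and the collected errors surface when `execute()` throws. A minimal sketch, assuming the same `collection` and the 2.12.x driver (the `runBatch` wrapper is just for illustration):

```java
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteError;
import com.mongodb.BulkWriteException;
import com.mongodb.BulkWriteOperation;
import com.mongodb.BulkWriteResult;
import com.mongodb.DBCollection;

public class BulkInsertWithErrors {
    // Queues 100 inserts, executes them, and reports any per-write failures
    static void runBatch(DBCollection collection) {
        BulkWriteOperation bulk = collection.initializeUnorderedBulkOperation();
        for (int i = 0; i < 100; i++) {
            bulk.insert(new BasicDBObject("_id", Integer.valueOf(i)));
        }
        try {
            BulkWriteResult result = bulk.execute();
            System.out.println("Inserted: " + result.getInsertedCount());
        } catch (BulkWriteException e) {
            // An unordered bulk keeps going past failures; every failed write
            // is reported here with its position in the batch
            for (BulkWriteError error : e.getWriteErrors()) {
                System.out.println("Write " + error.getIndex() + " failed: " + error.getMessage());
            }
        }
    }
}
```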

The most useful bulk operation is the bulk upsert: if the document is present, it is updated; otherwise a new one is inserted.

Bulk Update with Upsert

//necessary imports
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteResult;
import com.mongodb.BulkWriteOperation;
import com.mongodb.BulkWriteRequestBuilder;
import com.mongodb.BulkUpdateRequestBuilder;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.BulkWriteException;

// Sample code
DBCollection collection = db.getCollection("mycol");

// Get a BulkWriteOperation from the DBCollection for the mycol collection
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();

// Queue the upsert operations in a loop for bulk execution
for (int i = 0; i < 100; i++) {
    // Get a BulkWriteRequestBuilder by issuing a find on mycol with _id
    BulkWriteRequestBuilder bulkWriteRequestBuilder =
            bulkWriteOperation.find(new BasicDBObject("_id", Integer.valueOf(i)));

    // Get hold of the upsert operation from the BulkWriteRequestBuilder
    BulkUpdateRequestBuilder updateReq = bulkWriteRequestBuilder.upsert();

    updateReq.replaceOne(new BasicDBObject("_id", Integer.valueOf(i)).append("myvalue", "myvalue" + i));
}

// Execute the bulk operation on the mycol collection
BulkWriteResult result = bulkWriteOperation.execute();
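`replaceOne` swaps out the whole document. If you only want to touch some fields on an existing document, the same builder also exposes `updateOne`, which takes a modifier document such as `$set`. A sketch under the same assumptions as the example above:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.BulkUpdateRequestBuilder;
import com.mongodb.BulkWriteOperation;
import com.mongodb.BulkWriteResult;
import com.mongodb.DBCollection;

public class BulkUpsertWithSet {
    // Queues 100 upserts that set a single field, then executes them
    static BulkWriteResult runBatch(DBCollection collection) {
        BulkWriteOperation bulk = collection.initializeUnorderedBulkOperation();
        for (int i = 0; i < 100; i++) {
            BulkUpdateRequestBuilder upsert =
                    bulk.find(new BasicDBObject("_id", Integer.valueOf(i))).upsert();

            // $set touches only the listed fields; any other fields already on
            // an existing document are left as-is (unlike replaceOne)
            upsert.updateOne(new BasicDBObject("$set",
                    new BasicDBObject("myvalue", "myvalue" + i)));
        }
        return bulk.execute();
    }
}
```

For a matched document, the two approaches differ: `replaceOne` discards fields not listed in the replacement, while `$set` preserves them.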








Comments

  1. Thanks for the post. It was helpful.
    I have a mongo instance with 6 million+ (and growing) records.
    The bulk insert is fast, but the bulk update (upsert) is taking a really long time.
    By long, I mean about 300+ times longer than the insert.

    Could you help?

    Replies
    1. Please try running mongostat; that will help you diagnose the root cause of the problem.

      These are some of the important output fields you would like to watch.

      faults – the faults column shows you the number of Linux page faults per second. This is when Mongo accesses something that is mapped to the virtual address space but not in physical memory. i.e. it results in a read from disk. High values here indicate you may not have enough RAM to store all necessary data and disk accesses may start to become the bottleneck.

      flushes – this shows how many times data has been flushed to disk. MongoDB only physically writes data to disk every 60 seconds (by default). This increases performance but can decrease durability, because a hard crash in between flushes will result in that data not being written. This stat shows how often mongod is flushing data to disk.

      locked % – shows the % of time in a global write lock. When this is happening no other queries will complete until the lock is given up, or the lock owner yields. This is indicative of a large, global operation like a remove() or dropping a collection and can result in slow performance.

      % idx miss – this is like we saw in the server status output except instead of an aggregate total, you can see queries hitting (or missing) the index in real time. This is useful if you’re debugging specific queries in development or need to track down a server that is performing badly.

      qr|qw – when MongoDB gets too many queries to handle in real time, it queues them up. This is represented in mongostat by the read and write queue columns. When this starts to increase you will see slowdowns in executing queries as they have to wait to run through the queue. You can alleviate this by stopping any more queries until the queue has dissipated. Queues will tend to spike if you’re doing a lot of write operations alongside other write heavy ops, such as large ranged removes.




    2. The find() part will be slow unless there are indexes to support an efficient find by whatever criteria you are using.

  2. Awesome post, It might help http://www.code-sample.com/2016/07/mongodb-tutorial-point.html

