MongoDB Bulk Update 
There are use cases where you want to send a bulk update command to MongoDB to eliminate the to-and-fro between client and database.
The Bulk Update API for MongoDB opens the way for performance improvements when using the latest MongoDB shell or drivers, as it allows the database to consume commands much faster.
I updated my MongoDB Java driver to version 2.12.2 to get support for bulk operations. Because the MongoDB Java driver is backward compatible, the upgrade was as easy as replacing the jar file.
MongoDB supports both ordered and unordered bulk write operations.
Ordered bulk operations are executed in order (thus the name), halting when an error occurs.
Unordered bulk operations are executed in no particular order (potentially in parallel), and they do not stop when an error occurs.
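The difference between the two modes boils down to which initialize method you call on the collection. Here is a minimal sketch, assuming a mongod running on localhost:27017 and the 2.12.x Java driver on the classpath (the database name "test" is just for illustration):

```java
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteOperation;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

// Assumes a mongod on localhost:27017 and the 2.12.x driver on the classpath
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("test");
DBCollection collection = db.getCollection("mycol");

// Ordered: writes execute in insertion order and stop at the first error
BulkWriteOperation ordered = collection.initializeOrderedBulkOperation();

// Unordered: writes may be reordered by the server and continue past errors
BulkWriteOperation unordered = collection.initializeUnorderedBulkOperation();
```

Neither call touches the server; the batch is only sent when you call execute() on the returned BulkWriteOperation.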
My team was trying to find an example of a bulk upsert but could not find one. After going through the API documentation and some trial and error, here are examples of bulk operations with MongoDB using Java.
Bulk Insert
// necessary imports
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteResult;
import com.mongodb.BulkWriteOperation;
import com.mongodb.DBObject;
import com.mongodb.BulkWriteException;

// Sample code
com.mongodb.DBCollection collection = db.getCollection("mycol");

// Get a BulkWriteOperation from the com.mongodb.DBCollection class on the mycol collection
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();

// perform the insert operation in the loop to add objects for bulk execution
for (int i = 0; i < 100; i++) {
    bulkWriteOperation.insert(new BasicDBObject("_id", Integer.valueOf(i)));
}

// execute bulk operation on the mycol collection
BulkWriteResult result = bulkWriteOperation.execute();
The most useful bulk operation is the bulk upsert: if a document matching the query is present it is updated, otherwise a new one is inserted.
Bulk Update with Upsert
// necessary imports
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteResult;
import com.mongodb.BulkWriteOperation;
import com.mongodb.BulkWriteRequestBuilder;
import com.mongodb.BulkUpdateRequestBuilder;
import com.mongodb.DBObject;
import com.mongodb.BulkWriteException;

// Sample code
com.mongodb.DBCollection collection = db.getCollection("mycol");

// Get a BulkWriteOperation from the com.mongodb.DBCollection class on the mycol collection
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();

// perform the upsert operation in the loop to add objects for bulk execution
for (int i = 0; i < 100; i++) {
    // get a BulkWriteRequestBuilder by issuing find on mycol with _id
    BulkWriteRequestBuilder bulkWriteRequestBuilder = bulkWriteOperation.find(new BasicDBObject("_id", Integer.valueOf(i)));

    // get hold of the upsert operation from the BulkWriteRequestBuilder
    BulkUpdateRequestBuilder updateReq = bulkWriteRequestBuilder.upsert();
    updateReq.replaceOne(new BasicDBObject("_id", Integer.valueOf(i)).append("myvalue", "myvalue" + i));
}

// execute bulk operation on the mycol collection
BulkWriteResult result = bulkWriteOperation.execute();
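Note that both examples import BulkWriteException without using it: execute() throws it when any write in the batch fails. For an unordered bulk the remaining writes are still attempted, and you can inspect each failure afterwards. Here is a self-contained sketch, assuming a mongod on localhost:27017; the collection name "mycol_err" is made up for the demo:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteError;
import com.mongodb.BulkWriteException;
import com.mongodb.BulkWriteOperation;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

// Assumes a mongod on localhost:27017; "mycol_err" is a hypothetical collection for the demo
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("test");
DBCollection collection = db.getCollection("mycol_err");
collection.drop();

BulkWriteOperation bulk = collection.initializeUnorderedBulkOperation();
bulk.insert(new BasicDBObject("_id", 1));
bulk.insert(new BasicDBObject("_id", 1)); // duplicate _id -> one write error
bulk.insert(new BasicDBObject("_id", 2)); // unordered: still attempted despite the error

int failed = 0;
try {
    bulk.execute();
} catch (BulkWriteException bwe) {
    for (BulkWriteError error : bwe.getWriteErrors()) {
        failed++;
        // getIndex() is the position of the failed write within the batch
        System.err.println("write " + error.getIndex() + " failed: " + error.getMessage());
    }
}
```

After the batch runs, both unique documents are present even though one write failed; with an ordered bulk, execution would have stopped at the duplicate.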

Thanks for the post. It was helpful.
I have a Mongo instance with 6 million+ (and growing) records.
The bulk insert is fast, but the bulk update (upsert) is taking a really long time.
By long, I mean about 300+ times longer than the insert.
Could you help?
Please try running mongostat; that will help you diagnose the root cause of the problem.
These are some of the important output fields you would want to watch.
faults – the faults column shows you the number of Linux page faults per second. This is when Mongo accesses something that is mapped to the virtual address space but not in physical memory. i.e. it results in a read from disk. High values here indicate you may not have enough RAM to store all necessary data and disk accesses may start to become the bottleneck.
flushes – this shows how many times data has been flushed to disk. MongoDB only physically writes data to disk every 60 seconds (by default). This increases performance but can decrease durability, because a hard crash in between flushes will result in that data not being written. This stat shows how often mongod is flushing data to disk.
locked % – shows the % of time in a global write lock. When this is happening no other queries will complete until the lock is given up, or the lock owner yields. This is indicative of a large, global operation like a remove() or dropping a collection and can result in slow performance.
% idx miss – this is like we saw in the server status output except instead of an aggregate total, you can see queries hitting (or missing) the index in real time. This is useful if you’re debugging specific queries in development or need to track down a server that is performing badly.
qr|qw – when MongoDB gets too many queries to handle in real time, it queues them up. This is represented in mongostat by the read and write queue columns. When this starts to increase you will see slowdowns in executing queries as they have to wait to run through the queue. You can alleviate this by stopping any more queries until the queue has dissipated. Queues will tend to spike if you’re doing a lot of write operations alongside other write heavy ops, such as large ranged removes.
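For reference, a minimal invocation might look like this (assuming mongostat from the MongoDB tools is on your PATH and mongod is listening on the default port; the trailing number is the polling interval in seconds):

```shell
# print one stats row every 5 seconds for the local mongod
mongostat --host localhost:27017 5
```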
The find() part will be slow unless there are indexes to support an efficient find by whatever criteria you are using.
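In the upsert example above the find() matches on _id, which always has an index, but if you match on any other field you should create an index for it first. A one-line sketch, again assuming a mongod on localhost:27017 ("myvalue" is just the example field from the upsert code):

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

// Assumes a mongod on localhost:27017; "myvalue" is the example field from the upsert above
MongoClient mongoClient = new MongoClient("localhost", 27017);
DB db = mongoClient.getDB("test");
DBCollection collection = db.getCollection("mycol");

// ascending index on the field used by the bulk find()
collection.createIndex(new BasicDBObject("myvalue", 1));
```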