MongoDB Bulk Update
There are use cases where you want to send a bulk update command to MongoDB to eliminate the round trips between the client and the database. The new Bulk Update API for MongoDB opens the way for performance improvements when using the latest MongoDB shell or drivers, as it allows the database to consume commands much faster.
I updated my MongoDB Java driver to version 2.12.2 to get support for the bulk operations. Since the MongoDB Java driver is backward compatible, the upgrade was as easy as replacing the jar file.
MongoDB supports both ordered and unordered bulk write operations. Ordered bulk operations are executed in order (hence the name), halting at the first error. Unordered bulk operations are executed in no particular order (potentially in parallel) and do not stop when an error occurs. The sketch below shows the only client-side difference.
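As a minimal sketch (assuming the same collection handle used in the examples below), the difference is simply which initializer you call on the collection:

// Ordered: operations are applied in the order they were added and stop at the first error
BulkWriteOperation ordered = collection.initializeOrderedBulkOperation();

// Unordered: the server may reorder operations, and all of them are attempted even if some fail
BulkWriteOperation unordered = collection.initializeUnorderedBulkOperation();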
My team was trying to find an example of a bulk upsert but could not find one. After going through the API documentation and some trial and error, here are examples of bulk operations with MongoDB using Java.
Bulk Insert
// necessary imports
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteResult;
import com.mongodb.BulkWriteOperation;
import com.mongodb.DBObject;
import com.mongodb.BulkWriteException;

// Sample code
com.mongodb.DBCollection collection = db.getCollection("mycol");

// get a BulkWriteOperation from the com.mongodb.DBCollection class on the mycol collection
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();

// perform the insert operation in the loop to add objects for bulk execution
for (int i = 0; i < 100; i++) {
    bulkWriteOperation.insert(new BasicDBObject("_id", Integer.valueOf(i)));
}

// execute the bulk operation on the mycol collection
BulkWriteResult result = bulkWriteOperation.execute();
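If you want to verify what the batch actually did, the returned BulkWriteResult carries per-operation counters. A minimal sketch of inspecting it (the printed labels are just illustrative):

// inspect the outcome of the bulk execution
System.out.println("Inserted: " + result.getInsertedCount());
System.out.println("Matched:  " + result.getMatchedCount());
System.out.println("Removed:  " + result.getRemovedCount());
System.out.println("Upserted: " + result.getUpserts().size());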
The most useful bulk operation is the bulk upsert: if a matching document is present, it is updated; otherwise a new document is inserted.
Bulk Update with Upsert
// necessary imports
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteResult;
import com.mongodb.BulkWriteOperation;
import com.mongodb.BulkWriteRequestBuilder;
import com.mongodb.BulkUpdateRequestBuilder;
import com.mongodb.DBObject;
import com.mongodb.BulkWriteException;

// Sample code
com.mongodb.DBCollection collection = db.getCollection("mycol");

// get a BulkWriteOperation from the com.mongodb.DBCollection class on the mycol collection
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();

// perform the upsert operation in the loop to add objects for bulk execution
for (int i = 0; i < 100; i++) {
    // get a BulkWriteRequestBuilder by issuing find on mycol with _id
    BulkWriteRequestBuilder bulkWriteRequestBuilder =
            bulkWriteOperation.find(new BasicDBObject("_id", Integer.valueOf(i)));
    // get hold of the upsert operation from the BulkWriteRequestBuilder
    BulkUpdateRequestBuilder updateReq = bulkWriteRequestBuilder.upsert();
    updateReq.replaceOne(new BasicDBObject("_id", Integer.valueOf(i))
            .append("myvalue", "myvalue" + i));
}

// execute the bulk operation on the mycol collection
BulkWriteResult result = bulkWriteOperation.execute();
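Note that the imports above include com.mongodb.BulkWriteException: if any of the operations fail, execute() throws it, and with an unordered bulk the remaining operations are still attempted before the exception surfaces. A minimal sketch of the execute() call wrapped for error reporting:

// execute the bulk and report any per-operation failures
try {
    BulkWriteResult result = bulkWriteOperation.execute();
} catch (BulkWriteException bwe) {
    for (com.mongodb.BulkWriteError error : bwe.getWriteErrors()) {
        System.out.println("Operation " + error.getIndex() + " failed with code "
                + error.getCode() + ": " + error.getMessage());
    }
}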
Comments
Thanks for the post. It was helpful.
I have a Mongo instance with 6 million+ (and growing) records. The bulk insert is fast, but the bulk update (upsert) is taking a really long time, about 300+ times longer than the insert. Could you help?
Please try running mongostat; it will help you diagnose the root cause of the problem.
These are some of the important output fields you should watch.
faults – the faults column shows the number of Linux page faults per second. This happens when MongoDB accesses something that is mapped into the virtual address space but is not in physical memory, i.e. it results in a read from disk. High values here indicate you may not have enough RAM to store all necessary data, and disk accesses may start to become the bottleneck.
flushes – this shows how many times data has been flushed to disk. By default, MongoDB only physically writes data to disk every 60 seconds. This increases performance but can decrease durability, because a hard crash in between flushes will result in that data not being written. This stat shows how often mongod is flushing data to disk.
locked % – shows the percentage of time spent in a global write lock. While this is happening, no other queries will complete until the lock is given up or the lock owner yields. This is indicative of a large, global operation like a remove() or dropping a collection, and can result in slow performance.
% idx miss – this is like the server status output, except that instead of an aggregate total you can see queries hitting (or missing) the index in real time. This is useful if you are debugging specific queries in development or need to track down a server that is performing badly.
qr|qw – when MongoDB gets too many queries to handle in real time, it queues them up. This is represented in mongostat by the read and write queue columns. When these start to increase, you will see slowdowns in executing queries as they have to wait to run through the queue. You can alleviate this by holding off further queries until the queue has dissipated. Queues tend to spike if you are doing a lot of write operations alongside other write-heavy ops, such as large ranged removes.
The find() part will be slow unless there are indexes to support an efficient find by whatever criteria you are using (see the sketch below).
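For example, if the bulk find() in the upsert loop matched on a field other than _id (say the myvalue field from the example above), creating an index on that field keeps each lookup from scanning the whole collection. A sketch with the 2.12 driver:

// create an ascending index on the field used by the bulk find()
collection.createIndex(new BasicDBObject("myvalue", 1));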