Friday, August 28, 2015

MongoDB: Achieving High-Speed Replication Across Latency-Prone Networks

Several factors affect replication in MongoDB. Let's summarize the key facts here.

  1. The secondary pulls from the primary. It fetches records from the primary's oplog, brings them back, and hands them over to local threads that apply them to its own database.
  2. Data is replicated from the oplog, and this is important to note. MongoDB does not replicate records from the database itself, but records from the oplog. In most cases, the average object size in the oplog is much smaller than the average object size in the database being replicated.
  3. Another important factor: during replication, a secondary can fetch at most 4 MB of data in a single fetch. So no matter how much bandwidth you have, one fetch returns at most 4 MB.
  4. During a fetch, if the oplog holds a large number of small records, the secondary has to iterate over more oplog entries on the primary to 'gather' 4 MB of data, and that takes time. The time needed to move oplog data from primary to secondary therefore increases, and network latency increases it further.
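To put rough numbers on points 3 and 4, here is a minimal back-of-the-envelope sketch in Python. It is a toy model, not MongoDB's actual fetch logic: the function name `fetch_estimate`, the per-entry gather cost and the round-trip time are all assumptions made up for illustration. Only the 4 MB cap comes from the list above.

```python
BATCH_CAP_BYTES = 4 * 1024 * 1024  # per-fetch cap from point 3


def fetch_estimate(avg_entry_bytes, rtt_s, per_entry_gather_s,
                   batch_cap_bytes=BATCH_CAP_BYTES):
    """Toy model of one full fetch (hypothetical helper, not a MongoDB API).

    avg_entry_bytes    -- assumed average oplog entry size
    rtt_s              -- assumed network round-trip time in seconds
    per_entry_gather_s -- assumed time the primary spends per entry scanned

    Returns (entries_per_fetch, seconds_per_fetch, entries_per_second).
    """
    # Smaller entries mean more of them are needed to fill the cap.
    entries = max(1, batch_cap_bytes // avg_entry_bytes)
    # One fetch costs one round trip plus the time to gather the entries.
    seconds = rtt_s + entries * per_entry_gather_s
    return entries, seconds, entries / seconds


# Same entry size and gather cost, on a low-latency and a high-latency link.
lan = fetch_estimate(avg_entry_bytes=1024, rtt_s=0.001, per_entry_gather_s=1e-5)
wan = fetch_estimate(avg_entry_bytes=1024, rtt_s=0.100, per_entry_gather_s=1e-5)
print("LAN entries/s:", round(lan[2]))
print("WAN entries/s:", round(wan[2]))
```

In this model the byte cap fixes how many entries fit in one fetch, so the only ways to move more entries per second are to cut the round trip or the gather time. Extra bandwidth does not appear in the formula at all.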


Let's elaborate on the impact of point 4 first. With a high transaction rate on the primary, point 4 implies that at some point we are likely to start seeing lag on the secondary. The secondary takes some time to bring each batch back home, and at a high transaction rate more records pile up in the oplog in the meantime. On the next fetch, the secondary again needs time to gather up to 4 MB of data (or less, if that covers all pending records). As a result, the 'fetching' time keeps increasing, and so does the lag. Soon, the member falls out of sync.
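The feedback loop described here can be sketched as a small simulation. This is only an illustration under stated assumptions: `simulate_backlog` is a hypothetical helper, the cap of 4096 entries per fetch corresponds to 4 MB of assumed 1 KB entries, and the write rates, round-trip time and per-entry gather cost are invented numbers.

```python
def simulate_backlog(write_rate, max_entries_per_fetch, rtt_s,
                     per_entry_gather_s, cycles):
    """Return the number of unfetched oplog entries after each fetch cycle.

    Toy model (not MongoDB's implementation): every cycle the secondary
    fetches up to max_entries_per_fetch entries, which takes one round trip
    plus a per-entry gather cost, while the primary keeps writing.
    """
    backlog = 0.0
    history = []
    for _ in range(cycles):
        fetched = min(backlog, max_entries_per_fetch)
        elapsed = rtt_s + fetched * per_entry_gather_s
        # Entries written during this cycle are added to what is left over.
        backlog = backlog - fetched + write_rate * elapsed
        history.append(backlog)
    return history


# Assumed numbers: 100 ms round trip, 10 microseconds per entry gathered.
steady = simulate_backlog(20_000, 4096, 0.1, 1e-5, cycles=50)
falling_behind = simulate_backlog(40_000, 4096, 0.1, 1e-5, cycles=50)
print("20k writes/s, final backlog:", round(steady[-1]))
print("40k writes/s, final backlog:", round(falling_behind[-1]))
```

With these made-up numbers, the lower write rate settles at a backlog that fits in a single fetch, while the higher one adds more entries per cycle than a full fetch can remove, so the backlog (and with it the lag) grows without bound.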

How do we handle this situation?

Through shards.
We should split the traffic across multiple shards so that the transaction rate per oplog is reduced. Each shard is its own replica set with its own oplog, so every secondary only has to keep up with its shard's share of the writes. Then, by the time a secondary has fetched a batch and comes back for the next one, not much has been added to the oplog, and the chance of seeing lag drops significantly. So, on a high-latency network, we can achieve better replication by adding shards. Splitting the traffic is the key here.
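As a rough sizing aid, the idea can be turned into a one-line calculation. This is a sketch under assumptions, not a MongoDB tool: `shards_needed` is a hypothetical helper, it assumes the shard key spreads writes evenly across shards, and the sustainable rate per shard is something you would have to measure on your own network.

```python
import math


def shards_needed(total_write_rate, sustainable_rate_per_shard, headroom=0.7):
    """Estimate how many shards keep each oplog within what a secondary can drain.

    total_write_rate           -- writes per second across the whole cluster
    sustainable_rate_per_shard -- measured (here: assumed) entries per second
                                  one secondary can fetch over the slow link
    headroom                   -- fraction of that rate we are willing to use
    """
    usable = sustainable_rate_per_shard * headroom
    return max(1, math.ceil(total_write_rate / usable))


# Assumed figures: 40,000 writes/s in total, about 29,000 entries/s per link.
print(shards_needed(40_000, 29_000))
```

The headroom factor is there because a secondary running right at its limit has no way to catch up after a burst of writes or a network hiccup.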

Happy replication :) 
