Thursday, November 19, 2009

Postgres- achieving high insertion rate in multithreaded environment

Suppose you have an application which runs multiple threads, create some set of data in memory, then all of them dump in one single table. This table is short lived. There are some queries etc done on it and then it can be dumped.
During our experiments, it was found that this was getting slow. What I did then, I created two installations of my applications, and measured the rate, and found that the insertion rate got increased. With the help of a DB expert, it came out that this happens because of transaction logs locks.
For example, your application has just one thread, it creates one set of data and then puts that in DB. With this case, you are able to achieve 1MB/sec insertion rate. Now if you increase the number of threads to 5, each of them will create one set of data, hence, total 5 sets of data, you would expect that you should be achieving 5MB/sec as threads are supposed to run in parallel. In fact, that does not happen as the threads now are blocked in locks on transaction logs. (we are assuming one DB connection per thread). Hence, the performance goes down a lot.
How to solve this?
In PostGreSQL 8.3 onwards, they have provided a mechanism for the same.
In above case, all the 5 threads need to write their data to some file, or one structure. Once the data is aggregated, since the table can be dumped, we will truncate it. So, begin a transaction, truncate the table, insert the data in table, and commit the transaction. In this case, since only on thread is going to write to DB, transaction logs are not enabled, hence no locking, and hence, we see a real good performance. The rate can be actually 5 MB/sec or more.

You would ask this - I can simply aggregate the data in memory and dump using one single connection, why to do all this? You are right. We needed to do this as we were using multiprocessing in python (as multithreading is not true multi-threading there), and the data could not be shared across processes, so, we had to write to some file on the disk, and then in one single transaction load that file in DB. We got a very high performance in terms of insertion rate.

Best luck.

No comments:

Post a Comment