SQLite3 performance testing
Post date: Apr 17, 2014 2:42:16 PM
I'm trying to figure out what transaction rate can be satisfied with SQLite3, and when other database solutions are required.
Here's a simple test, running 10,000 transactional selects & updates in 10 parallel threads. The update process does only what's required, so basically it reads data, updates it and commits. After several tests, I found out that there are just a few things that really matter: synchronous mode and journaling mode. I have seen many people recommend a larger page size for better performance, but in this light test I found that the default 1024-byte page size was optimal. Any larger page only requires more reading and writing and brings down the performance, just as you would expect.
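For reference, here's a minimal sketch of that kind of test harness. The counters table, the key distribution and the exact statements are made up for illustration; the real test also randomized the sequence and measured timings.

    import sqlite3
    import threading

    DB = 'test.db'
    THREADS = 10
    OPS_PER_THREAD = 1000   # 10 threads x 1,000 ops = 10,000 transactions
    KEYS = 100              # rows are pre-created, workers only read & update

    def worker(n):
        # sqlite3 connections must not be shared between threads, so each
        # worker opens its own; timeout makes busy locks retry internally.
        conn = sqlite3.connect(DB, timeout=30)
        for i in range(OPS_PER_THREAD):
            key = (n * OPS_PER_THREAD + i) % KEYS
            # Read, then update and commit; the write lock is held only briefly.
            row = conn.execute('SELECT value FROM counters WHERE key = ?',
                               (key,)).fetchone()
            conn.execute('UPDATE counters SET value = ? WHERE key = ?',
                         (row[0] + 1, key))
            conn.commit()
        conn.close()

    conn = sqlite3.connect(DB)
    conn.execute('CREATE TABLE IF NOT EXISTS counters '
                 '(key INTEGER PRIMARY KEY, value INTEGER)')
    conn.executemany('INSERT OR IGNORE INTO counters VALUES (?, 0)',
                     [(k,) for k in range(KEYS)])
    conn.commit()
    conn.close()

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()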
After some playing around I figured out that in this case it doesn't really matter whether the transaction mode is deferred, immediate or exclusive. Using deferred just required me to handle 'locking failures' in my code, but immediate and exclusive gave pretty much the same results. Why? Because basically it doesn't matter how it's done; all that matters is that there's a read and write lock on the table. Even if multiple parallel reads on a table were allowed, only one commit would be successful and the others would fail, because there's no row-level locking. Anyway, as we know, this isn't the highest performing database out there. Lol. But let's estimate what it's good for.
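Handling those locking failures in deferred mode is basically a retry loop around the transaction. A sketch, using the same hypothetical counters table as above:

    import sqlite3
    import time

    def update_with_retry(conn, key, retries=10):
        # In deferred mode the write lock is taken lazily, so a concurrent
        # writer can make our commit fail; retry on 'database is locked'.
        for attempt in range(retries):
            try:
                row = conn.execute('SELECT value FROM counters WHERE key = ?',
                                   (key,)).fetchone()
                conn.execute('UPDATE counters SET value = ? WHERE key = ?',
                             (row[0] + 1, key))
                conn.commit()
                return
            except sqlite3.OperationalError:
                conn.rollback()
                time.sleep(0.01 * (attempt + 1))  # back off before retrying
        raise RuntimeError('could not commit after %d retries' % retries)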
Results with the things that do matter:
Mode | Sync | Disk | Time | Transactions/second
Notes: 'Normal' in this case means the DELETE journal mode, the SSD was using Ext4 and the HDD was using NTFS, if that matters. As we can see from the results, the file system or even the storage medium doesn't make a big difference; it's all about the journal mode and whether synchronization is required or not. Other factors are just minor nuances in this show.
From these results we can conclude that SQLite is really fast enough for 99.9% of websites, as they say on their page. WAL mode without sync allows over 5k transactions per second on an SSD and over 1k transactions per second on a traditional HDD. Even the default mode, which is NORMAL/FULL (delete journal, full sync), on a traditional HDD still allows 78 transactions per second.

Because the default transaction mode for SQLite3 is deferred, multiple processes / threads can read the database simultaneously, and an error / exception / notification only occurs when the database is being written to. If you're writing efficient code, you'll do everything that is required first, and only at the very last moment start the transaction, update the data and commit it. The secondary factor after these modes is locking, which is ONE lock per database: if the segment which modifies the database takes long, the whole process becomes slow. I also ran these tests with an artificial time.sleep(0.1) added inside the "synchronized" block which reserves the database, and just as expected, it made things very much slower. So keep the sections which update the database short and quick, and don't commit too often.

Btw. In deferred mode, which is the default, no journal file is created on read requests; the journal is only created when the database is being modified. Now someone could immediately point out that there are other journaling modes than the default, which is delete. That's absolutely true, but compared to the sync options and WAL mode the difference is really marginal, so I didn't bother to list those results here separately.
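To illustrate the point about keeping the write section short and quick: prepare everything first, then lock, update and commit. A sketch; the transform step and the counters table are made up:

    import sqlite3

    def process_and_store(db_path, raw_items, transform):
        # Do all the slow computation BEFORE touching the database, so the
        # write lock is held only for the actual update and commit.
        prepared = [(transform(item), key) for key, item in raw_items]

        # isolation_level='IMMEDIATE' makes Python issue BEGIN IMMEDIATE
        # before the first write, taking the write lock right away.
        conn = sqlite3.connect(db_path, isolation_level='IMMEDIATE')
        try:
            conn.executemany('UPDATE counters SET value = ? WHERE key = ?',
                             prepared)
            conn.commit()  # lock released here; keep this section short
        finally:
            conn.close()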
Based on this test, SQLite3 is actually the best database for my use in most cases. Usually I have long-running single-process tasks, which commit at the very end of the process. SQLite3 is basically just a transactional data store.
As a pro-SQLite3 comment it can be stated that because it's an "in-process" database, it doesn't require the CPU and OS to do a context switch. This means that the SQLite3 database actually gives great performance in terms of latency when doing tons of light read transactions.
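For example, a loop of tiny point reads like this hypothetical one involves no socket round trips at all, just library calls inside the same process:

    import sqlite3
    import time

    conn = sqlite3.connect('test.db')
    start = time.time()
    for i in range(10000):
        # Each SELECT is just an in-process library call.
        conn.execute('SELECT value FROM counters WHERE key = ?',
                     (i % 100,)).fetchone()
    print('%.1f reads/second' % (10000 / (time.time() - start)))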
I personally wouldn't run any production system with non-default transaction handling for data I don't want to lose. So turning journaling and sync off is great for a testing environment, but for production it's an absolute no-no. I recommend using WAL mode with synchronous=NORMAL, which is the recommended synchronous setting when using WAL mode.
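In practice that's just two PRAGMA statements on each connection; a minimal sketch with a made-up database path:

    import sqlite3

    conn = sqlite3.connect('production.db')
    # journal_mode is persistent and applies to the database file itself;
    # synchronous is per-connection, so set it every time you connect.
    conn.execute('PRAGMA journal_mode=WAL')
    conn.execute('PRAGMA synchronous=NORMAL')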
1) Just for fun, I also ran the tests with a :memory: database, and as expected, it's a bit faster than any of the versions using journaling or file system persistence. This also proves that the performance delays are mainly caused by I/O and not by SQL command processing or Python itself. Yet, I had to try it without using SQLite3 at all, so the final row in the table is a Python 3.2 dictionary with int as key and int as value, and on a key exception, the key is inserted with value 1.
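That dictionary baseline was roughly this sketch:

    # Pure-Python baseline: the same increment logic without SQLite3 at all.
    counters = {}

    def increment(key):
        try:
            counters[key] += 1
        except KeyError:
            counters[key] = 1  # on a missing key, insert it with value 1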
2) All of these tests had a function which randomized the insertion sequence, so I also tried it with 10000x1 and 1x10000, meaning 10,000 different keys with only inserts. Neither of these options made any meaningful difference, so 10,000 different keys with inserts is pretty much just as slow or fast as 10,000 increments / updates to a single key. I also varied the number of threads, etc. No practical difference, unless you do something silly like using 10,000 threads to update one individual dictionary entry per thread. But that doesn't make any sense anyway. If you want to go really crazy, use 10,000 processes to each update one individual dictionary entry in memory. Btw. The time of creating the thread pool and closing and joining it is included in the timing.
Details: I was using the sqlite3 module which comes with Python 3.2. The SSD I used was a "consumer grade" SSD, running under Ubuntu. The NTFS tests were run using Windows 7 Pro and a consumer grade 500 GB 7200 RPM SATA drive, if that makes any meaningful difference.
Added one comment from the discussion this post led to:
Actually, multiple programs, processes and threads can all access the same database at the same time perfectly well, but there's only one active write transaction allowed at a time. When using WAL mode, slow open read transactions can run perfectly well in parallel with the (one) write transaction.

If you need additional performance, you can work around that limitation: you might want to split the data so that there's a small table (database) which contains the often-updated data, and a larger one for the rest of the stuff which isn't updated multiple times per second. So in the case of a web forum, there can be a separate DDoS protection table which is updated on every call, and then the rest of the system data which is updated only when a logged-in user makes changes. In forums, you might like to split data like 'last read' flags into their own database, instead of storing them with the message / forum tables in the same database. But as said, those things need to be done only when it's necessary; for most sites, that time will be never. Of course, if you hit that kind of situation, it's a good idea to start looking for higher performance solutions, or to replace the DDoS table with an in-memory database or something similar.

As said, SQLite3 puts reliability before performance with its standard settings, and that's what most people really like. We all know the amount of whining MongoDB has received because of lost committed data; it doesn't guarantee durability by default.
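One way to do that kind of split, while still querying everything over one connection, is SQLite's ATTACH DATABASE. A sketch with made-up file and table names:

    import sqlite3

    # The main database holds the rarely-updated data; hot.db holds the
    # table written on every request. Each file has its own write lock,
    # so hammering the hot table doesn't block writes to the main data.
    conn = sqlite3.connect('forum.db')
    conn.execute("ATTACH DATABASE 'hot.db' AS hot")

    # Frequent writes go to the attached hot database.
    conn.execute('UPDATE hot.request_counters SET hits = hits + 1 WHERE ip = ?',
                 ('203.0.113.7',))
    conn.commit()

    # Reads can still query both databases over the same connection.
    row = conn.execute('SELECT title FROM messages WHERE id = ?', (1,)).fetchone()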