SQLite3 / Python 3.2, 100 million rows, 1 million row read / write test
Post date: May 1, 2014 5:30:01 AM
A friend said that I can't use SQLite3 because it's dog slow. Well, here's a little test where I imported 100 000 000 messages into an SQLite3 database on my home computer, using a traditional consumer SATA disk, not an SSD. A commit was called after every iteration, i.e. after every million inserts. S is the database size in millions of rows minus one (0-indexed), W is the write time for that million rows, and R is the read time for those newly inserted million rows using individual SELECT statements, as the table was a simple key-value table.

As you can see from the results, yes, it gets slower as you insert more data. Because the results are very linear, I have only included 10% of the test output here. There's some variation, but the main point of this test was to find out whether the database becomes dog slow when there are more rows. As we can see, the row count does have an effect, but a much smaller one than most seem to expect. I know SQLite3 can become dog slow, I have personally experienced it, but you'll need to add much more than 100 million records.
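The actual io-test.py isn't included here, but based on the description above a minimal sketch of it could look roughly like this. The table name kv, the integer key / text value columns and the database file name are my assumptions, not necessarily what the real script used:

import os
import sqlite3
import time

BATCH = 1000000      # rows per transaction, one commit per million inserts
BATCHES = 100        # 100 x 1 million = 100 million rows in total

conn = sqlite3.connect('io-test.sqlite3')
conn.execute('CREATE TABLE IF NOT EXISTS kv (key INTEGER PRIMARY KEY, value TEXT)')

for batch in range(BATCHES):
    base = batch * BATCH

    # Write one million rows inside a single transaction, then commit.
    start = time.time()
    conn.executemany(
        'INSERT INTO kv (key, value) VALUES (?, ?)',
        ((base + i, 'value-%d' % (base + i)) for i in range(BATCH)))
    conn.commit()
    write_time = time.time() - start

    # Read the same million rows back with individual SELECT statements.
    start = time.time()
    for i in range(BATCH):
        conn.execute('SELECT value FROM kv WHERE key = ?', (base + i,)).fetchone()
    read_time = time.time() - start

    print('S:', batch, 'W:', write_time, 'R:', read_time)

# Final count over the whole table and file size, as on the last output line.
start = time.time()
count = conn.execute('SELECT COUNT(*) FROM kv').fetchone()[0]
print('Count:', count, 'records in', time.time() - start,
      'seconds, file size:', os.path.getsize('io-test.sqlite3'), 'bytes')
conn.close()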
Let's insert and read:
$ python3 io-test.py
S: 0 W: 2.52073073387146 R: 6.074654579162598
S: 10 W: 2.6333847045898438 R: 6.157799005508423
S: 20 W: 2.6562182903289795 R: 6.161279201507568
S: 30 W: 2.641266345977783 R: 6.182752370834351
S: 40 W: 2.6545310020446777 R: 6.170837640762329
S: 50 W: 2.676469326019287 R: 6.213182687759399
S: 60 W: 2.6849145889282227 R: 6.188485622406006
S: 70 W: 2.729151964187622 R: 6.243469476699829
S: 80 W: 2.7477469444274902 R: 6.227264165878296
S: 90 W: 2.721693515777588 R: 6.247784614562988
S: 99 W: 2.741420269012451 R: 6.243627548217773
Count: 100 000 000 records in 0.19950389862060547 seconds, file size: 902 606 848 bytes.
From the final row we can see that inserting a million items into a database of 99 million rows took 2.7 seconds, and reading a million records back with individual SQL SELECTs took 6.2 seconds. I don't think that's bad at all. Based on this, I can conclude that the SQLite3 database is not going to be the performance bottleneck in most of my projects. The only thing that can really kill performance is long-standing modify / write locks on the database, if the code isn't done correctly.
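For example, keeping each write transaction short and giving the connection a busy timeout already avoids most lock-related stalls. This is just an illustration of the idea, not part of the test script (the kv table and file name come from the sketch above):

import sqlite3

# Wait up to 30 seconds instead of failing immediately if another
# connection holds the write lock.
conn = sqlite3.connect('io-test.sqlite3', timeout=30)
try:
    # The connection as a context manager commits (or rolls back) right
    # at the end of the block, so the write lock is held only briefly.
    with conn:
        conn.execute('UPDATE kv SET value = ? WHERE key = ?', ('new-value', 42))
finally:
    conn.close()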
Full journaling was used, with the delete journal mode between transactions; no WAL mode, and synchronous was not set to 0 / OFF. So all data is actually persisted to disk, not just cached in memory. File system: ext4.
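Delete journal and full synchronous are SQLite3's defaults, so nothing had to be changed for this. If you want to verify what your own connection is running with, the pragmas can be queried directly (a small sketch, file name again my assumption):

import sqlite3

conn = sqlite3.connect('io-test.sqlite3')
# Expect ('delete',) for the journal mode and (2,) i.e. FULL for synchronous
# when the defaults described above are in effect.
print(conn.execute('PRAGMA journal_mode').fetchone())
print(conn.execute('PRAGMA synchronous').fetchone())
conn.close()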
I also tried different page sizes from 1k to 64k and found that 4k pages are the fastest. 1k pages might be faster in situations where the same record is updated repeatedly instead of new records being inserted.
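Note that the page size has to be set before anything is written to a new database file (or be followed by a VACUUM), otherwise the old page size stays in effect. A small sketch of how it can be set, file name assumed:

import sqlite3

conn = sqlite3.connect('io-test-4k.sqlite3')
# Must run before the first table is created in a new database file;
# for an existing database a VACUUM is needed afterwards.
conn.execute('PRAGMA page_size = 4096')
conn.execute('CREATE TABLE kv (key INTEGER PRIMARY KEY, value TEXT)')
conn.commit()
print(conn.execute('PRAGMA page_size').fetchone())  # (4096,)
conn.close()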
Btw, many other databases would perform a LOT worse. Why? Because SQLite3 provides excellent database latency. The situation would of course be completely different if the queries included hugely complex joins etc., but in this case the test was for a key-value database. Of course, if your app is limited to that, there are also faster options. The main benefit of using SQLite3 is that it comes bundled with Python and doesn't require any additional setup whatsoever, compared to many other solutions which can require extensive configuration work and are prone to misconfiguration, and to security issues caused by misconfiguration.