Duplicati / Restic - Backup Program Comparison
I originally planned to post this quite a long time ago, but I held it back for a while, so now I have the experience and results to deliver right away.
Duplicati (@ duplicati.com) vs Restic (@ restic.net):
Duplicati - It has a nice GUI, for those who care.
Both applications cover basically all the options needed for normal backup use, but Duplicati offers plenty of extra options under its advanced settings.
Duplicati - The latest version still has issues when trying to stop a backup. To me these look like typical event handling / threading issues.
Restic seems to perform a bit better when determining which files and chunks need to be backed up. This is actually quite expected from the architecture, which has its pros and cons.
Restic - I'm worried about its memory footprint. The environment where I run the backups has large, mostly stale data sets, so Restic could run into memory issues, which would ruin the benefits of its better performance. Duplicati's memory footprint isn't low either, but it stays fairly constant even as the backup set grows.
Restic is very nicely wrapped in a single binary file which doesn't need any kind of setup, making it a really wonderful program to just drop on servers with a script and run.
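A minimal sketch of what I mean, assuming a rest-server repository and a password file; the URL and paths are placeholders, not my actual setup:

    #!/bin/sh
    # Repository location and password file; restic reads both from the
    # environment, so the whole deployment stays one short script.
    export RESTIC_REPOSITORY="rest:https://backup.example.com:8000/myhost"
    export RESTIC_PASSWORD_FILE="/etc/restic/password"

    # Initialize the repository on the first run only.
    restic snapshots >/dev/null 2>&1 || restic init

    # Back up the chosen paths, then apply a simple retention policy.
    restic backup /etc /var/www
    restic forget --keep-daily 7 --keep-weekly 5 --prune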
In some rare cases, I assume de-duplication with variable block sizes would improve de-duplication efficiency a lot, though this applies only to a few very specific data sets. A plain-text SQL dump is a good example: inserting one row shifts every following byte, so the data repeatedly falls out of alignment with fixed-size de-duplication blocks, while variable-size chunking re-synchronizes right after the change.
I'll use Restic with the rest-server (setup sketched below). The data is stored on the same file system / backend as the Duplicati backups. The Duplicati backups are transferred over FTPS for now, which honestly isn't the fastest way of handling chunks, but it works; thanks to Duplicati's parallel file handling it isn't a bottleneck.
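Roughly like this; the paths, port and certificate locations are placeholders, not my exact configuration:

    # Serve restic repositories over HTTPS from a local directory.
    rest-server --path /srv/restic-repos --listen :8000 \
        --tls --tls-cert /etc/ssl/rest-server.crt --tls-key /etc/ssl/rest-server.key

    # Point restic at it using the REST backend.
    restic -r rest:https://backup.example.com:8000/myhost init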
Restic creates a lot of quite small pack files, pretty similar to Duplicati's default. That's not a great fit for SMR storage or cloud storage where there may be per-file access latency. The last few versions did bump the pack file size from 8 to 16 mebibytes (MiB). Both programs allow changing this size (examples below).
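For example, assuming recent versions (restic gained --pack-size in 0.14; in Duplicati the equivalent is the remote volume size):

    # Restic: target 64 MiB pack files instead of the 16 MiB default.
    restic backup --pack-size 64 /data

    # Duplicati: the equivalent knob is the dblock (remote volume) size.
    Duplicati.CommandLine.exe backup <target-url> /data --dblock-size=64MB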
Depending on the destination file system and storage platform, Duplicati's storage layout might be better or worse. It uses a single directory for all data, while Restic uses a bit more than 256 directories, something like 260 in total. If you're storing the data on something like an ext4 file system on an SMR disk, that's a pretty bad option from a performance standpoint.
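For reference, restic's documented repository layout looks roughly like this, with the 256 subdirectories living under data/:

    config
    data/00/ ... data/ff/    # 256 two-character subdirectories holding the pack files
    index/
    keys/
    locks/
    snapshots/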
One of the largest differences in daily operation is pruning, a.k.a. garbage collection, a.k.a. deleting old backups / data. Both programs now allow setting a threshold for when that operation runs, but Duplicati interprets it as one value for the whole backup, while Restic does the analysis file (storage chunk) by file. Let's say we use a 25% threshold: Restic then removes some data almost daily, while Duplicati could go four months without compaction. The problem is that if the data is highly de-duplicated but still changing all the time, the process can be extremely slow. It basically means downloading, extracting, repackaging, compressing and re-uploading the full backup data, which could take minutes, hours, days or even weeks, depending on your setup. One option is to disable compaction and only run it when you can afford to wait for it to finish, but that's generally not a good tip for the average user. Examples of both knobs are below.
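The relevant options, as I understand them; the option names are real, but treat the exact semantics as my interpretation:

    # Restic: repack a pack file once more than 25% of its content is unused.
    restic forget --keep-daily 30
    restic prune --max-unused 25%

    # Duplicati: compact once 25% of the whole backup is wasted space;
    # --no-auto-compact postpones it so compaction can be run manually later.
    Duplicati.CommandLine.exe backup <target-url> /data --threshold=25 --no-auto-compact
    Duplicati.CommandLine.exe compact <target-url>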
I just wish Duplicati had a switch for this, allowing the behavior to be changed when beneficial. This was especially damaging back when Duplicati had a bug where aborting this deletion process could, and would, lead to data corruption.
Both applications do efficient de-duplication, compression, chunking, and incremental updates to the storage back end, radically reducing backup time and storage needs. In my opinion, the basic design and feature set of both is exactly what a modern backup platform should be.
Both options work great for normal desktop usage. If your usage is heavy duty and/or the small things mentioned here matter to you, it's best to run both in parallel and see which one suits you better.
OS - For Windows users Duplicati is a clear win. For Linux users Restic is easier to set up and use. In a broader sense both applications are, after all, very similar, even if the internal details vary hugely.
Duplicati has far superior backend support, i.e. the variety of backup locations where you can store the data, whereas Restic is quite limited in its options.
If you add or remove directories from a Restic backup (sources), it re-reads all files, which can be very slow. As far as I can tell this happens because the changed path set prevents restic from finding a matching parent snapshot (workaround sketched below).
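If I understand the cause right, explicitly pointing restic at a parent should avoid the full re-read; the snapshot ID is a placeholder:

    # Reuse a known snapshot as parent even though the source list changed.
    restic backup --parent <snapshot-id> /data /newly-added-dir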
For reference, if anyone cares at all: Restic version 0.16.2 and Duplicati version 2.0.7.100.
Two open wishes for the Duplicati team:
Add an option to trigger compacting per backup repository block file (currently missing!) instead of only for the whole backup set (the current behavior)
Make sure that aborting with a polite request like SIGINT / SIGTERM terminates the process gracefully
Two open wishes for the Restic team:
Add an option like --certificate-fingerprint or something similar, instead of forcing the use of --insecure-tls. It would let users pin a trusted certificate. For example, some systems live in closed environments, which is a good reason not to have publicly trusted certificates, yet that doesn't mean the certificates in use are insecure in any way (hypothetical usage sketched after this list)
Allow the user to choose between a bunch of subdirectories as the destination, a single flat directory, or a single flat bucket
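To be clear, the flag below is hypothetical; it doesn't exist in restic today, it's just how I imagine the wished-for certificate pinning could look:

    # Hypothetical flag, not a real restic option (yet).
    restic -r rest:https://backup.example.com:8000/myhost \
        --certificate-fingerprint sha256:0f4d... backup /data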
Duplicati issues in the past
Duplicati had serious data corruption issues: Duplicati Forum (@ forum.duplicati.com).
Duplicati - One thing is interesting: how can the test and restore functions behave so differently? As far as I can tell, that alone means the application is pretty much guaranteed to be flawed. If the problem were actual data corruption in the remote store, test should catch it; and if it doesn't, then test is even more broken than I thought. My primary suspect is still the same one we've covered earlier: the repair / database rebuild combination is broken, together with some other broken things, leading to a total mess. That other thing is backup data transactions: when the program is shut down it can leave a mess behind, and when it runs again it no longer knows which data is current and which is stale, so correctly resuming from that situation just fails. Repair does something, but the result can be good or bad pretty much at random, depending on the stage the aborted operation was in.
Sure, I could try the task with more durable storage, but I doubt it would fail in any reasonable time, or that the root problem actually is storage durability in this case. The storage system is used for many other things, which are encrypted and hashed, and none of them have any integrity issues. It is always, and only, a Duplicati RESTORE issue; neither verify nor test ever fails. That points to software logic issues, as far as I can tell. I've seen lots of bad software, and I've also written lots of bad software and learned from the failures.
Also, at one point they had a broken LZMA (xz) compression library which corrupted data. I never saw any confirmation that the problem was actually fixed or the broken library disabled, but hopefully that's done. For me it simply meant that I stopped using that compression.
Btw, just now I'm running a full test on one of the backup sets to figure out what the problem is. I started two tasks: restore with the database and restore without the database. If that makes any difference, it's pretty much direct proof of all the things I've repeatedly said in the Duplicati forums. The problem doesn't appear immediately; it's some unsafe data operations that break the data set. I've even provided detailed logs to the dev team earlier.
Just to be sure, I'll run the tests on the corrupted data sets to see whether any of the data is actually corrupted, or whether the app either tests incorrectly or just restores badly. In some situations there's a separate recovery tool which allowed the data to be restored. But why did the process get so messed up? Why didn't normal recovery (based on the journal) and/or recovery using repair detect or remedy these issues?
Note: I haven't been following early Restic development; I'm sure they have had their issues as well.
Finally there's a Duplicati version which should fix these issues:
https://github.com/duplicati/duplicati/releases/tag/v2.0.7.100-2.0.7.100_canary_2023-12-27
They have also added an option, --repair-force-block-use, which I assume should help recover unrestorable backups that merely have issues with their index files (hedged usage sketch below).
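My guess at the invocation; only the flag name comes from the release notes, I haven't verified the rest:

    # Assumed usage of the new flag with the repair command.
    Duplicati.CommandLine.exe repair <target-url> --repair-force-block-use=true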
Phew. I really hope I'm done with this subject now.
2023-12-31