Flashcache is software that allows you to use a block device, such as a solid state drive (SSD), to cache the most frequently accessed data from a slow, mechanical drive. It runs on Linux, and it is free.
Most hard drives have built-in cache memory, but it is usually small, often around 64 MB. SSDs can hold gigabytes of data, and you can use the entire drive as a cache. For example, if you have a 120G SSD, then you can have 120G of dedicated hard drive cache.
Of course, SSDs are not as fast as RAM, so you will be limited to the SSD read and write speeds. However, an SSD is much faster than any mechanical hard drive, so the speed increase is noticeable.
Here are my Flashcache results with 7200RPM hard drives, a Samsung 840 SSD, and Linux Mint 17.3.
Update December 20, 2016: Flashcache no longer works with the more recent Linux kernels, such as 4.8.14 and 4.8.15. Any attempt produces this error message:
Hard drive caching is an old trick, and Linux has different options to choose from:
For my uses, Flashcache is the easiest to install and use. Plus, Flashcache works with existing data drives. Some caching software, such as bcache, requires that you first format the slow hard drive specifically for bcache before use. This is not necessary with Flashcache. If you already have data on a hard drive, simply run a few commands, and Flashcache will automatically begin caching the data. Easy.
Flashcache has been around since 2010. It was developed by Facebook to accelerate database requests. While no longer maintained, it is available as open source code, and it still works with older kernels (see the update above).
Flashcache is not in the Ubuntu repository, so you must download the source code from the project's website and compile it yourself. Instructions for compiling and installing are provided. Make sure you have build-essential and the headers for your running kernel installed, and you should be fine.
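If you prefer to script the build, here is a minimal sketch. It is my assumption of a typical sequence, not the project's official procedure: it assumes Ubuntu/Mint package names, the archived repository at https://github.com/facebookarchive/flashcache, and the `KERNEL_TREE` make variable mentioned in the project's README. The helper names `run`, `kernel_build_dir`, and `build_flashcache` are hypothetical, and setting `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch: build and install Flashcache from source. Run as root.
# Assumptions: Ubuntu/Mint package names and the facebookarchive GitHub URL.

# Print each command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Where the build files for a given kernel release normally live.
kernel_build_dir() {
    echo "/lib/modules/$1/build"
}

build_flashcache() {
    release=${1:-$(uname -r)}   # defaults to the running kernel
    run apt-get install -y build-essential git "linux-headers-$release"
    run git clone https://github.com/facebookarchive/flashcache.git
    run make -C flashcache KERNEL_TREE="$(kernel_build_dir "$release")"
    run make -C flashcache install
    run modprobe flashcache     # load the freshly installed kernel module
}
```

Source the file, preview with `DRY_RUN=1 build_flashcache`, then run `build_flashcache` for real. Per the update above, expect the build to fail on 4.8 and newer kernels.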
*** Back Up Your Data ***
Before you experiment with any form of hard drive caching, back up your data. In fact, make several backups to external drives and then disconnect them from your system. It is too easy to inadvertently ruin your partition information or make some other careless mistake that results in data loss. Linux gives you that kind of power.
If you do not have a backup of your data, then you deserve to lose it.
You can “connect” Flashcache to any hard drive. For this example, let’s assume that we have a 120G SSD as /dev/sda and a hard drive as /dev/sdb. For writethrough caching:
sudo flashcache_create -p thru cache /dev/sda /dev/sdb
- flashcache_create: Program that creates the cache. Must be root to use.
- -p thru: Set caching mode to WRITETHROUGH.
- cache: Name of the cache. Change to whatever you like.
- /dev/sda: The SSD. Use the entire device or a partition. Your choice.
- /dev/sdb: The hard drive to cache. In this case, it's the entire device.
Your hard drive is ready to use!
However, access it as /dev/mapper/cache, not /dev/sdb or /dev/sda. /dev/sdb accesses the hard drive, but /dev/mapper/cache will access the slow hard drive through the faster caching SSD.
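Putting the steps together, a minimal sketch that creates the writethrough cache from the example and mounts it through /dev/mapper/cache. The mount point /mnt/data and the helper name `create_and_mount` are hypothetical, and the sketch assumes the filesystem sits directly on the cached device (the whole /dev/sdb, as in the example); if your filesystem lives on a partition such as /dev/sdb1, caching that partition instead is probably simpler. `DRY_RUN=1` prints the commands.

```sh
#!/bin/sh
# Sketch: create a writethrough Flashcache and mount it. Run as root.
# /dev/sda = SSD, /dev/sdb = hard drive, /mnt/data = placeholder mount point.

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

create_and_mount() {
    name=$1 ssd=$2 hdd=$3 mnt=$4
    run flashcache_create -p thru "$name" "$ssd" "$hdd"
    run mkdir -p "$mnt"
    # Mount the device-mapper device, never the raw hard drive.
    run mount "/dev/mapper/$name" "$mnt"
}
```

Example: `create_and_mount cache /dev/sda /dev/sdb /mnt/data`.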
What Caching Mode Should I Use?
This is important, so take the time to read man flashcache_create and learn about the available caching modes. There are three of them:
- writethrough (safest, but slowest)
- writeback (less safe, but faster)
- writearound (writes bypass the SSD; only reads are cached)
I find writethrough mode to be the most reliable. All reads are cached, but writes are performed at the speed of the slow hard drive to ensure that all data is written properly. The SSD cache is not persistent between reboots, so you must create the cache after each system boot before you can use it.
Writeback mode is definitely the fastest because it caches both reads and writes at the speed of the SSD.
You will be operating at the speed of the SSD with writeback, and best of all, all cached data is persistent between reboots. All files that you have cached will be available immediately from the SSD the next time you turn on your computer. You create the SSD cache one time, and that’s it. The same cache is used. Whatever you might have been working on prior to system shutdown will still be in the SSD cache the next time you boot Linux. If you have a slow 4T hard drive, then this gives the illusion of having 4T of faster SSD secondary storage.
Despite the benefits, I tend to avoid this mode. Why? During my usage, I have encountered two major dealbreakers that ruin the benefits of a persistent cache:
1. SLOW shutdown times.
Data is written to the SSD first, and then it is written to the hard drive at Flashcache's leisure. If you perform serious writing with large files, system shutdown can take an hour or more while you listen to the hard drive grind. This is especially true for RAID arrays.
2. RAID arrays require resyncing
I tried Flashcache with a RAID-1 array, and it was a disappointment. Reads were cached like any other drive. Writes were faster as well. The cache was persistent between reboots. So, what could go wrong?
The problem I encountered — in addition to the extremely long shutdown times — was that the RAID-1 array always became out of sync. This meant that the RAID array would require a resync upon each boot — a process that can take anywhere from 4 to 6 hours depending upon the size of the array used. The entire point of RAID is reliability in the event of hard drive failure, and an out-of-sync array defeats the purpose.
With writethrough caching, neither of these problems occurs, and RAID arrays remain intact.
Writearound is a variation of writethrough in which writes are NOT cached to the SSD. All writes go directly to the slow hard drive, and the only way to get data into the SSD cache is to read it back. The cache is not persistent between reboots.
“Which mode should I use?”
I recommend writethrough mode for reliability until you become accustomed to Flashcache and how it works. Then, try writeback mode if you have a single data drive. If writeback works well, then great. But if you have critical data, then use writethrough mode. Writeback is too risky.
I have had no need to use writearound mode, but it is there in case you need it.
The required -p option determines the mode the cache will operate in. Each mode is denoted by a name.
- -p thru: writethrough
- -p back: writeback
- -p around: writearound
sudo flashcache_create -p thru cache /dev/sda /dev/sdb    # writethrough mode
sudo flashcache_create -p back cache /dev/sda /dev/sdb    # writeback mode
sudo flashcache_create -p around cache /dev/sda /dev/sdb  # writearound mode
Keep in mind that you only need to create a cache one time if using writeback mode. Writethrough and writearound require that you create a new cache following each system boot.
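Because the -p values (thru, back, around) are not the same as the mode names, a small lookup can prevent typos like the one that maps "thru" to the wrong mode. This is a hypothetical helper of my own, not part of Flashcache; it only prints the command it would run.

```sh
#!/bin/sh
# Sketch: translate a caching-mode name into flashcache_create's -p value.

mode_flag() {
    case $1 in
        writethrough|thru)  echo thru ;;
        writeback|back)     echo back ;;
        writearound|around) echo around ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the create command for a mode, cache name, SSD, and HDD.
create_cmd() {
    flag=$(mode_flag "$1") || return 1
    echo "flashcache_create -p $flag $2 $3 $4"
}
```

For example, `create_cmd writeback cache /dev/sda /dev/sdb` prints the writeback command from the list above.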
Changing the Caching Mode
If you change your mind about a mode, then you must destroy the cache and create another with the new mode.
writethrough and writearound
Just reboot. You can also do this from the command line (sudo dmsetup remove /dev/mapper/cache), but rebooting is the easiest.
writeback
This mode requires the flashcache_destroy command because its cache is persistent. With the device unmounted, run:
sudo flashcache_destroy /dev/sda
Be sure to double-check that you specify the correct cache device. Use sudo fdisk -l if you need to confirm it.
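Both teardown paths can be wrapped in one function. This is a sketch under my own assumptions: the helper name `teardown_cache` and the mount point are hypothetical, and the order (unmount, then dmsetup remove, then flashcache_destroy for writeback) reflects my reading of the Flashcache documentation, where removing a writeback device first flushes its dirty blocks to the hard drive, which can take a long time. Check the man pages before trusting it with real data; `DRY_RUN=1` prints the commands.

```sh
#!/bin/sh
# Sketch: remove a Flashcache before recreating it in another mode. Run as root.
# mode is thru, back, or around; device names and mount point are placeholders.

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

teardown_cache() {
    mode=$1 name=$2 ssd=$3 mnt=$4
    run umount "$mnt"
    run dmsetup remove "$name"
    if [ "$mode" = back ]; then
        # Writeback metadata persists on the SSD, so wipe it as well.
        run flashcache_destroy "$ssd"
    fi
}
```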
Block Size and Cache Size
By default, Flashcache uses 4K blocks and all of the available SSD space. If you would like a specific cache size, specify it in sectors or with a unit suffix (see the man page for details).
sudo flashcache_create -p thru -s 234441648 cache /dev/sda /dev/sdb
This creates a cache with 234441648 sectors. This number was obtained from sudo fdisk -l for /dev/sda, which is the device of the SSD in this example. Look for the total sectors. This is another way to say “use the entire 120G SSD.”
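As a sanity check on that number: 234441648 sectors × 512 bytes = 120,034,123,776 bytes, or roughly 120 GB. The hypothetical helpers below do that arithmetic, assuming 512-byte sectors. On a live system, `blockdev --getsz /dev/sda` should report the same 512-byte sector count without reading it off fdisk.

```sh
#!/bin/sh
# Sketch: sector/size arithmetic for flashcache_create -s (512-byte sectors assumed).

bytes_to_sectors() {
    echo $(( $1 / 512 ))
}

sectors_to_gb() {
    # Decimal gigabytes (10^9 bytes), rounded down.
    echo $(( $1 * 512 / 1000000000 ))
}
```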
The defaults work well, so I have had no need to change them.
“Does Flashcache really work, or is this just a gimmick?”
It really works, but the data must be cached first. I am using writethrough mode, so all cached reads will be as fast as the SSD. However, the data must have been read first. Upon a system boot, device mount, a first read, or an initial file copy, you will NOT see a speed increase (unless using writeback mode). Initial reading will be performed at the speed of the slow hard drive.
Again, the data must FIRST BE CACHED into the SSD. This is accomplished by reading data or performing file copies like you normally would. Then, when you read the same data (or files) a second time, you will see a significant read boost as the data is read from the SSD instead of the slower hard drive.
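If you want to warm the cache deliberately instead of waiting for normal use, one option is to read the files once and throw the data away. This is a minimal sketch with a hypothetical helper name; the directory must be on the /dev/mapper/cache mount so the reads pass through Flashcache. Files already held in Linux's own RAM cache may be served from memory without touching the device at all, so right after boot is the simplest time to do this.

```sh
#!/bin/sh
# Sketch: pre-read ("warm") files so later reads come from the SSD.
# The directory should live on the /dev/mapper/cache mount.

warm_cache() {
    dir=$1
    # Read every regular file once and discard the data.
    find "$dir" -type f -exec cat {} + > /dev/null
}
```

Example: `warm_cache /mnt/data/VirtualBox` reads every file under that (placeholder) directory.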
“Does the SSD quality matter?”
Yes. Any SSD will be faster than a mechanical, spinning hard drive, so use what you have available. A low-quality spare SSD is fine, but remember that some SSDs perform better than others. An SSD with a maximum read speed of 300 MB/s means that your cache will be limited to 300 MB/s for reads. This is still better than slow 80 MB/s reads from a mechanical drive, but it is slower than a higher-quality Samsung 850 Pro, for example. If you use an SSD with 540 MB/s reads, then your cache will offer 540 MB/s read performance.
“Can I use an M.2 SSD?”
Yes. In fact, you can dedicate a partition of the super-fast Samsung 950 Pro M.2 SSD (or all of it) to act as a hard drive cache. It is much faster than any SATA SSD available, and you can expect reads in the 2 GB/s range if using PCIe 3.0 x4. Again, the cache is limited to the speed of the caching device.
“If M.2 Gen3 is fast, why not take this further and use a RAM drive?”
Yes, you can use a RAM drive to act as the caching device. I have configured a RAM drive to act as a caching block device with Flashcache, and I measured 3+ GB/s reads. This involves setting up a loopback device because Flashcache requires that the caching device be a block device.
RAM is volatile, so writethrough mode is best used with a RAM drive. There is not as much RAM available compared to an SSD, so RAM is better suited for smaller files.
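One way to build such a RAM-backed block device is a tmpfs mount holding an image file attached to a loop device. This is a sketch under my own assumptions (tmpfs plus `losetup`, the /mnt/ramcache path, the 16G size, and the helper name `ram_cache` are all placeholders), not necessarily how I configured mine. It sticks to writethrough for the reason above; `DRY_RUN=1` prints the commands and uses /dev/loopX as a stand-in loop device.

```sh
#!/bin/sh
# Sketch: use RAM as a Flashcache caching device via tmpfs + a loop device.
# Run as root. Writethrough only: anything held in RAM is lost at power-off.

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

ram_cache() {
    size=$1 hdd=$2 name=$3 dir=/mnt/ramcache
    run mkdir -p "$dir"
    run mount -t tmpfs -o "size=$size" tmpfs "$dir"
    # Sparse image file; it consumes RAM only as the cache fills up.
    run truncate -s "$size" "$dir/cache.img"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "losetup --find --show $dir/cache.img"
        loopdev=/dev/loopX    # stand-in name for dry runs
    else
        # Attach the image to the first free loop device and capture its name.
        loopdev=$(losetup --find --show "$dir/cache.img")
    fi
    run flashcache_create -p thru "$name" "$loopdev" "$hdd"
}
```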
If you deal with large 30+ GB files, such as VirtualBox images or 70+ GB uncompressed files for video editing, then I do not recommend using a RAM drive. I tried it, and there is really not that much improvement because those files are too big to store in RAM. Parts of files cache okay, but this is not what I want.
In practice, files and chunks of data smaller than the RAM drive size will cache well, but anything larger limits the speed to the slow hard drive. This becomes counterproductive and only wastes RAM. After all, why use RAM if you see no real performance boost?
“Wait a minute! Can I chain caches together?”
Yes, you can. You can create as many Flashcache caches as you like and chain them in a hierarchy. For example, you can stack a RAM cache on top of a Flashcache SSD cache.
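Chaining works because a Flashcache device is itself a block device, so it can serve as the "slow disk" for the next cache. A minimal two-level sketch (RAM > SSD > HD), assuming a RAM-backed loop device already exists; the cache names `ssdcache` and `ramcache` and the helper name are hypothetical. `DRY_RUN=1` prints the commands.

```sh
#!/bin/sh
# Sketch: a two-level cache hierarchy. Run as root.
# $1 = RAM-backed loop device, $2 = SSD, $3 = hard drive (placeholders).

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

chain_caches() {
    ram=$1 ssd=$2 hdd=$3
    # Level 1: the SSD caches the hard drive.
    run flashcache_create -p thru ssdcache "$ssd" "$hdd"
    # Level 2: RAM caches the already-cached device.
    run flashcache_create -p thru ramcache "$ram" /dev/mapper/ssdcache
    # From here on, use /dev/mapper/ramcache.
}
```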
“Oh, WOW! This must be REALLY fast, right?”
Not really. This looks good in theory, but not in practice. I set this up thinking I could use a cache hierarchy: RAM > SSD > HD. I even tried another variant with the 950 Pro M.2 in the mix.
Neither of these configurations worked out well. Sure, they worked. Data was being cached, but reads were actually slower than using a single SSD cache. There were no performance benefits. Only the added hassle and a loss of RAM and M.2 space.
For best results, use a single SSD and leave it at that.
“Can I combine Flashcache with Veracrypt?”
Yes, and this performs extremely well with Veracrypt. (Tested with Veracrypt v1.17.) It consists of two steps:
- Create the cache
- Mount the Veracrypt volume
sudo flashcache_create -p thru cache /dev/sda /dev/sdd
(Assume that /dev/sdd is an entire encrypted Veracrypt device.)
Mount the cache, not the hard drive device.
veracrypt /dev/mapper/cache mount_point
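We point veracrypt at the cache, not at the encrypted drive (/dev/sdd) itself as we normally would. Because Flashcache sits underneath Veracrypt, it only ever sees encrypted blocks, so the data remains encrypted on the SSD. /dev/mapper/cache passes reads and writes through the SSD cache to the encrypted hard drive, and Veracrypt decrypts them on top.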
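Both steps, plus the reverse order for shutting down, fit in a short sketch. The helper names and the /mnt/secure mount point are hypothetical; the `veracrypt volume mountpoint` form is the one used above, while `veracrypt -d` for dismounting is my assumption about the CLI, so check `veracrypt --help` for your version. `DRY_RUN=1` prints the commands.

```sh
#!/bin/sh
# Sketch: Flashcache underneath a Veracrypt device. Run as root.
# $1 = SSD, $2 = encrypted hard drive, $3 = placeholder mount point.

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

veracrypt_on_cache() {
    ssd=$1 hdd=$2 mnt=$3
    run flashcache_create -p thru cache "$ssd" "$hdd"
    run mkdir -p "$mnt"
    # Give Veracrypt the cached device, not the raw encrypted drive.
    run veracrypt /dev/mapper/cache "$mnt"
}

veracrypt_off_cache() {
    # Reverse order: dismount the encrypted volume, then drop the cache.
    run veracrypt -d /dev/mapper/cache
    run dmsetup remove cache
}
```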
Keep in mind that reads will be slower due to the decryption process. Depending upon which encryption algorithm or chain you are using, you will see different read speeds. Read speed depends more on your CPU, so do not expect reads to be as fast as reads from an unencrypted drive.
“Can I use Flashcache with a RAID array?”
Yes. However, I recommend using writethrough mode for reliability and faster system operation. During my use, Flashcache+RAID caused excessive disk grinding, extended shutdown times, and required resyncing upon every boot. These three drawbacks made the system’s hard drive performance feel slower, not faster. Writeback mode was not worth it. Writethrough mode worked perfectly with RAID without any of these issues.
“Can I use a Bash script to create the cache?”
Yes. Write a Bash script and run it after logging in.
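A minimal sketch of such a script: it creates the writethrough cache only if /dev/mapper/<name> does not already exist, so running it twice is harmless. The helper name `ensure_cache` and the `MAPPER_DIR` override (used only for testing) are hypothetical. Device names like /dev/sda can change between boots, and swapping the SSD and hard drive here could destroy data, so a stable /dev/disk/by-id/ path is a safer argument in a script.

```sh
#!/bin/sh
# Sketch: create the writethrough cache after login unless it already exists.
# Run with root privileges (e.g. via sudo). MAPPER_DIR is a test-only override.

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

ensure_cache() {
    name=$1 ssd=$2 hdd=$3
    if [ -e "${MAPPER_DIR:-/dev/mapper}/$name" ]; then
        echo "$name is already set up"
    else
        run flashcache_create -p thru "$name" "$ssd" "$hdd"
    fi
}
```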
“Can I store regular data on a Flashcache SSD?”
No. You must use separate partitions or another SSD. The partition or device you dedicate to Flashcache can only be used by Flashcache as a cache. You cannot store files directly on the caching device (/dev/sda in the earlier example); your files live on the hard drive and are accessed through /dev/mapper/cache.
“Linux already has a caching mechanism. Why use Flashcache?”
Flashcache boosts caching beyond what Linux does natively. Linux's own cache lives in RAM, so it can only hold as much data as free memory allows, whereas a Flashcache SSD can hold as much as the SSD's capacity. If a file fits within the cache, then Flashcache will cache it. This is especially useful for large, multi-gigabyte files. Suppose you are sharing a large, 50GB file over your LAN utilizing link aggregation. Caching the file to an SSD offers far better performance than relying on the hard drive and Linux's standard caching mechanism.
Also, if you use writeback mode, your caching efforts will persist between reboots whereas Linux clears its cache between reboots. With a writeback cache, you can resume where you left off at SSD speeds. Just be aware of the possible drawbacks with a writeback cache.
Flashcache will boost benchmark results since the benchmark itself gets cached. The first run is always the speed of the slow hard drive, but successive runs reveal what the cache is capable of since reads are performed from the SSD.
I skipped write benchmarks most of the time since I am using writethrough mode. There is no need to benchmark write speeds since they will always be about 80 MB/s, the speed of the slow hard drive. For comparison and demonstration, some write results are included.
Let’s benchmark a RAID-1 array using two 2T hard drives that will be used for all caching tests. (RAID is probably of most interest to everyone.) We want to create a baseline for comparison using the Disks (gnome-disk-utility) benchmark.
100x100M, Samsung 840 SSD 120G
100x10M, RAID-1 7200 RPM 2Tx2 Array. No Flashcache.
120 MB/s read, 95 MB/s write. 16.47ms access time. Also notice how reads and writes drop over time. Regardless of the sample size, the 120/95 MB/s speeds were consistently close.
Let’s see if Flashcache can improve these numbers in any way.
Basic SSD Cache for RAID-1 (115 MB/s -> 305 MB/s)
RAID-1 with SSD Cache (126 MB/s -> 485 MB/s)
Samsung 840 SSD Cache (138 MB/s -> 489 MB/s)
SSD + M.2 950 Pro (122 MB/s -> 2.3 GB/s)
M.2 + 16G RAM Drive (117 MB/s -> 3.1 GB/s)
Let's throw Veracrypt into the mix and see what happens.
Veracrypt AES 100x10M (109 MB/s -> 293 MB/s)
Veracrypt AES-Twofish-Serpent 6x1000M (131 MB/s -> 175 MB/s)
Flashcache definitely improves read speeds — both sequential and random — if using a solid state drive. Most importantly, notice the reduced access time from ~16ms to less than 1ms. This is significant since a lower access time makes a computer “feel faster.”
File Copy Tests
Using time, I measured how long it would take to copy various files from the slow hard drive to a 950 Pro M.2 SSD using the cp command. Three different types of files were tested. The purpose was to see if Flashcache made a difference in real-world performance. Copying VirtualBox images can be time-consuming, and this was a good test for that purpose. The system was rebooted and the cache was recreated between each test of a different device.
- 4 GB file containing random data.
- 24 GB file of random data.
- 15 GB of various files containing random data.
I also tried various cache arrangements:
- No cache – reading straight from the hard drive.
- SSD – One Samsung 840 SSD using Flashcache
- SSD+RAM – SSD + 16G RAM drive. Two-level cache.
- M.2 40G – Samsung 950 Pro with 40G cache partition.
- 16G RAM – 16G RAM drive used as cache device.
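The timing runs can be scripted roughly as follows. This is a sketch, not the exact procedure behind the tables: it measures wall-clock seconds with `date +%s` instead of the `time` builtin, the helper name `time_copies` is hypothetical, and the optional `DROP_CACHES=1` step (my addition) empties Linux's RAM cache between runs via `sysctl -w vm.drop_caches=3`. That does not touch the Flashcache SSD, so it separates the two caches; the results below were taken without it. `DRY_RUN=1` prints the commands.

```sh
#!/bin/sh
# Sketch: copy the same source three times and report elapsed seconds.
# Source and destination paths are placeholders.

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

time_copies() {
    src=$1 dest=$2
    for i in 1 2 3; do
        if [ -n "${DROP_CACHES:-}" ]; then
            # Optional: empty the Linux page cache (not the Flashcache SSD).
            run sync
            run sysctl -w vm.drop_caches=3
        fi
        start=$(date +%s)
        run cp -r "$src" "$dest"
        end=$(date +%s)
        echo "run $i: $((end - start))s"
    done
}
```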
4G File Copy
time cp 4gfile dest
Setup       1st    2nd    3rd
------------------------------
No Cache    56s    12s    9s
SSD         50s    9s     9s
SSD+RAM     56s    12s    9s
M.2 40G     54s    12s    9s
16G RAM     50s    9s     9s
Notice the times without cache. Without Flashcache set up, the 2nd and 3rd copies showed cache speeds. This is because Linux performs its own caching. So, for this particular 4G file, Flashcache made no difference.
24GB File Copy
time cp 24gfile dest
Setup       1st     2nd     3rd
---------------------------------
No Cache    4m57s   4m36s   4m36s
SSD         4m37s   1m15s   1m10s
SSD+RAM     5m16s   2m12s   1m56s
M.2 40G     4m59s   6m17s   5m58s
16G RAM     4m38s   3m16s   3m23s
Recall the question, “Linux already has a caching mechanism. Why use Flashcache?” This is why. In the 4G copy test, Linux used its own caching mechanism, but we are copying a 24 GB file in this test. Linux does not cache it, but Flashcache does.
This test also demonstrates that a single SSD is best. Surprisingly, the M.2 partition offered no benefit. Not certain why. It might have been my configuration. Or, it could have been due to the fact that the same 950 Pro M.2 SSD was used for both the cache and the destination on different partitions.
With Flashcache, the same 24 GB file copies in 1m15s if it has already been cached. This is a noticeable leap over the original 4m37s copy time.
When performing everyday file copies using Nemo or any other file manager, we see the same performance boost. It is fun watching the progress bar fill quickly during a file copy.
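To put those timings in throughput terms, the hypothetical helpers below convert a time like 4m37s to seconds and divide. They assume 1 GB = 1000 MB, which the tests above do not specify. By this rough measure, the uncached 24 GB copy (4m37s) ran at about 87 MB/s and the cached one (1m15s) at about 320 MB/s, roughly 3.7 times faster.

```sh
#!/bin/sh
# Sketch: turn "4m37s"-style timings into rough MB/s (1 GB = 1000 MB assumed).

to_seconds() {
    case $1 in
        *m*s)
            m=${1%%m*}; s=${1#*m}; s=${s%s}
            s=${s#0}; s=${s:-0}              # avoid octal parsing of "08"
            echo $(( m * 60 + s )) ;;
        *s) echo "${1%s}" ;;
        *)  echo "bad time: $1" >&2; return 1 ;;
    esac
}

throughput_mbs() {
    # $1 = size in MB, $2 = time such as 4m37s; integer MB/s, rounded down.
    secs=$(to_seconds "$2") || return 1
    echo $(( $1 / secs ))
}
```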
15 GB Various Files
time cp -r * dest
Setup       1st     2nd
------------------------
No Cache    2m27s   2m29s
SSD         2m28s   45s
SSD+RAM     2m36s   58s
M.2 40G     2m29s   2m28s
16G RAM     2m27s   40s
Again, we see why a dedicated SSD cache is preferable to the built-in Linux cache. Flashcache boosts read speeds for cached data no matter how large as long as the cache is big enough.
The M.2 again shows uncached speeds, so I think my configuration is to blame, and not the M.2 itself.
With combined caches (SSD+RAM) and a RAM drive, there is little improvement over a single SSD.
There are not many tools for Flashcache, but if you need to monitor cache statistics, then have a look at /proc/flashcache. Using watch on the stats file will show a realtime update of the cache in action. It is nothing great, but it is better than nothing. Have fun parsing and making sense of this information.
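For the parsing part, a small filter can pull one counter out of a stats dump. This is a sketch under stated assumptions: that the stats are whitespace-separated key=value pairs, and that each cache gets its own directory under /proc/flashcache containing a flashcache_stats file (so something like `watch -n 1 cat /proc/flashcache/*/flashcache_stats` shows it live). The exact layout and key names vary by version, so inspect your own file first; `stat_value` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: print the value of one key from Flashcache stats text on stdin.
# Assumes whitespace-separated key=value pairs.

stat_value() {
    # Put each key=value pair on its own line, then match the key.
    tr -s ' \t' '\n\n' | awk -F= -v k="$1" '$1 == k { print $2; exit }'
}
```

Example: `cat /proc/flashcache/*/flashcache_stats | stat_value reads`, assuming a `reads` key exists in your version.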
If you have a spare SSD, then put it to use! Flashcache boosts your hard drive performance for free. All it takes is the time to compile and set it up.
Of course, if all of your data is already on an SSD, then caching makes no sense. As SSD capacities increase and their prices fall, slow mechanical drives will fade into obscurity. But we are not in the age of budget-priced 4TB SSDs yet. For now, an SSD with Flashcache will (kind of) give the illusion of having a massive amount of SSD secondary storage without the cost.
Give it a try, and put that old SSD to use!