Tuesday, July 30, 2013

FreeBSD 9.2 Feature Highlight: ZFS LZ4 compression

As part of the continuous improvements to OpenZFS, made as a joint effort between FreeBSD, illumos and various other developers and vendors, the ZFS version included in FreeBSD 9.2 has been upgraded from the last open source version from Sun/Oracle (v28) to v5000 (Feature Flags). The purpose behind the large jump in version number is to avoid confusion with the continued proprietary development of ZFS at Oracle (currently at v34), and to ensure compatibility and clarity between the various open source versions of ZFS. Rather than continuing to increment the version number, OpenZFS has switched to 'Feature Flags': as new features are added, the pool is marked with a property, feature@featurename, so that only compatible versions of ZFS will import the pool.
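
For example, you can inspect the state of the feature flags on an existing pool (poolname is a placeholder; each feature reports disabled, enabled or active):
# zpool get all poolname | grep feature@
# zpool get feature@lz4_compress poolname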

One of these new 'Feature Flags' is support for LZ4 compression. ZFS has long supported transparent compression of datasets (data is automatically compressed as it is written) with a number of algorithms: lzjb, gzip [1-9] and zle. Of the available algorithms, lzjb was the most popular because of its lower CPU consumption; however, specific datasets could be compressed with various levels of gzip to gain additional space savings at the cost of more CPU usage.
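
For example, a dataset holding rarely-read archives could be switched to a higher gzip level to favour space savings over CPU time (poolname/archive is a placeholder dataset name):
# zfs set compression=gzip-9 poolname/archive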

LZ4 is a new BSD-licensed, high-performance compression algorithm that scales across multiple cores. In addition to better compression in less time, it also features extremely fast decompression. Compared to the default LZJB algorithm used by ZFS, LZ4 is 50% faster when compressing compressible data and over three times faster when attempting to compress incompressible data. The large improvement on incompressible data comes from an 'early abort' feature: if ZFS detects that the compression savings would be less than 12.5%, compression is aborted and the block is written uncompressed. This is especially useful for large multimedia files that are already compressed. In addition, decompression is approximately 80% faster; on a modern CPU, LZ4 can compress at 500 MB/s and decompress at 1500 MB/s per core.

These numbers mean that for some workloads, compression will actually increase performance, even with the CPU usage penalty: data can be read from the disks at the same speed as uncompressed data, and once decompressed it provides a much higher effective throughput. It also means it is now possible to use dataset compression on file systems that store databases without a heavy latency penalty. Decompressing an 8k block at 1.5 GB/s adds only about 5 microseconds of latency (8192 bytes ÷ 1.5 GB/s ≈ 5.5 µs), an order of magnitude less than the latency of even the fastest SSDs currently available.

In the end, the gain you get from switching to LZ4 for compression on your dataset will depend on how compressible the data you are writing is.
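
One rough way to estimate the benefit before committing: create a throwaway dataset for each algorithm, copy in a representative sample of your data, and compare the compressratio property (poolname and /path/to/sample are placeholders; this sketch assumes the default mountpoints):
# zfs create -o compression=lzjb poolname/test-lzjb
# zfs create -o compression=lz4 poolname/test-lz4
# cp -Rp /path/to/sample/. /poolname/test-lzjb/
# cp -Rp /path/to/sample/. /poolname/test-lz4/
# zfs get compressratio poolname/test-lzjb poolname/test-lz4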

To enable LZ4 compression on a dataset:
# zfs set compression=lz4 poolname/dataset

In order to make use of LZ4, your pool will need to be upgraded to v5000 (note: this means your pool will only be readable by FreeBSD 8.4 and 9.2 or later).
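
The upgrade itself is a single command; alternatively, it should be possible to enable only the LZ4 feature with zpool set, leaving any other feature flags disabled (poolname is a placeholder):
# zpool upgrade poolname
# zpool set feature@lz4_compress=enabled poolname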

2 comments:

  1. I wonder if there might be a place to implement multiple rounds of LZ4 for even better compression (LZ4-3 = 3 rounds), and/or LZ4HC (more CPU time in exchange for more compression, still really fast decompression).

    It apparently works extremely well on log files:
    https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

    In an example, a 3.5GB log file was compressed with LZ4 down to just 56MB. But when that file was passed back into LZ4, it came out as just 9.5MB.

    With LZ4HC, the file goes to 44MB on the first pass, then 2MB, then 1MB and stays at just over 750K after 5 rounds.
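
    For anyone wanting to try this outside of ZFS, a rough sketch using the lz4(1) command-line tool (on FreeBSD, the archivers/lz4 port; app.log is a placeholder file name, and -9 is assumed to select the high-compression LZ4HC mode):

    # lz4 -9 app.log app.log.1
    # lz4 -9 app.log.1 app.log.2
    # ls -l app.log app.log.1 app.log.2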

  2. I have changed the compression setting from gzip to lz4 and now I am having issues with replication: at the destination, new files have errors. Will changing back from lz4 to gzip reinitiate the replication? I know I should have made the change after destroying the pool and recreating from base.
