Here is a full list of the tested libraries:
Compression 101Before I get to the results of the benchmark, I think it is important to understand the different archive formats and compression algorithms in order to compare the results with other benchmarks/and or tests. Lets get to it!
A compression algorithm can be compared to a small machine that takes in some data, do some math on it and transform it to some new data - just smaller in size. It sounds simple, but it involves a lot of math to do it efficient. The best compression algorithms are the ones which have a high compression ratio while only using small amounts of RAM and CPU. One such algorithm is the very well known DEFLATE which internally uses the famous LZ77 algorithm created by Abraham Lempel and Jacob Ziv. The DEFLATE algorithm has become the de facto standard when it comes to compression, and it is used in nearly all instances when a compression algorithm is needed. There are many compression algorithms out there, some are better than DEFLATE and also very wide spread. Examples include LZW, BWT, LZSS and LZMA.
Each of the compression algorithms only take in a single file, either as a blob of bytes, or a stream of bytes. The question is, how do compression applications such as 7Zip, WinZip and WinRAR compress multiple files?
Enter the world of archive formats. We all know about Zip, BZip2, GZip and RAR. They all simply describe the content of themselves either in terms of files or streams of data, along with some metadata such as original size, file name and path. The archive formats are more complex that you would think, simply because there are a lot of features. Some of them support multiple data streams (for multi-threading), encryption, password protection, file name obfuscation, self-extracting archives and more.
|Figure 1: The relationship between files, compression algorithms and archive formats.|
Many of the archiving formats are now synonymous with the compression algorithm they use. We say "Zip the file" or "Make a 7z of the folder". In reality, you can interchange the different archive formats with any of the compression algorithms, and as such you can have combinations such as Zip with LZMA or 7z with DEFLATE. That being said, there are some default compression algorithms in some archive formats. I've listed some examples here in archive/algorithm format:
- Zip / DEFLATE
- 7zip / LZMA
- Bzip2 / BWT
- Gzip / DEFLATE
- GIF / LZW
- PNG / DEFLATE
- Tar / none
- Rar / LZSS
In the Linux world, the solution to compressing multiple files was solved using an archiving format called Tar. It is a pretty simply format designed for tape archiving, which preserves the filesystem information such as timestamps, owner information and file size on the files held inside the archive. To compress a directory, Tar simply store all the files within the directory inside a single file, then compress it using DEFLATE (gzip) or BWT (bzip2).
BenchmarkNow that we know the difference between compression algorithms and archive formats, we can continue to the benchmark. The libraries tested here are overlapping in the sense that they are simply different implementations of different compression algorithms, and even though the name of the library might contain Zip, it has nothing to do with the actual Zip archiving format, but everything to do with the compression algorithm used by default in Zip files.
The benchmark uses a 128 MB test file that contains images, music, text and some random data. The file represent arbitrary data usually found on a filesystem.
|Library||Type||Packed (MB)||Pack (ms)||Unpack (ms)||Ratio||Speed (MB/s)|