What compression do you use?

The following is an evaluation of various compression utilities that I tested when reviewing the various options for MySQL backup strategies. The overall winner in performance was pigz, a parallel implementation of gzip. If you use gzip today as most organizations do, this one change will improve your backup compression times.

Details of the test:

  • The database is 5.4GB of data
  • mysqldump produces a backup file of 2.9GB
  • The server is an AWS t1.xlarge with a dedicated EBS volume for backups

The following testing was performed to compare the time and % compression savings of various available open source products. This was not an exhaustive test with multiple iterations and different types of data files.

Compression
Utility
Compression Time
(sec)
Decompression Time
(sec)
New Size
(% Saving)
lzo (-3) 21 34 1.5GB (48%)
pigz (-1) 43 33 995MB (64%)
pigz (-3) 56 34 967MB (67%)
gzip (-1) 81 43 995MB (64%)
fastlz 92 128 1.3GB (55%)
pigz [-6] 105 25 902MB (69%)
gzip (-3) 106 43 967MB (67%)
compress 145 36 1.1GB (62%)
pigz (-9) 202 23 893MB (70%)
gzip [-6] 232 78 902MB (69%)
zip 234 50 902MB (69%)
gzip (-9) 405 43 893MB (70%)
bzip2 540 175 757MB (74%)
rzip 11 minutes 360 756MB (74%)
lzo (-9) 20 minutes 82 1.2GB (58%)
7z 33 minutes 122 669MB (77%)
lzip 47 minutes 132 669MB (77%)
lzma 58 minutes 180 639MB (78%)
xz 59 minutes 160 643MB (78%)

Observations

  • The percentage savings and compression time of results will vary depending on the type of data that is stored in the MySQL database.
  • The pigz compression utility was the surprising winner in best compression time producing at least a size of gzip. This was a full 50% faster than gzip.
  • For this compression tests, only one large file was used. Some utilities work much better with many smaller files.

Find our more information of these tests and the results in Effective MySQL: Backup and Recovery

Comments

  1. says

    I also came to the same conclusion.

    My results are available at: http://www.kormoc.com/stuff/Compression%20Speeds.pdf

    The machine is a 24 core Xeon(R) CPU X5650, 24 disk raid 1+0 on a 3ware 9650. We ended up using pigz -1 as the time increase over -1 was offset by the faster network transfer so the end time to remote target was lower but if we were on a 100 mbit network, pbzip2 -9 was the better choice.

  2. James Day says

    For 7z were you using the PPMd compression method? See http://www.dotnetperls.com/ppmd if you’re not familiar with it.

    Views are my own. for an official Oracle view, consult a PR person.

    James Day, MYSQL Senior Principal Support Engineer, Oracle

  3. ben says

    xz -1 is much fast than default bzip2, about 90% times than default gzip, with almost same compress radio.

Trackbacks