Testing compression


gzip-test

Use the following bash function to test by what percentage a file or directory can be compressed with gzip.

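```bash
gzip-test() (
  set -e
  if ! [ -e "$1" ]; then
    echo "error: No such file or directory ($1)." >&2
    echo "Usage: gzip-test [path]" >&2
    return 1
  fi
  local _a="$(mktemp)"
  trap "rm -f '$_a'" EXIT
  # Concatenate all regular files under the path and keep the first 1 MiB.
  find "$1" -type f -exec cat {} + 2>/dev/null | head -c 1048576 >"$_a"
  local _b="$(wc -c <"$_a")"
  # Avoid a division by zero for empty files or directories.
  if (( _b == 0 )); then
    echo "error: no data to compress ($1)." >&2
    return 1
  fi
  # Size of the same sample after gzip compression.
  local _c="$(gzip -c "$_a" | wc -c)"
  echo "$((100 * (_b - _c) / _b))%"
)
```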

Usage:

gzip-test [path]

This takes the first megabyte (1 MiB, 1048576 bytes) of the file, or of the concatenated contents of all regular files in the directory, and compresses it with gzip. It then prints the percentage by which the data could be compressed. The result can be negative when the input is tiny or already incompressible, because gzip always adds a header and a trailer to its output. Note that archives such as tar use a 512-byte block size and additionally store paths and attributes, which can influence the result. A high percentage indicates a lot of predictable patterns; a low percentage indicates higher entropy (fewer patterns).
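The tar caveat can be checked directly. The sketch below is not part of the original function; it assumes bash plus standard `find`, `tar` and `gzip`, and uses two hypothetical helper names, `pct_saved` and `compare_concat_vs_tar`. `pct_saved` applies the same formula as gzip-test, 100 * (original - compressed) / original, to data read from stdin, and `compare_concat_vs_tar` runs it once on the plain concatenation that gzip-test samples and once on a tar stream of the same directory.

```bash
# Hypothetical helper (not from the original post): print the percentage that
# gzip saves on whatever data arrives on stdin.
pct_saved() {
  local tmp orig comp
  tmp="$(mktemp)"
  cat >"$tmp"
  orig="$(wc -c <"$tmp")"
  # Reading from stdin means gzip stores no file name in its header.
  comp="$(gzip -c <"$tmp" | wc -c)"
  rm -f "$tmp"
  if (( orig == 0 )); then
    echo "error: empty input" >&2
    return 1
  fi
  echo "$((100 * (orig - comp) / orig))%"
}

# Hypothetical helper: compare the raw concatenation used by gzip-test with a
# tar stream of the same directory (512-byte blocks plus per-file headers).
compare_concat_vs_tar() {
  local dir="$1"
  printf 'concatenated: '
  find "$dir" -type f -exec cat {} + 2>/dev/null | head -c 1048576 | pct_saved
  printf 'tar stream:   '
  tar -cf - "$dir" 2>/dev/null | head -c 1048576 | pct_saved
}
```

For example, `compare_concat_vs_tar ~/Documents` prints two percentages. The tar figure includes the headers and zero padding, which gzip compresses well, so for directories with many small files it tends to look better than the raw concatenation. Because `pct_saved` reads from stdin, its results for tiny inputs are also a few bytes more favorable than gzip-test's.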

Limitation

When used on a directory, the first megabyte may consist of plain-text files with natural language, which are very easy to compress, followed by a large movie file that gzip cannot compress any further. In this case, the result will be highly misleading. Because find returns files in the order the file system lists them, not sorted, it is hard to predict which files end up in the sample. On a directory, the result is only accurate when all of the files are either small (a few KiB each) or similar in content.
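One way to detect this situation is to test each file on its own instead of one concatenated sample. The following sketch uses a hypothetical helper, `per_file_gzip`, that is not part of the original post; it assumes bash (for `read -d ''`) and, like gzip-test, looks at no more than the first 1 MiB of each file.

```bash
# Hypothetical helper (not from the original post): report gzip savings for
# each regular file under a directory, so one large incompressible file
# cannot hide behind compressible files that happen to come first.
per_file_gzip() {
  local f orig comp
  find "${1:-.}" -type f -print0 | while IFS= read -r -d '' f; do
    orig="$(head -c 1048576 "$f" | wc -c)"
    # Skip empty files to avoid a division by zero.
    (( orig == 0 )) && continue
    comp="$(head -c 1048576 "$f" | gzip -c | wc -c)"
    printf '%4d%%  %s\n' "$((100 * (orig - comp) / orig))" "$f"
  done
}
```

Running `per_file_gzip ~/Videos | sort -n` lists the least compressible files first, which makes a single large incompressible file easy to spot.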

Testing gzip-test

```bash
$ head -c 10 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
-350%
$ head -c 100 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
-38%
$ head -c 1000 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
-3%
$ head -c 10000 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
0%
$ head -c 100000000 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
0%
$ head -c 10 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
-280%
$ head -c 100 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
61%
$ head -c 1000 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
95%
$ head -c 10000 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
99%
$ head -c 100000000 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
99%
```
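Part of the negative results for the 10-byte and 100-byte inputs comes from gzip's header: when gzip compresses a named file, as gzip-test does with its temporary file, it stores the file name in the header by default, in addition to the fixed 10-byte header and 8-byte trailer. The sketch below uses a hypothetical helper, `gzip_overhead`, that is not part of the original post. It compresses N zero bytes once with `gzip -c` and once with `gzip -nc` (`-n` leaves out the name and timestamp), then prints N followed by both compressed sizes.

```bash
# Hypothetical helper (not from the original post): compare gzip output sizes
# with and without the stored file name for an input of N zero bytes.
gzip_overhead() {
  local n="${1:-10}" tmp with_name without_name
  tmp="$(mktemp)"
  head -c "$n" /dev/zero >"$tmp"
  # Default behavior: the header includes the temporary file's name.
  with_name="$(gzip -c "$tmp" | wc -c)"
  # -n: do not store the original name and timestamp.
  without_name="$(gzip -nc "$tmp" | wc -c)"
  rm -f "$tmp"
  # $(( )) strips the padding that some wc implementations print.
  echo "$n $((with_name)) $((without_name))"
}
```

For example, `gzip_overhead 10` prints three numbers. With GNU `mktemp`, whose default names look like `tmp.XXXXXXXXXX`, the second number should be about 15 bytes larger than the third (14 characters plus a terminating NUL).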