Testing compression
Written by masteryeti on .
gzip-test
Use the following bash function to test by what percentage a file or directory can be compressed with gzip.
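gzip-test() (
	# The body runs in a subshell, so set -e, the trap and the variables stay local to it.
	set -e
	if ! [ -e "$1" ]; then
		echo "error: No such file or directory ($1)." >&2
		echo "Usage: gzip-test [path]" >&2
		return 1
	fi
	_a="$(mktemp)"
	trap 'rm -f "$_a"' EXIT
	# Concatenate all regular files below the path and keep only the first megabyte.
	find "$1" -type f -exec cat {} + 2>/dev/null | head -c 1048576 >"$_a"
	_b="$(wc -c <"$_a")"
	# Avoid a division by zero for empty files or directories without files.
	if [ "$_b" -eq 0 ]; then echo "error: No data to compress ($1)." >&2; return 1; fi
	_c="$(gzip -c "$_a" | wc -c)"
	echo "$((100 * (_b - _c) / _b))%"
)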
Usage:
gzip-test [path]
This takes the first megabyte (1048576 bytes) of the file, or of all regular files in the directory concatenated, and compresses it with gzip. It then prints the percentage by which the data shrank. Note that archives such as tar use 512-byte blocks and additionally store paths and attributes, which can influence the result. A high percentage indicates many predictable patterns; a low percentage indicates higher entropy (fewer patterns). For very small inputs the percentage can even be negative, because gzip adds a fixed header and trailer, plus the stored file name, to every output.
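The same measurement can also be applied to a stream, for example a tar archive written to stdout, which makes it easy to see how much the tar framing affects the result. The following is a minimal sketch of a hypothetical variant named gzip-test-stdin (not part of the original function), assuming bash with the usual head, wc and gzip tools. Because gzip reads from stdin here, no file name is stored in the gzip header, so very small inputs show slightly less overhead than with gzip-test.

```bash
# Hypothetical variant of gzip-test that measures data piped on stdin.
gzip-test-stdin() (
	set -e
	_a="$(mktemp)"
	# Remove the sample file when the subshell exits.
	trap 'rm -f "$_a"' EXIT
	# Keep only the first megabyte, as gzip-test does.
	head -c 1048576 >"$_a"
	_b="$(wc -c <"$_a")"
	if [ "$_b" -eq 0 ]; then echo "error: No data on stdin." >&2; return 1; fi
	# Compress from stdin so gzip does not store the temporary file name.
	_c="$(gzip -c <"$_a" | wc -c)"
	echo "$((100 * (_b - _c) / _b))%"
)
```

For example, compare tar -cf - somedir | gzip-test-stdin with gzip-test somedir, where somedir is a placeholder for your own directory.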
Limitation
When used on a directory, the first megabyte might consist of plain-text files in natural language, which compress very well, followed by a large movie file that gzip cannot compress any further. In that case the result is highly misleading, because the sample never reaches the movie file. Since find does not sort its output, which files end up in that first megabyte also depends on the order of the directory entries. On a directory, the result is therefore only representative when all files are small (a few KiB) or similar in kind.
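One way around this limitation is to measure each file on its own instead of one concatenated sample. The following is a minimal sketch of a hypothetical helper named gzip-test-each (not part of the original page), assuming bash (for read -d '') and a find that supports -print0. It prints one line per file, the percentage followed by a tab and the path, and skips empty or unreadable files.

```bash
# Hypothetical helper: run the gzip-test measurement on every file separately,
# so one large incompressible file cannot skew the result for the others.
gzip-test-each() (
	set -e
	if ! [ -e "$1" ]; then echo "error: No such file or directory ($1)." >&2; return 1; fi
	_a="$(mktemp)"
	trap 'rm -f "$_a"' EXIT
	# NUL-separated names keep paths with spaces or newlines intact.
	find "$1" -type f -print0 | while IFS= read -r -d '' _f; do
		# First megabyte of this file only; skip files that cannot be read.
		head -c 1048576 "$_f" >"$_a" 2>/dev/null || continue
		_b="$(wc -c <"$_a")"
		# Skip empty files to avoid dividing by zero.
		if [ "$_b" -eq 0 ]; then continue; fi
		_c="$(gzip -c <"$_a" | wc -c)"
		printf '%s%%\t%s\n' "$((100 * (_b - _c) / _b))" "$_f"
	done
)
```

The per-file lines can be piped through sort -n to spot the files that do not compress.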
Testing gzip-test
head -c 10 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
-350%
head -c 100 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
-38%
head -c 1000 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
-3%
head -c 10000 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
0%
head -c 100000000 /dev/urandom >/tmp/test-random; gzip-test /tmp/test-random
0%
head -c 10 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
-280%
head -c 100 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
61%
head -c 1000 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
95%
head -c 10000 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
99%
head -c 100000000 /dev/zero >/tmp/test-zero; gzip-test /tmp/test-zero
99%
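The negative results for tiny inputs come from gzip's fixed framing: every gzip output carries a 10-byte header and an 8-byte trailer (CRC-32 and input size), and because gzip-test passes the temporary file to gzip by name, the file name is stored in the header as well. From the results above, the 10-byte random sample became 45 bytes of gzip output, since 100 × (10 − 45) / 10 = −350, and most of that is overhead. The following sketch, assuming GNU gzip and mktemp, measures that overhead directly by compressing empty input once from stdin and once by file name (the variable names are illustrative only).

```bash
# Compress empty input from stdin: the output is only gzip's framing
# (header, an empty deflate block and the trailer).
empty_size="$(printf '' | gzip -c | wc -c)"

# Compress an empty temporary file by name: gzip now also stores the
# file name (plus a terminating NUL byte) in the header.
tmp="$(mktemp)"
named_size="$(gzip -c "$tmp" | wc -c)"
rm -f "$tmp"

echo "empty input from stdin: $empty_size bytes"
echo "empty file by name:     $named_size bytes"
```

With GNU gzip, the -n option omits the stored name when compressing.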