Forward error correction provides the ability to detect and correct corrupted data. At a high level, it works by adding structured redundancy to the data, ensuring that the original can always be reconstructed so long as the corruption affects no more than a configurable percentage of the overall data.
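As a toy illustration of the idea (a single-erasure parity code, far simpler than the Reed-Solomon codes redupe actually uses), one redundant chunk is enough to rebuild any one lost chunk:

from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Append one parity chunk: the XOR of all data chunks."""
    return chunks + [reduce(xor, chunks)]

def recover(stored, missing):
    """Rebuild the chunk at index `missing` by XORing the survivors."""
    survivors = [c for i, c in enumerate(stored) if i != missing]
    return reduce(xor, survivors)

stored = encode([b"abcd", b"efgh", b"ijkl"])   # 33% overhead, tolerates one lost chunk
assert recover(stored, 1) == b"efgh"

Real codes like Reed-Solomon generalize this: instead of one parity chunk protecting against one loss, a configurable number of parity symbols protects against a configurable amount of corruption.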

This post introduces the redupe and fecsum command-line tools for providing forward error correction to streams of data. The first tool, redupe, is modeled after compression tools like gzip or bzip2, but adds redundancy instead of eliminating it. Redupe transforms the data directly, writing the redundant information inline with the original. This works well for streaming or archival purposes, but is notably less convenient for multimedia use cases where the data would have to be recovered prior to playback. For these use cases the package provides fecsum, short for "forward-error-correction checksum", which preserves the exact byte pattern of the input data by writing the forward error correction to a separate file.

Quick Overview

The redupe tool has two basic workflows. The most common (and best-tested) workflow is to pass redupe a file or files to be encoded with error correction, like this (output modified for presentation):

$ ls
2018-05-15-home-backup.tar.gz
$ redupe 2018-05-15-home-backup.tar.gz
$ ls -l
-rw------- 1 rescrv 5625162218 2018-05-15-home-backup.tar.gz
-rw-r--r-- 1 rescrv 6433996800 2018-05-15-home-backup.tar.gz.rd

Notice that the original file was left intact and that we have a new file that is approximately 15% larger than the original. Out of an abundance of caution, redupe will never remove the original file. This is to encourage a workflow where the file can be checked end-to-end for accuracy:

$ sha256sum 2018-05-15-home-backup.tar.gz 2018-05-15-home-backup.tar.gz.rd
0335f60167c8276f13b0... 2018-05-15-home-backup.tar.gz
1a878ad85dd45b2af18e... 2018-05-15-home-backup.tar.gz.rd
$ reundupe < 2018-05-15-home-backup.tar.gz.rd | sha256sum
0335f60167c8276f13b0... -
$ sha256sum 2018-05-15-home-backup.tar.gz.rd
1a878ad85dd45b2af18e... 2018-05-15-home-backup.tar.gz.rd

This example shows us that there exists a tool called reundupe. The commands redupe and reundupe link to the same binary; it will add or remove redundancy based upon which name is used to invoke it. The other thing we can see is that reundupe (and, because it's the same binary, redupe) can operate within a UNIX pipeline. When passed no files and a stream on stdin, both commands will write the transformed stream to stdout.
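That makes it possible to add redundancy without ever writing an intermediate file. For example, a backup pipeline might look something like this (paths and tar flags are illustrative):

$ tar -czf - ~/documents | redupe > 2018-05-15-home-backup.tar.gz.rd
$ reundupe < 2018-05-15-home-backup.tar.gz.rd | tar -tzf -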

At a higher level, this is an extra-paranoid process for verifying our backup before shipping it off to long-term storage. First, we record the checksums of the original and redupe'd files. Then, we undo the redupe process and confirm the output matches the original file's checksum. Finally, we re-check the redupe'd file's checksum against the value recorded earlier. While this process is not guaranteed to catch every unrecoverable error, it provides a high degree of confidence that the redundancy-infused file can reconstruct the original, and that the file is unlikely to have been corrupted during the verification process.

We could also have gone through a similar workflow with the fecsum tool:

$ ls
2018-05-15-home-backup.tar.gz
$ fecsum create 2018-05-15-home-backup.tar.gz{,.fec}
$ ls
2018-05-15-home-backup.tar.gz
2018-05-15-home-backup.tar.gz.fec
$ fecsum check 2018-05-15-home-backup.tar.gz{,.fec} || echo fail
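To see what a failure looks like, we can simulate corruption by overwriting bytes in place (a contrived test; the dd invocation below clobbers 512 bytes one megabyte into the file):

$ dd if=/dev/zero of=2018-05-15-home-backup.tar.gz bs=1 count=512 seek=1048576 conv=notrunc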

If we were to discover that the check failed, we could recover our original. It looks something like this:

$ fecsum check 2018-05-15-home-backup.tar.gz{,.fec} || echo fail
fail
$ fecsum correct 2018-05-15-home-backup.tar.gz{,.fec,.orig} || echo fail
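Assuming the third path names the recovered output (here, the .orig suffix), the repaired copy should reproduce the checksum we recorded for the original:

$ sha256sum 2018-05-15-home-backup.tar.gz.orig
0335f60167c8276f13b0... 2018-05-15-home-backup.tar.gz.orig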

Verification

I trust 100% of my backups to redupe. To gain confidence that I could do such a thing, I subjected redupe to literally years' worth of CPU time to verify that it would keep my data safe in the case of no errors. I used the guacamole random generator to create approximately 128 different random streams. I then ran each of these streams on an independent CPU core for somewhere between two weeks and a month of constant testing. I sought to confirm that a pipeline like this one performed correctly:

$ guacamole-generate -s $seed \
| redupe \
| reundupe \
| guacamole-compare -s $seed

Because guacamole is a deterministic random number generator, guacamole-generate and guacamole-compare can produce the exact same sequence of bytes from the same seed, which lets us compare the original stream against the data after it has been processed in both directions by the redupe software.
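The same pattern works with any seeded generator. Here is a minimal sketch of the round-trip comparison in Python (a stand-in for the guacamole tools, using the standard random module and a small amount of data; requires Python 3.9+ for randbytes):

import random
import subprocess

SEED = 42            # both sides derive their bytes from the same seed
NBYTES = 1 << 20     # small enough for a single communicate() call

def generate(seed, nbytes):
    """Deterministically produce `nbytes` of pseudorandom data."""
    return random.Random(seed).randbytes(nbytes)

# Push the stream through redupe and back, then compare it against a
# freshly regenerated copy of the same stream.
proc = subprocess.run("redupe | reundupe", shell=True,
                      input=generate(SEED, NBYTES),
                      stdout=subprocess.PIPE, check=True)
assert proc.stdout == generate(SEED, NBYTES), "round trip corrupted the data"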

All told, 128 streams at two to four weeks apiece works out to somewhere between five and ten years of up-front verification. The guacamole generator was not well optimized during this validation (2015) and produced only 200-400MB/s of random data, but this was still enough to push more than a couple petabytes through the algorithm for verification.

There was also approximately another week of follow-up validation with an altered pipeline that injected random corruption into the file. Unfortunately, the details of this testing escape my memory and have been lost to time.

Limitations

Redupe and fecsum were written largely as a fun weekend project to learn about Reed-Solomon codes. Because it was a learning exercise, the code may not follow best practices for forward error correction. Some foreseeable problems are:

  • The basic algorithm underlying the tool performs the forward error correction across groups of 255 pages, so that any single page or adjacent run of bytes can be lost or overwritten. This rounds every file up to a multiple of approximately 1MB (see the sketch after this list). I work around this by generally operating on files of hundreds of megabytes or more, so the waste is a trivial percentage of the data stored.
  • There is an assumption that data corruption will be limited in its impact. When 12.5% redundancy is added, no single chunk of data (~1MB) can withstand more than 12.5% corruption. If the corruption within any single chunk exceeds the configured redundancy, that chunk is entirely unrecoverable.
  • The fecsum API is ugly and subject to change. I usually don't notice this because I embed it within scripts and haven't changed the code much.
  • As implemented, forward error correction will account for changed bytes, but the tool does nothing to account for insertions or deletions that shift the file's contents. There is no real workaround in redupe for such failures.
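To make the rounding concrete, here is the arithmetic as a short sketch, assuming a 4KB page (redupe's actual page size and group layout may differ):

PAGE = 4096                     # assumed page size
GROUP = 255 * PAGE              # one coding group: 255 pages, just under 1MiB

def padding_waste(file_size):
    """Bytes of padding added when rounding a file up to whole groups."""
    groups = -(-file_size // GROUP)     # ceiling division
    return groups * GROUP - file_size

print(padding_waste(100))               # ~1MiB of waste for a tiny file
print(padding_waste(500 * 1024**2))     # under 0.01% of a 500MiB file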

Last, but certainly not least, and deserving of its own paragraph: this was a hobby project written to learn about forward error correction. Read the LICENSE distributed with redupe, because it contains strong language about this software being provided as-is.

Getting redupe

Made it this far and want to install redupe? It is available in source form on my web page or on GitHub. This is a standard autotools package, so the usual ./configure && make && make install dance will suffice for installation. If installing from git, you'll need to prefix that with autoreconf -ivf. You will also likely need to install libpopt's development package and execute the make install step with root privileges.
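Concretely, a from-git build might look like the following (repository URL assumed; substitute your platform's package manager when installing libpopt's development package):

$ git clone https://github.com/rescrv/redupe.git
$ cd redupe
$ autoreconf -ivf
$ ./configure && make
$ sudo make install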

If you have trouble with this, please don't hesitate to open a ticket on GitHub.

Summary

Redupe is a tool for providing forward error correction on the command line. It was developed to add a layer of safety for both tarball-oriented backups and my multimedia collection. While the tool has seen extensive use by me and has processed petabytes of synthetic data, I've never really taken the time to make it accessible to others; I hope this blog post is the start of changing that.