Imagine you needed to generate 10TB of random data and needed to be able to efficiently convey to another individual what you expect this data to contain. How would you do it?

This particular scenario comes up quite a bit in storage system benchmarking and validation, where it is necessary to be able to quickly describe generate large quantities of data and later recall exactly which bytes comprise that data.

This post introduces guacamole, a pseudo-random number generator (PRNG) that provides a seekable random stream. We will see how guacamole helps us answer the above challenge.

Guacamole by Example

Guacamole is provided as part of the hack.systems/random package. To bring it into your environment, you can fetch it via standard Go tooling:

go get hack.systems/random

If we were to go back to the prompt at the start of this post, we could use guacamole to generate 10TB of random data like so:

package main

import (
    "hack.systems/random/guacamole"
)

const (
    TenTerabyte uint64 = 1e13
    BlockSize   uint64 = 64
)

func main() {
    g := guacamole.New()
    buf := make([]byte, BlockSize)

    for i := uint64(0); i < TenTerabyte; i += BlockSize {
        g.Fill(buf)
        // do something with buf
    }
}

So far, so good, but our example program looks just like it would for most every other random number generator. In particular, no matter how optimized the implementation is, it will be bound by the rate at which a single CPU can generate the data and the speed with which we can do something with buf .

Guacamole helps us work around the single-CPU bottleneck through a sequential seed property . Where a typical PRNG provides a seed function that takes some user-provided data and unpredictably transforms it into a position within the random stream, guacamole's seed function guarantees that adjacent seeds produce adjacent outputs.

Guacamole's sequential seed property allows us to break up the work while producing the same sequence of random bytes. We can get to the same state that g is in after hours of executing the above code with a single, constant time seed operation:

g.Seed(TenTerabye / BlockSize)

Here, we've seeded the generator with the index corresponding to the tenth terabyte. Guacamole generates data in 64 byte blocks, so we've taken that into account here.

We can exploit the speed and predictability of the seed operation to parallelize our original task. If we break the data into 1TB chunks, we can use a full ten CPU cores to generate the same data we generated above:

package main

import (
    "hack.systems/random/guacamole"
)

const (
    Terabyte  uint64 = 1e9
    BlockSize uint64 = 64
)

func genTerabyte(i int) {
    g := guacamole.New()
    buf := make([]byte, BlockSize)
    g.Seed(uint64(i) * Terabyte)
    for i := uint64(0); i < Terabyte; i += BlockSize {
        g.Fill(buf)
        // do something with buf
    }
}

func main() {
    for i := 0; i < 10; i++ {
        genTerabyte(i)
    }
}

At this point, it is worth examining why we would care to preserve the random number stream between these two different use cases. Why would care about the actual content of the random data? After all, it is random and has no meaning to us, right?

Not quite, because we actually know and can say quite a bit about this random data. For instance, we can directly compute the value of any 1MB segment within the data. If the random data is being used to exercise a storage system, we don't need to know anything about the actual bytes themselves to know that the bytes stored in the system better match they bytes we expect.

Salsa and Guacamole

Guacamole is a chopped-up version of the "salsa" cipher from DJB. Salsa was a great choice for this because it did not require keeping any state for the stream other than an index into the stream. Guacamole is a transformed version of salsa that makes this index an explicit input, makes the cipher operate on explicit 64B boundaries, and inlines constant values for the key and plaintext that allow for better performance.

The name is largely a play on words that comes from a misunderstanding I had. When I first learned of the salsa cipher, my mental association was with the dip for chips and not with the variety of dance. Later when it came time to assign a name to my fork of salsa that was not as the original author intended, I thought it'd be a humorous pun to play off the salsa homonym so that I was using both the code and the name in ways that weren't originally intended.

Validation

Random number generators have varying degrees of quality. Many simpler random number generators have a tendency to repeat or have predictable patterns in the lower order bits of the numbers that they generate. I've heard horror stories of people using the standard drand48 family of random number generators and finding that it introduces significant artifacts into their experimental results.

To help quantify the extent to which a random number generator is "good", researchers have assembled a collection of tests of randomness that can unveil weaknesses in random number generators. While what these tests actually do is outside the scope of this post, an intuition for what they do is quite helpful.

Imagine if we were to use a PRNG to flip a coin one million times and see that it comes up almost evenly split between heads and tails. This sounds like a good property for the generator to have, so it would pass this test. Another test could be to see how many time heads turns up in a row. A generator that passed the first test could fail this second test if, e.g., it swapped between heads and tails every 997 coin tosses. To have that many heads in a row once is a one in one novemdecillion (thats one with sixty zeroes). To have that happen once is practically impossible; if it happens twice, the generator is broken, and we can declare it as such.

The Dieharder random number test suite is a collection of tests much more sophisticated than our coin-flipping test that can show how good (or bad) a random number generator is. Running it on a stream of data from guacamole shows that 111/114 tests pass , with the remaining three showing signs of weakness. For a baseline, a run with /dev/urandom on Linux passed only 110/114 tests .

Damned Lies

Guacamole is a Go facade around C and amd64 assembly implementations of the algorithm. The assembly version gives significantly better performance than the portable C code, producing up to five times more bytes per second than the C code can produce.

On a 2014-era Intel Core i7-4600 (Haswell) CPU at 2.1Ghz (in a laptop), guacamole can pump out more than one gigabyte per second per core:

Block Size	Portable	SSE4.1
64B	89.8 MB/s	218.7 MB/s
1KB	134.4 MB/s	876.6 MB/s
4KB	136.5 MB/s	1054.0 MB/s
64KB	142.5 MB/s	1134.2 MB/s
1MB	142.2 MB/s	1056.2 MB/s

These benchmarks ship alongside guacamole and you can see how guacamole performs on your own hardware with:

GOMAXPROCS=1 go test hack.systems/random/guacamole \
    -bench=Guac -benchtime=10s

Summary

Guacamole is a low-level pseudo-random number generator with a unique sequential seed property. This allows tests and benchmarks to model rather large data sets without having to pay a storage cost to materialize the data set. In future posts, we'll explore some other pieces of the hack.systems/random package that build on top of guacamole and demonstrate how the low-level random bytes it produces can be manipulated to build higher-level primitives.