Imagine you needed a way to turn a set of predictable numbers, say 1 through N, into an unpredictable set of N different numbers. The challenge here is that coming up with a mapping is relatively easy, but making the mapping unpredictable is quite hard.

In this post, we will see how a solution to this problem can allow us to write tests and benchmarks over large data sets without having to materialize the whole data set.

Scrambling Guacamole

Scrambler complements the guacamole pseudo-random number generator (PRNG). Guacamole has a unique sequential seed property that makes the Seed function predictable. Sequential numbers will translate to adjacent blocks of 64 bytes of random data. This is quite useful for generating one-off random streams, but it creates a new problem for applications: To generate more than one stream of data, the application must somehow partition the original guacamole stream into N different streams and use each of these N seeds to pick a different starting position in the original stream.

The most direct way to do this would be to divide UINT64_MAX by N and start each of the streams [0, N) at a different multiple of this value. This maximally spaces the distinct streams, but gives us just one way to divide the stream.

Scrambler provides us a different approach in that it will turn any 64-bit number into some other 64-bit number in a way that is effectively irreversible. This gives us a tool with which to construct N distinct streams of guacamole: We can pass each number in the range [0, N) through the scrambler to translate it to a unique offset within the output. Even with a trillion distinct streams, we can get an average of 256MB of random data out of each stream before it overlaps with other streams.

Because scrambler is such a good fit for guacamole, it ships with the same package as guacamole:

go get hack.systems/random

We could use the scrambler and two guacamole instances to repeatedly generate the same ten random strings.

package main

import (
    "encoding/hex"
    "fmt"

    "hack.systems/random/guacamole"
)

func main() {
    g1 := guacamole.New()
    g2 := guacamole.New()
    s := guacamole.NewScrambler()
    buf := make([]byte, 8)

    for {
        idx := g1.Uint64() % 10
        scrambledIdx := s.Scramble(idx)
        g2.Seed(scrambledIdx)
        g2.Fill(buf)
        fmt.Printf("idx=%d scrambled=%20d %s",
            idx, scrambledIdx, hex.Dump(buf)[8:])
    }
}

The key here is that we have two different instances of guacamole. The first gives us random indices in the range [0, 10). We can run this index through the scrambler to get a unique index scattered throughout the 64-bit numbers. We can efficiently seek to this index in the second guacamole instance and generate some random bytes. This process will guarantee that we select the same ten strings randomly with replacement indefinitely. If you run the output, it looks something like this:

$ go run scrambler.go | head -10
idx=4 scrambled=12404588102576239538   be bc cf f7 14 7d 83 bd   |.....}..|
idx=1 scrambled= 7272475945470074791   15 d4 a1 d2 42 92 4a f0   |....B.J.|
idx=4 scrambled=12404588102576239538   be bc cf f7 14 7d 83 bd   |.....}..|
idx=0 scrambled= 5690745928405278072   69 be b7 6d 37 04 eb 84   |i..m7...|
idx=8 scrambled= 2201218818704928123   77 33 c4 33 2f 64 f3 c7   |w3.3/d..|
idx=4 scrambled=12404588102576239538   be bc cf f7 14 7d 83 bd   |.....}..|
idx=7 scrambled= 6774274821177524071   bd 1a 5e 6f bc 8e 58 da   |..^o..X.|
idx=3 scrambled= 8294687906054504177   8a 83 5a 75 23 dc f9 57   |..Zu#..W|
idx=7 scrambled= 6774274821177524071   bd 1a 5e 6f bc 8e 58 da   |..^o..X.|
idx=9 scrambled= 1062469691266938762   e3 98 ad 0c b1 5c ed c0   |.....\..|
signal: broken pipe

We can see that in the first ten strings, we have selected strings 4 and 7 twice each and, as promised, they came back with the same random bytes.

If we would like a different set of random bytestrings, we could extend our main function to change the behavior of the scrambler:

s.Change(42)

This call changes the mapping in use by the scrambler to a different mapping. Change is an extremely expensive operation (more than one hundred times more expensive than Scramble ) compared to all the others we are using, so it should be used sparingly, and probably only on startup.

Benchmarks

Scrambling numbers is relatively expensive compared to simple modular arithmetic. Where a multiply/modulus operation can take 9ns, Scrambler takes approximately 120ns. When choosing whether to put scrambler on the hot path, it is worth weighing this cost with the possibility of something faster, but more predictable like modulus. The Change operation is even more expensive, coming in at more than 34 microseconds per operation, nearly two hundred times that of the Scramble operation.

Operation Cost per Op
Change 34 us/op
Scramble 129 ns/op
Modulus 9.2 ns/op

These benchmarks ship alongside guacamole and you can see how guacamole performs on your own hardware with:

GOMAXPROCS=1 go test hack.systems/random/guacamole \
    -bench=Scrambler -benchtime=10s

Summary

Scrambler is a low-level way to turn one set of random numbers into a different set of random numbers. This power allows us to turn enumerable sequences into unpredictable sequences; this disparity allows us to turn predictable, easy-to-generate values into outputs that are more likely to test our system. In this post we saw how to combine the scrambler with guacamole to create random byte strings. Future posts will show how this same technique is leveraged within hack.systems/random/armnod to provide primitives for randomly selecting strings from a set of strings in constant time, without having to materialize any of the set in advance.