Scrambler: Pseudo-random Mappings
By Robert EscrivaImagine you needed a way to turn a set of predictable numbers, say 1 through N, into an unpredictable set of N different numbers. The challenge here is that coming up with a mapping is relatively easy, but making the mapping unpredictable is quite hard.
In this post, we will see how a solution to this problem can allow us to write tests and benchmarks over large data sets without having to materialize the whole data set.
Scrambling Guacamole
Scrambler complements the
guacamole
pseudo-random number generator (PRNG). Guacamole has a unique sequential seed
property that makes the
Seed
function predictable. Sequential numbers will
translate to adjacent blocks of 64 bytes of random data. This is quite useful
for generating one-off random streams, but it creates a new problem for
applications: To generate more than one stream of data, the application must
somehow partition the original guacamole stream into N different streams and use
each of these N seeds to pick a different starting position in the original
stream.
The most direct way to do this would be to divide
UINT64_MAX
by N and start
each of the streams [0, N) at a different multiple of this value. This
maximally spaces the distinct streams, but gives us just one way to divide the
stream.
Scrambler provides us a different approach in that it will turn any 64-bit number into some other 64-bit number in a way that is effectively irreversible. This gives us a tool with which to construct N distinct streams of guacamole: We can pass each number in the range [0, N) through the scrambler to translate it to a unique offset within the output. Even with a trillion distinct streams, we can get an average of 256MB of random data out of each stream before it overlaps with other streams.
Because scrambler is such a good fit for guacamole, it ships with the same package as guacamole:
go get hack.systems/random
We could use the scrambler and two guacamole instances to repeatedly generate the same ten random strings.
package main import ( "encoding/hex" "fmt" "hack.systems/random/guacamole" ) func main() { g1 := guacamole.New() g2 := guacamole.New() s := guacamole.NewScrambler() buf := make([]byte, 8) for { idx := g1.Uint64() % 10 scrambledIdx := s.Scramble(idx) g2.Seed(scrambledIdx) g2.Fill(buf) fmt.Printf("idx=%d scrambled=%20d %s", idx, scrambledIdx, hex.Dump(buf)[8:]) } }
The key here is that we have two different instances of guacamole. The first gives us random indices in the range [0, 10). We can run this index through the scrambler to get a unique index scattered throughout the 64-bit numbers. We can efficiently seek to this index in the second guacamole instance and generate some random bytes. This process will guarantee that we select the same ten strings randomly with replacement indefinitely. If you run the output, it looks something like this:
$ go run scrambler.go | head -10 idx=4 scrambled=12404588102576239538 be bc cf f7 14 7d 83 bd |.....}..| idx=1 scrambled= 7272475945470074791 15 d4 a1 d2 42 92 4a f0 |....B.J.| idx=4 scrambled=12404588102576239538 be bc cf f7 14 7d 83 bd |.....}..| idx=0 scrambled= 5690745928405278072 69 be b7 6d 37 04 eb 84 |i..m7...| idx=8 scrambled= 2201218818704928123 77 33 c4 33 2f 64 f3 c7 |w3.3/d..| idx=4 scrambled=12404588102576239538 be bc cf f7 14 7d 83 bd |.....}..| idx=7 scrambled= 6774274821177524071 bd 1a 5e 6f bc 8e 58 da |..^o..X.| idx=3 scrambled= 8294687906054504177 8a 83 5a 75 23 dc f9 57 |..Zu#..W| idx=7 scrambled= 6774274821177524071 bd 1a 5e 6f bc 8e 58 da |..^o..X.| idx=9 scrambled= 1062469691266938762 e3 98 ad 0c b1 5c ed c0 |.....\..| signal: broken pipe
We can see that in the first ten strings, we have selected strings 4 and 7 twice each and, as promised, they came back with the same random bytes.
If we would like a different set of random bytestrings, we could extend our main function to change the behavior of the scrambler:
s.Change(42)
This call changes the mapping in use by the scrambler to a different mapping.
Change is an extremely expensive operation (more than one hundred times more
expensive than
Scramble
) compared to all the others we are using, so it should
be used sparingly, and probably only on startup.
Benchmarks
Scrambling numbers is relatively expensive compared to simple modular arithmetic. Where a multiply/modulus operation can take 9ns, Scrambler takes approximately 120ns. When choosing whether to put scrambler on the hot path, it is worth weighing this cost with the possibility of something faster, but more predictable like modulus. The Change operation is even more expensive, coming in at more than 34 microseconds per operation, nearly two hundred times that of the Scramble operation.
Operation | Cost per Op |
---|---|
Change | 34 us/op |
Scramble | 129 ns/op |
Modulus | 9.2 ns/op |
These benchmarks ship alongside guacamole and you can see how guacamole performs on your own hardware with:
GOMAXPROCS=1 go test hack.systems/random/guacamole \ -bench=Scrambler -benchtime=10s
Summary
Scrambler is a low-level way to turn one set of random numbers into a different
set of random numbers. This power allows us to turn enumerable sequences into
unpredictable sequences; this disparity allows us to turn predictable,
easy-to-generate values into outputs that are more likely to test our system.
In this post we saw how to combine the scrambler with guacamole to create random
byte strings. Future posts will show how this same technique is leveraged
within
hack.systems/random/armnod
to provide primitives for randomly selecting
strings from a set of strings in constant time, without having to materialize
any of the set in advance.