Reproducible randomness and its applications

cover page by cottonbrow studio

Dilbert: October 25, 2001

While these comic strips may sound absurd, they actually contain a deep insight into random numbers and how they are generated and used. Do not get me wrong; The joke is funny, but there is more to it than you might think.

throwing dice in code

Randomness is tricky in programs – computers are governed by determinism, which means that a computer with identical inputs can produce only one output each time. For true randomness, it requires an external source to obtain sequences of true random numbers. The physical processes involved in generating these sources of entropy are often too slow for general use, and not all hardware allows such true random sources.

There is a faster compromise, though—the algorithm is called Pseudo Random Number Generatoreither PRNG for small.

To generate a sequence of pseudorandom numbers, you need to provide a seed price to the algorithm. The simplest way to think about what this value represents is to imagine a big book full of random numbers. Choosing a seed value means flipping over to a particular page of that book, finding the correct line and column on the page and reading the sequence from left to right, top to bottom, reading as many digits as you need at a time. When you reach the end of the book, you turn to the first page and proceed from there.

It may surprise you that a printed book source was scientists and engineers who obtained large amounts of random numbers for experimental purposes before the advent of accessible computing machines. The book contained sequences that were guaranteed to satisfy all the important randomness criteria known at the time. You can even buy a used book by them and try it yourself! It definitely throws rolling dice or coins.

A million base-10 digits aren’t really that many when you consider modern computers, though – the entire sequence would fit on a single floppy disk. The problem with such a relatively short sequence is that it loops back on its own once enough digits are used up. To generate a 64-bit floating number you need to use 20 base-10 digits at a time (


IheyYes10264I20log_{10} 2^{64} \approx 20

), which means that it can generate random number sequences only 50,000 numbers before the cycle starts again.

PRNGs work a little differently. A seed value sets the internal state which is used to shuffle the bits each time a new number is generated in the sequence. The idea is that a good PRNG algorithm will generate an impossible-to-guess sequence, while satisfying the criterion tests for as much randomness as possible. That way, the PRNG effectively writes a book of numbers while you’re reading it, but the book doesn’t need to be stored anywhere, and each seed value produces a separate book.

Note: Some PRNGs may appear random, but given its substantial output, researchers have managed to deduce the internal state of the PRNG so the sequence can be guessed, which is why it is cryptographically secure. It is important to use PRNGs with true random numbers.

Depending on the algorithm chosen, the book written by a PRNG can be very long –

219937,12^{19937}-1

bits in the case Mersenne Twister Used in the examples below. But there are as many different sequences as the seed value, so you can be sure not to run out of fresh unexpected sequences of numbers any time soon. Importantly, the same seed value will always produce the same book.

unpredictable yet reproducible

To see this in action, let’s take a look at the comparison between a seeded PRNG and a built-in Math.random() Functions in Javascript:

In this example, the left column of names is chosen using the values ​​from the Mersenne Twister generator set with the initial seed value from the input field. In the right column, the selection is done using Math.random() Which is seeded for you by the browser (and has no way for you to configure).

You will notice that each time you run the example again the left column always produces the same list of names given the same seed value. The right column generates a new list of names each time the page loads or you press a button.

The problem with the solution on the right is that after you press the button, the original pseudo-random sequence is gone and there is no way to reproduce it. If you spot an interesting specimen and you accidentally miss your opportunity to capture that state, you’re out of luck. With the seed value, all you have to do is capture that number and use it to rebuild the state.

This property makes things like game world sharing possible by just sharing a short string or number and running the same algorithm to produce the same results everywhere.

At the end of the day, if you wanted randomness each time you booted, you could choose a new seed each time using the external value Date.now(), Math.random() * Number.MAX_SAFE_INTEGER or any other medium.

a practical example

Sometimes, the sheer number of different combinations of inputs does not make it possible to thoroughly review the runtime behavior of a program. In those cases, having a generator on hand to generate lots of unexpected inputs can help solve problems you didn’t expect.

To make things easier to manage and write, let’s explore a construct in JavaScript called generators, which will let us easily control and organize generating huge amounts of data without needing to write hundreds of loops.

Javascript generators and generator functions

To create a generator, we can take advantage of the JavaScript generator function. The nice thing about generator functions is that they evaluate lazily, which means users pull values ​​from them rather than prematurely populating a collection like a function. A generator function can use yield To emit a value, which pauses its execution until the next value is requested.

Note: the difference between generator work And generator Is this calling a . Is generator function will come back parent, a parent is an object with at least next() method that returns an object with properties { done, value }, If done Is true, then the generator has finished generating its sequence and will no longer emit values, which also eliminates the iteration. This API is called iterator protocol and any object that can enumerate this API can be used for...of the noose.

Let’s start with a simple example that takes an initial number and an increment step to generate an infinite sequence of integers:

function* int(start = 0, step = 1) {
  let i = start;
  while(true) {
    yield i;
    i += step;
  }
}

const generator = int();

for (const n of generator) {
  console.log(n);
}
enter fullscreen mode

exit fullscreen mode

In this example, calling int() Returns generator. when for...of the loop calls next() method, runs the generator function body until it is hit yield The keyword that stops the generator and emits the next value. If the generator function returns, no more values ​​are emitted.

Note: There is much more to the generator’s functions than is described here. yield Returns the values ​​passed to the statements next() Method to provide two-way communication between generator and consumer. One generator can also delegate to other generators using the function yield*, they can produce Promisetake advantage of s and await Enable statements and asynchronous iteration.

take

Most of the time, we want to limit the amount of items in the generated sequence without writing for loop, so we would define a take Generator function that terminates a sequence when the limit is reached or the underlying generator ends:

const take = function* (count, generator) {
  for (let i = 0; i < count; i++) {
    const { value, done } = generator.next();
    if (done) return;
    yield value;
  }
  generator.return?.();
};

const first10Numbers = take(10, int(1));

// will print [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
console.log([...first10Numbers]); 
enter fullscreen mode

exit fullscreen mode

pay attention return?.() The call to the wrapped generator signals that we are consuming its values. This enables generators to clean up any side effects they may have created, such as closing network connections, or clearing up any open resources such as I/O handles or caches. The null coalescing operator is there for safety in cases where the generator does not provide that function.

zip

For convenience, let’s say a . also define zip generator. Zip in this context means taking multiple generators, requesting a value from each of them and emitting one value as an array of all the values. This makes it easy to group the results from multiple generators into a single value:

const zip = function* (...generators) {
  while (true) {
    const results = [];
    for (const g of generators) {
      const v = g.next();
      if (v.done) {
        generators.forEach(g => g.return?.());
        return;
      }
      results.push(v.value);
    }
    yield results;
  }
}

const sumsTo10 = zip(int(0, 1), int(10, -1));

// will print [[0, 10], [1, 9], [2, 8], [3, 7], [4, 6]]
console.log([...take(5, sumsTo10)]);
enter fullscreen mode

exit fullscreen mode

map

Next we’ll make one mapgenerator that returns a generator that translates values ​​from the underlying generator using the provided mapping function:

const map = function*(mapper, generator) {
  for (const value of generator)
    yield mapper(value);
}

/**
 * Returns an ordinal representation of an integer. Only works
 * with positive integers.
 */
const intToOrdinal = n => {
  if (n % 10 == 1 && n % 100 != 11) {
    return `${n}st`;
  } else if (n % 10 == 2 && n % 100 != 12) {
    return `${n}nd`;
  } else if (n % 10 == 3 && n % 100 != 13) {
    return `${n}rd`;
  }
  return `${n}th`
};

const ordinals = map(intToOrdinal, int(1));

// will print ['1st', '2nd', '3rd', '4th', '5th']
console.log([...take(5, ordinals)]);
enter fullscreen mode

exit fullscreen mode

Creating a Random Number-Driven Generator

So far, we’ve only worked with regular sequences, but that doesn’t mean we can’t use our PRNG to generate an (almost) never-ending stream of random numbers:

import mt from 'mersennetwister';

const rnd = function* (seed) {
  const prng = new mt(seed);
  while(true) yield prng.real();
}

// will always print [1125387415, 2407456957, 681542492]
console.log([...take(3, rnd(1337))]);
enter fullscreen mode

exit fullscreen mode

To make it more reusable, let’s create a generator that takes an array and seeds values ​​and returns a generator that will randomly pick values ​​from that array:

import mt from 'mersennetwister';

const fromArray = function* (arr, seed) {
  const prng = new mt(seed);
  while(true) yield arr[prng.int() % arr.length];
}

const sample = ['apple', 'pear', 'banana', 'kiwi'];

// will always print ["kiwi", "pear", "apple"]
console.log([...take(3, fromArray(sample, 1337))]);
enter fullscreen mode

exit fullscreen mode

Put it all together

With these utilities, we can compile a sample data set as arrays to generate fake personal information to use in our apps. We start by creating separate generators for first and last names:

import * as data from './data/people';
import { fromArray } from './utils';

export const firstNames = (seed) => fromArray(data.firstNames, seed);

export const lastNames = (seed) => fromArray(data.lastNames, seed);

export const fullNames = (seed) => map(
  n => n.join(' '),
  zip(firstNames(seed), lastNames(seed + 10))
);
enter fullscreen mode

exit fullscreen mode

So what is going on here? firstNames And lastNames return generators for the first and last names, respectively, and fullNames Composes them and returns a generator that emits values ​​with both joining with a space.

We can use these generators to quickly create a fake application that displays the generated personal information as a list of avatars:

This example is immediately usable as a review tool, a performance benchmark, and a smoke test tool. You can instantly look up a myriad of first and last name combinations and generate a ton of them to measure how fast they render. For example, there is a bug in the mapping function that removes initials from generated names which is immediately apparent:

Screenshot highlighting a bug in one of the avatar components

Hint: JavaScript strings are encoded using UTF-16, index access in strings always returns a 16-bit value (a code unit) with that limit. If you think you can crack it, fork the example and post your solution in the comments!

Being able to easily reproduce this situation by sharing just a few parameters means that the developer can now figure out where the fault is without having to guess where the problem is.

Adopting a Preferred PRNG in Your Own Project

While this generator-based approach can work well when building your own data generation pipeline or specific data sets (it has on several occasions for me on large business projects), sometimes the adoption of other generation libraries which can be easy to provide sensible and realistic data already included in the example set. Here are some libraries I’ve used in the past to take advantage of seeded randomness:

  • faker — Controversial Grandfather of Fake Data Generation
  • fiona – A lean library with a similar feature set

out of sight

While PRNG sequences are useful for discrete data generation, they can be used as a tool to combine them with other techniques such as interpolation that produce contiguous multi-dimensional regions. These can be used to create textures that can mimic natural materials or model organic-looking structures based on a few simple mathematical expressions. But it is a topic that we may address in a future article.

Leave a Comment