The Randomness Problem: How Lava Lamps Protect the Internet

SciShow is supported by Go to to get 20% off
of an annual Premium subscription. [♩INTRO] You jog onto the field for another NFL football
game or at least your team’s video game avatars
do, which, I mean, let’s face it, is about as
close as you’ll get. The virtual ref tosses his virtual coin to
see who gets dibs on the ball, and…you lose! Once again, you’re stuck letting the computer
receive first. It happens so often it seems unfair. Come to think of it, what if it is? After all, the computer’s following a fixed
program. How could it ever give you a fair, random
coin toss? The whole point of computers is that they’re
predictable! And if those coin tosses weren’t random,
how would you ever know? These questions matter for more than just
games. Getting computers to behave randomly is crucial
for simulations, scientific studies, and cybersecurity. But for all its importance, randomness is
surprisingly hard to define, and even harder to produce. Computers just aren’t that good at being
random. In fact, most computerized randomness is basically
faked; it’s totally predictable. For real, honest-to-goodness randomness, engineers turn to some pretty odd tricks. It turns out a lot of the internet is kept
secure by lava lamps, Geiger counters, and the occasional rooftop
microphone! There are lots of uses for digital randomness. Think of a pollster who wants to call a representative
sample of people, or medical researchers deciding who in their
study gets which treatment. For the statistics to work, they have to choose
the phone numbers or treatment groups randomly. Another example is predicting the behavior
of very complex systems, like weather forecasts or traffic patterns. These systems may not have simple, neat equations that tell you how they’ll
develop. Instead, simulations often model the world as a collection of small randomized components. So a traffic simulator might assume that cars
randomly show up at a highway entrance at an average rate of
10 per minute, then watch how the simulation proceeds. Randomness is also a critical part of basic
network technology. If your laptop sends a packet of data over
the wifi network at the same moment as your roommate’s, the packets collide they garble each other, and each computer
has to resend. But it wouldn’t help anything if each computer,
following the same deterministic program, tried to retransmit at exactly the
same time. You’d just get collisions over and over! Instead, they wait a random amount of time
after a collision. This type of symmetry breaking is impossible without some form of randomness. But the most common use for randomness is
both high-stakes and very necessary: encryption, which requires
secret data that an adversary can’t predict. For example, every time you connect to a secure
website like an online banking portal, your computer
and the server running the site have to agree on encryption
keys. The keys are like a more sophisticated version
of a secret decoder ring: each computer can scramble its messages so
that only someone with the same key can decode it. Obviously, no one should be able to guess
the secret key, or they’ll be able to see your bank account information or even
steal your money. So your computer and the server base the keys
on random numbers to make them nearly impossible for your online nemesis
to guess. Randomness is so valuable that when the RAND
Corporation published a 600-page book with a million random digits
in 1955, it was considered a landmark contribution
to science. Now that computers have gotten faster, the RAND book is mostly useful as a source
of hilarious Amazon reviews. But getting computers to generate numbers
that are actually random is still much harder than you might think. The first problem to solve is how to even
define randomness, because our intuitions about what’s random
aren’t always great. Take this demonstration from paleontologist
and author Stephen Jay Gould. In which image would you guess the dots were
placed totally at random? If you guessed the one on the left, you’re
not alone. Most people see the clumps in the right image
as patterns. But in fact, it’s just the opposite! It’s the even spacing on the left that’s
the result of a pattern: the program that generated the dots was modified to forbid
them from being too close together. Here’s another brain twist: tossing a coin
twenty times is just as likely to produce ten heads and then ten tails as it
is to mix them up. Neither sequence is inherently less random. To define randomness properly, the trick is
to think not just about particular string of numbers, but about an infinite sequence
of them. If a so-called random coin flipper kept producing
clumps of heads and clumps of tails forever, then we could definitively
say it’s not random. Mathematicians have used the concept of infinite
sequences to come up with a few different definitions for randomness,
but the simplest one describes randomness as unpredictability. If you were betting on future digits in the
sequence, you couldn’t find an algorithm a mathematical procedure for a computer to
follow whose predictions would make money in the
long run. The other definitions all turn out to be equivalent
to this: if a sequence is random, you can’t describe
it using some sort of algorithm. Maybe you can start to see the problem here:
if by definition, you can’t use a program to describe a random sequence, how are you supposed to write a program to
generate one? Not only that, but if you try to build a random
number generator, how can you tell if you’ve succeeded? You can’t check a whole infinite sequence
for randomness, or check all possible betting algorithms to
see if any make money. When it comes to generating random numbers,
most of the time, computers make do with pseudo-random number
generators, or PRNGs. The numbers they generate aren’t truly random
… but they’re close. The algorithms take a seed, some value to
start with, generate a number from it, update the seed,
and repeat. For instance, mathematician John von Neumann
suggested the middle square method: you take the seed,
square it, and grab the middle few digits as the next
random number. That number also serves as the next seed. Unfortunately, the middle square method tends
to start looping through the same cycle of numbers. But computer scientists have designed many
more sophisticated algorithms that do better, some with awesome names like
“Fortuna” and “the Mersenne Twister.” To check if a pseudo-random number generator is a good approximation of randomness, you can use a combination of theoretical analysis
and empirical testing. Theoretical analysis looks not at a particular
sequence of numbers, but at the process that produced them. Even if the process wasn’t truly random,
mathematicians can sometimes prove statements like “This algorithm will generate
at least 14 gazillion digits before the sequence wraps around and starts
repeating.” For empirical testing, you just generate some
random numbers and see if the statistics look right, things like whether evens and odds are equally
common. The stats will always be a bit off, so you
can never say for sure that the sequence was randomly generated. But you can at least estimate how likely the
stats you’re seeing should be. Under most statistical tests, good pseudo-random
numbers are indistinguishable from true randomness… if you don’t know the initial seed. That’s good enough for most purposes. I mean, who cares if your virtual football
game isn’t 100% unpredictable? It’s not a big deal for things like medical
research or predicting traffic patterns, either, as long as the numbers produce the
same statistics as randomness. In fact, sometimes pseudo-randomness has real
advantages. A randomized program is easier to troubleshoot
if you can make it do all the same things again by providing the same seed. And in some uses of PRNGs, determinism is
a core design feature. For example, a remote car key fob works by
running a PRNG with a seed known only to it and your car. When you click the unlock button, the fob
radios its next random number to the car, which checks that number against the
next outputs of the PRNG. To a thief, the numbers look unpredictable. But the system only works because the key
and the car produce the same series of numbers from their shared seed. Sometimes, though, PRNGs aren’t enough. Even if the algorithm is flawless, an attacker
who somehow figures out the seed can predict all the numbers. Ironically, the website Hacker News once became a poster child for this problem. Back in 2009, a security researcher realized
that the site’s random number generator, which assigned IDs to logged-in
users, was seeded with the time when the server started up. By triggering a server restart, the attacker
was able to narrow down the possible seeds to a small set of numbers. That let him predict other users’ IDs and
impersonate them. A similar problem a decade earlier allowed another team of researchers to cheat at online
poker. A leaked seed could be especially catastrophic
for a PRNG that was used for encryption: if an attacker uncovered the seed,
they could guess the encryption keys, then go back and decrypt all the previous
messages. So when the stakes are high, people get creative looking for sources of
true randomness. The most common place to look is external
noise factors that someone could in principle influence, but which the software can’t predict. For example, the website has a
rooftop microphone that captures atmospheric noise, unpredictable radio waves coming mostly from
lightning and space. Most computers use more readily available
sources of noise, like the timing between mouse clicks or incoming
network messages. Some software will even ask you to scoot your
mouse around while it’s generating big encryption keys. For the highest-stakes uses, though, you want
intrinsic randomness, something that’s guaranteed unpredictable by the laws of physics no matter what the
attacker knows or does. The Internet giant Cloudflare, which provides
encryption for about 10% of all Internet traffic, points
a video camera at a wall of lava lamps in the lobby of its
San Francisco office. Meanwhile, in its London office, a camera watches three pendulums hanging from
pendulums. Both are examples of chaotic systems: the tiniest inaccuracy in your knowledge of
their configuration will send your future predictions wildly off-base. So those camera streams are effectively random. Intrinsic randomness can also come from smaller-scale
physics. Some hardware random number generators record
tiny fluctuations in the movements of electrons in the circuitry. And Geiger counters trained on decaying hunks
of uranium will click at a rate determined by quantum mechanics, which is
inherently probabilistic. So, by the strict definition, it’s more
likely than not that your football game isn’t truly random. It’s probably flipping a coin based on a
simple PRNG seeded with the time you started playing. Even so, losing a bunch of coin flips is probably
just a brief unlucky streak; you’d be hard-pressed to show any bias with
larger statistical tests. For high-stakes contexts like security, though,
rock-solid randomness is essential, and there’s a whole cottage
industry for supplying it. Lava lamps may not be the height of cool anymore,
but as it turns out, they are still useful for security. Or maybe programmers just really miss the
70s. If you miss 1970s bedroom decor, or even if
you don’t, you can still strengthen your programming
skills at with the Computer Science Fundamentals course. Even if you know nothing about computer science, the lessons and quizzes build on each other. So it’s impossible to finish a lesson without
improving. And if you are a confident computer programmer, the quizzes might still bend your brain a
little bit, and you can always work on problems posed by other
premium members. Brilliant is offering the first 200 SciShow
viewers to go to a 20% discount off of an annual premium subscription. So check it out! You’ll definitely learn and have fun, and you’ll be helping to support SciShow
while you do! So, thanks! [♩OUTRO]


Add a Comment

Your email address will not be published. Required fields are marked *