Bernoulli process
This article includes a list of general references, but it lacks sufficient corresponding inline citations. (September 2011) |
Part of a series on statistics |
Probability theory |
---|
In
The problem of determining the process, given only a limited sample of Bernoulli trials, may be called the problem of checking whether a coin is fair.
Definition
A Bernoulli process is a finite or infinite sequence of
- for each i, the value of Xi is either 0 or 1;
- for all values of , the probability p that Xi = 1 is the same.
In other words, a Bernoulli process is a sequence of
Independence of the trials implies that the process is memoryless. Given that the probability p is known, past outcomes provide no information about future outcomes. (If p is unknown, however, the past informs about the future indirectly, through inferences about p.)
If the process is infinite, then from any point the future trials constitute a Bernoulli process identical to the whole process, the fresh-start property.
Interpretation
The two possible values of each Xi are often called "success" and "failure". Thus, when expressed as a number 0 or 1, the outcome may be called the number of successes on the ith "trial".
Two other common interpretations of the values are true or false and yes or no. Under any interpretation of the two values, the individual variables Xi may be called Bernoulli trials with parameter p.
In many applications time passes between trials, as the index i increases. In effect, the trials X1, X2, ... Xi, ... happen at "points in time" 1, 2, ..., i, .... That passage of time and the associated notions of "past" and "future" are not necessary, however. Most generally, any Xi and Xj in the process are simply two from a set of random variables indexed by {1, 2, ..., n}, the finite cases, or by {1, 2, 3, ...}, the infinite cases.
One experiment with only two possible outcomes, often referred to as "success" and "failure", usually encoded as 1 and 0, can be modeled as a Bernoulli distribution.[1] Several random variables and probability distributions beside the Bernoullis may be derived from the Bernoulli process:
- The number of successes in the first n trials, which has a binomial distribution B(n, p)
- The number of failures needed to get r successes, which has a negative binomial distribution NB(r, p)
- The number of failures needed to get one success, which has a geometric distribution NB(1, p), a special case of the negative binomial distribution
The negative binomial variables may be interpreted as random waiting times.
Formal definition
The Bernoulli process can be formalized in the language of probability spaces as a random sequence of independent realisations of a random variable that can take values of heads or tails. The state space for an individual value is denoted by
Borel algebra
Consider the
Bernoulli measure
If the chances of flipping heads or tails are given by the probabilities , then one can define a natural measure on the product space, given by (or by for the two-sided process). In another word, if a
- and .
We denote this distribution by Ber(p).[1]
Given a cylinder set, that is, a specific sequence of coin flip results at times , the probability of observing this particular sequence is given by
where k is the number of times that H appears in the sequence, and n−k is the number of times that T appears in the sequence. There are several different kinds of notations for the above; a common one is to write
where each is a binary-valued random variable with in Iverson bracket notation, meaning either if or if . This probability is commonly called the
Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero; this is because , for any . A probability equal to 1 implies that any given infinite sequence has
To conclude the formal definition, a Bernoulli process is then given by the probability triple , as defined above.
Law of large numbers, binomial distribution and central limit theorem
Let us assume the canonical process with represented by and represented by . The law of large numbers states that the average of the sequence, i.e., , will approach the
for any given random variable out of the infinite sequence of Bernoulli trials that compose the Bernoulli process.
One is often interested in knowing how often one will observe H in a sequence of n coin flips. This is given by simply counting: Given n successive coin flips, that is, given the set of all possible strings of length n, the number N(k,n) of such strings that contain k occurrences of H is given by the binomial coefficient
If the probability of flipping heads is given by p, then the total probability of seeing a string of length n with k heads is
where . The probability measure thus defined is known as the Binomial distribution.
As we can see from the above formula that, if n=1, the Binomial distribution will turn into a Bernoulli distribution. So we can know that the Bernoulli distribution is exactly a special case of Binomial distribution when n equals to 1.
Of particular interest is the question of the value of for a sufficiently long sequences of coin flips, that is, for the limit . In this case, one may make use of Stirling's approximation to the factorial, and write
Inserting this into the expression for P(k,n), one obtains the Normal distribution; this is the content of the central limit theorem, and this is the simplest example thereof.
The combination of the law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the
The size of this set is interesting, also, and can be explicitly determined: the logarithm of it is exactly the
This value is the
Dynamical systems
The Bernoulli process can also be understood to be a
Bernoulli shift
One way to create a dynamical system out of the Bernoulli process is as a shift space. There is a natural translation symmetry on the product space given by the shift operator
The Bernoulli measure, defined above, is translation-invariant; that is, given any cylinder set , one has
and thus the
Instead of the probability measure , consider instead some arbitrary function . The pushforward
defined by is again some function Thus, the map induces another map on the space of all functions That is, given some , one defines
The map is a
If one restricts to act on polynomials, then the eigenfunctions are (curiously) the
This coincidence of naming was presumably not known to Bernoulli.The 2x mod 1 map
The above can be made more precise. Given an infinite string of binary digits write
The resulting is a real number in the unit interval The shift induces a homomorphism, also called , on the unit interval. Since one can see that This map is called the dyadic transformation; for the doubly-infinite sequence of bits the induced homomorphism is the Baker's map.
Consider now the space of functions in . Given some one can find that
Restricting the action of the operator to functions that are on polynomials, one finds that it has a
where the are the Bernoulli polynomials. Indeed, the Bernoulli polynomials obey the identity
The Cantor set
Note that the sum
gives the Cantor function, as conventionally defined. This is one reason why the set is sometimes called the Cantor set.
Odometer
Another way to create a dynamical system is to define an
In this case, the transformation is given by
It leaves the Bernoulli measure invariant only for the special case of (the "fair coin"); otherwise not. Thus, is a
Bernoulli sequence
The term Bernoulli sequence is often used informally to refer to a realization of a Bernoulli process. However, the term has an entirely different formal definition as given below.
Suppose a Bernoulli process formally defined as a single random variable (see preceding section). For every infinite sequence x of coin flips, there is a sequence of integers
called the Bernoulli sequence[verification needed] associated with the Bernoulli process. For example, if x represents a sequence of coin flips, then the associated Bernoulli sequence is the list of natural numbers or time-points for which the coin toss outcome is heads.
So defined, a Bernoulli sequence is also a random subset of the index set, the natural numbers .
Almost all Bernoulli sequences are ergodic sequences.[verification needed]
Randomness extraction
From any Bernoulli process one may derive a Bernoulli process with p = 1/2 by the
Basic von Neumann extractor
Represent the observed process as a sequence of zeroes and ones, or bits, and group that input stream in non-overlapping pairs of successive bits, such as (11)(00)(10)... . Then for each pair,
- if the bits are equal, discard;
- if the bits are not equal, output the first bit.
This table summarizes the computation.
input | output |
---|---|
00 | discard |
01 | 0 |
10 | 1 |
11 | discard |
For example, an input stream of eight bits 10011011 would by grouped into pairs as (10)(01)(10)(11). Then, according to the table above, these pairs are translated into the output of the procedure: (1)(0)(1)() (=101).
In the output stream 0 and 1 are equally likely, as 10 and 01 are equally likely in the original, both having probability p(1−p) = (1−p)p. This extraction of uniform randomness does not require the input trials to be independent, only
The von Neumann extractor uses two input bits to produce either zero or one output bits, so the output is shorter than the input by a factor of at least 2. On average the computation discards proportion p2 + (1 − p)2 of the input pairs(00 and 11), which is near one when p is near zero or one, and is minimized at 1/4 when p = 1/2 for the original process (in which case the output stream is 1/4 the length of the input stream on average).
Von Neumann (classical) main operation pseudocode:
if (Bit1 ≠ Bit2) {
output(Bit1)
}
Iterated von Neumann extractor
This section may contain verify the text.(January 2014) ) |
This decrease in efficiency, or waste of randomness present in the input stream, can be mitigated by iterating the algorithm over the input data. This way the output can be made to be "arbitrarily close to the entropy bound".[5]
The iterated version of the von Neumann algorithm, also known as advanced multi-level strategy (AMLS),[6] was introduced by Yuval Peres in 1992.[5] It works recursively, recycling "wasted randomness" from two sources: the sequence of discard-non-discard, and the values of discarded pairs (0 for 00, and 1 for 11). It relies on the fact that, given the sequence already generated, both of those sources are still exchangeable sequences of bits, and thus eligible for another round of extraction. While such generation of additional sequences can be iterated infinitely to extract all available entropy, an infinite amount of computational resources is required, therefore the number of iterations is typically fixed to a low value – this value either fixed in advance, or calculated at runtime.
More concretely, on an input sequence, the algorithm consumes the input bits in pairs, generating output together with two new sequences:
input | output | new sequence 1 | new sequence 2 |
---|---|---|---|
00 | none | 0 | 0 |
01 | 0 | 1 | none |
10 | 1 | 1 | none |
11 | none | 0 | 1 |
(If the length of the input is odd, the last bit is completely discarded.) Then the algorithm is applied recursively to each of the two new sequences, until the input is empty.
Example: The input stream from above, 10011011, is processed this way:
step number | input | output | new sequence 1 | new sequence 2 |
---|---|---|---|---|
0 | (10)(01)(10)(11) | (1)(0)(1)() | (1)(1)(1)(0) | ()()()(1) |
1 | (11)(10) | ()(1) | (0)(1) | (1)() |
1.1 | (01) | (0) | (1) | () |
1.1.1 | 1 | none | none | none |
1.2 | 1 | none | none | none |
2 | 1 | none | none | none |
From the step of 1 on, the input becomes the new sequence1 of the last step to move on in this process. The output is therefore (101)(1)(0)()()() (=10110), so that from the eight bits of input five bits of output were generated, as opposed to three bits through the basic algorithm above. The constant output of exactly 2 bits per round (compared with a variable 0 to 1 bits in classical VN) also allows for constant-time implementations which are resistant to timing attacks.
Von Neumann–Peres (iterated) main operation pseudocode:
if (Bit1 ≠ Bit2) {
output(1, Sequence1)
output(Bit1)
} else {
output(0, Sequence1)
output(Bit1, Sequence2)
}
Another tweak was presented in 2016, based on the observation that the Sequence2 channel doesn't provide much throughput, and a hardware implementation with a finite number of levels can benefit from discarding it earlier in exchange for processing more levels of Sequence1.[7]
References
- ^ ISBN 9781852338961.
- ISBN 978-1-84800-047-6.
- ^ Pierre Gaspard, "r-adic one-dimensional maps and the Euler summation formula", Journal of Physics A, 25 (letter) L483-L485 (1992).
- ISBN 0-7923-5564-4
- ^ .
- ^ "Tossing a Biased Coin" (PDF). eecs.harvard.edu. Archived (PDF) from the original on 2010-03-31. Retrieved 2018-07-28.
- (PDF) from the original on 2019-02-12.
Further reading
- Carl W. Helstrom, Probability and Stochastic Processes for Engineers, (1984) Macmillan Publishing Company, New York ISBN 0-02-353560-1.