Bernoulli process

In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variables X_i are independent and identically distributed. Prosaically, a Bernoulli process is repeated coin flipping, possibly with an unfair coin (but with consistent unfairness). Every variable X_i in the sequence is associated with a Bernoulli trial or experiment, and all of them have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes (such as the process for a six-sided die); this generalization is known as the Bernoulli scheme.

The problem of determining the process, given only a limited sample of Bernoulli trials, may be called the problem of checking whether a coin is fair.

Definition

A Bernoulli process is a finite or infinite sequence of independent random variables X₁, X₂, X₃, ..., such that

for each i, the value of X_i is either 0 or 1;
for all values of i, the probability p that X_i = 1 is the same.

In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials.

Independence of the trials implies that the process is memoryless: past event frequencies have no influence on future event probabilities. In most instances the true value of p is unknown, so past frequencies are used to estimate p by probabilistic inference, and through it to forecast future events and their probabilities.

If the process is infinite, then from any point the future trials constitute a Bernoulli process identical to the whole process, the fresh-start property.
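The definition above can be illustrated with a minimal sketch that draws a finite sample of the process. The function name `bernoulli_process` and the `seed` parameter are illustrative assumptions, not part of the definition.

```python
import random

def bernoulli_process(p, n, seed=None):
    """Return n independent trials, each equal to 1 with probability p."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    return [1 if rng.random() < p else 0 for _ in range(n)]

trials = bernoulli_process(p=0.3, n=20, seed=1)
# Fresh-start property: the trials after any index are again a sample of
# the same process, so the tail can be used exactly like the full sequence.
tail = trials[5:]
```

Every draw uses the same p and does not look at earlier draws, which is what independence and identical distribution require.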

Interpretation

The two possible values of each X_i are often called "success" and "failure". Thus, when expressed as a number 0 or 1, the outcome may be called the number of successes on the ith "trial".

Two other common interpretations of the values are true or false and yes or no. Under any interpretation of the two values, the individual variables X_i may be called Bernoulli trials with parameter p.

In many applications time passes between trials, as the index i increases. In effect, the trials X₁, X₂, ... X_i, ... happen at "points in time" 1, 2, ..., i, .... That passage of time and the associated notions of "past" and "future" are not necessary, however. Most generally, any X_i and X_j in the process are simply two from a set of random variables indexed by {1, 2, ..., n}, the finite cases, or by {1, 2, 3, ...}, the infinite cases.

One experiment with only two possible outcomes, often referred to as "success" and "failure", usually encoded as 1 and 0, can be modeled as a Bernoulli distribution.^[1] Several random variables and probability distributions besides the Bernoulli distribution may be derived from the Bernoulli process:

The number of successes in the first n trials, which has a binomial distribution B(n, p)
The number of failures needed to get r successes, which has a negative binomial distribution NB(r, p)
The number of failures needed to get one success, which has a geometric distribution NB(1, p), a special case of the negative binomial distribution

The negative binomial variables may be interpreted as random waiting times.
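As a small sketch of how these derived variables are read off a realized sequence, the helpers below (hypothetical names) count successes in the first n trials and failures before the r-th success. They operate on a fixed sample, so they illustrate the definitions rather than the distributions themselves.

```python
def successes_in_first(trials, n):
    """Number of successes among the first n trials (binomial variable)."""
    return sum(trials[:n])

def failures_before_rth_success(trials, r):
    """Failures seen before the r-th success (negative binomial variable).

    Returns None if the finite sample contains fewer than r successes.
    """
    successes = failures = 0
    for x in trials:
        if x == 1:
            successes += 1
            if successes == r:
                return failures
        else:
            failures += 1
    return None

sample = [0, 0, 1, 0, 1, 1, 0, 1]
print(successes_in_first(sample, 5))           # 2
print(failures_before_rth_success(sample, 1))  # 2 (geometric case, r = 1)
print(failures_before_rth_success(sample, 3))  # 3
```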

Formal definition

The Bernoulli process can be formalized in the language of probability spaces as a random sequence of independent realisations of a random variable that can take values of heads or tails. The state space for an individual value is denoted by $2=\{H,T\}.$

Borel algebra

Consider the countably infinite direct product of copies of $2=\{H,T\}$. It is common to examine either the one-sided set $\Omega =2^{\mathbb {N} }=\{H,T\}^{\mathbb {N} }$ or the two-sided set $\Omega =2^{\mathbb {Z} }$. There is a natural Borel algebra ${\mathcal {B}}$ on this space, generated by the cylinder sets: each cylinder set fixes a finite-length sequence of coin flips and leaves the rest of the sequence unconstrained. The resulting measurable space is commonly written as $(\Omega ,{\mathcal {B}})$.

Bernoulli measure

If the chances of flipping heads or tails are given by the probabilities $\{p,1-p\}$, then one can define a natural measure on the product space, given by $P=\{p,1-p\}^{\mathbb {N} }$ (or by $P=\{p,1-p\}^{\mathbb {Z} }$ for the two-sided process). In other words, a discrete random variable X has a Bernoulli distribution with parameter p, where 0 ≤ p ≤ 1, if its probability mass function is given by

p_{X}(1)=P(X=1)=p

and

p_{X}(0)=P(X=0)=1-p.

This distribution is denoted by Ber(p).^[1]

Given a cylinder set, that is, a specific sequence of coin flip results $[\omega _{1},\omega _{2},\cdots \omega _{n}]$ at times $1,2,\cdots ,n$ , the probability of observing this particular sequence is given by

P([\omega _{1},\omega _{2},\cdots ,\omega _{n}])=p^{k}(1-p)^{n-k}

where k is the number of times that H appears in the sequence, and n−k is the number of times that T appears in the sequence. There are several different kinds of notations for the above; a common one is to write

P(X_{1}=x_{1},X_{2}=x_{2},\cdots ,X_{n}=x_{n})=p^{k}(1-p)^{n-k}

where each $X_{i}$ is a binary-valued random variable with $x_{i}=[\omega _{i}=H]$ in Iverson bracket notation, meaning either $1$ if $\omega _{i}=H$ or $0$ if $\omega _{i}=T$. This probability $P$ is commonly called the Bernoulli measure.^[2]
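The cylinder-set formula can be checked with exact rational arithmetic. This is a minimal sketch; the function name `cylinder_probability` and the encoding of flips as a string of `H` and `T` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def cylinder_probability(flips, p):
    """P([w_1, ..., w_n]) = p^k (1-p)^(n-k), with k the number of 'H'."""
    k = flips.count("H")
    return p**k * (1 - p)**(len(flips) - k)

p = Fraction(1, 3)
print(cylinder_probability("HTH", p))  # 2/27

# The cylinder sets of a fixed length partition the space, so their
# probabilities add up to 1.
total = sum(cylinder_probability("".join(w), p) for w in product("HT", repeat=4))
```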

Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero: for $0<p<1$, the probability of its first $n$ flips is at most $\max(p,1-p)^{n}$, which tends to $0$ as $n\to \infty$. In measure-theoretic terms, any given infinite sequence has measure zero. Nevertheless, one can still say that some classes of infinite sequences of coin flips are far more likely than others; this is given by the asymptotic equipartition property.

To conclude the formal definition, a Bernoulli process is then given by the probability triple $(\Omega ,{\mathcal {B}},P)$ , as defined above.

Law of large numbers, binomial distribution and central limit theorem

Let us assume the canonical process with $H$ represented by $1$ and $T$ represented by $0$. The law of large numbers states that the average of the sequence, i.e., ${\bar {X}}_{n}:={\frac {1}{n}}\sum _{i=1}^{n}X_{i}$, will approach the expected value almost surely. The expectation value of flipping heads, assumed to be represented by 1, is given by $p$. In fact, one has

\mathbb {E} [X_{i}]=\mathbb {P} ([X_{i}=1])=p,

for any given random variable $X_{i}$ out of the infinite sequence of Bernoulli trials that compose the Bernoulli process.
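A simulation can illustrate the law of large numbers, although it proves nothing: the running average of a long sample should settle near p. The sketch below assumes Python's standard pseudo-random generator as the source of trials; the function name and seed are arbitrary.

```python
import random

def running_means(p, n, seed=0):
    """Sample n Bernoulli(p) trials and return the running averages."""
    rng = random.Random(seed)
    total, means = 0, []
    for i in range(1, n + 1):
        total += 1 if rng.random() < p else 0
        means.append(total / i)  # average of the first i trials
    return means

means = running_means(p=0.3, n=100_000)
print(means[9], means[999], means[-1])  # averages after 10, 1000, 100000 trials
```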

One is often interested in knowing how often one will observe H in a sequence of n coin flips. This is given by simply counting: Given n successive coin flips, that is, given the set of all possible strings of length n, the number N(k,n) of such strings that contain k occurrences of H is given by the binomial coefficient

N(k,n)={n \choose k}={\frac {n!}{k!(n-k)!}}

If the probability of flipping heads is given by p, then the total probability of seeing a string of length n with k heads is

\mathbb {P} ([S_{n}=k])={n \choose k}p^{k}(1-p)^{n-k},

where $S_{n}=\sum _{i=1}^{n}X_{i}$. The probability measure thus defined is known as the binomial distribution.

As the above formula shows, for n = 1 the binomial distribution reduces to a Bernoulli distribution. The Bernoulli distribution is therefore exactly the special case of the binomial distribution with n = 1.
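The counting argument and the resulting probability can be checked directly for small n. This is a sketch under the same heads and tails encoding used above; `binomial_pmf` is an illustrative name.

```python
from itertools import product
from math import comb

def binomial_pmf(k, n, p):
    """P(S_n = k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Brute-force count of the length-n strings with exactly k heads.
n, k = 6, 2
count = sum(1 for w in product("HT", repeat=n) if w.count("H") == k)
print(count, comb(n, k))  # both counts agree

# For n = 1 the binomial probabilities are the Bernoulli ones: 1-p and p.
print(binomial_pmf(0, 1, 0.3), binomial_pmf(1, 1, 0.3))
```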

Of particular interest is the question of the value of $S_{n}$ for sufficiently long sequences of coin flips, that is, for the limit $n\to \infty$. In this case, one may make use of Stirling's approximation to the factorial, and write

n!={\sqrt {2\pi n}}\;n^{n}e^{-n}\left(1+{\mathcal {O}}\left({\frac {1}{n}}\right)\right)

Inserting this into the expression for $\mathbb {P} ([S_{n}=k])$, one obtains the normal distribution; this is the content of the central limit theorem, and this is the simplest example thereof.
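The quality of the normal approximation can be inspected numerically. The sketch below compares the binomial probability with a normal density of the same mean np and variance np(1−p). The choice n = 1000, p = 0.3 is arbitrary, and the log-gamma function is used only to keep the large factorials manageable.

```python
from math import exp, lgamma, log, pi, sqrt

def binomial_pmf(k, n, p):
    """Binomial probability computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))

def normal_density(x, mean, var):
    """Density of the normal distribution with the given mean and variance."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

n, p = 1000, 0.3
mean, var = n * p, n * p * (1 - p)
for k in (270, 300, 330):
    print(k, binomial_pmf(k, n, p), normal_density(k, mean, var))
```

The agreement is closest near the peak at k = np and degrades slowly in the tails, where the skewness of the binomial distribution becomes visible.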

The combination of the law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the Kolmogorov 0-1 law.

The size of the set of likely strings is also interesting, and can be explicitly determined: its logarithm, per coin flip, is exactly the entropy of the Bernoulli process. Once again, consider the set of all strings of length n. The size of this set is $2^{n}$. Of these, only a certain subset are likely; the size of this subset is $2^{nH}$ for $H\leq 1$. By using Stirling's approximation, putting it into the expression for $\mathbb {P} ([S_{n}=k])$, solving for the location and width of the peak, and finally taking $n\to \infty$, one finds that

H=-p\log _{2}p-(1-p)\log _{2}(1-p)

This value is the Bernoulli entropy of a Bernoulli process. Here, H stands for entropy; it is not to be confused with the same symbol H standing for heads.
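The entropy formula can be compared with a direct count. As a rough sketch, the code below evaluates H and the per-flip logarithm of the number of length-n strings with about np heads; for finite n the two differ by a correction of order (log n)/n. The values n = 1000 and p = 0.3 are arbitrary.

```python
from math import comb, log2

def bernoulli_entropy(p):
    """H = -p log2 p - (1-p) log2 (1-p), in bits per trial (0 log 0 := 0)."""
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

n, p = 1000, 0.3
k = round(n * p)
# log2 of the number of length-n strings with k heads, per coin flip
rate = log2(comb(n, k)) / n
print(bernoulli_entropy(p), rate)
```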

A long-standing question about the Bernoulli process concerned when two such processes are isomorphic, in the sense of the isomorphism of dynamical systems. The question long defied analysis, but was finally and completely answered with the Ornstein isomorphism theorem. This breakthrough resulted in the understanding that the Bernoulli process is unique and universal; in a certain sense, it is the single most random process possible; nothing is 'more' random than the Bernoulli process (although one must be careful with this informal statement: mixing systems need not consist of independent random variables, and indeed many purely deterministic, non-random systems can be mixing).

Dynamical systems

The Bernoulli process can also be understood to be a dynamical system, as an example of an ergodic system and, specifically, a measure-preserving dynamical system, in one of several different ways. One way is as a shift space, and the other is as an odometer. These are reviewed below.

Bernoulli shift

One way to create a dynamical system out of the Bernoulli process is as a shift space. There is a natural translation symmetry on the product space $\Omega =2^{\mathbb {N} }$ given by the shift operator

T(X_{0},X_{1},X_{2},\cdots )=(X_{1},X_{2},\cdots )

The Bernoulli measure, defined above, is translation-invariant; that is, given any cylinder set $\sigma \in {\mathcal {B}}$ , one has

P(T^{-1}(\sigma ))=P(\sigma )

and thus the Bernoulli measure is a Haar measure; it is an invariant measure on the product space.
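Translation invariance can be verified on cylinder sets with exact arithmetic. In this sketch a cylinder is represented by the string of flips it fixes at positions 0, 1, ..., n−1; the function names are illustrative.

```python
from fractions import Fraction

def cylinder_probability(flips, p):
    """p^k (1-p)^(n-k) for a cylinder fixing the flips in `flips`."""
    k = flips.count("H")
    return p**k * (1 - p)**(len(flips) - k)

def preimage_probability(flips, p):
    """P(T^{-1}(sigma)) for the cylinder sigma fixing the first len(flips) flips.

    T^{-1}(sigma) contains every sequence whose flips at positions 1..n
    match `flips`; the flip at position 0 is free, so the preimage is the
    disjoint union of the cylinders 'H' + flips and 'T' + flips.
    """
    return cylinder_probability("H" + flips, p) + cylinder_probability("T" + flips, p)

p = Fraction(2, 5)
sigma = "HHT"
print(preimage_probability(sigma, p), cylinder_probability(sigma, p))
```

The two printed values coincide because the free first flip contributes a factor p + (1 − p) = 1.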

Instead of the probability measure $P:{\mathcal {B}}\to \mathbb {R}$, consider some arbitrary function $f:{\mathcal {B}}\to \mathbb {R}$. The pushforward $f\circ T^{-1}$ defined by $\left(f\circ T^{-1}\right)(\sigma )=f(T^{-1}(\sigma ))$ is again some function ${\mathcal {B}}\to \mathbb {R} .$ Thus, the map $T$ induces another map ${\mathcal {L}}_{T}$ on the space of all functions ${\mathcal {B}}\to \mathbb {R} .$ That is, given some $f:{\mathcal {B}}\to \mathbb {R}$, one defines

{\mathcal {L}}_{T}f=f\circ T^{-1}

The map ${\mathcal {L}}_{T}$ is a linear operator, as (obviously) one has

{\mathcal {L}}_{T}(f+g)={\mathcal {L}}_{T}(f)+{\mathcal {L}}_{T}(g)

and

{\mathcal {L}}_{T}(af)=a{\mathcal {L}}_{T}(f)

for functions $f,g$ and constant $a$. This linear operator is called the transfer operator or the Ruelle–Frobenius–Perron operator. Its largest eigenvalue is the Frobenius–Perron eigenvalue, and in this case, it is 1. The associated eigenvector is the invariant measure: in this case, it is the Bernoulli measure. That is,

{\mathcal {L}}_{T}(P)=P.

If one restricts ${\mathcal {L}}_{T}$ to act on polynomials, then the eigenfunctions are (curiously) the Bernoulli polynomials!^[3]^[4] This coincidence of naming was presumably not known to Bernoulli.

The 2x mod 1 map

The map T : [0,1) → [0,1), $x\mapsto 2x{\bmod {1}}$ preserves the Lebesgue measure.

The above can be made more precise. Given an infinite string of binary digits $b_{0},b_{1},\cdots$ write

y=\sum _{n=0}^{\infty }{\frac {b_{n}}{2^{n+1}}}.

The resulting $y$ is a real number in the unit interval $0\leq y\leq 1.$ The shift $T$ induces a homomorphism, also called $T$ , on the unit interval. Since $T(b_{0},b_{1},b_{2},\cdots )=(b_{1},b_{2},\cdots ),$ one can see that $T(y)=2y{\bmod {1}}.$ This map is called the dyadic transformation; for the doubly-infinite sequence of bits $\Omega =2^{\mathbb {Z} },$ the induced homomorphism is the Baker's map.
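The correspondence between the shift and the dyadic transformation can be checked on finite bit strings, which represent dyadic rationals exactly. This is a minimal sketch using rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def to_unit_interval(bits):
    """y = sum_n b_n / 2^(n+1) for a finite prefix b_0, b_1, ... of the string."""
    return sum(Fraction(b, 2 ** (n + 1)) for n, b in enumerate(bits))

def dyadic(y):
    """The map y -> 2y mod 1."""
    return (2 * y) % 1

bits = [1, 0, 1, 1]
y = to_unit_interval(bits)            # 11/16
shifted = to_unit_interval(bits[1:])  # 3/8
print(dyadic(y) == shifted)           # True
```

Dropping the leading bit of the string and doubling y modulo 1 give the same number, which is the statement that the shift induces the map $2y{\bmod {1}}$.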

Consider now the space of functions in $y$ . Given some $f(y)$ one can find that

\left[{\mathcal {L}}_{T}f\right](y)={\frac {1}{2}}f\left({\frac {y}{2}}\right)+{\frac {1}{2}}f\left({\frac {y+1}{2}}\right)

Restricting the action of the operator ${\mathcal {L}}_{T}$ to polynomials, one finds that it has a discrete spectrum given by

{\mathcal {L}}_{T}B_{n}=2^{-n}B_{n}

where the $B_{n}$ are the Bernoulli polynomials. Indeed, the Bernoulli polynomials obey the identity

{\frac {1}{2}}B_{n}\left({\frac {y}{2}}\right)+{\frac {1}{2}}B_{n}\left({\frac {y+1}{2}}\right)=2^{-n}B_{n}(y)
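The eigenvalue identity can be verified exactly for the first few Bernoulli polynomials. The sketch below builds them from the Bernoulli numbers (with the convention B_1 = −1/2) and applies the operator formula given above; the helper names are assumptions.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """Bernoulli numbers B_0..B_m (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for j in range(1, m + 1):
        B.append(-sum(comb(j + 1, k) * B[k] for k in range(j)) / (j + 1))
    return B

def bernoulli_poly(n, x):
    """B_n(x) = sum_k C(n, k) B_k x^(n-k)."""
    B = bernoulli_numbers(n)
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))

def transfer(f, y):
    """[L_T f](y) = f(y/2)/2 + f((y+1)/2)/2."""
    return (f(y / 2) + f((y + 1) / 2)) / 2

y = Fraction(1, 3)
for n in range(6):
    lhs = transfer(lambda t: bernoulli_poly(n, t), y)
    assert lhs == Fraction(1, 2 ** n) * bernoulli_poly(n, y)
```

The check passes at a single rational point here; since both sides are polynomials of degree n, agreement at more than n points would establish the identity for that n.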

The Cantor set

Note that the sum

y=\sum _{n=0}^{\infty }{\frac {b_{n}}{3^{n+1}}}

gives the Cantor function, as conventionally defined. This is one reason why the set $\{H,T\}^{\mathbb {N} }$ is sometimes called the Cantor set.

Odometer

Another way to create a dynamical system is to define an odometer: informally, add one to the first position and propagate carry bits as the odometer rolls over. This is nothing more than base-two addition on the set of infinite strings. Since addition forms a group, and the Bernoulli process was already given a topology, above, this provides a simple example of a topological group.

In this case, the transformation $T$ is given by

T\left(1,\dots ,1,0,X_{k+1},X_{k+2},\dots \right)=\left(0,\dots ,0,1,X_{k+1},X_{k+2},\dots \right).

It leaves the Bernoulli measure invariant only for the special case of $p=1/2$ (the "fair coin"); otherwise not. Thus, $T$ is a measure-preserving dynamical system in this case; otherwise, it is merely a conservative system.
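The action of the odometer on a string can be sketched on a finite prefix, with the first position treated as the least significant bit. Representing the infinite string by a finite list is an assumption of this example; a carry that runs past the end of the list would continue into the untracked part of the string.

```python
def odometer(bits):
    """Add one with carry to a finite prefix, least significant bit first.

    Leading 1s roll over to 0 and the first 0 becomes 1; if the prefix is
    all 1s, the carry moves past the end of the prefix.
    """
    out = list(bits)
    for i, b in enumerate(out):
        if b == 0:
            out[i] = 1
            return out
        out[i] = 0
    return out  # all ones: the carry leaves the finite prefix

print(odometer([1, 1, 0, 1, 0]))  # [0, 0, 1, 1, 0]
```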

Bernoulli sequence

The term Bernoulli sequence is often used informally to refer to a realization of a Bernoulli process. However, the term has an entirely different formal definition as given below.

Suppose a Bernoulli process is formally defined as a single random variable (see preceding section). For every infinite sequence x of coin flips, there is a sequence of integers

\mathbb {Z} ^{x}=\{n\in \mathbb {Z} :X_{n}(x)=1\}

called the Bernoulli sequence associated with the Bernoulli process. For example, if x represents a sequence of coin flips, then the associated Bernoulli sequence is the list of natural numbers or time-points for which the coin toss outcome is heads.

So defined, a Bernoulli sequence $\mathbb {Z} ^{x}$ is also a random subset of the index set, the natural numbers $\mathbb {N}$.

Almost all Bernoulli sequences $\mathbb {Z} ^{x}$ are ergodic sequences.

Randomness extraction

From any Bernoulli process one may derive a Bernoulli process with p = 1/2 by the von Neumann extractor, the earliest randomness extractor, which actually extracts uniform randomness.

Basic von Neumann extractor

Represent the observed process as a sequence of zeroes and ones, or bits, and group that input stream in non-overlapping pairs of successive bits, such as (11)(00)(10)... . Then for each pair,

if the bits are equal, discard;
if the bits are not equal, output the first bit.

This table summarizes the computation.

input	output
00	discard
01	0
10	1
11	discard

For example, an input stream of eight bits 10011011 would be grouped into pairs as (10)(01)(10)(11). Then, according to the table above, these pairs are translated into the output of the procedure: (1)(0)(1)() (=101).

In the output stream 0 and 1 are equally likely, as 10 and 01 are equally likely in the original, both having probability p(1−p) = (1−p)p. This extraction of uniform randomness does not require the input trials to be independent, only uncorrelated. More generally, it works for any exchangeable sequence of bits: all sequences that are finite rearrangements are equally likely.

The von Neumann extractor uses two input bits to produce either zero or one output bits, so the output is shorter than the input by a factor of at least 2. On average the computation discards proportion p² + (1 − p)² of the input pairs (00 and 11), which is near one when p is near zero or one, and is minimized at 1/2 when p = 1/2 for the original process (in which case the output stream is 1/4 the length of the input stream on average).

Von Neumann (classical) main operation pseudocode:

if (Bit1 ≠ Bit2) {
   output(Bit1)
}
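A runnable version of the main operation, wrapped in a loop over non-overlapping pairs, might look like the following sketch. The function name and the choice to ignore a trailing unpaired bit are assumptions.

```python
def von_neumann_extract(bits):
    """Classical von Neumann extractor on a sequence of 0/1 values."""
    out = []
    # Walk over non-overlapping pairs; a trailing unpaired bit is ignored.
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)  # 10 -> 1, 01 -> 0; equal pairs are discarded
    return out

stream = [1, 0, 0, 1, 1, 0, 1, 1]
print(von_neumann_extract(stream))  # [1, 0, 1]
```

The printed result matches the eight-bit example given earlier in this section.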

Iterated von Neumann extractor

This decrease in efficiency, or waste of randomness present in the input stream, can be mitigated by iterating the algorithm over the input data. This way the output can be made to be "arbitrarily close to the entropy bound".^[5]

The iterated version of the von Neumann algorithm, also known as advanced multi-level strategy (AMLS),^[6] was introduced by Yuval Peres in 1992.^[5] It works recursively, recycling "wasted randomness" from two sources: the sequence of discard-non-discard, and the values of discarded pairs (0 for 00, and 1 for 11). It relies on the fact that, given the sequence already generated, both of those sources are still exchangeable sequences of bits, and thus eligible for another round of extraction. While such generation of additional sequences can be iterated infinitely to extract all available entropy, an infinite amount of computational resources is required; therefore, the number of iterations is typically fixed to a low value, either fixed in advance or calculated at runtime.

More concretely, on an input sequence, the algorithm consumes the input bits in pairs, generating output together with two new sequences (the AMLS paper notation is given in parentheses):

input	output	new sequence 1(A)	new sequence 2(1)
00	none	0	0
01	0	1	none
10	1	1	none
11	none	0	1

(If the length of the input is odd, the last bit is completely discarded.) Then the algorithm is applied recursively to each of the two new sequences, until the input is empty.

Example: The input stream from the AMLS paper, 11001011101110 using 1 for H and 0 for T, is processed this way:

step number	input	output	new sequence 1(A)	new sequence 2(1)
0	(11)(00)(10)(11)(10)(11)(10)	()()(1)()(1)()(1)	(1)(1)(0)(1)(0)(1)(0)	(1)(0)()(1)()(1)()
1	(10)(11)(11)(01)(01)()	(1)()()(0)(0)	(0)(1)(1)(0)(0)	()(1)(1)()()
2	(11)(01)(10)()	()(0)(1)	(0)(1)(1)	(1)()()
3	(10)(11)	(1)	(1)(0)	()(1)
4	(11)()	()	(0)	(1)
5	(10)	(1)	(1)	()
6	()	()	()	()

Starting from step 1, the input is a concatenation of sequence 2 and sequence 1 from the previous step (the order is arbitrary but should be fixed). In steps 0 and 1 of the table, sequence 1 is written with the complementary encoding (1 for an equal pair, 0 for an unequal pair) relative to the definition table above. The final output is ()()(1)()(1)()(1)(1)()()(0)(0)()(0)(1)(1)()(1) (=1111000111), so from 14 bits of input 10 bits of output were generated, as opposed to 3 bits through the von Neumann algorithm alone. The constant output of exactly 2 bits per round per bit pair (compared with a variable none to 1 bit in classical VN) also allows for constant-time implementations which are resistant to timing attacks.

Von Neumann–Peres (iterated) main operation pseudocode:

if (Bit1 ≠ Bit2) {
   output(1, Sequence1)
   output(Bit1)
} else {
   output(0, Sequence1)
   output(Bit1, Sequence2)
}
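A recursive sketch of the iterated procedure is given below. It follows the definition table (sequence 1 holds 1 for an unequal pair and 0 for an equal pair; sequence 2 holds the value of each equal pair) and recurses on each new sequence separately. That is an assumption of this example: the step table above instead concatenates the two sequences at each step, so the order and number of output bits need not match it.

```python
def von_neumann_peres(bits):
    """Iterated von Neumann extraction on a sequence of 0/1 values."""
    if len(bits) < 2:
        return []  # nothing left to pair up
    out, seq1, seq2 = [], [], []
    # A trailing unpaired bit is discarded, as in the classical extractor.
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)   # classical von Neumann output
            seq1.append(1)      # record "pair was kept"
        else:
            seq1.append(0)      # record "pair was discarded"
            seq2.append(first)  # record the value of the discarded pair
    # Recycle both side sequences with the same procedure.
    return out + von_neumann_peres(seq1) + von_neumann_peres(seq2)

stream = [int(c) for c in "11001011101110"]
extracted = von_neumann_peres(stream)
print(len(extracted), "bits extracted from", len(stream))
```

The recursion terminates because each new sequence is at most half as long as its input. A practical implementation would typically cap the recursion depth, as noted above.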

Another tweak was presented in 2016, based on the observation that the Sequence2 channel doesn't provide much throughput, and a hardware implementation with a finite number of levels can benefit from discarding it earlier in exchange for processing more levels of Sequence1.^[7]

References

[1] ISBN 9781852338961.

[2] Klenke. ISBN 978-1-84800-047-6.

[3] Pierre Gaspard, "r-adic one-dimensional maps and the Euler summation formula", Journal of Physics A, 25 (letter) L483-L485 (1992).

[4] ISBN 0-7923-5564-4.

[5] Peres. doi:10.1214/aos/1176348543.

[6] "Tossing a Biased Coin" (PDF). eecs.harvard.edu. Archived (PDF) from the original on 2010-03-31. Retrieved 2018-07-28.

[7] doi:10.1109/HST.2016.7495553. Archived (PDF) from the original on 2019-02-12.

Further reading

Carl W. Helstrom, Probability and Stochastic Processes for Engineers, (1984) Macmillan Publishing Company, New York. ISBN 0-02-353560-1.

External links

Using a binary tree diagram for describing a Bernoulli process

