Dictionary of the History of Ideas: Studies of Selected Pivotal Ideas


#### PROBABILITY: OBJECTIVE THEORY

*I. THE BEGINNING*

*1.* Games and gambling are as old as human history. It seems that gambling, a specialty of the human species, was spread among virtually all human groups. The Rig Veda, one of the oldest known poems, mentions gambling; the Germans of Tacitus' times gambled heavily, so did the Romans, and so on. All through history man seems to have been attracted by uncertainty. We can still observe today that as soon as an “infallible system” of betting is found, the game will be abandoned or changed to beat the system.

While playing around with chance happenings is very old, attempts towards any systematic investigation were slow in coming. Though this may be how most disciplines develop, there appears to have been a particular resistance to the systematic investigation of chance phenomena, which by their very nature seem opposed to regularity, whereas regularity was generally considered a necessary condition for the scientific understanding of any subject.

The Greek conception of science was modelled after the ideal of Euclidean geometry, which is supposedly derived from a few immediately grasped axioms. It seems that this rationalistic conception limited philosophers and mathematicians well beyond the Middle Ages. Friedrich Schiller, in a poem of 1795, says of the “sage”: *Sucht das vertraute Gesetz in des Zufalls grausenden Wundern/Sucht den ruhenden Pol in der Erscheinungen Flucht* (“Seeks the familiar law in the dreaded wonders of chance/Looks for the unmoving pole in the flux of appearances”).

*2.* However, the hardened gambler, not influenced by philosophical scruples, could not fail to notice some sort of long-run regularity in the midst of apparent irregularity. The use of loaded dice confirms this.

The first “theoretical” work on games of chance is by Girolamo Cardano (Cardanus), the gambling scholar: *De ludo aleae* (written probably around 1560 but not published until 1663). Todhunter describes it as a kind of “gambler's manual.” Cardano speaks of chance in terms of the frequency of an event. His mathematics was influenced by Luca Pacioli.

A contribution by the great Galileo was likewise stimulated directly by gambling. A friend (probably the duke of Ferrara) consulted Galileo on the following problem. The sums 9 and 10 can each be produced by three dice through six different combinations, namely:

9 = 1 + 2 + 6 = 1 + 3 + 5 = 1 + 4 + 4 = 2 + 2 + 5 = 2 + 3 + 4 = 3 + 3 + 3,

10 = 1 + 3 + 6 = 1 + 4 + 5 = 2 + 2 + 6 = 2 + 3 + 5 = 2 + 4 + 4 = 3 + 3 + 4,

and yet the sum 10 appears more often than the sum 9. Galileo pointed out that in the above enumeration, for the sum 9, the first, second, and fifth combinations can each appear in 6 ways, the third and fourth in 3 ways, and the last in 1 way; hence, there are altogether 25 ways out of 216, compared to 27 for the sum 10. It is interesting that the “friend” was able to detect empirically a difference of 1/108 in the frequencies.
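Galileo's count is easy to verify by brute-force enumeration of all 216 ordered throws of three dice; a minimal sketch in Python (not part of the original article):

```python
from itertools import product

# Enumerate all 6^3 = 216 ordered outcomes of three dice and count
# how many give the sums 9 and 10.
counts = {9: 0, 10: 0}
for throw in product(range(1, 7), repeat=3):
    s = sum(throw)
    if s in counts:
        counts[s] += 1

print(counts[9], counts[10])                # 25 ways for 9, 27 ways for 10
print((counts[10] - counts[9]) / 216)       # the difference 2/216 = 1/108
```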

*3.* Of the same type is the well-known question of the Chevalier de Méré, a gambler. It was usual among gamblers to bet even money that among 4 throws of a true die the “6” would appear at least once. De Méré concluded that the same even chance should prevail for the appearance of the “double 6” in 24 throws (since 6 times 6 is 36 and 4 times 6 is 24).

*Un problème relatif aux jeux de hasard, proposé à un austère Janséniste par un homme du monde, a été l'origine du calcul des probabilités* (“A problem in games of chance, proposed to an austere Jansenist by a man of the world, was the origin of the calculus of probability”), writes S. D. Poisson in his *Recherches sur la probabilité des jugements*... (Paris, 1837). The Chevalier's experiences with the second type of bet compared unfavorably with those in the first case. Putting the problem to Blaise Pascal, he accused arithmetic of unreliability. Pascal writes on this subject to his friend Pierre de Fermat (29 July 1654):

*Voilà quel était son grand scandale qui lui faisait dire hautement que les propositions [proportions (?)] n'étaient pas constantes et que l'arithmétique se démentait* (“This was for him a great scandal which made him say loudly that the propositions [proportions (?)] are not constant and that arithmetic is self-contradictory”).

Clearly, this problem is of the same type as that of Galileo's friend. Again, the remarkable feature is the gambler's accurate observation of the frequencies. Pascal's computation might have run as follows. There are 6^4 = 1296 different combinations of six signs *a, b, c, d, e, f* in groups of four. Of these, 5^4 = 625 contain no “*a*” (no “6”) and, therefore, 1296 − 625 = 671 contain at least one “*a*,” and 671/1296 = 0.518 = *p*1 is the probability for the first bet. A similar computation gives for the second bet *p*2 = 0.491, indeed smaller than *p*1.
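Both probabilities follow from the complement rule; a short sketch (my code, not the article's):

```python
# Chevalier de Méré's two bets, computed as Pascal might have:
# p1: at least one "6" in 4 throws of one die;
# p2: at least one double-6 in 24 throws of two dice.
p1 = 1 - (5 / 6) ** 4          # = 671/1296
p2 = 1 - (35 / 36) ** 24

print(round(p1, 3), round(p2, 3))  # 0.518 and 0.491
```

The even-money bet is favorable in the first case and unfavorable in the second, exactly as the Chevalier observed.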

Both Fermat and Pascal, just as Galileo had before them, found it natural to base their reasoning on observed frequencies. They were interested in the answers to actual problems and created the simplest “theory” which was logically sound and explained the observations.

*4.* Particularly instructive is another problem extensively discussed in the famous correspondence between the two eminent mathematicians, the *problème des parties* (“problem of points”), which relates to the question of the just division of the stake between players if they decide to quit at a moment when neither has definitely won. Take a simple case. Two players, A and B, quit at a moment when A needs two points and B three points to win. Then, reasons Pascal, the game will certainly be decided in the course of four more “trials.” He writes down explicitly the combinations which lead to the winning of A, namely *aaaa, aaab, aabb.* Here, *aaab* stands for four different arrangements, namely *aaab, aaba,*... and similarly *aabb* stands for six different arrangements. Hence, 1 + 4 + 6 = 11 arrangements out of 16 lead to the winning of A and 5 to that of B. The stake should, therefore, be divided in the ratio 11:5. (It is worth mentioning that mathematicians like Roberval and d'Alembert doubted Pascal's solution.)
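Pascal's 11:5 division can be confirmed by listing the 2^4 equally likely continuations; a sketch:

```python
from itertools import product

# A needs 2 more points, B needs 3; the game is settled within 4 trials.
# Enumerate all 2^4 equally likely continuations ('a' = point for A).
a_wins = sum(1 for seq in product("ab", repeat=4) if seq.count("a") >= 2)
b_wins = 16 - a_wins

print(a_wins, b_wins)  # 11 and 5, hence the division 11:5
```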

The same results were obtained in a slightly different way by Fermat. The two greatest mathematicians of their time, Pascal and Fermat, exchanged their discoveries in undisturbed harmony. In the long letter quoted above, Pascal wrote to Fermat: *Je ne doute plus maintenant que je suis dans la vérité après le rencontre admirable où je me trouve avec vous.... Je vois bien que la vérité est la même à Toulouse et à Paris* (“I do not doubt any longer that I have the truth after finding ourselves in such admirable agreement.... I see that truth is the same in Toulouse and in Paris”). In connection with such questions Pascal and Fermat studied combinations and permutations (Pascal's *Traité du triangle arithmétique,* 1664) and applied them to various problems.

*5.* We venture a few remarks regarding the ideas on probability of the great philosophers of the seventeenth century. “Probability is likeness to be true,” says Locke. “The grounds of it are, in short, these two following. First, the conformity of anything with our knowledge, observation, and experience. Secondly, the testimony of others” (*Essay concerning Human Understanding,* Book IV). This is the empirical viewpoint, a viewpoint suggested by the observation of gambling results as well as of deaths, births, and other social happenings. “But,” continues Keynes, “in the meantime the subject had fallen in the hands of the mathematicians and an entirely new method of approach was in course of development. It had become obvious that many of the judgments of probability, which we, in fact, make do not depend upon past experience in a way which satisfied the canon laid down by the logicians of Port Royal and by Locke” (*La logique ou l'art de penser*..., by A. Arnauld, Pierre Nicole, and others, 1662, called the “Port Royal Logic”). As we have seen, in order to explain observations, the mathematicians created a theory *based on the counting of combinations.* The decisive assumption was that the observed frequency of an event (e.g., of the “9” in Galileo's problem) be proportional to the corresponding relative number of combinations (there, 25/216).

*6.* We close our description of the first steps in probability calculus with one more really great name, though his fame was not due to his contributions to our subject: Christian Huygens. Huygens heard of these problems but had difficulty in obtaining reliable information about them and about the methods of the two French mathematicians. Eventually, Carcavi sent him the data as well as Fermat's solution. Fermat even posed to Huygens further problems which Huygens worked out and later included as exercises in a work of his own. In this work, *De ratiociniis in ludo aleae* (“On reasoning in games of chance”) of 1657, he organized all he knew about the new subject. At the end of the work he included some questions without indicating the method of solution. “It seems useful to me to leave something for my readers to think about (if I have any readers) and this will serve them both as exercises and as a way of passing the time.” Jakob (James) Bernoulli gave the solutions and included them in his *Ars conjectandi.* The work of Huygens remained for half a century *the* introduction to the “Calculus of Probability.”

*7.* A related type of investigation concerned mortality and annuities. John Graunt started using the registers of deaths kept in London since 1592, and particularly during the years of the great plague. He used his material to make forecasts on population trends (*Natural and Political Observations... upon the Bills of Mortality,* 1661). He may well be considered as one of the first statisticians.

John de Witt, grand pensionary of Holland, wrote on similar questions in 1671 but the precise content of his work is not known. Leibniz was supposed to have owned a copy and he was repeatedly asked by Jakob Bernoulli (but without success) to let him see it.

The year 1693 is the date of a remarkable work by the astronomer Edmond Halley which deals with life statistics. Halley noticed also the regularity of the “boys' rate” (percentage of male births) and other constancies. He constructed a mortality table, based on “Bills of Mortality” for the city of Breslau, and a table of the values of an annuity for every fifth year of age up to the seventieth.

The application of “chance” in such different domains as games of chance (which received dignity through the names of Pascal, Fermat, and Huygens) and mortality impressed the scientific world. Leibniz himself appreciated the importance of the new science (as seen in his correspondence with Jakob Bernoulli). However, he did not contribute to it and he objected to some of his correspondent's ideas.

*II. JAKOB BERNOULLI AND THE
LAW OF LARGE NUMBERS*

*1.* The theory of probability consists, on the one hand, of the consideration and formulation of problems, including techniques for solving them, and on the other hand, of general theorems. It is the latter kind which is of primary interest to the historian of thought. The intriguing aspect of some of these theorems is that starting with probabilistic assumptions we arrive at statements of practical certainty. Jakob Bernoulli was the first to derive such a theorem and it will be worthwhile to sketch the main lines of argument, using, however, modern terminology in the interest of expediency.

*2.* We consider a binary alternative (coin tossing; “ace” or “non-ace” with a die; etc.), to this day called a Bernoulli trial. If *q* is the “probability of success,” *p* = 1 − *q* that of “failure,” then the probability of *a* successes followed by *b* failures in *a* + *b* trials performed with the same die is q^a p^b. This result follows from multiplication laws of independent probabilities already found and applied by Pascal and Fermat. The use of laws of addition and multiplication of probabilities is a step beyond the mere counting of combinations. It is based on the realization that a calculus exists which parallels and reflects the observed relations between frequencies.

The above probability q^a p^b holds for any pattern of *a* successes and *b* failures: fssfffsf.... Lumping together all of these, writing *x* for *a* and *a* + *b* = *n,* we see that *the probability* p_n(x) *of* x *successes and* n − x *failures regardless of pattern* is

p_n(x) = (n choose x) q^x p^(n−x),  x = 0, 1, 2, …, n,  (II.1)

where (n choose x) is the number of combinations of *n* things in groups of *x,* and the sum of all p_n(x) is 1.

Often we are more interested in the relative number *z* = *x*/*n,* the *frequency* of successes. Then

p_n(x) = p′_n(z) = (n choose nz) q^(nz) p^(n(1−z)).

This p′_n(z) (that is, the function that gives to every abscissa *z* the ordinate p′_n(z)) has a maximum at a point z_m, called the *mode,* and z_m is equal to or very close to *q.* In the vicinity of z_m the p′_n(z), as a function of *n,* becomes steeper as *n* increases.
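Formula (II.1) and the location of the mode near *q* can be checked numerically; a short sketch, with n = 60 and q = 1/6 as illustrative values of my own choosing:

```python
from math import comb

def p_n(x, n, q):
    """Probability of x successes in n Bernoulli trials, Eq. (II.1)."""
    return comb(n, x) * q**x * (1 - q) ** (n - x)

n, q = 60, 1 / 6
probs = [p_n(x, n, q) for x in range(n + 1)]

assert abs(sum(probs) - 1) < 1e-9              # the p_n(x) sum to 1
mode = max(range(n + 1), key=lambda x: probs[x])
print(mode / n)  # the mode z_m = 10/60, exactly q = 1/6 here
```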

*3.* It was Bernoulli's first great idea to consider increasing values of *n* and a narrow neighborhood of *q* or, in other words, to investigate the behavior of p′_n(z) in the neighborhood of z = q as *n* increases; this he did at a time when the interest in the “very large” and the “very small” was just awakening. Secondly, he realized that we are not really interested in the value of p′_n(z) for any particular value *z* but rather in the total probability belonging to all *z*'s in an interval. This interval was to contain *q* which, as we remember, is our original success probability and at the same time (for large *n*) close to the mode of p′_n(z) and likewise to its so-called “mean value.”

Now, with ε a very small number, we call P_n the probability that *z* lies between *q* − ε and *q* + ε, or, what is the same, that *x* = *nz* lie between nq − nε and nq + nε. For this P_n one obtains easily the estimate

P_n ≥ 1 − qp/(nε²).  (II.2)

And from this follows immediately the fundamental property of P_n:

P_n → 1 as n → ∞.

This result can be expressed in words:

*Let q be a given success probability in a single trial; n trials are performed with the same q and under conditions of independence. Then, no matter how small an ε is chosen, as the number n of repetitions increases indefinitely, the probability P_n that the frequency of success lie between q − ε and q + ε approaches 1.* (See *Ars conjectandi,* Basel [1713], Part IV, pp. 236-37.)

The above theorem expresses a property of “condensation,” namely that with increasing *n* an increasing proportion of the total probability (which equals 1) is concentrated in a fixed neighborhood of the original *q.* The term “probability” as used by Bernoulli in his computations is always a *ratio* of the number of cases favorable to an occurrence to the number of all possible cases. About this great theorem, called today the “Bernoulli Theorem,” Bernoulli said: “... I had considered it closely for a period of twenty years, and it is a problem the novelty of which, as well as its high utility together with its difficulty, adds importance and weight to all other parts of my doctrine” (ibid.). The three other parts of the work are likewise very valuable (but perhaps less from a conceptual point of view). The second presents the doctrine of combinations. (In this part Bernoulli also introduces the polynomials which carry his name.)

*4.* It will be no surprise to the historian of thought that the admiration we pay to Bernoulli, the mathematician, is not based on his handling of the conceptual situation. In addition to the above-explained use of a quotient for a mathematical probability his views are of the most varied kind, and, obviously, he is not conscious of any possible contradiction: “Probability calculus is a general logic of the uncertain.... Probability is a degree of certainty and differs from certainty as the part from the whole.... Of two things the one which owns the greater part of certainty will be the more probable.... We denote as *ars conjectandi* the art of measuring (*metiendi*) the probability of things as precisely as possible.... We estimate the probabilities according to the number and the weight (*vis probandi*) of the reasons for the occurrence of a thing.” As to this certitude of which probability is a part he explains that “the certitude of any thing can be considered *objectively* and in this sense it relates to the actual (present, past, or future) existence of the thing... or *subjectively* with respect to ourselves and in this sense it depends on the amount of our knowledge regarding the thing,” and so on. This vagueness is in contrast to the modern viewpoint in which, however, conceptual precision is bought, sometimes too easily, by completely rejecting uncongenial interpretations.

*5.* There appears in Bernoulli's work another conceptual issue which deals with the dichotomy between the so-called *direct* and *inverse* problem. The first one is the type considered above: we know the probability *q* and make “predictions” about future observations. In the *inverse* problem we tend to establish from an observed series of results the parameters of the underlying process, e.g., to establish the imperfection of a die. (The procedures directed at the inverse problem are today usually handled in mathematical statistics rather than in probability theory proper.) Bernoulli himself states that his theorem fails to give results in very important cases: in the study of games of skill, in the various problems of life-statistics, in problems connected with the weather: problems where results “depend on unknown causes which are interconnected in unknown ways.”

It is a measure of Bernoulli's insight that he not only recognized the importance of the inverse problem but definitely planned (ibid., p. 226) to establish for this problem a theorem similar to the one we formulated above. This he did not achieve. It is possible that he hoped to give a proof of the inverse theorem and that death intercepted him (Bernoulli's *Ars conjectandi* was unfinished at the time of his death and was published only in 1713); or that he was discouraged by critical remarks of Leibniz regarding inference. It may also be that he did not distinguish with sufficient clarity between the two types of problems. For most of his contemporaries such a distinction did not exist at all; actually, even an appropriate terminology was lacking. We owe the first solid progress concerning the inverse problem to Thomas Bayes. (See Section IV.)

The Bernoulli theorem forms today the very simplest case of the Laws of Large Numbers (see e.g., R. von Mises [1964], Ch. IV). The names Poisson, Tchebychev, Markov, Khintchine, and von Mises should be mentioned in this connection. These theorems are also called “weak” laws of large numbers in contrast to the more recently established “strong” laws of large numbers (due to Borel, Cantelli, Hausdorff, Khintchine, and others). The strong laws are mainly of mathematical interest.

*III. ABRAHAM DE MOIVRE AND THE
CENTRAL LIMIT THEOREM*

*1.* Shortly after the death of Jakob Bernoulli but before the publication (1713) of his posthumous work, books of two important mathematicians, P. R. Montmort (1678-1719) and A. de Moivre (1667-1754), appeared. These were Montmort's *Essai d'analyse sur les jeux de hasard* (1708 and 1713) and de Moivre's *De mensura sortis*... (1711) and the *Doctrine of Chances* (1718 and 1738). We limit ourselves to a few words on the important work of de Moivre.

De Moivre, the first of the great analytic probabilists, was, as a mathematician, superior to both Jakob Bernoulli and Montmort. In addition he had the advantage of being able to use the ideas of Bernoulli and the algebraic powers of Montmort, which he himself then developed to an even higher degree. A charming quotation, taken from the *Doctrine of Chances,* might be particularly appreciated by the secretary. “For those of my readers versed in ordinary arithmetic it would not be difficult to make themselves masters, not only of the practical rules in this book but also of more useful discoveries, if they would take the small pains of being acquainted with the bare notation of algebra, which might be done in the hundredth part of the time that is spent in learning to read shorthand.”

*2.* In probability proper de Moivre did basic work on the “duration of a game,” on “the gambler's ruin,” and on other subjects still studied today. Of particular importance is his extension of Bernoulli's theorem, which is really much more than an extension. In Section II, 3 we called P_n the sum of the 2*r* + 1 middle terms of p_n(x) where *r* = *n*ε and p_n(x) is given in Eq.(II.1). In Eq.(II.2) we gave a very simple estimate of P_n. (Bernoulli himself had given a sharper one but it took him ten printed pages of computation, and to obtain the desired result the estimate Eq.(II.2) suffices.)

De Moivre, who had a deep admiration for Bernoulli and his theorem, conceived the very fruitful idea *of evaluating P_n directly for large values of n,* instead of estimating it by an inequality. For this purpose one needs an *approximation formula for the factorials of large numbers.* De Moivre derived such a formula, which coincides essentially with the famous *Stirling formula.* He then determined P_n “by the artifice of mechanical quadrature.” He computed particular values of his asymptotic formula for P_n correct to five decimals. We shall return to these results in the section on Laplace. Under the name of the *de Moivre-Laplace formula,* the result, most important by itself, became the starting point of intensive investigations and far-reaching generalizations which led to what is called today the central limit theorem of probability calculus (Section VIII). I. Todhunter, whose work *A History of the Mathematical Theory of Probability*... (1865) ends, however, with Laplace, says regarding de Moivre: “It will not be doubted that the theory of probability owes more to him than to any other mathematician with the sole exception of Laplace.” Our discussion of the work of this great mathematician is comparatively brief since his contributions were more on the mathematical than on the conceptual side. We mention, however, one more instance whose conceptual importance is obvious: de Moivre seems to have been the first to denote a probability by one single letter (like *p* or *q,* etc.) rather than as a quotient of two integers.
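The quality of such a factorial approximation is easy to examine; a sketch using the Stirling form n! ≈ √(2πn)(n/e)^n (the error bound 1/(12n) quoted in the comment is the standard first correction term, not a claim of the article):

```python
from math import factorial, pi, e, sqrt

def stirling(n):
    """Stirling's approximation: n! is about sqrt(2*pi*n) * (n/e)**n."""
    return sqrt(2 * pi * n) * (n / e) ** n

# The approximation always falls slightly short of n!, and the
# relative error shrinks roughly like 1/(12n).
for n in (5, 10, 20):
    rel_err = 1 - stirling(n) / factorial(n)
    print(n, rel_err)
```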

*IV. THOMAS BAYES AND
INVERSE PROBABILITY*

*1.* Bayes (1702-61) wrote two basic memoirs, both published posthumously, in 1763 and 1765, in Vols. 13 and 14 of the *Philosophical Transactions of the Royal Society of London.* The title of the first one is: “An Essay Towards Solving a Problem in the Doctrine of Chances” (1763). A facsimile of both papers (and of some other relevant material) was issued in 1940 in Washington, edited by W. E. Deming and E. C. Molina. The following is from Molina's comments: “In order to visualize the year 1763 in which the essay was published let us recall some history.... Euler, then 56 years of age, was sojourning in Berlin under the patronage of Frederick the Great, to be followed shortly by Lagrange, then 27; the Marquis de Condorcet, philosopher and mathematician who later applied Bayes's theorem to problems of testimony, was but 20 years old.... Laplace, a mere boy of 14, had still 11 years in which to prepare for his *Mémoires* of 1774, embodying his first ideas on the “probability of causes,” and had but one year short of half a century to bring out the first edition of the *Théorie analytique des probabilités* (1812) wherein Bayes's theorem blossomed forth in its most general form.” (See, however, the end of this section.)

*2.* We explain first the concept of *conditional probability* introduced by Bayes. Suppose that of a certain group of people 90% = P(A) own an automobile and 9% = P(A,B) own an automobile and a bicycle. We call P(B|A) the conditional probability of owning a bicycle for people who are known to own also a car. If P(A) ≠ 0, then

P(B|A) = P(A,B) / P(A)  (IV.1)

is *by definition the conditional probability of B given A.* (This will be explained further in Section VII, 9.) In our example

P(B|A) = (9/100) / (90/100) = 1/10;

hence, P(B|A) = 1/10. We may write (IV.1) as

P(A,B) = P(A) · P(B|A).  (IV.2)

The *compound probability* of owning both a car and a bicycle equals the probability of owning a car times the conditional probability of owning a bicycle, given that the person owns a car. Of course, the set AB is a subset of the set A.

*3.* We try now to formulate some kind of inverse to a Bernoulli problem. (The remainder of this section may not be easy for a reader not schooled in mathematical thinking. A few rather subtle distinctions will be needed; however, the following sections will again be easier.) Some game is played *n* times and n1 “successes” (e.g., n1 “aces” in *n* tossings of a die) are observed. We consider now as known the numbers *n* and n1 (more generally, the statistical result) and would like to *make some inference* regarding the unknown success-chance of “ace.” It is quite clear that if we know nothing but *n* and n1 and if these numbers are small, e.g., n = 10, n1 = 7, we cannot make any inference. Denote by w_n(x, n1) the compound probability that the die has ace-probability *x* and gave n1 successes out of *n.* Then the conditional probability of *x,* given n1, which we call q_n(x|n1), equals by (IV.1):

q_n(x|n1) = w_n(x, n1) / ∫₀¹ w_n(x, n1) dx.  (IV.3)

Here, *x* is taken as a continuous variable, i.e., it can take any value between 0 and 1. The ∫₀¹ w_n(x, n1) dx is our P(A). It is to be replaced by Σ_x w_n(x, n1) if *x* is a discrete variable which can, e.g., take on only one of the 13 values 0, 1/12, 2/12, ..., 11/12, 1.

Let us analyze w_n(x, n1). With the notation of Eq.(II.1) we obtain

p_n(n1|x) = (n choose n1) x^n1 (1 − x)^(n−n1),

the conditional probability of n1, given that the success chance (e.g., the chance of ace) has the value *x.* Therefore,

w_n(x, n1) = v(x) p_n(n1|x).  (IV.4)

Here v(x) is the *prior* probability or prior chance, the chance (prior to the present statistical investigation) that the ace-probability has the value *x.* Substituting (IV.4) into (IV.3) we have

q_n(x|n1) = v(x) p_n(n1|x) / ∫₀¹ v(x) p_n(n1|x) dx,  (IV.5)

where, dependent on the problem, the integral in the denominator may be replaced by a sum. This is Bayes's “inversion formula.” If we know v(x) and p_n(n1|x) we can compute q_n(x|n1). Clearly, we have to have some knowledge of v(x) in order to evaluate Eq.(IV.5). We note also that the problem must be such that *x* is a *random variable,* i.e., *that the assumption of many possible x's which are distributed in a probability distribution* makes sense (compare end of Section IV, 6, below).
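The discrete form of (IV.5), with the sum replacing the integral, can be sketched on the 13-point grid mentioned above; the uniform prior and the counts n = 60, n1 = 10 are illustrative choices of mine:

```python
from math import comb

def posterior(n, n1, prior):
    """Discrete form of Bayes's inversion formula (IV.5).

    prior: dict mapping each admissible success chance x to v(x).
    Returns q_n(x | n1) as a dict over the same grid.
    """
    lik = {x: comb(n, n1) * x**n1 * (1 - x) ** (n - n1) for x in prior}
    norm = sum(prior[x] * lik[x] for x in prior)
    return {x: prior[x] * lik[x] / norm for x in prior}

grid = [i / 12 for i in range(13)]       # 0, 1/12, ..., 11/12, 1
v = {x: 1 / 13 for x in grid}            # constant prior v(x)
q = posterior(n=60, n1=10, prior=v)      # 10 "aces" observed in 60 throws

best = max(q, key=q.get)
print(best)  # posterior mode at x = 2/12 = 1/6, the observed frequency 10/60
```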

*4.* In some problems *it may be justified to assume that v(x) be constant,* i.e., *that v has the same value for all x.* (This was so for the geometric problem which Bayes himself considered.) Boole spoke of this assumption as of a case of “equal distribution of ignorance.” This is not an accurate denotation since often this assumption is made not out of ignorance but because it seems adequate. R. A. Fisher argued with much passion against “Bayes's principle.” However, Bayes did not have any such principle. He did not start with a general formula Eq.(IV.5) and then apply a “principle” by which v(x) could be neglected. He correctly solved a particular problem. The general formula, Eq.(IV.5), is due to Laplace.

How about the v(x) in our original example? Here, for a body which behaves and looks halfway like a die, the assumption of constant v(x) makes no sense. If, e.g., we bought our dice at Woolworth's we might take v(x) as a curve which differs from 0 only in the neighborhood of x = 1/6. If we suppose a loaded die another v(x) may be appropriate. The trouble is, of course, that sometimes we have no way of knowing anything about v(x). Before continuing our discussion we review the facts found so far regarding Bayes: (a) he was the first to introduce and use conditional probability; (b) he was the first to formulate correctly and solve a problem of inverse probability; (c) he did not consider the general problem Eq.(IV.5).

*5.* Regarding v(x) we may summarize as follows: (a) if we *can* make an adequate assumption for v(x) we can compute q_n(x|n1); (b) if we ignore v(x) and have no way to assume it and *n* is a small or moderate number we cannot make an inference; (c) Laplace has proved (Section V, 6) that *even if we do not know v(x) we can make a valid inference if n is large* (and certain mathematical assumptions for v(x) are known to hold). This is not as surprising as it may seem. Clearly, if we toss a coin 10 times and heads turns up 7 times and we know nothing else about the coin, an inference regarding the unknown *q* of this coin is unwarranted. If, however, 7,000 heads out of 10,000 turn up then, even if this is all we know, the inference that *q* > 1/2 and not very far from 0.7 is very probable. The proof of (c) is really quite a simple one (see von Mises [1964], pp. 339ff.) but we cannot give it here. We merely state here the most important property of the right-hand side of Eq.(IV.5), writing now q_n(x) instead of q_n(x|n1). Independently of v(x), q_n(x) *shows the property of condensation* as *n* increases more and more: a condensation about the observed success frequency n1/n = *r.* Indeed the following theorem holds:

*If the observation of an n times repeated alternative has shown a frequency r of success, then, if n is sufficiently large, the probability for the unknown success-chance to lie between r − ε and r + ε is arbitrarily close to unity.*

This is called *Bayes's theorem,* clearly a kind of converse of Bernoulli's theorem, the observed *r* playing here the role of the theoretical *q.*

*6.* We consider a closely related problem which

aroused much excitement. Suppose we are in a situa-

tion *where we have the right to assume that v*(*x*) =

constant *holds,* and we know the numbers *n* and *n*1.

By some additional considerations we can then compute the ace-probability *P* itself *as inferred from these data* (not only the probability *q*n(*x*) that *P* has a certain value *x*), and we find that *P* equals (*n*1 + 1)/(*n* + 2), and correspondingly 1 − *P* = (*n* − *n*1 + 1)/(*n* + 2). This formula for *P* is called *Laplace's rule of succession,*

and it gives well-known senseless results if applied in

an unjustified way. Keynes in his treatise (p. 82) says:

“No other formula in the alchemy of logic has exerted

more astonishing powers. It has established the exist-

ence of God from the basis of total ignorance and it

has measured precisely the probability that the sun

will rise tomorrow.” This magical formula must be

qualified. First of all, if *n* is small or moderate we may use the formula *only if we have good reason to assume a constant prior probability.* And then it is correct. A general “Principle of Indifference” is not a “good reason.” Such a “principle” states that in the absence of

any information one value of a variable is as probable

as another. However, no inference can be based on

ignorance. Second, if *n* and *n*1 are both large, then indeed *the influence of the a priori knowledge vanishes* and we need no principle of indifference to justify the

formula. One can, however, still manage to get sense-

less results if the formula is applied to events that are

not random events, for which, therefore, the reasoning

and the computations which lead to it are not valid.

This remark concerns, e.g., the joke—coming from

Laplace it can only be considered as a joke—about

using the formula to compute the “probability” that

the sun will rise tomorrow. The rising of the sun does

not depend on chance, and our trust in its rising to-

morrow is founded on astronomy and not on statistical

results.
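The rule of succession itself is a one-line computation. The sketch below (function name ours) also shows how the correction to the raw frequency *n*1/*n* fades as *n* grows, in line with the second qualification above:

```python
from fractions import Fraction

def rule_of_succession(n, n1):
    """Laplace's rule: inferred success probability (n1 + 1) / (n + 2).
    Valid only under a constant (uniform) prior probability."""
    return Fraction(n1 + 1, n + 2)

# Small n: the prior matters and the rule must not be applied blindly.
print(rule_of_succession(10, 7))                     # 2/3, noticeably below 0.7
# Large n: the correction is negligible and P is close to n1/n.
print(float(rule_of_succession(10_000, 7_000)))
```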

*7.* We finish with two important remarks. (a) The

idea of inference or inverse probability, the subject of

this section, is not limited to the type of problems

considered here. In our discussion, *p*n(*n*1|*x*) was (*n* choose *n*1) *x*^*n*1 (1 − *x*)^(*n*−*n*1), but formulas like Eq.(IV.5) *can be used for drawing inferences on the value of an unknown parameter from v*(*x*) *and some p*n *for the most varied p*n. This is done in *the general theory of inference*

which, according to Richard von Mises and many others, finds a sound basis in the methods explained here (Mises [1964], Ch. X). The ideas have also entered “subjective” probability under the label “Bayesian”

(Lindley, 1965). Regarding the unknown *v*(*x*) we say: (i) if *n* is large the influence of *v*(*x*) vanishes in most problems; (ii) if *n* is small and *v*(*x*) unknown it may still be possible to make some well-founded assumption regarding *v*(*x*) using “past experience” (von Mises [1964], pp. 498ff.). If no assumption is possible then no inference can be made. (The problem considered here was concerned with the posterior chance that the unknown “ace-probability” has a certain value *x* or falls in a certain interval. There are, however, other problems where such an approach is not called for and where—similarly as in subsection 6—we mainly want a good *estimate* of the unknown magnitude on the basis of the available data. To reach this aim many different methods exist. R. A. Fisher advanced the “maximum likelihood” method which has valuable properties. In our example, the “maximum likelihood estimate” equals *n*1/*n,* i.e., the observed frequency.)

(b) Like the Bernoulli-de Moivre-Laplace theorem

the Bayes-Laplace theorem has found various exten-

sions and generalizations. Von Mises also envisaged

wide generalizations of both types of Laws of Large

Numbers based on his theory of Statistical Functions

(von Mises [1964], Ch. XII).

*V. PIERRE SIMON, MARQUIS DE LAPLACE:
HIS DEFINITION OF PROBABILITY, LIMIT
THEOREMS, AND THEORY OF ERRORS*

*1.* It has been said that Laplace was not so much

an originator as a man who completed, generalized,

and consummated ideas conceived by others. Be this

as it may, what he left is an enormous treasure. In his

*Théorie analytique des probabilités* (1812) he used the

powerful tools of the new, rapidly developing analysis.

(The elements of probability calculus—addition, mul-

tiplication, division—were by that time firmly estab-

lished.) Not all of his mathematical results are of equal

interest to the historian of thought.

*2.* We begin with the discussion of his well-known

*definition* of probability as the number of cases favora-

ble to an event divided by the number of all equally

likely cases. (Actually this conception had been used

before Laplace but not as a basic definition.) The

“equally likely cases” are *les cas également possibles, c'est à dire tels que nous soyons également indécis sur leur existence* (“the equally possible cases, that is, those about whose occurrence we are equally undecided”; *Essai philosophique,* p. 4). Thus, for

Laplace, “equally likely” means “equal amount of

indecision,” just as in the notorious “principle of

indifference” (Section IV, 6). In this definition, the

feeling for the empirical side of probability, appearing

at times in the work of Jakob Bernoulli, strongly in

that of Hume and the logicians of Port Royal, seems

to have vanished. The main respect in which the

definition is insufficient is the following. The counting

of equally likely cases works for simple games of

chance (dice, coins). It also applies to important prob-

lems of biology and—surprisingly—of physics. But for

a general definition it is much too narrow as seen by

the simple examples of a biased die, of insurance prob-

abilities, and so on. Laplace himself and his followers

did not hesitate to apply the rules derived by means

of his aprioristic definition to problems like the above

and to many others where the definition failed. Also

in cases where equally likely cases can be defined,

different authors have often obtained different answers

to the same problem (this result was then called a

paradox). The reason is that the authors choose differ-

ent sets of cases as equally likely (Section VI, 8).

Laplace's definition, though not unambiguous and

not sufficiently general, fitted extensive classes of prob-

lems and drew authority from Laplace's great name,

and thus dominated probability theory for at least a

hundred years; it still underlies much of today's think-

ing about probability.

*3.* Laplace's *philosophy* of chance, as expounded in his

*Essai philosophique* is that each phenomenon in the

physical world as well as in social developments is

governed by forces of two kinds: permanent and

accidental. In an isolated phenomenon the effect of

the accidental forces may appear predominant. But,

in the long run, the accidental forces average out and

the permanent ones prevail. This is for Laplace a

consequence of Bernoulli's Law of Large Numbers.

However, while Bernoulli saw very clearly the limita-

tions of his theorem, Laplace applies it to everything

between heaven and earth, including the “favorable

chances tied with the eternal principles of reason,

justice and humanity” or “the natural boundaries of

a state which act as permanent causes,” and so on.

*4.* We have previously mentioned Laplace's contri-

butions to both Bernoulli's and Bayes's problems. It

was de Moivre's (1733) fruitful idea to evaluate *Pn*

(Section III, 2) directly for large *n.* There is no need

to discuss here the precise share of each of the two

mathematicians in the *De Moivre-Laplace formula.*

Todhunter calls this result “one of the most important

in the whole range of our subject.” Hence, for the sake

of those of our readers with some mathematical

schooling we put down the formula. *If a trial where p*(0) = *p,* *p*(1) = *q,* *p* + *q* = 1, *is repeated n times, where n is a large number, then the probability* Pn *that the number x of successes be between*

*nq* − δ√(*npq*) and *nq* + δ√(*npq*)

*or, what is the same, that the frequency z = x/n of success be between*

*q* − δ√(*pq*/*n*) and *q* + δ√(*pq*/*n*) (V.1′)

*equals asymptotically the sum of the two terms of Eq.(V.2).* Here, the first term, for which we also write 2Φ(δ), is twice the famous *Gauss integral* or, if δ is considered variable, the celebrated *normal distribution function.* For fairly large *n* the second term of Eq.(V.2) can be neglected and the first term comes even for moderate values of δ very close to unity (e.g., for δ = 3.5 it equals 1 up to five decimals). *The limits in Eq.(V.1′) can be rendered as narrow as we please by taking n sufficiently large, and* Pn *will always be larger than* 2Φ(δ).

This is the first of the famous *limit theorems of
probability calculus.* Eq.(V.2) exhibits the phenomenon

of *condensation* (Sections II and IV) about the midpoint, here the mean value, which means that *a probability arbitrarily close to 1 is contained in an arbitrarily narrow neighborhood of the mean value.* The present

result goes far beyond Bernoulli's theorem in sharpness

and precision, but conceptually it expresses the same

properties.
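As a numeric illustration, not the article's own computation, one may compare the exact binomial probability with its normal approximation; the sketch below writes the approximation in the standard-normal convention, under which 2Φ(δ) = erf(δ/√2):

```python
import math

def binomial_within(n, q, delta):
    """Exact probability that the success count x lies within
    n*q ± delta*sqrt(n*p*q), for success chance q (0 < q < 1), p = 1 - q."""
    p = 1 - q
    half = delta * math.sqrt(n * p * q)
    lo, hi = n * q - half, n * q + half
    # log binomial probabilities via lgamma, to avoid overflow
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(p)
            for k in range(n + 1)]
    return sum(math.exp(L) for k, L in enumerate(logs) if lo <= k <= hi)

n, q, delta = 1000, 0.5, 2.0
exact = binomial_within(n, q, delta)
approx = math.erf(delta / math.sqrt(2))   # 2Φ(δ) for the standard normal Φ
print(round(exact, 4), round(approx, 4))  # the two values nearly agree
```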

*5.* Thus, the distribution of the number *x* of successes

obtained by repetition of a great number of binary

alternatives is asymptotically a normal curve. As pre-

viously indicated more general theorems of this type

hold. If, as always, we denote success by 1, failure by

0, then *x* = *x*1 + *x*2 + ... + *x*n, where each *xi* is either

0 or 1. It is then suggestive to study also cases where

*x*1, *x*2,..., *x*n are not as simple as in the above problem (Section VIII, 2).

*6.* We pass to Laplace's limit theorem for Bayes's

problem. Set (Section IV, 3) *Q*n(*x*) = ∫ from 0 to *x* of *q*n(*t*|*n*1) *dt*; let *n* tend towards infinity while *n*1/*n* = *r* is kept fixed. The difference *Q*n(*x*2) − *Q*n(*x*1) is the probability that the object of our inference (for example, the unknown “ace”-probability) be between *x*1 and *x*2. Laplace's limit result looks similar to Eq.(V.1′) and Eq.(V.2). *The probability that the inferred value lies in the interval*

(*r* − *t*√(*r*(1 − *r*)/*n*), *r* + *t*√(*r*(1 − *r*)/*n*))

*tends to* 2Φ(*t*) *as n* → ∞. Bayes's theorem (Section IV,

5) follows as a particular case. The most remarkable

feature of this Laplace result is that *it holds independently of the prior probability.* This is proved without any sort of “principle of indifference.” This mathematical result corresponds, of course, to the fact that

any prior knowledge regarding the properties of the

die becomes irrelevant if we are in possession of a large

number of results of ad hoc observations.
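Laplace's limit can likewise be checked numerically. With a uniform prior the posterior density is proportional to *x*^*n*1(1 − *x*)^(*n*−*n*1); the sketch below (names ours, standard-normal convention for Φ) integrates it over the stated interval and compares the mass with 2Φ(*t*):

```python
import math

def posterior_interval(n, n1, t, steps=200_000):
    """Posterior mass, under a uniform prior, of the interval
    r ± t*sqrt(r(1-r)/n), where r = n1/n. The posterior is the
    Beta(n1+1, n-n1+1) density, normalized via lgamma."""
    r = n1 / n
    s = math.sqrt(r * (1 - r) / n)
    lo, hi = r - t * s, r + t * s
    a, b = n1 + 1, n - n1 + 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width   # midpoint rule
        total += math.exp(log_norm + (a - 1) * math.log(x)
                          + (b - 1) * math.log(1 - x))
    return total * width

value = posterior_interval(10_000, 3_000, 2.0)
print(round(value, 3))   # close to 2Φ(2) = erf(2/√2), about 0.954
```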

*7.* To appreciate what now follows we go back for

a moment to our introductory pages in Section I. We

said that the Greek ideal of science was opposed

to the construction of hypotheses on the basis of

empirical data. “The long history of science and phi-

losophy is in large measure the progressive emancipa-

tion of men's minds from the theory of self-evident

truth and from the postulate of complete certainty as

the mark of scientific insight” (Nagel, p. 3).

The end of the eighteenth and the beginning of the

nineteenth century saw the beginnings and development of a “theory of errors,” created by the greatest

minds of the time. A long way from the ideal of abso-

lute certitude, scientists are now ready to use observa-

tions, even inaccurate ones. Most observations which

depend on measurements (in the widest sense) *are* liable

to accidental errors. “Exact” measurements exist only

as long as one is satisfied with comparatively crude

results.

*8.* Using the most precise methods available one still

obtains small variations in the results, for example, in

the repeated measurements of the distance of two fixed

points on the surface of the earth. We assume that this

distance *has* some definite “true” value. Let us call

it *a* and it follows that the results *x*1, *x*2,... of several

measurements of the same magnitude must be incorrect

(with the possible exception of one). We call *z*1 =

*x*1 - *a, z*2 = *x*2 - *a,*... the *errors* of measurement.

These errors are considered as *random deviations*

which oscillate around 0. Therefore, there ought to

exist a *law of error,* that is a probability *w*(*z*) of a certain

error *z.*

It is a fascinating mathematical result that, by means

of the so-called “theory of elementary errors” we ob-

tain at once the form of *w*(*z*). This theory, due to Gauss,

assumes that each observation is subject to a large

number of sources of error. Their sum results in the

observed error *z. It follows then at once from the generalization of the de Moivre-Laplace result* (Section V, 5; Section VIII, 3) *that the probability of any resulting error z follows a normal or Gaussian law* *w*(*z*) = (*h*/√π)e^(−*h*²*z*²). This *h,* the so-called *measure of precision,* is not determined by this theory. The larger *h* is, the more concentrated is this curve around *z* = 0.
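The theory of elementary errors can be illustrated by simulation: summing many small independent elementary errors (here uniform, an assumption of ours chosen only for simplicity) yields an approximately Gaussian error law:

```python
import random, math

random.seed(1)

def observed_error(sources=100, scale=0.01):
    """One measurement error as the sum of many small independent
    elementary errors, each uniform on ±scale (a stand-in assumption)."""
    return sum(random.uniform(-scale, scale) for _ in range(sources))

errors = [observed_error() for _ in range(20_000)]
mean = sum(errors) / len(errors)
sd = math.sqrt(sum((z - mean) ** 2 for z in errors) / len(errors))
within_one_sd = sum(1 for z in errors if abs(z - mean) <= sd) / len(errors)
print(round(within_one_sd, 2))   # a Gaussian law predicts about 0.68
```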

*9.* The problem remains to determine *the most
probable value of x.* The famous

*method of least squares*

was advanced as a manipulative procedure by

Legendre (1806) and by Gauss (1809). Various attempts

have been made to justify this method by means of

the theory of probability, and here the priority regard-

ing the basic ideas belongs to Laplace. His method was

adopted later (1821-23) by Gauss. The last steps to-

wards today's foundation of the least squares method

are again due to Gauss.

*10.* Any evaluation of Laplace's contribution to the

history of probabilistic thought must mention his deep

interest in the applications. He realized the applica-

bility of probability theory in the most diverse fields

of man's thinking and acting. (Modern physics and

modern biology, replete with probabilistic ideas, did

not exist in Laplace's time.) In his *Mécanique céleste*

Laplace advanced probabilistic theories to explain

astronomical facts. Like Gauss he applied the theory

of errors to astronomical and geodetic operations. He

made various applications of his limit theorems. Of

course, he studied the usual problems of human statis-

tics, insurances, deaths, marriages. He considered

questions concerned with legal matters (which later

formed the main subjects of Poisson's great work). As

soon as Laplace discovered a new method, a new

theorem, he investigated its applicability. This close

connection between theory and meaningful observa-

tional problems—which, in turn, originated new theo-

retical questions—is an unusually attractive feature of

this great mind.

*VI. A TIME OF TRANSITION*

*1.* The influence of the work of Laplace may be

considered under three aspects: (a) his analytical

achievements which deepened and generalized the

results of his predecessors and opened up new avenues;

(b) his definition of probability which seemed to pro-

vide a firm basis for the whole subject; (c) in line with

the rationalistic spirit of the eighteenth century, a wide

field of applications seemed to have been brought

within the domain of reason. Speaking of probability,

*Notre raison cesserait d'être esclave de nos impressions* (“Our reason would cease to be the slave of our impressions”).

*2.* Of the contributions of the great S. D. Poisson

laid down in his *Recherches sur la probabilité des
jugements*... (1837), we mention first a generalization

of James Bernoulli's theorem (Section II). Considered

again is a sequence of binary alternatives—in terms

of repeatedly throwing a die for “ace” or “not-ace”—

Poisson abandoned the condition that all throws must

be carried out with the same or identical dice; he

allowed *a different die* to be used for each throw. If *q*(*n*) denotes *the arithmetical mean* of the first *n* ace-probabilities *q*1, *q*2,..., *q*n, then a theorem like Bernoulli's holds where now *q*(*n*) takes the place of the previously fixed *q.* Poisson denotes this result as the

Law of Large Numbers. A severe critic like J. M.

Keynes called it “a highly ingenious theorem which

extends widely the applicability of Bernoulli's result.”

To Keynes's regret the condition of independence still

remains. It was removed by Markov (Section VIII, 7).
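Poisson's generalization is easy to simulate: with a different ace-probability for every throw (the particular sequence of probabilities below is our own choice, for illustration), the observed frequency still tracks the arithmetical mean *q*(*n*):

```python
import random

random.seed(7)

# A different "die" for each throw: the ace-probabilities vary from throw to throw.
n = 100_000
qs = [0.05 + 0.25 * (i % 10) / 10 for i in range(n)]   # assumed values, cycling
hits = sum(1 for q in qs if random.random() < q)

frequency = hits / n
mean_q = sum(qs) / n                                   # Poisson's q(n)
print(frequency, mean_q)   # the two values nearly agree
```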

*3.* Ever since the time of Bernoulli one could ob-

serve the duality between the empirical aspect of

probability (i.e., frequencies) and a mathematical the-

ory, an algebra, that reflected the relations among the

frequencies. Poisson made an important step by stating

this correspondence explicitly. In the Introduction to

his work he says: “In many different fields we observe

empirical phenomena which appear to obey a certain

general law.... This law states that the ratios of

numbers derived from the observation of very many

similar events remain practically constant provided

that the events are governed partly by constant factors

and partly by variable factors whose variations are

irregular and do not cause a systematic change in a

definite direction. Characteristic values of these pro-

portions correspond to the various kinds of events. The

empirical ratios approach these characteristic values

more and more closely the greater the number of

observations.” Poisson called this law again the Law

of Large Numbers. We shall, however, show in detail

in Section VII that this “Law” and the Bernoulli-

Poisson theorem, explained above, are really two

different statements. The sentences quoted above from

Poisson's Introduction together with a great number

of examples make it clear that here Poisson has in mind

a generalization of empirical results. The “ratios” to

which he refers are the frequencies of certain events

in a long series of observations. And the “characteristic

values of the proportions” are the chances of the

events. We shall see that this is essentially the “postu-

late” which von Mises was to introduce as the

empirical basis of frequency theory (Sections VII, 2-4).

*4.* Poisson distinguished between “subjective” and

“objective” probability, calling the latter “chance,” the

former “probability” (a distinction going back to

Aristotle). “An event has by its very nature a *chance,*

small or large, known or unknown, and it has a *proba-
bility* with respect to our knowledge regarding the

event.” We see that we are relinquishing Laplace's

definition in more than one direction.

*5.* Ideas expressed in M. A. A. Cournot's beautifully

written book, *Exposition de la théorie des chances et
des probabilités* (Paris, 1843) are, in several respects

similar to those of Poisson. For Cournot probability

theory deals with certain frequency quotients which

would take on completely determined fixed values if

we could repeat the observations towards infinity. Like

Poisson he discerned a subjective and objective aspect

of probability. “Chance is objective and independent

of the mind which conceives it, and independent of

our restricted knowledge.” Subjective probability may

be estimated according to “the imperfect state of our

knowledge.”

*6.* Almost from the beginning, certainly from the

time of the Bernoullis, it was hoped that probability

would serve as a basis for dealing with problems con-

nected with the “*Sciences Morales.*” Laplace studied

judicial procedures, the credibility of witnesses, the

probability of judgments. And we know that Poisson

was particularly concerned with these questions.

Cournot made legalistic applications *aux documents statistiques publiés en France par l'Administration de la Justice* (“to the statistical documents published in France by the Administration of Justice”). A very important role in these domains of thought is to be attributed to the Belgian astronomer L. A. J. Quételet who visited Paris in 1823 and was introduced to the mathematicians of *la grande école française,* to Laplace, and, in particular, to Poisson.

Between 1823 and 1873 Quételet studied statistical

problems. His *Physique sociale* of 1869 contains the construction of the “average man” (*homme moyen*).

Keynes judged that Quételet “has a fair claim to be

regarded as the parent of modern statistical methods.”

*7.* It is beyond the scope of this article to delve

into statistics. Nevertheless, since Laplace, Poisson,

Cournot, and Quételet have been mentioned with re-

spect to such applications, we have to add the great

name of W. Lexis whose *Theorie der Massenerschei-
nungen in der menschlichen Gesellschaft* (“Theory of

Mass Phenomena in Society”) appeared in 1877. He

was perhaps the first one to attempt an investigation

whether, and to what extent, general series of observa-

tions can be compared with the results of games of

chance and to propose criteria regarding these ques-

tions. In other words, he inaugurated “theoretical sta-

tistics.” His work is of great value with respect to

methods and results.

*8.* We return to probability proper. The great pressure of criticism attached to the concept of equally likely events and actually to the “principle of insuffi-

cient reason” (or briefly “indifference principle”) on

which this concept rests (Section IV, 6). The principle

enters the classical theory in two ways: (a) in Laplace's

definition (Section V, 2) and (b) in the so-called Bayes

principle (Section IV, 4). However, distrust of the

indifference principle kept mounting. It is so easy to

disprove it. We add one particularly striking counter-

example where the results are expressed by continuous

variables.

A glass contains a mixture of wine and water and

we know that the ratio *x* = water/wine lies between

1 and 2 (at least as much water as wine and at most

twice as much water). The Indifference Principle tells

us to assume that to equal parts of the interval (1, 2)

correspond equal probabilities. Hence, the probability

of *x* to lie between 1 and 1.5 is the same as that to

lie between 1.5 and 2. Now let us consider the same

problem in a different way, namely, by using the ratio

*y* = wine/water. On the data, *y* lies between 1/2 and

1, hence by the Indifference Principle, there corre-

sponds to the interval (1/2, 3/4) the same probability as

to (3/4, 1). But if *y* = 3/4, then *x* = 4/3 = 1.333... while

before, the midpoint was at *x* = 1.5. The two results

clearly contradict each other.
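The contradiction can be made concrete by simulation (variable names ours): the same event, *x* < 1.5, receives probability 1/2 under indifference in *x* but 2/3 under indifference in *y* = 1/*x*:

```python
import random

random.seed(3)
trials = 200_000

# Indifference over x = water/wine on (1, 2): P(x < 1.5) = 1/2 by construction.
xs = [random.uniform(1, 2) for _ in range(trials)]
p_from_x = sum(1 for x in xs if x < 1.5) / trials

# Indifference over y = wine/water on (1/2, 1): the event x < 1.5 is y > 2/3,
# whose length is 1/3 of the interval of length 1/2, i.e. probability 2/3.
ys = [random.uniform(0.5, 1) for _ in range(trials)]
p_from_y = sum(1 for y in ys if 1 / y < 1.5) / trials

print(round(p_from_x, 2), round(p_from_y, 2))   # roughly 0.5 versus 0.67
```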

With all the admiration for the impressive structure

Laplace had erected—supposedly on the basis of his

definition—the question arose how the mathematicians

managed to derive from abstractions results relevant

to experience. Today we know that the valid objections

against Laplace's equally likely cases do not invalidate

the foundations of probability which are not based on

equally likely cases; we also understand better the

relation between foundations and applications.

*9.* One way to a satisfactory foundation was to

abandon the obviously unsatisfactory Laplacean de-

finition and to build a theory based on the empirical

aspect of probability, i.e., on frequencies. Careful

observations led again and again to the assumption that

the “chances” were approached more and more by the

empirical ratios of the frequencies. This conception—

which was definitely favored by Cournot—was fol-

lowed by more or less outspoken statements of

R. L. Ellis, and with the work of J. Venn an explicit

frequency conception of probability emerged. This

theory had a strong influence on C. S. Peirce. In respect

to probability Peirce was “more a philosopher than

a mathematician.” The theory of probability is “the

science of logic quantitatively treated.” In contrast to

today's conceptions (Section VII, 5) the first task of

probability is for him to compute (or approximate) a

probability by the frequencies in a long sequence of

observations; this is “inductive inference.” The problem considered almost exclusively in this article, the

“direct” problem, is his “probable inference.” He

strongly refutes Laplace's definition, and subjective

probability is to be excluded likewise. He has then—

understandably—great difficulty to justify or to deduce

a meaning for the probability of a single event (see

Section IV of Peirce's “Doctrine of Chances”). The

concept of probability as a frequency in Poisson,

Cournot, Ellis, Venn, and Peirce (see also Section VII,

6) appears clearly in von Mises' so-called “first postu-

late” (Section VII, 4). These ideas will be discussed

in the context of the next section.

*VII. FREQUENCY THEORY OF PROBABILITY.
RICHARD VON MISES*

*1.* As stated at the end of Section VI, the tendency

developed of using objective frequency as the basis

of probability theory. L. Ellis, J. Venn, C. S. Peirce,

K. Pearson, et al. embarked on such an empirical

definition of probability (Section VI, 9 and 3). In this

direction, but beyond them in conceptual clarity and

completeness, went Richard von Mises who published

in 1919 an article “Grundlagen der Wahrscheinlich-

keitsrechnung” (*Mathematische Zeitschrift,* 5 [1919],

52-99). Probability theory is considered as a scientific

theory in mathematical form like mechanics or

thermodynamics. Its subjects are *mass phenomena* or

*repeatable events,* as they appear in games of chance,

in insurance problems, in heredity theory, and in the

ever growing domain of applications in physics.

*2.* We remember the conception of Poisson given

in Section VI, 3. Poisson maintains that in many differ-

ent fields of experience a certain *stabilization of rela-
tive frequencies* can be observed as the number of

observations—of the same kind—increases more and

more. He considered this “Law of Large Numbers,”

as he called it, the basis of probability theory. Follow-

ing von Mises, we reserve “Law of Large Numbers”

for the Bernoulli-Poisson theorem (Sections II, and VI,

2), while the above empirical law might be denoted as

Poisson's law.

*3.* The essential feature of the probability concept

built on Poisson's Law is the following. For certain

types of events the outcome of a single observation

is (either in principle or practically) not available, or

not of interest. It may, however, be possible to consider

the single case as embedded in an ensemble of similar

cases and to obtain for this mass phenomenon mean-

ingful global statements. This coincides so far with

Venn's notion. The classical examples are, of course,

the games of chance. If we toss a die once we cannot

predict what the result will be. But if we toss it 10,000

times, we observe the emergence of an increasing con-

stancy of the six frequencies.

A similar situation appears in social problems

(observed under carefully specified conditions) such as

deaths, births, marriages, suicides, etc.; in the “random

motion” of the molecules of a gas; or in the inheritance

of Mendelian characters.

In each of these examples we are concerned with

events whose outcome may differ in one or more re-

spects: color of a certain species of flowers; shape of

the seed; number on the upper face of a die; death

or survival between age 40 and 41 within a precisely

defined group of men; components of the velocity of

a gas molecule under precise conditions, and so on.

For the mass phenomenon, the large group of flowers,

the tosses with the die, the molecules, we use provi-

sionally the term *collective* (see complete definition in

subsection 7, below), and we call *labels,* or simply

results, the mutually exclusive and exhaustive proper-

ties under observation. In Mendel's experiment of the

color of the flower of peas, the labels are the three

colors red, white, pink. If a die is tossed until the 6

appears for the first time with the number of this toss

as result, the labels are the positive integers. If the

components of a velocity vector are observed the

collective is three-dimensional.

*4.* Von Mises assumed like Poisson that to the various

kinds of repetitive events characteristic values corre-

spond which characterize them in respect to the fre-

quency of each label. Take the die experiment: putting

a die into a dice box; shaking the cup; tossing the die.

The labels are, for example, the six numbers 1, 2,...,

6 and it is assumed that there is a characteristic value

corresponding to the frequency of the event “6.” This

value is a *physical constant* of the event (it need, of

course, not be 1/6) and it is measured approximately

by the frequency of “6” in a long sequence of such

tosses and is approached more and more the longer

the sequence of observations. We call it *the probability of “6”* (Poisson says “chance”) *within the considered collective.* If the die is tossed 1,000 times within an

hour we may notice that the frequency of “6” will

no longer change in the first decimal, and if the experi-

ment is continued for ten hours, three decimals, say,

will remain constant and the fourth will change only

slightly. To get rid of the clumsiness of this statement von Mises used the concept of *limit.* If in *n* tosses the “6” has turned up *n*6 times we consider

lim *n*6/*n* for *n* → ∞ (VII.1)

as the probability of “6” in this collective. Similarly, a probability exists for the other labels. The definition (VII.1), which essentially coincides with Poisson's, Ellis', and Venn's assumptions, is often denoted as *von Mises' first postulate.* It is of the same type as the one which defines “velocity” as the limit of Δ*s*/Δ*t* as Δ*t* → 0, where Δ*s*/Δ*t* is the ratio of

the displacement of a particle to the time used for it.
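The stabilization described above is easily reproduced for an unbiased simulated die (a stand-in, of course, for the physical experiment):

```python
import random

random.seed(5)

tosses = 0
sixes = 0
# Continue one and the same sequence of tosses, reporting the running
# frequency of "6" at successive checkpoints.
for checkpoint in (1_000, 10_000, 100_000, 1_000_000):
    while tosses < checkpoint:
        tosses += 1
        if random.randint(1, 6) == 6:
            sixes += 1
    print(checkpoint, round(sixes / tosses, 4))
# The leading decimals of the frequency settle down near 1/6 ≈ 0.1667.
```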

*5.* Objections of the type that one cannot make

infinitely many tosses are beside the point. We consider

frequency as an approximate measure of the physical

constant probability, just as we measure temperature

by the extension of the mercury, or density by Δ*m*/Δ*v*

as Δ*v,* the volume of the body, decreases more and more

(containing always the point at which the density is

measured). It is true that we cannot make infinitely

many tosses. But neither do we have procedures to

construct and measure an infinitely small volume and

actually we cannot measure any physical magnitude

with absolute accuracy. Likewise, an infinitely long,

infinitely thin straight line does not “exist” in our real

world; its home is the boundless emptiness of Euclidean

space. Nevertheless, theories based on such abstract

concepts are fundamental in the study of spatial rela-

tions.

We mention a related viewpoint: as in rational

theories of other areas of knowledge it is not the task

of probability theory to ascertain by a frequency ex-

periment the probability of every conceivable event

to which the concept applies, just as the direct meas-

urement of lengths and angles is not the task of geome-

try. Given probabilities serve as the *initial data* from

which we derive new probabilities by means of the

rules of the calculus of probability. Note also that we

do not imply that in scientific theories probabilities

are necessarily *introduced* by Eq.(VII.1). The famous

probabilities 1/4, 1/2, 1/4 of the simplest case of Mendel's

theory *follow from his theory of heredity* and are then

verified (approximately) by frequency experiments. In

a similar way, other *theories,* notably in physics, *provide
theoretical probability distributions* which are then

verified either directly, or indirectly through their

consequences.

*6.* We have mentioned before that von Mises' con-

ception of a long sequence of observations of the same

kind, and even definition Eq.(VII.1), are not absolutely

new. Similar ideas had been proposed by Ellis, Venn,

and Peirce. Theories of Fechner and of Bruns are

related to the above ideas and so is G. Helm's *Proba-
bility Theory as the Theory of the Concept of Collectives*

(1902). These works did not lead to a complete theory

of probability since they failed to incorporate some

property of a “collective” which would characterize

randomness. To have attempted this is the original and

characteristic feature of von Mises' theory.

*7.* If in the throwing of a coin we denote “heads”

by 1 and “tails” by 0, the sequence of 0's and 1's will be a “random sequence.” It will exhibit an

*irregular*

appearance like 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1,...

and not look like a regular sequence as 0, 1, 0, 1, 0,

1,.... Attempting to characterize a random sequence

von Mises was led to the concept of a

*place selection.*

From an infinite sequence ω: *x*1, *x*2,... of labels an infinite subsequence ω′: *x*′1, *x*′2,... is selected by means of a rule which determines univocally for every *x*ν of ω whether or not it appears in ω′. The rule may depend on the subscript ν of *x* and on the values *x*1, *x*2,..., *x*ν