Dictionary of the History of Ideas: Studies of Selected Pivotal Ideas


#### PROBABILITY: OBJECTIVE THEORY

*I. THE BEGINNING*

*1.* Games and gambling are as old as human history. It seems that gambling, a specialty of the human species, was spread among virtually all human groups. The Rig Veda, one of the oldest known poems, mentions gambling; the Germans of Tacitus' times gambled heavily, so did the Romans, and so on. All through history man seems to have been attracted by uncertainty. We can still observe today that as soon as an “infallible system” of betting is found, the game will be abandoned or changed to beat the system.

While playing around with chance happenings is very old, attempts towards any systematic investigation were slow in coming. Though this may be how most disciplines develop, there appears to have been a particular resistance to the systematic investigation of chance phenomena, which by their very nature seem opposed to regularity, whereas regularity was generally considered a necessary condition for the scientific understanding of any subject.

The Greek conception of science was modelled after the ideal of Euclidean geometry, which is supposedly derived from a few immediately grasped axioms. It seems that this rationalistic conception limited philosophers and mathematicians well beyond the Middle Ages. Friedrich Schiller, in a poem of 1795, says of the “sage”: *Sucht das vertraute Gesetz in des Zufalls grausenden Wundern/Sucht den ruhenden Pol in der Erscheinungen Flucht* (“Seeks the familiar law in the dreaded wonders of chance/Looks for the unmoving pole in the flux of appearances”).

*2.* However, the hardened gambler, not influenced by philosophical scruples, could not fail to notice some sort of long-run regularity in the midst of apparent irregularity. The use of loaded dice confirms this.

The first “theoretical” work on games of chance is by Girolamo Cardano (Cardanus), the gambling scholar: *De ludo aleae* (written probably around 1560 but not published until 1663). Todhunter describes it as a kind of “gambler's manual.” Cardano speaks of chance in terms of the frequency of an event. His mathematics was influenced by Luca Pacioli.

A contribution by the great Galileo was likewise stimulated directly by gambling. A friend (probably the duke of Ferrara) consulted Galileo on the following problem. The sums 9 and 10 can each be produced by three dice through six different combinations, namely:

9 = 1 + 2 + 6 = 1 + 3 + 5 = 1 + 4 + 4 = 2 + 2 + 5 = 2 + 3 + 4 = 3 + 3 + 3,

10 = 1 + 3 + 6 = 1 + 4 + 5 = 2 + 2 + 6 = 2 + 3 + 5 = 2 + 4 + 4 = 3 + 3 + 4,

and yet the sum 10 appears more often than the sum 9. Galileo pointed out that in the above enumeration, for the sum 9, the first, second, and fifth combinations can each appear in 6 ways, the third and fourth in 3 ways, and the last in 1 way; hence, there are altogether 25 ways out of 216, compared to 27 for the sum 10. It is interesting that the “friend” was able to detect empirically a difference of 1/108 in the frequencies.
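Galileo's count is easy to verify by brute-force enumeration of all 216 ordered throws of three dice; a minimal sketch in Python (not part of the original article):

```python
from itertools import product

# Enumerate all 6^3 = 216 ordered outcomes of three dice and count
# how many give the sums 9 and 10.
counts = {9: 0, 10: 0}
for throw in product(range(1, 7), repeat=3):
    s = sum(throw)
    if s in counts:
        counts[s] += 1

print(counts[9], counts[10])                # 25 ways for 9, 27 ways for 10
print((counts[10] - counts[9]) / 216)       # the difference 2/216 = 1/108
```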

*3.* Of the same type is the well-known question of the Chevalier de Méré, a gambler. It was usual among gamblers to bet even money that among 4 throws of a true die the “6” would appear at least once. De Méré concluded that the same even chance should prevail for the appearance of the “double 6” in 24 throws (since 6 times 6 is 36 and 4 times 6 is 24).

*Un problème relatif aux jeux de hasard, proposé à un austère Janséniste par un homme du monde, a été l'origine du calcul des probabilités* (“A problem in games of chance, proposed to an austere Jansenist by a man of the world, was the origin of the calculus of probability”), writes S. D. Poisson in his *Recherches sur la probabilité des jugements*... (Paris, 1837). The Chevalier's experiences with the second type of bet compared unfavorably with those in the first case. Putting the problem to Blaise Pascal, he accused arithmetic of unreliability. Pascal writes on this subject to his friend Pierre de Fermat (29 July 1654):

*Voilà quel était son grand scandale qui lui faisait dire hautement que les propositions [proportions (?)] n'étaient pas constantes et que l'arithmétique se démentait* (“This was for him a great scandal which made him say loudly that the propositions [proportions (?)] are not constant and that arithmetic is self-contradictory”).

Clearly, this problem is of the same type as that of Galileo's friend. Again, the remarkable feature is the gambler's accurate observation of the frequencies. Pascal's computation might have run as follows. There are 6^4 = 1296 different combinations of six signs *a, b, c, d, e, f* in groups of four. Of these, 5^4 = 625 contain no “*a*” (no “6”) and, therefore, 1296 − 625 = 671 contain at least one “*a*,” and 671/1296 = 0.518 = *p*1 is the probability for the first bet. A similar computation gives for the second bet *p*2 = 0.491, indeed smaller than *p*1.
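Both probabilities follow from the complement rule; a short sketch (my code, not the article's):

```python
# Chevalier de Méré's two bets, computed as Pascal might have:
# p1: at least one "6" in 4 throws of one die;
# p2: at least one double-6 in 24 throws of two dice.
p1 = 1 - (5 / 6) ** 4          # = 671/1296
p2 = 1 - (35 / 36) ** 24

print(round(p1, 3), round(p2, 3))  # 0.518 and 0.491
```

The even-money bet is favorable in the first case and unfavorable in the second, exactly as the Chevalier observed.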

Both Fermat and Pascal, just as Galileo had before them, found it natural to base their reasoning on observed frequencies. They were interested in the answers to actual problems and created the simplest “theory” which was logically sound and explained the observations.

*4.* Particularly instructive is another problem extensively discussed in the famous correspondence between the two eminent mathematicians, the *problème des parties* (“problem of points”), which relates to the question of the just division of the stake between players if they decide to quit at a moment when neither has definitely won. Take a simple case. Two players, A and B, quit at a moment when A needs two points and B three points to win. Then, reasons Pascal, the game will certainly be decided in the course of four more “trials.” He writes down explicitly the combinations which lead to the winning of A, namely *aaaa, aaab, aabb.* Here, *aaab* stands for four different arrangements, namely *aaab, aaba,*... and similarly *aabb* stands for six different arrangements. Hence, 1 + 4 + 6 = 11 arrangements out of 16 lead to the winning of A and 5 to that of B. The stake should, therefore, be divided in the ratio 11:5. (It is worth mentioning that mathematicians like Roberval and d'Alembert doubted Pascal's solution.)
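Pascal's 11:5 division can be confirmed by listing the 2^4 equally likely continuations; a sketch:

```python
from itertools import product

# A needs 2 more points, B needs 3; the game is settled within 4 trials.
# Enumerate all 2^4 equally likely continuations ('a' = point for A).
a_wins = sum(1 for seq in product("ab", repeat=4) if seq.count("a") >= 2)
b_wins = 16 - a_wins

print(a_wins, b_wins)  # 11 and 5, hence the division 11:5
```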

The same results were obtained in a slightly different way by Fermat. The two greatest mathematicians of their time, Pascal and Fermat, exchanged their discoveries in undisturbed harmony. In the long letter quoted above, Pascal wrote to Fermat: *Je ne doute plus maintenant que je suis dans la vérité après le rencontre admirable où je me trouve avec vous.... Je vois bien que la vérité est la même à Toulouse et à Paris* (“I do not doubt any longer that I have the truth after finding ourselves in such admirable agreement.... I see that truth is the same in Toulouse and in Paris”). In connection with such questions Pascal and Fermat studied combinations and permutations (Pascal's *Traité du triangle arithmétique,* 1664) and applied them to various problems.

*5.* We venture a few remarks regarding the ideas on probability of the great philosophers of the seventeenth century. “Probability is likeness to be true,” says Locke. “The grounds of it are, in short, these two following. First, the conformity of anything with our knowledge, observation, and experience. Secondly, the testimony of others” (*Essay concerning Human Understanding,* Book IV). This is the empirical viewpoint, a viewpoint suggested by the observation of gambling results as well as of deaths, births, and other social happenings. “But,” continues Keynes, “in the meantime the subject had fallen in the hands of the mathematicians and an entirely new method of approach was in course of development. It had become obvious that many of the judgments of probability, which we, in fact, make do not depend upon past experience in a way which satisfied the canon laid down by the logicians of Port Royal and by Locke” (*La logique ou l'art de penser*..., by A. Arnauld, Pierre Nicole, and others, 1662, called the “Port Royal Logic”). As we have seen, in order to explain observations, the mathematicians created a theory *based on the counting of combinations.* The decisive assumption was that the observed frequency of an event (e.g., of the “9” in Galileo's problem) be proportional to the corresponding relative number of combinations (there, 25/216).

*6.* We close our description of the first steps in probability calculus with one more really great name, though his fame was not due to his contributions to our subject: Christian Huygens. Huygens heard of these problems but had difficulty in obtaining reliable information about them and about the methods of the two French mathematicians. Eventually, Carcavi sent him the data as well as Fermat's solution. Fermat even posed to Huygens further problems which Huygens worked out and later included as exercises in a work of his own. In this work, *De ratiociniis in ludo aleae* (“On reasoning in games of chance”) of 1657, he organized all he knew about the new subject. At the end of the work he included some questions without indicating the method of solution. “It seems useful to me to leave something for my readers to think about (if I have any readers) and this will serve them both as exercises and as a way of passing the time.” Jakob (James) Bernoulli gave the solutions and included them in his *Ars conjectandi.* The work of Huygens remained for half a century *the* introduction to the “Calculus of Probability.”

*7.* A related type of investigation concerned mortality and annuities. John Graunt started using the registers of deaths kept in London since 1592, and particularly during the years of the great plague. He used his material to make forecasts on population trends (*Natural and Political Observations... upon the Bills of Mortality,* 1661). He may well be considered as one of the first statisticians.

John de Witt, grand pensionary of Holland, wrote on similar questions in 1671 but the precise content of his work is not known. Leibniz was supposed to have owned a copy and he was repeatedly asked by Jakob Bernoulli (but without success) to let him see it.

The year 1693 is the date of a remarkable work by the astronomer Edmond Halley which deals with life statistics. Halley noticed also the regularity of the “boys' rate” (percentage of male births) and other constancies. He constructed a mortality table, based on “Bills of Mortality” for the city of Breslau, and a table of the values of an annuity for every fifth year of age up to the seventieth.

The application of “chance” in such different domains as games of chance (which received dignity through the names of Pascal, Fermat, and Huygens) and mortality impressed the scientific world. Leibniz himself appreciated the importance of the new science (as seen in his correspondence with Jakob Bernoulli). However, he did not contribute to it and he objected to some of his correspondent's ideas.

*II. JAKOB BERNOULLI AND THE
LAW OF LARGE NUMBERS*

*1.* The theory of probability consists, on the one hand, of the consideration and formulation of problems, including techniques for solving them, and on the other hand, of general theorems. It is the latter kind which is of primary interest to the historian of thought. The intriguing aspect of some of these theorems is that starting with probabilistic assumptions we arrive at statements of practical certainty. Jakob Bernoulli was the first to derive such a theorem and it will be worthwhile to sketch the main lines of argument, using, however, modern terminology in the interest of expediency.

*2.* We consider a binary alternative (coin tossing; “ace” or “non-ace” with a die; etc.), to this day called a Bernoulli trial. If *q* is the “probability of success,” *p* = 1 − *q* that of “failure,” then the probability of *a* successes followed by *b* failures in *a* + *b* trials performed with the same die is q^a p^b. This result follows from multiplication laws of independent probabilities already found and applied by Pascal and Fermat. The use of laws of addition and multiplication of probabilities is a step beyond the mere counting of combinations. It is based on the realization that a calculus exists which parallels and reflects the observed relations between frequencies.

The above probability q^a p^b holds for any pattern of *a* successes and *b* failures: fssfffsf.... Lumping together all of these, writing *x* for *a* and *a* + *b* = *n,* we see that *the probability* p_n(x) *of* x *successes and* n − x *failures regardless of pattern* is

p_n(x) = (n choose x) q^x p^(n−x),  x = 0, 1, 2, …, n,  (II.1)

where (n choose x) is the number of combinations of *n* things in groups of *x,* and the sum of all p_n(x) is 1.

Often we are more interested in the relative number *z* = *x*/*n,* the *frequency* of successes. Then

p_n(x) = p′_n(z) = (n choose nz) q^(nz) p^(n(1−z)).

This p′_n(z) (that is, the function that gives to every abscissa *z* the ordinate p′_n(z)) has a maximum at a point z_m, called the *mode,* and z_m is equal to or very close to *q.* In the vicinity of z_m the p′_n(z), as a function of *n,* becomes steeper as *n* increases.
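Formula (II.1) and the location of the mode near *q* can be checked numerically; a short sketch, with n = 60 and q = 1/6 as illustrative values of my own choosing:

```python
from math import comb

def p_n(x, n, q):
    """Probability of x successes in n Bernoulli trials, Eq. (II.1)."""
    return comb(n, x) * q**x * (1 - q) ** (n - x)

n, q = 60, 1 / 6
probs = [p_n(x, n, q) for x in range(n + 1)]

assert abs(sum(probs) - 1) < 1e-9              # the p_n(x) sum to 1
mode = max(range(n + 1), key=lambda x: probs[x])
print(mode / n)  # the mode z_m = 10/60, exactly q = 1/6 here
```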

*3.* It was Bernoulli's first great idea to consider increasing values of *n* and a narrow neighborhood of *q* or, in other words, to investigate the behavior of p′_n(z) in the neighborhood of z = q as *n* increases; this he did at a time when the interest in the “very large” and the “very small” was just awakening. Secondly, he realized that we are not really interested in the value of p′_n(z) for any particular value *z* but rather in the total probability belonging to all *z*'s in an interval. This interval was to contain *q* which, as we remember, is our original success probability and at the same time (for large *n*) close to the mode of p′_n(z) and likewise to its so-called “mean value.”

Now, with ε a very small number, we call P_n the probability that *z* lies between *q* − ε and *q* + ε, or, what is the same, that *x* = *nz* lie between nq − nε and nq + nε. For this P_n one obtains easily the estimate

P_n ≥ 1 − qp/(nε²).  (II.2)

And from this follows immediately the fundamental property of P_n:

P_n → 1 as n → ∞.

This result can be expressed in words:

*Let q be a given success probability in a single trial; n trials are performed with the same q and under conditions of independence. Then, no matter how small an ε is chosen, as the number n of repetitions increases indefinitely, the probability P_n that the frequency of success lie between q − ε and q + ε approaches 1.* (See *Ars conjectandi,* Basel [1713], Part IV, pp. 236-37.)

The above theorem expresses a property of “condensation,” namely that with increasing *n* an increasing proportion of the total probability (which equals 1) is concentrated in a fixed neighborhood of the original *q.* The term “probability” as used by Bernoulli in his computations is always a *ratio* of the number of cases favorable to an occurrence to the number of all possible cases. About this great theorem, called today the “Bernoulli Theorem,” Bernoulli said: “... I had considered it closely for a period of twenty years, and it is a problem the novelty of which, as well as its high utility together with its difficulty, adds importance and weight to all other parts of my doctrine” (ibid.). The three other parts of the work are likewise very valuable (but perhaps less from a conceptual point of view). The second presents the doctrine of combinations. (In this part Bernoulli also introduces the polynomials which carry his name.)

*4.* It will be no surprise to the historian of thought that the admiration we pay to Bernoulli, the mathematician, is not based on his handling of the conceptual situation. In addition to the above-explained use of a quotient for a mathematical probability his views are of the most varied kind, and, obviously, he is not conscious of any possible contradiction: “Probability calculus is a general logic of the uncertain.... Probability is a degree of certainty and differs from certainty as the part from the whole.... Of two things the one which owns the greater part of certainty will be the more probable.... We denote as *ars conjectandi* the art of measuring (*metiendi*) the probability of things as precisely as possible.... We estimate the probabilities according to the number and the weight (*vis probandi*) of the reasons for the occurrence of a thing.” As to this certitude of which probability is a part he explains that “the certitude of any thing can be considered *objectively* and in this sense it relates to the actual (present, past, or future) existence of the thing... or *subjectively* with respect to ourselves and in this sense it depends on the amount of our knowledge regarding the thing,” and so on. This vagueness is in contrast to the modern viewpoint in which, however, conceptual precision is bought, sometimes too easily, by completely rejecting uncongenial interpretations.

*5.* There appears in Bernoulli's work another conceptual issue which deals with the dichotomy between the so-called *direct* and *inverse* problem. The first one is the type considered above: we know the probability *q* and make “predictions” about future observations. In the *inverse* problem we tend to establish from an observed series of results the parameters of the underlying process, e.g., to establish the imperfection of a die. (The procedures directed at the inverse problem are today usually handled in mathematical statistics rather than in probability theory proper.) Bernoulli himself states that his theorem fails to give results in very important cases: in the study of games of skill, in the various problems of life-statistics, in problems connected with the weather: problems where results “depend on unknown causes which are interconnected in unknown ways.”

It is a measure of Bernoulli's insight that he not only recognized the importance of the inverse problem but definitely planned (ibid., p. 226) to establish for this problem a theorem similar to the one we formulated above. This he did not achieve. It is possible that he hoped to give a proof of the inverse theorem and that death intercepted him (Bernoulli's *Ars conjectandi* was unfinished at the time of his death and was published only in 1713); or that he was discouraged by critical remarks of Leibniz regarding inference. It may also be that he did not distinguish with sufficient clarity between the two types of problems. For most of his contemporaries such a distinction did not exist at all; actually, even an appropriate terminology was lacking. We owe the first solid progress concerning the inverse problem to Thomas Bayes. (See Section IV.)

The Bernoulli theorem forms today the very simplest case of the Laws of Large Numbers (see e.g., R. von Mises [1964], Ch. IV). The names Poisson, Tchebychev, Markov, Khintchine, and von Mises should be mentioned in this connection. These theorems are also called “weak” laws of large numbers in contrast to the more recently established “strong” laws of large numbers (due to Borel, Cantelli, Hausdorff, Khintchine, and others). The strong laws are mainly of mathematical interest.

*III. ABRAHAM DE MOIVRE AND THE
CENTRAL LIMIT THEOREM*

*1.* Shortly after the death of Jakob Bernoulli but before the publication (1713) of his posthumous work, books of two important mathematicians, P. R. Montmort (1678-1719) and A. de Moivre (1667-1754), appeared. These were Montmort's *Essai d'analyse sur les jeux de hasard* (1708 and 1713) and de Moivre's *De mensura sortis*... (1711) and the *Doctrine of Chances* (1718 and 1738). We limit ourselves to a few words on the important work of de Moivre.

De Moivre, the first of the great analytic probabilists, was, as a mathematician, superior to both Jakob Bernoulli and Montmort. In addition he had the advantage of being able to use the ideas of Bernoulli and the algebraic powers of Montmort, which he himself then developed to an even higher degree. A charming quotation, taken from the *Doctrine of Chances,* might be particularly appreciated by the secretary. “For those of my readers versed in ordinary arithmetic it would not be difficult to make themselves masters, not only of the practical rules in this book but also of more useful discoveries, if they would take the small pains of being acquainted with the bare notation of algebra, which might be done in the hundredth part of the time that is spent in learning to read shorthand.”

*2.* In probability proper de Moivre did basic work on the “duration of a game,” on “the gambler's ruin,” and on other subjects still studied today. Of particular importance is his extension of Bernoulli's theorem, which is really much more than an extension. In Section II, 3 we called P_n the sum of the 2*r* + 1 middle terms of p_n(x) where *r* = *n*ε and p_n(x) is given in Eq.(II.1). In Eq.(II.2) we gave a very simple estimate of P_n. (Bernoulli himself had given a sharper one but it took him ten printed pages of computation, and to obtain the desired result the estimate Eq.(II.2) suffices.)

De Moivre, who had a deep admiration for Bernoulli and his theorem, conceived the very fruitful idea *of evaluating P_n directly for large values of n,* instead of estimating it by an inequality. For this purpose one needs an *approximation formula for the factorials of large numbers.* De Moivre derived such a formula, which coincides essentially with the famous *Stirling formula.* He then determined P_n “by the artifice of mechanical quadrature.” He computed particular values of his asymptotic formula for P_n correct to five decimals. We shall return to these results in the section on Laplace. Under the name of the *de Moivre-Laplace formula,* the result, most important by itself, became the starting point of intensive investigations and far-reaching generalizations which led to what is called today the central limit theorem of probability calculus (Section VIII). I. Todhunter, whose work *A History of the Mathematical Theory of Probability*... (1865) ends, however, with Laplace, says regarding de Moivre: “It will not be doubted that the theory of probability owes more to him than to any other mathematician with the sole exception of Laplace.” Our discussion of the work of this great mathematician is comparatively brief since his contributions were more on the mathematical than on the conceptual side. We mention, however, one more instance whose conceptual importance is obvious: de Moivre seems to have been the first to denote a probability by one single letter (like *p* or *q,* etc.) rather than as a quotient of two integers.
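The quality of such a factorial approximation is easy to examine; a sketch using the Stirling form n! ≈ √(2πn)(n/e)^n (the error bound 1/(12n) quoted in the comment is the standard first correction term, not a claim of the article):

```python
from math import factorial, pi, e, sqrt

def stirling(n):
    """Stirling's approximation: n! is about sqrt(2*pi*n) * (n/e)**n."""
    return sqrt(2 * pi * n) * (n / e) ** n

# The approximation always falls slightly short of n!, and the
# relative error shrinks roughly like 1/(12n).
for n in (5, 10, 20):
    rel_err = 1 - stirling(n) / factorial(n)
    print(n, rel_err)
```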

*IV. THOMAS BAYES AND
INVERSE PROBABILITY*

*1.* Bayes (1702-61) wrote two basic memoirs, both published posthumously, in 1763 and 1765, in Vols. 13 and 14 of the *Philosophical Transactions of the Royal Society of London.* The title of the first one is: “An Essay Towards Solving a Problem in the Doctrine of Chances” (1763). A facsimile of both papers (and of some other relevant material) was issued in 1940 in Washington, edited by W. E. Deming and E. C. Molina. The following is from Molina's comments: “In order to visualize the year 1763 in which the essay was published let us recall some history.... Euler, then 56 years of age, was sojourning in Berlin under the patronage of Frederick the Great, to be followed shortly by Lagrange, then 27; the Marquis de Condorcet, philosopher and mathematician who later applied Bayes's theorem to problems of testimony, was but 20 years old.... Laplace, a mere boy of 14, had still 11 years in which to prepare for his *Mémoires* of 1774, embodying his first ideas on the “probability of causes,” and had but one year short of half a century to bring out the first edition of the *Théorie analytique des probabilités* (1812) wherein Bayes's theorem blossomed forth in its most general form.” (See, however, the end of this section.)

*2.* We explain first the concept of *conditional probability* introduced by Bayes. Suppose that of a certain group of people 90% = P(A) own an automobile and 9% = P(A,B) own an automobile and a bicycle. We call P(B|A) the conditional probability of owning a bicycle for people who are known to own also a car. If P(A) ≠ 0, then

P(B|A) = P(A,B) / P(A)  (IV.1)

is *by definition the conditional probability of B given A.* (This will be explained further in Section VII, 9.) In our example

P(B|A) = (9/100) / (90/100) = 1/10;

hence, P(B|A) = 1/10. We may write (IV.1) as

P(A,B) = P(A) · P(B|A).  (IV.2)

The *compound probability* of owning both a car and a bicycle equals the probability of owning a car times the conditional probability of owning a bicycle, given that the person owns a car. Of course, the set AB is a subset of the set A.

*3.* We try now to formulate some kind of inverse to a Bernoulli problem. (The remainder of this section may not be easy for a reader not schooled in mathematical thinking. A few rather subtle distinctions will be needed; however, the following sections will again be easier.) Some game is played *n* times and n1 “successes” (e.g., n1 “aces” in *n* tossings of a die) are observed. We consider now as known the numbers *n* and n1 (more generally, the statistical result) and would like to *make some inference* regarding the unknown success-chance of “ace.” It is quite clear that if we know nothing but *n* and n1 and if these numbers are small, e.g., n = 10, n1 = 7, we cannot make any inference. Denote by w_n(x, n1) the compound probability that the die has ace-probability *x* and gave n1 successes out of *n.* Then the conditional probability of *x,* given n1, which we call q_n(x|n1), equals by (IV.1):

q_n(x|n1) = w_n(x, n1) / ∫₀¹ w_n(x, n1) dx.  (IV.3)

Here, *x* is taken as a continuous variable, i.e., it can take any value between 0 and 1. The ∫₀¹ w_n(x, n1) dx is our P(A). It is to be replaced by Σ_x w_n(x, n1) if *x* is a discrete variable which can, e.g., take on only one of the 13 values 0, 1/12, 2/12, ..., 11/12, 1.

Let us analyze w_n(x, n1). With the notation of Eq.(II.1) we obtain

p_n(n1|x) = (n choose n1) x^n1 (1 − x)^(n−n1),

the conditional probability of n1, given that the success chance (e.g., the chance of ace) has the value *x.* Therefore,

w_n(x, n1) = v(x) p_n(n1|x).  (IV.4)

Here v(x) is the *prior* probability or prior chance, the chance (prior to the present statistical investigation) that the ace-probability has the value *x.* Substituting (IV.4) into (IV.3) we have

q_n(x|n1) = v(x) p_n(n1|x) / ∫₀¹ v(x) p_n(n1|x) dx,  (IV.5)

where, dependent on the problem, the integral in the denominator may be replaced by a sum. This is Bayes's “inversion formula.” If we know v(x) and p_n(n1|x) we can compute q_n(x|n1). Clearly, we have to have some knowledge of v(x) in order to evaluate Eq.(IV.5). We note also that the problem must be such that *x* is a *random variable,* i.e., *that the assumption of many possible x's which are distributed in a probability distribution* makes sense (compare end of Section IV, 6, below).
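The discrete form of (IV.5), with the sum replacing the integral, can be sketched on the 13-point grid mentioned above; the uniform prior and the counts n = 60, n1 = 10 are illustrative choices of mine:

```python
from math import comb

def posterior(n, n1, prior):
    """Discrete form of Bayes's inversion formula (IV.5).

    prior: dict mapping each admissible success chance x to v(x).
    Returns q_n(x | n1) as a dict over the same grid.
    """
    lik = {x: comb(n, n1) * x**n1 * (1 - x) ** (n - n1) for x in prior}
    norm = sum(prior[x] * lik[x] for x in prior)
    return {x: prior[x] * lik[x] / norm for x in prior}

grid = [i / 12 for i in range(13)]       # 0, 1/12, ..., 11/12, 1
v = {x: 1 / 13 for x in grid}            # constant prior v(x)
q = posterior(n=60, n1=10, prior=v)      # 10 "aces" observed in 60 throws

best = max(q, key=q.get)
print(best)  # posterior mode at x = 2/12 = 1/6, the observed frequency 10/60
```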

*4.* In some problems *it may be justified to assume that v(x) be constant,* i.e., *that v has the same value for all x.* (This was so for the geometric problem which Bayes himself considered.) Boole spoke of this assumption as of a case of “equal distribution of ignorance.” This is not an accurate denotation since often this assumption is made not out of ignorance but because it seems adequate. R. A. Fisher argued with much passion against “Bayes's principle.” However, Bayes did not have any such principle. He did not start with a general formula Eq.(IV.5) and then apply a “principle” by which v(x) could be neglected. He correctly solved a particular problem. The general formula, Eq.(IV.5), is due to Laplace.

How about the v(x) in our original example? Here, for a body which behaves and looks halfway like a die, the assumption of constant v(x) makes no sense. If, e.g., we bought our dice at Woolworth's we might take v(x) as a curve which differs from 0 only in the neighborhood of x = 1/6. If we suppose a loaded die another v(x) may be appropriate. The trouble is, of course, that sometimes we have no way of knowing anything about v(x). Before continuing our discussion we review the facts found so far regarding Bayes: (a) he was the first to introduce and use conditional probability; (b) he was the first to formulate correctly and solve a problem of inverse probability; (c) he did not consider the general problem Eq.(IV.5).

*5.* Regarding v(x) we may summarize as follows: (a) if we *can* make an adequate assumption for v(x) we can compute q_n(x|n1); (b) if we ignore v(x) and have no way to assume it and *n* is a small or moderate number we cannot make an inference; (c) Laplace has proved (Section V, 6) that *even if we do not know v(x) we can make a valid inference if n is large* (and certain mathematical assumptions for v(x) are known to hold). This is not as surprising as it may seem. Clearly, if we toss a coin 10 times and heads turns up 7 times and we know nothing else about the coin, an inference regarding the unknown *q* of this coin is unwarranted. If, however, 7,000 heads out of 10,000 turn up then, even if this is all we know, the inference that *q* > 1/2 and not very far from 0.7 is very probable. The proof of (c) is really quite a simple one (see von Mises [1964], pp. 339ff.) but we cannot give it here. We merely state here the most important property of the right-hand side of Eq.(IV.5), writing now q_n(x) instead of q_n(x|n1). Independently of v(x), q_n(x) *shows the property of condensation* as *n* increases more and more: a condensation about the observed success frequency n1/n = *r.* Indeed the following theorem holds:

*If the observation of an n times repeated alternative has shown a frequency r of success, then, if n is sufficiently large, the probability for the unknown success-chance to lie between r − ε and r + ε is arbitrarily close to unity.*

This is called *Bayes's theorem,* clearly a kind of converse of Bernoulli's theorem, the observed *r* playing here the role of the theoretical *q.*

*6.* We consider a closely related problem which

aroused much excitement. Suppose we are in a situa-

tion *where we have the right to assume that v*(*x*) =

constant *holds,* and we know the numbers *n* and *n*1.

By some additional considerations we can then compute the ace-probability *P* itself *as inferred from these data* (not only the probability *q*n(*x*) that *P* has a certain value *x*), and we find that *P* equals (*n*1 + 1)/(*n* + 2), and correspondingly 1 − *P* = (*n* − *n*1 + 1)/(*n* + 2). This formula for *P* is called *Laplace's rule of succession,*

and it gives well-known senseless results if applied in

an unjustified way. Keynes in his treatise (p. 82) says:

“No other formula in the alchemy of logic has exerted

more astonishing powers. It has established the exist-

ence of God from the basis of total ignorance and it

has measured precisely the probability that the sun

will rise tomorrow.” This magical formula must be

qualified. First of all, if *n* is small or moderate we may use the formula *only if we have good reason to assume a constant prior probability.* And then it is correct. A general “Principle of Indifference” is not a “good reason.” Such a “principle” states that in the absence of

any information one value of a variable is as probable

as another. However, no inference can be based on

ignorance. Second, if *n* and *n*1 are both large, then indeed *the influence of the a priori knowledge vanishes* and we need no principle of indifference to justify the

formula. One can, however, still manage to get sense-

less results if the formula is applied to events that are

not random events, for which, therefore, the reasoning

and the computations which lead to it are not valid.

This remark concerns, e.g., the joke—coming from

Laplace it can only be considered as a joke—about

using the formula to compute the “probability” that

the sun will rise tomorrow. The rising of the sun does

not depend on chance, and our trust in its rising to-

morrow is founded on astronomy and not on statistical

results.
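The rule of succession itself is a one-line computation. The sketch below (function name ours) also shows how the correction to the raw frequency *n*1/*n* fades as *n* grows, in line with the second qualification above:

```python
from fractions import Fraction

def rule_of_succession(n, n1):
    """Laplace's rule: inferred success probability (n1 + 1) / (n + 2).
    Valid only under a constant (uniform) prior probability."""
    return Fraction(n1 + 1, n + 2)

# Small n: the prior matters and the rule must not be applied blindly.
print(rule_of_succession(10, 7))                     # 2/3, noticeably below 0.7
# Large n: the correction is negligible and P is close to n1/n.
print(float(rule_of_succession(10_000, 7_000)))
```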

*7.* We finish with two important remarks. (a) The

idea of inference or inverse probability, the subject of

this section, is not limited to the type of problems

considered here. In our discussion, *p*n(*n*1|*x*) was (*n* choose *n*1) *x*^*n*1 (1 − *x*)^(*n*−*n*1), but formulas like Eq.(IV.5) *can be used for drawing inferences on the value of an unknown parameter from v*(*x*) *and some p*n *for the most varied p*n. This is done in *the general theory of inference*

which, according to Richard von Mises and many others, finds a sound basis in the methods explained here (Mises [1964], Ch. X). The ideas have also entered “subjective” probability under the label “Bayesian”

(Lindley, 1965). Regarding the unknown *v*(*x*) we say: (i) if *n* is large the influence of *v*(*x*) vanishes in most problems; (ii) if *n* is small and *v*(*x*) unknown it may still be possible to make some well-founded assumption regarding *v*(*x*) using “past experience” (von Mises [1964], pp. 498ff.). If no assumption is possible then no inference can be made. (The problem considered here was concerned with the posterior chance that the unknown “ace-probability” has a certain value *x* or falls in a certain interval. There are, however, other problems where such an approach is not called for and where—similarly as in subsection 6—we mainly want a good *estimate* of the unknown magnitude on the basis of the available data. To reach this aim many different methods exist. R. A. Fisher advanced the “maximum likelihood” method which has valuable properties. In our example, the “maximum likelihood estimate” equals *n*1/*n,* i.e., the observed frequency.)

(b) Like the Bernoulli-de Moivre-Laplace theorem

the Bayes-Laplace theorem has found various exten-

sions and generalizations. Von Mises also envisaged

wide generalizations of both types of Laws of Large

Numbers based on his theory of Statistical Functions

(von Mises [1964], Ch. XII).

*V. PIERRE SIMON, MARQUIS DE LAPLACE:
HIS DEFINITION OF PROBABILITY, LIMIT
THEOREMS, AND THEORY OF ERRORS*

*1.* It has been said that Laplace was not so much

an originator as a man who completed, generalized,

and consummated ideas conceived by others. Be this

as it may, what he left is an enormous treasure. In his

*Théorie analytique des probabilités* (1812) he used the

powerful tools of the new, rapidly developing analysis.

(The elements of probability calculus—addition, mul-

tiplication, division—were by that time firmly estab-

lished.) Not all of his mathematical results are of equal

interest to the historian of thought.

*2.* We begin with the discussion of his well-known

*definition* of probability as the number of cases favora-

ble to an event divided by the number of all equally

likely cases. (Actually this conception had been used

before Laplace but not as a basic definition.) The

“equally likely cases” are *les cas également possibles, c'est à dire tels que nous soyons également indécis sur leur existence* (“the equally possible cases, that is, those about whose occurrence we are equally undecided”; *Essai philosophique,* p. 4). Thus, for

Laplace, “equally likely” means “equal amount of

indecision,” just as in the notorious “principle of

indifference” (Section IV, 6). In this definition, the

feeling for the empirical side of probability, appearing

at times in the work of Jakob Bernoulli, strongly in

that of Hume and the logicians of Port Royal, seems

to have vanished. The main respect in which the

definition is insufficient is the following. The counting

of equally likely cases works for simple games of

chance (dice, coins). It also applies to important prob-

lems of biology and—surprisingly—of physics. But for

a general definition it is much too narrow as seen by

the simple examples of a biased die, of insurance prob-

abilities, and so on. Laplace himself and his followers

did not hesitate to apply the rules derived by means

of his aprioristic definition to problems like the above

and to many others where the definition failed. Also

in cases where equally likely cases can be defined,

different authors have often obtained different answers

to the same problem (this result was then called a

paradox). The reason is that the authors choose differ-

ent sets of cases as equally likely (Section VI, 8).

Laplace's definition, though not unambiguous and

not sufficiently general, fitted extensive classes of prob-

lems and drew authority from Laplace's great name,

and thus dominated probability theory for at least a

hundred years; it still underlies much of today's think-

ing about probability.

*3.* Laplace's *philosophy* of chance, as expounded in his

*Essai philosophique* is that each phenomenon in the

physical world as well as in social developments is

governed by forces of two kinds: permanent and

accidental. In an isolated phenomenon the effect of

the accidental forces may appear predominant. But,

in the long run, the accidental forces average out and

the permanent ones prevail. This is for Laplace a

consequence of Bernoulli's Law of Large Numbers.

However, while Bernoulli saw very clearly the limita-

tions of his theorem, Laplace applies it to everything

between heaven and earth, including the “favorable

chances tied with the eternal principles of reason,

justice and humanity” or “the natural boundaries of

a state which act as permanent causes,” and so on.

*4.* We have previously mentioned Laplace's contri-

butions to both Bernoulli's and Bayes's problems. It

was de Moivre's (1733) fruitful idea to evaluate *Pn*

(Section III, 2) directly for large *n.* There is no need

to discuss here the precise share of each of the two

mathematicians in the *De Moivre-Laplace formula.*

Todhunter calls this result “one of the most important

in the whole range of our subject.” Hence, for the sake

of those of our readers with some mathematical

schooling we put down the formula. *If a trial where p*(0) = *p,* *p*(1) = *q,* *p* + *q* = 1, *is repeated n times, where n is a large number, then the probability* Pn *that the number x of successes be between*

*nq* − δ√(*npq*) and *nq* + δ√(*npq*)

*or, what is the same, that the frequency z = x/n of success be between*

*q* − δ√(*pq*/*n*) and *q* + δ√(*pq*/*n*) (V.1′)

*equals asymptotically the sum of the two terms of Eq.(V.2).* Here, the first term, for which we also write 2Φ(δ), is twice the famous *Gauss integral* or, if δ is considered variable, the celebrated *normal distribution function.* For fairly large *n* the second term of Eq.(V.2) can be neglected and the first term comes even for moderate values of δ very close to unity (e.g., for δ = 3.5 it equals 1 up to five decimals). *The limits in Eq.(V.1′) can be rendered as narrow as we please by taking n sufficiently large, and* Pn *will always be larger than* 2Φ(δ).

This is the first of the famous *limit theorems of
probability calculus.* Eq.(V.2) exhibits the phenomenon

of *condensation* (Sections II and IV) about the midpoint, here the mean value, which means that *a probability arbitrarily close to 1 is contained in an arbitrarily narrow neighborhood of the mean value.* The present

result goes far beyond Bernoulli's theorem in sharpness

and precision, but conceptually it expresses the same

properties.
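As a numeric illustration, not the article's own computation, one may compare the exact binomial probability with its normal approximation; the sketch below writes the approximation in the standard-normal convention, under which 2Φ(δ) = erf(δ/√2):

```python
import math

def binomial_within(n, q, delta):
    """Exact probability that the success count x lies within
    n*q ± delta*sqrt(n*p*q), for success chance q (0 < q < 1), p = 1 - q."""
    p = 1 - q
    half = delta * math.sqrt(n * p * q)
    lo, hi = n * q - half, n * q + half
    # log binomial probabilities via lgamma, to avoid overflow
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(p)
            for k in range(n + 1)]
    return sum(math.exp(L) for k, L in enumerate(logs) if lo <= k <= hi)

n, q, delta = 1000, 0.5, 2.0
exact = binomial_within(n, q, delta)
approx = math.erf(delta / math.sqrt(2))   # 2Φ(δ) for the standard normal Φ
print(round(exact, 4), round(approx, 4))  # the two values nearly agree
```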

*5.* Thus, the distribution of the number *x* of successes

obtained by repetition of a great number of binary

alternatives is asymptotically a normal curve. As pre-

viously indicated more general theorems of this type

hold. If, as always, we denote success by 1, failure by

0, then *x* = *x*1 + *x*2 + ... + *x*n, where each *xi* is either

0 or 1. It is then suggestive to study also cases where

*x*1, *x*2,..., *x*n are not as simple as in the above problem (Section VIII, 2).

*6.* We pass to Laplace's limit theorem for Bayes's

problem. Set (Section IV, 3) *Q*n(*x*) = ∫ from 0 to *x* of *q*n(*t*|*n*1) *dt*; let *n* tend towards infinity while *n*1/*n* = *r* is kept fixed. The difference *Q*n(*x*2) − *Q*n(*x*1) is the probability that the object of our inference (for example, the unknown “ace”-probability) be between *x*1 and *x*2. Laplace's limit result looks similar to Eq.(V.1′) and Eq.(V.2). *The probability that the inferred value lies in the interval*

(*r* − *t*√(*r*(1 − *r*)/*n*), *r* + *t*√(*r*(1 − *r*)/*n*))

*tends to* 2Φ(*t*) *as n* → ∞. Bayes's theorem (Section IV,

5) follows as a particular case. The most remarkable

feature of this Laplace result is that *it holds independently of the prior probability.* This is proved without any sort of “principle of indifference.” This mathematical result corresponds, of course, to the fact that

any prior knowledge regarding the properties of the

die becomes irrelevant if we are in possession of a large

number of results of ad hoc observations.
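Laplace's limit can likewise be checked numerically. With a uniform prior the posterior density is proportional to *x*^*n*1(1 − *x*)^(*n*−*n*1); the sketch below (names ours, standard-normal convention for Φ) integrates it over the stated interval and compares the mass with 2Φ(*t*):

```python
import math

def posterior_interval(n, n1, t, steps=200_000):
    """Posterior mass, under a uniform prior, of the interval
    r ± t*sqrt(r(1-r)/n), where r = n1/n. The posterior is the
    Beta(n1+1, n-n1+1) density, normalized via lgamma."""
    r = n1 / n
    s = math.sqrt(r * (1 - r) / n)
    lo, hi = r - t * s, r + t * s
    a, b = n1 + 1, n - n1 + 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width   # midpoint rule
        total += math.exp(log_norm + (a - 1) * math.log(x)
                          + (b - 1) * math.log(1 - x))
    return total * width

value = posterior_interval(10_000, 3_000, 2.0)
print(round(value, 3))   # close to 2Φ(2) = erf(2/√2), about 0.954
```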

*7.* To appreciate what now follows we go back for

a moment to our introductory pages in Section I. We

said that the Greek ideal of science was opposed

to the construction of hypotheses on the basis of

empirical data. “The long history of science and phi-

losophy is in large measure the progressive emancipa-

tion of men's minds from the theory of self-evident

truth and from the postulate of complete certainty as

the mark of scientific insight” (Nagel, p. 3).

The end of the eighteenth and the beginning of the

nineteenth century saw the beginnings and development of a “theory of errors,” created by the greatest

minds of the time. A long way from the ideal of abso-

lute certitude, scientists are now ready to use observa-

tions, even inaccurate ones. Most observations which

depend on measurements (in the widest sense) *are* liable

to accidental errors. “Exact” measurements exist only

as long as one is satisfied with comparatively crude

results.

*8.* Using the most precise methods available one still

obtains small variations in the results, for example, in

the repeated measurements of the distance of two fixed

points on the surface of the earth. We assume that this

distance *has* some definite “true” value. Let us call

it *a* and it follows that the results *x*1, *x*2,... of several

measurements of the same magnitude must be incorrect

(with the possible exception of one). We call *z*1 =

*x*1 - *a, z*2 = *x*2 - *a,*... the *errors* of measurement.

These errors are considered as *random deviations*

which oscillate around 0. Therefore, there ought to

exist a *law of error,* that is a probability *w*(*z*) of a certain

error *z.*

It is a fascinating mathematical result that, by means

of the so-called “theory of elementary errors” we ob-

tain at once the form of *w*(*z*). This theory, due to Gauss,

assumes that each observation is subject to a large

number of sources of error. Their sum results in the

observed error *z. It follows then at once from the generalization of the de Moivre-Laplace result* (Section V, 5; Section VIII, 3) *that the probability of any resulting error z follows a normal or Gaussian law* *w*(*z*) = (*h*/√π)e^(−*h*²*z*²). This *h,* the so-called *measure of precision,* is not determined by this theory. The larger *h* is, the more concentrated is this curve around *z* = 0.
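The theory of elementary errors can be illustrated by simulation: summing many small independent elementary errors (here uniform, an assumption of ours chosen only for simplicity) yields an approximately Gaussian error law:

```python
import random, math

random.seed(1)

def observed_error(sources=100, scale=0.01):
    """One measurement error as the sum of many small independent
    elementary errors, each uniform on ±scale (a stand-in assumption)."""
    return sum(random.uniform(-scale, scale) for _ in range(sources))

errors = [observed_error() for _ in range(20_000)]
mean = sum(errors) / len(errors)
sd = math.sqrt(sum((z - mean) ** 2 for z in errors) / len(errors))
within_one_sd = sum(1 for z in errors if abs(z - mean) <= sd) / len(errors)
print(round(within_one_sd, 2))   # a Gaussian law predicts about 0.68
```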

*9.* The problem remains to determine *the most
probable value of x.* The famous

*method of least squares*

was advanced as a manipulative procedure by

Legendre (1806) and by Gauss (1809). Various attempts

have been made to justify this method by means of

the theory of probability, and here the priority regard-

ing the basic ideas belongs to Laplace. His method was

adopted later (1821-23) by Gauss. The last steps to-

wards today's foundation of the least squares method

are again due to Gauss.

*10.* Any evaluation of Laplace's contribution to the

history of probabilistic thought must mention his deep

interest in the applications. He realized the applica-

bility of probability theory in the most diverse fields

of man's thinking and acting. (Modern physics and

modern biology, replete with probabilistic ideas, did

not exist in Laplace's time.) In his *Mécanique céleste*

Laplace advanced probabilistic theories to explain

astronomical facts. Like Gauss he applied the theory

of errors to astronomical and geodetic operations. He

made various applications of his limit theorems. Of

course, he studied the usual problems of human statis-

tics, insurances, deaths, marriages. He considered

questions concerned with legal matters (which later

formed the main subjects of Poisson's great work). As

soon as Laplace discovered a new method, a new

theorem, he investigated its applicability. This close

connection between theory and meaningful observa-

tional problems—which, in turn, originated new theo-

retical questions—is an unusually attractive feature of

this great mind.

*VI. A TIME OF TRANSITION*

*1.* The influence of the work of Laplace may be

considered under three aspects: (a) his analytical

achievements which deepened and generalized the

results of his predecessors and opened up new avenues;

(b) his definition of probability which seemed to pro-

vide a firm basis for the whole subject; (c) in line with

the rationalistic spirit of the eighteenth century, a wide

field of applications seemed to have been brought

within the domain of reason. Speaking of probability,

*Notre raison cesserait d'être esclave de nos impressions* (“Our reason would cease to be the slave of our impressions”).

*2.* Of the contributions of the great S. D. Poisson

laid down in his *Recherches sur la probabilité des
jugements*... (1837), we mention first a generalization

of James Bernoulli's theorem (Section II). Considered

again is a sequence of binary alternatives—in terms

of repeatedly throwing a die for “ace” or “not-ace”—

Poisson abandoned the condition that all throws must

be carried out with the same or identical dice; he

allowed *a different die* to be used for each throw. If *q*(*n*) denotes *the arithmetical mean* of the first *n* ace-probabilities *q*1, *q*2,..., *q*n, then a theorem like Bernoulli's holds where now *q*(*n*) takes the place of the previously fixed *q.* Poisson denotes this result as the

Law of Large Numbers. A severe critic like J. M.

Keynes called it “a highly ingenious theorem which

extends widely the applicability of Bernoulli's result.”

To Keynes's regret the condition of independence still

remains. It was removed by Markov (Section VIII, 7).
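Poisson's generalization is easy to simulate: with a different ace-probability for every throw (the particular sequence of probabilities below is our own choice, for illustration), the observed frequency still tracks the arithmetical mean *q*(*n*):

```python
import random

random.seed(7)

# A different "die" for each throw: the ace-probabilities vary from throw to throw.
n = 100_000
qs = [0.05 + 0.25 * (i % 10) / 10 for i in range(n)]   # assumed values, cycling
hits = sum(1 for q in qs if random.random() < q)

frequency = hits / n
mean_q = sum(qs) / n                                   # Poisson's q(n)
print(frequency, mean_q)   # the two values nearly agree
```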

*3.* Ever since the time of Bernoulli one could ob-

serve the duality between the empirical aspect of

probability (i.e., frequencies) and a mathematical the-

ory, an algebra, that reflected the relations among the

frequencies. Poisson made an important step by stating

this correspondence explicitly. In the Introduction to

his work he says: “In many different fields we observe

empirical phenomena which appear to obey a certain

general law.... This law states that the ratios of

numbers derived from the observation of very many

similar events remain practically constant provided

that the events are governed partly by constant factors

and partly by variable factors whose variations are

irregular and do not cause a systematic change in a

definite direction. Characteristic values of these pro-

portions correspond to the various kinds of events. The

empirical ratios approach these characteristic values

more and more closely the greater the number of

observations.” Poisson called this law again the Law

of Large Numbers. We shall, however, show in detail

in Section VII that this “Law” and the Bernoulli-

Poisson theorem, explained above, are really two

different statements. The sentences quoted above from

Poisson's Introduction together with a great number

of examples make it clear that here Poisson has in mind

a generalization of empirical results. The “ratios” to

which he refers are the frequencies of certain events

in a long series of observations. And the “characteristic

values of the proportions” are the chances of the

events. We shall see that this is essentially the “postu-

late” which von Mises was to introduce as the

empirical basis of frequency theory (Sections VII, 2-4).

*4.* Poisson distinguished between “subjective” and

“objective” probability, calling the latter “chance,” the

former “probability” (a distinction going back to

Aristotle). “An event has by its very nature a *chance,*

small or large, known or unknown, and it has a *proba-
bility* with respect to our knowledge regarding the

event.” We see that we are relinquishing Laplace's

definition in more than one direction.

*5.* Ideas expressed in M. A. A. Cournot's beautifully

written book, *Exposition de la théorie des chances et
des probabilités* (Paris, 1843) are, in several respects

similar to those of Poisson. For Cournot probability

theory deals with certain frequency quotients which

would take on completely determined fixed values if

we could repeat the observations towards infinity. Like

Poisson he discerned a subjective and objective aspect

of probability. “Chance is objective and independent

of the mind which conceives it, and independent of

our restricted knowledge.” Subjective probability may

be estimated according to “the imperfect state of our

knowledge.”

*6.* Almost from the beginning, certainly from the

time of the Bernoullis, it was hoped that probability

would serve as a basis for dealing with problems con-

nected with the “*Sciences Morales.*” Laplace studied

judicial procedures, the credibility of witnesses, the

probability of judgments. And we know that Poisson

was particularly concerned with these questions.

Cournot made legalistic applications *aux documents statistiques publiés en France par l'Administration de la Justice* (“to the statistical documents published in France by the Administration of Justice”). A very important role in these domains of thought is to be attributed to the Belgian astronomer L. A. J. Quételet who visited Paris in 1823 and was introduced to the mathematicians of *la grande école française,* to Laplace, and, in particular, to Poisson.

Between 1823 and 1873 Quételet studied statistical

problems. His *Physique sociale* of 1869 contains the construction of the “average man” (*homme moyen*).

Keynes judged that Quételet “has a fair claim to be

regarded as the parent of modern statistical methods.”

*7.* It is beyond the scope of this article to delve

into statistics. Nevertheless, since Laplace, Poisson,

Cournot, and Quételet have been mentioned with re-

spect to such applications, we have to add the great

name of W. Lexis whose *Theorie der Massenerschei-
nungen in der menschlichen Gesellschaft* (“Theory of

Mass Phenomena in Society”) appeared in 1877. He

was perhaps the first one to attempt an investigation

whether, and to what extent, general series of observa-

tions can be compared with the results of games of

chance and to propose criteria regarding these ques-

tions. In other words, he inaugurated “theoretical sta-

tistics.” His work is of great value with respect to

methods and results.

*8.* We return to probability proper. The great pressure of criticism attached to the concept of equally likely events and actually to the “principle of insuffi-

cient reason” (or briefly “indifference principle”) on

which this concept rests (Section IV, 6). The principle

enters the classical theory in two ways: (a) in Laplace's

definition (Section V, 2) and (b) in the so-called Bayes

principle (Section IV, 4). However, distrust of the

indifference principle kept mounting. It is so easy to

disprove it. We add one particularly striking counter-

example where the results are expressed by continuous

variables.

A glass contains a mixture of wine and water and

we know that the ratio *x* = water/wine lies between

1 and 2 (at least as much water as wine and at most

twice as much water). The Indifference Principle tells

us to assume that to equal parts of the interval (1, 2)

correspond equal probabilities. Hence, the probability

of *x* to lie between 1 and 1.5 is the same as that to

lie between 1.5 and 2. Now let us consider the same

problem in a different way, namely, by using the ratio

*y* = wine/water. On the data, *y* lies between 1/2 and

1, hence by the Indifference Principle, there corre-

sponds to the interval (1/2, 3/4) the same probability as

to (3/4, 1). But if *y* = 3/4, then *x* = 4/3 = 1.333... while

before, the midpoint was at *x* = 1.5. The two results

clearly contradict each other.
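The contradiction can be made concrete by simulation (variable names ours): the same event, *x* < 1.5, receives probability 1/2 under indifference in *x* but 2/3 under indifference in *y* = 1/*x*:

```python
import random

random.seed(3)
trials = 200_000

# Indifference over x = water/wine on (1, 2): P(x < 1.5) = 1/2 by construction.
xs = [random.uniform(1, 2) for _ in range(trials)]
p_from_x = sum(1 for x in xs if x < 1.5) / trials

# Indifference over y = wine/water on (1/2, 1): the event x < 1.5 is y > 2/3,
# whose length is 1/3 of the interval of length 1/2, i.e. probability 2/3.
ys = [random.uniform(0.5, 1) for _ in range(trials)]
p_from_y = sum(1 for y in ys if 1 / y < 1.5) / trials

print(round(p_from_x, 2), round(p_from_y, 2))   # roughly 0.5 versus 0.67
```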

With all the admiration for the impressive structure

Laplace had erected—supposedly on the basis of his

definition—the question arose how the mathematicians

managed to derive from abstractions results relevant

to experience. Today we know that the valid objections

against Laplace's equally likely cases do not invalidate

the foundations of probability which are not based on

equally likely cases; we also understand better the

relation between foundations and applications.

*9.* One way to a satisfactory foundation was to

abandon the obviously unsatisfactory Laplacean de-

finition and to build a theory based on the empirical

aspect of probability, i.e., on frequencies. Careful

observations led again and again to the assumption that

the “chances” were approached more and more by the

empirical ratios of the frequencies. This conception—

which was definitely favored by Cournot—was fol-

lowed by more or less outspoken statements of

R. L. Ellis, and with the work of J. Venn an explicit

frequency conception of probability emerged. This

theory had a strong influence on C. S. Peirce. In respect

to probability Peirce was “more a philosopher than

a mathematician.” The theory of probability is “the

science of logic quantitatively treated.” In contrast to

today's conceptions (Section VII, 5) the first task of

probability is for him to compute (or approximate) a

probability by the frequencies in a long sequence of

observations; this is “inductive inference.” The problem considered almost exclusively in this article, the

“direct” problem, is his “probable inference.” He

strongly refutes Laplace's definition, and subjective

probability is to be excluded likewise. He has then—

understandably—great difficulty to justify or to deduce

a meaning for the probability of a single event (see

Section IV of Peirce's “Doctrine of Chances”). The

concept of probability as a frequency in Poisson,

Cournot, Ellis, Venn, and Peirce (see also Section VII,

6) appears clearly in von Mises' so-called “first postu-

late” (Section VII, 4). These ideas will be discussed

in the context of the next section.

*VII. FREQUENCY THEORY OF PROBABILITY.
RICHARD VON MISES*

*1.* As stated at the end of Section VI, the tendency

developed of using objective frequency as the basis

of probability theory. L. Ellis, J. Venn, C. S. Peirce,

K. Pearson, et al. embarked on such an empirical

definition of probability (Section VI, 9 and 3). In this

direction, but beyond them in conceptual clarity and

completeness, went Richard von Mises who published

in 1919 an article “Grundlagen der Wahrscheinlich-

keitsrechnung” (*Mathematische Zeitschrift,* 5 [1919],

52-99). Probability theory is considered as a scientific

theory in mathematical form like mechanics or

thermodynamics. Its subjects are *mass phenomena* or

*repeatable events,* as they appear in games of chance,

in insurance problems, in heredity theory, and in the

ever growing domain of applications in physics.

*2.* We remember the conception of Poisson given

in Section VI, 3. Poisson maintains that in many differ-

ent fields of experience a certain *stabilization of rela-
tive frequencies* can be observed as the number of

observations—of the same kind—increases more and

more. He considered this “Law of Large Numbers,”

as he called it, the basis of probability theory. Follow-

ing von Mises, we reserve “Law of Large Numbers”

for the Bernoulli-Poisson theorem (Sections II, and VI,

2), while the above empirical law might be denoted as

Poisson's law.

*3.* The essential feature of the probability concept

built on Poisson's Law is the following. For certain

types of events the outcome of a single observation

is (either in principle or practically) not available, or

not of interest. It may, however, be possible to consider

the single case as embedded in an ensemble of similar

cases and to obtain for this mass phenomenon mean-

ingful global statements. This coincides so far with

Venn's notion. The classical examples are, of course,

the games of chance. If we toss a die once we cannot

predict what the result will be. But if we toss it 10,000

times, we observe the emergence of an increasing con-

stancy of the six frequencies.

A similar situation appears in social problems

(observed under carefully specified conditions) such as

deaths, births, marriages, suicides, etc.; in the “random

motion” of the molecules of a gas; or in the inheritance

of Mendelian characters.

In each of these examples we are concerned with

events whose outcome may differ in one or more re-

spects: color of a certain species of flowers; shape of

the seed; number on the upper face of a die; death

or survival between age 40 and 41 within a precisely

defined group of men; components of the velocity of

a gas molecule under precise conditions, and so on.

For the mass phenomenon, the large group of flowers,

the tosses with the die, the molecules, we use provi-

sionally the term *collective* (see complete definition in

subsection 7, below), and we call *labels,* or simply

results, the mutually exclusive and exhaustive proper-

ties under observation. In Mendel's experiment of the

color of the flower of peas, the labels are the three

colors red, white, pink. If a die is tossed until the 6

appears for the first time with the number of this toss

as result, the labels are the positive integers. If the

components of a velocity vector are observed the

collective is three-dimensional.

*4.* Von Mises assumed like Poisson that to the various

kinds of repetitive events characteristic values corre-

spond which characterize them in respect to the fre-

quency of each label. Take the die experiment: putting

a die into a dice box; shaking the cup; tossing the die.

The labels are, for example, the six numbers 1, 2,...,

6 and it is assumed that there is a characteristic value

corresponding to the frequency of the event “6.” This

value is a *physical constant* of the event (it need, of

course, not be 1/6) and it is measured approximately

by the frequency of “6” in a long sequence of such

tosses and is approached more and more the longer

the sequence of observations. We call it *the probability of “6”* (Poisson says “chance”) *within the considered collective.* If the die is tossed 1,000 times within an

hour we may notice that the frequency of “6” will

no longer change in the first decimal, and if the experi-

ment is continued for ten hours, three decimals, say,

will remain constant and the fourth will change only

slightly. To get rid of the clumsiness of this statement von Mises used the concept of *limit.* If in *n* tosses the “6” has turned up *n*6 times we consider

lim *n*6/*n* for *n* → ∞ (VII.1)

as the probability of “6” in this collective. Similarly, a probability exists for the other labels. The definition (VII.1), which essentially coincides with Poisson's, Ellis', and Venn's assumptions, is often denoted as *von Mises' first postulate.* It is of the same type as the one which defines “velocity” as the limit of Δ*s*/Δ*t* as Δ*t* → 0, where Δ*s*/Δ*t* is the ratio of

the displacement of a particle to the time used for it.
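The stabilization described above is easily reproduced for an unbiased simulated die (a stand-in, of course, for the physical experiment):

```python
import random

random.seed(5)

tosses = 0
sixes = 0
# Continue one and the same sequence of tosses, reporting the running
# frequency of "6" at successive checkpoints.
for checkpoint in (1_000, 10_000, 100_000, 1_000_000):
    while tosses < checkpoint:
        tosses += 1
        if random.randint(1, 6) == 6:
            sixes += 1
    print(checkpoint, round(sixes / tosses, 4))
# The leading decimals of the frequency settle down near 1/6 ≈ 0.1667.
```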

*5.* Objections of the type that one cannot make

infinitely many tosses are beside the point. We consider

frequency as an approximate measure of the physical

constant probability, just as we measure temperature

by the extension of the mercury, or density by Δ*m*/Δ*v*

as Δ*v,* the volume of the body, decreases more and more

(containing always the point at which the density is

measured). It is true that we cannot make infinitely

many tosses. But neither do we have procedures to

construct and measure an infinitely small volume and

actually we cannot measure any physical magnitude

with absolute accuracy. Likewise, an infinitely long,

infinitely thin straight line does not “exist” in our real

world; its home is the boundless emptiness of Euclidean

space. Nevertheless, theories based on such abstract

concepts are fundamental in the study of spatial rela-

tions.

We mention a related viewpoint: as in rational

theories of other areas of knowledge it is not the task

of probability theory to ascertain by a frequency ex-

periment the probability of every conceivable event

to which the concept applies, just as the direct meas-

urement of lengths and angles is not the task of geome-

try. Given probabilities serve as the *initial data* from

which we derive new probabilities by means of the

rules of the calculus of probability. Note also that we

do not imply that in scientific theories probabilities

are necessarily *introduced* by Eq.(VII.1). The famous

probabilities 1/4, 1/2, 1/4 of the simplest case of Mendel's

theory *follow from his theory of heredity* and are then

verified (approximately) by frequency experiments. In

a similar way, other *theories,* notably in physics, *provide
theoretical probability distributions* which are then

verified either directly, or indirectly through their

consequences.

*6.* We have mentioned before that von Mises' con-

ception of a long sequence of observations of the same

kind, and even definition Eq.(VII.1), are not absolutely

new. Similar ideas had been proposed by Ellis, Venn,

and Peirce. Theories of Fechner and of Bruns are

related to the above ideas and so is G. Helm's *Proba-
bility Theory as the Theory of the Concept of Collectives*

(1902). These works did not lead to a complete theory

of probability since they failed to incorporate some

property of a “collective” which would characterize

randomness. To have attempted this is the original and

characteristic feature of von Mises' theory.

*7.* If in the throwing of a coin we denote “heads”

by 1 and “tails” by 0, the sequence of 0's and 1's will be a “random sequence.” It will exhibit an

*irregular*

appearance like 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1,...

and not look like a regular sequence as 0, 1, 0, 1, 0,

1,.... Attempting to characterize a random sequence

von Mises was led to the concept of a

*place selection.*

From an infinite sequence ω: *x*1, *x*2,... of labels an infinite subsequence ω′: *x*′1, *x*′2,... is selected by means of a rule which determines univocally for every *x*ν of ω whether or not it appears in ω′. The rule may depend on the subscript ν of *x* and on the values *x*1, *x*2,..., *x*ν