The mathematical notions are now at hand with which one may state the postulates of a mathematical model of a random phenomenon. Let us recall that in our heuristic discussion of the notion of a random phenomenon in section 1 we accepted the so-called “frequency” interpretation of probability, according to which the probability of an event \(E\) is a number (which we denote by \(P[E]\)). This number can be known to us only by experience as the result of a very long series of observations of independent trials of the event \(E\). (By a trial of \(E\) is meant an occurrence of the phenomenon on which \(E\) is defined.) Once a long series of trials has been observed, the probability of \(E\) represents the fraction of trials whose outcome has a description that is a member of \(E\). In view of the frequency interpretation of \(P[E]\), it follows that a mathematical definition of the probability of an event cannot tell us the value of \(P[E]\) for any particular event \(E\). Rather, a mathematical theory of probability must be concerned with the properties of the probability of an event considered as a function defined on all events. With these considerations in mind, we now give the following definition of probability.

The definition of probability as a function of events on the subsets of a sample description space of a random phenomenon: 

Given a random situation, which is described by a sample description space \(S\), probability is a function\(^{1}\) \(P[\cdot]\) that to every event \(E\) assigns a nonnegative real number, denoted by \(P[E]\) and called the probability of the event \(E\). The probability function must satisfy three axioms: 

Axiom 1. \(P[E] \geq 0\) for every event \(E\)

Axiom 2. \(P[S]=1\) for the certain event \(S\)

Axiom 3. \(P[E \cup F]=P[E]+P[F]\) , if \(E F=\emptyset\) , or in words, the probability of the union of two mutually exclusive events is the sum of their probabilities. 

It should be clear that the properties stated by the foregoing axioms do constitute a formal statement of some of the properties of the numbers \(P[E]\) and \(P[F]\), interpreted to represent the relative frequency of occurrence of the events \(E\) and \(F\) in a large number \(N\) of occurrences of the random phenomenon on which they are defined. For any event \(E\), let \(N_{E}\) be the number of occurrences of \(E\) in the \(N\) occurrences of the phenomenon. Then, by the frequency interpretation of probability, \(P[E]=N_{E} / N\). Clearly, \(P[E] \geq 0\). Next, \(N_{S}=N\), since, by the construction of \(S\), it occurs on every occurrence of the random phenomenon. Therefore, \(P[S]=1\). Finally, for two mutually exclusive events \(E\) and \(F\), \(N_{E \cup F}=N_{E}+N_{F}\). Dividing both sides by \(N\) gives \(P[E \cup F]=P[E]+P[F]\). Thus axiom 3 is satisfied.
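
The frequency argument above can be checked numerically. The following Python sketch is only an illustration: the fair die, the seed, the number of trials, and the particular events `E` and `F` are assumptions made for the example and do not come from the text. Relative frequencies are kept as exact fractions so that the comparisons are not affected by rounding.

```python
import random
from fractions import Fraction

def relative_frequency(outcomes, event):
    """Return N_E / N: the fraction of observed outcomes that lie in `event`."""
    n_e = sum(1 for outcome in outcomes if outcome in event)
    return Fraction(n_e, len(outcomes))

# Illustrative assumption: N = 10,000 simulated rolls of a six-sided die.
rng = random.Random(0)
S = {1, 2, 3, 4, 5, 6}
outcomes = [rng.randint(1, 6) for _ in range(10_000)]

E = {1, 2}  # E and F have no description in common,
F = {5, 6}  # so they are mutually exclusive.

freq_E = relative_frequency(outcomes, E)
freq_F = relative_frequency(outcomes, F)
freq_union = relative_frequency(outcomes, E | F)
freq_S = relative_frequency(outcomes, S)

print(freq_E >= 0)                    # axiom 1
print(freq_S == 1)                    # axiom 2
print(freq_union == freq_E + freq_F)  # axiom 3
```

Each comparison holds for any sequence of outcomes, not just the simulated one, because it depends only on counting: every outcome lies in `S`, and no outcome is counted in both `E` and `F`.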

It therefore follows that any property of probabilities that can be shown to be a logical consequence of axioms 1 to 3 will hold for probabilities interpreted as relative frequencies. We shall see that for many purposes axioms 1 to 3 constitute a sufficient basis from which to derive the properties of probabilities. In advanced studies of probability theory, in which more delicate questions concerning probability are investigated, it is found necessary to strengthen the axioms somewhat. At the end of this section we indicate briefly the two most important modifications required.

We now show how one can derive from axioms 1 to 3 some of the important properties that probability possesses. In particular, we show how axiom 3 suffices to enable us to compute the probabilities of events constructed by means of complementations and unions of other events in terms of the probabilities of these other events.

In order to be able to state briefly the hypotheses of the theorems subsequently proved, we need some terminology. It is to be emphasized that one can speak of the probability of an event only if the event is a subset of a definite sample description space \(S\), on whose subsets a probability function has been defined. Consequently, the hypothesis of a theorem concerning events should begin, “Let \(S\) be a sample description space on the subsets of which a probability function \(P[\cdot]\) has been defined. Let \(E\) and \(F\) be any two events on \(S\)”. For the sake of brevity, we write instead “Let \(E\) and \(F\) be any two events on a probability space”; by a probability space we mean a sample description space on which a probability function (satisfying axioms 1, 2, and 3) has been defined.

Formula for the Probability of the Impossible Event \(\emptyset\)

\[P[\emptyset]=0 . \tag{5.1}\] 

 

Proof

By (4.4) it follows that the certain event \(S\) and the impossible event are mutually exclusive; further, their union \(S \cup \emptyset=S\) . Consequently, \(P[S]=P[S \cup \emptyset]=P[S]+P[\emptyset]\) , from which it follows that \(P[\emptyset]=0\) .

 

Formula for the Probability of a Difference \(F E^{c}\) of Two Events \(E\) and \(F\). For any two events, \(E\) and \(F\), on a probability space

\[P\left[F E^{c}\right]=P[F]-P[E F] \tag{5.2}\] 

 

Proof

The events \(F E\) and \(F E^{c}\) are mutually exclusive, and their union is \(F\) [compare (4.5)]. Then, by axiom 3, \(P[F]=P[E F]+P\left[F E^{c}\right]\), from which (5.2) follows immediately.

 

Formula for the Probability of the Complement of an Event . For any event \(E\) on a probability space

\[P\left[E^{c}\right]=1-P[E] \tag{5.3}\] 

 

Proof

Let \(F=S\) in (5.2). Since \(S E^{c}=E^{c}\), \(S E=E\), and \(P[S]=1\), we obtain (5.3).

 

Formula for the Probability of a Union \(E \cup F\) of Two Events \(E\) and \(F\). For any two events, \(E\) and \(F\), on a probability space

\[P[E \cup F]=P[E]+P[F]-P[E F]. \tag{5.4}\] 

 

Proof

We use the fact that the event \(E \cup F\) may be written as the union of the two mutually exclusive events, \(E\) and \(F E^{c}\) . Then, by axiom 3, \(P[E \cup F]=P[E]+P\left[F E^{c}\right]\) . By evaluating \(P\left[F E^{c}\right]\) by (5.2), one obtains (5.4).

 

Note that (5.4) extends axiom 3 to the case in which the events whose union is being formed are not necessarily mutually exclusive.
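
Formulas (5.1) to (5.4) can be verified on a small finite probability space. The sketch below is a hedged illustration, not part of the text: it assumes a fair die with six equally likely descriptions, and the events `E` and `F` are chosen arbitrarily so that they overlap.

```python
from fractions import Fraction

# Illustrative assumption: a fair die, each description has probability 1/6.
S = frozenset(range(1, 7))

def P(event):
    """Probability function on subsets of S under equally likely descriptions."""
    return Fraction(len(event), len(S))

E = frozenset({2, 4, 6})  # "an even number appears"
F = frozenset({1, 2, 3})  # "a number no greater than three appears"
E_c = S - E               # complement of E

assert P(frozenset()) == 0                 # (5.1)
assert P(F & E_c) == P(F) - P(E & F)       # (5.2)
assert P(E_c) == 1 - P(E)                  # (5.3)
assert P(E | F) == P(E) + P(F) - P(E & F)  # (5.4)

print(P(E | F))  # 5/6
```

Here \(E F=\{2\}\) is not empty, so adding \(P[E]\) and \(P[F]\) alone would count the description 2 twice; the term \(-P[E F]\) in (5.4) removes the double count.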

We next obtain a basic property of the probability function, namely, that if an event \(F\) is a subevent of another event \(E\) , then the probability that \(F\) will occur is less than or equal to the probability that \(E\) will occur.

Inequality for the Probability of a Subevent . Let \(E\) and \(F\) be events on a probability space \(S\) such that \(F \subset E\) (that is, \(F\) is a subevent of \(E\) ). Then \[P\left[E F^{c}\right]=P[E]-P[F] \quad \text { if } F \subset E \tag{5.5}\] \[P[F] \leq P[E], \quad \text { if } F \subset E. \tag{5.6}\] 

 

Proof

By (5.2), \(P[E]-P[E F]=P\left[E F^{c}\right]\) . Now, since \(F \subset E\) , it follows that, as in (4.6), \(E F=F\) . Therefore, \(P[E]-P[F]=P\left[E F^{c}\right]\) , which proves (5.5). Next, \(P\left[E F^{c}\right] \geq 0\) , by axiom 1. Therefore, \(P[E]-P[F] \geq 0\) , from which it follows that \(P[F] \leq P[E]\) , which proves (5.6).

 

From the preceding inequality we may derive the basic fact that probabilities are numbers between 0 and 1: \[\text { for any event } E \quad 0 \leq P[E] \leq 1. \tag{5.7}\] This is proved as follows. By axiom 1, \(0 \leq P[E]\). Next, any event \(E\) is a subevent of the certain event. Therefore, by (5.6), \(P[E] \leq P[S]\). However, by axiom 2, \(P[S]=1\), and the proof of the assertion is completed.
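
As an illustration of (5.5) to (5.7), suppose, purely as an assumption for this example, that a fair die is tossed, so that each of the six descriptions in \(S=\{1,2, \ldots, 6\}\) has probability \(\frac{1}{6}\). Let \(E=\{2,4,6\}\) and \(F=\{2\}\), so that \(F \subset E\) and \(E F^{c}=\{4,6\}\). Then \[P\left[E F^{c}\right]=\frac{2}{6}=\frac{3}{6}-\frac{1}{6}=P[E]-P[F], \qquad P[F]=\frac{1}{6} \leq \frac{1}{2}=P[E] \leq 1=P[S].\]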

Formula for the Probability of the Union of a Finite Number of Mutually Exclusive Events . For any positive integer \(n\) the probability of the union of \(n\) mutually exclusive events \(E_{1}, E_{2}, \ldots, E_{n}\) is equal to the sum of the probabilities of the events; in symbols, \[P\left[E_{1} \cup E_{2} \cup \cdots \cup E_{n}\right]=P\left[E_{1}\right]+P\left[E_{2}\right]+\cdots+P\left[E_{n}\right] \tag{5.8}\] if, for every two integers \(i\) and \(j\) which are not equal and which are between 1 and \(n\) , inclusive, \(E_{i} E_{j}=\emptyset\) .

 

Proof

To prove (5.8), we make use of the principle of mathematical induction, which states that a proposition \(p(n)\), which depends on an integer \(n\), is true for \(n=1,2, \ldots\), if one shows that (i) it is true for \(n=1\), and (ii) it satisfies the implication: \(p(n)\) implies \(p(n+1)\). Now, for any positive integer \(n\) let \(p(n)\) be the proposition that for any set of \(n\) mutually exclusive events, \(E_{1}, \ldots, E_{n}\), (5.8) holds. That \(p(1)\) is true is obvious, since in the case that \(n=1\), (5.8) states that \(P\left[E_{1}\right]=P\left[E_{1}\right]\). Next, let \(n\) be a definite integer, and let us assume that \(p(n)\) is true. Let us show that from the assumption that \(p(n)\) is true it follows that \(p(n+1)\) is true. Let \(E_{1}, E_{2}, \ldots, E_{n}, E_{n+1}\) be \(n+1\) mutually exclusive events. Since the events \(E_{1} \cup E_{2} \cup \cdots \cup E_{n}\) and \(E_{n+1}\) are then mutually exclusive, it follows, by axiom 3, that \[P\left[E_{1} \cup E_{2} \cup \cdots \cup E_{n+1}\right]=P\left[E_{1} \cup E_{2} \cup \cdots \cup E_{n}\right]+P\left[E_{n+1}\right] \tag{5.9}\] From (5.9), and the assumption that \(p(n)\) is true, it follows that \(P\left[E_{1} \cup \cdots \cup E_{n+1}\right]=P\left[E_{1}\right]+\cdots+P\left[E_{n+1}\right]\). We have thus shown that \(p(n)\) implies \(p(n+1)\). By the principle of mathematical induction, the proposition \(p(n)\) holds for every positive integer \(n\). The proof of (5.8) is now complete.
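
The induction argument can be mirrored step by step in code. The sketch below is illustrative only: the sample description space of 12 equally likely descriptions and the four events are assumptions made for the example. At each pass through the loop the union built so far plays the role of \(E_{1} \cup \cdots \cup E_{n}\) and the next event plays the role of \(E_{n+1}\), as in (5.9).

```python
from fractions import Fraction

# Illustrative assumption: 12 equally likely descriptions.
S = frozenset(range(12))

def P(event):
    """Probability function on subsets of S under equally likely descriptions."""
    return Fraction(len(event), len(S))

# Four mutually exclusive events (no two share a description).
events = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5}), frozenset({9})]

running_union = frozenset()
running_sum = Fraction(0)
for e in events:
    # Induction step (5.9): the union so far and the next event are
    # mutually exclusive, so axiom 3 applies to this two-event union.
    assert not (running_union & e)
    assert P(running_union | e) == P(running_union) + P(e)
    running_union |= e
    running_sum += P(e)

print(P(running_union) == running_sum)  # True
```

The check inside the loop uses only the two-event form of axiom 3, which is exactly what the proof of (5.8) relies on.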

 

The foregoing axioms are completely adequate for the study of random phenomena whose sample description spaces are finite. For the study of infinite sample description spaces, however, it is necessary to modify axiom 3. We may wish to consider an infinite sequence of mutually exclusive events, \(E_{1}, E_{2}, \ldots, E_{n}, \ldots\) That the probability of the union of an infinite number of mutually exclusive events is equal to the sum of the probabilities of the events cannot be proved by axiom 3 but must be postulated separately. Consequently, in advanced studies of probability theory, instead of axiom 3, the following axiom is adopted.

Axiom \(3^{\prime}\) . For any infinite sequence of mutually exclusive events, \(E_{1}, E_{2}, \ldots, E_{n}, \ldots\) , \begin{align} & P\left[E_{1} \cup E_{2} \cup \cdots \cup E_{n} \cup \cdots\right] \tag{5.10} \\ & \quad=P\left[E_{1}\right]+P\left[E_{2}\right]+\cdots+P\left[E_{n}\right]+\cdots . \end{align} 
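
To see what axiom \(3^{\prime}\) adds in a concrete case, suppose, purely as an illustrative assumption, that the mutually exclusive events \(E_{1}, E_{2}, \ldots\) have probabilities \(P\left[E_{n}\right]=2^{-n}\). Formula (5.8) yields only the finite statements \[P\left[E_{1} \cup E_{2} \cup \cdots \cup E_{n}\right]=\frac{1}{2}+\frac{1}{4}+\cdots+\frac{1}{2^{n}}=1-\frac{1}{2^{n}}, \quad n=1,2, \ldots,\] whereas (5.10) asserts the statement about the infinite union, \[P\left[E_{1} \cup E_{2} \cup \cdots \cup E_{n} \cup \cdots\right]=\sum_{n=1}^{\infty} \frac{1}{2^{n}}=1.\]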

A somewhat more esoteric modification in the foregoing axioms becomes necessary when we consider a random phenomenon whose sample description space \(S\) is non-countably infinite. It may then turn out that there are subsets of \(S\) that are nonprobabilizable, in the sense that it is not possible to assign a probability to these sets in a manner consistent with the axioms. If such is the case, then only probabilizable subsets of \(S\) are defined as events. Since it may be proved that the union, intersection, and complements of events are events, this restriction of the notion of event causes no difficulty in application and renders the mathematical theory rigorous.

Exercises

5.1. Boole’s inequality. For a finite set of events, \(A_{1}, A_{2}, \ldots, A_{n}\) , \[P\left[A_{1} \cup A_{2} \cup \cdots \cup A_{n}\right] \leq P\left[A_{1}\right]+P\left[A_{2}\right]+\cdots+P\left[A_{n}\right]. \tag{5.11}\] Prove this assertion by means of the principle of mathematical induction.

5.2. Formula for the probability that exactly 1 of 2 events will occur. Show that for any 2 events, \(A\) and \(B\) , on a probability space \[P\left[A B^{c} \cup B A^{c}\right]=P[A]+P[B]-2 P[A B]. \tag{5.12}\] The event \(A B^{c} \cup B A^{c}\) is the event that exactly 1 of the events, \(A\) and \(B\) , will occur. Contrast (5.12) with (5.4), which could be called the formula for the probability that at least 1 of 2 events will occur.

5.3. Show that for any 3 events, \(A, B\) , and \(C\) , defined on a probability space, the probability of the event that at least 1 of the events will occur is given by \begin{align} P[A \cup B \cup C]=P[A] & +P[B]+P[C]-P[A B]-P[A C] \\ & -P[B C]+P[A B C]. \end{align} 

5.4. Let \(A\) and \(B\) be 2 events on a probability space. Show that \[P[A B] \leq P[A] \leq P[A \cup B] \leq P[A]+P[B].\] 

5.5. Let \(A\) and \(B\) be 2 events on a probability space. In terms of \(P[A]\), \(P[B]\), and \(P[A B]\), express (i) for \(k=0,1,2\), \(P[\text{exactly } k \text{ of the events } A \text{ and } B \text{ occur}]\), (ii) for \(k=0,1,2\), \(P[\text{at least } k \text{ of the events } A \text{ and } B \text{ occur}]\), (iii) for \(k=0,1,2\), \(P[\text{at most } k \text{ of the events } A \text{ and } B \text{ occur}]\), (iv) \(P[A \text{ occurs and } B \text{ does not occur}]\).

 

Answer

\(P[\text{exactly } 0]=1+P[A B]-P[A]-P[B]\). \(P[\text{exactly } 1]=P[A]+P[B]-2 P[A B]\). \(P[\text{exactly } 2]=P[A B]\). \(P[\text{at least } 0]=1\). \(P[\text{at least } 1]=P[A]+P[B]-P[A B]\). \(P[\text{at least } 2]=P[A B]\). \(P[\text{at most } 0]=1+P[A B]-P[A]-P[B]\). \(P[\text{at most } 1]=1-P[A B]\). \(P[\text{at most } 2]=1\).
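
The formulas in this answer can be cross-checked by computing the "at least" and "at most" probabilities as sums of the "exactly" probabilities. The Python sketch below is a hypothetical helper written for illustration; the function name `two_event_probabilities` and the input values are assumptions, not part of the exercise.

```python
from fractions import Fraction

def two_event_probabilities(p_a, p_b, p_ab):
    """Return dicts of P[exactly k], P[at least k], P[at most k] for k = 0, 1, 2."""
    exactly = {
        0: 1 + p_ab - p_a - p_b,
        1: p_a + p_b - 2 * p_ab,
        2: p_ab,
    }
    # "At least k" and "at most k" are unions of mutually exclusive
    # "exactly j" events, so their probabilities add by (5.8).
    at_least = {k: sum(exactly[j] for j in range(k, 3)) for k in range(3)}
    at_most = {k: sum(exactly[j] for j in range(0, k + 1)) for k in range(3)}
    return exactly, at_least, at_most

# Hypothetical inputs chosen only for illustration.
p_a, p_b, p_ab = Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)
exactly, at_least, at_most = two_event_probabilities(p_a, p_b, p_ab)

print(at_least[1] == p_a + p_b - p_ab)  # True, agrees with (5.4)
print(at_most[1] == 1 - p_ab)           # True
```

With exact fractions as inputs, the same function can be applied to other values of \(P[A]\), \(P[B]\), and \(P[A B]\), provided they are consistent with the axioms.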

 

5.6. Let \(A\), \(B\), and \(C\) be 3 events on a probability space. In terms of \(P[A]\), \(P[B]\), \(P[C]\), \(P[A B]\), \(P[A C]\), \(P[B C]\), and \(P[A B C]\), express for \(k=0,1,2,3\) (i) \(P[\text{exactly } k \text{ of the events } A, B, C \text{ occur}]\), (ii) \(P[\text{at least } k \text{ of the events } A, B, C \text{ occur}]\), (iii) \(P[\text{at most } k \text{ of the events } A, B, C \text{ occur}]\).

5.7. Evaluate the probabilities asked for in exercise 5.5 in the case that

(i) \(P[A]=P[B]=\frac{1}{3}, P[A B]=\frac{1}{6}\),

(ii) \(P[A]=P[B]=\frac{1}{3}, P[A B]=\frac{1}{9}\),

(iii) \(P[A]=P[B]=\frac{1}{3}, P[A B]=0\).

5.8. Evaluate the probabilities asked for in exercise 5.6 in the case that

(i) \(P[A]=P[B]=P[C]=\frac{1}{3}, P[A B]=P[A C]=P[B C]=\frac{1}{9}, P[A B C]=\frac{1}{27}\) ,

(ii) \(P[A]=P[B]=P[C]=\frac{1}{3}, P[A B]=P[A C]=P[B C]=P[A B C]=0\) .

The size of sets: The various formulas that have been developed for probabilities continue to hold true if one replaces \(P\) by \(N\), where for any set \(A\) one defines \(N[A]\) as the number of elements in, or the size of, the set \(A\). Further, one replaces 1 by \(N[S]\).

5.9. Suppose that a study of 900 college graduates 25 years after graduation revealed that 300 were “successes”, 300 had studied probability theory in college, and 100 were both “successes” and students of probability theory. Find, for \(k=0,1,2\), the number of persons in the group to whom there applied (i) exactly \(k\), (ii) at least \(k\), (iii) at most \(k\) of these two descriptions.

 

Answer

\(N[\text{exactly } 0]=400\). \(N[\text{exactly } 1]=400\). \(N[\text{exactly } 2]=100\).

\(N[\text{at least } 0]=900\). \(N[\text{at least } 1]=500\). \(N[\text{at least } 2]=100\).

\(N[\text{at most } 0]=400\). \(N[\text{at most } 1]=800\). \(N[\text{at most } 2]=900\).

5.10. In a very hotly fought battle in a small war 270 men fought. Of these, 90 lost an eye, 90 lost an arm, and 90 lost a leg; 30 lost both an eye and an arm, 30 lost both an arm and a leg, and 30 lost both a leg and an eye; 10 lost all three. Find, for \(k=0,1,2,3\), the number of men who suffered (i) exactly \(k\), (ii) at least \(k\), (iii) no more than \(k\) of these injuries.

5.11. Certain data obtained from a study of a group of 1000 subscribers to a certain magazine relating to their sex, marital status, and education were reported as follows: 312 males, 470 married, 525 college graduates, 42 male college graduates, 147 married college graduates, 86 married males, and 25 married male college graduates. Show that the numbers reported in the various groups are not consistent.

 

Answer

Let \(M, W\) , and \(C\) denote, respectively, a set of college graduates, males and married persons. Show \(N[M \cup W \cup C]=1057>1000\) .
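
The count in this answer follows from the three-set formula of exercise 5.3 with \(P\) replaced by \(N\). A minimal Python sketch of the arithmetic is given below; the function name `union_size` and the variable names are hypothetical, while the numbers are those reported in the exercise.

```python
def union_size(n_a, n_b, n_c, n_ab, n_ac, n_bc, n_abc):
    """Size of the union of three sets: exercise 5.3 with P replaced by N."""
    return n_a + n_b + n_c - n_ab - n_ac - n_bc + n_abc

# Counts reported in exercise 5.11.
males, married, graduates = 312, 470, 525
male_graduates, married_graduates, married_males = 42, 147, 86
married_male_graduates = 25
subscribers = 1000

n_union = union_size(
    males, married, graduates,
    married_males, male_graduates, married_graduates,
    married_male_graduates,
)

print(n_union)                 # 1057
print(n_union <= subscribers)  # False: the reported counts are inconsistent
```

Since the union of the three groups is a subset of the 1000 subscribers, its size cannot exceed 1000, so a computed size of 1057 shows that the reported figures cannot all be correct.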

 


  1. Definition: A function is a rule that assigns a real number to each element of a set of objects (called the domain of the function). Here the domain of the probability function \(P[\cdot]\) is the set of all events on \(S\).