# UNREAL PROBABILITIES Partial Truth with Clifford Numbers

## Abstract

This paper introduces and studies the basic properties of Clifford algebra valued conditional measures.

## 1  Introduction

Probability theory was given a firm mathematical foundation in 1933, when Kolmogorov [] introduced his axioms. By defining probability as an uninterpreted special case of a positive measure with total unit mass (plus an additional definition for independence), the subject exploded with new results and found innumerable applications. In 1946, Cox (see []) showed that the Kolmogorov axioms for probability are really theorems that follow from basic desiderata about the representation of partial truth with real numbers. We owe to Ed Jaynes (see []) the discovery of the importance of Cox's 1946 work ([]). After Jaynes, it became clear why the calculus of probability is so successful in the real world. Probability works because its axioms axiomatize the right thing: partial truth of a logical proposition given another. Even more, the rules of probability are unique in the sense that any other set of consistent rules can be brought into the standard sum and product rules by a change of scale (or we may say logical gauge). This is in fact Cox's main result and it makes futile the enterprise of looking for alternatives to the calculus of normalized real valued probabilities. It is only by allowing the partial truth of a proposition to be encoded by an object other than a real number in the interval [0,1] that we could find alternatives to the standard theory of probability.

We seek to find out what happens when standard probability theory is modified by relaxing the axiom that the probability of an event must be a real number in the interval [0,1]. We show that, by allowing the measure of a proposition to take a value in a Clifford Algebra, we automatically find the methods of standard quantum theory without ever introducing anything specifically related to nature itself.

The main motivation for this article has come from realizing that the derivations in Cox [] still apply if real numbers are replaced by complex numbers as the encoders of partial truth. This was first mentioned by Youssef [] and checked in more detail by Caticha [] who also showed that non-relativistic Quantum theory, as formulated by Feynman [], is the only consistent calculus of probability amplitudes. By measuring propositions with Clifford numbers we automatically include the reals, complex, quaternions, spinors and any combination of them (among others) as special cases.

## 2  The Axioms

In this section we introduce the notation and collect the simple properties of Boolean and Clifford algebras that will be needed for the definition of $\psi$ below.

### 2.1  The Boolean Algebra $\mathcal{A}$

Let $\mathcal{A}$ be a Boolean $\sigma$-algebra of propositions $a, b, c, \ldots$. We denote by $0$ the false proposition, by $1$ the true proposition, by $a+b$ the logical sum, by $ab$ the logical product and by $\bar{a}$ the negation. Each proposition $b \in \mathcal{A}$ defines the set $\mathcal{A}_b$ where,

$$\mathcal{A}_b = \{ ba : a \in \mathcal{A} \} = b\mathcal{A} \tag{1}$$

Clearly, $\mathcal{A}_b$ is a subset of $\mathcal{A}$ that contains $b$ and $0$; it is closed under sums and products, and thus $\mathcal{A}_b$ is a subalgebra of $\mathcal{A}$ with $b$ as the unit. From the fact that $a = ac + a\bar{c}$ it follows that

$$\mathcal{A} = \mathcal{A}_c \oplus \mathcal{A}_{\bar{c}} \tag{2}$$

Given two propositions $a, b \in \mathcal{A}$ we have,

$$\mathcal{A} \supset \mathcal{A}_a \supset \mathcal{A}_{ab} \tag{3}$$

The set $X \subset \mathcal{A}$ is called the set of elementary propositions of $\mathcal{A}$ (and we say that $\mathcal{A}$ is a $\sigma$-algebra of propositions in $X$) if,

1. for $x, y \in X$, $xy = 0$ whenever $x \neq y$

2. every $a \in \mathcal{A}$ is the sum of propositions in $X$. We write

$$a = \sum_{x \in a} x \tag{4}$$

If $\mathcal{A}$ and $\mathcal{B}$ are two Boolean $\sigma$-algebras of propositions in $X$ and $Y$ respectively, then $\mathcal{A} \times \mathcal{B}$ is a Boolean $\sigma$-algebra of propositions in $X \times Y$ if one defines the truth value of $(a,b) \in \mathcal{A} \times \mathcal{B}$ as the truth value of $ab$, i.e. true only when both $a \in \mathcal{A}$ and $b \in \mathcal{B}$ are true. We denote by $\mathcal{A}^n$ the $\sigma$-algebra of $n$ copies of $\mathcal{A}$ of propositions in $X^n$. We have,

$$P \in \mathcal{A}^n \iff P = \sum_{x \in P} x_1 x_2 \cdots x_n \tag{5}$$

and by this we mean that $P$ is always the sum of propositions in $X^n$ and each $x \in X^n$ is the conjunction of $n$ propositions, one for each copy of $X$. Finally we let $\mathcal{A}^* = \mathcal{A} \setminus \{0\}$.

Notice that these Boolean s-algebras are nothing but the standard sets where general measures (in particular probability measures) are defined. We chose the notation of logical sums and products instead of the traditional set notation of unions and intersections to emphasize the fact that we are interested in the encoding of partial truth of logical propositions, but this is only a choice of notation and there is a complete one to one correspondence between the two languages. As general references see e.g. Halmos [] or Chow and Teicher [].
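The correspondence between the logical and the set notation can be sketched for a finite $X$ as follows. This is only an illustration under stated assumptions: the three labels `x1`, `x2`, `x3` and the helper names (`lsum`, `lprod`, `neg`, `sub_algebra`) are hypothetical and not part of the paper. Propositions are modelled as sets of elementary propositions, so the logical sum is a union and the logical product an intersection.

```python
from itertools import combinations

X = frozenset({"x1", "x2", "x3"})   # elementary propositions (assumed finite)
ZERO, ONE = frozenset(), X           # the false and the true proposition


def powerset(s):
    """All propositions of the Boolean algebra generated by s."""
    items = sorted(s)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


def lsum(a, b):
    """Logical sum a + b (set union)."""
    return a | b


def lprod(a, b):
    """Logical product ab (set intersection)."""
    return a & b


def neg(a):
    """Negation of a (complement in X)."""
    return ONE - a


def sub_algebra(b, algebra):
    """The set A_b = {ba : a in A} of equation (1)."""
    return {lprod(b, a) for a in algebra}


A = powerset(X)
c = frozenset({"x1", "x2"})
A_c, A_not_c = sub_algebra(c, A), sub_algebra(neg(c), A)

# Equation (2): every a splits as a = ac + a(not c).
decomposition_holds = all(
    lsum(lprod(a, c), lprod(a, neg(c))) == a for a in A
)
```

Here `A_c` contains both `c` (its unit) and `ZERO`, in line with the remark following equation (1).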

### 2.2  The algebra of Clifford numbers $\mathcal{G}$

Let $\mathcal{G}$ be an arbitrary finite-dimensional Clifford algebra with real scalars. We try to follow the notation in []. We denote the elements of $\mathcal{G}$ by capital letters such as $A, B, C, \ldots$. A general Clifford number $M$ always expands as the sum of its scalar, vector, bivector, etc. parts:

$$M = \langle M \rangle_0 + \langle M \rangle_1 + \langle M \rangle_2 + \cdots = \alpha + u + B + \cdots \tag{6}$$

where $\langle M \rangle_k$ denotes the $k$-vector part of $M$. If $u$ and $v$ are vectors in $\mathcal{G}$ then their geometric (Clifford) product $uv$ can be decomposed into a symmetric part $u \cdot v$ and an antisymmetric part $u \wedge v$ as

$$uv = \tfrac{1}{2}(uv + vu) + \tfrac{1}{2}(uv - vu) \tag{7}$$

$$\phantom{uv} = u \cdot v + u \wedge v \tag{8}$$

The inner product between two vectors is always a scalar and their wedge product is always a bivector. The operation of reversion of a Clifford number $M$ is denoted by $M^\dagger$ and defined as a linear operation with the properties,

$$\alpha^\dagger = \alpha, \quad u^\dagger = u, \quad (MN)^\dagger = N^\dagger M^\dagger \tag{9}$$

where $\alpha$ is a scalar, $u$ is a vector, and $M$ and $N$ are arbitrary Clifford numbers. Note that reversion reverses the order of the factors. The Euclidean inner product on $\mathcal{G}$ is given by,

$$\langle M, N \rangle_{\mathcal{G}} = \langle M^\dagger N \rangle_0 \tag{10}$$
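The operations of this subsection can be made concrete with a small sketch. The representation is an assumption made for illustration only: a Clifford number is stored as a dictionary from basis-blade bitmasks to real coefficients, and every basis vector is assumed to square to $+1$ (Euclidean signature). The function names (`gp`, `rev`, `inner`, and so on) are hypothetical.

```python
from itertools import product


def blade_sign(a, b):
    """Sign picked up when the basis blades a and b (bitmasks) are multiplied."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps % 2 else 1


def gp(M, N):
    """Geometric product of multivectors stored as {blade bitmask: coefficient}."""
    out = {}
    for (a, x), (b, y) in product(M.items(), N.items()):
        out[a ^ b] = out.get(a ^ b, 0) + blade_sign(a, b) * x * y
    return {k: v for k, v in out.items() if v != 0}


def add(M, N, s=1):
    """Return M + s*N."""
    out = dict(M)
    for k, v in N.items():
        out[k] = out.get(k, 0) + s * v
    return {k: v for k, v in out.items() if v != 0}


def grade(k):
    return bin(k).count("1")


def rev(M):
    """Reversion: a grade-r blade picks up the sign (-1)^(r(r-1)/2)."""
    return {
        k: v * (-1) ** (grade(k) * (grade(k) - 1) // 2) for k, v in M.items()
    }


def inner(M, N):
    """Euclidean inner product <M, N> = <rev(M) N>_0 of equation (10)."""
    return gp(rev(M), N).get(0, 0)


# Vectors u = e1 + 2 e2 and v = 3 e1 - e2 + e3 (bitmasks 1, 2, 4 label e1, e2, e3).
u = {1: 1, 2: 2}
v = {1: 3, 2: -1, 4: 1}
uv, vu = gp(u, v), gp(v, u)
dot = {k: c / 2 for k, c in add(uv, vu).items()}        # symmetric part, eq. (7)
wedge = {k: c / 2 for k, c in add(uv, vu, -1).items()}  # antisymmetric part
```

In this example `dot` has only a scalar component and `wedge` only bivector components, as equations (7) and (8) require, and `rev(gp(u, v))` coincides with `gp(rev(v), rev(u))`.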

### 2.3  Definition of $\psi$

By a Clifford algebra valued conditional measure (or simply a $\psi$) we mean a function,

$$\psi : \mathcal{A} \times \mathcal{A}^* \longrightarrow \mathcal{G}, \qquad (a,c) \longmapsto \psi(a,c) \tag{11}$$

satisfying the following two axioms:

(I)
If $c \Rightarrow b$ then

$$\psi(a,c) = \psi(ab,c) \tag{12}$$

(II)
If $\{a_1, a_2, \ldots\} \subset \mathcal{A}$, with $a_j a_k = 0$ for $j \neq k$, then

$$\psi\Bigl(\sum_j a_j, c\Bigr) = \sum_j \psi(a_j, c) \tag{13}$$

Since the only property a proposition in $\mathcal{A}$ always has is its truth value, we can interpret $\psi(a,c)$ as the Clifford number that represents the truth in $a$ when $c$ is certain. Axiom (I) says that $c$ is certain (e.g. take $a = 1$ and $b = c$) and axiom (II) says that the whole truth of $a$ for a given $c$ is always the sum of the truths of its separate parts.

#### 2.3.1  The truth of 0

By taking each $a_j = 0$ in (13) we get,

$$\psi(0,c) = \psi(0,c) + \psi(0,c) + \cdots \tag{14}$$

and therefore $\psi(0,c)$ is either $0 \in \mathcal{G}$ or unbounded; but if it is unbounded then all the propositions will be assigned an unbounded value, since $\psi(a,c) = \psi(a+0,c) = \psi(a,c) + \psi(0,c)$. Hence,

$$\psi(0,c) = 0 \quad \text{for all } c \in \mathcal{A}^* \tag{15}$$

## 3  The spaces $H_c$

The functions $\psi$, as defined by (12) and (13), are specified independently at each $c \in \mathcal{A}^*$. So far, there is no link between $\psi$ in the domain of discourse of $c$, i.e. $\psi(\cdot,c)$, and $\psi$ in the more specialized domain of discourse of $bc$, i.e. $\psi(\cdot,bc)$. We shall talk about changing domains of discourse in the next section, but in this section we describe the important properties that the functions $\psi(\cdot,c)$ have as functions of their first argument only, for fixed $c \in \mathcal{A}^*$. To simplify the notation we simply write $\psi(a)$ instead of $\psi(a,c)$ in the formulas below. Thus, whenever the background proposition $c$ is not subject to change we take $\psi$ as any $\sigma$-additive function defined on $\mathcal{A}_c$ with values in $\mathcal{G}$. The condition (12) is automatically satisfied since $c$ is the true proposition in $\mathcal{A}_c$.

Let $H_c$ be the set of all $\sigma$-additive functions defined on $\mathcal{A}_c$ with values in $\mathcal{G}$.

### 3.1  The $H_c$ are Hilbert spaces

Since the sum of two $\sigma$-additive functions and the product of a $\sigma$-additive function by a scalar are still $\sigma$-additive functions, the $H_c$ are vector spaces. The scalars are the scalars in $\mathcal{G}$. In principle the field of scalars could be taken as the reals or the complex numbers, but it seems that the reals are all that is needed in most applications.

#### 3.1.1  The inner product in $H_c$

For $\phi, \psi \in H_c$ define the real inner product between them by:

$$\langle \phi, \psi \rangle = \sum_{x \in X} \langle \phi(x), \psi(x) \rangle_{\mathcal{G}} \tag{16}$$

$$\phantom{\langle \phi, \psi \rangle} = \sum_{x \in X} \langle \phi(x)^\dagger \psi(x) \rangle_0 \tag{17}$$

By considering only $\psi$s with finite norm we make $H_c$ a real Hilbert space. From now on we assume the finite norm to be part of the definition of $H_c$ itself, i.e.

$$H_c = \Bigl\{ \psi : \psi \text{ is } \sigma\text{-additive on } \mathcal{A}_c \text{ and } \sum_{x \in X} \langle \psi^\dagger(x)\psi(x) \rangle_0 < \infty \Bigr\} \tag{18}$$

Notice that the spaces $H_c$ are complete for the inner product (17) since $\mathcal{G}$ with the scalar product $\langle \cdot,\cdot \rangle_{\mathcal{G}}$ is complete. When $X$ is a finite set (i.e. when it contains only a finite number of propositions) the proof is trivial: just use the fact that if $\{\phi_n\}_{n \in \mathbb{N}} \subset H_c$ is a Cauchy sequence then for each $x \in X$ the sequence $\{\phi_n(x)\}_{n \in \mathbb{N}}$ is also a Cauchy sequence of elements of $\mathcal{G}$, and thus it converges to some $\phi(x) \in \mathcal{G}$, and therefore $\phi \in H_c$ is the limit of the original sequence in $H_c$. When $X$ is infinite we need to reinterpret the sums as integrals (for which we need a measure in $X$), and we also need to reinterpret the $\psi$s as $\mathcal{A}$-measurable densities, but after that the proof is essentially the standard proof that $L^2$ is complete. An important example of an infinite $X$ occurs when the propositions in $X$ are labeled with the vectors in $\mathcal{G}$. In this case the sum in (17) is replaced by the integral with respect to the standard Lebesgue measure in $X$.

### 3.2  The isomorphic spaces: $H(\mathcal{A}_c) \cong H_c(\mathcal{A})$

In order to understand the differences between the current approach and ordinary probability theory it is convenient to introduce two other spaces closely related to $H_c$. These are the space of all $\sigma$-additive functions on $\mathcal{A}_c$ with values in $\mathcal{G}$,

$$H(\mathcal{A}_c) = \{ \psi_c : \mathcal{A}_c \to \mathcal{G} \mid \psi_c \ \sigma\text{-additive on } \mathcal{A}_c \} \tag{19}$$

and the space,

$$H_c(\mathcal{A}) = \{ \psi : \mathcal{A} \to \mathcal{G} \mid \psi \ \sigma\text{-additive on } \mathcal{A} \text{ and, if } c \Rightarrow b, \ \psi(ab) = \psi(a) \} \tag{20}$$

Both are Hilbert spaces with the inner product (17) when only elements of finite norm are considered.

Notice that if $\psi \in H_c(\mathcal{A})$ then its restriction to $\mathcal{A}_c$ belongs to $H(\mathcal{A}_c)$, i.e.

$$\psi|_{\mathcal{A}_c} = \psi_c \in H(\mathcal{A}_c)$$

and conversely, if $\psi_c \in H(\mathcal{A}_c)$ then the function $\psi$ defined by:

$$\psi(a) = \psi_c(ac) \quad \forall a \in \mathcal{A}$$

belongs to $H_c(\mathcal{A})$, since it is clearly $\sigma$-additive and, if $c \Rightarrow b$, then $\bar{c} + b = 1$; multiplying both sides by $c$ we get $bc = c$, from which

$$\psi(ab) = \psi_c(abc) = \psi_c(ac) = \psi(a)$$

The map $\psi \to \psi_c$ is obviously linear, one-to-one and onto, so it makes the two spaces isomorphic.

Consider now two propositions $b$ and $c$ such that $c, bc \in \mathcal{A}^*$. Then we can write:

$$H(\mathcal{A}_{bc}) \cong H(b\mathcal{A}_c) \cong H_b(\mathcal{A}_c) \tag{21}$$

In other words each $\psi(\cdot,bc) \in H(\mathcal{A}_{bc})$ uniquely defines a function $\phi(\cdot\, b,c) \in H_b(\mathcal{A}_c)$, and that is all we can say. Since the $\psi$s are unnormalized we cannot write a general product rule as in normalized standard probability theory. Nevertheless, it is possible to justify a restricted product rule for independence, as we do in section 5 below. When $c = 1 \in \mathcal{A}$ we simply write $H(\mathcal{A})$ instead of $H_1(\mathcal{A})$ or $H(\mathcal{A}_1)$.

## 4  The truth with $\psi$

The remarkable fact about the functions $\psi$ is that, without committing to a particular value for $\psi(1)$ in $\mathcal{G}$, they still allow us to tell which propositions are true. We show in this section that $b \in \mathcal{A}$ is considered to be true by $\psi$ when $\psi(\bar{b}a) = 0 \ \forall a \in \mathcal{A}$. By liberating ordinary probabilities from the constraint that the probability of the whole space must always be fixed at one, we make the space of all possible assignments of partial truth into a Hilbert space without losing the ability to identify truth.

### 4.1  Propositions as operators

Each proposition $b \in \mathcal{A}$ defines two complementary linear operators on $H_c$: by multiplication, $\hat{b}$, and by addition, $\check{b}$, to the first argument of $\psi$. In symbols

$$\hat{b}\,\psi(a,c) = \psi(ab,c) \quad \forall a \in \mathcal{A} \tag{22}$$

$$\check{b}\,\psi(a,c) = \psi(a+b,c) \quad \forall a \in \mathcal{A} \tag{23}$$

To simplify the notation we often omit the hats and simply write $b\psi$ instead of $\hat{b}\psi$. From $bb = b$ and $b+b = b$ it follows that $\hat{b}$ and $\check{b}$ are projectors and therefore they are self-adjoint with eigenvalues either 0 or 1. We can write,

Theorem 1 The following two complementary statements are true.

1. If $\psi \in H_c$ is an eigenvector of the operator $\hat{b}$ with eigenvalue 1 then $\psi(b) = \psi(1)$ and we say that $\psi$ makes $b$ true conditional on $c$. Conversely, if $c \Rightarrow b$ then every $\psi \in H_c$ is an eigenvector of the operator $\hat{b}$ with eigenvalue 1.

2. If $\psi \in H_c$ is an eigenvector of the operator $\check{b}$ with eigenvalue 1 then $\psi(b) = 0$ and we say that $\psi$ makes $b$ false conditional on $c$. Conversely, if $c \Rightarrow \bar{b}$ then every $\psi \in H_c$ is an eigenvector of the operator $\check{b}$ with eigenvalue 1.

Proof

1. $$\hat{b}\psi = \psi \;\Rightarrow\; \psi(ab,c) = \psi(a,c) \ \ \forall a \in \mathcal{A} \;\Rightarrow\; \psi(b,c) = \psi(1,c)$$

where the last implication follows by taking $a = 1$. Conversely, if $c \Rightarrow b$ then from (12) we have,

$$\psi(ab,c) = \psi(a,c) \quad \forall a \in \mathcal{A}$$

Thus, $\hat{b}\psi = \psi$.

2. $$\check{b}\psi = \psi \;\Rightarrow\; \psi(a+b,c) = \psi(a,c) \ \ \forall a \in \mathcal{A} \;\Rightarrow\; \psi(b,c) = \psi(0,c)$$

where the last implication follows by taking $a = 0$. Conversely,

$$(c \Rightarrow \bar{b}) \;\Rightarrow\; \bar{c} + \bar{b} = 1 \;\Rightarrow\; a + bc = a \quad \forall a \in \mathcal{A}$$

Now from the fact that $\psi$ is a function and applying (12) twice we have,

$$\begin{aligned}
\psi(a+bc,c) &= \psi(a,c) \quad \forall a \in \mathcal{A} \\
\psi(ac+bc,c) &= \psi(a,c) \quad \forall a \in \mathcal{A} \\
\psi(a+b,c) &= \psi(a,c) \quad \forall a \in \mathcal{A}
\end{aligned}$$

Thus, $\check{b}\psi = \psi$ ∎

The following theorem elaborates on the same theme.

Theorem 2 Let $b \in \mathcal{A}$ be an arbitrary proposition in a $\sigma$-algebra of propositions in $X$ and let $\psi \in H(\mathcal{A})$. The following are all equivalent:

1. $b\psi = \psi$, i.e., $\psi$ makes $b$ true.
2. $\bar{b}\psi = 0$, i.e., $\psi$ makes $\bar{b}$ false.
3. $\|\bar{b}\psi\| = 0$
4. $\|\psi\| = \|b\psi\|$

Proof:
We show that $1 \Leftrightarrow 2 \Leftrightarrow 3 \Leftrightarrow 4$. The first equivalence follows from $\psi = b\psi + \bar{b}\psi$, the second equivalence is a property of the norm, and the third equivalence is Pythagoras' theorem, since $(b\psi) \perp (\bar{b}\psi)$ ∎

It is evident from this last theorem that the norm in the Hilbert spaces $H_c$ provides a mechanism for translating the Clifford numbers $\psi(b)$ assigned to the propositions in $\mathcal{A}$ by a function $\psi \in H(\mathcal{A})$ into positive real numbers

$$\|\bar{b}\psi\|^2 = \|(1-b)\psi\|^2 = \|\psi - b\psi\|^2 \tag{24}$$

$$\phantom{\|\bar{b}\psi\|^2} = \|\psi\|^2 - \|b\psi\|^2 \tag{25}$$

measuring how close $\psi$ is to making the proposition $b$ true. It is also clear from (24) and (25) that it is the square of the norm, and not just the norm, that is needed. It is only with the square of the norms that we can say that the amount of truth of $\bar{b}$ (measured by $\|\bar{b}\psi\|^2$) equals the amount of truth assigned to the true proposition (measured by $\|1\psi\|^2$) minus the amount of truth assigned to $b$ (measured by $\|b\psi\|^2$).
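For a finite $X$, the relation between (24) and (25) can be checked numerically. The sketch below makes two simplifying assumptions that are not in the text: complex numbers stand in for Clifford numbers (so that $\langle \phi^\dagger \psi \rangle_0$ becomes the real part of $\bar{\phi}\psi$), and the values assigned to the four hypothetical elementary propositions are arbitrary.

```python
# psi is specified by its values on the elementary propositions of X.
# Complex values are an assumption made for brevity.
psi = {"x1": 1 + 2j, "x2": -0.5j, "x3": 3 + 0j, "x4": 0j}


def hat(b, psi):
    """(b psi)(x) = psi(bx): keep psi(x) when x implies b, otherwise 0."""
    return {x: (v if x in b else 0j) for x, v in psi.items()}


def inner(phi, psi):
    """Equation (17), with <phi^dagger psi>_0 taken as Re(conj(phi) * psi)."""
    return sum((phi[x].conjugate() * psi[x]).real for x in psi)


def norm2(psi):
    return inner(psi, psi)


def makes_true(b, psi, X):
    """Theorem 2, item 3: psi makes b true when ||(not b) psi|| = 0."""
    return norm2(hat(X - b, psi)) == 0


X = set(psi)
b = {"x1", "x3"}
not_b = X - b

lhs = norm2(hat(not_b, psi))             # ||(not b) psi||^2, left side of (24)
rhs = norm2(psi) - norm2(hat(b, psi))    # ||psi||^2 - ||b psi||^2, equation (25)
cross = inner(hat(b, psi), hat(not_b, psi))  # b psi and (not b) psi are orthogonal
```

With these values `lhs` and `rhs` agree, `cross` vanishes, and `makes_true({"x1", "x2", "x3"}, psi, X)` holds because the only elementary proposition outside that set carries the value 0.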

### 4.2  Commutativity, orthogonality and a Clifford number times a proposition

Propositional operators can be composed to form other operators. Thus, if $a, b \in \mathcal{A}$ and $\psi \in H(\mathcal{A})$ we have,

$$(\hat{a}\hat{b})\psi(x) = \hat{a}(\hat{b}\psi(x)) = \psi(abx) = \widehat{(ab)}\,\psi(x) \tag{26}$$

$$(\hat{a}\check{b})\psi(x) = \hat{a}\,\psi(x+b) = \psi(ax+ab) \tag{27}$$

$$(\check{b}\hat{a})\psi(x) = \check{b}\,\psi(ax) = \psi(ax+b) \tag{28}$$

and we can see that checks commute with other checks and hats commute with other hats but, in general, hats don't commute with checks.

If $A \in \mathcal{G}$ and $b \in \mathcal{A}$ we can define the operator $Ab$ by,

$$(Ab)\psi(x) = A(b\psi(x)) = A\psi(bx) \tag{29}$$

and similarly for $A\check{b}$. These definitions allow a very rich algebra of operators that mix Boolean and Clifford algebra properties in new ways. One particularly interesting example of this kind of mix is given by the following statement: mutually exclusive propositions are orthogonal. More explicitly, if $a, b \in \mathcal{A}$ and $\psi_1, \psi_2 \in H(\mathcal{A})$ then $ab = 0 \Rightarrow \langle a\psi_1, b\psi_2 \rangle = 0$ and Pythagoras' theorem holds,

$$\| a\psi_1 + b\psi_2 \|^2 = \|a\psi_1\|^2 + \|b\psi_2\|^2 \tag{30}$$

## 5  Independence

If the Clifford number $\psi(a,c)$ is interpreted as a representation of the partial truth of $a$ when we assume $c$ to be certain, then there is only one rational way to define independence, namely:

Preliminary Definition: We say that $\psi$ makes propositions $a$ and $b$ in $\mathcal{A}$ logically independent conditionally on $c \in \mathcal{A}^*$ if the additional knowledge of one of them does not affect the value of $\psi$ for the other, i.e.,

$$\psi(a,bc) = \psi(a,c) \quad \text{AND} \quad \psi(b,ac) = \psi(b,c) \tag{31}$$

whenever the conditional $\psi$s exist.

### 5.1  A restricted product rule

If we try to find the value of $\psi(ab,c)$ in terms of the partial truths that $\psi$ assigns to $a$ and $b$, then the most general relation is,

$$\psi(ab,c) = F(\psi(a,c), \psi(a,bc), \psi(b,c), \psi(b,ac)) \tag{32}$$

where $F$ is an arbitrary function of its arguments. If we assume further that $a$ and $b$ are logically independent conditionally on $c$ then, using (31), the most general relation becomes,

$$\psi(ab,c) = F(\psi(a,c), \psi(b,c)) \tag{33}$$

Let $u = \psi(a,c)$, $v = \psi(b,c)$ and $w = \psi(d,c)$ and use the commutativity and associativity properties of the logical product to get the following two properties for the function $F$:

$$F(u,v) = F(v,u) \tag{34}$$

$$F(F(u,v),w) = F(u,F(v,w)) \tag{35}$$

In other words, $F$ must be symmetric and it must satisfy the usual associativity equation. If the $\psi$s take values only on a commutative subspace of $\mathcal{G}$ (e.g. reals, complex or pseudoscalars) then the only solution is $F(u,v) = uv$ (see []), but this cannot be the solution if $uv \neq vu$. Given that $F(u,v)$ must be symmetric, and that it must reduce to $uv$ when $u$ and $v$ commute, an obvious solution is given by the symmetrization of the product, i.e. $(uv+vu)/2$. In principle, it seems feasible that a modification of the standard argument of Aczel (see [] or []) may yield the symmetrized product as the unique solution of (35) and (34) for $u,v \in \mathcal{G}$, at least for $u,v$ in some subset of $\mathcal{G}$ for which (35) is still true. At the present time there is no such proof. In any case the lack of a proof of uniqueness is not a deterrent to turning the formula into the definition of independence. If it turns out that there are multiple solutions (which seems highly unlikely) the results obtained from this particular solution will still be valid. Thus, from now on we say that $\psi$ makes $a$ and $b$ (logically) independent given $c$ if,

$$\psi(ab,c) = \tfrac{1}{2}\bigl[ \psi(a,c)\psi(b,c) + \psi(b,c)\psi(a,c) \bigr] \tag{36}$$

More generally we have,

Definition: We say that $\psi$ makes $a_1, a_2, \ldots, a_n$ logically independent given $c$ if, for $k = 1, 2, \ldots, n$ and $1 \le i_1 < i_2 < \cdots < i_k \le n$,

$$\psi\Bigl(\prod_{j=1}^{k} a_{i_j}, c\Bigr) = \frac{1}{k!} \sum_{\sigma} \psi(a_{\sigma(i_1)},c)\, \psi(a_{\sigma(i_2)},c) \cdots \psi(a_{\sigma(i_k)},c) \tag{37}$$

where the sum runs over all the permutations $\sigma$ of $(i_1, i_2, \ldots, i_k)$.

The associativity equation (35) imposes a heavy restriction on the possible $\psi$ assignments for independent propositions. In fact we have,

Theorem 3 If $\psi$ makes three or more propositions $a, b, d, \ldots$ independent conditionally on $c$ then the Clifford numbers $u = \psi(a,c)$, $v = \psi(b,c)$, $w = \psi(d,c), \ldots$ are such that each of them commutes with the anticommutator of any other two.

Proof:
From (37) it suffices to show that $v$ must commute with $F(u,w) = (uw+wu)/2$ when $a, b, d$ are independent given $c$.

The right hand side of (35) simplifies to,

$$F(u,F(v,w)) = \tfrac{1}{4} \{ uvw - uwv + vwu - wvu \} \tag{38}$$

Similarly the left hand side of (35) is given by,

$$F(F(u,v),w) = \tfrac{1}{4} \{ uvw - vuw + wuv - wvu \} \tag{39}$$

Equating (38) to (39) and simplifying we get,

$$[v, F(u,w)] = 0 \tag{40}$$

where $[u,v] = uv - vu$ denotes the usual commutator product ∎

Notice that when the Clifford numbers $u, v, w, \ldots$ either commute or anticommute with each other then (40) is true. But there are many other solutions. For example (40) is also true when $u, v, w, \ldots$ are arbitrary vectors.

### 5.2  Independence and Orthogonality

The above definition for independence makes the following statement true:

Theorem 4 If $\psi(ab,c) = 0$ then $\psi(a,c)$ anticommutes with $\psi(b,c)$ when and only when $\psi$ makes $a$ and $b$ independent given $c$.

There is nothing like this in standard probability theory, where mutually exclusive events that are possible (i.e. that have positive probability) are never independent. We are used to thinking that this makes sense, for if we know that one of the events happens then we also know that the other couldn't happen. The events are totally linked so they can't be independent.

This is fine for real numbers, which are commutative, but not for Clifford numbers. There is, however, an extreme case where the above theorem is true even in standard probability theory. Suppose that $a$, $b$ and $c$ are three mutually exclusive propositions. Then $\psi(ab,c) = \psi(a,c) = \psi(b,c) = \psi(a,c)\psi(b,c) = 0$ and we would have to say that $a$ and $b$ are independent given $c$ even though neither $a$ nor $b$ is possible given $c$. Anticommutativity allows this to happen even when $\psi(a,c)$ and $\psi(b,c)$ are not zero. Two events can be completely linked (i.e. mutually exclusive) and at the same time be logically independent from each other! This is as weird as entanglement in quantum mechanics.

## 6  Flipping $n$ coins

If $\mathcal{A}$ is a $\sigma$-algebra of propositions in $X$ then, by the $\sigma$-additivity property, every $\psi \in H(\mathcal{A})$ is completely specified on $\mathcal{A}$ by just giving $\psi(x)$ for all $x \in X$, i.e.,

$$\psi(x) = \sum_{y \in X} \psi(y)\,\delta_y(x) \tag{41}$$

where for $x, y \in X$, $\delta_y(x)$ is $1 \in \mathcal{G}$ if $x = y$ and $0 \in \mathcal{G}$ otherwise.

We consider the following special case.

### 6.1  The Binomial experiment with $\psi$s

Let $a$ be an arbitrary proposition and let $X = \{a, \bar{a}\}$ and $\mathcal{A} = \{1, 0, a, \bar{a}\}$. Clearly $\mathcal{A}$ is a Boolean algebra of propositions in $X$. From (41) we have

$$\psi(x) = A\,\delta_a(x) + B\,\delta_{\bar{a}}(x) \tag{42}$$

where $\psi \in H(\mathcal{A})$ and $A, B \in \mathcal{G}$. This is the canonical Bernoulli experiment. There are only two possible outcomes, $a$ and $\bar{a}$, with partial truths encoded by the Clifford numbers $A = \psi(a)$ and $B = \psi(\bar{a})$. As in standard probability theory, consider now $n$ independent repetitions of the Bernoulli experiment, i.e., consider $X^n$ with its corresponding Boolean algebra $\mathcal{A}^n$ of elements in $X^n$ (see (5)). From (41) a general $\psi_n \in H(\mathcal{A}^n)$ is given by,

$$\psi_n(x) = \sum_{y \in X^n} \psi_n(y)\,\delta_y(x) \tag{43}$$

From the assumption that $\psi_n$ makes the different repetitions independent we obtain, using (37), that

$$\psi_n(x) = \psi_n(x_1, \ldots, x_n) = M_n(m(x)) \tag{44}$$

where for each integer $k$ with $0 \le k \le n$, $M_n(k) \in \mathcal{G}$ is the symmetrization of the product $A^k B^{n-k}$ and $m(x)$ is the number of $a$'s in $x$. Here are some examples for $n = 2$ and $n = 3$,

$$\psi_2(a,a) = A^2 = M_2(2), \qquad \psi_2(a,\bar{a}) = \psi_2(\bar{a},a) = \tfrac{1}{2}[AB + BA] = M_2(1)$$

$$\psi_3(a,a,\bar{a}) = \psi_3(a,\bar{a},a) = \psi_3(\bar{a},a,a) = \tfrac{1}{3}[A^2B + ABA + BA^2] = M_3(2).$$
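The symmetrized products $M_n(k)$ can be computed mechanically. The sketch below is an illustration under stated assumptions: Clifford numbers are represented by real $2 \times 2$ matrices, the two example matrices `A` and `B` (anticommuting, each squaring to the identity) are chosen here and do not come from the text, and the helper names are hypothetical. Averaging over the distinct orderings of $k$ factors $A$ and $n-k$ factors $B$ gives the same result as averaging over all $n!$ permutations, because each distinct ordering arises from exactly $k!\,(n-k)!$ permutations.

```python
from fractions import Fraction
from itertools import combinations


def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def scale(c, P):
    return tuple(tuple(c * e for e in row) for row in P)


def madd(P, Q):
    return tuple(tuple(p + q for p, q in zip(rp, rq)) for rp, rq in zip(P, Q))


I2 = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))


def M(n, k, A, B):
    """Average of all distinct orderings of k factors A and n-k factors B."""
    total, count = ZERO, 0
    for positions in combinations(range(n), k):
        word = I2
        for i in range(n):
            word = matmul(word, A if i in positions else B)
        total, count = madd(total, word), count + 1
    return scale(Fraction(1, count), total)


# Assumed example: two anticommuting unit vectors of a two-dimensional
# Euclidean Clifford algebra, represented by real 2x2 matrices.
A = ((0, 1), (1, 0))
B = ((1, 0), (0, -1))

M21 = M(2, 1, A, B)   # (AB + BA) / 2
M32 = M(3, 2, A, B)   # (A^2 B + ABA + B A^2) / 3
```

For this anticommuting pair, `M21` is the zero matrix while `M32` equals `B` divided by 3, since $A^2B = BA^2 = B$ and $ABA = -B$.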

Now define the proposition $P^n_k \in \mathcal{A}^n$ by,

$$P^n_k = \text{``exactly } k \text{ of the } n \text{ repetitions are } a\text{''} \tag{45}$$

Recall that by (22) we have,

$$P^n_k\psi_n(x) = \psi_n(P^n_k x) = \begin{cases} \psi_n(x) & \text{if } P^n_k x \neq 0 \\ 0 & \text{otherwise} \end{cases} \tag{46}$$

By the first part of Theorem 1, $\psi_n$ makes $P^n_k$ true when $P^n_k\psi_n = \psi_n$. So the question is: how far is $\psi_n$ from making $P^n_k$ true? Answer: $\|\psi_n - P^n_k\psi_n\|^2$.

### 6.2  Computation of $\|\psi_n - P^n_k\psi_n\|^2$

To compute this distance we use the fact that $P^n_k$ and its negation in $\mathcal{A}^n$, $1 - P^n_k$, are mutually exclusive propositions, hence orthogonal (see (30)), and by Pythagoras,

$$\|\psi_n - P^n_k\psi_n\|^2 = \|\psi_n\|^2 - \|P^n_k\psi_n\|^2 \tag{47}$$

Let us compute each of these terms. From (17),

$$\|\psi_n\|^2 = \sum_{x \in X^n} \langle \psi_n^\dagger(x)\psi_n(x) \rangle_0. \tag{48}$$

and using (43) and (44) we can write,

$$\psi_n(x) = \sum_{y \in X^n} M_n(m(y))\,\delta_y(x) \tag{49}$$

from where we obtain,

$$\psi_n^\dagger(x)\psi_n(x) = \sum_{y_1, y_2 \in X^n} M_n^\dagger(m(y_1))\, M_n(m(y_2))\, \delta_{y_1}(x)\,\delta_{y_2}(x) = M_n^\dagger(m(x))\, M_n(m(x))$$

and replacing in (48) we get, with $|M|^2 = \langle M^\dagger M \rangle_0$,

$$\|\psi_n\|^2 = \sum_{x \in X^n} |M_n(m(x))|^2 = \sum_{j=0}^{n} \binom{n}{j} |M_n(j)|^2 \tag{50}$$

The last equality follows from the fact that there are $\binom{n}{j}$ propositions in $X^n$ with exactly $j$ components equal to $a$. We use the same fact again to compute the other norm in (47),

$$\|P^n_k\psi_n\|^2 = \sum_{x \in X^n} \langle \psi_n^\dagger(P^n_k x)\,\psi_n(P^n_k x) \rangle_0 \tag{51}$$

to obtain,

$$\|P^n_k\psi_n\|^2 = \binom{n}{k} |M_n(k)|^2 \tag{52}$$

Replacing (50) and (52) in (47) we get,

$$\|\psi_n - P^n_k\psi_n\|^2 = \sum_{j=0}^{n} \binom{n}{j} |M_n(j)|^2 - \binom{n}{k} |M_n(k)|^2 \tag{53}$$

Let us consider the proposition $P^{n,\epsilon}_f \in \mathcal{A}^n$ defined by,

$$P^{n,\epsilon}_f = \text{``The observed frequency of } a\text{'s in } n \text{ independent repetitions is } k/n \text{ with } f - \epsilon \le \tfrac{k}{n} \le f + \epsilon\text{''} \tag{54}$$

In other words, for $x \in X^n$, $P^{n,\epsilon}_f x \neq 0$ when and only when the proportion of $a$'s in $x = (x_1, \ldots, x_n)$ is within $\epsilon$ of the specified frequency $f$. The proposition $P^{n,\epsilon}_f$ is equal to the following disjunction of $2n\epsilon + 1$ mutually exclusive propositions $P^n_k$:

$$P^{n,\epsilon}_f = \sum_{k=n(f-\epsilon)}^{n(f+\epsilon)} P^n_k \tag{55}$$

hence, from (30) we get,

$$\|P^{n,\epsilon}_f\psi_n\|^2 = \sum_{k=n(f-\epsilon)}^{n(f+\epsilon)} \|P^n_k\psi_n\|^2 \tag{56}$$

and from (47) and (53) we can write,

$$\|\psi_n - P^{n,\epsilon}_f\psi_n\|^2 = \sum_{j=0}^{n} \binom{n}{j} |M_n(j)|^2 - \sum_{k=n(f-\epsilon)}^{n(f+\epsilon)} \binom{n}{k} |M_n(k)|^2 \tag{57}$$

In general this distance increases without limit as $n \to \infty$, but it can converge relative to the size of $\psi_n$. Let us define the relative error by,

$$\Delta^{n,\epsilon}_f = \frac{\|\psi_n - P^{n,\epsilon}_f\psi_n\|^2}{\|\psi_n\|^2} \tag{58}$$

Using (57) and (50) we have,

$$\Delta^{n,\epsilon}_f = 1 - \frac{\displaystyle\sum_{k=n(f-\epsilon)}^{n(f+\epsilon)} \binom{n}{k} |M_n(k)|^2}{\displaystyle\sum_{k=0}^{n} \binom{n}{k} |M_n(k)|^2} \tag{59}$$

We separate the computation of $\Delta^{n,\epsilon}_f$ into three different cases.

### 6.3  Case: $AB = BA$

From (59) we can write the following,

Theorem 5 If $AB \neq 0$, $AB = BA$ and $|A^k B^{n-k}| = |A|^k |B|^{n-k}$ then,

$$\Delta^{n,\epsilon}_f = 1 - \sum_{k=n(f-\epsilon)}^{n(f+\epsilon)} \binom{n}{k} p^k (1-p)^{n-k} \tag{60}$$

where,

$$p = \frac{|A|^2}{|A|^2 + |B|^2} \tag{61}$$

Proof:
Under the conditions of the theorem we have,

$$|M_n(k)|^2 = |A|^{2k} |B|^{2(n-k)}$$

Replacing this last equation in (59) and noticing that,

$$\sum_{k=0}^{n} \binom{n}{k} |A|^{2k} |B|^{2(n-k)} = \bigl( |A|^2 + |B|^2 \bigr)^n$$

we immediately obtain (60) and (61) ∎
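Equation (60) is easy to evaluate numerically. The following is a minimal sketch, assuming only the norms $|A|$ and $|B|$ are given (the function name `delta_commuting` and the rounding tolerance are hypothetical choices); the summation limits $n(f \pm \epsilon)$ are rounded inward to integers.

```python
from math import ceil, comb, floor


def delta_commuting(n, f, eps, norm_A, norm_B):
    """Relative error of equation (60) for commuting A and B."""
    p = norm_A**2 / (norm_A**2 + norm_B**2)        # equation (61)
    lo = max(0, ceil(n * (f - eps) - 1e-9))        # tolerance guards float rounding
    hi = min(n, floor(n * (f + eps) + 1e-9))
    inside = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1)
    )
    return 1 - inside


# With |A| = 1 and |B| = 2 we get p = 0.2.
near = delta_commuting(200, 0.2, 0.1, 1.0, 2.0)    # frequency window centred on p
far = delta_commuting(200, 0.5, 0.1, 1.0, 2.0)     # window that excludes p
small_n = delta_commuting(50, 0.2, 0.1, 1.0, 2.0)
```

In this example the relative error is already small when the window is centred on $p$ and is close to 1 when the window excludes $p$, and it decreases as $n$ grows from 50 to 200.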

It is not always true that for $A, B \in \mathcal{G}$, $|AB| = |A|\,|B|$ even when $AB = BA$ (take for example $A = 1 + \alpha u$, $B = 1 - \beta u$ for a unit vector $u$ and scalars $\alpha$ and $\beta$), so the extra condition besides commutativity is needed for the theorem to be true.

### 6.4  Case: $AB = 0$

Unlike the real (or complex) numbers, the product of nonzero Clifford numbers can be zero (e.g. take $\alpha = \beta = 1$ in the example above), so this case is not trivial. When $AB = 0$ the following is true,

Theorem 6 If $AB = 0$ then,

$$\Delta^{n,\epsilon}_f = \begin{cases} \dfrac{|A|^{2n}}{|A|^{2n} + |B|^{2n}} & \text{if } f = 0 \\[2ex] 1 & \text{if } 0 < f < 1 \\[1ex] \dfrac{|B|^{2n}}{|A|^{2n} + |B|^{2n}} & \text{if } f = 1 \end{cases} \tag{62}$$

Proof:
Notice that when $AB = 0$ all the symmetrized products except the two extremes are zero, i.e., $M_n(k) = 0$ for all $0 < k < n$, $M_n(n) = A^n$ and $M_n(0) = B^n$. Substituting these values into (59) we obtain (62) ∎

### 6.5  Case: $AB = -BA$

When $A$ and $B$ anticommute we have,

Theorem 7 If $AB \neq 0$, $AB = -BA$ and $|A^k B^{n-k}| = |A|^k |B|^{n-k}$ then,

$$\Delta^{n,\epsilon}_f = 1 - \frac{\displaystyle\sum_{k=n(f-\epsilon)}^{n(f+\epsilon)} b_n(k)\,(1 - 2\lambda_n(k))^2}{\displaystyle\sum_{k=0}^{n} b_n(k)\,(1 - 2\lambda_n(k))^2} \tag{63}$$

where $b_n(k)$ are the binomial probabilities,

$$b_n(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad \text{with } p \text{ as before, i.e., } p = \frac{|A|^2}{|A|^2 + |B|^2} \tag{64}$$

and the numbers $\lambda_n(k)$ satisfy $\lambda_n(k) = \lambda_n(n-k)$ and, for $k \le n/2$, $\lambda_n(k)$ is the chance of drawing an odd number of RED balls out of $k$ draws without replacement from a box containing either $n/2$ REDS and $n/2$ BLUES if $n$ is even, or $(n+1)/2$ REDS and $(n-1)/2$ BLUES if $n$ is odd.

Proof:
Recall that $M_n(k)$ is the symmetrization of $A^k B^{n-k}$, i.e., the average over all the permutations of $A^k B^{n-k}$. There are $\binom{n}{k}$ permutations and, by the assumed anticommutativity of $A$ with $B$, each permutation is either $A^k B^{n-k}$ or $-A^k B^{n-k}$, so we have,

$$M_n(k) = \frac{\rho(n,k)\, A^k B^{n-k}}{\binom{n}{k}} \tag{65}$$

where $\rho(n,k)$ is an integer. From the fact that $|M_n(k)|$ is invariant under the transformation $A \to B$, $B \to A$ and $k \to (n-k)$ it follows that $|\rho(n,k)| = |\rho(n,n-k)|$. In order to prove the theorem it is sufficient to show that,

$$\frac{|\rho(n,k)|}{\binom{n}{k}} = |1 - 2\lambda_n(k)| \tag{66}$$

since if (66) is true, by using the conditions of the theorem we have,

$$|M_n(k)|^2 = (1 - 2\lambda_n(k))^2\, |A|^{2k} |B|^{2(n-k)} \tag{67}$$

and dividing the numerator and the denominator of (59) by $(|A|^2 + |B|^2)^n$ we obtain (63).

Let us show that (66) is true by giving an explicit formula for $|\rho(n,k)|$ when $k \le n/2$. To do this, represent each permutation of $A^k B^{n-k}$ by the $k$ integers $(j_1 j_2 \ldots j_k)$ that correspond to the positions of the $A$'s in increasing order. For example, for $n = 6$ and $k = 3$, the permutation $ABABBA$ is represented by $(136)$, since the $A$'s are found at positions 1, 3 and 6. The permutation $AABBBA$ is represented by $(126)$, etc. Define the parity of $(j_1 \ldots j_k)$ as

$$\text{parity of } (j_1 j_2 \ldots j_k) = (-1)^{j_1 + j_2 + \cdots + j_k} = (-1)^{j_1} (-1)^{j_2} \cdots (-1)^{j_k} \tag{68}$$

Note that the transposition of an $A$ with a $B$ located next to it changes by one the position of that $A$ in the permutation and hence the parity of the permutation obtained after the transposition is always the reverse of the parity of the original permutation. From this and the fact that we can transform any permutation into any other by a sequence of transpositions it follows that two permutations have the same parity if and only if the number of flips (transpositions) necessary for transforming one permutation into the other is even.

The permutation $A^k B^{n-k}$ always corresponds to $(12 \ldots k)$ and therefore an arbitrary permutation $(j_1 j_2 \ldots j_k)$ will have the same parity as $A^k B^{n-k}$ if the parity of the number of odd integers in the set $\{j_1, j_2, \ldots, j_k\}$ is the same as the parity of the number of odd integers in the set $\{1, 2, \ldots, k\}$. In other words, if there is an even number of odd integers in the set $\{1, 2, \ldots, k\}$ then every permutation $(j_1 j_2 \ldots j_k)$ which also contains an even number of odd integers can be reordered into $A^k B^{n-k}$, but if the number of odd integers in $\{j_1, \ldots, j_k\}$ is odd then the permutation reorders into $-A^k B^{n-k}$. Therefore, we can write

$$|\rho(n,k)| = \Bigl| \sum_{1 \le j_1 < j_2 < \cdots < j_k \le n} (-1)^{j_1 + j_2 + \cdots + j_k} \Bigr| \tag{69}$$

Thus, if we call $N_e$ the number of permutations with an even number of odd integers among $\{j_1, \ldots, j_k\}$ and we call $N_o$ the number of permutations with an odd number of odds, then,

$$|\rho(n,k)| = | N_e - N_o | \tag{70}$$

Using the fact that $N_e + N_o = \binom{n}{k}$ we also have that,

$$|\rho(n,k)| = \Bigl| \binom{n}{k} - 2 N_o \Bigr| \tag{71}$$

We now turn to the computation of $N_o$. Let $N_o(m)$ be the total number of permutations $(j_1 j_2 \ldots j_k)$ with exactly $m$ of the positions of the $A$'s being odd. We have,

$$N_o = \begin{cases} \displaystyle\sum_{t=0}^{k/2-1} N_o(2t+1) & \text{if } k \text{ is even} \\[3ex] \displaystyle\sum_{t=0}^{(k-1)/2} N_o(2t+1) & \text{if } k \text{ is odd} \end{cases} \tag{72}$$

where, for $0 \le m \le k \le n/2$,

$$N_o(m) = \begin{cases} \dbinom{n/2}{m} \dbinom{n/2}{k-m} & \text{if } n \text{ is even} \\[3ex] \dbinom{(n+1)/2}{m} \dbinom{(n-1)/2}{k-m} & \text{if } n \text{ is odd} \end{cases} \tag{73}$$

This is because the set $\{1, 2, \ldots, n\}$ contains an equal number of odd and even numbers when $n$ is even, but the number of odds is one more than the number of evens when $n$ is odd. So dividing (71) by $\binom{n}{k}$ and using (72) and (73) we obtain (66) with $\lambda_n(k)$ defined as the theorem says. There are four different formulas for $\lambda_n(k)$ depending on the parities of $n$ and $k$. Let us check one of them. When $n$ and $k$ are both even and $k \le n/2$ we have,

$$\lambda_n(k) = \sum_{t=0}^{k/2-1} \frac{\dbinom{n/2}{2t+1} \dbinom{n/2}{k-2t-1}}{\dbinom{n}{k}} \tag{74}$$

and we can see that (74) is the chance of drawing an odd number of red balls when drawing at random $k$ balls, without replacement, from a box containing $n/2$ red balls and $n/2$ blue balls. This completes the proof of the theorem ∎
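The combinatorial identity (66) can be checked by brute force for small $n$. The sketch below enumerates the positions of the $A$'s directly, as in (69), and compares the result with the urn probability described in Theorem 7. The function names are hypothetical, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def rho(n, k):
    """Signed sum of equation (69) over the positions j1 < ... < jk of the A's."""
    return sum((-1) ** sum(js) for js in combinations(range(1, n + 1), k))


def lam(n, k):
    """Chance of an odd number of RED balls in k draws without replacement."""
    reds, blues = (n + 1) // 2, n // 2     # odd and even positions in 1..n
    odd_ways = sum(
        comb(reds, m) * comb(blues, k - m) for m in range(1, k + 1, 2)
    )
    return Fraction(odd_ways, comb(n, k))


def identity_66_holds(n, k):
    return Fraction(abs(rho(n, k)), comb(n, k)) == abs(1 - 2 * lam(n, k))


all_hold = all(
    identity_66_holds(n, k) for n in range(1, 11) for k in range(n + 1)
)
```

For instance `rho(3, 2)` is $-1$, which matches $|M_3(2)| = |A^2 B| / 3$ for anticommuting $A$ and $B$.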

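## 7  The weak law of large numbers

### 7.1  Taking limits as $n \to \infty$

In this section we compute

$$\lim_{n \to \infty} \Delta^{n,\epsilon}_f$$

for the three cases considered in the previous section.

Theorem 8 If $AB \neq 0$, $|A^k B^{n-k}| = |A|^k |B|^{n-k}$ and either $AB = BA$ or $AB = -BA$ then, for all sufficiently small $\epsilon > 0$,

$$\lim_{n \to \infty} \Delta^{n,\epsilon}_f = \begin{cases} 0 & \text{if } f = p \\ 1 & \text{if } f \neq p \end{cases} \tag{75}$$

where as before, $p = |A|^2 / (|A|^2 + |B|^2)$. Moreover, if $AB = 0$, then $\forall \epsilon > 0$,

$$\lim_{n \to \infty} \Delta^{n,\epsilon}_f = \begin{cases} 0 & \text{if } (f = 0 \text{ and } |A| < |B|) \text{ or } (f = 1 \text{ and } |A| > |B|) \\ 1/2 & \text{if } |A| = |B| \text{ and either } f = 0 \text{ or } f = 1 \\ 1 & \text{otherwise} \end{cases} \tag{76}$$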
Proof
For the first part we use equations, (60) and (63). By the usual gaussian approximation for the binomial probabilities (e.g. see [] p.59) we have that for any integers 0 Ł k1 Ł k2 Ł n and any function gn with finite expectation with respect to the standard gaussian,

k2
ĺ
k = k1
bn(k) gn(k) = ó
ő
[(k2-np)/( [Önpq])]

[(k1-np)/( [Önpq])]
gn(np + x   ___
Önpq

) 1
 æÖ 2p
e[(-x2)/ 2] dx (1+o(n0))
(77)
thus, taking k1 = n(f-e), k2 = n(f+e), gn(y) = 1 for 0 Ł y Ł n and gn(y) = 0 outside [0,n] we obtain from equation (60) that,

lim
n®Ą
Dn,ef = 1 -
lim
n®Ą
ó
ő
[(n(f-p+e))/( [Önpq])]

[(n(f-p-e))/( [Önpq])]
1
 æÖ 2p
e[(-x2)/ 2] dx
(78)
hence, when f ą p for any 0 < e < |f-p| the limits of the integral in equation (78) are both positive or both negative and both going to Ą as n®Ą so the desired limit is 1-0 = 0. On the other hand when f = p for any e > 0 the desired limit is 1-1 = 0 and this shows that (75) is true for the commutative case. To show (75) for the anticommutative case we take

$$g_n(y) = \frac{1}{4}\, y^2 (y-1)^2 \tag{79}$$

which increases like $y^4$ and therefore has finite expectation with respect to the standard Gaussian. If we show that

$$g_n(k) = n^2\left(1-2\,l_n(k)\right)^2 + o(n^0) \tag{80}$$
then it will follow from (80), (77) and (63) that,

$$\lim_{n\to\infty} D_{n,\varepsilon}^{f} = 1 - \lim_{n\to\infty} \frac{\displaystyle\int_{n(f-p-\varepsilon)/\sqrt{npq}}^{n(f-p+\varepsilon)/\sqrt{npq}} g_n\!\left(np + x\sqrt{npq}\right) \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx}{\displaystyle\int_{-np/\sqrt{npq}}^{nq/\sqrt{npq}} g_n\!\left(np + x\sqrt{npq}\right) \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx} \tag{81}$$

and by the same reasoning as in the commutative case we obtain (75) for the anticommutative case. Let us then show (80). Notice that from (74) we can write,

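$$l_n(k) = \sum_{t=0}^{k/2-1} W(n,2t+1,k) \tag{82}$$

where the hypergeometric probabilities are,

$$\begin{aligned}
W(n,m,k) &= \frac{\binom{n/2}{m}\binom{n/2}{k-m}}{\binom{n}{k}}
= \frac{\left[\frac{1}{n^m}\binom{n/2}{m}\right]\left[\frac{1}{n^{k-m}}\binom{n/2}{k-m}\right]}{\left[\frac{1}{n^k}\binom{n}{k}\right]} \\
&= \frac{\left[\frac{1}{m!}\,\frac{1}{2}\left(\frac{1}{2}-\frac{1}{n}\right)\cdots\left(\frac{1}{2}-\frac{m-1}{n}\right)\right]\left[\frac{1}{(k-m)!}\,\frac{1}{2}\left(\frac{1}{2}-\frac{1}{n}\right)\cdots\left(\frac{1}{2}-\frac{k-m-1}{n}\right)\right]}{\left[\frac{1}{k!}\,1\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{k-1}{n}\right)\right]}
\end{aligned} \tag{83}$$

Expanding the products up to terms of order $1/n$ and letting $W = W(n,m,k)$ we have,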

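$$\begin{aligned}
W &= \binom{k}{m}\, \frac{\left[2^{-m}\left\{1 - \frac{m(m-1)}{n} + o(n^{-1})\right\}\right]\left[2^{m-k}\left\{1 - \frac{(k-m)(k-m-1)}{n} + o(n^{-1})\right\}\right]}{1 - \frac{k(k-1)}{n} + o(n^{-1})} \\
&= \binom{k}{m}\left(\frac{1}{2}\right)^{k} \left\{1 - \frac{1}{n}\left[m^2+(k-m)^2-k\right] + o(n^{-1})\right\}\left\{1 + \frac{k(k-1)}{n} + o(n^{-1})\right\} \\
&= \binom{k}{m}\left(\frac{1}{2}\right)^{k} \left\{1 + \frac{2}{n}\, m(k-m) + o(n^{-1})\right\}
\end{aligned} \tag{84}$$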

Notice that,

$$\sum_{t=0}^{k/2-1} \binom{k}{2t+1}\left(\frac{1}{2}\right)^{k} = \frac{1}{2} \tag{85}$$

and that,

$$\sum_{t=0}^{k/2-1} (2t+1)(k-2t-1)\binom{k}{2t+1} = \frac{1}{4}\, k(k-1)\, 2^{k-1} \tag{86}$$
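The two combinatorial identities (85) and (86) are easy to spot-check with exact integer arithmetic. The sketch below is only such a check, with hypothetical helper names; it starts at $k = 4$ because for $k = 2$ the left side of (86) equals 2 while the right side equals 1.

```python
from fractions import Fraction
from math import comb


def odd_binomial_mass(k: int) -> Fraction:
    """Left side of (85): binomial(k, 1/2) mass on odd outcomes (k even)."""
    return sum(Fraction(comb(k, 2 * t + 1), 2 ** k) for t in range(k // 2))


def odd_weighted_sum(k: int) -> int:
    """Left side of (86): sum of m (k - m) C(k, m) over odd m (k even)."""
    return sum(
        (2 * t + 1) * (k - 2 * t - 1) * comb(k, 2 * t + 1)
        for t in range(k // 2)
    )


if __name__ == "__main__":
    for k in range(4, 21, 2):
        # (85): the odd outcomes carry exactly half of the mass.
        assert odd_binomial_mass(k) == Fraction(1, 2)
        # (86), multiplied through by 4 to stay in the integers.
        assert 4 * odd_weighted_sum(k) == k * (k - 1) * 2 ** (k - 1)
    print("identities (85) and (86) hold for even k from 4 to 20")
```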
From (85), (86), (84) and (82) we have,

$$l_n(k) - \frac{1}{2} = \frac{k(k-1)}{4n} + o(n^{-1}). \tag{87}$$

Squaring both sides of (87) and multiplying through by $4n^2$ we obtain,

$$n^2\left(1-2\,l_n(k)\right)^2 = \frac{1}{4}\, k^2(k-1)^2 + o(n^0) \tag{88}$$

which is exactly (80). This ends the proof for the anticommutative case. The second part of the theorem, i.e. (76), follows directly from (62) by taking limits as $n \to \infty$. ∎
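The commutative case of (75) can be illustrated numerically. The sketch below assumes, as equations (77) and (78) suggest, that $b_n(k)$ is the binomial probability with success chance $p$ and that in the commutative case $D_{n,\varepsilon}^{f}$ is one minus the binomial mass of the window $n(f-\varepsilon) \le k \le n(f+\varepsilon)$. Equation (60) itself is not reproduced here, so this reading is an assumption, and the function names are hypothetical.

```python
import math


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Binomial probability b_n(k) for 0 < p < 1, computed in log space."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def window_complement(n: int, p: float, f: float, eps: float) -> float:
    """Assumed commutative-case D: 1 - sum of b_n(k) over the window."""
    lo = max(0, math.ceil(n * (f - eps)))
    hi = min(n, math.floor(n * (f + eps)))
    return 1.0 - sum(binomial_pmf(n, k, p) for k in range(lo, hi + 1))


def gaussian_window_complement(n: int, p: float, f: float, eps: float) -> float:
    """Right side of (78) before taking the limit, via the error function."""
    scale = math.sqrt(n * p * (1.0 - p))
    lower = n * (f - p - eps) / scale
    upper = n * (f - p + eps) / scale

    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return 1.0 - (cdf(upper) - cdf(lower))


if __name__ == "__main__":
    p, eps = 0.3, 0.05
    for n in (50, 500, 2000):
        print(n,
              window_complement(n, p, p, eps),      # f = p: tends to 0
              window_complement(n, p, 0.5, eps))    # f != p: tends to 1
```

Running the loop with increasing `n` shows the two regimes of (75) separating; the Gaussian version gives the same picture without summing the binomial terms.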

### 7.2  Flipping an infinite number of coins

As in standard probability theory, there is a subtle nuisance with limits such as (75) and (76) that needs to be faced in order to have a straightforward probabilistic interpretation for laws of large numbers. The problem with (75) and (76) is that it is not clear how to paste all the $\psi_n$ together into one global $\psi_\infty$. It was due to this kind of problem that modern measure-theoretic probability theory was born.

To be able to make statements about infinite sequences of Bernoulli trials we need to specify a Boolean $\sigma$-algebra, $\mathcal{A}_\infty$, that contains at least those statements. This can be done as in standard probability theory (e.g. see []), i.e. $\mathcal{A}_\infty$ is defined as the smallest $\sigma$-algebra containing the cylinder sets. In particular, it contains the propositions $P_n^k$ defined in (45), but now $n$ refers to the first $n$ repetitions in an infinite sequence of Bernoulli trials. Having constructed $\mathcal{A}_\infty$ we also need to construct the Hilbert space, $H(\mathcal{A}_\infty)$, containing the functions $\psi = \psi_\infty$. Again, the construction is not trivial but well known in functional analysis as the standard construction of an infinite tensor product of Hilbert spaces (e.g. see []). These standard constructions allow us to write,

$$P_n^k\, \psi = P_n^k\, \psi_n \tag{89}$$

where $\psi = \psi_\infty \in H(\mathcal{A}_\infty)$. Equation (89) can be used to rewrite the statements (75) and (76) as,

Theorem 9 Let $X_\infty$ be the space of infinite sequences of independent tosses of a coin and let $\mathcal{A}_\infty$ be the smallest $\sigma$-algebra containing all the propositions $P_n^k$ about elements in $X_\infty$. If for each toss the $\psi$ values for falling heads and tails are the Clifford numbers $A$ and $B$ satisfying,

1. $|A|^2+|B|^2 = 1$
2. $AB \neq 0$
3. either $AB = BA$ or $AB = -BA$
4. $|A^k B^{n-k}| = |A|^k |B|^{n-k}$   $\forall n \in \mathbb{N}$, $\forall\, 0 \le k \le n$.

Then, for all sufficiently small $\varepsilon > 0$ the propositions,

$$P_{\infty,\varepsilon}^{|A|^2} \in \mathcal{A}_\infty$$

are true.

Proof
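Under the conditions of the theorem we have from (89) and (75) that when the $\psi_n$ are normalized, i.e. when $\|\psi_n\| = 1$ for all $n$, then

$$\left\|\psi - P_{n,\varepsilon}^{|A|^2}\psi\right\| \to 0 \quad \text{as} \quad n \to \infty$$

or equivalently,

$$\lim_{n\to\infty} P_{n,\varepsilon}^{|A|^2}\, \psi = P_{\infty,\varepsilon}^{|A|^2}\, \psi = \psi \tag{90}$$

so that $\psi$ is an eigenvector of the operator $P_{\infty,\varepsilon}^{|A|^2}$ with eigenvalue 1 and thus it makes the proposition true. ∎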

We also have,

Theorem 10 Let $X_\infty$ and $\mathcal{A}_\infty$ be as in the previous theorem but now suppose that the Clifford numbers $A$ and $B$ satisfy,

1. $AB = 0$
2. $|A| > |B|$

Then for all $\varepsilon > 0$ the propositions,

$$P_{\infty,\varepsilon}^{1} \in \mathcal{A}_\infty$$

are true.

Proof
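Under the conditions of the theorem we have from (89) and (76) that when the $\psi_n$ are all of unit norm then

$$\left\|\psi - P_{n,\varepsilon}^{1}\psi\right\| \to 0 \quad \text{as} \quad n \to \infty$$

or equivalently,

$$\lim_{n\to\infty} P_{n,\varepsilon}^{1}\, \psi = P_{\infty,\varepsilon}^{1}\, \psi = \psi \tag{91}$$

so that $\psi$ is an eigenvector of the operator $P_{\infty,\varepsilon}^{1}$ with eigenvalue 1 and thus it makes the proposition true. ∎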

### 7.3  Interpretation and examples

The previous two theorems can be interpreted as in standard probability theory. They say that an infinite sequence of independent tosses of a coin with $\psi(\text{heads}) = A$ and $\psi(\text{tails}) = B$ will have for sure (relative to $\psi$) a frequency of heads within $\varepsilon$ of $|A|^2$ in the first case and within $\varepsilon$ of 1 in the $AB = 0$ case. When $AB = 0$ the theorem assures us that (again relative to $\psi$) the coin will show up heads with frequency 100% whenever $|A| > |B|$!

The four conditions on $A$ and $B$ that are needed for the $AB \neq 0$ case impose heavy restrictions on the possible values that $A$ and $B$ can take, but there are many examples. Let $p$ be a real number in the open interval $(0,1)$, so that $AB \neq 0$, and consider,

Example 1

$$A = \sqrt{p}, \qquad B = \sqrt{1-p} \tag{92}$$

Example 2

$$A = \sqrt{p}, \qquad B = \sqrt{1-p}\; \hat{B} \tag{93}$$

where $\hat{B} = s_1 s_2 \cdots s_r$ is a unit blade, i.e. it can be factorized into a product of orthogonal (anticommuting) unit vectors $s_j$.

Example 3

$$A = \sqrt{p}\; \hat{A}, \qquad B = \sqrt{1-p}\; \hat{B} \tag{94}$$

where $\hat{A}$ and $\hat{B}$ are both unit blades, possibly of different dimensions.

Example 4

$$A = \sqrt{p}\; e^{ia} \hat{A}, \qquad B = \sqrt{1-p}\; e^{ib} \hat{B} \tag{95}$$

where $a$ and $b$ are scalars, $\hat{A}$ and $\hat{B}$ are both unit blades and $i$ is any multivector such that $i^2 = -1$ and $i$ commutes or anticommutes with both $\hat{A}$ and $\hat{B}$, i.e. $i\hat{A} = \pm\hat{A}i$ and $i\hat{B} = \pm\hat{B}i$.

It can be readily checked that all these examples satisfy the four conditions of the theorem and hence coin tosses with these $\psi$s will show up heads with probability $p$.
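One way to check the four conditions mechanically is to pick a concrete matrix representation. The sketch below is an illustration under two assumptions that are not taken from the paper: the unit vectors are represented by Pauli matrices (which square to the identity and anticommute pairwise), and the norm is taken as $|M|^2 = \tfrac{1}{2}\,\mathrm{tr}(M^\dagger M)$. All helper names are hypothetical.

```python
import math

# 2x2 complex matrices stored as tuples of rows. The Pauli matrices S1, S2
# stand in for two orthogonal unit vectors (an assumed representation).
ONE = ((1, 0), (0, 1))
S1 = ((0, 1), (1, 0))
S2 = ((0, -1j), (1j, 0))


def mul(a, b):
    """Matrix product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def scale(c, a):
    """Scalar multiple c * a."""
    return tuple(tuple(c * x for x in row) for row in a)


def norm(a):
    """Assumed norm: |M|^2 = (1/2) tr(M^dagger M)."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row) / 2)


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def power(a, n):
    out = ONE
    for _ in range(n):
        out = mul(out, a)
    return out


def check_conditions(A, B, n_max=6):
    """Return the truth of conditions 1-4 (condition 4 only up to n_max)."""
    AB, BA = mul(A, B), mul(B, A)
    c1 = abs(norm(A) ** 2 + norm(B) ** 2 - 1) < 1e-12
    c2 = norm(AB) > 1e-12
    c3 = close(AB, BA) or close(AB, scale(-1, BA))
    c4 = all(
        abs(norm(mul(power(A, k), power(B, n - k)))
            - norm(A) ** k * norm(B) ** (n - k)) < 1e-12
        for n in range(1, n_max + 1) for k in range(n + 1)
    )
    return c1, c2, c3, c4


p = 0.3
# Example 1: real scalars.
example1 = check_conditions(scale(math.sqrt(p), ONE), scale(math.sqrt(1 - p), ONE))
# Example 3: A-hat = s1 (a vector), B-hat = s1 s2 (a unit 2-blade).
example3 = check_conditions(scale(math.sqrt(p), S1), scale(math.sqrt(1 - p), mul(S1, S2)))

if __name__ == "__main__":
    print(example1, example3)
```

Condition 4 is only tested for finitely many $n$ here, so the script is a sanity check rather than a proof.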

### 7.4  Why isn't everyone a frequentist?

For the same reason as in probability theory, these laws of large numbers cannot be used to define what we mean by the partial truth that the coin will show up heads in the next toss, since the theorem only says that the propositions $P_{\infty,\varepsilon}^{p}$ are made true by $\psi$. Any attempt to use the law of large numbers as the definition of what $\psi$ is, or means, is therefore circular.

## 8  The Boolean algebra of Caticha's temporal filters

Let $X$ be a set and let $\mathcal{B}$ be a $\sigma$-algebra of subsets of $X$. Notice that we are using the standard set notation for the elements of $\mathcal{B}$ instead of the logical notation used in the rest of the paper. The reason for changing the notation is that the Boolean $\sigma$-algebra that we are trying to define is not $\mathcal{B}$ itself but only based on $\mathcal{B}$. Think of $X$ as the set of possible locations for a point particle and define the elementary propositions $e(x,t)$ by the statement: the particle is at location $x$ at time $t$. As in [], $e(x,t)$ is a pure hypothesis, not the result of a measurement. The truth value of $e(x,t)$ can be obtained, at least in principle, by imagining a filter that covers all of $X$ except at location $x$, where it has an infinitesimal hole. This magical filter materializes only for an instant at time $t$ and then disappears, leaving no trace of its existence. If after time $t$ we still find the particle somewhere then we conclude that $e(x,t)$ is true. These filters form a Boolean algebra with the definitions below.

Let $T$ be a subset of the real line and define for $t \in T$ and $B \in \mathcal{B}$ the proposition $e(B,t)$ as: an elementary filter at time $t$ with $B$ open. Thus, $e(B,t)$ is true if and only if the statement "the particle is somewhere in $B$ at time $t$" is true. We define the logical product of two elementary filters as the operation of putting one on top of the other, and we define the negation of an elementary filter as the filter that closes the holes and opens the rest. In symbols:

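$$e(B_1,t)\, e(B_2,t) = e(B_1 \cap B_2, t) \tag{96}$$

$$\overline{e(B,t)} = e(\overline{B}, t) \tag{97}$$

$$e(B_1,t) + e(B_2,t) = e(B_1 \cup B_2, t) \tag{98}$$

where $\overline{B} = X \setminus B$ is the complement of $B$ with respect to $X$. Notice that (98) follows from (96) and (97) by using De Morgan's law, i.e.,

$$e(B_1,t) + e(B_2,t) = \overline{e(\overline{B_1},t)\, e(\overline{B_2},t)} = \overline{e(\overline{B_1} \cap \overline{B_2}, t)} = e(B_1 \cup B_2, t) \tag{99}$$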
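For filters at a single fixed time, rules (96) to (99) are just set operations on the open regions. The following sketch makes that concrete on a small finite $X$; the set `X`, the regions `B1` and `B2`, and the function names are illustrative choices, not part of the paper.

```python
# An elementary filter e(B, t) at one fixed time t is represented by its open
# region B, a subset of a small hypothetical set of locations X.
X = frozenset(range(6))


def product(b1: frozenset, b2: frozenset) -> frozenset:
    """Rule (96): stacking two filters leaves open only the common holes."""
    return b1 & b2


def negation(b: frozenset) -> frozenset:
    """Rule (97): close the holes and open the rest."""
    return X - b


def logical_sum(b1: frozenset, b2: frozenset) -> frozenset:
    """Rule (99): the sum built from product and negation via De Morgan."""
    return negation(product(negation(b1), negation(b2)))


B1 = frozenset({0, 1, 2})
B2 = frozenset({2, 3})

if __name__ == "__main__":
    print(sorted(logical_sum(B1, B2)))              # the union, as in (98)
    print(sorted(product(B1, negation(B1))))        # the barrier, nothing open
    print(logical_sum(B1, negation(B1)) == X)       # absence of filter, all open
```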
We also have that for all $s,t \in T$,

$$e(B_1,s)\, e(B_2,t) = \text{"Filter at time } s \text{ followed by (or on top of) filter at time } t\text{"}$$

$$e(B_1,s) + e(B_2,t) = \text{"Filter at time } s \text{ OR filter at time } t\text{"}$$

$$e(\emptyset,t) = \text{"Barrier (nothing open) at time } t\text{"} = 0 \tag{100}$$

$$e(X,t) = \text{"Absence of filter (all open) at time } t\text{"} = 1 \tag{101}$$
We define $\mathcal{F}$ as the smallest $\sigma$-algebra containing the elementary filters $e(B,t)$, i.e.,

$$\mathcal{F} = \sigma\{\, e(B,t) : B \in \mathcal{B},\ t \in T \,\} \tag{102}$$

The Boolean algebra of temporal filters $\mathcal{F}$ simply spells out the usual algebra of events of a stochastic process with state space $X$.

### 8.1  The Markov Property

Because there is no product rule for the unnormalized $\psi$s, we cannot make use of the standard Markov property of probability theory directly. The following definition is all that is needed to recover non-relativistic quantum mechanics:

Definition: $\psi \in H(\mathcal{F})$ is said to have independent segments given $c \in \mathcal{F}$ if for all $n = 1,2,\ldots$, all times $t_1 < t_2 < \cdots < t_n$ in $T$ and all locations $x_1,x_2,\ldots,x_n$ in $X$ the propositions

$$e(x_1,t_1)e(x_2,t_2),\ e(x_2,t_2)e(x_3,t_3),\ \ldots,\ e(x_{n-1},t_{n-1})e(x_n,t_n)$$

are independent given $c$.

### 8.2  Time evolution and the Schrödinger equation

When $\psi \in H(\mathcal{F})$ has independent segments, it evolves according to the Schrödinger equation. The usual jargon of quantum mechanics is recovered with the following notation.

Probability Amplitude:
$\psi(e(x,s)e(y,t), e(x_0,t_0))$ is the amplitude for the particle to go from location $x$ at time $s$ to location $y$ at time $t > s$ given that it was initially prepared at location $x_0$ at time $t_0$. We denote this amplitude by $K(y,t;x,s)$.

Wave Function:
$\psi(e(x,t), e(x_0,t_0))$ is the amplitude of going from the initial position to location $x$ at time $t$. It is often denoted by just $\Psi(x,t)$.

Thus, with this notation, a particle which is prepared by $e(x_0,t_0)$ and for which $\psi \in H(\mathcal{F})$ has independent segments conditionally on this preparation will satisfy,

$$\Psi(x,t) = \sum_{y \in X} \frac{1}{2}\left[ K(x,t;y,s)\Psi(y,s) + \Psi(y,s)K(x,t;y,s) \right] \tag{103}$$

since

$$\Psi(x,t) = \sum_{y \in X} \psi\big(e(x,t)e(y,s),\, e(x_0,t_0)\big) = \sum_{y \in X} \psi\big([e(x_0,t_0)e(y,s)]\,[e(y,s)e(x,t)],\, e(x_0,t_0)\big).$$

Taking derivatives in (103) with respect to $t$ and evaluating at $t = s$ we obtain,

$$\left.\frac{\partial \Psi(x,t)}{\partial t}\right|_{t=s} = \sum_{y \in X} \frac{1}{2}\left[ \left.\frac{\partial K(x,t;y,s)}{\partial t}\right|_{t=s} \Psi(y,s) + \Psi(y,s) \left.\frac{\partial K(x,t;y,s)}{\partial t}\right|_{t=s} \right]$$

Define the Hamiltonian $H$ by,

$$\left.\frac{\partial K(x,t;y,s)}{\partial t}\right|_{t=s} = -\frac{i}{\hbar}\, H(x,y,s) \tag{104}$$

where $\hbar = h/2\pi$ and $i$ is any multivector that squares to $-1$ and commutes with all the $\psi$s. Relabeling $s$ as $t$, we can write the Schrödinger equation for possibly non-commuting $\psi$s as,

$$i\hbar\, \frac{\partial \Psi(x,t)}{\partial t} = \sum_{y \in X} \frac{1}{2}\left[ H(x,y,t)\Psi(y,t) + \Psi(y,t)H(x,y,t) \right] \tag{105}$$

When the wave functions $\Psi$ commute with the Hamiltonian (e.g. when all the $\psi$s take values in a commutative subspace of $\mathcal{G}$), (105) reduces to the usual Schrödinger equation.
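To see how (103) and (104) generate time evolution in the simplest setting, the sketch below takes the commutative case with ordinary complex amplitudes on a finite $X$, where the symmetrized product in (103) is the usual one. It additionally assumes that the equal-time kernel is $K(x,s;y,s) = \delta_{xy}$, so that to first order $K(x,s+dt;y,s) \approx \delta_{xy} - (i/\hbar)H(x,y,s)\,dt$. That assumption, the two-site Hamiltonian and the names below are illustrative and not taken from the paper.

```python
HBAR = 1.0  # units with h / (2 pi) = 1, chosen only for illustration


def euler_step(psi, hamiltonian, dt):
    """One first-order step Psi(x, t + dt) = sum_y K(x, t + dt; y, t) Psi(y, t).

    Uses the assumed short-time kernel K = delta_xy - (i / hbar) H(x, y) dt,
    which is (104) integrated to first order in dt.
    """
    size = len(psi)
    return [
        psi[x] - 1j * dt / HBAR * sum(hamiltonian[x][y] * psi[y] for y in range(size))
        for x in range(size)
    ]


def squared_norm(psi):
    return sum(abs(amplitude) ** 2 for amplitude in psi)


# Hypothetical two-site system: the particle can hop between locations 0 and 1.
H = [[0.0, 1.0],
     [1.0, 0.0]]
psi0 = [1.0 + 0j, 0.0 + 0j]
psi1 = euler_step(psi0, H, 1e-3)

if __name__ == "__main__":
    print(psi1)
    print(squared_norm(psi1))  # stays 1 up to terms of order dt**2
```

A single explicit step preserves the norm only to first order in `dt`; it is meant to mirror the derivation, not to serve as a practical integrator.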

## 9  Next:

Using the Spacetime algebra
How to connect the above with the Dirac-Hestenes equation.
$\psi$ assignments in the real continuous case
Minimum Fisher information and the Huber-Frieden derivation of the time independent Schrödinger equation.
$\psi$ and Brownian motion
Nagasawa's diffusion model.