*Part I: TOWARDS THE THEORY*

*Part II: CONVERSION*

Chapter 6: Classical Lambda Calculus.

Chapter 7: The Theory of Combinators.

Chapter 8: Classical Lambda Calculus (Continued)

*Part III: REDUCTION*

Chapter 11: Fundamental Theorems.

Chapter 13: Reduction Strategies.

Appendix A: Typed Lambda Calculus.

*Part I: TOWARDS THE THEORY*

**Chapter 1: Introduction.** Lambda calculus is a theory of
functions as rules instead of graphs. The objects of study (type
free lambda terms) can be used as both function and argument.

Lambda calculus was originally invented to provide a general theory of functions which could be extended to provide a foundation for mathematics. Church's original system [1932/33] was inconsistent due to the Kleene-Rosser paradox [1935]. One response by Church [1941] was to study the subsystem given by the lambda-I calculus. [Another response [1940] was to study a system based on typed lambda calculus.] Curry proposed to extend pure combinatory logic by illative notions to give a foundation for mathematics. This program has not been completed. Another foundational approach related to lambda calculus is Feferman's [1975/80] systems for constructive mathematics based on partial application. These systems are related to Wagner [1969] and Strong's [1968] Uniformly Reflexive Structures. Also, there are relationships between typed lambda calculus, proof theory and category theory.

Lambda calculus provided a successful characterization of computability. Kleene [1936] proved lambda-definability coincides with Godel-Herbrand recursiveness. Turing showed Turing Machine computability coincides with lambda-definability. Later, the lambda calculus influenced the design of programming languages including Algol 60 and 68 and Pascal (all of which allow procedures to be arguments), LISP (which allows functions to be passed as arguments and to be return values), and Reynold's GEDANKEN [1970] (which was explicitly based on lambda calculus). Lambda calculus provides an operational semantics of programming languages. Denotational semantics require models X with a function space X->X which can be embedded in X. Scott [1969] solved this problem by considering continuous functions where X is given an appropriate topology. Given such denotational semantics, we can interpret recursion using least fixed points.

Pure lambda calculus is the formal theory of functions under application and abstraction. This may be contrasted with category theory which may be considered a formal theory of functions under composition. In pure lambda calculus, we study terms modulo convertibility. A theory in lambda calculus gives equations between terms. The theory Lambda is the base theory (with alpha and beta conversion) with no extra equations. Lambda-Eta is the extensional lambda calculus and includes eta conversion.

We can embed the natural numbers into lambda calculus and prove that the functions from naturals (or products of naturals) to naturals definable by lambda terms are precisely the recursive functions. The range of a definable function on terms is always infinite or a singleton. The Scott topology can be translated to Bohm trees associated with terms. This provides a topology on terms for which the definable functions on terms are continuous.

Some properties of the applicative behavior of lambda terms correspond to the syntactic form of terms. For example, F is solvable (i.e., for every Q there is an n and lambda terms P_1,...,P_n so that FP_1...P_n = Q) iff F has a head normal form (i.e., F = lambda x_1 . . . x_n [x_i M_1 . . . M_m]). Also, F is beta-eta-invertible iff "F is a finite hereditary permutation" (a syntactically defined class).

We may consider the theory of lambda calculus to be loosely analogous to number theory. We begin with the term model of lambda calculus M_Lambda (corresponding to the integers). We may add equations to the theory giving term models M_T (corresponding to Z/m for some m). There are canonical homomorphisms from M_T onto M_T' whenever T is a subset of T' (corresponding to Z/m->>Z/m' whenever m' divides m). In number theory, the integers can be extended to the rationals, and we may use topology to extend the rationals to the reals or to the p-adic fields Q_p. In lambda calculus, we may construct models P-omega and D-infinity via topological notions. These models can be used to prove results about lambda terms.

A subset X of a poset is directed if it is nonempty and for every x and y in X there is a z in X with x<=z and y<=z. A poset is a complete partial ordering [cpo] if there is a bottom element and every directed set has a sup. For any set X, the power set of X is a complete lattice, hence a cpo, under the subset ordering. Another cpo is {bottom,t,f} where bottom < t and bottom < f. Also, given two sets X and Y, we can define the cpo of partial functions from X to Y (given as graphs) ordered by subset (on the graphs).

A cpo (D,<=) induces a topology called the Scott topology on D. An open set O in the Scott topology is closed upwards (i.e., x in O and x<=y imply y in O) and cannot be approached from the outside by the sup of a directed set (i.e., if X is directed and sup(X) is in O, then there is an x in X and O). [Perhaps it is easier to think of closed sets--closed downwards and closed under sups of directed sets.] For each x in D, U_x = {z | z is not <= x} is open. D is a T_0 space (if x and y are distinct, then WLOG x is not <= y and U_y separates x from y), but is not in general T_1 (if x < y, then every open set containing x contains y). A function from D to D' is continuous iff for every directed X in D, f(X) has a sup in D' and f(sup(X)) = sup(f(X)). [Note that we do not need bottom to map to bottom.] All such continuous functions are monotone. This gives the category CPO of cpo's with continuous functions.

Given two cpo's we can form the product cpo using the componentwise ordering (with bottom being the pair of bottoms and sup's given by the pair of sup's of projections of directed sets). This yields a Scott topology on the product cpo. This is not, in general, the product topology (the Scott topology is finer than the product topology). The Scott topology does give the product in CPO.

We also have exponents in the category CPO formed from cpo's D and D' as follows. Let [D->D'] by the set of continuous functions from D to D' partially ordered by f <= g iff forall x in D f(x) <= g(x). Given any directed family of maps f_i, we have f_i(x) directed in D' for each x in D, giving sup(f_i(x)) for each x in D. We can define f(x) = sup(f_i(x)). This f is continuous and is the sup of the directed family {f_i}. The bottom is given by the constant bottom function. A function f:D x D' -> D'' is continuous iff all the functions D->D'' and D'->D'' given by fixing one of the arguments are continuous. This result can be used to show application:[D->D'] x D -> D' [evaluation] is continuous by showing it is continuous in both arguments. Also, abstraction is continuous. That is, given f:D x D' -> D'' continuous, we have the transpose f^:D->[D'->D''] (given by Currying) is continuous; furthermore, the map taking f in the cpo [D x D' -> D''] to f^ in the cpo [D -> [D' -> D'']] is continuous.

CPO is a CCC. Products and exponents are given above. Terminal objects are given by singleton cpo's.

Every f in [D->D'] has a least fixed point. Furthermore, there is a function Fix in [[D->D]->D] taking f to its least fixed point. This least fixed point is given by the sup of {f^n(bottom) : n natural}.

In CPO we can form the limit of a diagram {f_i:D_{i+1}->D_i : i natural} giving D_infinity. We can explicitly define D_infinity to be the set of sequences <x_i>_i with ith component from D_i and f_i(x_{i+1}) = x_i. The ordering is given componentwise. The bottom is given by starting with the sequence {bottom_i}_i (which need not be in D_infinity, but is in Pi_i D_i) and applying the {f_i}_i sequence to the [reindexed] sequence omega times. The sup of a directed set is given componentwise. In categorical terms, we can construct D_infinity as the equalizer of 1=<pi_i>_i and <f_i o pi_{i+1}>_i:Pi_i D_i -> Pi_i D_i.

Given a cpo D, a retraction f of D onto X subset of D, is a member of [D->D] with image(f) = X and f o f = f. When such an f exists, we say X is a retract of D. When X is a retract of D, X is a cpo (under the same ordering as D, with bottom_X = f(bottom_D) and the same sup's of directed sets as D). The Scott topology on X is the same as the subspace topology (since the ordering and the sup's are the same).

An element x of D is compact if every directed X with x <= sup(X) has x <= x_0 for some x_0 in X. D is an algebraic cpo if for every x in D, x = sup({y <= x | y compact}). For example, the compact elements of (P-omega,subset) are the finite sets. P-omega is algebraic. In fact, this is true of any power set cpo. Also, the compact elements of the cpo of partial functions from X to Y are the finite partial functions (i.e., those with finite graphs); this cpo is also algebraic. Given an algebraic cpo D and function f:D->D, f is continuous iff f(x) = sup({f(e) : e <= x, e compact}). In terms of the power set cpo P(X), the continuous functions f:P(X)->P(X) are determined by the value of f on finite sets. [Note that we need not have f({0,1}) = f({0}) union f({1}) since {{0},{1}} is not directed. {{0},{1},{0,1}} is directed, but in this case f continuous only implies the trivial equation f({0,1}) = f({0}) union f({1}) union f({0,1}). This is trivial since we know f({0}) is a subset of f({0,1}) and f({1}) is a subset of f({0,1}). In general, all we have is Union_{x in X} f({x}) is a subset of f(X) for finite X. So, it is not enough to know f on singletons.]

In an algebraic cpo D, let O_e = {x in D | e <= x}. O_e is open iff e is compact. Furthermore, {O_e | e compact} is a basis for the topology on D. First, if O_e is open and e <= sup(X) with X directed, then openness implies there is an x_0 in X with x_0 in O_e, i.e., e <= x_0. Next, given e compact, O_e is obviously closed upwards. If sup(X) is in O_e with X directed, then e <= x_0 for some x_0 in X by compactness of e. So, x_0 is in X and O_e, so that O_e is open. Finally, given any open set O and x in O, we have x = sup{e compact | e <= x} implies there is a compact e <= x, so that x is in O_e subset of O.

Given cpo's D and D', <x,y> in DxD' is compact iff x and y are compact. If D and D' are algebraic, then DxD' is algebraic. In this case, the Scott topology on the cpo product is the same as the product topology (since O_(e,f) = O_e x O_f). In fact, the function spaces of algebraic cpo's are algebraic, and we have a CCC of algebraic cpo's with continuous functions.

A generalization of directed set is that of a consistent set. A subset X of a cpo D is consistent if for every x and y in X there is a z in D [not X!] so that x <= z and y <= z. D is coherent if every consistent set has a supremum. For example, power sets and the cpo of partial functions are coherent because we can take the union of finite sets (in the case of partial functions, the finite sets must be compatible graphs in order to be consistent).

Given x and y in a cpo D, we say x<<y ("x is way below y") if y is in the interior of O_x. This is equivalent to x<=y iff x is compact. A continuous lattice is a complete lattice in which x = sup({z | z<<x}). An algebraic lattice is a complete lattice that is algebraic as a cpo. Every algebraic lattice is a continuous lattice (since x = sup({e compact | e<<x})).

**Chapter 2: Conversion.** The set of (type-free) lambda terms is
the smallest set containing the variables and closed under
application and abstraction. The theory Lambda has formulas M = N
for lambda terms M and N. The theory is generated by beta,
reflexivity, symmetry, transitivity and replacement within
application and abstraction. (Rule zeta is M = N implies lambda x M
= lambda x N.) For every lambda term F there is a fixed point X so
that Lambda proves F X = X. X is WW where W is lambda x (F(xx)). In
fact, abstracting over F we have a fixed point combinator Y (so that
YF = F(YF)).

In a lambda term MN, the occurrence of the subterm M is active and the occurrence of the subterm N is passive.

In order to avoid variable capture, we work with lambda terms modulo alpha conversion, and assume the variable convention that all bound variables are different from all free variables in a given context.

Substitution is defined recursively as usual. Convertibility is preserved under substitution. He proves this in three steps, |- M[x:=N] = M'[x:=N] by induction on |- M = M' or by rules zeta and beta, |- M[x:=N] = M[x:=N'] by induction on M, and |- M[x:=N] = M'[x:=N'] by combining the two above using transitivity. An alternative, shorter proof uses beta.

A context C[ ] is a term with "holes". Given a lambda term M, C[M] is the lambda term obtained by plugging M into the holes of C[ ]. Here we do not work modulo alpha-conversion. For example, we may have C[ ] given by lambda x [ ]. Then, C[x] is lambda x x. If Lambda proves N = N', then Lambda proves C[N] = C[N'] (proven by induction on C[ ]). This is more powerful than preservation of convertibility under substitution.

Combinatory completeness is the result that for every M(x1,...,xn) there is an F so that Lambda proves Fx1...xn = M. In fact, F is simply lambda x1 . . . xn M. The combinators I, S and K can be defined as lambda terms which satisfy the appropriate reduction properties.

We can axiomatize extensionality as "Mx = Nx implies M = N if x is
not free in M and N" or as eta conversion "lambda x Mx = M if x is
not free in M." The resulting theories are equivalent (**Curry's
Theorem**). Lambda with the eta conversion is the theory Lambda-eta.

A theory is consistent if we cannot prove every equation. (Lambda and Lambda-eta are both consistent.) Two lambda terms are incompatible (M#N) if the theory Lambda + M=N is inconsistent. K#S since K=S implies XZ = XZ(YZ) implies (letting X = Z = I and Y = KM) I = M for every M.

A beta-nf is a lambda term with no beta-redexes (lambda x M)N. I
is a beta-nf. KI has a beta-nf. Omega := (lambda x . xx)(lambda x .
xx) has no beta-nf. A beta-eta-nf is a beta-nf with no eta-redexes
(lambda x . R x) where x is not free in R. A lambda term has a
beta-nf iff it has a beta-eta-nf. Since beta-nf's are unique in
Lambda and beta-eta-nf's are unique in Lambda-eta, we have Lambda
does not prove M = N if M and N are distinct beta-nf's and Lambda-eta
does not prove M = N if M and N are distinct beta-eta-nf's. Since S
and K are distinct beta-eta-nf's, Lambda and Lambda-eta are
consistent. In fact, **Bohm's [1968] Theorem** states that if M and N
are different beta-eta-nf's, then M#N. It follows that for terms
with beta-eta-nf's, the theory **Lambda-eta is Hilbert-Post complete**:
If M and N are terms with beta-eta-nf's, then Lambda-eta proves M=N
or Lambda-eta + M=N is inconsistent. (Of course, in the absence
of eta, I and 1 are distinct nf's and can of course be consistently
identified.)

An alternative approach to lambda calculus is combinatory logic. Here we introduce combinators (such as S and K) as primitive and axiomatize their behavior. Every closed lambda term can be mapped to a combinator with the same applicative behavior. The theory Lambda is not equivalent to the theory CL of combinatory logic, since in Lambda we can prove |-SKK = I = SKS, but in CL we cannot prove |- SKK = SKS. However, Curry constructed a finite set of equations A_beta so that Lambda is equivalent to CL + A_beta and a finite set of equations A_beta,eta so that Lambda-eta is equivalent to CL + A_beta,eta [recall: eta is equivalent to I=1].

Church considered the lambda-I calculus, in which abstraction is restricted to those cases in which the variable being bound actually occurs. Church was interested in the lambda-I calculus for several reasons. First, the lambda-I calculus is sufficient to define all recursive functions (since K_1 := lambda x,y . yIIx satisfies K_1xc_n = x for each of Church's numerals c_n := lambda f,z . f^n z--it is also the case that for each finite set n of nf's, we can find a "local K for n" K_n such that K_nMN = M for each N in n). Second, every subterm of a lambda-I term with a nf have a nf (and Church only considered lambda terms with nf's as significant). Third, it is consistent with the lambda-I calculus to identify all terms without nf's, but inconsistent with Lambda to identify all such terms. For example, lambda x . xKOmega and lambda x . xSOmega do not have nf's. If we identify them, then we obtain K = S (a contradiction) by applying the terms to K.

We can define lambda terms for the numerals '0', '1', etc., in several ways. Relative to such a system we say f:N^k->N is lambda-definable if there is a closed lambda term F such that for every n1,...,nk in N we have F'n1'...'nk' = 'f(n1,...,nk)'. Kleene [1936] showed f is definable iff it is recursive. To represent partial functions, we must define how F should behave when n1,...,nk are not in the domain of f. Church's proposal is that F'n1'...'nk' should not have a nf. One problem with this proposal is that the property of having a nf is not preserved under the translations between Lambda and CL. Also, the property is too syntactical and does not make sense for elements of a model. In Scott's D-infinity model, there is no subset of the model which corresponds precisely to the denotations of terms with nf's. Another problem is that composition does not behave as expected:

The constant function K'0' composed with the function represented by G which is undefined at 'n' gives lambda x (K'0'(Gx)). Applying this to 'n' gives '0', not undefined.

A different approach is to identify "undefined" with
unsolvability. A *closed* lambda term M is
solvable if there is an n
and there are lambda terms N_1,...,N_n so that MN_1...N_n = I. (Note
this is equivalent to saying M, considered as a function of variable
arity, maps onto all lambda terms.) It terms out that solvability is
equivalent to the property of having a
head normal form (lambda
x1,...,xn (xi N_1 . . . N_m)). Note that Omega is not solvable and
has no head normal form (so we may think of Omega as the canonical
representative of the unsolvable terms). lambda x (IxOmega) is
solvable and has head normal form lambda x (xOmega). (Every nf is a
head normal form, but the converse does not hold.)

We can relativize solvability to the lambda-I calculus in the obvious way. Here solvability is equivalent to having a nf.

Solvability is preserved under the translations between Lambda and CL. Solvability is a property which makes sense in models (in D-infinity the only unsolvable term is bottom; in P-omega the only unsolvable term is the empty set). It follows from the models D-infinity and P-omega that it is consistent to identify unsolvable lambda terms. Also, it turns out that if M is unsolvable and N is a nf and C[M] = N, then C[X] = N for every X.

We can define closed lambda terms (combinators) representing truth (K), falsehood (:= lam xy.y =beta KI), pairing ([M,N] := lambda z (zMN)), and numerals ('0' := I, '(n+1)' := [F,_n]). This is different from Church numerals. The intuition is that n is a list of n F's, with I ('0') playing the role of the empty list.

Other fixed point theorems include the multiple fixed point theorem (which proves the existence of a vector of combinators which map vectors of length n to their components) and the second fixed point theorem giving F with F'X' = X forall X (where 'X' is the numeral of the Godel number of X).

If A and B are disjoint, nonempty sets of lambda terms which are closed under equality, then A and B are recursively inseparable. It follows that if A is a nontrivial set of lambda terms closed under equality, then A is not recursive. So, we cannot decide the problem "M=x?" for any particular M. Also, it follows that Lambda has no recursive models.

All the combinators can be defined in terms of S and K. In fact, S and K can be defined in terms of a single combinator X, where X is the lambda term lambda x (xKSK). Then XXX reduces to K and X(XX) reduces to S.

Using the convention that undefined is represented by unsolvable terms, we have Kleene's Theorem: A partial function is definable iff it is partial recursive.

The lambda-I calculus is conservative over Lambda. The partial recursive functions can also be defined in the lambda-I calculus. The lambda-I calculus corresponds to the combinatory logic with primitive combinators I, B, C, and S.

For each lambda term M, we may construct the Bohm tree BT(M) of M. This tree may be infinite. The tree is constructed as follows: If M is unsolvable, then we are at the leaf node labelled "bottom." If M is solvable, then it has a principal head normal form (lambda x1,...,xn (y M_1...M_m). Then BT(M) is a tree labelled "lambda x1,...,xn y" with children given by BT(M_i) for i=1,...,m. The set of all possible trees (including, but not limited to, Bohm trees) can be partially ordered by A <= B if A results from B by deleting some subtrees. Under this ordering, the set of trees is an algebraic cpo. The Scott topology on this algebraic cpo can be used to induce the tree topology on lambda terms by considering the smallest topology so that BT:Lambda->Trees is continuous.

A finite sequence M_1,...,M_n of closed terms is separable if for every N_1,...,N_n there is an F so that FM_i = N_i for each i=1,...,n. For Lambda, we have M_1,...,M_n is separable if the terms have distinct beta-eta-nf's. We can relativize the notion of separability to the lambda-I calculus. In the lambda-I calculus, M_1,...,M_n is separable iff the terms have distinct beta-eta-nf's.

**Chapter 3: Reduction.** A relation R on lambda terms is compatible
if R(M,M') implies R(ZM,ZM') and R(MZ,M'Z) and R(lambda x M,lambda x
M'). An equality or congruence relation is a compatible equivalence
relation. A reduction relation is a compatible, reflexive and
transitive relation. A relation is compatible iff R(M,M') implies
R(C[M],C[M']) for every context C[ ] with one hole.

A notion of reduction is any binary relation on lambda terms. Given two notions of reduction, we can construct a new notion of reduction by taking the union. An example of a notion of reduction is beta-reduction. Any notion of reduction R can be extended to be a compatible relation ->R which can then be extended reflexively and transitively giving the reduction relation ->>R which can be extended by transitivity giving the equality relation =R.

An R-redex is an M where R(M,N). In this case, N is an R-contractum. An R-normal form (R-nf) M is a term with no R-redex occurrences. N is an R-nf of M if N is an R-nf and M =R N. M ->R N iff there is a context C[ ] with one hole and terms P and Q with M == C[P], N == C[Q], and R(P,Q).

A binary relation >- on the lambda terms satisfies the diamond property if for all terms M, M1, M2 for which M>-M1 and M>-M2 we have a term M3 such that M1>-M3 and M2>-M3. A notion of reduction R is Church-Rosser (CR) if ->>R satisfies the diamond property. If R is CR and M =R N, then there is a Z such that M ->>R Z and M ->>R Z. This is proven by induction on the definition of =R. That is, M =R N follows from either M ->>R N (in which case we let Z be N), or N =R M (in which case Z exists by the induction hypothesis), or M =R L =R N (in which case the induction hypothesis gives Z1 for M and L and gives Z2 for L and N and the diamond property gives Z for Z1 and Z2). It follows from this result that for Church-Rosser R, R-nf's are unique and that if M has R-nf N, then M ->>R N.

A binary relation R on lambda terms is substitutive if R(M,N) implies R(M[x:=L],N[x:=L]). If R is substitutive, then so are ->R, ->>R, and =R. Beta is substitutive (assuming the variable convention). Eta is also substitutive.

Given an R-reduction, we can measure the length (by the number of ->R steps). Given a lambda term, we can build the R(reduction) graph (a directed multigraph) of the term G_R(M) by taking vertices to be all N so that M ->>R N and taking particular ->R steps to be edges. [Actually this forms a category if we take objects to be N such that M ->>R N and arrows to be particular R-reductions.] A term M R-strongly normalizes if there is no infinite R-reduction from M, otherwise M is R-infinite. R is strongly normalizing (SN) if every term M R-strongly normalizes.

Consider beta-reduction. Let omega_3 be lambda x (xxx) and M be (lambda x I)(omega_3omega_3). M has an infinite beta-reduction graph, but has the nf I. The beta-reduction graph of Omega is finite (one vertex, one edge [note that the category is infinite because we count reduction paths--or, we could reasonably equate paths so that the category becomes the monoid (Z/2,*)--or, maybe even just the trivial monoid (finite!)]) but Omega has no nf. We can show that if M beta-strongly normalizes, then the beta-reduction graph of M is finite and M has a beta-nf. The converse is not true. Consider the term M given by (lambda x I)Omega. This has beta-nf I and finite beta-graph (with two vertices and two edges [the category is infinite]) and has the infinite reduction path in which we keep reducing Omega to Omega.

A binary relation >- satisfies the weak diamond property if whenever x>-y and x>-z then there is a u such that y >=* u and z >=* u (where >=* is the reflexive, transitive closure of >-). A notion of reduction R is weakly Church-Rosser (WCR) if ->R satisfies the weak diamond property.

WCR does not imply CR. For examples, look here.

However, if a reduction is SN and WCR, then it is CR. SN implies all terms have an R-normal form. Assuming WCR, we can show R-nf's are unique by induction--choose an "ambiguous" M which reduces to two R-nf's by reductions which do not involve ambiguous lambda terms (so M is "minimal" in some sense); then use weak Church-Rosser to show that we must have M->M' with M' ambiguous along one of the reductions from M. (Technically, the induction is over bd(M) where bd(M) is the longest reduction path from M. Such an induction only proves properties for strongly normalizing terms, of course.)

A * closed*
term M is R-solvable if there are terms P1,...,Pm so that
MP1...Pm =R I. (Note that this does not imply we can obtain any term
from M by applying to appropriate arguments as it did when
considering beta-reduction.) A term M is R-solvable if some closed
instance of M is R-solvable.

The Church-Rosser Theorem states that if two terms are beta-convertible to one another, then there is a common term to which they both reduce. By above, this reduces to showing that beta-reduction is CR. The proof given in this chapter uses Parallel Reductions and is attributed by Barendregt to Tait and Martin-Lof (though according to Statman, Rosser knew the proof). Other proofs appear later in the book. First, if >- satisfies the diamond property, then so does its transitive closure >-* (proven by induction on the lengths of the reduction paths; the induction hypothesis must include the fact that the new paths have the same lengths as the old--or we might prove a strip lemma first). Define a relation ->>1 on lambda terms given as follows. The relation ->>1 is reflexive, includes one-step beta reduction, and commutes with abstraction and application. The relation ->>1 represents a parallel reduction (note that it allows beta-reductions only on preexisting beta-redexes). Now, the transitive closure of ->>1 is ->>beta since ->beta+reflexivity is a subrelation of ->>1. So, showing beta-reduction is CR reduces to showing ->>1 satisfies the diamond property. The definition of ->>1 is well-suited to showing the diamond property by induction on the definition of M ->>1 M1.

In natural deduction style, we could represent parallel reduction as (see Pfenning, 1992):

Reflexivity: |- M ->>1 M

Beta: M ->>1 M', N ->>1 N' |- (lam x.M)N ->>1 M'[x:=N']

App: M ->>1 M', N ->>1 N' |- MN ->>1 M'N'

Lam: M ->>1 M' |- (lam x.M) ->>1 (lam x.M')

We can prove that ->>1 satisfies the diamond property by induction on M ->>1 M', but we must take care! There is also the parallel reduction M ->>1 M'' whose form we only know by inversion (using the form of M--see Lemma 3.2.5). So, there are subcases (of App and Beta cases). In the Beta case, and one subcase of the App case, we end up needing the admissible inference rule:

M ->>1 M', N ->>1 N' |- M[x:=N] ->>1 M'[x:=N']

This is precisely Lemma 3.2.4.

Once we know beta-reduction is CR, we have the uniqueness of beta-nf's and that if M has beta-nf N, then M ->>beta N. It also follows that if M and N are distinct beta-nf's, then M is not beta-convertible to N, so that Lambda is consistent (since we have distinct beta-nf's). We can also use the fact that beta-reduction is CR to show Omega has no beta-nf, since it obviously cannot reduce to one (it only reduces to itself).

In order to show beta-eta reduction is also CR, we show that beta- and eta-reduction commute and that eta-reduction is CR. Two relations >-1 and >-2 commute if whenever x >-1 y and x >-2 z, then there is a u with y >-2 u and z >-1 u. (>- has the diamond property iff it commutes with itself.) If two relations commute and both satisfy the diamond property, then the transitive closure of the union of the relations satisfies the diamond property. This implies that if two notions of reduction R1 and R2 are CR and ->>R1 commutes with ->>R2, then the notion of reduction R1 union R2 is CR. Eta-reduction is CR (since ->eta+reflexivity satisfies the diamond property--as is shown easily by induction on the definition of ->eta) and does commute with beta-reduction (which we show by showing a special kind of one-step commutativity hypothesized in Lemma 3.3.6). So, beta-eta-reduction is CR. The usual results follow: if M is beta-eta-convertible to N, then there is a common term to which they beta-eta-reduce; if M has beta-eta-nf N, then M ->>beta-eta N; beta-eta-nf's are unique; Lambda+eta is consistent, so Lambda+ext is consistent.

It struck me as curious that we need to take the reflexive closure of one-step eta reduction to get the diamond property. The problem is that the two eta contractions in the diamond property might be the same! Consider this example: lambda x (yx) one-step eta reduces to "both" y and y (the same way), so for the diamond property to hold we would need a term to which "both" y and y one-step eta reduce. Of course, there is no such term since y is in eta-normal form!

Note that the "simple diagram chase" proof of Lemma 3.3.6 requires lexicographic induction on the lengths of the reductions. Or, better yet, as in most of these "simple diagram chase" argument we can first prove a "Strip Lemma" and then prove the result (so we separate the argument into two inductions). Lemma 3.3.6 is special in that one formulation of the strip lemma follows by induction, but the other does not.

Note that if M ->>beta-eta N and x is free in N, then x must be free in M.

Given a term and a set of redexes, a development is a reduction of the term in which only the redexes (and their residuals) may be contracted. All (beta-)developments are finite [this is the finite development (FD) theorem]. All maximal (beta-)developments end in the same term. This can be used to give another proof of CR.

An I-redex is a term (lambda x M)N where x is free in M. Whenever M ->beta N is the contraction of an I-redex and M has an infinite reduction, then N has an infinite reduction. So, if a lambda-I term has a nf, then so do all its subterms.

A standard reduction proceeds from left to right. Whenever M ->>beta N, there is a standard beta-reduction from M to N.

Given two beta-reductions sigma:M->>N1 and tau:M->>N2, we can define canonical reductions (tau/sigma):N1->>N3 and (sigma/tau):N2->>N3. Two reductions sigma,tau:M->>N are strongly equivalent if (tau/sigma) = (sigma/tau) = the empty reduction from N to N. All maximal developments are, in fact, strongly equivalent. Also, given any beta-reduction M->>N there is a strongly equivalent standard reduction.

A reduction strategy is a map F from lambda terms to lambda terms so that M ->> F(M). A one step strategy takes every M not in nf to an F(M) satisfying M -> F(M). A strategy is recursive if the function is recursive (after coding) and effective if it can be computed in a "relatively simple way". A strategy F is normalizing if whenever M has a nf, then there is some n so that F^n(M) is a nf. There is an effective normalizing one step strategy for beta-reduction (the leftmost strategy).

A strategy F is cofinal if whenever M->>N there is an n with N->>F^n(M). There is an effective cofinal strategy for beta-reduction (the Gross-Knuth (gk) strategy).

A strategy is Church-Rosser if whenever M=N there is an n and m with F^n(M) == F^m(N). There is an recursive CR-strategy for beta-reduction.

A strategy F is perpetual if for every M with an infinite reduction we have F^n(M) ->> F^{n+1}(M) is a nonempty reduction (so that M->>F(M)->>F^2(M)->>... is an infinite reduction). There is an effective perpetual strategy for beta-reduction.

Given two normalizing strategies F and G, G is t-better than F if for every M the least n so that F^n(M) is normal is larger than the least n so that G^n(M) is normal. A one step strategy F is t-optimal if F is normalizing and no one step strategy is t-better than F. There is no recursive t-optimal one step strategy.

We can label lambda terms with natural numbers and throw in a constant "bottom" and define labelled reduction to be beta-reduction which decreases labels with the exception that (lambda x M)^0 N reduces to (M[x:=bottom])^0. All labelled reductions terminate and we can prove CR-, FD- and standardization theorems.

Recall that the topology on Bohm trees induces the tree topology on lambda terms. We have the following Continuity Theorem: if a function f from lambda terms to lambda terms is given by f(M) = C[M] for a context C[ ], then f is continuous with respect to this topology.

Lambda-calculus computations are "essentially sequential."

A lambda term has a beta-eta-nf iff it has a beta-nf. A lambda term is beta-eta-solvable iff it is beta-solvable. Whenever M ->>beta-eta N, there is a reduction M ->>beta L ->>eta N (we may postpone the eta-reduction).

Omega-reduction is given by the contraction rule M -> Omega whenever M is unsolvable and not already Omega. Beta-eta-Omega reduction is CR, and we may perform the reductions in the order beta, Omega, then eta.

Given any function phi on lambda terms, we may include a new constant delta with the reduction rule delta M -> phi(M). (So, we force phi to be definable.) For example, we may include a constant delta_C for equality which reduces delta_C M M to T [i.e., K] if M is a closed beta-delta_C-nf and reduces delta_C M N to F [=beta KI] if M an N are closed, distinct beta-delta_C-nf's. Beta-delta_C is CR, but some delta-reductions defined in this way are not CR. For example, we may include two constants delta and epsilon with the reduction rule delta M M -> epsilon. In this case, Beta-delta-reduction is not CR [Klop 1980]. This can be used to show that lambda calculus extended with surjective pairing is not CR.

**Chapter 4: Theories.** A lambda theory is a consistent extension
of Lambda closed under derivations. Without loss of generality, we
may restrict ourselves to closed equations (since M = N iff lambda x
M = lambda x N). So, a lambda-theory is a consistent set of closed
equations closed under derivations. A set of closed equations is
consistent iff we cannot prove T = F [i.e., K = KI]. For the
lambda-I calculus, consistency of a set of closed equations is
equivalent to the inability to prove I = S. For any lambda-theory T,
T+eta = T + (I=1) where 1 is the Church numeral lambda x,y . xy.

Let H be the theory in which we equate all unsolvable terms. H and H-eta can be proven consistent by proving the associated reductions are CR. In general, we do not have Con(T) implies Con(T+eta) (where Con(T) denotes "the set T of closed equations is consistent"). We do have the implication for semi-sensible theories.

A theory is r.e. if it is recursively enumerable (after coding). A theory is sensible if it contains H (so that all unsolvable terms are identified). A theory is semi-sensible (s.s.) if it does not equate any solvable term with any unsolvable term. Lambda and Lambda-eta are r.e. and s.s. If T is sensible, then T is s.s. This follows since if T were not s.s., then we could equate some solvable term with an unsolvable term. Then we could apply the solvable term to the appropriate arguments to obtain (using T sensible to change the unsolvable term) I = Omega_3 or I = K-infinity (where K-infinity is a fixed point of K). I = Omega_3 gives a contradiction for the lambda-I calculus. I = K-infinity gives a contradiction for the ordinary lambda (lambda-K) calculus.

The omega-rule says that if MZ = NZ for all closed terms Z, then M = N. The term rule (tr) says that if MZ = NZ for all closed terms Z, then Mx = Nx (for arbitrary x). A lambda-theory T is closed under the omega-rule (T|-omega) if whenever T|-MZ = NZ for all closed terms Z we have T|-M=N. We can define T|-tr and T|-ext similarly. A lambda-theory T is extensional if T|-ext. T|-omega iff T|-tr and T|-ext. T|-ext iff T|-I=1 iff T=T+eta.

A lambda-theory T and rule R can be combined to give T+R := {M=N | M,N closed lambda terms so that Lambda+R+T |- M=N}. This may not be a lambda-theory because it may not be consistent. For example, consider the theory T given by beta-reduction and the Omega_1 reduction rules Omega 1 -> T and Omega I -> F. T+Omega_1 is consistent (since the associated reduction is CR), but T+Omega_1+eta is inconsistent (we can prove T = Omega 1 = Omega I = F).

Any rule R can be restricted to closed terms, giving the rule R^0. T|-tr^0 iff T|- tr. Of course, T|-ext implies T|-ext^0, but the converse fails.

A combinatory algebra is a structure M = <X,.,k,s> with Card(X) > 1 and kxy = x and sxyz = xz(yz) for all x,y,z in X. M is extensional if a=b whenever ax=bx forall x in X. Given a lambda-theory T, we can form the (open) term model of all terms modulo provable equality in the theory and the closed term model of all closed terms modulo provable equality in the theory. S and K are the equivalence class of the corresponding closed lambda terms. These term models do form combinatory algebras. T is extensional iff the term model is extensional. T|-omega iff the closed term model is extensional. T may be extensional without the closed term model being extensional (since there may not be enough elements of the model to distinguish between different elements). For example, Lambda+eta|-ext but not omega, so that the closed term model of Lambda+eta does not satisfy extensionality.

Given an open term model, we can define a canonical map from the lambda terms onto the model by taking the equivalence class of the term. Similarly, if T1 is a subset of T2, where T1 and T2 are lambda-theories, then we have canonical maps from the term model of T1 onto the term model of T2 and from the closed term model of T1 onto the closed term model of T2. Such maps can be used to transfer topologies from the lambda terms to models. Every lambda-theory is a nondegenerate congruence relation on the term model of Lambda.

Combinatory algebras (if we include the one-point degenerate case) form an algebraic variety which is not a Mal'cev variety.

An equational theory T is Hilbert Post (HP)-complete if every equation in the language is either provable from T or inconsistent with T. In first order logic, an HP-complete theory corresponds to maximal consistency (such as any complete theory of a given model). In the present context, the theory of a combinatory algebra may not be HP-complete. Zorn's lemma implies any lambda-theory may be extended to an HP-complete theory. H can, in fact, be uniquely extended to an HP-complete theory H* (in the lambda-I calculus, H* is H+eta).

We can construct an infinite set of combinators independent over H and use these to prove the existence of 2^{aleph_0} many sensible theories.

We can associate lambda terms with their Bohm trees and obtain the lambda-theory B which equates terms with the same Bohm tree.

Given a set of lambda terms P closed under equality, we can form the lambda-theory T_P = {M = N | M and N are closed terms with C[M] in P iff C[N] in P for every context C[ ]}. We can show theories consistent (such as B+eta) by showing the theory is a subset of some T_P.

If T is a s.s. lambda-theory, then so is T+eta. If T and T' are s.s. lambda-theories, then T+T' is also. T is s.s. iff the term model for T with the tree topology is not indiscrete. For every r.e. theory T there is a term Omega* which can consistently be equated to any term. The range of all definable functions (in an r.e. theory) is a singleton or infinite. This can be proven by defining a topology on the term models of r.e. theories.

H*|-omega, so that Lambda+omega is consistent. If Lambda+omega|-M=I, then Lambda+eta|-M=I.

The omega rule holds for M and N in Lambda-eta if M or N is not a "beta-eta-universal generator" (an R-universal generator is a term M so that for every N there is an M ->>R M' with N a subterm of M'). The omega-rule holds for M and N in H+eta if M or N has a beta-Omega-nf. Neither the omega-rule nor the term rule holds in general in Lambda+eta or H+eta.

In H+omega there is a term Pi which represents universal quantification over N: H+omega |- Fn = F'n for every natural n iff H+omega |- Pi F = Pi F'.

**Chapter 5: Models.** In order for a structure to be a model of
lambda calculus, objects must act both as functions and arguments.
So, we need a D so that D^D is a retract of D (i.e., there is a map
pi:D->>D^D with pi o pi = pi). This is impossible if D^D is
the set of all functions (by cardinality). Scott [1969] overcame
this difficulty by considering continuous functions with an
appropriate topology. Scott originally worked with the category of
complete lattices with continuous maps. For continuous lattices
(where each x can be approached from "way below"), the topology of
the product is the product topology (which does not hold for complete
lattices). Complete partial orders also induce a topology which can
be used to construct lambda-models (such as Plotkin's [1978] model
T-omega). Ershov [1978] defined the category of f_0-spaces.

There are two kinds of models of lambda-calculus: lambda-algebras and lambda-models. Lambda-algebras satisfy all provable equations. Lambda-models are lambda-algebras that also satisfy weak extensionality (if M = N for all x, then lambda x M = lambda x N). Lambda-models are not closed under substructures or homomorphic images, and so cannot described by equations. Clearly they can be described by first-order axioms (such as the weak extensionality axiom given above).

The classes of lambda-algebras and lambda-models can be described syntactically and categorically. The categorical description is as a CCC C with a reflexive object U so that U^U is a retract of U (i.e., there are arrows F:U->>U^U and G:U^U>->U with FG = 1). Given this description, Koymans [1982] showed that a lambda-model is a lambda-algebra with "enough points." Also, since typed lambda calculus corresponds to CCC's, we may consider the untyped lambda calculus as a "special case" of the typed lambda calculus (giving the typed theory priority). Barendregt does not adopt this point of view. One reason is that the untyped lambda calculus can represent all recursive functions while the typed lambda calculus does not.

A CCC C can be embedded in the topos Set^C^op (by Yoneda). In the topos, we can use Kripke-Joyal semantics to interpret the model given by a reflexive U in C. Here we have the full function space U^U so that weak extensionality is satisfied. However, in a topos we have intuitionistic logic.

An applicative structure M is a set X with a binary operation on X. M is extensional if ax=bx for all x implies a=b. Let T(M) be the terms over M given by variables, constants for the elements of M, and application. A valuation in M is a map from variables into M. This allows us to interpret terms A of T(M) in M and define M|=A=B.

An applicative structure is combinatory complete if for every term A with FV(A) <= {x1,...,xn} there is an f in M with fx1...xn = A. The f must be unique if the structure is extensional. A function phi:X^n->X is representable if there is an f in M so that fa1...an = phi(a1,...,an) for every a1,...,an in M; phi is algebraic if there is a term A with FV(A) <= {x1,...,xn} so that phi(a1,...,an) is the interpretation of A[xi:=ai]. Combinatory completeness says all algebraic functions are representable.

A combinatory algebra is an applicative structure with distinguished elements k and s with kxy = x and sxyz = xz(yz) for every x,y,z. We can extend T(M) for combinatory algebras by including constants K and S corresponding to k and s. Also, we can define I=SKK. We can define abstraction operations lambda*x:T(M)->T(M) inductively by lambda*x(x) = I, lambda*x(P) = KP if x is not free in P, and lambda*x(PQ) = S(lambda*x(P))(lambda*x(Q)) otherwise. In every combinatory algebra we have lambda*x(A)x = A. An applicative structure is combinatory complete iff it can be extended to be a combinatory algebra by choosing k and s (where we can use lambda*x to prove every combinatory algebra is combinatory complete).

Homomorphisms between combinatory algebras respect application and the elements k and s. This gives us a category of combinatory algebras and allows us to define "embeddable," "substructure," and "isomorphic." Homomorphisms phi:M1->M2 respect interpretations of terms P of T(M1). If phi is surjective, then M1|=P=Q implies M2|=phi(P)=phi(Q) for all P and Q in T(M1).

Let C be the set of terms of combinatory logic (generated by S, K, variables and application only). Let C0 be the set of combinators (generated by S, K and application only). Given a combinatory algebra M, let Th(M) = {P=Q | M|=P=Q where P and Q are in C0}. Given a homomorphism phi:M1->M2 and P and Q in C0, M1|=P=Q implies M2|=P=Q. It follows that, if there is a phi:M1->M2, then Th(M1) <= Th(M2). In fact, if there is an injective phi:M1->M2, then Th(M1) = Th(M2). [Think: Restricting phi to interiors phi0, every phi0 is surjective and if phi is injective, then phi0 is bijective.]

Combinatory algebras (except the trivial one) are never commutative, never associative, never finite, and never recursive.

Given a combinatory algebra M, let Lambda(M) be the set of lambda-terms with constants for members of M. We can interpret lambda-terms of Lambda(M) in M by defining maps CL:Lambda(M)->T(M) and Lam:T(M)->Lambda(M). These maps preserve variables, constants (corresponding to elements of the model) and application. To complete the definition, let CL(lambda x A) := lambda*x(CL(A)), Lam(K) := lambda xy.x, and Lambda(S) := lambda xyz.xz(yz). Now, we can intepret A of Lambda(M) in M by interpreting CL(M) in M. For A and B of Lambda(M), we say M|=A=B if M|=CL(A)=CL(B). Note that Lambda proves Lam(CL(A))=A for every lambda-term A.

In general, equations of the lambda-calculus may not hold in a combinatory algebra. For example, CL(lambda z (lambda x x)z) == S(KI)I while CL(lambda z z) == I. While the lambda-terms are equal (by one beta reduction), S(KI)I may not equal I in a combinatory algebra. A lambda-algebra M is a combinatory-algebra if for all A and B in T(M) with Lambda|-Lam(A)=Lam(B), we have M|=A=B. A lambda-algebra homomorphism is a combinatory algebra homomorphism [so the category of lambda-algebras is a full subcategory of the category of combinatory algebras].

A combinatory algebra M is a lambda-algebra iff we have M|=A=B whenever A and B are from Lambda(M) with Lambda|-A=B and we have M|=CL(Lam(K))=K and M|=CL(Lam(S))=S. (To prove this, use Lam(CL(A)) =Lambda A, and that M|=CL(Lam(K))=K and M|=CL(Lam(S))=S is enough to imply M|=CL(Lam(A))=A.)

Homomorphisms between combinatory algebras preserve the interpretations of lambda terms (since they preserve interpretations of combinatory terms). If there is a phi:M1->M2, then Th(M1) <= Th(M2), so that if M1 is a lambda-algebra, then so is M2.

Curry's combinatory axioms A_beta axiomatizes the class of lambda-algebras. This follows from the fact that Lambda corresponds to CL+A_beta. A lambda-model is a lambda-algebra which satisfies the Meyer-Scott axiom: if ax = bx for all x, then 1a=1b. Here 1 is the CL-term S(KI). Note that in any combinatory algebra we have 1ab=ab. In any lambda-algebra we have (confusing the terms with their interpretations) 1=lambda xy.xy so that 1a = lambda y.ay, 1(lambda x A) = lambda x A for every A in T(M), and 11=1. These facts can be used to show that a lambda-algebra is a lambda-model iff it is weakly extensional: whenever M|=A=B for every interpretation of x, M|=lambda x A = lambda x B.

A lambda-algebra is extensional iff it is weakly extensional and satisfies I=1. Every extensional combinatory algebra is a lambda-algebra, hence a lambda-model (since it is extensional).

Let T be a lambda-theory. Provability in T induces an equivalence relation on the set of lambda-terms which respects application. Let M(T) be the open term model of T given by the lambda-terms modulo provability in T, application via representatives, and k and s given by the equivalence classes of the obvious lambda-terms. We may restrict to closed terms and obtain the closed term model M0(T) of T. Both M(T) and M0(T) are nontrivial since [K] is not [S] (since T is consistent). In particular, we may consider the term models M(Lambda) and M0(Lambda). Interpreting variables in term models commutes with substitution. If T|-M=N, then M(T) and M0(T)|=M=N. In fact, M(T)|=M=N implies T|-M=N. Also, if M and N are closed, then M0(T)|=M=N implies T|-M=N. M0(T) is a lambda-algebra. M(T) is a lambda-model.

Jacopini [1975] showed M0(T) need not be a lambda-model by giving an example of such a theory T defined by OmegaKZ = OmegaSZ for every closed Z. So, in M0(T) we have OmegaKx = OmegaSx for all x in the model, but we do not have OmegaKx = OmegaSx where x is a variable (i.e., this equation is not provable in T). In fact, Plotkin [1974] showed M0(Lambda) and M0(Lambda+eta) are not lambda-models.

A the interior A0 combinatory algebra A is the substructure generated by k and s. A is hard if A0 = A. Let Th(A~) = {M=N : M and N are closed terms from T(A) and A|=M=N}. (Recall the Th(A) only considered equations between combinators.) M0(Th(A)) is isomorphic to A0. M0(Th(A~)) is isomorphic to A. So, we can embed every lambda-algebra A into a lambda-model by taking A isomorphic to M0(Th(A~)) substructure of lambda-model M(Th(A~)). Similarly, every lambda-algebra is the homomorphic image of a lambda-model (take M(Th(A~))->>M0(Th(A~)) by sending variables to some closed term such as K).

"There is a lambda-model that cannot be embedded in an extensional lambda-model. There is a combinatory complete applicative structure that cannot be made into a lambda-algebra (by choosing k and s). There is a lambda-algebra that cannot be made into a lambda-model (by changing k and s). There is a lambda-model that cannot be made into an extensional one (by collapsing it)."

For any lambda terms M and N, Lambda|-M=N iff M=N is true in all lambda-models (or lambda-algebras). Given a lambda-theory T, T|-M=N iff M=N is true in all lambda-models satisfying T. Provability in Lambda can be axiomatized by the first order theory consisting of K, S, K does not equal S, Meyer-Scott, and Curry's A_beta axioms.

Consider the rules ext, zeta, omega, and tr. We may refer to these rules generically as R. Each of these rules can be formulated as a rule (R-rule) or as an axiom (R-ax). Each of these rules is of the form R1 => R2. For a lambda-algebra M, M|=R-Rule if whenever M|=R1 we have M|=R2. We can also formulate these rules and axioms restricted to closed terms--the restricted rules are called R^0. The difference between the rule and axiom versions is when we quantify over valuations. The rule versions say if a certain equation or certain equations R1 hold for all valuations, then an equation R2 holds for all valuations. The axiom versions say for all valuations, if the equation or equations R1 hold, then the equation R2 holds. Each R-axiom implies R-rule (easily--though we do need M to be a lambda-algebra), but the converse is not true (one example he gives is omega in D-infinity). For hard models (e.g., the interior of any model), R-axiom is equivalent to R-rule (since the valuations can be removed since we can close the terms of interest). So, for hard models we may simply write M|=R. For hard models M we have M|=R iff M|=R^0 (since the restriction to closed terms still includes the entire domain). The condition M|=R is independent of the condition M^0|=R (M(Lambda-eta)|=ext, but its interior does not; there are also non-extensional models with extensional interior).

Let M be a lambda-algebra. M|=ext-ax iff M is extensional. M is weakly extensional iff M|=zeta-ax iff M|=zeta-rule (though his "proof" of this is a reference to an apparently irrelevant corollary; I am unconvinced that the rule implies the axiom). M|=omega-rule iff M^0|=omega iff M^0 is extensional (use the fact that M|=omega^0-rule implies M|=omega-rule, since the omega-rule can be implemented as a finite number of applications of the omega^0-rule).

Given a lambda-theory T and R one of the rules above, T|-R iff M(T)|=R-rule iff M(T)|=R-ax. It follows that if T is an extensional lambda-theory, then M(T) is an extensional lambda-model. Also, if T|-omega, then M^0(T) is an extensional lambda-model.

Given an applicative structure M=(X,.), we can let Val(M) be the set of valuations in M and a syntactical interpretation in M be a map I:Lambda(M) x Val(M)->X satisfying properties derived from what is usually the inductive definition of an interpretation. A syntactical applicative structure is an applicative structure with a syntactical interpretation. We can define satisfaction in M as usual. M is a syntactical lambda-algebra if it satisfies all the equations in Lambda. M is a syntactical lambda-model if it satisfies zeta. We can show by induction that in a syntactical lambda-model substitution commutes with interpretation. By showing that the axioms and rules of Lambda are sound in a syntactical lambda-model, we have that every syntactical lambda-model is a syntactical lambda-algebra.

A homomorphism between syntactical lambda-algebras is a map which commutes with interpretation. This gives a category of syntactical lambda-algebras (and a subcategory of syntactical lambda-models). The category of syntactical lambda-algebras is isomorphic to the category of lambda-algebras. The subcategory of syntactical lambda-models corresponds to the subcategory of lambda-models under this isomorphism.

Non-syntactical models can be constructed in CCC categories, such as the category of cpo's. Scott's D-infinity is an example of such a construction. In the specific case of cpo's, we say a cpo D is reflexive if [D->D] is a retract of D (i.e., there are continuous maps G:[D->D]->D and F:D->[D->D] so that FoG=1). Given a reflexive cpo, we obtain a lambda-model with domain D, application given by F(x)(y), and abstraction given by G(lambda y.xy) (so that the interpretation of (lambda y.xy)y is F(G(lambda y.xy))(y)=(lambda y.xy)y=xy [note that sometimes lambda is used syntactically and other times to actually name a function]. Rule zeta is satisfied because if two terms M and N are interpreted the same for all valuations of x, then lambda d.[[M]](x:=d) and lambda d.[[N]](x:=d) are the same (continuous) function and we have [[lambda x.M]] = G(lambda d.[[M]](x:=d)) = G(lambda d.[[N]](x:=d)) = [[lambda x.N]]. The representable functions in the lambda-model are precisely the continuous functions (represent a continuous function f:D^n->D by G(lambda d1.G(lambda d2....G(lambda dn.f(d1,...,dn))...)). The model is extensional iff GoF=1 (so that F and G are mutually inverse). If GoF=1 and de=d'e for all e, then F(d)=F(d') (extensionally, as functions), so that d=d' (by applying G). If the model is extensional, then G(F(d))e = F(G(F(d)))(e) = F(d)(e) = de, so that G(F(d)) = d by extensionality of the model, and so GoF=1.

Examples of a reflexive cpo's can be constructed as follows. Start with a set A and let B be the closure of A under (beta,b) in B whenever beta finite subset of B and b in B. Let D be the power set of B. D is a cpo (under subset) since power sets are always an algebraic lattice. A function f:D->D is continuous iff for every C, f(C) is the union of f(beta) for beta finite subsets of C (compact elements of D below C). Given x,y in D, we can define "x applied to y" by xy={b | there is a subset beta of y with (beta,b) in x}. Note that this is continuous in y since the beta's are finite so that xy is the union of xy' for y' finite subsets of y. Now, given f in [D->D], let lambda^G x.f(x) be the member of D given by {(beta,b) | beta is a finite subset of B and b is in f(beta)}. D is a reflexive cpo since we can let F(x) = lambda y.xy and G(f) = lambda^G x.f(x). F(x) is continuous as noted above. We can also check that F and G are continuous [recall that such functions are continuous iff F(sup(U)) = sup(F(U)) for directed U]. FoG=1 since F(G(f))(y) = F({(beta,b) | beta finite subset of B and b in f(beta)})(y) = {b | b in f(beta) for beta finite subset of y} = f(y) by continuity of f.

In an arbitrary CCC C with an object U and arrows F:U->U^U and G:U^U->U with FoG=1 (so U is reflexive), we can define a lambda-algebra M(C)=M(C,U,F,G) with domain |U|=Hom(1,U) and application given by evo(Fx1) (so, f applied to g is evo<Fof,g>). Interpretations are given in a context of free variables, so that the interpretation of M(x) in context delta of n variables including x is an arrow from U^n to U (for example, free variables are interpreted as projections). A valuation (restricted to the free variables of the relevant term) gives an arrow from 1 to U^n, so that we can define an interpretation of M with respect to a valuation rho as a member of |U| (arrow from 1 to U). Products of projections allow us to "thin" the context. We may change the context more generally by substituting for the free variables. Semantically, we have [[M[x:=N]]]_Gamma = [[M]]_Delta o <[[N]]_Gamma> where x is a vector of variables (agreeing with the context Delta) and N is a vector of terms. We can check that the equations in Lambda do hold (so we have a lambda-algebra) by induction on the length of the proof.

In M(C,U,F,G), we have [[1A]]_Delta = G o F o [[A]]_Delta. This can be used to show that U has enough points (i.e., for all f,g:U->U distinct, there is an x:1->U with fx distinct from gx) iff M is a lambda-model (i.e., satisfies the Meyer-Scott axiom). Also, F and G are mutually inverse (GF=1) iff the model satisfies 1=I. Combining these results, we have M is extensional iff U has enough points and GF=1.

A CCC C is strictly concrete if there is a full and faithful functor Phi:C->Set preserving finite products (including the terminal object) and so that Phi(B^A) is a subset of Phi(B)^Phi(A) with evaluation the restriction of full evaluation. [Compare this to Henkin models in which we may have a subset of the full function space.] In such a case, the transpose of f:A->B maps via Phi to the usual "Curried" transpose of Phi(f). Every object in such a category has enough points (so lambda-algebras given by reflexive U are lambda-models). Examples of strictly concrete CCC's are the category of cpo's or complete lattices with continuous maps. If C is a strictly complete CCC with reflexive object U, then we can map the lambda-model M(C) isomorphically to a lambda-model with domain Phi(U) (called the "concrete version of M(C) via the functor Phi").

Actually, every lambda-algebra can be obtained from a CCC C with a reflexive object U. Let A be a lambda-algebra. The category C(A) is given by the Karoubi envelope: the objects are a in A with aoa=a (idempotents--note aoa means lambda x.a(ax) mapped into CL and interpreted as an elements in A as usual), the arrows f:a->b are given by f in A satisfying bofoa=f, the identity arrow of a is a, and composition is given by composition of lambda terms. [Scott 1980] C(A) is a CCC with reflexive object I with arrows F=G=1. The terminal object t is lambda xy.y. Products are given as usual for lambda terms. The exponent b^a is lambda z.bozoa. ev:b^a->b is given by lambda z.b(pi_1 z(a(pi_2 z))). The transpose of f:cxa->b is given by Lam(f)=lambda xy.f[x,y]. In particular, I^I=1, 1:1->I, 1:I->1, 1o1=1=id_1, and ev:I^IxI->I is given by lambda z.pi_1 z(pi_2 z) = Spi_1pi_2. Then we have [Koymans 1982] M(C(A),I,1,1) is isomorphic to A. The elements of |I| are x:t->I and must be equal to K(xI). Note that application in M(C(A)) is given by Sab (by reducing ev o <1oa,b>). The isomorphism phi takes a in A to Ka in |I| (note that Ka = K(KaI)). This is easily seen to be bijective. phi is a homomorphism since phi(ab) = K(ab) = S(Ka)(Kb) = phi(a)phi(b).

Koymens [1982] showed that if A is a lambda-model, then every object of C(A) has enough points.

We do not, in general, have C(M(C,U)) isomorphic to C, since C may have more objects.

CC-Functors Phi:C1->C2 induce homomorphisms Phi*:M(C1)->M(C2). Also, homomorphisms phi:A1->A2 induce a CC-Functor phi+:C(A1)->C(A2) preserving the reflexive object I and retraction map 1. Also, phi+* is phi modulo the isomorphism between M(C(A)) and A.

Recall that a lambda-algebra is a combinatory algebra satisfying Curry's combinatory axioms A_beta, and a lambda-model is a lambda-algebra satisfying the Meyer-Scott axiom. There are simpler ways to axiomatize lambda-models than simply A_beta + Meyer-Scott.

Define 1_n inductively by 1_0 = 1 = S(KI) and 1_{n+1} = S(K1)(S(K1_n)). In any combinatory algebra, 1_n a b1...bn = ab1...bn. In any lambda-algebra, 1_n = lambda ab1...bn.ab1...bn. (The idea here is that in the absence of extensionality an n-ary function may be represented by several elements of the algebra. One of these is the canonical representative for which 1_n f = f. In fact, given any representative a, 1_n a is the canonical representative.) Note that 1_{n+1}ax = 1_n (ax).

Meyer [1982] and Scott [1980] independently showed a combinatory algebra is a lambda-model iff it satisfies Kxy=x, Sxyz=xz(yz), Meyer-Scott, 1_2 K=K, 1_3 S = S. The conditions 1_2 K = K and 1_3 S = S are enough to show that 1(lambda x.A) = lambda x.A (since under the translation to CL lambda x.A must be KP or SPQ and we have 1(KP) = 1_2 KP = KP and 1(SPQ) = 1_3 SPQ = SPQ). This fact, with Meyer-Scott, implies weak extensionality (recall that Meyer-Scott + lambda-algebra implies weak extensionality, but here we do not have a lambda-algebra a priori). In order to show we have a lambda-algebra, we use the characterization the CL(M)=CL(N) holds whenever Lambda|-M=N (shown by induction on the proof of M=N), and K=CL(Lam(K)) (since K = 1_2 K = 1_2 CL(Lam(K)) = CL(Lam(K))) and S=CL(Lam(S)) (as in K) hold in the model. In order to show the last equations, we need that if ax1...xn = bx1...xn for all x1,...,xn, then 1_n a = 1_n b (shown by induction), that 1(1_n a) = 1_n a (shown by calculation), and that 1_n(lambda x1...xn.A) = lambda x1...xn.A (by induction on n).

Clearly, the term 1 and the resulting sequence of terms 1_n play an important role in this characterization of lambda-models. Meyer [1982] abstracted this role to define a combinatory model as an applicative structure with distinguished elements s, k, and epsilon satisfying Kxy=x, Sxyz = xz(yz), epsilon xy = xy, and forall x ax=bx implies epsilon a = epsilon b. A combinatory model is stable if it also satisfies epsilon epsilon = epsilon, epsilon_2 K = K, and epsilon_3 S = S (where epsilon_1 is epsilon and epsilon_{n+1} is S(Kepsilon)(S(Kepsilon_n))). For any combinatory model, we have epsilon(epsilon a) = epsilon a (by epsilon ax = ax and the epsilon-Meyer-Scott axiom) which implies epsilon(epsilon_n a) = epsilon_n a (calculation from definition of epsilon_n). From this we can show, epsilon_{n+1} a = a iff epsilon a = a and forall x epsilon_n (ax) = ax. Also, epsilon_n a = a iff epsilon(ax1...xi) = ax1...xi forall x1,...,xi,i<n (by induction on n using the previous characterization of epsilon_{n+1} a = a). A combinatory model is stable iff epsilon, k, ka, s, sa, sab (forall a and b) are all fixed points of epsilon.

Every stable combinatory model is a lambda-model with epsilon = 1 (so epsilon is determined by s and k). Also, epsilon determines s and k (since if kxy=x and k'xy=x, then epsilon_2 k=epsilon_2 k' implies k=k'--similarly for s). Given any combinatory model, we can obtain a stable combinatory model by taking epsilon' = epsilon epsilon, k' = epsilon_2 k, and s' = epsilon_3 s (stability follows using the fixed point characterization). Note that although (by above) 1 determines s and k, a map may preserve application and 1 without preserving s and k (consider the appropriate constant map).

Consider a combinatory complete applicative structure M=(X,.). An expansion of M distinguishes elements s and k to form a combinatory algebra. M is a categorical lambda-model (lambda-algebra/combinatory algebra) if there is a unique expansion making M into a lambda-model (lambda-algebra/combinatory algebra). An element epsilon of M is stable if epsilon epsilon = epsilon, epsilon ab=ab, and forall x ax=bx implies epsilon a = epsilon b. Let [X->X] be the set of functions representable in M. Define F:X->[X->X] by F(a)(b) = ab. Expansions of M to a lambda-model correspond to functions G:[X->X]->X so that FG=1 and GF is representable (in one direction, take G(f) = 1a for some a representing f; in the other direction, take epsilon = G(GoF)). Such G's correspond to stable epsilons (stable epsilons represent the GF's). So, M is a categorical lambda-model iff there is a unique such G iff there is a unique stable epsilon.

P-omega (the power set of omega) is an algebraic lattice (hence a cpo). f:P-omega->P-omega is continuous in the Scott topology iff f(Union beta[finite]) = Union f(beta[finite]). So, f is determined by graph(f) = {(n,m)|m is in f(e_n)} where (,) is a pairing on N and {e_n} is an enumeration of the finite subsets of omega. graph:[X->X]->X embeds [X->X] as a retract into X (graph is the "G"). So, P-omega forms a lambda-model.

D-infinity is a class of models formed by taking the limit of psi_n in [D_{n+1}->D_n] where D_0 = D and D_{n+1} = [D_n -> D_n]. For appropriate choices of psi_n, we have D-infinity isomorphic to [D-infinity -> D-infinity] giving an extensional lambda-model.

The set of Bohm-like trees B forms a cpo and gives a lambda-model which equates precisely those terms with the same Bohm tree.

P-omega also equates precisely those terms with the same Bohm tree. D-infinity equates all unsolvable terms (mapping them to bottom) and in fact equates terms as in the theory H* (the unique HP-complete extension of the theory H which equates unsolvable terms). The terms equated by D-infinity can also be characterized by Bohm trees (by incorporating eta, for example). D-infinity models omega, so the interior is extensional.

A continuous lambda-model is a lambda-model given by a cpo with application continuous so that the approximation theorem (M = sup(M^(k))--where M^(k) is the approximation of M obtained from the Bohm tree of M) holds. P-omega, D-infinity, and B are all continuous lambda-models. For any continuous lambda-model, any two terms with the same Bohm tree are equated and fixed point operators represent the least fixed point operator.

The term model M(Lambda-eta) is extensional, but the closed term model M0(Lambda-eta) is not. Similarly, the term model M(Lambda) is weakly extensional, but M0(Lambda) is not. We have similar statements replacing Lambda by H or B. Neither P-omega nor its interior are extensional. Both D-infinity and its interior are extensional. P-omega is a categorical lambda-model (but not a categorical combinatory algebra). The model D_A (constructed from the power set of B where B is the closure of A under pairing of finite subsets with elements) is not a categorical lambda model.

A lambda-algebra has the range property if every definable function has infinite or singleton range. Open term models M(T) have the range property. For r.e. theories T, closed term models M0(T) have the range property. Continuous lambda-models M and their interiors have the range property.

Church's delta (satisfying delta MN = T if M=N and delta MN = F if M is not N for all closed nf's M and N) is not definable. Every definable map on M(T) (or M0(T) if T is r.e.) with range included in the numerals is constant.

A map phi:M->M is locally representable if for each b we have the function a|->phi(a)b is representable. A lambda-algebra is rich if all locally representable maps are representable. If M is rich, then M is extensional. M(Lambda-eta), M(H+eta), and D-infinity are rich. If M is hard and satisfies B (terms with equal Bohm trees are equated), then M is not rich.

The tree topology on an open or closed term model M is the largest topology making the map taking lambda-terms to their equivalence classes continuous (where the lambda-terms have the tree topology given by Bohm terms). For an open or closed term model M, M is s.s. iff M is not indiscrete. If M is s.s., then a term is unsolvable iff its only neighborhood is the whole space. For T=Lambda, Lambda-eta, H, H+eta, B or B+eta, and M=M(T) or M0(T), we have that term has a nf iff it is isolated in M.

Given a lambda-algebra M, let S(M) be the monoid {1a | a in M} with composition as the operator and I as identity. Let G(M) be the subgroup of S(M) of invertible elements. For a lambda-theory T, let S(T)=S(M(T)) and G(T)=G(M(T)). S(M0(Lambda=eta)) is a recursively presented semigroup with two generators and has an unsolvable word problem. A term A in S(Lambda-eta) is invertible iff BT(A) is a finite hereditary permutation. A term A in S(H*) is invertible iff BT(A) is a hereditary permutation.

G(Lambda-eta) = G(Lambda+omega) = G(H+eta) = G(H+omega). This group is a subgroup of G(H*).

We can characterize the groups G(Lambda+eta) and G(H*) as follows. Start with the trivial group G0 and S the group of permutations on N with finite support. Let G-omega be the group of omega-sequences from G with almost all coordinates equal to the identity ("weak infinite power"). Let G{n+1} be the semidirect product of Gn-omega and S with pi in S acting on <x0,x1,...> in Gn-omega to give <x(pi 0),x(pi 1),...>. We can define homomorphisms fn:Gn->G{n+1} and gn:G{n+1}->Gn inductively: f0(e)=e1, f{n+1}(<pi,{xi}>)=<pi,{fn(xi)}>, g0(x)=e, g{n+1}(<pi,{xi}>)=<pi,{gn(xi)}>. G(Lambda-eta) is (up to isomorphism) the colimit of the diagram (Gn,fn). G(D-infinity) contains (up to isomorphism) the limit of the diagram (Gn,gn). Furthermore, constructing the limit of diagram (Gn,gn) in the canonical way gives the group of sequences <x0,x1,...> with gn(x{n+1})=xn for all n. If we form the subgroup of all recursive such sequences (giving the "recursive projective limit"), then this subgroup is isomorphic to G(H*).

*Part II: CONVERSION*

**Chapter 6: Classical Lambda Calculus.** This chapter considers
Lambda (i.e., the lambda-calculus with beta-reduction only). There
are several fixed point combinators (i.e., terms M so that MF = F(MF)
for all F). Y=lambda f.(lambda x.f(xx))(lambda x.f(xx)) is one.
Curry called Y the "paradoxical combinator". We do not have
Yf->>f(Yf). However, Turing [1937] defined a fixed point
combinator Theta so that Theta f ->> f(Theta f). Theta is AA
where A is lambda xy.y(xxy). For any term C(f,x-bar), we can take
F=Theta(lambda fx-bar.C(f,x-bar)) and we have FN-bar ->>
C(F,N-bar) for all N-bar.

YI = Omega (so, Omega is a fixed point of I--which is trivial since everything is a fixed point of I). We can define combinators T=lambda xy.x and F=lambda xy.y (for "true" and "false"). T is K and KI->>F. Given these definitions, the conditional "if B then M else N" is defined simply as BMN. Pairing is defined by [M,N] := lambda z.zMN with projections (M)_0 := MT and (M)_1 := MF. We do not have [(M)_0,(M)_1]=M (i.e., everything is not a pair). Given pairing, we can define n-tuples by induction using pairs. Or, we can directly define <M0,...,Mn> := lambda z.zM0...Mn with the obvious projections. Composition has the standard definition.

Numerals can be defined as '0' := I, 'n+1' := [F,'n']. Successor, predecessor and test for zero are easily defined. S+ := lambda x.[F,x]. P- := lambda x.xF (so the predecessor of 0 is F). Zero := lambda x.xT. We may abbreviate successors and predecessors by n+ and n- respectively. A numerical (total) function phi:N^m->N is lambda-definable iff F'n1'...'nm' = 'phi(n1,...,nm)' for all n1,...,nm. Lambda-definable functions are recursive since their graphs are given by Lambda|-F'n1'...'nm' = k and so are r.e. so that the function is recursive. All recursive functions are lambda-definable. First, the initial functions are lambda-definable. The class of lambda-definable functions is clearly closed under composition. The class can be shown to be closed under primitive recursion by using a fixed point combinator to construct an F so that Fxy="if Zero x then Gy else H(F(x-)y)(x-)y". A fixed point combinator can also be used to show closure under minimization, by constructing H_P ->> lambda z."if Pz then z else H_P(z+)". Also, a numerical relation R is recursive iff there is a lambda-term F so that F'n1'...'nk' = T iff R('n1'...'nk') and F'n1'...'nk' = F iff -R('n1'...'nk').

This is not the only way to define numerals in lambda-calculus. In general, we say a numeral system is a sequence of closed lambda-terms d0,d1,... so that there are lambda terms Sd+ and Zerod so that Sd+ dn = d(n+1), Zerod d0 = T, and Zerod d(n+1) = F. A numeral system is determined by d0 and Sd+. We can define the obvious notion of lambda-definability of numerical functions with respect to d. The numeral system d is called adequate iff "all recursive functions are lambda-definable with respect to d." It turns out that a numeral system d is adequate iff there is a lambda term Pd- (predecessor) so that Pd- d(n+1) = dn. So, for a sequence of closed lambda-terms d0,d1,... to form an adequate numeral system, we precisely need successor, predecessor and test for zero.

An example of such a numeral system is Church's numerals. c_n := lambda fx.f^n(x). Sc+ := lambda abc.b(abc). Zeroc := lambda x.x(KF)T. Notice c_1 == 1. This system can be shown adequate by constructing lambda-terms H and H^{-1} so that H'n' = c_n and H^{-1}c_n = 'n'.

The double fixed point theorem (that for every F and G we can find A and B such that A=FAB and B=GAB) can be proven in two ways. First, let A_b := Theta(lambda a.Fab); let B := Theta(lambda b.GA_b b); let A := A_B. Then, check A->>FAB and B->>GAB. The second method generalizes naturally to the n-fold case. Let X = Theta(lambda x.[F(x)_0(x)_1,G(x)_0(x)_1]); A:=X_0; B:=X_1. Then check that A->>FAB and B->>GAB.

We can characterize all fixed point combinators as fixed points of the combinator SI:=lambda yf.f(yf). This result allows us to form fixed point combinators Y^n where Y^0 := Y and Y^{n+1} := (Y^n)(SI). (Note: Y^1->>Theta.) For any M, we have the fixed point Y_M := lambda f.WWM where W:=lambda xz.f(xxz).

Let #:Lambda->N be an effective injection (Godel numbering). This allows us to talk about notions such as "recursive sets of lambda-terms". Let 'M' be '#M' (the numeral of the Godel number of M). The second fixed point theorem states that for every F there is an X so that F'X' = X. X := W'W' where W:=lambda x.F(Ap x (Num x)) where Ap is a lambda-term representing application (so Ap'M''N' = '(MN)') and Num is a lambda-term representing the operation n|->#'n' (so Num'M'=''M''). Note that X = W'W' -> F(Ap 'W' (Num 'W')) -> F('(W'W')')) = F('X'). In fact, there is a combinator Theta2 so that for every closed F, Theta2 'F' ->> F '(Theta2 'F')'.

A set of lambda-terms is nontrivial if it is neither the empty nor the full set. If A and B are two nontrivial, disjoint sets of lambda-terms closed under equality, then A and B are recursively inseparable. It follows that no nontrivial A closed under equality is recursive. We can show recursive inseparability by contradiction. Suppose C is recursive with A<C and B<-C. Let F be a lambda-term so that F 'M' = '0' if M is in C and F 'M' = '1' if M is not in C. Let G be lambda x."if Zero(Fx) then M1 else M0" (for some M0 in A and M1 in B). The second fixed point theorem gives an X with G 'X' = X. This X is in C iff it is not in C, giving a contradiction. This result can be refined as follows: If A and B are nontrivial, disjoint sets of combinators which are closed under equality of combinators, then A and B are recursively inseparable. It follows that {M | M has a nf} is r.e., but not recursive (Church [1936]--one of the first examples of such a set). Also, it follows that the theory Lambda is essentially undecidable (is consistent and has no consistent recursive extensions).

By contrast, in the lambda-I calculus, the set {M | FV(M) = {x}} is recursive, nontrivial and closed under equality.

The fixed point combinator can be used to construct Godel's self-referential sentence and to show Kleene's recursion theorem.

Let P be first-order Peano arithmetic, let #s be the Godel number of syntactic objects s of P, and let 's' := '#s' be the corresponding numeral in the system P. Godel showed that for any formula A(x) we can construct a sentence B with P|-B<->A('B') (stating "I have property A"). We can define the relation on natural numbers n1~n2 iff n1=#A1, n2=#A2 and P|-A1<->A2. Then the problem reduces to finding an n so that n~#A('n') [which is a certain fixed point]. Let lambda x.n = <#x,n> (where < , > is a recursive pairing function with recursive projections ( )_0 and ( )_1). Let n.m be a p.r. function satisfying (lambda x #A(x)).m = #A('m'). This p.r. function is representable in P by o so that P|-'n' o 'm' = '(n.m)'. So, #A('n' o 'm') ~ #A('(n.m)'). Let v := lambda x.#A(x o x) and n := v.v. Then we show n~#A('n').

Let {.} be an enumeration of the partial recursive functions [think: e is the code of the function {e}]. Kleene's recursion theorem states that if f is a total recursive function, then there is some n (fixed point) so that f(n) and n code the same partial function (i.e., {f(n)}(x) = {n}(x) for all x).

We can show this by first showing that given any partial function psi, there is a code v so that {v} is total {v}(x) codes the same function as psi(x) if psi(x) is defined (otherwise {v}(x) codes the nowhere defined partial function). We may denote [and think of] this v as "lambda x psi(x)". The v is obtained as follows: lambda xy.{psi(x)}(y) is partial recursive, so there is an e with {e}(x,y) = {psi(x)}(y). There is a p.r. s (by the "s-m-n theorem") so that {e}(x,y) = {s(e,x)}(y) for all y. So, for each x, s(e,x) and psi(x) [essentially] code the same partial recursive functions. s(e,x) is a total function of x. Let v be the code of this function, so that {v}(x) = s(e,x) forall x.

Kleene's recursion theorem follows by taking v to be "lambda x.f({x}(x))" and letting n be {v}(v) (well-defined since v codes a total function). Now, {n}(x) = {{v}(v)}(x) = {f({v}(v))}(x) = {f(n)}(x). So, n and f(n) code the same partial function.

**Chapter 7: The Theory of Combinators.**
Combinatory Logic (CL) is a formal system with
constants terms
S and K and terms built by application (but not lambda).
The formulas are equations between terms. The basic
axiomatic equations: Kxy = x and Sxyz = (xz)(yz).
We have the usual rules for congruence and equivalence.
(We can also define a notion of weak reduction by
orienting these equations in the obvious way.)
Let *C* denote the set of terms of CL.

Induction arguments are often simplified by the
fact that there are no variable bindings in CL.
However, we can define a function for each variable
from CL terms to CL terms lam*x:*C*->*C*
by induction. (First let I = SKK and note that Ix=x
is provable.) Let lam*x(x) := I, lam*x(P) := KP if x
is not free in P, and let lam*x(PQ) := S(lam*x(P))(lam*x(Q)).
Clearly, FV(lam*x(P)) = FV(P)-{x}. But the real point is
that lam*x acts like the "lambda x" binding in the following
sense: CL |- (lam*x(P))Q = P[x:=Q] (we can extend this
to several variable and Q's if we are careful about free
variables--assume a variable convention). This is proven by
induction. Note the similarity between this proof
and the proof of the deduction theorem in a Hilbert Style
system. The idea in both cases is that we work backwards through
the term/proof and replace P(x) (x|-P) with lam*x(P) (|-x => P).
Lambek and Scott make the relationship between the deduction
theorem and combinatory completeness quite clear: Combinatory
completeness (or, functional completeness)
is the deduction theorem with equations between proofs.

We can extend CL to CL+ext by including a rule "from Px = P'x derive P = P' if x is not free in P and not free in P'". CL does not respect the zeta rule (e.g., SK and KI), but CL+ext does. In fact, it turns out CL+ext is equivalent to Lambda+eta.

Weak reduction is CR (use a reduction allowing disjoint weak
reductions to be done simultaneously
and show this has the diamond property and its transitive closure
is weak reduction). This, of course,
implies normal forms are unique, implies consistency of CL,
and gives us a way of telling if two terms are not provably equal.
For example, SK and KI are distinct normal forms, so CL *does not*
prove SK = KI, even though the corresponding lambda terms are beta-equal.

There is also a notion of strong reduction which corresponds to CL+ext equality.

We can translate between CL and Lambda [as we can translate between Hilbert-style and Gentzen-style deduction systems]. Going from CL to Lambda is as simple as replacing combinators with the corresponding Lambda definitions of combinators [compare this with replacing Hilbert-style axioms with natural deduction proofs of the axioms]. Going from Lambda to CL requires use of the lam*x functions [compare this with replacing all natural deduction inference steps with convoluted appeals to Hilbert-style axioms and modus ponens, where modus ponens of course corresponds to application]. We can denote the lambda term for a combinator P by lam(P) and denote the combinator for a lambda term M by CL(M).

It is easy to show that if CL |- P = Q then Lambda |- lam(P)=lam(Q). The proof is by induction on the judgement of the equation. Intuitively it is true because the axioms Kxy=x and Sxyz=(xz)(yz) follow by the definitions of K and S in the lambda calculus and by beta-reduction.

The converse is not true, as I keep emphasizing. Consider (again!)
Lambda |- lam(SK)=lam(KI) but CL |-/- SK = KI. Again, CL is not
closed under rule zeta (congruence of equality under lambda bindings--
"weak extensionality"). Curry showed how to give more equations
between *closed terms*
as axioms, A_beta, so that CL+A_beta is equivalent to Lambda.
By equivalence, we mean lam(CL(M)) = M for all M,
CL(lam(P)) = P for all P, CL(M) = CL(N) iff M = N for all M and N,
and lam(P) = lam(Q) iff P = Q for all P and Q.

What should such an A = A_beta satisfy?
Clearly, we need to be
able to prove
K = lam*x(lam*y(x)) and S = lam*x(lam*y(lam*z((xz)(yz)))).
Furthermore, if we can prove these, then we can prove
CL(lam(P)) = P for all terms P (by an easy induction on P).
In order to show CL+A is closed under zeta, we need
equations lam*x(KPQ) = lam*x(P) and lam*x(SPQR) = lam*x((PR)(QR)),
but not only these! These equations allow us to start the
induction on the S and K axioms. The axioms of A are easy
because *we are assuming the equated terms closed*.
But watch what happens when we try to prove the induction
step for congruence under application: Suppose we have
P = Q |> PR = QR, and the induction hypothesis is that
lam*x(P) = lam*x(Q). We want to derive lam*x(PR)=lam*x(QR).
If x is free in PR and QR, then lam*x(PR) = Slam*x(P)lam*x(R)
= Slam*x(Q)lam*x(R) by the induction hypothesis, reflexivity,
and congruence under application. But what if x is not free
in one or both of them? If x is not free in both, then
the induction hypothesis (by the definition of lam*) says
KP = KQ. And what we are trying to derive is K(PR) = K(QR).
As you can see, we are in a bit of trouble here. A similar
problem occurs in the case P = Q |> RP = RQ when x is not
free in one or both sides of RP = RQ. The solution? Read on.

We can inductively define
another operator lam1x:*C*->*C*
by lam1x(x)=I, lam1x(PQ)=S(lam1x(P))(lam1x(Q)),
and lam1x(c)=Kc if c is a constant K or S
or variable other than x. The point is that lam1x(-) is defined
by going all the way to the base instead of stopping when we first
see independence of x. In general lam*x and lam1x give different
results. In fact, we may not even have the results equal under
CL. In fact, we *usually* will not have the results equal
under CL. Why not? Well, lam* and lam1 both return weakly
normal forms, so they will only be provably equal if they are
syntactically identical. Here's a concrete example where
they are not: lam*x(SS) = K(SS) and lam1x(SS) = S(KS)(KS).

On the other hand, if we assume CL+A can prove S(KP)(KQ) = K(PQ), then we can show (by induction on the structure of P) that CL+A |- lam*x(P) = lam1x(P) for all P. Using this, we can complete the induction in the congruence of application cases by noting lam*x(PR) = lam1x(PR) = S(lam1x(P))(lam1x(R)) = S(lam*x(P))(lam1x(R)) = S(lam*x(Q))(lam1x(R)) = S(lam1x(P))(lam1x(R)) = lam1x(PR) = lam*x(PR).

Hopefully this gives the idea of what A_beta should satisfy.
So, let A_beta be given by the following equations between
*closed* terms:

(A.1) K = lam*xy(Kxy)

(A.1') K = lam*xy(x)

(A.2) S = lam*xyz(Sxyz)

(A.2') S = lam*xyz((xz)(yz))

(A.3) lam*xy(S(Kx)(Ky)) = lam*xy(K(xy))

(A.4) lam*xy(S(S(KK)x)y) = lam*xyz(xz)

(A.5) lam*xyz(S(S(S(KS)x)y)z) = lam*xyz(S(Sxz)(Syz))

The motivation for the first two axioms is obvious. The motivation for the third axiom is explained above--we need to be able to show S(KP)(KQ) = K(PQ) so that the induction to show CL+A is closed under zeta will work. For the basis step of that induction (the S and K axioms) we need lam*z(KPQ) = lam*z(P) and lam*w(SPQR) = lam*w((PR)(QR)). This provides the motivation for the fourth and fifth axiom . Note that lam1z(KPQ) is S(S(KK)P)Q for any P and Q (independent of whether z is free) and similarly lam1w(SPQR) is S(S(S(KK)P)Q)R. This is the origin of the corresponding combinator structures in (A.4) and (A.5).

In order to see that this A_beta works, first note that (A.1) and (A.2) are enough to give us CL(lam(P)) = P is provable. (A.3) allows us to infer that lam*z and lam1z give provably equal results. Using this fact, we apply (A.5) to lam1w(P), lam1w(Q), and lam1w(R) and weakly reduce to S(S(S(KS)(lam1w(P)))(lam1w(Q)))(lam1w(R)) = S(S(lam1w(P))(lam1w(R)))(S(lam1w(P))(lam1w(R))). By the definition of lam1w, this is lam1w(SPQR) = lam1w((PR)(QR)). Using the fact that lam*w and lam1w give provably equal results, we have closure under zeta for S. To obtain the same for K, we must work a bit harder. First, we can prove lam*z(SPQz) = SPQ whenever z is not free in SPQ, using (A.2). Using this, (A.1), and induction on P, we can prove lam*z(lam*x(P)z) = lam*x(P) whenever z is not free in P. Applying (A.4) to lam1x(P) and lam1x(Q) and weakly reducing, we have S(S(KK)(lam1x(P)))(lam1x(Q)) = lam*z(lam1x(P)z). using the definition of lam1 and applying the fact lam*z(lam*x(P)z) = lam*x(P) (since z is not free in P by the variable convention), we have lam1x(KPQ) = lam*x(P), which is what we wanted.

It follows that Lambda and CL+A_beta are equivalent, in the sense that:

Lambda proves lam(CL(M)) = M (in fact, we have
lam(CL(M)) ->> M, since we can show lam(lam*x(P)) ->> lambda x lam(P)
for every P in *C*).

CL+A_beta proves CL(lam(P)) = P for every P in *C*
(this follows by induction on P using (A.1) and (A.2)).

Lambda proves M = N iff CL+A_beta proves CL(M) = CL(N).

CL+A_beta proves P = Q iff Lambda proves lam(P) = lam(Q).

We can prove the "=>" (only if) directions of the last two assertions by induction on the judgement of the equality. (The fact that lam(lam*x(P)) ->> lambda x lam(P) helps one to see that the translations of the equations in A_beta are provable in Lambda.) The "<=" directions then follow using the first two assertions. For example, if CL+A_beta proves CL(M) = CL(N), then Lambda proves M = lam(CL(M)) = lam(CL(N)) = N.

It is easy to see that CL+ext proves the formulas in A_beta. In fact, CL+ext is equivalent to Lambda+ext (hence Lambda+eta) in the same sense as above. One only needs to extend the induction on proofs to include the extensionality rules.

We can also axiomatize CL+ext by A_beta-eta by adding the axiom

(A.6) lam*x(S(Kx)I) = I.

The intuition is that this says 1 = I (which is essentially what eta says as well). One could read the equation as lam*x(lam1y(xy)) = lam*x(x). The equivalence between CL+ext and CL+A_beta-eta is easy to show by induction on proofs, replacing each appeal to ext with a short argument using (A.6), or each appeal to (A.6) by a short proof of (A.6) using extensionality.

If we really want to, we can compute explicit axioms for A_beta and A_beta-eta. Actually, Barendregt presents simplified sets of axioms. To check that the original set implies the simplified set, it suffices to compute the corresponding lambda terms, beta reduce, and check that the nf's are the same (they are) (of course, we also use eta for A_beta-eta). To check that the simplified set implies the original set is a bit tougher.

A consequence of the fact that we can finitely axiomatize CL+A_beta is that first-order logic with a single binary relation symbol (representing equality in CL) and a single binary function symbol (representing application in CL) is undecidable.

Notice that the translations between Lambda and CL do not preserve normal forms, the property of having a normal form (consider S(Komega)(Komega) where omega = SII--this is a weak nf, but its lambda translation has no nf). We also know beta reductions do not translate into weak reductions (consider lambda x.II -> lambda x.I, but S(KI)(KI) does not weakly reduce to KI).

A concept that *is* preserved is
solvability.

See also Schonfinkel's original paper introducing Combinatory Logic.

**Chapter 8: Classical Lambda Calculus (Continued).**
Up to beta-equality, the set of closed lambda terms is generated
by the closed lambda terms S and K (by application). In this situation,
we say {S,K} is a basis for the set of closed lambda terms. We actually
know more: for every closed lambda term M there is an N in {S,K}+
(the set of terms generated by {S,K}) with N ->> M.
Knowing this for the basis {S,K} implies it is true for any basis,
since we can form the N in {S,K} and then write N in terms
of the basis elements. CR implies the representations of S and K
reduce to S and K, respectively (since S and K are nf's).

Even better, we have a basis of one term X := < K,S,K > == lam x.xKSK (recall the definition of tuples). We can easily calculate XXX = K and X(XX) = S. This is more than a curiosity; it has the following important application:

There is a term E which enumerates the set of closed terms. In fact, for every closed term M, we have E 'M' ->> M (knowing = is enough for enumeration). We construct this E as follows:

First, for every M we can effectively construct M' in {X}+
with M' ->> M. Next, we define a function "natural" from {X}+
to the natural numbers inductively by natural(X) := 0 and
natural(AB) :=

Also, given any finite vector of variables x-bar, we can define Ex-bar by lambda a.(E(Ha)x-bar) where H'M' ->> '(lam x-bar M)'. So, we have Ex-bar'M' ->> E'(lam x-bar M)'x-bar ->> (lam x-bar M)x-bar ->> M.

A sequence of closed lambda terms {M_n} is uniform in n if there is a lambda term F with F'n' = M_n. Such an F lambda defines the sequence. A sequence {M_n} is uniform in n whenever it only contains a finite number of variables and the function taking n to #(M_n) is recursive. (We can take F to be Ex-bar o G where G_n ->> _(M_n).) One might naturally believe that we must have n|->#(M_n) recursive to have uniformity in n, but consider the counterexample {I^{f(n)}_n} where f is any nonrecursive function. Since we only requre F'n' = M_n up to beta-equality, F := KI lambda defines the sequence.

Given a sequence {M_n} uniform in n, lambda defined by F, we can define the generator of the sequence to be A := Theta(lambda ax.[Fx,a(S+x)]). We can then let [M_n] := A'0'. Notice that [M_n] has the same free variables as F. More importantly, notice that for every natural i, [M_n] = [M_0,...,M_i,A'i+1']. Why? Because [M_n] = A'0' ->> [F'0',A'1'] = [M_0,A'1'] ->> [M_0,[F'1',A'2']], etc. We can also use projections to recover each M_n (up to beta-equality). Also, notice that if we knew F'n' ->> M_n for each n, we can make the further observation that [M_n] ->> [M_0,...,M_i,A'i+1'] for each i, and that pi_i [M_n] ->> M_i.

What good is this? For one thing, it will give us interesting examples of Bohm trees, since each [M,N] is in head normal form. For an interesting immediate application, consider the following closed universal generator:

Start with the uniform sequence {E'n'_n} defined via E.
Notice that E'n' ->> E'n' (in the most trivial way possible!).
Let A be Theta(lambda ax.[Ex,a(S+x)]) as above and M be A'0'.
So, M ->> [E'0',...,E'i',A'i+1'] for every i. So what?
Notice that given any term (even open) N, this is the subterm of
a closed term N* (take lambda x-bar N) and E'N*' ->> N*. So,
[E'n'] ->> [E'0',...,E'N*',A'#(N*) + 1']. That is, every term N
is a subterm of some (beta) reduced form of the term [E'n']. We
call such a term a (beta) Universal Generator. Notice that this
implies the beta reduction graph of M contains the beta reduction
graph of *every other lambda term* as a subgraph!

We can also define the notion of a uniform sequence of lambda terms indexed by lambda terms, i.e., {M_A|A in Lambda}. Such a sequence is uniform in A if there is an F with FA = M_A for every A. But this notion is not so exciting. A sequence is uniform iff for some variable x and all A, M_A = M_x[x:=A]. Clearly such a sequence is defined by F = lambda x.M_x. Furthermore, given an F defining a sequence, we note that M_x = Fx and M_A = FA = Fx[x:=A] = M_x[x:=A].

Recall that a *closed* lambda term M is solvable
if there are N_i's such that MN_1...N_n = I. (Since I
is in nf, it follows that MN_1...N_n would beta reduce
to I.) For *open* terms we say M is solvable if
its closure lambda x-bar M is solvable. (Otherwise, M
is unsolvable.) It is easy to see that an open term is solvable
iff some substitution instance instance is solvable. The following
facts are also easy to show:

M is solvable iff lambda x.M is solvable.

If M is unsolvable, then so are MN, M[x:=N], and lambda x.M.

Examples: Any normal form is solvable. xIOmega is solvable (since KIOmega ->> I). Omega is unsolvable, since any reduct of OmegaN-bar must still have Omega as head. Notice, that Omega = omega omega where omega := (lambda x.xx) is solvable. Y is solvable: Y(KI) = KI(Y(KI)) = I. The numerals are uniformly solvable: 'n'KII = I.

First, by induction on the construction of lambda terms, we can show that every lambda term is either a variable, an application term of the form MN-bar (N-bar nonempty) where M is not an application term, or an abstraction term of the form lambda x-bar.N (x-bar nonempty) where N is not an abstraction term. From this we can show that every M is either in the head normal form (hnf) lambda x-bar.xM-bar (x-bar and M-bar may be empty) or the form lambda x-bar.(lambda x.M0)M1...Mm (x-bar may be empty, but m>=1) which has the head redex (lambda x.M0).

We say M has the head normal form (hnf) M' if M' is in hnf and M=M'. We can define an obvious notion of head reduction (which provides no choice as to which redex to reduce). Wadsworth proved (using Curry's standardization theorem) that M has a hnf iff its head reduction terminates.

Using the substitution lemma, we can show that of M head reduces to M' in one step, then M[x:=N] head reduces to M'[x:=N] in one step. (The substitution has no serious effect on the head redex.) It is trivial to see that lambda x M has a hnf iff M has a hnf. Using this and considering the form of head reductions we can prove:

If M[z:=N] has a hnf, then so does M.

If MN has a hnf, then so does M.

Compare these to the results on solvability and one will not be surprised to learn that a term is solvable iff it has a hnf. It is trivial that terms with hnf are solvable (since we can essentially plug in an "n-ary constant identity" term for the head). For the other direction, assuming M solvable, we have MN-bar = I has a hnf, so M has a hnf. It follows that M is unsolvable iff it is hereditarily without a nf (i.e., for every substitution instance M* and for every N-bar we have M*N-bar has no nf).

A term may have many hnf's, so we define the principal hnf to be the one obtained by a head reduction (if it terminates). We know by Wadsworth's result (which follows from Curry's Standardization Theorem) that the head reduction terminates iff the term has a hnf.

The notion of solvability can be defined for CL and is preserved under the translations CL and lam.

A partial numeric function phi is lambda definable iff there is a term F so that F'n1'...'nk' = 'phi(n1,...,nk)' for every (n1,...,nk) in the domain of phi, and F'n1'...'nk' is unsolvable for every (n1,...,nk) out of the domain. (See the discussion motivating this choice of definition.)

Kleene's Theorem: A partial numeric function is partial recursive iff it is lambda definable.

It is very important to notice that the numerals are uniformly solvable: 'n'KII = I. For n=0, we have IKII = KII = I. For successors, we have 'n+1'KII = [F,'n']KII = KF'n'II = FII = I (recall F = KI).

To show closure under composition, we make use of Lercher's "jamming factors" so that the composite will only be defined when appropriate (to avoid situations like this). So, if phi(n-bar) = chi(psi1(n-bar),...,psim(n-bar)) where chi is defined by G and each psii is defined by Hi, we define phi by lambda x-bar.(H1x-barKII)...(Hmx-barKII)(G(H1x-bar)...(Hmx-bar)).

To show closure under minimization, we must use the fact that quasi leftmost reductions are normalizing. This comes into play when we have mu(P) = H_P'0' where P'n' = F for all n. In this case, H_P'0' has an infinite quasi left most reduction, so it has no normal form. By a similar proof, every substitution instance applied to any vector of terms N-bar has no normal form because it has a quasi left most reduction. Thus, mu(P) is unsolvable (since it is hereditarily without a nf).

**Chapter 10: Bohm Trees.**
Informally, we may describe the Bohm tree BT(M) of a lambda term
M as follows: If M is unsolvable, then we have the tree
with one node: bottom. If M is solvable with
principal
hnf lam x-bar . yM1...Mn, then the Bohm tree is the tree
with root "lam x-bar . y", where the root has n children
BT(M1), . . ., BT(Mn).

Bohm trees may be represented in two different ways. One representation is as a partial function from Seq (finite sequences over N) to forms "lambda x-bar . y" or bottom--i.e., a labelled tree. The other representation is as a partially labelled tree with forms "lambda x-bar . y" as labels. The difference is how we treat unsolvable terms and what we consider the underlying (naked) tree. In the first case, unsolvable terms are given the label bottom. Since unsolvability is undecidable, the second case gives an effective procedure for generating a Bohm tree by declaring unsolvable terms not to be in the domain at all. But then we do want the underlying naked tree to include the node where bottom would be in the first representation--so, we include those nodes in the tree. So, given an effective representation A of a Bohm tree and a sequence alpha, we may have A(alpha) defined (hence in the underlying tree), A(alpha) not defined but alpha is still in the underlying tree (if alpha = beta * <k> with A(beta) = (a,n) with k < n), or A(alpha) not defined and alpha is not in the underlying tree.

Let Sigma1 be the set of labels of the form "lambda x-bar . y". Bohm trees are special cases of Bohm-like trees--arbitrary partially Sigma1-labelled trees. A Bohm-like tree A is r.e. iff the partial function A is partial recursive. A Bohm-like tree A is bottom-free if whenever alpha is in the underlying tree, then A(alpha) is defined. We can also define the depth d(A) for Bohm-like trees. Of course, d(A) may be infinite.

For any A and alpha in A (i.e., in the underlying tree), we define A_alpha to be the subtree of A at node alpha. Given a lambda term M, we can define M_alpha (assuming alpha is in the Bohm-tree of M) recursively using principal hnf's so that we can prove (by induction on alpha) BT(M_alpha) == BT(M)_alpha.

For any finite A, we can define a lambda term M(A) (by induction on the depth of A) so that BT(M(A)) == A. It's easy: M(bottom) := Omega. M(lam x-bar . y) := lam x-bar . y. M((lam x-bar . y,A1,...,An)) := lam x-bar . yM(A1)...M(An). Notice that M(A) has [is!] a nf iff A is bottom-free.

We can also chop off a Bohm-tree A at depth k, giving A^k (by simply making A^k(alpha) := A(alpha) if alpha has length less than k and undefined otherwise). Using the function M(-), we can define M^(k) := M(BT(M)^k) for lambda terms M. We can define A =k B iff A^k == B^k. Also, we can define A <k B iff A^k < B^k (here < is the subset relation on partial functions, except possibly the treatment of "bottom"). We can prove that BT(M^(k)) == BT(M)^(k) using BT(M(A)) == A.

To make our lives much simpler, when we have a fixed Bohm-like tree A and A(sigma) == < lam x-bar . y,m>, then we may write x-bar_sigma for x-bar, y_sigma for y, and m_sigma for m (hiding the dependence on A).

We can define the free variables of a Bohm-like tree A as FV(A) := Union FV_A(sigma) where the union is over sigma's in the underlying tree of A. We define FV_A(sigma) to be {y_sigma} - {x-bar_tau | tau <= sigma} (note this is a singleton or empty) if A(sigma) is defined or the empty set if A(sigma) is not defined. The point is that we are collecting the free variable from the head at each node, if that head is a free variable.

Of course we have FV(BT(M)) <= FV(M) (since beta reduction cannot introduce a free variable), but the inclusion may be proper if we "push" a free variable "to infinity" as in this example: Suppose we have M with principal hnf lam zx.x(Mz). Then z is not free in BT(Mz) (in fact, the tree is closed). If you want to be specific, take M := Theta(lam fzx.x(fz)).

A Bohm-like tree A is an actual Bohm tree (of a lambda-K term) iff FV(A) is finite and A is r.e. One direction of this is easy to see, since every BT(M) has finitely many free variables and is r.e. The other direction is quite involved.

Also, a Bohm-like tree A is the Bohm tree of a lambda-I term iff FV(A) is finite, A is r.e., and A has a partial recursive variable indicator. (A variable indicator for A is a partial function chi from Seq to sequences of variables so that chi(alpha) is defined iff alpha is in the underlying tree of A, {x-bar_alpha} union {chi(alpha)} = {y_alpha} union Union {chi(alpha * <i>) | i < m_alpha}, and {x-bar_alpha} intersect {chi(alpha)} is empty. It turns out, for lambda-I terms M, chi(alpha) := FV(M_alpha) is a variable indicator for BT(M).)

A Bohm-like tree A looks like a lambda-I-tree iff FV(A) is
finite, A is r.e. and for all alpha in the underlying tree of A
{x-bar_alpha} is a subset of Union {FV_A(beta) | beta < alpha}.
It would be reasonable to expect this characterizes
but it does *not*. In fact, there is a lambda-K term M
so that BT(M) looks like a lambda-I-tree, but has no partial
recursive variable indicator.

Since Bohm-like trees are partial functions, they form an coherent algebraic cpo with respect to the subset partial ordering. Also, we have arbitrary inf's by taking intersections. We can define a preorder (Bohm-less) and an equivalence relation (Bohm-equiv) on lambda terms by inheriting them from Bohm trees via BT(-).

The tree topology on Lambda is the smallest topology which makes the map BT(-) continuous. A set O of lambda terms is open iff

(1) (closed upwards) N is in O whenever some M is in O with M Bohm-less than N, and

(2) (approachability) if M is in O with BT(M) = Union_i BT(M_i), then for some i we have M_i in O.

The set of solvable terms is open, since it is the inverse of the set of Bohm trees larger than bottom.

A basis for the tree topology is given by sets O_M,k := {N | M^(k) Bohm-less than N}. The set of NF's are dense in the tree topology.

We can easily define a finite notion of eta-expansion and reduction on Bohm-like trees. Wadsworth showed we can also have infinite eta- expansions. For example, take J := Theta (lam jxy.x(jy)) so that Jx has principal hnf lam z.x(Jz). So, BT(Jx) has the form lam z0.x - lam z1.z0 - lam z2.z1 - . . . which may be considered an infinite eta expansion of x. Defining the notion of (possibly) infinite eta expansions in general is technically difficult, but should be intuitively clear. Without giving the technical details, the point is that we can say when a tree X (i.e., subset of Seq) "extends" A (so it has the right form for an eta expansion), and then define the eta expansion (A;X) by choosing some variables to stand in the right places.

If we know B is an eta-expansion of A, then we can recognize if B is a finite eta-expansion of A by looking at the underlying trees. In this case, we know B eta-reduces to A.

We have the upside down Church-Rosser Theorem for eta on Bohm-Trees. Also, we cannot have a proper extension of the tree be an eta expansion (because eta expansions add branches).

We can apply Bohm trees to "Bohm Out" subterms from lambda terms. The same ideas show that terms with distinct beta-eta normal forms are separable. Also, separability of a finite set of terms is equivalent to the set being "distinct" (where distinct is a certain recursive definition).

*Part III: REDUCTION*

**Chapter 11: Fundamental Theorems.**

The strip lemma says that if M -> M1 (one step) and M ->> M2 (many steps), then there is an M3 with M1 ->> M3 and M2 ->> M3. Once we show this, CR follows by induction on the multi-step reduction M ->> M1. The idea behind proving the strip lemma is as follows:

Index the redex contracted in M -> M1.

Lift the M ->> M2 reduction to be an indexed reduction M~ ->> M2~.

Perform a complete development on the indexed redexes in M2~ giving M2 ->> M3. (Actually, this is before complete developments are introduced. Here we obtain M3 as phi(M2~) where phi is defined inductively from indexed lambda terms to (reduced) lambda terms by parallel reduction on the indexed redexes.)

Show that, in general for indexed reductions M ->> N,
we have a corresponding reduction phi(M) ->> phi(N). Apply this
fact here to show M1 (which *is* phi(M1~) where M1~ is
M1 since no redexes are indexed) reduces to M3.

To make this work, we must first introduce a notion of indexed lambda terms Lambda' where some redexes are indexed by natural numbers. Substitution is defined in the obvious way on the new system. A notion of reduction Beta' is the union of Beta0 (reduction of indexed redexes) and Beta1 (reduction of redexes without indices). Then we define phi:Lambda'->Lambda by induction and |-|:Lambda'->Lambda as the function which forgets indices. Then we prove the lemmas:

Given M', beta reductions |M'| ->> N lift to Beta' reductions M'->>N' with |N'|=N (easy for one-step, then handle transitivity by induction).

Given a Beta' reduction M'->>N', there is a beta reduction |M'| ->> |N'|.

Given any Beta' reduction M'->>N', there is a beta reduction phi(M') ->> phi(N'). This is the key fact in the proof of the strip lemma, and is proven as follows: First prove that phi(M'[x:=N']) = phi(M')[x:=phi(N')] by induction on M'. Then prove the result by induction on M'->>N' (Beta' reduction).

This proof does not generalize to beta-eta reduction. Consider the fact that ((lam x (M x))N) may be beta or eta reduced without leaving a residual of the other redex.

The finite development theorem (FD) states that the indexed notion of reduction Beta0 is strongly normalizing. In other words, if we index certain redexes, reduce them (and only them) until they have no residuals, then this process will terminate.

We can extend the function |-| to apply to reductions, sending Beta' reductions to Beta reductions by forgetting the indices. We can strengthen the lifting lemma to say that sig:|M'|->>N always lifts to sig':M'->>N' with |N'|==N and |sig'|==sig. Also, given any set F of redex occurrences in M, we can form (M,F) in Lambda' by giving the index 0 to the redex occurrences in F. Now, the set of residuals of F relative to a reduction sig:M->>N (written F/sig) is the set F' of redex occurrences of N such that sig':(M,F)->>(N,F') is the lifting of sig:M->>N.

A development of (M,F) is a *Beta* reduction in which
each redex is a residual of a redex in F
(equivalently, a reduction sig of M
which lifts to a Beta0 reduction of (M,F)).
A complete development is a development so
that F/sig is empty (no residuals remain). A development of M
is a development of (M,F_M) where F_M is the set of all redex
occurrences in M.

In order to prove FD, we assign a "special norm" for indexed terms M' which reduces at each Beta0 reduction. We do this by assigning weights to variables in a term M'. A decreasing weighting is one for which every indexed redex (lami x P)Q and every occurrence of x in P satisfies ||x|| > ||Q||. Every term has a decreasing weighting by assigning 2^n to the n'th occurrence of a variable counting right to left. We can define substitution and Beta* reduction on weighted terms in the obvious ways. It is not so clear that Beta* reducing a term with a decreasing weighting gives a term with a decreasing weighting, but we can check this (for one step) by considering the relative positions of redexes. Clearly, given such a weighting, contracting indexed redexes reduces the special norm of the term (i.e., the sum of the weights). From this it follows that any Beta0 reduction is terminating, i.e., SN(Beta0). So, we have the finiteness of developments.

It is easy to check that Beta0 is WCR, by considering the relative positions of redexes. In fact, we see by doing this that the reductions are just the reductions of the residuals of the other redex. (We can also easily check that Beta' is WCR.)

Now, WCR(Beta0) and SN(Beta0) implies CR(Beta0). And all the usual implications follow. The refined result is (FD!):

All developments of M are finite (we knew this).

All developments of (M,F) can be extended to a complete development of (M,F) (we knew this too).

All complete developments of (M,F) end with the same term (this is new! It follows from CR).

Given this, it makes sense to define Cpl(M,F) as the unique term we obtain by performing the complete development of (M,F). We may write M ->>1 N when Cpl(M,F) == N for some F.

Now, the transitive closure of ->>1 is ->> (beta), so to prove CR for beta (again!) it suffices to show ->>1 satisfies the diamond property. This we can do as follows: M ->>1 M1 and M ->>1 M2 implies there are F1 and F2 with Cpl(M,F1) == M1 and Cpl(M,F2) == M2. Now, let M3 := Cpl(M,F1 union F2) We may consider M1 and M2 to be partial developments of (M,F1 union F2), so both may be completed to give M3. QED.

It turns out that ->>1 is just parallel reduction (easy to prove by induction on ->>1 that M ->>1 N implies there is an F, and that given an F with Cpl(M,F) == N, there is a derivation of M ->>1 N). It also turns out that phi(M,F) == Cpl(M,F) (compare the definition of phi and parallel reduction). So, all three proofs of the CR theorem give the same M3.

The convervation theorem for lambda-I states that whenever M -> N and SN(N), then SN(M). For lambda-K, of course, we have obvious counterexamples such as KIOmega -> I. It follows that if M has a nf, then M is strongly normalizing (since nf's are obviously strongly normalizing).

First, given two redex occurrences Delta1 and Delta2 in an I-term M, we know Delta2/Delta1 is nonempty. Actually, to conclude this, we only need to know that Delta1 is an I-redex, not that M is an I-term. (Since we need the results in this greater generality later, I will describe them this way now.) In general, for F a set of I-redexes and Delta not in F, and sig a development of (M,F), we have Delta/sig nonempty.

We already know (by FD!) that if we reduce an
indexed redex (Beta0 reduction) and then complete the development,
then we get the same term as if we had simply completed
the development. For a Beta1 I-reduction Delta:M'->N', we can show
that Cpl(M') ->> Cpl(N') by a *nonempty* beta reduction.
This is true because we may consider Cpl(M') as
a (partial) development of (M,F union {Delta})
where M' == (M,F)
and Delta is not in F. Now, by above Delta has a residual in
Cpl(M'), and by (FD!) we may complete the development
to obtain
Cpl(N'), which is of course the
complete development of (M,F union {Delta}). (The reduction
is nonempty since Delta has a residual in Cpl(M')--all we
needed to know this was that F contains only I-redexes.)

The conservation theorem for lambda-I follows
since we can start with a reduction Delta:M->N and
an infinite reduction
M == M0 -> M1 -> M2 -> . . . (all of which must be I-reductions
since we are assuming M == M0 is an I-term).
Then we can lift these to (Mi,Fi) with F0 = {Delta}.
Each Fi is nonempty because the reductions are I-reductions.
Taking the complete developments, we obtain the beta reductions
N == N0 ->> N1 ->> N2 ->> . . .
Here Ni == Cpl(Mi,Fi). (Note that N == Cpl(M,{Delta}).)
In case Mi -> M{i+1} is a Beta1 reduction, we know
Ni ->> N{i+1} is a *nonempty* beta reduction.
In case Mi -> M{i+1} is a Beta0 reduction, we know
Ni == N{i+1}. But we may only
have Ni == N{i+1} a finite number of
times consecutively, because Beta0 is SN.

This result may be generalized to the lambda-K calculus, but the proof cannot. That is, given an I-reduction Delta:M->N, every infinite reduction of M cannot be projected to one of N, but at least one can (the one obtained using the Effective Perpetual Strategy).

A **standard reduction**
is a reduction sig = < M0,Delta_0,M1,Delta_1,M2,...>
where for each i and each j < i, Delta_i is not a residual
of a redex to the left of Delta_j. We can denote this
by M ->>s N.

Curry's **Standardization Theorem** states that
if M ->> N, then M ->>s N (for every reduction there is
a standard reduction). The idea of the proof is this:

*We can distinguish between head reductions and internal
reductions (reductions which are not head reductions).*
Let us write ->h, ->>h, ->i, and ->>i for the obvious
concepts. Let us furthermore use ->>1,i to denote
a reduction which is both a complete development and
internal. (Every ->i reduction is a ->>1,i reduction
by taking only one redex as indexed.)

*We can factor any complete development into ->>h ; ->>1,i.*
(Simply contract the head redexes as long as they are indexed,
then complete the development.)

*We can commute ->i ; ->>h to give ->>h ; ->>i.*
To show this, we first show we can commute ->>1,i ; ->h
(one step!) to give ->>h ; ->>1,i. Clearly, if we have
this we can show the original result (since ->i is
a special case of ->>1,i) by induction on the length of ->>h.
To prove this, start with M ->>1,i M' ->h N' and let F be
the set of redex occurrences so that M' == Cpl(M,F).
Let Delta be the head redex of M. Since every reduction in
M ->>1,i M' is internal, Delta has exactly one residual in M',
and it is the head redex of M'. Hence, M' ->h N' is the
reduction of this residual. So, we have M ->>1 N' with
F union {Delta}. Now, we can factor this into M ->>h N ->>1,i N'
by the result above.

*Given M ->> N, there is an M ->>h M' ->>i N.*
We can show this by induction on the length of ->>, using
the previous result to turn ->i ->>h ->>i into ->>h ->>i.

*Proof of Standardization!* Start with M ->> N.
Prove the result by *induction on the construction of N*.
First convert M ->> N
into M ->>h Z ->>i N.
Clearly, M ->>h Z is standard since
every head redex is leftmost.
If N is a variable, then Z == N == x and we are done.
Otherwise, N is of the form lambda x-bar N-bar and
Z must be of the form lambda x-bar Z-bar weare each Zi ->> Ni
via sig_i. Let sig:M ->>h Z. Now, sig + sig_0 + . . . + sig_n
is standard!

It follows that a term has a hnf iff its head reduction terminates (recall we needed this to define principal hnf's). (Suppose we have M = (lambda x-bar . y M-bar). Then by CR M ->> (lambda x-bar . y N-bar). We can standardize this reduction to M ->>s (lambda x-bar . y N-bar). The beginning of this is a head reduction and the end is an internal reduction. As an internal reduction leaves terms in hnf in hnf and terms not in hnf not in hnf, we know that the head reduction must have terminated in a hnf (lambda x-bar . y Z-bar).

Let A := (lambda x.(lambda y.x)I) and watch the redex (lambda y.x)I. Look at this reduction: (lambda a.1aa)A -> 1AA -> (lambda x.Ax)A -> (lambda x.(lambda y.x)I)A -> (lambda y.A)I. Now, the original redex has two residuals, one inside the other. Note that this cannot happen with weak reduction in CL (as can be proven by an easy induction).

**Chapter 13: Reduction Strategies.**

The **leftmost reduction
strategy** is a one step strategy defined by
Fl(M) := M if M is in nf and Fl(M) := M' where M' is
obtained by contracting the leftmost redex in M.
A leftmost reduction is an initial segment of an Fl-path.

**Normalization Theorem.**
Fl is an effective normalizing 1-strategy (i.e., one-step strategy).
In fact, it turns out that given M ->> N with N a nf, we
can take "a" standard reduction M ->>s N and this must
be the leftmost reduction. If at some point we did not reduce
the leftmost, then the redex would never get contracted and N
would not be in normal form.

We can use this to show a term does not have a normal form. For example, let omega := (lambda bxz.z(bbx)) and A := omega omega x. The leftmost reduction path does not terminate (A ->> ->> <> ->> . . .) so A has no normal form by the normalization theorem.

Now, we can show that if a term has an infinite quasi leftmost reduction, then it has no normal form. (Recall we needed this to show closure under minimization of lambda definable partial functions.) We can show this by showing how to change an infinite quasi leftmost reduction into an infinite leftmost reduction.

First, distinguish between two reductions ->l and ->=/=l where, of course, ->l is one step leftmost reduction and ->=/=l is a one step reduction which is not leftmost. We also have the reflexive transitive closures ->>l and ->>=/=l.

We will be done if we show we can convert a reduction ->>=/=l ; ->>l with the leftmost reduction sequence nonempty into one of the form ->>l ; ->>=/=l with the leftmost reduction sequence, again, nonempty. Why? Because we can start with an infinite quasi leftmost reduction and keep applying this result to extend the length of the initial leftmost part. Then we can union all these finite leftmost reductions to obtain an infinite leftmost reduction.

So, suppose we have M ->>=/=l M' ->>l N' with M' ->>l N' nonempty. Since the reduction is nonempty, we may write it as M' ->l N0' ->>l N'. Now, if we can change M ->>=/=l M' ->l N0' into M ->l Z ->> N0', then we will be done. Why? Because we can standardize the reduction Z ->> N0' ->>l N' and factor this standard reduction into a leftmost part and a non-leftmost part as in Z ->>l N ->>=/=l N' (once we reduce a redex which is not leftmost, contracting a leftmost redex would violate the definition of a standard reduction). Then we have M ->l Z ->>l N ->>=/=l N' as desired.

So, suppose we have M ->>=/=l M' ->l N' (writing N' instead of N0'). Also, take M ->l N and let M == C[(lam x . P) Q] where (lam x . P)Q is the leftmost redex. So, N is C[P[x:=Q]]. Since the reduction M ->>=/=l M' cannot affect the position of this redex (nor can it reduce it, M' must be of the form C'[(lam x . P')Q'] where C'[-] ->> C[-], P ->> P', and Q ->> Q'. Well, that means (lam x . P')Q' is the leftmost redex of M', so that N' must be C'[P'[x:=Q]]. Since P[x:=Q] ->> P'[x:=Q'] using the substitutivity of Beta, we have N ->> N'.

The leftmost strategy is not cofinal. Consider this: Omega(KII) -> Omega I, but the leftmost reductions are all Omega(KII) -> Omega(KII). So, Fl^n(Omega(KII)) = Omega(KII) for all n.

The Gross-Knuth reduction strategy is given by Fgk(M) = Cpl(M,F_M). That is, we perform a complete development on all redex occurrences in M. This is obviously not a one step strategy. It is, however, effective and cofinal.

To prove the strategy is cofinal, perform induction on the transitive closure of the complete development reduction ->>1 (this is the same as ->> of course). A zero step reduction is trivial, we clearly have a zero step reduction to Fgk^0(M) == M. The induction hypothesis will be that whenever we have an n step ->>1 reduction of M to Mn, then we have an n step ->>1 reduction from Mn to Fgk^n(M) (notice how we are controlling the lengths as in Barendregt's diagram shown in Figure 13.5). Now, suppose we have an n+1 reduction M ->>1 M1 ->>1 . . . ->>1 Mn ->>1 L. By induction we have a ->>1 reduction from Mn to Fgk^n(M) in n steps. We need an n+1 step ->>1 reduction from L to Fgk(Fgk^n(M)). To obtain this we really need a "strip lemma".

So, we want to show that whenever we have P ->>1 L, P == P0 ->>1 P1 ->>1 . . . ->>1 Pn, and N = Cpl(Pn), then there is an n+1 step ->>1 reduction from L to N. For the n = 0 case, use the fact that we can complete the partial development P0 ->>1 L to P0 ->>1 L ->>1 N, giving the one step ->>1 reduction. In the n+1 case, use the fact that ->>1 satisfies the diamond property to obtain P1 ->>1 L1 and L ->> L1. Now, by induction we have an n+1 step ->>1 reduction from L1 to N and we are done.

In fact, any quasi Gross-Knuth reduction path of M is cofinal in G(M). We can prove this by induction using the fact that Fgk is cofinal, the fact that we can complete partial developments in one ->>1 step, and the fact that ->>1 satisfies the diamond property.

Application: Suppose we have a quasi Gross-Knuth reduction
of M and x is free in every Mi.
Then, x must be beta-free in M (that is, for every N = M, x is
free in N). (To prove this, use CR and cofinality.)
It follows that if x is not free in N, then N is not
beta-equal to M. Consider A := Theta(lam ax.

The Gross-Knuth reduction is not CR. Let omega3 := lam x.xxx. Let M := (omega3omega3)(omega3omega3) and N := (omega3omega3)(omega3omega3omega3). We have M -> N in one beta step, so that M = N. We also have M ->>gk (omega3omega3omega3)(omega3omega3omega3) ->>gk . . . and N ->>gk (omega3omega3omega3)(omega3omega3omega3omega3) ->gk . . . It is easy to see that we never have Fgk^m(M) == Fgk^n(N).

There is a recursive (as opposed to effective) CR strategy. This can be constructed as the limit of finite partial CR reduction strategies.

A lambda term is minimal if everything beta equal to it reduces to it. Clearly normal forms are minimal. Also, Omega is minimal. In fact, every minimal term is either a normal form or has no normal form.

A partial reduction strategy is a partial function H
on lambda terms so that if M is in Dom(H), then M ->> H(M).
A partial reduction strategy H is CR if whenever M and N
are in Dom(H) and M = N, then there are p and q so that
each H^i(M) and H^j(N) are in Dom(H) for i<=p and j<=q
and so that H^p(M) == H^q(N). An H-cycle is a finite
sequence

What should a CR-strategy satisfy? If F is a CR-strategy, then every F-cycle must be minimal, and we cannot have two distinct linked F-cycles.

As indicated above, we will construct F := F_CR by finite partial approximations. First, notice that for each n we can compute a finite set E_n of lambda terms so that if M and N are in E_n, then M = N, and if M = N, then there is some n so that M and N are in E_n. (Take an effective listing {M0, M1, M2, . . .} of the lambda terms and a bijection <-,-,-> between N^3 and N. Define a norm on lambda terms by ||M||1 := ||M|| + the sum of i where vi is free in M. Let E_<n0,n1,n2> be the set of all terms N with ||N||1 <= n0 and so that Mn2 and N reduce to a common term in <= n1 steps.)

Notice that given any finite set E of lambda terms we can recursively decide if an M in E reduces to a term N not in E (keep reducing until every path is a cycle in E or we reach an N not in E). Now, assuming we know M does not reduce to an N not in E, then the reduction graph of M is finite (since it is a subset of E), so we can recursively decide if there is an H-cycle in E linked to a term M in E (where H is a finite partial strategy). (To do this, compute all the H-cycles and G(M) and check for the intersection. Obviously this may not be a very "effective" procedure.)

Now, to construct F_CR, construct a sequence of finite partial CR strategies F0 <= F1 <= F2 <= . . . We will define the Fk by induction, mutually with the definition of finite sets Gk of lambda terms. Let G0 := empty set and F0 := empty set. We will at each step let G(k+1) be the closure of Ek under Fk. We preserve four properties at each step:

1. Dom(Fk) superset of Gk.

2. Fk restricted to Gk is a partial CR strategy. (Notice that since the terms in Gk are beta-equal, this says that for any two terms M and N in Gk, there are m and n so that Fk^m(M) == Fk^n(N).)

3. Every Fk-cycle is minimal.

4. There are no distinct linked Fk-cycles.

This is trivially satisfied for the base case. We define F(k+1) to have domain G(k+1) union Dom(Fk). Note that G(k+1) is a set of beta-equal terms. Let A(k+1) := G(k+1) - Dom(Fk) (the new terms in the domain). In order to have Fk <= F(k+1), we must let F(k+1)(M) := Fk(M) for M in Dom(Fk). It remains to define F(k+1)(M) for M in A(k+1) so as to preserve the properties above. [Note that A(k+1) might be empty.]

If there is an Fk-cycle in G(k+1), say,

If there is no Fk-cycle in G(k+1), then we first try to reduce our way out of the domain. Start by computing a common reduct A' of A(k+1). This is possible because we know (by CR) there is a common reduct and A(k+1) is finite. However, there appears to be no bound on how long this can take. In any case, we can recursively find an A', though it may not be in G(k+1). We can decide if A' has a reduct A not in Dom(F(k+1)), and if it does, then we let F(k+1)(M) := A for every M in A(k+1). This is still a strategy since every such M ->> A. Also, we have added no new cycles, so the third and fourth properties are trivially preserved. Again, we must check CR: Consider M in G(k+1) and Dom(Fk). Since there is no Fk-cycle in G(k+1), we must have some iterate Fk^m(M) in A(k+1) (i.e., not in Dom(Fk)). So, F(k+1)^{m+1}(M) == A, and we are done.

Next, suppose we have the A' from above and G(A') is contained in Dom(F(k+1)). We can decide if there is an Fk-cycle <L0,...,Lp> linked to A' (i.e., in the graph G(A')). If there is, then let F(k+1)(M) := L0 for every M in A(k+1). Again, by minimality, we have M ->> L0. Again, we've added no new cycles. We have CR as before because every M in G(k+1) and Dom(F(k)) has an iterate Fk^m(M) in A(k+1).

Finally, if G(A') is contained in Dom(F(k+1)) and A' is not linked to any cycle, then choose a minimal B in G(A') (which exists because G(A') is finite). Let F(k+1)(M) := B for every M in A(k+1). By minimality of B we have M ->> B. This time we may have added a cycle, so we should check the third and fourth properties. If we did introducke a new cycle, then it must be of the form <B,B1,...,Bq>. This is minimal since B is minimal. Also, it must not be linked to any other F(k+1)-cycles since such a cycle would be an Fk-cycle and we would have A' linked to an Fk-cycle--which was the previous case. Again, CR follows because every M in G(k+1) and Dom(F(k)) has an iterate Fk^m(M) in A(k+1).

Taking the union of the Fn, we have the recursive CR strategy F_CR, as is easily checked from the fact that eac Fn restricted to Gn is CR.

A perpetual strategy gives an infinite reduction if there is one. We can define an infinite perpetual strategy Finf by taking:

Finf(M) := M if M is in nf;

Otherwise, writing M == C[(lam x.P)Q] where the redex indicated is leftmost:

Finf(C[(lam x.P)Q]) := C[P[x:=Q]] if this is an I-redex;

Finf(C[(lam x.P)Q]) := C[P] if Q is in nf and this is a K-redex;

Finf(C[(lam x.P)Q]) := C[(lam x.P)Finf(Q)] if Q is not in nf and this is a K-redex

To describe this more precisely, we define the re of a redex (lam x.P)Q to be (lam x.P) and the dex to be Q. The "derived term" of M (not in nf) is the dex of the leftmost redex in M (written M+). The derived sequence of a term M is the finite sequence M0,...,Mn with M0 := M and M(k+1) := (Mk)+ if Mk is not in nf. The leftmost redex Ri of Mi is called the "special redex of order i" of M.

So, Finf contracts the first I-redex in the sequence of special redexes. If there is no I-redex, it contracts the last.

Finf is an effective perpetual one-step strategy. To prove this, we need to show that whenever M has an infinite reduction, so does Finf(M). First, we need a lemma which will be applied twice in the proof.

Suppose we have M == C[(lam x.P)Q] where the leftmost redex is indicated and is a K-redex. If M has an infinite reduction, but Q does not, then C[P] has an infinite reduction path. We can prove this by converting an infinite reduction C[(lam x.P)Q] -> C1[(lam x.P1)Q1] -> . . . -> Ck[(lam x.Pk)Qk] -> . . . into C[P] ->> C1[P1] ->> . . . ->> Ck[Pk] ->> . . . if the leftmost redex is never contracted (this is infinite because the reduction Q ->> Q1 ->> Q2 ->> . . . is finite). If it is contracted, then we convert C[(lam x.P)Q] -> C1[(lam x.P1)Q1] -> . . . -> Ck[(lam x.Pk)Qk] -> Ck[Pk] (K-redex) -> . . . into C[P] ->> C1[P1] ->> . . . ->> Ck[Pk] ->> . . .

We can prove that Finf is perpetual by induction on M, but we only need the induction hypothesis in one case (when the leftmost redex is a K-redex and Q is not in nf).

Suppose we have an infinite reduction M == M0 -> M1 -> . . .

If M is in nf, then it cannot have an infinite reduction and we are done. So, we must have M == C[(lam x.P)Q] where we indicate the leftmost redex.

Suppose the leftmost redex is an I-redex. We either have the leftmost redex is never contracted in the infinite reduction, or it is. If it is not, then every Mi is of the form Ci[(lam x.Pi)Qi] and we must have an infinite reduction of C[z] or P or Q. In any case, we have an infinite reduction of C[P[x:=Q]] (since x is free in P). If at some point we did contract the left most redex, then we have C[(lam x.P)Q] ->> Ck[(lam x.Pk)Qk] -> Ck[Pk[x:=Qk]] ->> . . . Since we have C[P[x:=Q]] ->> Ck[Pk[x:=Qk]] we also have an infinite reduction in this case (not even using the I-redex hypothesis).

Suppose the leftmost redex is a K-redex. If Q is in normal form, then by the lemma, C[P] has an infinite reduction.

Finally, suppose Q is not in nf. If it has no infinite reduction, the C[P] has an infinite reduction by the lemma (again) and so we have an infinite reduction Finf(C[(lam x.P)Q]) == C[(lam x.P)Finf(Q)] -> C[P] -> . . .

If Q does have an infinite reduction, then Finf(Q) has an infinite reduction by induction. So, Finf(C[(lam x.P)Q]) == C[(lam x.P)Finf(Q)] has an infinite reduction.

We can use Finf to prove the general conservation theorem for lambda-K. Recall that for the lambda-I calculus, we could prove the conservation theorem by projecting any infinite reduction down to the reduced term. This fails in the lambda-K calculus (even when the redex is an I-redex). Here is an example:

Consider (lam x.KIx)Omega -> (lam x.I)Omega -> (lam x.I)Omega -> . . . and the I-redex contraction (lam x.KIx)Omega -> KIOmega. If we try to project the infinite reduction sequence we get the finite reduction sequence KIOmega -> I ->> I ->> I ->> . . . This is an example of a nonprojectible sequence.

To prove the general conservation theorem, we consider an I-indexed
lambda calculus in which only I-redexes may be indexed.
Note that special redexes are not in the re part of any redex.
It follows that contracting a special redex cannot cause an
I-redex to become a K-redex (i.e., have a K-redex as a residual).
So, any contraction of a special redex can be lifted to a
Beta' contraction of the I-indexed calculus.
As
before, we know that completions performed
after Beta0 reductions in the I-indexed
calculus give the same result as completions performed
before the Beta0 reductions. Similarly, a Beta1 reduction
followed by a completion factors into a completion and a
*nonempty* beta reduction.

The general conservation theorem follows. Start with M -> M' obtained by contracting an I-redex, where M has an infinite reduction. We can take the infinite reduction given by Finf, so that every redex contracted is special. Then we can use the lemmas above to lift this to an infinite I-reduction and use complete developments (and the fact that SN(Beta0)) to project it to an infinite reduction of M'.

A redex R with contractum R' is perpetual if whenever C[R] has an infinite reduction path, then so does C[R']. The general conservation theorem states that all I-redexes are perpetual. Bergstra and Klop proved that K-redexes (lam x.M)N are perpetual iff for every substitution instances M* and N* where N* has an infinite reduction, M* must have an infinite reduction as well.

Given a reduction strategy F, we can measure the length of the F path of M as the least n with F^n(M) in nf. We can also measure the breadth by the largest ||F^n(M)|| in the path.

There are obvious L-optimal, L-1-optimal, B-optimal, and B-1-optimal strategies. The L-optimal and B-optimal strategies essentially jump to the nf (and doing something reasonable in case there is no nf). The L-1-optimal strategy uses a shortest reduction path to the nf, and the B-1-optimal strategy uses a reduction path to the nf using the smallest terms. Of course, these are not recursive.

There is no recursive L-optimal strategy because this would allow us to decide which terms have normal forms. There is no recursive B-optimal strategy because we could use such a strategy to decide the set of terms M which have a nf of smaller size.

There is also no recursive L-1-optimal strategy, since we could use one to separate the set of terms beta-equal to lam xy.I from the set of terms beta-equal to the variable a.

There is no one step strategy which is both L-1-optimal and B-1-optimal. We can see this by considering the term M := (lam xy.pxx(yI))((lam x.pxx)A)I where A is sufficiently large. The shortest path must contract the internal redex first in order to avoid duplicating the internal redex. The path with the terms of smallest size must contract the leftmost redex first (as can be seen by writing out the two possible reductions).

**Appendix A: Typed Lambda Calculus.**
The typed lambda calculus consists of types inductively
generated from a base type 0 by "arrow" (where the intended
interpretation is a function space), infinitely many variables
of each type, and terms constructed from variables,
application (where the types must match), and abstraction.
The terms are partitioned into sets Lambda_tau for each type tau.
The formulas are equations between terms of the same type.
The axioms and rules of inference are the restricted versions
of those for
the untyped lambda calculus (beta + congruence
+ equivalence). We may also add eta as an axiom scheme.

Typed Combinatory Logic has primitive constants

K:sig -> (tau -> sig)

S:(sig -> (tau -> rho)) -> ((sig -> tau) -> (sig -> rho))

for each type sig, tau, and rho. (To remember the types, think Curry-Howard.) We also have the usual axiom schemes for K and S. The typed lambda calculus with extensionality and typed CL are equivalent. There is also a set of axioms A_beta so that CL+A_beta is equivalent to the typed lambda calculus.

We can define beta and eta reduction for the typed lambda calculus and show they are CR. Furthermore, in the typed lambda calculus we have strong normalization (as a special case of SN for reduction in Godel's T) It follows that every typed term has a nf. So, we cannot have fixed point operators and equality is decidable (reduce to normal form and check for equality up to alpha conversion--this process is not Kalmar Elementary [Statman]).

Some type free terms can be typed. That is, there is a typed term M so that if we erase the types we get the untyped term |M| with which we, presumably, started. The set of typable lambda terms is decidable, and we can effectively find a unique principal type scheme so that every type is an instance of this scheme.

The Church numerals can be typed. The set of numeric functions definable in the typed lambda calculus over the Church numerals consist precisely of the extended polynomials. That is, the least class of numeric functions containing projections, containing the function sg where sg(0) = 0 and sg(n+1) = 1, and closed under sums and products of functions.

Let M be a set of domains A_sig for each type sig and maps ._sig,tau:A_{sig->tau} x A_sig -> A_tau (application) for each types sig and tau. M is a combinatory type structure if A_0 has more than one element and there are elements k and s of each appropriate type satisfying kxy = x and sxyz = (xz)(yz) for all x, y, and z in the appropriate domains. An extensional type structure satisfies xz = x'z for all z implies x = x' for all x and x'

Here are some examples of models (or ways to construct them):

*Standard Models--Full Type Structures.*
Given any set X we can define the full type structure M(X) over
X to be A_0 := X and A_{sig -> tau} := A_tau^A_sig (the
full function space). This is clearly extensional. We can
perform this same process in any CCC (we assume Hom(X,X) has
more than one element). In a general CCC, we may not get an
extensional model. However, Zucker proved that from every
combinatory type structure M one can obtain an extensional
model M^E by forming the "extensional collapse" of M (mod
out function domains by extensional equality).

*Hereditarily Recursive Operations.*
Write n.m for {n}(m). Let A_0 be the set of natural
numbers N. Let A_{sig->tau} := {n in N | forall m in A_sig
n.m is defined and in A_tau}. This gives the combinatory type
structure HRO, but it is clearly not extensional. (Note
that we can find k and s independent of types by the s-m-n
theorem.)
We can take its extensional collapse HEO (Hereditarily
Effective Operations), however. Notice once we do this
we are essentially talking about a PER model.

*Term Models.* We have open term models, but
not closed term models, since there are no closed
terms of type 0.

With respect to models, we have the following completeness results:

M1 = M2 (for open terms) in typed lambda calculus with extensionality iff M(N) |= M1 = M2 where N is the set of natural numbers. (It follows that M(N) |= M1 = M2 as a relation on typed lambda terms is decidable. Friedman showed the relation M(N) |= M1 =/= M2 is not arithmetical. Note these relations are not complements because we are allowing open terms. So, essentially what we are judging is forall x-bar M(N) |= M1 = M2 in the first case and the second is forall x-bar M(N) |= M1 =/= M2.)

For every M1 and M2 there is an n such that M1 = M2 in typed lambda calculus with extensionality iff M1 = M2 in M({0,...,n}) |= M1 = M2.

We can consider term models F0(X) where we restrict the free variables to a finite set X (with at least two of type 0) and their extensional collapse F(X).

Statman proved that if X contains at least one variable of the types 0, 0->0, 0->0->0, . . ., then F(X)|= M = N iff M and N are equal in the extensional typed lambda calculus. Also, if X contains only type 0 variables (and at least two of them), then M = N is consistent with the extensional typed lambda calculus iff F(X)|= M=N (no more identifications can be made).

A pairing for types sig and tau is a type rho and a triple D:sig->tau->rho, D1:rho->sig, D2:rho->tau satisfying the obvious two equations. It is surjective if it satisfies the surjective pairing equation. Barendregt proved that there is no pairing for general types sig and tau in the typed lambda calculus. For the extensional typed lambda calculus, on the other hand, there is a pairing, but there is no surjective pairing for any types sig and tau (since there is cannot be a surjective pairing in the untyped case).

Godel's T is the typed lambda calculus with constants 0:0, S+:0->0, and R_sig:sig->(sig->0->sig)->0->sig at each type sig. (The fact that we have a recursion operator at each type sig allows us to define functions which are not primitive recursive, such as the Ackermann function.) We also have the equations RMN0 = M and RMN(S+x) = N(RMNx)x. Teta is T with eta.

There is a strongly normalizing, CR notion of reduction corresponding to provability in T and Teta. (Note that this result implies SN of the typed lambda calculus as a special case.) The proof of SN uses the logical relations technique: Define a set of ("computable") closed terms for each type by induction on types. Extend this to a set of open terms by checking that every closed instance is in the original set. Show by induction on types that every "computable" term is SN and that every constant zero function is "computable". Then show by induction on terms that every term is "computable". [Barendregt does not use the term "computable", but Lambek and Scott do.]

An example of a function definable in T but not p.r. (a simplification of the Ackermann function) is given by psi(0,n):=n+1, psi(m+1,0):=psi(m,1), and psi(m+1,n+1):=psi(m,psi(m+1,n)). This is represented in T by defining the functionals psi(m,-) by recursion in m, where each psi(m+1,n) is defined by recursion in n. Suppose Fm represents psi(m,-). Then psi(m+1,-) is represented by R[Fm1][lambda ab.Fma] (induction on n). Let G be [lambda f.R[f1][lambda ab.fa]]. Now, psi is represented by RS+[lambda fb.Gf] (induction on m).

For any ordinal alpha <= epsilon0, an alpha-recursive function is one definable by transfinite recursion up to alpha. The T-definable theorems are exactly the < epsilon0 recursive functions.

Models for Godel T are combinatory type structures with 0, s+, and r satsifying the axioms. omega-models contain only the representation of the naturals at type 0. The (open) term model is an extensional model of T. HRO, the closed term model (we have closed terms of every type here), and the interior of the full type structure over N are nonextensional omega-models. HEO is an extensional omega-model.

Friedman proved the relation M(N)|= M=N is complete Pi^1_1.

Godel's T can be extended to include bar recursion. While Godel used his system to prove consistency of arithmetic, Spector used the extended system to prove consistency of analysis.

Godel's T can also be extended by including a constructor for infinite sequences of terms which are destructed by the numerals.

Extending Godel's T with logic and the Peano axioms gives intuitionistic arithmetic in all finite types.

Scott created LCF ("Logic for Computable Functions") which extends T to include product and disjoint sum types, fixed point operators at each type, and predicate logic. Complete partial orders give models for LCF.

Curry noticed the types of K and S were valid formulas of the implication calculus. Tait noticed the connection between beta-reduction and cut elimination. Howard gave a correspondence between typed terms and derivations--giving the formulae-as-t{pes notion (Curry-Howard Correspondence). DeBruijn independently introduced Automath which also uses the formulae-as-types notion. Girard extended the formulae-as-types notion to include second order quantification (corresponding to polymorphic type abstraction). Girard obtained a proof of cut elimination and analysis by proving SN for his system. The Curry-Howard correspondence has given rise to connections between typed lambda-calculus, proof theory, and category theory.