# Defining computation

- See that computation can be precisely modeled.
- Learn the NAND computational model.
- Become comfortable switching between the description of NAND programs as *code* and as *tuples*.
- Begin acquiring the skill of translating informal algorithms into NAND code.

“there is no reason why mental as well as bodily labor should not be economized by the aid of machinery”, Charles Babbage, 1852

“If, unwarned by my example, any man shall undertake and shall succeed in constructing an engine embodying in itself the whole of the executive department of mathematical analysis upon different principles or by simpler mechanical means, I have no fear of leaving my reputation in his charge, for he alone will be fully able to appreciate the nature of my efforts and the value of their results.”, Charles Babbage, 1864

“To understand a program you must become both the machine and the program.”, Alan Perlis, 1982

People have been computing for thousands of years, with aids that
include not just pen and paper, but also the abacus, slide rules, various
mechanical devices, and modern electronic computers. A priori, the
notion of computation seems to be tied to the particular mechanism that
you use. You might think that the “best” algorithm for multiplying
numbers will differ if you implement it in *Python* on a modern laptop
than if you use pen and paper. However, as we saw in the introduction,
an algorithm that is asymptotically better would eventually beat a worse
one regardless of the underlying technology. This gives us hope for a
*technology independent* way of defining computation, which is what we
will do in this lecture.

## Defining computation

The name “algorithm” is derived from the Latin transliteration of the name of Muhammad ibn Musa al-Khwarizmi, a 9th-century Persian scholar whose books introduced the western world to the decimal positional numeral system, as well as to the solutions of linear and quadratic equations (see Reference:alKhwarizmi). Still, his descriptions of algorithms were rather informal by today’s standards. Rather than use “variables” such as \(x,y\), he used concrete numbers such as 10 and 39, and trusted the reader to extrapolate from these examples. (Indeed, extrapolation from examples is still the way most of us first learn algorithms such as addition and multiplication; see Reference:childrenalg.)

Here is how al-Khwarizmi described how to solve an equation of the form \(x^2 +bx = c\) (translation from “The Algebra of Ben-Musa”, Fredric Rosen, 1831):

[How to solve an equation of the form] “roots and squares are equal to numbers”: For instance “one square, and ten roots of the same, amount to thirty-nine dirhems”; that is to say, what must be the square which, when increased by ten of its own root, amounts to thirty-nine? The solution is this: you halve the number of the roots, which in the present instance yields five. This you multiply by itself; the product is twenty-five. Add this to thirty-nine; the sum is sixty-four. Now take the root of this, which is eight, and subtract from it half the number of roots, which is five; the remainder is three. This is the root of the square which you sought for; the square itself is nine.

For the purposes of this course, we will need a much more precise way to
define algorithms. Fortunately (or is it unfortunately?), at least at
the moment, computers lag far behind school-age children in learning
from examples. Hence in the 20th century people have come up with exact
formalisms for describing algorithms, namely *programming languages*.
Here is al-Khwarizmi’s quadratic equation solving algorithm described in
the Python programming language (for concreteness we will sometimes
include code in actual programming languages in these notes; it will be
simple enough to be understandable even by people who are not familiar
with these languages):

```
import math

def solve_eq(b, c):
    # return solution of x^2 + bx = c using Al Khwarizmi's instructions
    val1 = b / 2.0          # halve the number of the roots
    val2 = val1 * val1      # this you multiply by itself
    val3 = val2 + c         # Add this to thirty-nine (c)
    val4 = math.sqrt(val3)  # take the root of this
    val5 = val4 - val1      # subtract from it half the number of roots
    return val5             # This is the root of the square which you sought for
```

## The NAND Programming language

We can try to use a modern programming language such as Python or C for
our formal model of computation, but it would be quite hard to reason
about, given that the Python language
reference has more than 100
pages. Thus we will define computation using an extremely simple
“programming language”: one that has only a single operation. This
raises the question of whether this language is rich enough to capture
the power of modern computing systems. We will see that (to a first
approximation), the answer to this question is **Yes**.

We start by defining a programming language that can only compute
*finite* functions. That is, functions \(F\) that map \(\{0,1\}^n\) to
\(\{0,1\}^m\) for some natural numbers \(m,n\). Later we will discuss how to
extend the language to allow for a single program that can compute a
function of every length, but the finite case is already quite
interesting and will give us a simple setting for exploring some of the
salient features of computing.

The *NAND programming language* has no loops, functions, or if
statements. It has only a single operation: `NAND`. That is, every line
in a NAND program has the form:

`foo := bar NAND baz`

where `foo`, `bar`, `baz` are variable names. (The terms `foo` and `bar`
are often used to describe generic variable names in the context of
programming, and we will follow this convention throughout the course.
See the appendix and the website http://nandpl.org for a full
specification of the NAND programming language.) When this line is
executed, the variable `foo` is assigned the *negation of the logical
AND* of (i.e., the NAND operation applied to) the values of the two
variables `bar` and `baz`. (The *logical AND* of two bits
\(x,x'\in \{0,1\}\) is equal to \(1\) if \(x=x'=1\) and is equal to \(0\)
otherwise. Thus its negation satisfies
\(NAND(0,0)=NAND(0,1)=NAND(1,0)=1\), while \(NAND(1,1)=0\). If a
variable hasn’t been assigned a value, then its default value is
zero.)
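As a quick sanity check of the truth table just described, here is a minimal Python sketch. The helper name `nand` and the dictionary `truth_table` are illustrative choices for these notes, not part of the NAND language itself.

```python
def nand(a: int, b: int) -> int:
    """Return the negation of the logical AND of two bits."""
    return 1 - (a & b)

# Enumerate all four pairs of input bits.
truth_table = {(a, b): nand(a, b) for a in (0, 1) for b in (0, 1)}

for (a, b), out in truth_table.items():
    print(f"NAND({a},{b}) = {out}")
```

Only the pair \((1,1)\) is mapped to \(0\); every other pair is mapped to \(1\).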

All variables in the NAND programming language are *Boolean*: they can
take only the values zero or one. Variables such as `x_22` or `y_18`
(that is, of the form `x_`\(\expr{i}\) or `y_`\(\expr{i}\) where \(i\) is a
natural number) have a special meaning. (In these lecture notes, we use
the convention that when we write \(\expr{e}\) we mean the numerical
value of this expression. So for example if \(k=10\) then we can write
`x_`\(\expr{k+7}\) to mean `x_17`. This is just for the notes: in the NAND
programming language itself the indices have to be absolute numerical
constants.) The variables beginning with `x_` are *input* variables and
those beginning with `y_` are *output* variables. Thus for example the
following four line NAND program takes an input of two bits and outputs
a single bit:

```
u := x_0 NAND x_1
v := x_0 NAND u
w := x_1 NAND u
y_0 := v NAND w
```

Can you guess what function from \(\{0,1\}^2\) to \(\{0,1\}\) this program computes? It might be a good idea for you to pause here and try to figure this out.

To find the function that this program computes, we can run it on all
the four possible two bit inputs: \(00\),\(01\),\(10\), and \(11\).

For example, let us consider the execution of this program on the input
\(01\), keeping track of the values of the variables as the program runs
line by line. On the website http://nandpl.org we can run NAND
programs in a “debug” mode, which will produce an *execution trace* of
the program. (At present the web interface is not yet implemented, and
you can run NAND programs using an OCaml interpreter that you can
download from that website. The implementation is in a fluid state and so
the text below might not exactly match the output of the interpreter.)
When we run the program above on the input \(01\), we get the following
trace:

```
Executing step 1: "u := x_0 NAND x_1" x_0 = 0, x_1 = 1, u is assigned 1,
Executing step 2: "v := x_0 NAND u" x_0 = 0, u = 1, v is assigned 1,
Executing step 3: "w := x_1 NAND u" x_1 = 1, u = 1, w is assigned 0,
Executing step 4: "y_0 := v NAND w" v = 1, w = 0, y_0 is assigned 1,
Output is y_0=1
```

On the other hand if we execute this program on the input \(11\), then we get the following execution trace:

```
Executing step 1: "u := x_0 NAND x_1" x_0 = 1, x_1 = 1, u is assigned 0,
Executing step 2: "v := x_0 NAND u" x_0 = 1, u = 0, v is assigned 1,
Executing step 3: "w := x_1 NAND u" x_1 = 1, u = 0, w is assigned 1,
Executing step 4: "y_0 := v NAND w" v = 1, w = 1, y_0 is assigned 0,
Output is y_0=0
```

You can verify that on input \(10\) the program will also output \(1\), while on input \(00\) it will output zero. Hence the output of this program on every input is summarized in the following table:

Input | Output |
---|---|
\(00\) | \(0\) |
\(01\) | \(1\) |
\(10\) | \(1\) |
\(11\) | \(0\) |

In other words, this program computes the *exclusive or* (also known as XOR) function.
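The table can also be reproduced mechanically. The following is a minimal Python sketch of an interpreter for NAND source code; it is not the OCaml interpreter mentioned above, and the names `run_nand` and `XOR_SOURCE` are hypothetical. It assumes every line has exactly the form `foo := bar NAND baz` (no comments or blank lines inside the program).

```python
from collections import defaultdict
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def run_nand(source, inputs):
    """Run NAND source code on a tuple of input bits and return the output bits."""
    values = defaultdict(int)  # unassigned variables default to zero
    for i, bit in enumerate(inputs):
        values[f"x_{i}"] = bit
    for line in source.strip().splitlines():
        target, expr = line.split(":=")
        left, right = expr.split("NAND")
        values[target.strip()] = nand(values[left.strip()], values[right.strip()])
    # The output length is one plus the largest index of a y_ variable.
    num_outputs = 1 + max(int(name[2:]) for name in values if name.startswith("y_"))
    return tuple(values[f"y_{j}"] for j in range(num_outputs))

XOR_SOURCE = """
u := x_0 NAND x_1
v := x_0 NAND u
w := x_1 NAND u
y_0 := v NAND w
"""

# Evaluate the program on all four two-bit inputs.
table = {x: run_nand(XOR_SOURCE, x) for x in product((0, 1), repeat=2)}
for x, y in table.items():
    print(x, "->", y)
```

Running the four-line program on every input in this way gives the same input/output pairs as the XOR table.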

### Adding one-bit numbers

Now that we can compute XOR, let us try something just a little more ambitious: adding a pair of one-bit numbers. That is, we want to compute the function \(ADD_1:\{0,1\}^2\rightarrow\{0,1\}^2\) such that \(ADD_1(x_0,x_1)\) is the binary representation of the sum of the two numbers \(x_0\) and \(x_1\). Since the sum of two \(0/1\) values is a number in \(\{0,1,2\}\), the output of the function \(ADD_1\) is of length two bits.

If we write the sum \(x_0+x_1\) as \(y_02^0 + y_12^1\) then the table of values for \(ADD_1\) is the following:

Input `x_0` | Input `x_1` | Output `y_0` | Output `y_1` |
---|---|---|---|
\(0\) | \(0\) | \(0\) | \(0\) |
\(1\) | \(0\) | \(1\) | \(0\) |
\(0\) | \(1\) | \(1\) | \(0\) |
\(1\) | \(1\) | \(0\) | \(1\) |

One can see that `y_0` will be the XOR of `x_0` and `x_1` and `y_1` will
be the AND of `x_0` and `x_1`. (This is a special case of the general
rule that when you add two digits \(x,x' \in \{0,1,\ldots,b-1\}\) in base
\(b\) (in our case \(b=2\)), the output digit is \(x+x' (\mod b)\) and the
carry digit is \(\lfloor (x+x')/b \rfloor\).) Thus we can compute one bit
variable addition using the following program:
```
// Add two single-bit numbers
u := x_0 NAND x_1
v := x_0 NAND u
w := x_1 NAND u
y_0 := v NAND w
y_1 := u NAND u
```

If we run this program on the input \((1,1)\) we get the execution trace

```
Executing step 1: "u := x_0 NAND x_1" x_0 = 1, x_1 = 1, u is assigned 0,
Executing step 2: "v := x_0 NAND u" x_0 = 1, u = 0, v is assigned 1,
Executing step 3: "w := x_1 NAND u" x_1 = 1, u = 0, w is assigned 1,
Executing step 4: "y_0 := v NAND w" v = 1, w = 1, y_0 is assigned 0,
Executing step 5: "y_1 := u NAND u" u = 0, u = 0, y_1 is assigned 1,
Output is y_0=0, y_1=1
```

and so you can see that the output \((0,1)\) is indeed the binary encoding of \(1+1 = 2\).
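To check the remaining inputs without writing out each trace, the five-line program can be transliterated line by line into Python. This is only a sketch: the function names `nand` and `add1` are illustrative, and each Python assignment mirrors one line of the NAND program.

```python
def nand(a, b):
    return 1 - (a & b)

def add1(x_0, x_1):
    """Line-by-line transliteration of the one-bit addition NAND program."""
    u = nand(x_0, x_1)
    v = nand(x_0, u)
    w = nand(x_1, u)
    y_0 = nand(v, w)  # XOR of the inputs: the low-order output bit
    y_1 = nand(u, u)  # AND of the inputs: the carry bit
    return y_0, y_1

# The output (y_0, y_1) should encode x_0 + x_1 as y_0 + 2*y_1.
for x_0 in (0, 1):
    for x_1 in (0, 1):
        y_0, y_1 = add1(x_0, x_1)
        assert y_0 + 2 * y_1 == x_0 + x_1
        print(f"{x_0} + {x_1} -> (y_0, y_1) = ({y_0}, {y_1})")
```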

### Formal definitions

For a NAND program \(P\), its *input length* is the largest number \(n\)
such that \(P\) contains a variable of the form `x_`\(\expr{n-1}\). \(P\)’s
*output length* is the largest number \(m\) such that \(P\) contains a
variable of the form `y_`\(\expr{m-1}\). (As mentioned in the appendix, we
require that all output variables are assigned a value, and that the
largest index used in an \(s\) line NAND program is smaller than \(s\). In
particular this means that an \(s\) line program can have at most \(s\)
inputs and outputs.) Intuitively, if \(P\) is a NAND program with input
length \(n\) and output length \(m\), and
\(F:\{0,1\}^n \rightarrow \{0,1\}^m\) is some function, then \(P\) computes
\(F\) if for every \(x\in \{0,1\}^n\) and \(y=F(x)\), whenever \(P\) is executed
with the `x_`\(\expr{i}\) variable initialized to \(x_i\) for all
\(i\in [n]\), at the end of the execution the variable `y_`\(\expr{j}\) will
equal \(y_j\) for all \(j\in [m]\).

To make sure we have a precise and unambiguous definition of computation, we will now model NAND programs using sets and tuples, and recast the notion of computing a function in these terms.

A *NAND program* is a 4-tuple \(P=(V,X,Y,L)\) of the following form:

- \(V\) (called the *variables*) is some finite set.
- \(X\) (called the *input variables*) is a tuple of elements in \(V\), i.e., \(X=(X_0,X_1,\ldots,X_{n-1})\) for some \(n\in \N\) where \(X_i \in V\) for all \(i\in [n]\). We require that the elements of \(X\) are distinct: \(X_i \neq X_j\) for all \(i\neq j\) in \([n]\).
- \(Y\) (called the *output variables*) is a tuple of elements in \(V\), i.e., \(Y=(Y_0,\ldots,Y_{m-1})\) for some \(m\in \N\) where \(Y_j \in V\) for all \(j \in [m]\). We require that the elements of \(Y\) are distinct (i.e., \(Y_i \neq Y_j\) for all \(i\neq j\) in \([m]\)) and that they are disjoint from \(X\) (i.e., \(Y_j \neq X_i\) for every \(i\in [n]\) and \(j\in [m]\)).
- \(L\) (called the *lines*) is a tuple of *triples* of \(V\), i.e., \(L \in (V \times V \times V)^*\). Intuitively, if the \(\ell\)-th element of \(L\) is a triple \((u,v,w)\) then this corresponds to the \(\ell\)-th line of the program being \(u\) `:=` \(v\) `NAND` \(w\). We require that for every triple \((u,v,w)\), \(u\) does not appear in \(X\) and \(v,w\) do not appear in \(Y\). Moreover, we require that every \(v\in V\) that is not an input variable (i.e., a member of \(V\) that is not equal to \(X_i\) for any \(i\)) is contained in some triple in \(L\).

The *number of inputs* of \(P=(V,X,Y,L)\) is equal to \(|X|\) and the
*number of outputs* is equal to \(|Y|\).

This definition is somewhat long and cumbersome, but really corresponds
to a straightforward modelling of NAND programs, under the map that \(V\)
is the set of all variables appearing in the program, \(X\) corresponds to
the tuple \((\)`x_`\(\expr{0}\), `x_`\(\expr{1}\), \(\ldots\), `x_`\(\expr{n-1}\)\()\),
\(Y\) corresponds to the tuple \((\)`y_`\(\expr{0}\), `y_`\(\expr{1}\),
\(\ldots\), `y_`\(\expr{m-1}\)\()\) and \(L\) corresponds to the list of
triples of the form \((\)`foo`\(,\) `bar`\(,\) `baz`\()\) for every line
`foo := bar NAND baz` in the program. Please pause here and verify that
you understand this correspondence.

For example, one representation of the XOR program we described above is \(P=(V,X,Y,L)\) where

- \(V= \{\) `x_0`, `x_1`, `v`, `u`, `w`, `y_0` \(\}\)
- \(X = (\) `x_0`, `x_1` \()\)
- \(Y =(\) `y_0` \()\)
- \(L = (\;\;(\)`u`,`x_0`,`x_1`\(), (\)`v`,`x_0`,`u`\(), (\)`w`,`x_1`,`u`\(), (\)`y_0`,`v`,`w`\()\;\;)\)

But since we have the freedom of choosing arbitrary sets for our variables, we can also represent the same program as (for example) \(P'=(V',X',Y',L')\) where

- \(V' = \{0,1,2,3,4,5 \}\)
- \(X' = (0,1)\)
- \(Y' = (5)\)
- \(L' = ( (3,0,1),(2,0,3),(4,1,3),(5,2,4))\)
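The relation between the two representations can be made concrete with a short Python sketch. Here a program is a plain tuple `(V, X, Y, L)`, and the dictionary `pi` and helper `rename` are hypothetical names introduced only for this illustration; `pi` is the particular renaming that sends the named XOR program to \(P'\).

```python
# The XOR program as a 4-tuple (V, X, Y, L) with variable names as strings.
V = {"x_0", "x_1", "v", "u", "w", "y_0"}
X = ("x_0", "x_1")
Y = ("y_0",)
L = (("u", "x_0", "x_1"), ("v", "x_0", "u"), ("w", "x_1", "u"), ("y_0", "v", "w"))

# One possible renaming of the variables to numbers (many others would do).
pi = {"x_0": 0, "x_1": 1, "v": 2, "u": 3, "w": 4, "y_0": 5}

def rename(program, pi):
    """Apply the renaming pi to every variable occurring in the program."""
    V, X, Y, L = program
    return (
        {pi[v] for v in V},
        tuple(pi[v] for v in X),
        tuple(pi[v] for v in Y),
        tuple(tuple(pi[v] for v in triple) for triple in L),
    )

P_prime = rename((V, X, Y, L), pi)
print(P_prime)
```

Since `pi` is one-to-one, the renamed tuple describes the same program with different labels.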

### Computing a function: formal definition

Now that we defined NAND programs formally, we turn to formally defining
the notion of computing a function. Before we do that, we will need to
talk about the notion of the *configuration* of a NAND program. Such a
configuration simply corresponds to the current line that is executed
and the current values of all variables at a certain point in the
execution. Thus we will model it as a pair \((\ell,\sigma)\) where \(\ell\)
is a number between \(0\) and the total number of lines in the program,
and \(\sigma\) maps every variable to its current value. (The number
\(\ell\) can be thought of as the “program counter” and refers to the line
that is just about to be executed, when we number the lines from \(0\) to
\(s-1\) for some \(s\in \N\). The program counter starts at \(0\), and after
executing the last line (i.e., line number \(s-1\)), it equals \(s\).) The
initial configuration has the form \((0,\sigma_0)\) where \(0\) corresponds
to the first line, and \(\sigma_0\) assigns \(x_i\) to the \(i\)-th input
variable and zero to every other variable. The final configuration will
have the form \((s,\sigma_s)\) where \(s\) is the number of lines (i.e.,
corresponding to “going past” the final line) and \(\sigma_s\) gives the
final values assigned to all variables, which in particular encodes also
the values of the output variables.

For example, if we run the XOR program above on the input `01` then the
configuration of the program evolves as follows:

```
                       x_0 x_1   v   u   w y_0
0. u := x_0 NAND x_1 :   0   1   0   0   0   0
1. v := x_0 NAND u   :   0   1   0   1   0   0
2. w := x_1 NAND u   :   0   1   1   1   0   0
3. y_0 := v NAND w   :   0   1   1   1   0   0
4. (after halting)   :   0   1   1   1   0   1
```

We now write the formal definition. As always, it is a good practice to verify that this formal definition matches the intuitive description above:

Let \(P=(V,X,Y,L)\) be a NAND program, and let \(n=|X|\), \(m=|Y|\) and
\(s=|L|\). A *configuration* of \(P\) is a pair \((\ell,\sigma)\) where
\(\ell \in [s+1]\) and \(\sigma\) is a function
\(\sigma:V \rightarrow \{0,1\}\) that maps every variable of \(P\) into a
bit in \(\{0,1\}\). We define
\(CONF(P) = [s+1] \times \{ \sigma \;|\; \sigma:V \rightarrow \{0,1\} \}\)
to be the set of all configurations of \(P\). (Note that \(|CONF(P)| = (s+1)2^{|V|}\): can you see why?)

If \(P\) has \(n\) inputs, then for every \(x\in \{0,1\}^n\), the *initial
configuration of \(P\) with input \(x\)* is the pair \((0,\sigma_0)\) where
\(\sigma_0:V \rightarrow \{0,1\}\) is the function defined as
\(\sigma_0(X_i)=x_i\) for every \(i\in [n]\) and \(\sigma_0(v)=0\) for all
variables not in \(X\).

An execution of a NAND program can be thought of as simply progressing, line by line, from the initial configuration to the next one:

For every NAND program \(P=(V,X,Y,L)\), the *next step function of \(P\)*,
denoted by \(NEXT_P\), is the function
\(NEXT_P:CONF(P) \rightarrow CONF(P)\) that is defined as follows:

For every \((\ell,\sigma) \in CONF(P)\), if \(\ell=|L|\) then \(NEXT_P(\ell,\sigma)=(\ell,\sigma)\). Otherwise \(NEXT_P(\ell,\sigma) = (\ell+1,\sigma')\) where \(\sigma':V \rightarrow \{0,1\}\) is defined as follows: \[ \sigma'(x) = \begin{cases} NAND(\sigma(v),\sigma(w)) & x=u \\ \sigma(x) & \text{otherwise} \end{cases} \] where \((u,v,w)=L_\ell\) is the \(\ell\)-th triple (counting from zero) in \(L\).

For every input \(x\in \{0,1\}^n\) and \(\ell \in [s+1]\), the *\(\ell\)-th
configuration of \(P\) on input \(x\)*, denoted as \(conf_\ell(P,x)\), is
defined recursively as follows: \[
conf_\ell(P,x) = \begin{cases} (0,\sigma_0) & \ell=0 \\
NEXT_P(conf_{\ell-1}(P,x)) & \text{otherwise}
\end{cases}
\] where \((0,\sigma_0)\) is the initial configuration of \(P\) on input
\(x\).

We can now finally formally define the notion of computing a function:

Let \(P=(V,X,Y,L)\) and \(n=|X|\), \(m=|Y|\) and \(s=|L|\). Let
\(F:\{0,1\}^n \rightarrow \{0,1\}^m\). We say that *\(P\) computes \(F\)* if
for every \(x\in \{0,1\}^n\), if \(y=F(x)\) then \(conf_s(P,x)=(s,\sigma)\)
where \(\sigma(Y_j) = y_j\) for every \(j\in [m]\).
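The definitions of the initial configuration, \(NEXT_P\), \(conf_\ell\), and "\(P\) computes \(F\)" translate almost verbatim into code. The following Python sketch assumes a program is given as a tuple `(V, X, Y, L)` and a configuration as a pair `(ell, sigma)` with `sigma` a dictionary; the function names are illustrative and not part of the formal definition.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def initial_conf(P, x):
    """Initial configuration: inputs set to x, all other variables zero."""
    V, X, Y, L = P
    sigma = {v: 0 for v in V}
    for X_i, x_i in zip(X, x):
        sigma[X_i] = x_i
    return (0, sigma)

def next_conf(P, conf):
    """The next step function NEXT_P."""
    V, X, Y, L = P
    ell, sigma = conf
    if ell == len(L):              # past the last line: nothing changes
        return conf
    u, v, w = L[ell]
    new_sigma = dict(sigma)
    new_sigma[u] = nand(sigma[v], sigma[w])
    return (ell + 1, new_sigma)

def conf_at(P, x, ell):
    """The ell-th configuration of P on input x."""
    conf = initial_conf(P, x)
    for _ in range(ell):
        conf = next_conf(P, conf)
    return conf

def computes(P, F):
    """Check by brute force whether P computes F on every input."""
    V, X, Y, L = P
    for x in product((0, 1), repeat=len(X)):
        _, sigma = conf_at(P, x, len(L))
        if tuple(sigma[Y_j] for Y_j in Y) != tuple(F(x)):
            return False
    return True

XOR_PROGRAM = (
    {"x_0", "x_1", "v", "u", "w", "y_0"},
    ("x_0", "x_1"),
    ("y_0",),
    (("u", "x_0", "x_1"), ("v", "x_0", "u"), ("w", "x_1", "u"), ("y_0", "v", "w")),
)

print(computes(XOR_PROGRAM, lambda x: (x[0] ^ x[1],)))
```

The brute-force check in `computes` takes time exponential in the number of inputs, so it is only practical for very small programs.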

For every \(s\in \N\), we define \(SIZE(s)\) to be the set of all functions that are computable by a NAND program of at most \(s\) lines.

The formal specification of any programming language, no matter how simple, is often cumbersome, and the definitions above are no exception. You should go back and read them and make sure that you understand why they correspond to our informal description of computing a function via NAND programs. From this point on, we will not distinguish between the representation of a NAND program in terms of lines of code, and its representation as a tuple \(P=(V,X,Y,L)\).

Let \(XOR_n:\{0,1\}^n \rightarrow \{0,1\}\) be the function that maps \(x\in \{0,1\}^n\) to \(\sum_{i=0}^{n-1} x_i (\mod 2)\). The NAND program we presented above yields a proof of the following theorem:

\(XOR_2 \in SIZE(4)\)

Similarly, the addition program we presented shows that \(ADD_1 \in SIZE(5)\).

## Canonical input and output variables

The specific identifiers for NAND variables (other than the inputs and
outputs) do not make any difference in the program’s functionality, as
long as we give separate variables distinct identifiers. For example, if
I replace all instances of the variable `foo` with `boazisgreat` then,
under the (unfortunately common) condition that `boazisgreat` was not
used in the original program, the resulting program will still compute
the same function. For convenience, it is sometimes useful to assume
that all variable identifiers have some canonical form such as being
either `x_`\(\expr{i}\), `y_`\(\expr{j}\) or `work_`\(\expr{k}\). Similarly,
while we allowed in Reference:NANDprogram the variables to be members of
some arbitrary set \(V\), it is sometimes useful to assume that \(V\) is
simply the set of numbers from \(0\) to some natural number (which can
never be more than three times the number of lines \(s\)). This motivates
the following definition:

Let \(P=(V,X,Y,L)\) be a NAND program and let \(n=|X|\) and \(m=|Y|\). We say
that \(P\) has *canonical variables* if \(V=[t]\) for some \(t\in \N\),
\(X=(0,1,\ldots,n-1)\) and \(Y=(t-m,t-m+1,\ldots,t-1)\).

That is, in a canonical form program, the variables are the set \([t]\), where the input variables correspond to the first \(n\) variables and the outputs to the last \(m\) variables. Every program can be converted to an equivalent program of canonical form:

For every \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(s\in\N\), \(F\) can be computed by an \(s\)-line NAND program if and only if it can be computed by an \(s\)-line NAND program with canonical variables.

The “if” direction is trivial, since a NAND program with canonical variables is just a special case of a NAND program. For the “only if” direction, let \(P=(V,X,Y,L)\) be an \(s\)-line NAND program computing \(F\), and let \(t=|V|\). We define a bijection \(\pi:V \rightarrow [t]\) as follows: \(\pi(X_i)=i\) for all \(i\in [n]\), \(\pi(Y_j)=t-m+j\) for all \(j\in [m]\), and we map the remaining \(t-m-n\) elements of \(V\) to \(\{ n,\ldots,t-1-m\}\) in some arbitrary one to one way. (We can do so because the \(X\)’s and \(Y\)’s are distinct and disjoint.) Now define \(P' = (\pi(V),\pi(X),\pi(Y),\pi(L))\), where by this we mean that we apply \(\pi\) individually to every element of \(V\), \(X\), \(Y\), and the triples of \(L\). Since (as we leave you to verify) the definitions of configurations and of computing a function are invariant under bijections of \(V\), \(P'\) computes the same function as \(P\).
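The bijection \(\pi\) in this proof is easy to construct explicitly. The following Python sketch assumes a program is a tuple `(V, X, Y, L)`; the name `canonicalize` is hypothetical, and sorting the remaining variables by their string form is just one arbitrary (but deterministic) way of making the "arbitrary one to one" choice.

```python
def canonicalize(P):
    """Return an equivalent program with canonical variables, following the proof."""
    V, X, Y, L = P
    t, n, m = len(V), len(X), len(Y)
    pi = {}
    for i, X_i in enumerate(X):
        pi[X_i] = i              # inputs become 0, ..., n-1
    for j, Y_j in enumerate(Y):
        pi[Y_j] = t - m + j      # outputs become t-m, ..., t-1
    # Remaining (workspace) variables are mapped to n, ..., t-m-1.
    rest = sorted((v for v in V if v not in pi), key=str)
    for offset, v in enumerate(rest):
        pi[v] = n + offset
    return (
        {pi[v] for v in V},
        tuple(pi[v] for v in X),
        tuple(pi[v] for v in Y),
        tuple(tuple(pi[v] for v in triple) for triple in L),
    )

XOR_PROGRAM = (
    {"x_0", "x_1", "v", "u", "w", "y_0"},
    ("x_0", "x_1"),
    ("y_0",),
    (("u", "x_0", "x_1"), ("v", "x_0", "u"), ("w", "x_1", "u"), ("y_0", "v", "w")),
)

canonical_xor = canonicalize(XOR_PROGRAM)
print(canonical_xor)
```

With this particular choice the workspace variables `u`, `v`, `w` become \(2,3,4\), so the result differs from the earlier \(P'\) only in how the workspace is labeled.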

Given Reference:canonicalvarsthm, since we only care about the functionality (and size) of programs, and not the labels of variables, we will always be able to assume “without loss of generality” that a given NAND program \(P\) has canonical form. A canonical form program \(P\) can also be represented as a triple \((n,m,L)\) where \(n,m\) are (as usual) the numbers of inputs and outputs, and \(L\) is the tuple of lines. This is because we recover the original representation \((V,X,Y,L)\) by simply setting \(X=(0,1,\ldots,n-1)\), \(Y =(t-m,t-m+1,\ldots,t-1)\) and \(V=[t]\) where \(t\) is one plus the largest number appearing in a triple of \(L\). In the following we will freely move between these two representations. If \(n,m\) are known from the context, then a canonical form program can be represented simply by the list of triples \(L\).

**Configurations of programs with canonical variables:** A
*configuration* of a program with canonical variables is a pair
\((\ell,\sigma)\) where \(\sigma:[t] \rightarrow \{0,1\}\) and \(t\) is the
number of variables. We can and will identify such a function \(\sigma\)
with a string of \(t\) bits. Thus we will often say that a configuration
of a canonical program is a pair \((\ell,\sigma)\) where
\(\sigma \in \{0,1\}^t\).

## Composing functions

Computing the XOR or addition of two bits is all well and good, but still seems a long way off from even the algorithms we all learned in elementary school, let alone World of Warcraft. We will get to computing more interesting functions, but for starters let us prove the following simple extension of Reference:xortwothm

\(XOR_4 \in SIZE(12)\)

We can prove Reference:xorfourthm by explicitly writing down a 12 line
program. But writing NAND programs by hand can get real old real fast.
So, we will prove more general results about *composing* functions:

If \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) is a function in \(SIZE(L)\) and \(G:\{0,1\}^m \rightarrow \{0,1\}^k\) is a function in \(SIZE(L')\) then \(G\circ F\) is in \(SIZE(L+L')\), where \(G\circ F:\{0,1\}^n \rightarrow \{0,1\}^k\) is defined as the function that maps \(x\in \{0,1\}^n\) to \(G(F(x))\).

If \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) is a function in \(SIZE(L)\) and \(G:\{0,1\}^{n'} \rightarrow \{0,1\}^{m'}\) is a function in \(SIZE(L')\) then \(F\| G\) is in \(SIZE(L+L')\), where \(F \| G: \{0,1\}^{n+n'} \rightarrow \{0,1\}^{m+m'}\) is defined as the function that maps \(x \in \{0,1\}^{n+n'}\) to \(F(x_0,\ldots,x_{n-1})G(x_n,\ldots,x_{n+n'-1})\).

We will prove Reference:seqcompositionthm and
Reference:parcompositionthm using our formal definition of NAND
programs. But it is also possible to directly give syntactic
transformations of the code of programs computing \(F\) and \(G\) to
programs computing \(G\circ F\) and \(F \| G\) respectively. It is a
good exercise for you to pause here and see that you know how to give
such a transformation. Try to think how you would write a *program* (in
the programming language of your choice) that given two strings `C` and
`D` that contain the code of NAND programs for computing \(F\) and \(G\),
would output a string `E` that contains the code of a NAND program for
\(G\circ F\) (or \(F \| G\)).

Before proving Reference:seqcompositionthm and Reference:parcompositionthm, note that they do imply Reference:xorfourthm. Indeed, it’s easy to verify that for every \(x \in \{0,1\}^4\),

\[ XOR_4(x) = \sum_{i=0}^3 x_i (\mod 2) = ((x_0+x_1 \mod 2) + (x_2+x_3 \mod 2) \mod 2) = XOR_2(XOR_2(x_0,x_1),XOR_2(x_2,x_3)) \]

and hence

\[ XOR_4= XOR_2 \circ (XOR_2 \| XOR_2) \;. \]

Since \(XOR_2\) is in \(SIZE(4)\), it follows that \(XOR_4 \in SIZE(4+(4+4))=SIZE(12)\).
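The identity behind this decomposition can be checked exhaustively over all \(16\) inputs. The following Python sketch uses the hypothetical helper names `xor2` and `xor4`, which compute the functions directly from their definitions rather than via NAND programs.

```python
from itertools import product

def xor2(a, b):
    """XOR_2: the sum of two bits modulo 2."""
    return (a + b) % 2

def xor4(x):
    """XOR_4: the sum of four bits modulo 2."""
    return sum(x) % 2

# Compare XOR_4(x) with XOR_2(XOR_2(x_0,x_1), XOR_2(x_2,x_3)) on every input.
all_agree = all(
    xor4(x) == xor2(xor2(x[0], x[1]), xor2(x[2], x[3]))
    for x in product((0, 1), repeat=4)
)
print(all_agree)
```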

Using the same idea we can prove the following more general result:

For every \(n>1\), \(XOR_n \in SIZE(10n)\)

We leave proving Reference:paritycircuitthm as Reference:paritycircuitex.

## Proving the composition theorems

We now formally prove the “Sequential Composition Theorem” Reference:seqcompositionthm, leaving the “parallel composition” as an exercise. The idea behind the proof is that given a program \(P=(V,X,Y,L)\) that computes \(F\) and a program \(P'=(V',X',Y',L')\) that computes \(G\), we can hope to obtain a program \(P''\) that computes \(G \circ F\) by simply “copying and pasting” the code for \(P'\) after the code for \(P\), replacing the inputs of \(G\) with the outputs of \(F\). In our tuple notation this corresponds to renaming the variables \(X'\) so they are the same as \(Y\), and then making \(P'' = (V \cup V',X,Y',L'')\) where \(L''\) is obtained by simply concatenating \(L\) and \(L'\).

It is an interesting exercise to try to prove that this transformation
works. If you do so, you will find out that you simply can’t make the
proof go through. It turns out the issue is not about mere
“formalities”. This transformation is simply not correct: if \(P\) and
\(P'\) use the same workspace variable `foo`, then the program \(P'\) might
assume `foo` is initialized to zero, while the program \(P\) might assign
`foo` a nonzero value. Thus in the proof we will need to take care of
this issue, and ensure that \(P'\) and \(P\) use disjoint workspace
variables. This is one example of a general phenomenon: trying and
failing to prove that a program or algorithm is correct often leads to
the discovery of bugs in it.

We now turn to the full proof. It is somewhat cumbersome since we have
to **(1)** fully specify the transformation of \(P\) and \(P'\) to \(P''\) and
**(2)** prove that the transformed program \(P''\) does actually compute
\(G \circ F\). Nevertheless, because proofs about computation can be
subtle, it is important that you read carefully the proof and make sure
you understand every step in it.

Let \(P=(V,X,Y,L)\) and \(P'=(V',X',Y',L')\) be the programs for computing \(F\) and \(G\) respectively, and let \(t=|V|\), \(s=|L|\), \(t'=|V'|\), and \(s'=|L'|\). Assume without loss of generality that both programs are in canonical form, so that \(X=(0,\ldots,n-1)\), \(Y=(t-m,\ldots,t-1)\), \(X'=(0,\ldots,m-1)\), and \(Y'=(t'-k,\ldots,t'-1)\). We will construct an \(s+s'\) line canonical form program \(P''\) with \(t+t'-m\) variables that computes \(G\circ F:\{0,1\}^n \rightarrow \{0,1\}^k\) (see Reference:compositionprogsfig). To specify \(P''\) we only need to define its tuple of lines \(L'' = (L''_0,L''_1,\ldots, L''_{s+s'-1})\). The first \(s\) lines of \(L''\) simply equal \(L\). The next \(s'\) lines are obtained from \(L'\) by adding \(t-m\) to every label. (Since we are representing programs in canonical form, each line is a triple of numbers, and adding \(t-m\) simply shifts the set of variables used by \(P'\) from \(\{0,\ldots,t'-1\}\) to the last \(t'\) elements of \([t+t'-m]\), which is the set of variables of the composed program \(P''\).) In other words, for every \(\ell \in [s+s']\),

\[ L''_\ell = \begin{cases} L_\ell & \ell < s \\ (L'_{\ell-s,0}+t-m,L'_{\ell-s,1}+t-m,L'_{\ell-s,2} +t-m) & \ell \geq s \end{cases} \] where \(L' = ((L'_{0,0},L'_{0,1},L'_{0,2}), \cdots , (L'_{s'-1,0},L'_{s'-1,1},L'_{s'-1,2}))\).
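The construction of \(L''\) can be sketched in a few lines of Python. The sketch assumes canonical form programs given as triples `(n, m, L)`, with \(t\) recovered as one plus the largest label appearing in \(L\); the names `run_canonical` and `compose`, and the one-line `NOT` program used as an example, are illustrative assumptions rather than part of the proof.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def run_canonical(n, m, L, x):
    """Run a canonical form program (n, m, L) on input bits x."""
    t = 1 + max(max(triple) for triple in L)  # number of variables
    sigma = [0] * t
    sigma[:n] = list(x)                       # inputs are variables 0..n-1
    for u, v, w in L:
        sigma[u] = nand(sigma[v], sigma[w])
    return tuple(sigma[t - m:])               # outputs are the last m variables

def compose(P, P_prime):
    """Sequential composition: keep L, then append L' with labels shifted by t-m."""
    n, m, L = P
    m_prime, k, L_prime = P_prime
    assert m == m_prime, "outputs of P must match inputs of P'"
    t = 1 + max(max(triple) for triple in L)
    shift = t - m
    L_shifted = tuple(tuple(label + shift for label in triple) for triple in L_prime)
    return (n, k, tuple(L) + L_shifted)

XOR2 = (2, 1, ((3, 0, 1), (2, 0, 3), (4, 1, 3), (5, 2, 4)))
NOT = (1, 1, ((1, 0, 0),))   # y := x NAND x

XNOR = compose(XOR2, NOT)    # computes NOT(XOR_2(x_0, x_1))
for x in product((0, 1), repeat=2):
    print(x, "->", run_canonical(*XNOR, x))
```

In this example the single line of `NOT` is shifted by \(t-m=5\), so it reads the output variable \(5\) of the XOR program and writes to the fresh variable \(6\).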

We now need to prove that \(P''\) computes \(G \circ F\). We will do so by showing the following two claims. (These two claims are easiest to understand by looking at Reference:compositionprogsfig. Claim 1 simply says that in the first \(s\) steps of the execution, the state of the first \(t\) variables corresponds to the state in the execution of \(P\), and the last \(t'-m\) variables are untouched. Claim 2 says that in the following \(s'\) steps, the state of the first \(t-m\) variables remains as it was at the end of the execution of \(P\), and the last \(t'\) variables evolve according to the execution of \(P'\). The variables \(\{ t-m,\ldots, t-1\}\) are involved in both executions: they play the role of output variables for \(P\) and the role of input variables for \(P'\).)

**Claim 1:** For every \(x\in \{0,1\}^n\) and \(\ell \in [s+1]\),
\(conf_\ell(P'',x)=(\ell,\sigma 0^{t'-m})\) where
\((\ell,\sigma)=conf_\ell(P,x)\). (Recall that for a program in canonical
form, we can think of the state \(\sigma\) as a string, and hence
\(\sigma 0^{t'-m}\) means that we concatenate \(t'-m\) zeroes to this
string. Similarly in Claim 2 below, \(z \sigma\) refers to the
concatenation of the string \(z\) with \(\sigma\).)

**Claim 2:** For every \(x\in \{0,1\}^n\) and
\(\ell \in \{s,\ldots, s+s'\}\), \(conf_\ell(P'',x) = (\ell, z\sigma)\)
where \((\ell-s,\sigma)=conf_{\ell-s}(P',F(x))\) and \(z\in \{0,1\}^{t-m}\)
is some string.

Claim 2 implies the theorem: taking \(\ell=s+s'\), it gives \(conf_{s+s'}(P'',x) = (s+s',z\sigma)\) where \((s',\sigma) = conf_{s'}(P',F(x))\). Since \(P'\) computes \(G\), the last \(k\) bits of \(\sigma\) (and hence of \(z\sigma\)) equal \(G(F(x))\), which is exactly what it means for \(P''\) to compute \(G\circ F\). We will outline the proof of Claims 1 and 2:

**Proof outline of claim 1:** The proof follows because the lines
\(L''_0,\ldots,L''_{s-1}\) are identical to the lines of \(L\), and hence
they only touch the first \(t\) variables, and leave the remaining \(t'-m\)
equal to \(0\).

**Proof outline of claim 2:** By Claim 1, by step \(s\) the configuration
has the value \(F(x)\) on the variables \(t-m,\ldots,t-1\). Since the lines
\(L''_s,\ldots, L''_{s+s'-1}\) only touch the variables
\(t-m,\ldots,t+t'-1-m\), the last \(t'\) variables correspond to the same
configuration as running the program \(P'\) on \(F(x)\).

To make these outlines into full proofs, we need to use *induction*, so
we can argue that for every \(\ell\), if we maintained these properties up
to step \(\ell-1\), then they are maintained in step \(\ell\) as well. We
omit the full inductive proof, though working it out for yourself can be
an excellent exercise in getting comfortable with such arguments.
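The construction of \(P''\) can also be checked experimentally on small cases. The following Python sketch is only an illustration under stated assumptions: a program is represented as a list of triples `(i, j, k)` meaning "variable \(i\) := variable \(j\) NAND variable \(k\)", inputs occupy the first variables, outputs occupy the last ones, and all other variables start at zero. The helper names `run` and `compose` are hypothetical and not part of the lecture's notation.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def run(lines, num_vars, x):
    """Execute (i, j, k) triples, each meaning var_i := var_j NAND var_k.
    Assumption: inputs sit in the first len(x) variables, the rest start at 0."""
    state = list(x) + [0] * (num_vars - len(x))
    for i, j, k in lines:
        state[i] = nand(state[j], state[k])
    return state

def compose(P, t, P2, t2, m):
    """Sequential composition: the last m variables of P (its outputs) are
    shared with the first m variables of P2 (its inputs), so every index
    of P2 is shifted by t - m."""
    shift = t - m
    shifted = [(i + shift, j + shift, k + shift) for i, j, k in P2]
    return P + shifted, t + t2 - m

# P computes F = NAND on two bits: t = 3 variables, output in variable 2.
P, t = [(2, 0, 1)], 3
# P2 computes G = NOT on one bit: t' = 2 variables, output in variable 1.
P2, t2 = [(1, 0, 0)], 2

P_composed, t_composed = compose(P, t, P2, t2, m=1)

for a, b in product([0, 1], repeat=2):
    # In the spirit of Claim 1: after the lines of P, the last t' - m variables are still 0.
    assert run(P, t_composed, (a, b))[t:] == [0] * (t2 - 1)
    # The last variable of the composed program holds G(F(a, b)) = AND(a, b).
    assert run(P_composed, t_composed, (a, b))[-1] == (a & b)
```

Here the composed program has \(t + t' - m = 4\) variables and two lines, and its output is \(NOT(NAND(a,b)) = AND(a,b)\).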

### Example: Adding two-bit numbers

Using composition, we can show how to add *two-bit* numbers. That is,
we compute the function \(ADD_2:\{0,1\}^4\rightarrow\{0,1\}^3\) that takes two
numbers \(x,x'\) each between \(0\) and \(3\) (each represented with two bits
using the binary representation) and outputs their sum, which is a
number between \(0\) and \(6\) that can be represented using three bits. The
grade-school algorithm gives us a way to compute \(ADD_2\) using \(ADD_1\):
we add each digit using \(ADD_1\) and then take care of the
carry. That is, if the two input numbers have the form \(x_0+2x_1\) and
\(x_2+2x_3\), then the output number \(y_0+2y_1+2^2y_2\) can be computed via
the following “pseudocode” (see also Reference:addtwofig):

```
y_0,c_1 := ADD_1(x_0,x_2) // add least significant digits
z_1,c_2 := ADD_1(x_1,x_3) // add second digits
y_1,c'_2 := ADD_1(z_1,c_1) // second output is sum + carry
y_2 := c_2 OR c'_2 // top digit is 1 if one of the top carries is 1
```

To transform this pseudocode into an actual program or circuit, we can
use Reference:seqcompositionthm and Reference:parcompositionthm. That
is, we first compute
\((y_0,c_1,z_1,c_2) = ADD_1 \| ADD_1 (x_0,x_2,x_1,x_3)\), which we can do
in \(10\) lines via Reference:parcompositionthm, then apply \(ADD_1\) to
\((z_1,c_1)\), and finally use the fact that \(OR(a,b)=NAND(NOT(a),NOT(b))\)
and \(NOT(a)=NAND(a,a)\) to compute `c_2 OR c'_2` via three lines of NAND.
Since \(c'_2 = AND(z_1,c_1)\), its negation is simply \(NAND(z_1,c_1)\), so
the code only needs the sum bit of the third \(ADD_1\) (four lines) and
uses \(17\) lines in total. In this code the carry \(c_2\) is named `z'_1`.
The resulting code is the following:

```
// Add a pair of two-bit numbers
// Input: (x_0,x_1) and (x_2,x_3)
// Output: (y_0,y_1,y_2) representing the sum
// x_0 + 2x_1 + x_2 + 2x_3
//
// Operation:
// 1) y_0,c_1 := ADD_1(x_0,x_2):
// add the least significant digits
// c_1 is the "carry"
u := x_0 NAND x_2
v := x_0 NAND u
w := x_2 NAND u
y_0 := v NAND w
c_1 := u NAND u
// 2) z_1,z'_1 := ADD_1(x_1,x_3):
// add second digits
// z'_1 is the "carry"
u := x_1 NAND x_3
v := x_1 NAND u
w := x_3 NAND u
z_1 := v NAND w
z'_1 := u NAND u
// 3) Take care of carry:
// 3a) y_1 = XOR(z_1,c_1)
u := z_1 NAND c_1
v := z_1 NAND u
w := c_1 NAND u
y_1 := v NAND w
// 3b) y_2 = z'_1 OR (z_1 AND c_1)
// = NAND(NOT(z'_1), NAND(z_1,c_1))
u := z'_1 NAND z'_1
v := z_1 NAND c_1
y_2 := u NAND v
```

For example, the computation of the deep fact that \(2+3=5\) corresponds to running this program on the inputs \((0,1,1,1)\) which will result in the following trace:

```
Executing step 1: "u := x_0 NAND x_2" x_0 = 0, x_2 = 1, u is assigned 1,
Executing step 2: "v := x_0 NAND u" x_0 = 0, u = 1, v is assigned 1,
Executing step 3: "w := x_2 NAND u" x_2 = 1, u = 1, w is assigned 0,
Executing step 4: "y_0 := v NAND w" v = 1, w = 0, y_0 is assigned 1,
Executing step 5: "c_1 := u NAND u" u = 1, u = 1, c_1 is assigned 0,
Executing step 6: "u := x_1 NAND x_3" x_1 = 1, x_3 = 1, u is assigned 0,
Executing step 7: "v := x_1 NAND u" x_1 = 1, u = 0, v is assigned 1,
Executing step 8: "w := x_3 NAND u" x_3 = 1, u = 0, w is assigned 1,
Executing step 9: "z_1 := v NAND w" v = 1, w = 1, z_1 is assigned 0,
Executing step 10: "z'_1 := u NAND u" u = 0, u = 0, z'_1 is assigned 1,
Executing step 11: "u := z_1 NAND c_1" z_1 = 0, c_1 = 0, u is assigned 1,
Executing step 12: "v := z_1 NAND u" z_1 = 0, u = 1, v is assigned 1,
Executing step 13: "w := c_1 NAND u" c_1 = 0, u = 1, w is assigned 1,
Executing step 14: "y_1 := v NAND w" v = 1, w = 1, y_1 is assigned 0,
Executing step 15: "u := z'_1 NAND z'_1" z'_1 = 1, z'_1 = 1, u is assigned 0,
Executing step 16: "v := z_1 NAND c_1" z_1 = 0, c_1 = 0, v is assigned 1,
Executing step 17: "y_2 := u NAND v" u = 0, v = 1, y_2 is assigned 1,
Output is y_0=1, y_1=0, y_2=1
```
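The trace above covers a single input. As a sanity check on all \(16\) inputs, here is a minimal Python sketch of an interpreter for lines of the form `a := b NAND c`. The function names `run_nand` and `add2` are hypothetical, and the convention that unassigned variables default to zero is an assumption of this sketch.

```python
from itertools import product

# The 17 NAND lines of the two-bit adder, copied without comments.
ADD2_SOURCE = """
u := x_0 NAND x_2
v := x_0 NAND u
w := x_2 NAND u
y_0 := v NAND w
c_1 := u NAND u
u := x_1 NAND x_3
v := x_1 NAND u
w := x_3 NAND u
z_1 := v NAND w
z'_1 := u NAND u
u := z_1 NAND c_1
v := z_1 NAND u
w := c_1 NAND u
y_1 := v NAND w
u := z'_1 NAND z'_1
v := z_1 NAND c_1
y_2 := u NAND v
"""

def run_nand(source, inputs):
    """Execute NAND lines one by one; `inputs` maps variable names to bits."""
    env = dict(inputs)
    for line in source.strip().splitlines():
        line = line.split("//")[0].strip()  # drop comments, if any
        if not line:
            continue
        target, expr = line.split(":=")
        left, right = expr.split("NAND")
        a = env.get(left.strip(), 0)   # assumption: unassigned variables are 0
        b = env.get(right.strip(), 0)
        env[target.strip()] = 1 - (a & b)
    return env

def add2(x0, x1, x2, x3):
    """Return y_0 + 2*y_1 + 4*y_2 as an integer."""
    env = run_nand(ADD2_SOURCE, {"x_0": x0, "x_1": x1, "x_2": x2, "x_3": x3})
    return env["y_0"] + 2 * env["y_1"] + 4 * env["y_2"]

# Exhaustive check against ordinary integer addition.
for x0, x1, x2, x3 in product([0, 1], repeat=4):
    assert add2(x0, x1, x2, x3) == (x0 + 2 * x1) + (x2 + 2 * x3)

print(add2(0, 1, 1, 1))  # 2 + 3, prints 5
```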

### Composition in NAND programs

We can generalize the above examples to handle not just sequential and
parallel but all forms of *composition*. That is, if we have an \(s\)-line
program \(P\) that computes the function \(F\), and a program \(P'\) that can
compute the function \(G\) using \(t\) standard NAND lines and \(k\) calls to
a “black box” for computing \(F\), then we can obtain a \(t + ks\) line
program \(P''\) to compute \(G\) (without any “black boxes”) by replacing
every call to \(F\) in \(P'\) with a copy of \(P\) (while appropriately
renaming the variables).
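The replacement of black-box calls by renamed copies can be sketched in Python as follows. This is an illustration only: the tuple encoding, the `("call", outputs, arguments)` marker, and the names `expand_calls` and `evaluate` are assumptions of this sketch rather than notation from the lecture, and it assumes that a call's output variables differ from its argument variables.

```python
from itertools import product

def expand_calls(outer, inner, inner_inputs, inner_outputs):
    """Replace every ("call", outs, args) entry of `outer` by a renamed copy
    of `inner`. Plain lines are (target, a, b), meaning target := a NAND b."""
    result = []
    calls = 0
    for entry in outer:
        if entry[0] != "call":
            result.append(entry)
            continue
        _, outs, args = entry
        # Inputs and outputs of the copy are wired to the caller's variables.
        rename = dict(zip(inner_inputs, args))
        rename.update(zip(inner_outputs, outs))
        for line in inner:
            # Workspace variables get a fresh name for each call.
            result.append(tuple(rename.setdefault(v, f"{v}#{calls}") for v in line))
        calls += 1
    return result

def evaluate(lines, inputs):
    """Run plain NAND lines; unassigned variables default to 0 (an assumption)."""
    env = dict(inputs)
    for target, a, b in lines:
        env[target] = 1 - (env.get(a, 0) & env.get(b, 0))
    return env

# P: a 4-line program (s = 4) computing F = XOR of a and b into y.
XOR = [("u", "a", "b"), ("v", "a", "u"), ("w", "b", "u"), ("y", "v", "w")]

# P': t = 1 standard NAND line and k = 2 calls to the black box for F.
# It computes G(x_0, x_1, x_2) = NOT(x_0 XOR x_1 XOR x_2).
OUTER = [
    ("call", ["r"], ["x_0", "x_1"]),
    ("call", ["p"], ["r", "x_2"]),
    ("y_0", "p", "p"),
]

expanded = expand_calls(OUTER, XOR, ["a", "b"], ["y"])
assert len(expanded) == 1 + 2 * 4  # t + k*s lines

for x0, x1, x2 in product([0, 1], repeat=3):
    env = evaluate(expanded, {"x_0": x0, "x_1": x1, "x_2": x2})
    assert env["y_0"] == 1 - (x0 ^ x1 ^ x2)
```

The fresh per-call names play the role of the "appropriate renaming": without them, the second copy of \(P\) could read stale workspace values left behind by the first.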

## Lecture summary

- We can define the notion of computing a function via a simplified “programming language”, where computing a function \(F\) in \(T\) steps would correspond to having a \(T\)-line NAND program that computes \(F\).
- An equivalent formulation is that a function is computable by a NAND program if it can be computed by a NAND circuit.

## Exercises

Which of the following statements is false?

a. There is a NAND program to add two \(4\)-bit numbers that has at most
\(100\) lines.

b. Every NAND program to add two \(4\)-bit numbers has at most \(100\)
lines.

c. Every NAND program to add two \(4\)-bit numbers has at least \(5\) lines.

Write a NAND program that adds two \(3\)-bit numbers.

Prove Reference:paritycircuitthm. **Hint:** Prove by induction that for every \(n>1\) which is a
power of two, \(XOR_n \in SIZE(4(n-1))\). Then use this to prove the
result for every \(n\).

## Bibliographical notes

The exact notion of “NAND programs” we use is nonstandard, but these are
equivalent to standard models in the literature such as *straightline
programs* and *Boolean circuits*.

A historical review of calculating machines can be found in Chapter I of the 1946 “operating manual” for the Harvard Mark I computer, written by Lieutenant Grace Murray Hopper and the staff of the Harvard Computation Laboratory.

## Further explorations

Some topics related to this lecture that might be accessible to advanced students include:

(to be completed)