# Physical implementations of NAND programs

- Understand how NAND programs can map to physical processes in a variety of ways.
- Learn the model of *Boolean circuits* and become proficient in moving between the description of a NAND program as code and as a circuit or *labeled graph*.
- See that NAND is a *universal basis* for circuits, and see examples of universal and non-universal families of gates.
- Understand the *physical extended Church-Turing thesis*, which says that NAND programs capture *all* feasible computation in the physical world, and its physical and philosophical implications.

“In existing digital computing devices various mechanical or electrical devices have been used as elements: Wheels, which can be locked … which on moving from one position to another transmit electric pulses that may cause other similar wheels to move; single or combined telegraph relays, actuated by an electromagnet and opening or closing electric circuits; combinations of these two elements;—and finally there exists the plausible and tempting possibility of using vacuum tubes”, John von Neumann, first draft of a report on the EDVAC, 1945

We have defined NAND programs as a model for computation, but is this model only a mathematical abstraction, or is it connected in some way to physical reality? For example, if a function \(F:\{0,1\}^n \rightarrow \{0,1\}\) can be computed by a NAND program of \(s\) lines, is it possible, given an actual input \(x\in \{0,1\}^n\), to compute \(F(x)\) in the real world using an amount of resources that is roughly proportional to \(s\)?

In some sense, we already know that the answer to this question is
**Yes**. We have seen a *Python* program that can evaluate NAND
programs, and so if we have a NAND program \(P\), we can use any computer
with Python installed on it to evaluate \(P\) on inputs of our choice. But
do we really need modern computers and programming languages to run NAND
programs? And can we understand more directly how we can map such
programs to actual physical processes that produce an output from an
input? This is the content of this lecture.
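To make the first point concrete, here is a minimal Python sketch of such an evaluator. The parsing conventions and variable names are illustrative: we assume the straight-line `foo := bar NAND baz` syntax used in previous lectures, with inputs named `x_0`, `x_1`, ..., outputs named `y_0`, `y_1`, ..., and unassigned variables defaulting to \(0\).

```python
def eval_nand_program(program, x):
    """Evaluate a straight-line NAND program on a list of input bits x.
    Lines have the form 'foo := bar NAND baz'; inputs are x_0, x_1, ...,
    outputs are y_0, y_1, ..., and unassigned variables default to 0."""
    table = {'x_%d' % i: b for i, b in enumerate(x)}
    for line in program.strip().splitlines():
        target, expr = line.split(':=')
        a, b = (v.strip() for v in expr.split('NAND'))
        # NAND(a, b) = 1 - a*b
        table[target.strip()] = 1 - table.get(a, 0) * table.get(b, 0)
    m = sum(1 for name in table if name.startswith('y_'))
    return [table['y_%d' % j] for j in range(m)]

# A 4-line NAND program for XOR_2 (variable names are illustrative):
xor2 = """u := x_0 NAND x_1
v := x_0 NAND u
w := x_1 NAND u
y_0 := v NAND w"""
print([eval_nand_program(xor2, [a, b]) for a in (0, 1) for b in (0, 1)])
# [[0], [1], [1], [0]]
```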

We will also talk about the following “dual” question. Suppose we have
some way to compute a function \(F:\{0,1\}^n \rightarrow \{0,1\}\) using
roughly \(s\) units of “physical resources” such as material, energy,
time, etc. Does this mean that there is also a NAND program that computes
\(F\) using a number of lines not much bigger than \(s\)? This might
seem like a wishful fantasy, but we will see that the answer to this
question might be (up to some important caveats) essentially **Yes** as
well.

## Physical implementation of computing devices

*Computation* is an abstract notion that is distinct from its physical
*implementations*. While most modern computing devices are obtained by
mapping logical gates to semiconductor-based transistors, over history
people have computed using a huge variety of mechanisms, including
mechanical systems, gases and liquids (known as *fluidics*), biological and
chemical processes, and even living creatures (e.g., see
Reference:crabfig or this
video for how crabs or
slime mold can be used to do computations).

In this lecture we review some of these implementations, both so you can
get an appreciation of how it is possible to directly translate NAND
programs to the physical world without going through the entire stack
of architecture, operating systems, compilers, etc., and to
emphasize that silicon-based processors are by no means the only way to
perform computation. Indeed, as we will see much later in this course, a
very exciting recent line of work involves using different media for
computation that would allow us to take advantage of *quantum mechanical
effects* to enable different types of algorithms.

## Transistors and physical logic gates

A *transistor* can be thought of as an electric circuit with two inputs,
known as the *source* and the *gate*, and an output, known as the *sink*.
The gate controls whether current flows from the source to the sink. In a
*standard transistor*, if the gate is “ON” then current can flow from
the source to the sink, and if it is “OFF” then it can’t. In a
*complementary transistor* this is reversed: if the gate is “OFF” then
current can flow from the source to the sink, and if it is “ON” then it
can’t.

There are several ways to implement the logic of a transistor. For
example, we can use faucets to implement it using water pressure (e.g.,
Reference:transistor-water-fig). (This might seem like a curiosity, but
there is in fact a field known as fluidics concerned with implementing
logical operations using liquids or gases. Some of the motivations
include operating in extreme environmental conditions such as in space
or on a battlefield, where standard electronic equipment would not
survive.) However, the standard implementation uses electrical current.
One of the original implementations used *vacuum tubes*. As its name
implies, a vacuum tube is a tube containing nothing (i.e., a vacuum), in
which a priori electrons could freely flow from the source (a wire) to
the sink (a plate). However, there is a gate (a grid) between the two,
and modulating its voltage can block the flow of electrons.

Early vacuum tubes were roughly the size of lightbulbs (and looked very
much like them too). In the 1950s they were supplanted by
*transistors*, which implement the same logic using *semiconductors*:
materials that normally do not conduct electricity but whose
conductivity can be modified and controlled by inserting impurities
(“doping”) and applying an external electric field (this is known as the
*field effect*). In the 1960s computers started to be built using
*integrated circuits*, which enabled much greater density. In 1965,
Gordon Moore predicted that the number of transistors per circuit would
double every year (see Reference:moorefig), and that this would lead to
“such wonders as home computers —or at least terminals connected to a
central computer— automatic controls for automobiles, and personal
portable communications equipment”. Since then, (adjusted versions of)
this so-called “Moore’s law” has been running strong, though exponential
growth cannot be sustained forever, and some physical limitations are
already becoming
apparent.

## Gates and circuits

We can use transistors to implement a *NAND gate*, which would be a
system with two input wires \(x,y\) and one output wire \(z\), such that if
we identify high voltage with “\(1\)” and low voltage with “\(0\)”, then the
wire \(z\) will equal “\(1\)” if and only if the NAND of the values of
the wires \(x\) and \(y\) is \(1\) (see Reference:transistor-nand-fig).
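Though transistors are analog physical devices, their logical behavior is easy to simulate. The following Python sketch (the function names are ours; the wiring mirrors the standard CMOS construction, with a parallel “pull-up” network of complementary transistors and a serial “pull-down” network of standard ones) shows how four transistors yield a NAND gate:

```python
def standard_transistor(gate):
    """Conducts (True) when its gate is ON (1)."""
    return gate == 1

def complementary_transistor(gate):
    """Conducts (True) when its gate is OFF (0)."""
    return gate == 0

def nand_gate(x, y):
    # Pull-up network: two complementary transistors *in parallel*
    # connect the output to high voltage; it conducts iff x=0 or y=0.
    pull_up = complementary_transistor(x) or complementary_transistor(y)
    # Pull-down network: two standard transistors *in series* connect
    # the output to ground; it conducts iff x=1 and y=1.
    pull_down = standard_transistor(x) and standard_transistor(y)
    # For every input exactly one network conducts, so the output wire
    # is at high voltage ("1") if and only if NAND(x, y) = 1.
    assert pull_up != pull_down
    return 1 if pull_up else 0

print([nand_gate(x, y) for x in (0, 1) for y in (0, 1)])  # [1, 1, 1, 0]
```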

More generally, we can use transistors to implement the model of
*Boolean circuits*. We list the formal definition below, but let us
start with the informal one:

Let \(B\) be some set of functions (known as “gates”) from \(\{0,1\}^k\) to \(\{0,1\}\). A *Boolean circuit* with the basis \(B\) is obtained by connecting “gates” that compute functions in \(B\) by “wires”, where each gate has \(k\) wires going into it and one wire going out of it. We have \(n\) special wires known as the “input wires” and \(m\) special wires known as the “output wires”. To compute a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) using a circuit, we feed the bits of \(x\) to the \(n\) input wires; each gate then computes the corresponding function, and we “read off” the output \(y\in \{0,1\}^m\) from the \(m\) output wires.

The number \(k\) is known as the *arity* of the basis \(B\). We think of \(k\)
as a small number (such as \(k=2\) or \(k=3\)), and so the idea behind a
Boolean circuit is that we can compute complex functions by combining
the simple components, namely the functions in \(B\). It turns
out that NAND programs correspond to circuits where the basis is the
single function \(NAND:\{0,1\}^2 \rightarrow \{0,1\}\). We now show this
more formally.

## Representing programs as graphs

We now define NAND programs as circuits, using the notion of *directed
acyclic graphs* (DAGs).

If you are not comfortable with the definitions of graphs, and in particular directed acyclic graphs (DAGs), now would be a great time to go back to the “mathematical background” lecture, as well as some of the resources here, and review these notions.

A *NAND circuit* with \(n\) inputs and \(m\) outputs is a labeled directed
acyclic graph (DAG) in which every vertex has in-degree at most two. We
require that there are \(n\) vertices with in-degree zero, known as *input
variables*, that are labeled with `x_`\(\expr{i}\) for \(i\in [n]\). Every
vertex apart from the input variables is known as a *gate*. We require
that there are \(m\) vertices of out-degree zero, known as the *output
gates*, that are labeled with `y_`\(\expr{j}\) for \(j\in [m]\). While
not all vertices are labeled, no two vertices get the same label. We
denote the circuit as \(C=(V,E,L)\) where \(V,E\) are the vertices and edges
of the circuit, and \(L:V \rightarrow_p S\) is the (partial) one-to-one
labeling function that maps vertices into the set
\(S=\{\)`x_0`\(,\ldots,\)`x_`\(\expr{n-1}\),`y_0`\(,\ldots,\)`y_`\(\expr{m-1}\)\(\}\).
The *size* of a circuit \(C\), denoted by \(|C|\), is the number of gates
that it contains.

The definition of NAND circuits is not ultimately that complicated, but may take a second or third read to fully parse. It might help to look at Reference:XORcircuitfig, which describes the NAND circuit that corresponds to the 4-line NAND program we presented above for the \(XOR_2\) function.

A NAND circuit corresponds to computation in the following way. To
compute some output on an input \(x\in \{0,1\}^n\), we start by assigning
to the input vertex labeled with `x_`\(\expr{i}\) the value \(x_i\), and
then proceed by assigning to every gate \(v\) the value that is the NAND
of the values assigned to its in-neighbors (if it has fewer than two
in-neighbors, we replace the values of the missing neighbors by zero).
The output \(y\in \{0,1\}^m\) corresponds to the values assigned to the
output gates, with \(y_j\) equal to the value assigned to the gate labeled
`y_`\(\expr{j}\) for every \(j\in [m]\).
Formally, this is defined as follows:

Let \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and let \(C=(V,E,L)\) be a NAND
circuit with \(n\) inputs and \(m\) outputs. We say that *\(C\) computes \(F\)*
if for every \(x\in \{0,1\}^n\) there is a map \(Z:V \rightarrow \{0,1\}\),
such that if \(y=F(x)\) then:

* For every \(i\in [n]\), if \(v\) is labeled with `x_`\(\expr{i}\) then
\(Z(v)=x_i\).

* For every \(j\in[m]\), if \(v\) is labeled with `y_`\(\expr{j}\) then
\(Z(v)=y_j\).

* For every gate \(v\) with in-neighbors \(u,w\), if \(a=Z(u)\) and \(b=Z(w)\),
then \(Z(v)=NAND(a,b)\). (If \(v\) has fewer than two in-neighbors then we
replace either \(b\) or both \(a\) and \(b\) with zero in the condition
above.)

You should make sure you understand *why* Reference:NANDcirccomputedef
captures the informal description above. This might require reading the
definition a second or third time, but it will be crucial for the rest of
this course. Moreover, a priori it is not clear that for every circuit
\(C\) and assignment \(x\) there is a map \(Z:V \rightarrow \{0,1\}\) that
satisfies the conditions of Reference:NANDcirccomputedef. However, this
follows from Reference:circuitprogequivthm below.
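One way to internalize Reference:NANDcirccomputedef is to implement it. The following Python sketch (the representation of a circuit by in-neighbor lists and a label dictionary is our own choice) computes the map \(Z\) by recursing from each output gate to its in-neighbors, and checks it on the XOR circuit of Reference:XORcircuitfig, with an illustrative vertex numbering:

```python
def eval_nand_circuit(in_neighbors, labels, x, m):
    """Compute the map Z of the definition above.  `in_neighbors`
    maps each gate to the list (of length at most 2) of its
    in-neighbors; `labels` maps some vertices to 'x_i'/'y_j' labels."""
    Z = {}

    def value(v):
        if v not in Z:
            if v in in_neighbors:                 # v is a gate
                ins = [value(u) for u in in_neighbors[v]]
                while len(ins) < 2:
                    ins.append(0)                 # missing in-neighbors are 0
                Z[v] = 1 - ins[0] * ins[1]        # NAND(a, b) = 1 - a*b
            else:                                 # v is an input vertex
                Z[v] = x[int(labels[v][2:])]      # strip the 'x_' prefix
        return Z[v]

    out = {lab: value(v) for v, lab in labels.items() if lab.startswith('y_')}
    return [out['y_%d' % j] for j in range(m)]

# The XOR_2 circuit: vertices 0,1 are the inputs, 2,3,4,5 are gates,
# and vertex 5 is the output gate.
in_neighbors = {2: [0, 1], 3: [0, 2], 4: [1, 2], 5: [3, 4]}
labels = {0: 'x_0', 1: 'x_1', 5: 'y_0'}
print([eval_nand_circuit(in_neighbors, labels, [a, b], 1)
       for a in (0, 1) for b in (0, 1)])   # [[0], [1], [1], [0]]
```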

The following theorem says that these two notions of computing a function are actually equivalent: we can transform a NAND program into a NAND circuit computing the same function, and vice versa.

For every \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(s\in \N\), \(F\) can be computed by an \(s\)-line NAND program if and only if \(F\) can be computed by an \(n\)-input \(m\)-output NAND circuit of \(s\) gates.

The idea behind the proof is simple (see Reference:NANDcircuit_transfig
for an example). Just like we did for the XOR program, if we have a NAND
program \(P\) of \(s\) lines, \(n\) inputs, and \(m\) outputs, we can transform
it into a NAND circuit with \(n\) inputs and \(s\) gates (i.e., a graph of
\(n+s\) vertices, \(n\) of which are *sources*), where each gate corresponds
to a line in the program \(P\). If line \(\ell\) involves the NAND of two
variables `foo` and `bar`, and \(\ell'\) and \(\ell''\) are the lines
where `foo` and `bar` were last assigned a value, then we add edges
going into the gate corresponding to \(\ell\) from the gates corresponding
to \(\ell',\ell''\). (If one of the variables was an input variable, then
we add an edge from that variable; if one of them was uninitialized,
then we add no edge, and use the convention that it amounts to
defaulting to zero.) In the other direction, we can transform a NAND
circuit \(C\) of \(n\) inputs, \(m\) outputs and \(s\) gates into an \(s\)-line
program by essentially inverting this process. For every gate in the
circuit, we will have a line in the program which assigns to a variable
the NAND of the variables corresponding to the in-neighbors of this
gate. If the gate is an output gate labeled with `y_`\(\expr{j}\) then the
corresponding line will assign the value to the variable `y_`\(\expr{j}\);
otherwise we will assign the value to a fresh “workspace” variable. We
now show the formal proof.

We start with the “only if” direction. That is, we show how to transform
a NAND program to a circuit. Suppose that \(P\) is an \(s\)-line program
that computes \(F\). We will build a NAND circuit \(C=(V,E,L)\) that
computes \(F\) as follows. The vertex set \(V\) will have the \(n+s\) elements
\(\{ (0,0), \ldots, (0,n-1),(1,0),\ldots,(1,s-1) \}\). That is, it will
have \(n\) vertices of the form \((0,i)\) for \(i\in [n]\) (corresponding to
the \(n\) inputs), and \(s\) vertices of the form \((1,\ell)\) for \(\ell\in [s]\)
(corresponding to the lines in the program). For every line \(\ell\) in the
program \(P\) of the form `foo := bar NAND baz`, we put edges in the graph
of the form \(\overrightarrow{(1,\ell')\;(1,\ell)}\) and
\(\overrightarrow{(1,\ell'')\;(1,\ell)}\), where \(\ell'\) and \(\ell''\) are
the last lines before \(\ell\) in which the variables `bar` and `baz` were
assigned a value. If the variable `bar` and/or `baz` was not assigned a
value prior to the \(\ell\)-th line and is not an input variable then we
don’t add a corresponding edge. If the variable `bar` and/or `baz` is an
input variable `x_`\(\expr{i}\) then we add the edge
\(\overrightarrow{(0,i)\;(1,\ell)}\). We label the vertices of the form
\((0,i)\) with `x_`\(\expr{i}\) for every \(i\in [n]\). For every \(j\in[m]\),
let \(\ell\) be the last line in which the variable `y_`\(\expr{j}\) is
assigned a value (as noted in the appendix, valid NAND programs must
assign a value to all their output variables), and label the vertex
\((1,\ell)\) with `y_`\(\expr{j}\). Note that the vertices of the form
\((0,i)\) have in-degree zero, and all edges of the form
\(\overrightarrow{(1,\ell')\;(1,\ell)}\) satisfy \(\ell>\ell'\). Hence this
graph is a DAG, as any cycle would have to contain at least one edge
going from a vertex of the form \((1,\ell)\) to a vertex of the form
\((1,\ell')\) for \(\ell'<\ell\) (can you see why?). Also, since we don’t
allow a variable of the form `y_`\(\expr{j}\) on the right-hand side of a
NAND operation, the output vertices have out-degree zero.

To complete the proof of the “only if” direction, we need to show that
the circuit \(C\) we constructed computes the same function \(F\) as the
program \(P\) we were given. Indeed, let \(x\in \{0,1\}^n\) and \(y = F(x)\).
For every \(\ell\), let \(z_\ell\) be the value that is assigned by the
\(\ell\)-th line in the execution of \(P\) on input \(x\). Now, as per
Reference:NANDcirccomputedef, define the map \(Z:V \rightarrow \{0,1\}\)
as follows: \(Z((0,i))=x_i\) for \(i\in [n]\) and \(Z((1,\ell))=z_\ell\) for
every \(\ell \in [s]\). Then, by our construction of the circuit, the map
satisfies the condition that for every vertex \(v\) with in-neighbors \(u\) and
\(w\), the value \(Z(v)\) is the NAND of \(Z(u)\) and \(Z(w)\) (replacing
missing neighbors with the value \(0\)), and hence in particular for every
\(j\in [m]\), the value assigned in the last line that touches
`y_`\(\expr{j}\) equals \(y_j\). Thus the circuit \(C\) does compute the same
function \(F\).
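Before turning to the other direction, here is a Python sketch of the construction we just analyzed (the parsing conventions are illustrative; vertices are named \((0,i)\) and \((1,\ell)\) as in the proof):

```python
def program_to_circuit(lines, n, m):
    """Transform a NAND program, given as a list of strings of the
    form 'foo := bar NAND baz', into a circuit (edges, labels),
    following the construction in the proof above."""
    # last[var] = the vertex where var was last assigned; input
    # variables are "assigned" by the input vertices (0, i).
    last = {'x_%d' % i: (0, i) for i in range(n)}
    edges = []
    labels = {(0, i): 'x_%d' % i for i in range(n)}
    for l, line in enumerate(lines):
        target, expr = (s.strip() for s in line.split(':='))
        for var in (s.strip() for s in expr.split('NAND')):
            if var in last:
                edges.append((last[var], (1, l)))
            # a variable never assigned before contributes no edge
            # (by convention it defaults to zero)
        last[target] = (1, l)
    for j in range(m):
        # label the last line assigning each output variable
        labels[last['y_%d' % j]] = 'y_%d' % j
    return edges, labels

edges, labels = program_to_circuit(
    ['u := x_0 NAND x_1', 'v := x_0 NAND u',
     'w := x_1 NAND u', 'y_0 := v NAND w'], n=2, m=1)
print(labels)  # {(0, 0): 'x_0', (0, 1): 'x_1', (1, 3): 'y_0'}
```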

For the “if” direction, we need to transform an \(s\)-gate circuit
\(C=(V,E,L)\) that computes \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) into an
\(s\)-line NAND program \(P\) that computes the same function. We start by
doing a topological
sort of the graph
\(C\). That is, we sort the vertex set \(V\) as \(\{v_0,\ldots,v_{n+s-1} \}\)
such that if \(\overrightarrow{v_i v_j} \in E\) then \(i < j\). Such a sorting
can be found for every DAG (see also Reference:topologicalsortex).
Moreover, because the input vertices of \(C\) are “sources” (have
in-degree zero), we can ensure they are placed first in this sorting,
and moreover that for every \(i\in [n]\), \(v_i\) is the input vertex labeled
with `x_`\(\expr{i}\).

Now for \(\ell=0,1,\ldots,n+s-1\) we will define a variable \(var(\ell)\) in
our resulting program as follows: If \(\ell<n\) then \(var(\ell)\) equals
`x_`\(\expr{\ell}\). If \(v_\ell\) is an output gate labeled with
`y_`\(\expr{j}\) then \(var(\ell)\) equals `y_`\(\expr{j}\). Otherwise
\(var(\ell)\) will be a temporary workspace variable
`temp_`\(\expr{\ell-n}\). Our program \(P\) will have \(s\) lines, where for
every \(k\in [s]\), if the in-neighbors of \(v_{n+k}\) are \(v_i\) and \(v_j\)
then the \(k\)-th line in the program will be \(var(n+k)\) `:=` \(var(i)\)
`NAND` \(var(j)\). If \(v_{n+k}\) has fewer than two in-neighbors then we
replace the corresponding variable with the variable `zero` (which is
never set to any value and hence retains its default value of \(0\)).

To complete the proof of the “if” direction we need to show that the
program \(P\) we constructed computes the same function \(F\) as the circuit
\(C\) we were given. Indeed, let \(x\in \{0,1\}^n\) and \(y=F(x)\). Since \(C\)
computes \(F\), there is a map \(Z:V \rightarrow \{0,1\}\) as per
Reference:NANDcirccomputedef. We claim that if we run the program \(P\) on
input \(x\), then for every \(k\in [s]\) the value assigned by the \(k\)-th
line equals \(Z(v_{n+k})\). Indeed, by construction the value
assigned in the \(k\)-th line is the NAND of the values
assigned to the in-neighbors of \(v_{n+k}\). Hence in particular if
\(v_{n+k}\) is the output gate labeled `y_`\(\expr{j}\) then this value will
equal \(y_j\), meaning that on input \(x\) our program will output \(y=F(x)\).
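The “if” direction can likewise be made concrete in a few lines of Python. The sketch below assumes, as the proof arranges via topological sorting, that the vertices are already numbered \(0,\ldots,n+s-1\) in topological order with the inputs first:

```python
def circuit_to_program(n, s, m, in_neighbors, labels):
    """Transform an n-input, m-output NAND circuit with s gates into
    an s-line NAND program, following the proof above.  Assumes the
    vertices 0,...,n+s-1 are already in topological order, inputs first."""
    def var(l):
        if l < n:
            return 'x_%d' % l
        if labels.get(l, '').startswith('y_'):
            return labels[l]                  # an output gate
        return 'temp_%d' % (l - n)            # a workspace variable

    program = []
    for k in range(s):
        ins = [var(u) for u in in_neighbors.get(n + k, [])]
        while len(ins) < 2:
            # `zero` is never assigned, so it keeps its default value 0
            ins.append('zero')
        program.append('%s := %s NAND %s' % (var(n + k), ins[0], ins[1]))
    return program

# The XOR_2 circuit again: inputs 0,1; gates 2,3,4,5; vertex 5 is y_0.
print(circuit_to_program(2, 4, 1,
                         {2: [0, 1], 3: [0, 2], 4: [1, 2], 5: [3, 4]},
                         {0: 'x_0', 1: 'x_1', 5: 'y_0'}))
# ['temp_0 := x_0 NAND x_1', 'temp_1 := x_0 NAND temp_0',
#  'temp_2 := x_1 NAND temp_0', 'y_0 := temp_1 NAND temp_2']
```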

## Composition from graphs

Given Reference:circuitprogequivthm, we can reprove our composition theorems in the circuit formalism, which has the advantage of making them more intuitive. That is, we can prove Reference:seqcompositionthm and Reference:parcompositionthm by showing how to transform circuits for \(F\) and \(G\) into circuits for \(G \circ F\) and \(F \oplus G\). This is what we do now:

If \(C,D\) are NAND circuits such that \(C\) computes \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(D\) computes \(G:\{0,1\}^m \rightarrow \{0,1\}^k\) then there is a circuit \(E\) of size \(|C|+|D|\) computing the function \(G\circ F:\{0,1\}^n \rightarrow \{0,1\}^k\).

Let \(C\) be the \(n\)-input \(m\)-output circuit computing \(F\) and \(D\) be the
\(m\)-input \(k\)-output circuit computing \(G\). The circuit to compute
\(G \circ F\) is illustrated in Reference:serialcompfig. We simply “stack”
\(D\) after \(C\), obtaining a combined circuit with \(n\) inputs and
\(|C|+|D|\) gates. The gates of \(C\) remain the same, except that we
identify the output gates of \(C\) with the input gates of \(D\). That is,
for every edge that connected the \(i\)-th input of \(D\) to a gate \(v\) of
\(D\), we now connect to \(v\) the output gate of \(C\) corresponding to
`y_`\(\expr{i}\) instead. After doing so, we remove the output labels from
\(C\) and keep only the outputs of \(D\). For every input \(x\), if we execute
the composed circuit on \(x\) (i.e., compute a map \(Z\) from the vertices
to \(\{0,1\}\) as per Reference:NANDcirccomputedef), then the output gates
of \(C\) will get the values corresponding to \(F(x)\) and hence the output
gates of \(D\) will have the value \(G(F(x))\).
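For concreteness, here is a Python sketch of this “stacking” operation, using the same (in-neighbors, labels) representation as in the earlier evaluation sketch; the relabeling arithmetic is ours:

```python
def compose_circuits(C, D, n, m):
    """Stack circuit D after circuit C, identifying C's output gates
    with D's input vertices (a sketch of the construction above).
    Circuits are (in_neighbors, labels) pairs as in the evaluation
    sketch; D's inputs are its vertices 0,...,m-1."""
    C_in, C_lab = C
    D_in, D_lab = D
    size_C = n + len(C_in)        # assumes every gate of C is a key of C_in
    # c_out[i] = the vertex of C's output gate labeled 'y_i'
    c_out = {int(lab[2:]): v for v, lab in C_lab.items() if lab[0] == 'y'}

    def shift(u):                 # map a vertex of D to a vertex of E
        return c_out[u] if u < m else u - m + size_C

    E_in = dict(C_in)
    E_in.update({shift(v): [shift(u) for u in ins] for v, ins in D_in.items()})
    E_lab = {v: lab for v, lab in C_lab.items() if lab[0] == 'x'}
    E_lab.update({shift(v): lab for v, lab in D_lab.items() if lab[0] == 'y'})
    return E_in, E_lab

# Example: composing two one-gate NOT circuits (NOT(x) = NAND(x,x))
# yields a two-gate circuit computing the identity on one bit.
NOT = ({1: [0, 0]}, {0: 'x_0', 1: 'y_0'})
print(compose_circuits(NOT, NOT, n=1, m=1))
# ({1: [0, 0], 2: [1, 1]}, {0: 'x_0', 2: 'y_0'})
```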

If \(C,D\) are NAND circuits such that \(C\) computes \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(D\) computes \(G:\{0,1\}^{n'} \rightarrow \{0,1\}^{m'}\) then there is a circuit \(E\) of size \(|C|+|D|\) computing the function \(F\oplus G : \{0,1\}^{n+n'} \rightarrow \{0,1\}^{m+m'}\).

If \(C,D\) are circuits that compute \(F,G\) then we can transform them into a
circuit \(E\) that computes \(F \oplus G\) as in
Reference:parallelcompositioncircfig. The circuit \(E\) simply consists of
two disjoint copies of the circuits \(C\) and \(D\), where we modify the
labeling of the inputs of \(D\) from `x_`\(0\),\(\ldots\),`x_`\(n'-1\) to
`x_`\(n\),\(\ldots\),`x_`\(n+n'-1\), and the labeling of the outputs of \(D\)
from `y_`\(0\),\(\ldots\),`y_`\(m'-1\) to `y_`\(m\),\(\ldots\),`y_`\(m+m'-1\). By
the fact that \(C\) and \(D\) compute \(F\) and \(G\) respectively, we see that
\(E\) computes the function
\(F \oplus G: \{0,1\}^{n+n'}\rightarrow \{0,1\}^{m+m'}\) that on input
\(x \in \{0,1\}^{n+n'}\) outputs
\(F(x_0,\ldots,x_{n-1})G(x_n,\ldots,x_{n+n'-1})\).

## General Boolean circuits: a formal definition

We now define the notion of *general* Boolean circuits that can use any
set \(B\) of gates and not just the NAND gate.

A *basis for Boolean circuits* is a finite set
\(B = \{ g_0 , \ldots , g_{c-1} \}\) of finite Boolean functions, where
each function \(g \in B\) maps strings of some finite length (which we
denote by \(in(g)\)) to \(\{0,1\}\).

We now define the notion of a general Boolean circuit with gates from \(B\). (Just as we defined canonical variables in Reference:NANDcanonical, it will be convenient for us to assume that the vertex set of such a circuit is an interval of the form \([n+s] = \{0,1,\ldots,n+s-1 \}\) for \(n,s \in \N\), where the first \(n\) vertices correspond to the inputs and the last \(m\) vertices correspond to the outputs.)

Let \(B\) be a basis for Boolean circuits. A *circuit over the basis \(B\)*
(or *\(B\)-circuit* for short) with \(n\) inputs and \(m\) outputs is a
labeled directed acyclic graph (DAG) over the vertex set \([n+s]\) for
\(s\in \N\). The vertices \(\{0,\ldots, n-1\}\) are known as the “input
variables” and have in-degree zero. Every vertex apart from the input
variables is known as a *gate*. Each such vertex is labeled with a
function \(g \in B\) and has in-degree \(in(g)\). The last \(m\) vertices
\(\{ n+s-m,\ldots, n+s-1 \}\) have out-degree zero and are known as the
*output gates*. We denote the circuit as \(C=([n+s],E,L)\) where \([n+s],E\)
are the vertices and edges of the circuit, and
\(L:\{n,\ldots,n+s-1\} \rightarrow B\) is the labeling function that maps
vertices into the set \(B\).

To make sure you understand this definition, stop and think how a Boolean circuit with AND, OR, and NOT gates corresponds to a \(B\)-circuit per Reference:circuits-def, where \(B= \{ AND, OR, NOT \}\), \(AND:\{0,1\}^2 \rightarrow \{0,1\}\) is the function \(AND(a,b)=a \cdot b\), \(OR:\{0,1\}^2 \rightarrow \{0,1\}\) is the function \(OR(a,b) = 1-(1-a)(1-b)\), and \(NOT:\{0,1\} \rightarrow \{0,1\}\) is the function \(NOT(a)=1-a\). (Another commonly used notation is \(x \wedge y\) for \(AND(x,y)\), \(x \vee y\) for \(OR(x,y)\), and \(\overline{x}\) or \(\neg x\) for \(NOT(x)\).)

The *size* of a circuit \(C\), denoted by \(|C|\), is the number of gates
that it contains. An \(n\)-input \(m\)-output circuit \(C=([n+s],E,L)\)
computes a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) as follows. For
every input \(x\in \{0,1\}^n\), we inductively define the *value* of every
vertex based on its incoming edges:

1. For the source vertices \(\{0,\ldots,n-1\}\) we define \(val(i) = x_i\) for all \(i\in [n]\).
2. For a non-source vertex \(v\) that is labeled with \(g\in B\), if its incoming neighbors are the vertices \(v_1,\ldots,v_k\) (sorted in order) and their values have all been set, then we let \(val(v)=g(val(v_1),\ldots,val(v_k))\).
3. Go back to step 2 until all vertices have values.
4. Output \(val(n+s-m),\ldots,val(n+s-1)\).

The *output* of the circuit \(C\) on input \(x\), denoted by \(C(x)\), is the
string \(y\in \{0,1\}^m\) outputted by this process. We say that the
circuit \(C\) *computes the function \(F\)* if for every \(x\in \{0,1\}^n\),
\(C(x)=F(x)\).

We have seen in Reference:NAND-univ-thm that *every* function
\(f:\{0,1\}^k \rightarrow \{0,1\}\) has a NAND program with at most
\(10\cdot 2^k\) lines, and hence Reference:NAND-circ-thm implies the
following theorem (see Reference:NAND-all-circ-thm-ex). (The bound that
comes out of the proof of Reference:NAND-univ-thm is \(5\cdot 2^k\), and
in fact it can be easily optimized further. As \(k\) grows, we can also use
the bound of \(O(2^k/k)\) mentioned in Reference:tight-upper-bound.)

For every function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(B\) a subset of the functions from \(\{0,1\}^k\) to \(\{0,1\}\), if we let \(S_{NAND}(F)\) denote the smallest number of lines in a NAND program that computes \(F\) and \(S_B(F)\) denote the smallest number of vertices in a Boolean circuit with the basis \(B\) that computes \(F\), then \[ S_{NAND}(F) \leq (10\cdot 2^k)\cdot S_B(F) \]

One can ask whether there is an equivalence here as well. However, this is not the case. For example, if the set \(B\) only consists of constant functions, then clearly a circuit whose gates are in \(B\) cannot compute any non-constant function. A slightly less boring example is if \(B\) consists only of the \(\wedge\) (i.e., AND) function (as opposed to the \(NAND\) function). One can show that such a circuit will always output \(0\) on the all-zeroes input, and hence it can never compute the simple negation function \(\neg:\{0,1\} \rightarrow \{0,1\}\) such that \(\neg(x)=1-x\).

We say that a subset \(B\) of the functions from \(k\) bits to a single bit is a
*universal basis* if there is a “\(B\)-circuit” (i.e., a circuit all of whose
gates are labeled with functions in \(B\)) that computes the \(NAND\)
function. Reference:universal-basis asks you to explore some examples of
universal and non-universal bases.

The *depth* of a Boolean circuit is the length of the longest path in
it. The notion of depth is tightly connected to the *parallelism
complexity* of the circuit. “Shallow” circuits are easier to
parallelize, since a path of length \(k\) means that there is a sequence of
\(k\) gates, each of which needs to wait for the output of the previous one
before it can complete its computation.

It is a good exercise for you to verify that every function \(F:\{0,1\}^n \rightarrow \{0,1\}\) has a circuit computing it of \(O(2^n)\) (in fact even \(O(2^n/n)\)) size and \(O(n)\) depth. However, there are functions that require at least \(\log n/10\) depth (can you see why?). There are also functions for which the smallest known circuits that compute them require a much larger depth.

## Neural networks

One particular basis we can use are *threshold gates*. For every vector
\(w= (w_0,\ldots,w_{k-1})\) of integers and integer \(t\) (some or all of
which could be negative), the *threshold function corresponding to \(w,t\)*
is the function \(T_{w,t}:\{0,1\}^k \rightarrow \{0,1\}\) that maps
\(x\in \{0,1\}^k\) to \(1\) if and only if
\(\sum_{i=0}^{k-1} w_i x_i \geq t\). For example, the threshold function
\(T_{w,t}\) corresponding to \(w=(1,1,1,1,1)\) and \(t=3\) is simply the
majority function \(MAJ_5\) on \(\{0,1\}^5\). The function
\(NAND:\{0,1\}^2 \rightarrow \{0,1\}\) is the threshold function
corresponding to \(w=(-1,-1)\) and \(t=-1\), since \(NAND(x_0,x_1)=1\) if and
only if \(x_0 + x_1 \leq 1\) or equivalently, \(-x_0 - x_1 \geq -1\).
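The two examples in this paragraph are easy to verify mechanically; here is a short Python check (the helper names are ours):

```python
from itertools import product

def threshold(w, t):
    """Return the threshold function T_{w,t} : {0,1}^k -> {0,1}."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

maj5 = threshold((1, 1, 1, 1, 1), 3)     # majority of five bits
nand = threshold((-1, -1), -1)           # NAND as a threshold gate

assert all(nand(x) == 1 - x[0] * x[1] for x in product((0, 1), repeat=2))
assert all(maj5(x) == int(sum(x) >= 3) for x in product((0, 1), repeat=5))
```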

Threshold gates can be thought of as an approximation for *neuron cells*
that make up the core of human and animal brains. To a first
approximation, a neuron has \(k\) inputs and a single output, and the
neuron “fires” or “turns on” its output when those signals pass some
threshold. (Typically we think of the inputs to a neuron as being real
numbers rather than binary strings, but we can reduce to the binary case
by representing a real number in the binary basis, and multiplying the
weight of the bit corresponding to the \(i^{th}\) digit by \(2^i\).) Hence
circuits with threshold gates are sometimes known as *neural networks*.
Unlike the cases above, where we considered \(k\) to be a small constant,
in such neural networks we often do not put any bound on the number of
inputs. However, since any threshold function on \(k\) inputs can be
computed by a NAND program of \(poly(k)\) lines (see
Reference:threshold-nand-ex), the power of NAND programs and neural
networks is not very different.

## Biological computing

Computation can be based on biological or chemical
systems.
For example the *lac* operon
produces the enzymes needed to digest lactose only if the conditions
\(x \wedge (\neg y)\) hold where \(x\) is “lactose is present” and \(y\) is
“glucose is present”. Researchers have managed to create
transistors,
and from them the NAND function and other logic gates, based on DNA
molecules (see also Reference:transcriptorfig). One motivation for DNA
computing is to achieve increased parallelism or storage density;
another is to create “smart biological agents” that could perhaps be
injected into bodies, replicate themselves, and fix or kill cells that
were damaged by a disease such as cancer. Computing in biological
systems is of course not restricted to DNA. Even larger systems such as
flocks of
birds
can be considered as computational processes.

## Cellular automata and the game of life

As we will discuss later, cellular automata such as Conway’s “Game of Life” can be used to simulate computation gates, see Reference:gameoflifefig.

## Circuit evaluation algorithm

A Boolean circuit is a labeled graph, and hence we can use the
*adjacency list* representation to represent an \(s\)-vertex circuit over
an arity-\(k\) basis \(B\) by \(s\) elements of \(B\) (that can be identified
with numbers in \([|B|]\)) and \(s\) lists of \(k\) numbers in \([s]\). Hence
for every fixed basis \(B\) we can represent such a circuit by a string of
length \(O(s \log s)\). (The implicit constant in the \(O\) notation can
depend on the basis \(B\).) We can define \(CIRCEVAL_{B,s,n,m}\) to be the
function that takes as input a pair \((C,x)\), where \(C\) is a string
describing an \(s\)-size \(n\)-input \(m\)-output circuit over \(B\) and
\(x\in \{0,1\}^n\) is an input, and returns the evaluation of \(C\) on the
input \(x\).

Reference:NAND-all-circ-thm implies that every circuit \(C\) of \(s\) gates over a \(k\)-ary basis \(B\) can be transformed into a NAND program of \(s'=O(s\cdot 2^k)\) lines, and hence we can combine this transformation with last lecture's evaluation procedure for NAND programs to conclude that \(CIRCEVAL\) for circuits of \(s\) gates over \(B\) can be computed by a NAND program of \(O(s'^2 \log s')= O(s^2 2^{2k}(\log s + k))\) lines. (In fact, as we mentioned, it is possible to improve this to \(O(s' \log^2 s')=O(s2^k(\log s + k)^2)\) lines.)

### Advanced note: evaluating circuits in quasilinear time

We can improve the evaluation procedure, and evaluate \(s\)-size constant fan-in circuits (or NAND programs) using a NAND program of \(O(s \cdot polylog(s))\) lines.

## The physical extended Church-Turing thesis

We’ve seen that NAND gates can be implemented using very different systems in the physical world. What about the reverse direction? Can NAND programs simulate any physical computer?

We can take a leap of faith and stipulate that NAND programs do actually
encapsulate *every* computation that we can think of. Such a statement
(in the realm of infinite functions, which we'll encounter in a couple
of lectures) is typically attributed to Alonzo Church and Alan Turing,
and in that context is known as the *Church-Turing Thesis*. As we will
discuss in future lectures, the Church-Turing Thesis is not a
mathematical theorem or conjecture. Rather, like theories in physics,
the Church-Turing Thesis is about mathematically modelling the real
world. In the context of finite functions, we can make the following
informal hypothesis or prediction:

If a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) can be computed in the physical world using \(s\) amount of “physical resources” then it can be computed by a NAND program of roughly \(s\) lines.

We call this hypothesis the **“Physical Extended Church-Turing Thesis”**,
or *PECTT* for short. A priori it might seem rather extreme to
hypothesize that our meager NAND model captures all possible physical
computation. And yet, in more than a century of computing technologies,
no one has built any scalable computing device that challenges this
hypothesis.

We now discuss the “fine print” of the PECTT in more detail, as well as the (so far unsuccessful) challenges that have been raised against it. There is no single universally-agreed-upon formalization of “roughly \(s\) physical resources”, but we can approximate this notion by considering the size of any physical computing device and the time it takes to compute the output, and ask that any such device can be simulated by a NAND program with a number of lines that is a polynomial (with not too large exponent) in the size of the system and the time it takes it to operate.

In other words, we can phrase the PECTT as stipulating that any function
that can be computed by a device of volume \(V\) and time \(t\), must be
computable by a NAND program that has at most \(\alpha(Vt)^\beta\) lines
for some constants \(\alpha,\beta\). The exact values for \(\alpha,\beta\)
are not so clear, but it is generally accepted that if
\(F:\{0,1\}^n \rightarrow \{0,1\}\) is an *exponentially hard* function,
in the sense that it has no NAND program of fewer than, say, \(2^{n/2}\)
lines, then a demonstration of a physical device that can compute \(F\)
for moderate input lengths (e.g., \(n=500\)) would be a violation of the
PECTT.

Advanced note: making things concrete. We can attempt a more exact phrasing of the PECTT as follows. Suppose that \(Z\) is a physical system that accepts \(n\) binary stimuli and has a binary output, and can be enclosed in a sphere of volume \(V\). We say that the system \(Z\) *computes* a function \(F:\{0,1\}^n \rightarrow \{0,1\}\) within \(t\) seconds if whenever we set the stimuli to some value \(x\in \{0,1\}^n\) and measure the output after \(t\) seconds, we obtain \(F(x)\). We can phrase the PECTT as stipulating that whenever there exists such a system \(Z\) that computes \(F\) within \(t\) seconds, there exists a NAND program that computes \(F\) of at most \(\alpha(Vt)^2\) lines, where \(\alpha\) is some normalization constant. (We can also consider variants where we use surface area instead of volume, or use a different power than \(2\). However, none of these choices makes a qualitative difference to the discussion below.) In particular, suppose that \(F:\{0,1\}^n \rightarrow \{0,1\}\) is a function that requires \(2^n/(100n)>2^{0.8n}\) lines for any NAND program (we have seen that such functions exist in the last lecture). Then the PECTT would imply that either the volume or the time of a system that computes \(F\) will have to be at least \(2^{0.2 n}/\sqrt{\alpha}\). To fully make this concrete, we need to decide on the units for measuring time and volume, and on the normalization constant \(\alpha\). One conservative choice is to assume that we could squeeze computation to the absolute physical limits (which are many orders of magnitude beyond current technology). This corresponds to setting \(\alpha=1\) and using the Planck units for volume and time. The *Planck length* \(\ell_P\) (which is, roughly speaking, the shortest distance that can theoretically be measured) is roughly \(2^{-120}\) meters. The *Planck time* \(t_P\) (which is the time it takes for light to travel one Planck length) is about \(2^{-150}\) seconds. In the above setting, if a function \(F\) takes, say, 1KB of input (e.g., roughly \(10^4\) bits, which can encode a \(100\) by \(100\) bitmap image), and requires at least \(2^{0.8 n}= 2^{0.8 \cdot 10^4}\) NAND lines to compute, then any physical system that computes it would require either a volume of \(2^{0.2\cdot 10^4}\) Planck lengths cubed, which is more than \(2^{1500}\) cubic meters, or at least \(2^{0.2 \cdot 10^4}\) Planck time units, which is more than \(2^{1500}\) seconds. To get a sense of how big these numbers are, note that the universe is only about \(2^{60}\) seconds old, and its observable radius is only roughly \(2^{90}\) meters. This suggests that it is possible to *empirically falsify* the PECTT by presenting a smaller-than-universe-size system that solves such a function. (There are of course several hurdles to refuting the PECTT in this way, one of which is that we can't actually test the system on all possible inputs. However, it turns out we can get around this issue using notions such as *interactive proofs* and *program checking* that we will see later in this course. Another, perhaps more salient, problem is that while we know many hard functions exist, at the moment there is *no single explicit function* \(F:\{0,1\}^n \rightarrow \{0,1\}\) for which we can *prove* an \(\omega(n)\) (let alone \(\Omega(2^n/n)\)) lower bound on the number of lines that a NAND program needs to compute it.)
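The back-of-the-envelope numbers in this note are easy to reproduce; here is a short Python sketch working in log scale, with the constants (\(\alpha=1\), \(\ell_P \approx 2^{-120}\) meters, \(t_P \approx 2^{-150}\) seconds) as stated above:

```python
n = 10 ** 4                     # input length: roughly 1KB
log2_lines = 0.8 * n            # a hard F needs >= 2^{0.8n} NAND lines
log2_Vt = log2_lines / 2        # PECTT with alpha=1: lines <= (V*t)^2
log2_max = log2_Vt / 2          # so V or t is >= 2^{0.2n} in Planck units

log2_planck_len = -120          # Planck length in log2(meters), approx.
log2_planck_time = -150         # Planck time in log2(seconds), approx.

print(log2_max + 3 * log2_planck_len)   # 1640.0, in log2(cubic meters)
print(log2_max + log2_planck_time)      # 1850.0, in log2(seconds)
# Both exceed 2^1500; compare: the universe is only ~2^60 seconds old
# and its observable radius is only ~2^90 meters.
```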

### Attempts at refuting the PECTT

One of the admirable traits of mankind is the refusal to accept limitations. In the best case this is manifested by people achieving longstanding “impossible” challenges such as heavier-than-air flight, putting a person on the moon, circumnavigating the globe, or even resolving Fermat's Last Theorem. In the worst case it is manifested by people continually following in the footsteps of previous failures, trying to do proven-impossible tasks such as building a perpetual motion machine, trisecting an angle with a compass and straightedge, or refuting Bell's inequality. The Physical Extended Church-Turing Thesis (in its various forms) has attracted both types of people. Here are some physical devices that have been speculated to achieve computational tasks that cannot be done by not-too-large NAND programs:

**Spaghetti sort:** One of the first lower bounds that Computer Science students encounter is that sorting \(n\) numbers requires making \(\Omega(n \log n)\) comparisons. The “spaghetti sort” is a description of a proposed “mechanical computer” that would do this faster. The idea is that to sort \(n\) numbers \(x_1,\ldots,x_n\), we could cut \(n\) spaghetti noodles into lengths \(x_1,\ldots,x_n\), and then if we simply hold them together in our hand and bring them down to a flat surface, they will emerge in sorted order. There are a great many reasons why this is not truly a challenge to the PECTT hypothesis, and I will not ruin the reader's fun in finding them out by her or himself.

**Soap bubbles:** One function \(F:\{0,1\}^n \rightarrow \{0,1\}\) that is conjectured to require a large number of NAND lines to solve is the *Euclidean Steiner Tree* problem. This is the problem where one is given \(m\) points in the plane \((x_1,y_1),\ldots,(x_m,y_m)\) (say with integer coordinates ranging from \(1\) to \(m\), so that the list can be represented as a string of \(n=O(m \log m)\) size) and some number \(K\). The goal is to figure out whether it is possible to connect all the points by line segments of total length at most \(K\). This function is conjectured to be hard because it is *NP complete* (a concept that we'll encounter later in this course), and it is in fact reasonable to conjecture that as \(m\) grows, the number of NAND lines required to compute this function grows *exponentially* in \(m\), meaning that the PECTT would predict that if \(m\) is sufficiently large (such as a few hundred or so) then no physical device could compute \(F\). Yet, some people have claimed that there is in fact a very simple physical device that can solve this problem, which can be constructed using some wooden pegs and soap. The idea is that if we take two glass plates and put \(m\) wooden pegs between them in the locations \((x_1,y_1),\ldots,(x_m,y_m)\), then bubbles will form whose edges touch those pegs in a way that minimizes the total energy, which turns out to be a function of the total length of the line segments. The problem with this device of course is that nature, just like people, often gets stuck in “local optima”. That is, the resulting configuration will not be one that achieves the absolute minimum of the total energy but rather one that can't be improved with local changes. Aaronson has carried out actual experiments (see Reference:aaronsonsoapfig), and saw that while this device often is successful for three or four pegs, it starts yielding suboptimal results once the number of pegs grows beyond that.

**DNA computing.** People have suggested using the properties of DNA to do hard computational problems. The main advantage of DNA is the ability to potentially encode a lot of information in a relatively small physical space, as well as to compute on this information in a highly parallel manner. At the time of this writing, it has been demonstrated that one can use DNA to store about \(10^{16}\) bits of information in a region of radius about a millimeter, as opposed to about \(10^{10}\) bits with the best known hard disk technology. This does not pose a real challenge to the PECTT, but it does suggest that one should be conservative about the choice of constant, and not assume that current hard disk plus silicon technologies are the absolute best possible. (We were extremely conservative in the suggested parameters for the PECTT, having assumed that as many as \(\ell_P^{-2}10^{-6} \sim 10^{61}\) bits could potentially be stored in a millimeter-radius region.)

**Continuous/real computers.** The physical world is often described using continuous quantities such as time and space, and people have suggested that analog devices might have direct access to computing with real-valued quantities, and hence would be inherently more powerful than discrete models such as NAND machines. Whether the “true” physical world is continuous or discrete is an open question. In fact, we do not even know how to precisely *phrase* this question, let alone answer it. Yet, regardless of the answer, it seems clear that the effort required to measure a continuous quantity grows with the level of accuracy desired, and so there is no “free lunch” or way to bypass the PECTT using such machines (see also this paper). Related to that are proposals known as “hypercomputing” or “Zeno's computers”, which attempt to use the continuity of time by doing the first operation in one second, the second one in half a second, the third operation in a quarter second, and so on. These fail for a similar reason to the one guaranteeing that Achilles will eventually catch the tortoise despite Zeno's original paradox.

**Relativity computer and time travel.** The formulation above assumed the notion of time, but under the theory of relativity time is in the eye of the observer. One approach to solving hard problems is to leave the computer to run for a lot of time from *its* perspective, while ensuring that this is actually a short while from *our* perspective. One way to do so is for the user to start the computer and then go for a quick jog at close to the speed of light before checking on its status. Depending on how fast one goes, a few seconds from the point of view of the user might correspond to centuries in computer time (it might even finish updating its Windows operating system!). Of course the catch here is that the energy required from the user is proportional to how close one needs to get to the speed of light. A more interesting proposal is to use time travel via *closed timelike curves (CTCs)*. In this case we could run an arbitrarily long computation by doing some calculations, remembering the current state, and then travelling back in time to continue where we left off. Indeed, if CTCs exist then we'd probably have to revise the PECTT (though in this case I will simply travel back in time and edit these notes, so I can claim I never conjectured it in the first place…).

**Humans.** Another computing system that has been proposed as a counterexample to the PECTT is a 3 pound computer of about 0.1m radius, namely the human brain. Humans can walk around, talk, feel, and do other things that are not commonly done by NAND programs, but can they compute partial functions that NAND programs cannot? There are certainly computational tasks that *at the moment* humans do better than computers (e.g., play some video games), but based on our current understanding of the brain, humans (or other animals) have no *inherent* computational advantage over computers. The brain has about \(10^{11}\) neurons, each operating at a speed of about \(1000\) operations per second. Hence a rough first approximation is that a NAND program of about \(10^{14}\) lines could simulate one second of a brain's activity. (This is a very rough approximation that could be wrong by a few orders of magnitude in either direction. For one, there are other structures in the brain apart from neurons that one might need to simulate, hence requiring higher overhead. On the other hand, it is by no means clear that we need to fully clone the brain in order to achieve the same computational tasks that it does.) Note that the fact that such a NAND program (likely) exists does not mean it is easy to *find* it. After all, constructing this program took evolution billions of years. Much of the recent effort in artificial intelligence research is focused on finding programs that replicate some of the brain's capabilities, and while such programs take massive computational effort to discover, they often turn out to be much smaller than the pessimistic estimates above. For example, at the time of this writing, Google's neural network for machine translation has about \(10^4\) nodes (and can be simulated by a NAND program of comparable size). Philosophers, priests and many others have since time immemorial argued that there is something about humans that cannot be captured by mechanical devices such as computers; whether or not that is the case, the evidence is thin that humans can perform computational tasks that are inherently impossible to achieve by computers of similar complexity. (There are some well known scientists who have advocated that humans have inherent computational advantages over computers. See also this.)

**Quantum computation.** The most compelling attack on the Physical Extended Church-Turing Thesis comes from the notion of *quantum computing*. The idea was initiated by the observation that systems with strong quantum effects are very hard to simulate on a computer. Turning this observation on its head, people have proposed using such systems to perform computations that we do not know how to do otherwise. At the time of this writing, scalable quantum computers have not yet been built, but it is a fascinating possibility, and one that does not seem to contradict any known law of nature. We will discuss quantum computing in much more detail later in this course. Modeling it will essentially involve extending the NAND programming language to the “QNAND” programming language that has one more (very special) operation. The main take-away, however, is that while quantum computing does suggest we need to amend the PECTT, it does *not* require a complete revision of our worldview. Indeed, almost all of the content of this course remains the same whether the underlying computational model is the “classical” model of NAND programs or the quantum model of QNAND programs (also known as *quantum circuits*).

While even the precise phrasing of the PECTT, let alone understanding
its correctness, is still a subject of research, some variant of it is
already implicitly assumed in practice. A statement such as “this
cryptosystem provides 128 bits of security” really means that **(a)** it
is conjectured that there is no Boolean circuit (or, equivalently, NAND
program) of size much smaller than \(2^{128}\) that can break the
system (we say “conjectured” and not “proved” because, while we can
phrase such a statement as a precise mathematical conjecture, at the
moment we are unable to *prove* such a statement for any cryptosystem;
this is related to the P vs NP question we will discuss in future
lectures), and **(b)** we assume that no other physical mechanism can
do better, and hence it would take roughly a \(2^{128}\) amount of
“resources” to break the system.

## Lecture summary

- NAND gates can be implemented by a variety of physical means.
- NAND programs are equivalent (up to constants) to Boolean circuits using any finite universal basis.
- By a leap of faith, we could hypothesize that the number of lines in the smallest NAND program for a function \(F\) captures roughly the amount of physical resources required to compute \(F\). This statement is known as the *Physical Extended Church-Turing Thesis (PECTT)*.
- NAND programs capture a surprisingly wide array of computational models. The strongest currently known challenge to the PECTT comes from the potential for using quantum mechanical effects to speed up computation, a model known as *quantum computers*.

## Exercises

Prove Reference:NAND-circ-thm.

For every one of the following sets, either prove that it is a universal
basis or prove that it is not.

1. \(B = \{ \wedge, \vee, \neg \}\). (To make all of them functions on two
inputs, define \(\neg(x,y)=\overline{x}\).)

2. \(B = \{ \wedge, \vee \}\).

3. \(B= \{ \oplus,0,1 \}\) where \(\oplus:\{0,1\}^2 \rightarrow \{0,1\}\) is
the XOR function and \(0\) and \(1\) are the constant functions that output
\(0\) and \(1\).

4. \(B = \{ LOOKUP_1,0,1 \}\) where \(0\) and \(1\) are the constant functions
as above and \(LOOKUP_1:\{0,1\}^3 \rightarrow \{0,1\}\) satisfies
\(LOOKUP_1(a,b,c)\) equals \(a\) if \(c=0\) and equals \(b\) if \(c=1\).

Prove that for every subset \(B\) of the functions from \(\{0,1\}^k\) to \(\{0,1\}\), if \(B\) is universal then there is a \(B\)-circuit of at most \(O(k)\) gates that computes the \(NAND\) function. (You can start by showing that there is a \(B\)-circuit of at most \(O(k^{16})\) gates.) (Thanks to Alec Sun for solving this problem.)

Prove that for every \(k\)-dimensional integer vector \(w\) and integer \(t\), the threshold function \(T_{w,t}\) can be computed by a NAND program of at most \(O(k^3)\) lines.

## Bibliographical notes

Scott Aaronson’s blog post on how information is physical is a good discussion of issues related to the physical extended Church-Turing Thesis. Aaronson’s survey on NP complete problems and physical reality is also a great source for some of these issues, though it might be easier to read after we reach the lectures on NP and NP completeness.

## Further explorations

Some topics related to this lecture that might be accessible to advanced students include:

- The notion of fundamental limits on information and their interplay with physics is still not well understood.