Introduction to Theoretical Computer Science — Boaz Barak

# Physical implementations of NAND programs

• Understand how NAND programs can map to physical processes in a variety of ways.
• Learn the model of Boolean circuits and get proficient in moving between the description of a NAND program as code and as a circuit or labeled graph.
• See that NAND is a universal basis for circuits, and see examples of universal and non-universal families of gates.
• Understand the physical extended Church-Turing thesis that NAND programs capture all feasible computation in the physical world, and its physical and philosophical implications.

“In existing digital computing devices various mechanical or electrical devices have been used as elements: Wheels, which can be locked … which on moving from one position to another transmit electric pulses that may cause other similar wheels to move; single or combined telegraph relays, actuated by an electromagnet and opening or closing electric circuits; combinations of these two elements;—and finally there exists the plausible and tempting possibility of using vacuum tubes”, John von Neumann, first draft of a report on the EDVAC, 1945

We have defined NAND programs as a model for computation, but is this model only a mathematical abstraction, or is it connected in some way to physical reality? For example, if a function $$F:\{0,1\}^n \rightarrow \{0,1\}$$ can be computed by a NAND program of $$s$$ lines, is it possible, given an actual input $$x\in \{0,1\}^n$$, to compute $$F(x)$$ in the real world using an amount of resources that is roughly proportional to $$s$$?

In some sense, we already know that the answer to this question is Yes. We have seen a Python program that can evaluate NAND programs, and so if we have a NAND program $$P$$, we can use any computer with Python installed on it to evaluate $$P$$ on inputs of our choice. But do we really need modern computers and programming languages to run NAND programs? And can we understand more directly how we can map such programs to actual physical processes that produce an output from an input? This is the content of this lecture.

We will also talk about the following “dual” question. Suppose we have some way to compute a function $$F:\{0,1\}^n \rightarrow \{0,1\}$$ using roughly $$s$$ units of “physical resources” such as material, energy, time, etc. Does this mean that there is also a NAND program to compute $$F$$ using a number of lines that is not much bigger than $$s$$? This might seem like a wishful fantasy, but we will see that the answer to this question might be (up to some important caveats) essentially Yes as well.

## Physical implementation of computing devices

Computation is an abstract notion that is distinct from its physical implementations. While most modern computing devices are obtained by mapping logical gates to semiconductor-based transistors, throughout history people have computed using a huge variety of mechanisms, including mechanical systems, gas and liquid (known as fluidics), biological and chemical processes, and even living creatures (e.g., see Reference:crabfig or this video for how crabs or slime mold can be used to do computations).

In this lecture we review some of these implementations, both to give you an appreciation of how it is possible to directly translate NAND programs to the physical world, without going through the entire stack of architecture, operating systems, compilers, etc., and to emphasize that silicon-based processors are by no means the only way to perform computation. Indeed, as we will see much later in this course, a very exciting recent line of work involves using different media for computation that would allow us to take advantage of quantum mechanical effects to enable different types of algorithms.

## Transistors and physical logic gates

A transistor can be thought of as an electric circuit with two inputs, known as the source and the gate, and one output, known as the sink. The gate controls whether current flows from the source to the sink. In a standard transistor, if the gate is “ON” then current can flow from the source to the sink, and if it is “OFF” then it can’t. In a complementary transistor this is reversed: if the gate is “OFF” then current can flow from the source to the sink, and if it is “ON” then it can’t.

There are several ways to implement the logic of a transistor. For example, we can use faucets to implement it using water pressure (e.g. Reference:transistor-water-fig). This might seem like a curiosity, but there is a field known as fluidics concerned with implementing logical operations using liquids or gases. Some of the motivations include operating in extreme environmental conditions such as in space or on a battlefield, where standard electronic equipment would not survive. However, the standard implementation uses electrical current. One of the original implementations used vacuum tubes. As its name implies, a vacuum tube is a tube containing nothing (i.e., a vacuum) in which a priori electrons could freely flow from the source (a wire) to the sink (a plate). However, there is a gate (a grid) between the two, and modulating its voltage can block the flow of electrons.

Early vacuum tubes were roughly the size of lightbulbs (and looked very much like them too). In the 1950’s they were supplanted by transistors, which implement the same logic using semiconductors: materials that normally do not conduct electricity but whose conductivity can be modified and controlled by inserting impurities (“doping”) and applying an external electric field (this is known as the field effect). In the 1960’s computers started to be built using integrated circuits, which enabled much greater density. In 1965, Gordon Moore predicted that the number of transistors per circuit would double every year (see Reference:moorefig), and that this would lead to “such wonders as home computers —or at least terminals connected to a central computer— automatic controls for automobiles, and personal portable communications equipment”. Since then, (adjusted versions of) this so-called “Moore’s law” has been going strong, though exponential growth cannot be sustained forever, and some physical limitations are already becoming apparent.

## Gates and circuits

We can use transistors to implement a NAND gate, which would be a system with two input wires $$x,y$$ and one output wire $$z$$, such that if we identify high voltage with “$$1$$” and low voltage with “$$0$$”, then the wire $$z$$ will equal “$$1$$” if and only if the NAND of the values of the wires $$x$$ and $$y$$ is $$1$$ (see Reference:transistor-nand-fig).

More generally, we can use transistors to implement the model of Boolean circuits. We list the formal definition below, but let us start with the informal one:

Let $$B$$ be some set of functions (known as “gates”) from $$\{0,1\}^k$$ to $$\{0,1\}$$. A Boolean circuit with the basis $$B$$ is obtained by connecting “gates” which compute functions in $$B$$ together by “wires”, where each gate has $$k$$ wires going into it and one wire going out of it. We have $$n$$ special wires known as the “input wires” and $$m$$ special wires known as the “output wires”. To compute a function $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ using a circuit, we feed the bits of $$x$$ to the $$n$$ input wires, each gate computes the corresponding function, and we “read off” the output $$y\in \{0,1\}^m$$ from the $$m$$ output wires.

The number $$k$$ is known as the arity of the basis $$B$$. We think of $$k$$ as a small number (such as $$k=2$$ or $$k=3$$) and so the idea behind a Boolean circuit is that we can compute complex functions by combining together the simple components which are the functions in $$B$$. It turns out that NAND programs correspond to circuits where the basis is the single function $$NAND:\{0,1\}^2 \rightarrow \{0,1\}$$. We now show this more formally.
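To make the correspondence concrete before the formal treatment, here is a minimal sketch in Python (the language the book uses elsewhere; the function names `NAND` and `XOR2` are ours): the NAND gate as a function, and the 4-line NAND program for $$XOR_2$$ written as straight-line code.

```python
# The NAND function: the single gate in the basis of NAND circuits.
def NAND(a, b):
    return 1 - a * b

# The 4-line NAND program for XOR_2, written as straight-line code:
# each line computes the NAND of two previously assigned variables.
def XOR2(x0, x1):
    u = NAND(x0, x1)
    v = NAND(x0, u)
    w = NAND(x1, u)
    return NAND(v, w)

print([XOR2(a, b) for a in (0, 1) for b in (0, 1)])  # -> [0, 1, 1, 0]
```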

## Representing programs as graphs

We now define NAND programs as circuits, using the notion of directed acyclic graphs (DAGs).

If you are not comfortable with the definitions of graphs, and in particular directed acyclic graphs (DAGs), now would be a great time to go back to the “mathematical background” lecture, as well as some of the resources here, and review these notions.

A NAND circuit with $$n$$ inputs and $$m$$ outputs is a labeled directed acyclic graph (DAG) in which every vertex has in-degree at most two. We require that there are $$n$$ vertices with in-degree zero, known as input variables, that are labeled with x_$$\expr{i}$$ for $$i\in [n]$$. Every vertex apart from the input variables is known as a gate. We require that there are $$m$$ vertices of out-degree zero, known as the output gates, that are labeled with y_$$\expr{j}$$ for $$j\in [m]$$. While not all vertices are labeled, no two vertices get the same label. We denote the circuit as $$C=(V,E,L)$$ where $$V,E$$ are the vertices and edges of the circuit, and $$L:V \rightarrow_p S$$ is the (partial) one-to-one labeling function that maps vertices into the set $$S=\{$$ x_0,$$\ldots$$,x_$$\expr{n-1}$$,y_0,$$\ldots$$, y_$$\expr{m-1}$$ $$\}$$. The size of a circuit $$C$$, denoted by $$|C|$$, is the number of gates that it contains.

The definition of NAND circuits is ultimately not that complicated, but may take a second or third read to fully parse. It might help to look at Reference:XORcircuitfig, which describes the NAND circuit that corresponds to the 4-line NAND program we presented above for the $$XOR_2$$ function.

A NAND circuit corresponds to computation in the following way. To compute some output on an input $$x\in \{0,1\}^n$$, we start by assigning to the input vertex labeled with x_$$\expr{i}$$ the value $$x_i$$, and then proceed by assigning to every gate $$v$$ the value that is the NAND of the values assigned to its in-neighbors (if it has fewer than two in-neighbors, we replace the value of the missing neighbors by zero). The output $$y\in \{0,1\}^m$$ corresponds to the values assigned to the output gates, with $$y_j$$ equal to the value assigned to the gate labeled y_$$\expr{j}$$ for every $$j\in [m]$$. Formally, this is defined as follows:

Let $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ and let $$C=(V,E,L)$$ be a NAND circuit with $$n$$ inputs and $$m$$ outputs. We say that $$C$$ computes $$F$$ if there is a map $$Z:V \rightarrow \{0,1\}$$, such that for every $$x\in \{0,1\}^n$$, if $$y=F(x)$$ then:
* For every $$i\in [n]$$, if $$v$$ is labeled with x_$$\expr{i}$$ then $$Z(v)=x_i$$.
* For every $$j\in[m]$$, if $$v$$ is labeled with y_$$\expr{j}$$ then $$Z(v)=y_j$$.
* For every gate $$v$$ with in-neighbors $$u,w$$, if $$a=Z(u)$$ and $$b=Z(w)$$, then $$Z(v)=NAND(a,b)$$. (If $$v$$ has fewer than two in-neighbors then we replace either $$b$$ or both $$a$$ and $$b$$ with zero in the condition above.)
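The map $$Z$$ in this definition can be computed mechanically once the gates are processed in an order where in-neighbors come first. Here is a minimal sketch (the representation is our own, not the book's): inputs are vertices $$0,\ldots,n-1$$, gate $$i$$ is vertex $$n+i$$, and gates are listed in topological order.

```python
# A sketch of computing the map Z from the definition above. Gates are
# given as pairs of in-neighbor indices, with None standing for a
# missing in-neighbor (which defaults to the value 0, as in the
# definition).
def eval_nand_circuit(n, gates, outputs, x):
    Z = {i: x[i] for i in range(n)}           # Z on the input vertices
    for i, (u, w) in enumerate(gates):
        a = Z[u] if u is not None else 0      # missing in-neighbor -> 0
        b = Z[w] if w is not None else 0
        Z[n + i] = 1 - a * b                  # NAND of the in-neighbors
    return [Z[v] for v in outputs]            # values of the output gates

# The XOR_2 circuit: inputs 0,1; gates are vertices 2..5; output gate 5.
xor_gates = [(0, 1), (0, 2), (1, 2), (3, 4)]
print(eval_nand_circuit(2, xor_gates, [5], [1, 0]))  # -> [1]
```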

You should make sure you understand why Reference:NANDcirccomputedef captures the informal description above. This might require reading the definition a second or third time, but it will be crucial for the rest of this course. Moreover, a priori it is not clear that for every circuit $$C$$ and assignment $$x$$ there is a map $$Z:V \rightarrow \{0,1\}$$ that satisfies the conditions of Reference:NANDcirccomputedef. However, this follows from Reference:circuitprogequivthm.

The following theorem says that these two notions of computing a function are actually equivalent: we can transform a NAND program into a NAND circuit computing the same function, and vice versa.

For every $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ and $$s\in \N$$, $$F$$ can be computed by an $$s$$-line NAND program if and only if $$F$$ can be computed by an $$n$$-input $$m$$-output NAND circuit of $$s$$ gates.

The idea behind the proof is simple (see Reference:NANDcircuit_transfig for an example). Just like we did for the XOR program, if we have a NAND program $$P$$ of $$s$$ lines, $$n$$ inputs, and $$m$$ outputs, we can transform it into a NAND circuit with $$n$$ inputs and $$s$$ gates (i.e., a graph of $$n+s$$ vertices, $$n$$ of which are sources), where each gate corresponds to a line in the program $$P$$. If line $$\ell$$ involves the NAND of two variables foo and bar then if $$\ell'$$ and $$\ell''$$ are the lines where foo and bar were last assigned a value, then we add edges going into the gate corresponding to $$\ell$$ from the gates corresponding to $$\ell',\ell''$$. (If one of the variables was an input variable, then we add an edge from that variable; if one of them was uninitialized, then we add no edge and use the convention that it defaults to zero.) In the other direction, we can transform a NAND circuit $$C$$ of $$n$$ inputs, $$m$$ outputs and $$s$$ gates into an $$s$$-line program by essentially inverting this process. For every gate in the circuit, we will have a line in the program which assigns to a variable the NAND of the variables corresponding to the in-neighbors of this gate. If the gate is an output gate labeled with y_$$\expr{j}$$ then the corresponding line will assign the value to the variable y_$$\expr{j}$$. Otherwise we will assign the value to a fresh “workspace” variable. We now show the formal proof.

We start with the “only if” direction. That is, we show how to transform a NAND program into a circuit. Suppose that $$P$$ is an $$s$$-line program that computes $$F$$. We will build a NAND circuit $$C=(V,E,L)$$ that computes $$F$$ as follows. The vertex set $$V$$ will have the $$n+s$$ elements $$\{ (0,0), \ldots, (0,n-1),(1,0),\ldots,(1,s-1) \}$$. That is, it will have $$n$$ vertices of the form $$(0,i)$$ for $$i\in [n]$$ (corresponding to the $$n$$ inputs), and $$s$$ vertices of the form $$(1,\ell)$$ (corresponding to the lines in the program). For every line $$\ell$$ in the program $$P$$ of the form foo := bar NAND baz, we put edges in the graph of the form $$\overrightarrow{(1,\ell')\;(1,\ell)}$$ and $$\overrightarrow{(1,\ell'')\;(1,\ell)}$$ where $$\ell'$$ and $$\ell''$$ are the last lines before $$\ell$$ in which the variables bar and baz were assigned a value. If the variable bar and/or baz was not assigned a value prior to the $$\ell$$-th line and is not an input variable then we don’t add a corresponding edge. If the variable bar and/or baz is an input variable x_$$\expr{i}$$ then we add the edge $$\overrightarrow{(0,i)\;(1,\ell)}$$. We label the vertices of the form $$(0,i)$$ with x_$$\expr{i}$$ for every $$i\in [n]$$. For every $$j\in[m]$$, let $$\ell$$ be the last line in which the variable y_$$\expr{j}$$ is assigned a value (as noted in the appendix, valid NAND programs must assign a value to all their output variables), and label the vertex $$(1,\ell)$$ with y_$$\expr{j}$$. Note that the vertices of the form $$(0,i)$$ have in-degree zero, and all edges of the form $$\overrightarrow{(1,\ell')\;(1,\ell)}$$ satisfy $$\ell>\ell'$$. Hence this graph is a DAG, as any cycle would have to contain at least one edge going from a vertex of the form $$(1,\ell)$$ to a vertex of the form $$(1,\ell')$$ for $$\ell'<\ell$$ (can you see why?).
Also, since we don’t allow a variable of the form y_$$\expr{j}$$ on the right-hand side of a NAND operation, the output vertices have out-degree zero.
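The construction in this direction can be sketched in a few lines of Python (a sketch under our own conventions: programs are given as strings of the form `foo := bar NAND baz`, and vertices are the pairs $$(0,i)$$ and $$(1,\ell)$$ from the proof).

```python
# Sketch of the program-to-circuit translation described above.
def program_to_circuit(lines, n):
    edges = []
    last = {}                              # variable -> last line assigning it
    for ell, line in enumerate(lines):
        target, _, a, _, b = line.split()  # "foo := bar NAND baz"
        for var in (a, b):
            if var.startswith("x_"):       # input variable: edge from (0,i)
                edges.append(((0, int(var[2:])), (1, ell)))
            elif var in last:              # edge from the last assigning line
                edges.append(((1, last[var]), (1, ell)))
            # uninitialized variables contribute no edge (default 0)
        last[target] = ell
    labels = {(0, i): f"x_{i}" for i in range(n)}
    for var, ell in last.items():          # label the last assignment to y_j
        if var.startswith("y_"):
            labels[(1, ell)] = var
    return edges, labels

# The 4-line XOR_2 program becomes a circuit with 4 gates and 8 edges:
edges, labels = program_to_circuit(
    ["u := x_0 NAND x_1", "v := x_0 NAND u",
     "w := x_1 NAND u", "y_0 := v NAND w"], n=2)
```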

To complete the proof of the “only if” direction, we need to show that the circuit $$C$$ we constructed computes the same function $$F$$ as the program $$P$$ we were given. Indeed, let $$x\in \{0,1\}^n$$ and $$y = F(x)$$. For every $$\ell$$, let $$z_\ell$$ be the value that is assigned by the $$\ell$$-th line in the execution of $$P$$ on input $$x$$. Now, as per Reference:NANDcirccomputedef, define the map $$Z:V \rightarrow \{0,1\}$$ as follows: $$Z((0,i))=x_i$$ for $$i\in [n]$$ and $$Z((1,\ell))=z_\ell$$ for every $$\ell \in [s]$$. Then, by our construction of the circuit, the map satisfies the condition that for vertex $$v$$ with in-neighbors $$u$$ and $$w$$, the value $$Z(v)$$ is the NAND of $$Z(u)$$ and $$Z(w)$$ (replacing missing neighbors with the value $$0$$), and hence in particular for every $$j\in [m]$$, the value assigned in the last line that touches y_$$\expr{j}$$ equals $$y_j$$. Thus the circuit $$C$$ does compute the same function $$F$$.

For the “if” direction, we need to transform an $$s$$-gate circuit $$C=(V,E,L)$$ that computes $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ into an $$s$$-line NAND program $$P$$ that computes the same function. We start by doing a topological sort of the graph $$C$$. That is, we sort the vertex set $$V$$ as $$\{v_0,\ldots,v_{n+s-1} \}$$ such that if $$\overrightarrow{v_i v_j} \in E$$ then $$i < j$$. Such a sorting can be found for every DAG (see also Reference:topologicalsortex). Moreover, because the input vertices of $$C$$ are “sources” (have in-degree zero), we can ensure they are placed first in this sorting, and moreover that for every $$i\in [n]$$, $$v_i$$ is the input vertex labeled with x_$$\expr{i}$$.

Now for $$\ell=0,1,\ldots,n+s-1$$ we will define a variable $$var(\ell)$$ in our resulting program as follows: If $$\ell<n$$ then $$var(\ell)$$ equals x_$$\expr{\ell}$$. If $$v_\ell$$ is an output gate labeled with y_$$\expr{j}$$ then $$var(\ell)$$ equals y_$$\expr{j}$$. Otherwise $$var(\ell)$$ will be a temporary workspace variable temp_$$\expr{\ell-n}$$. Our program $$P$$ will have $$s$$ lines, where for every $$k\in [s]$$, if the in-neighbors of $$v_{n+k}$$ are $$v_i$$ and $$v_j$$ then the $$k$$-th line in the program will be $$var(n+k)$$ := $$var(i)$$ NAND $$var(j)$$. If $$v_{n+k}$$ has fewer than two in-neighbors then we replace the corresponding variable with the variable zero (which is never set to any value and hence retains its default value of $$0$$).

To complete the proof of the “if” direction we need to show that the program $$P$$ we constructed computes the same function $$F$$ as the circuit $$C$$ we were given. Indeed, let $$x\in \{0,1\}^n$$ and $$y=F(x)$$. Since $$C$$ computes $$F$$, there is a map $$Z:V \rightarrow \{0,1\}$$ as per Reference:NANDcirccomputedef. We claim that if we run the program $$P$$ on input $$x$$, then for every $$k\in [s]$$ the value assigned by the $$k$$-th line corresponds to $$Z(v_{n+k})$$. Indeed by construction the value assigned in the $$k$$-th line corresponds to the NAND of the value assigned to the in-neighbors of $$v_{n+k}$$. Hence in particular if $$v_{n+k}$$ is the output gate labeled y_$$\expr{j}$$ then this value will equal $$y_j$$, meaning that on input $$x$$ our program will output $$y=F(x)$$.
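The reverse translation is equally short. The following sketch (our own helper names) assumes, as the proof arranges, that the circuit's vertices are already topologically sorted with the inputs first; `gates` lists the in-neighbor pair of each gate vertex $$n+k$$, and `out_labels` records which gate vertices carry output labels.

```python
# Sketch of the circuit-to-program translation from the proof.
def circuit_to_program(n, gates, out_labels):
    def var(v):
        if v is None:
            return "zero"                  # never assigned; defaults to 0
        if v < n:
            return f"x_{v}"                # input vertex
        return out_labels.get(v, f"temp_{v - n}")  # output or workspace
    return [f"{var(n + k)} := {var(u)} NAND {var(w)}"
            for k, (u, w) in enumerate(gates)]

# Running this on the XOR_2 circuit recovers a 4-line NAND program:
prog = circuit_to_program(2, [(0, 1), (0, 2), (1, 2), (3, 4)], {5: "y_0"})
```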

## Composition from graphs

Given Reference:circuitprogequivthm, we can reprove our composition theorems in the circuit formalism, which has the advantage of making them more intuitive. That is, we can prove Reference:seqcompositionthm and Reference:parcompositionthm by showing how to transform circuits for $$F$$ and $$G$$ into circuits for $$F \circ G$$ and $$F \oplus G$$. This is what we do now:

If $$C,D$$ are NAND circuits such that $$C$$ computes $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ and $$D$$ computes $$G:\{0,1\}^m \rightarrow \{0,1\}^k$$ then there is a circuit $$E$$ of size $$|C|+|D|$$ computing the function $$G\circ F:\{0,1\}^n \rightarrow \{0,1\}^k$$.

Let $$C$$ be the $$n$$-input $$m$$-output circuit computing $$F$$ and $$D$$ be the $$m$$-input $$k$$-output circuit computing $$G$$. The circuit to compute $$G \circ F$$ is illustrated in Reference:serialcompfig. We simply “stack” $$D$$ after $$C$$, obtaining a combined circuit with $$n$$ inputs and $$|C|+|D|$$ gates. The gates of $$C$$ remain the same, except that we identify the output gates of $$C$$ with the input gates of $$D$$. That is, for every edge that connected the $$i$$-th input of $$D$$ to a gate $$v$$ of $$D$$, we now connect to $$v$$ the output gate of $$C$$ corresponding to y_$$\expr{i}$$ instead. After doing so, we remove the output labels from $$C$$ and keep only the outputs of $$D$$. For every input $$x$$, if we execute the composed circuit on $$x$$ (i.e., compute a map $$Z$$ from the vertices to $$\{0,1\}$$ as per Reference:NANDcirccomputedef), then the output gates of $$C$$ will get the values corresponding to $$F(x)$$ and hence the output gates of $$D$$ will have the value $$G(F(x))$$.
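Representing a circuit as a triple (number of inputs, topologically sorted list of in-neighbor gate pairs, list of output-gate vertices) — our own convention for this sketch, not the book's — the stacking construction can be written as:

```python
# Sketch of sequential composition: stack D after C, rewiring every edge
# that left an input of D so that it comes from the matching output of C.
def compose(C, D):
    nC, gatesC, outsC = C              # C computes F: n bits -> m bits
    nD, gatesD, outsD = D              # D computes G: m bits -> k bits
    shift = nC + len(gatesC) - nD      # renumbering offset for D's gates
    def rewire(v):
        # input i of D is identified with the output gate y_i of C
        return outsC[v] if v < nD else v + shift
    gates = gatesC + [(rewire(u), rewire(w)) for (u, w) in gatesD]
    return (nC, gates, [v + shift for v in outsD])

# Compose XOR_2 with NOT (where NOT(a) = NAND(a, a)):
C = (2, [(0, 1), (0, 2), (1, 2), (3, 4)], [5])
D = (1, [(0, 0)], [1])
print(compose(C, D))  # -> (2, [(0, 1), (0, 2), (1, 2), (3, 4), (5, 5)], [6])
```

Note that the composed circuit has exactly $$|C|+|D|$$ gates, matching the size bound in the theorem.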

If $$C,D$$ are NAND circuits such that $$C$$ computes $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ and $$D$$ computes $$G:\{0,1\}^{n'} \rightarrow \{0,1\}^{m'}$$ then there is a circuit $$E$$ of size $$|C|+|D|$$ computing the function $$G\oplus F : \{0,1\}^{n+n'} \rightarrow \{0,1\}^{m+m'}$$.

If $$C,D$$ are circuits that compute $$F,G$$ then we can transform them to a circuit $$E$$ that computes $$F \oplus G$$ as in Reference:parallelcompositioncircfig. The circuit $$E$$ simply consists of two disjoint copies of the circuits $$C$$ and $$D$$, where we modify the labelling of the inputs of $$D$$ from x_$$0$$,$$\ldots$$,x_$$n'-1$$ to x_$$n$$,$$\ldots$$,x_$$n+n'-1$$ and the labelling of the outputs of $$D$$ from y_$$0$$,$$\ldots$$,y_$$m'-1$$ to y_$$m$$,$$\ldots$$,y_$$m+m'-1$$. By the fact that $$C$$ and $$D$$ compute $$F$$ and $$G$$ respectively, we see that $$E$$ computes the function $$F \oplus G: \{0,1\}^{n+n'}\rightarrow \{0,1\}^{m+m'}$$ that on input $$x \in \{0,1\}^{n+n'}$$ outputs $$F(x_0,\ldots,x_{n-1})G(x_n,\ldots,x_{n+n'-1})$$.

## General Boolean circuits: a formal definition

We now define the notion of general Boolean circuits that can use any set $$B$$ of gates and not just the NAND gate.

A basis for Boolean circuits is a finite set $$B = \{ g_0 , \ldots , g_{c-1} \}$$ of finite Boolean functions, where each function $$g \in B$$ maps strings of some finite length (which we denote by $$in(g)$$) to $$\{0,1\}$$.

We now define the notion of a general Boolean circuit with gates from $$B$$. Just as we defined canonical variables in Reference:NANDcanonical, it will be convenient for us to assume that the vertex set of such a circuit is an interval of the form $$\{0,1,2,\ldots,n+s-1 \}$$ for $$n,s \in \N$$, where the first $$n$$ vertices correspond to the inputs and the last $$m$$ vertices correspond to the outputs.

Let $$B$$ be a basis for Boolean circuits. A circuit over the basis $$B$$ (or $$B$$-circuit for short) with $$n$$ inputs and $$m$$ outputs is a labeled directed acyclic graph (DAG) over the vertex set $$[n+s]$$ for $$s\in \N$$. The vertices $$\{0,\ldots, n-1\}$$ are known as the “input variables” and have in-degree zero. Every vertex apart from the input variables is known as a gate. Each such vertex is labeled with a function $$g \in B$$ and has in-degree $$in(g)$$. The last $$m$$ vertices $$\{ n+s-m,\ldots, n+s-1 \}$$ have out-degree zero and are known as the output gates. We denote the circuit as $$C=([n+s],E,L)$$ where $$[n+s],E$$ are the vertices and edges of the circuit, and $$L:\{n,\ldots,n+s-1\} \rightarrow B$$ is the labeling function that maps vertices into the set $$B$$.

To make sure you understand this definition, stop and think how a Boolean circuit with AND, OR, and NOT gates corresponds to a $$B$$-circuit per Reference:circuits-def, where $$B= \{ AND, OR, NOT \}$$, $$AND:\{0,1\}^2 \rightarrow \{0,1\}$$ is the function $$AND(a,b)=a \cdot b$$, $$OR:\{0,1\}^2 \rightarrow \{0,1\}$$ is the function $$OR(a,b) = 1-(1-a)(1-b)$$, and $$NOT:\{0,1\} \rightarrow \{0,1\}$$ is the function $$NOT(a)=1-a$$. Another commonly used notation is $$x \wedge y$$ for $$AND(x,y)$$, $$x \vee y$$ for $$OR(x,y)$$, and $$\overline{x}$$ or $$\neg x$$ for $$NOT(x)$$.

The size of a circuit $$C$$, denoted by $$|C|$$, is the number of gates that it contains. An $$n$$-input $$m$$-output circuit $$C=([n+s],E,L)$$ computes a function $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ as follows. For every input $$x\in \{0,1\}^n$$, we inductively define the value of every vertex based on its incoming edges:

1. For the source vertices $$\{0,\ldots,n-1\}$$ we define $$val(i) = x_i$$ for all $$i\in [n]$$.
2. For a non-source vertex $$v$$ that is labeled with $$g\in B$$, if its incoming neighbors are vertices $$v_1,\ldots,v_k$$ (sorted in order) and their values have all been set, then we let $$val(v)=g(val(v_1),\ldots,val(v_k))$$.
3. Go back to step 2 until all vertices have values.
4. Output $$val(n+s-m),\ldots,val(n+s-1)$$.

The output of the circuit $$C$$ on input $$x$$, denoted by $$C(x)$$, is the string $$y\in \{0,1\}^m$$ outputted by this process. We say that the circuit $$C$$ computes the function $$F$$ if for every $$x\in \{0,1\}^n$$, $$C(x)=F(x)$$.
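This procedure translates almost directly into code. Here is a minimal sketch (the representation is our own: gates are topologically sorted, each given as a basis function together with its ordered in-neighbors), shown computing XOR over the basis $$\{AND, OR, NOT\}$$:

```python
# Sketch of the evaluation procedure for a general B-circuit.
def eval_circuit(n, m, gates, x):
    val = list(x)                        # step 1: inputs get x_0..x_{n-1}
    for g, neighbors in gates:           # steps 2-3, in topological order
        val.append(g(*[val[v] for v in neighbors]))
    return val[-m:]                      # step 4: the last m vertices

AND = lambda a, b: a & b
OR = lambda a, b: a | b
NOT = lambda a: 1 - a
# x_0 XOR x_1 = (x_0 AND NOT x_1) OR (x_1 AND NOT x_0):
gates = [(NOT, [0]), (NOT, [1]), (AND, [0, 3]), (AND, [1, 2]), (OR, [4, 5])]
print(eval_circuit(2, 1, gates, [1, 0]))  # -> [1]
```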

We have seen in Reference:NAND-univ-thm that every function $$f:\{0,1\}^k \rightarrow \{0,1\}$$ has a NAND program with at most $$10\cdot 2^k$$ lines, and hence Reference:NAND-circ-thm implies the following theorem (see Reference:NAND-all-circ-thm-ex). The bound that comes out of the proof of Reference:NAND-univ-thm is in fact $$5\cdot 2^k$$ and can be easily optimized further; as $$k$$ grows, we can also use the bound of $$O(2^k/k)$$ mentioned in Reference:tight-upper-bound.

For every function $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ and $$B$$ a subset of the functions from $$\{0,1\}^k$$ to $$\{0,1\}$$, if we let $$S_{NAND}(F)$$ denote the smallest number of lines in a NAND program that computes $$F$$ and $$S_B(F)$$ denote the smallest number of vertices in a Boolean circuit with the basis $$B$$ that computes $$F$$, then $$S_{NAND}(F) \leq (10\cdot 2^k)S_B(F)$$.

One can ask whether there is an equivalence here as well. However, this is not the case. For example, if the set $$B$$ only consists of constant functions, then clearly a circuit whose gates are in $$B$$ cannot compute any non-constant function. A slightly less boring example is if $$B$$ consists of the $$\wedge$$ (i.e., AND) function (as opposed to the $$NAND$$ function). One can show that such a circuit will always output $$0$$ on the all-zero input, and hence it can never compute the simple negation function $$\neg:\{0,1\} \rightarrow \{0,1\}$$ such that $$\neg(x)=1-x$$.

We say that a subset $$B$$ of functions from $$k$$ bits to a single bit is a universal basis if there is a “$$B$$-circuit” (i.e., circuit all whose gates are labeled with functions in $$B$$) that computes the $$NAND$$ function. Reference:universal-basis asks you to explore some examples of universal and non-universal bases.

The depth of a Boolean circuit is the length of the longest path in it. The notion of depth is tightly connected to the parallelism complexity of the circuit. “Shallow” circuits are easier to parallelize, since a path of length $$k$$ means that we have a sequence of $$k$$ gates, each of which needs to wait for the output of the previous one before it can complete its computation.

It is a good exercise for you to verify that every function $$F:\{0,1\}^n \rightarrow \{0,1\}$$ has a circuit that computes it of size $$O(2^n)$$ (in fact even $$O(2^n/n)$$) and depth $$O(n)$$. However, there are functions that require depth at least $$\log n/10$$ (can you see why?). There are also functions for which the smallest known circuits that compute them require a much larger depth.
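Depth is easy to compute from a topologically sorted gate list (a sketch under our own conventions): the depth of each gate is one more than the maximum depth of its in-neighbors, and the depth of the circuit is the maximum over all vertices.

```python
# Sketch: compute circuit depth (the longest path in the DAG) from a
# topologically sorted circuit; each gate is a list of in-neighbor indices.
def depth(n, gates):
    d = [0] * n                                  # input vertices: depth 0
    for neighbors in gates:
        d.append(1 + max((d[v] for v in neighbors), default=0))
    return max(d, default=0)

# The 4-gate XOR_2 NAND circuit has depth 3, not 4: its two middle
# gates can be evaluated in parallel.
print(depth(2, [[0, 1], [0, 2], [1, 2], [3, 4]]))  # -> 3
```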

## Neural networks

One particular basis we can use are threshold gates. For every vector $$w= (w_0,\ldots,w_{k-1})$$ of integers and integer $$t$$ (some or all of whom could be negative), the threshold function corresponding to $$w,t$$ is the function $$T_{w,t}:\{0,1\}^k \rightarrow \{0,1\}$$ that maps $$x\in \{0,1\}^k$$ to $$1$$ if and only if $$\sum_{i=0}^{k-1} w_i x_i \geq t$$. For example, the threshold function $$T_{w,t}$$ corresponding to $$w=(1,1,1,1,1)$$ and $$t=3$$ is simply the majority function $$MAJ_5$$ on $$\{0,1\}^5$$. The function $$NAND:\{0,1\}^2 \rightarrow \{0,1\}$$ is the threshold function corresponding to $$w=(-1,-1)$$ and $$t=-1$$, since $$NAND(x_0,x_1)=1$$ if and only if $$x_0 + x_1 \leq 1$$ or equivalently, $$-x_0 - x_1 \geq -1$$.
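The definition and both examples are easy to check in code (a sketch; `threshold` is our own helper name):

```python
# Sketch of the threshold function T_{w,t}: outputs 1 iff <w, x> >= t.
def threshold(w, t):
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

MAJ5 = threshold((1, 1, 1, 1, 1), 3)   # majority of 5 bits
NAND = threshold((-1, -1), -1)         # NAND as a threshold gate

print(MAJ5((1, 1, 0, 1, 0)), NAND((1, 1)), NAND((0, 1)))  # -> 1 0 1
```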

Threshold gates can be thought of as an approximation of the neuron cells that make up the core of human and animal brains. To a first approximation, a neuron has $$k$$ inputs and a single output, and the neuron “fires” or “turns on” its output when the incoming signals pass some threshold. Typically we think of an input to neurons as being a real number rather than a binary string, but we can reduce to the binary case by representing a real number in the binary basis, and multiplying the weight of the bit corresponding to the $$i^{th}$$ digit by $$2^i$$. Hence circuits with threshold gates are sometimes known as neural networks. Unlike the cases above, where we considered $$k$$ to be a small constant, in such neural networks we often do not put any bound on the number of inputs. However, since any threshold function on $$k$$ inputs can be computed by a NAND program of $$poly(k)$$ lines (see Reference:threshold-nand-ex), the power of NAND programs and neural networks is not very different.

## Biological computing

Computation can also be based on biological or chemical systems. For example, the lac operon produces the enzymes needed to digest lactose only if the condition $$x \wedge (\neg y)$$ holds, where $$x$$ is “lactose is present” and $$y$$ is “glucose is present”. Researchers have managed to create transistors, and from them the NAND function and other logic gates, based on DNA molecules (see also Reference:transcriptorfig). One motivation for DNA computing is to achieve increased parallelism or storage density; another is to create “smart biological agents” that could perhaps be injected into bodies, replicate themselves, and fix or kill cells that were damaged by a disease such as cancer. Computing in biological systems is of course not restricted to DNA. Even larger systems such as flocks of birds can be considered as computational processes.

## Cellular automata and the game of life

As we will discuss later, cellular automata such as Conway’s “Game of Life” can be used to simulate computation gates, see Reference:gameoflifefig.

## Circuit evaluation algorithm

A Boolean circuit is a labeled graph, and hence we can use the adjacency list representation to represent an $$s$$-vertex circuit over an arity-$$k$$ basis $$B$$ by $$s$$ elements of $$B$$ (that can be identified with numbers in $$[|B|]$$) and $$s$$ lists of $$k$$ numbers in $$[s]$$. Hence for every fixed basis $$B$$ we can represent such a circuit by a string of length $$O(s \log s)$$. (The implicit constant in the $$O$$ notation can depend on the basis $$B$$.) We can define $$CIRCEVAL_{B,s,n,m}$$ to be the function that takes as input a pair $$(C,x)$$ where $$C$$ is a string describing an $$s$$-size $$n$$-input $$m$$-output circuit over $$B$$, and an input $$x\in \{0,1\}^n$$, and returns the evaluation of $$C$$ on the input $$x$$.

Reference:NAND-all-circ-thm implies that every circuit $$C$$ of $$s$$ gates over a $$k$$-ary basis $$B$$ can be transformed into a NAND program of $$s'=O(s\cdot 2^k)$$ lines, and hence we can combine this transformation with last lecture’s evaluation procedure for NAND programs to conclude that $$CIRCEVAL$$ for circuits of $$s$$ gates over $$B$$ can be computed by a NAND program of $$O(s'^2 \log s')= O(s^2 2^{2k}(\log s + k))$$ lines. In fact, as we mentioned, it is possible to improve this to $$O(s' \log^2 s')=O(s2^k(\log s + k)^2)$$ lines.

### Advanced note: evaluating circuits in quasilinear time.

We can improve the evaluation procedure, and evaluate $$s$$-size constant fan-in circuits (or NAND programs) in $$O(s \cdot polylog(s))$$ lines.

## The physical extended Church-Turing thesis

We’ve seen that NAND gates can be implemented using very different systems in the physical world. What about the reverse direction? Can NAND programs simulate any physical computer?

We can take a leap of faith and stipulate that NAND programs do actually encapsulate every computation that we can think of. Such a statement (in the realm of infinite functions, which we’ll encounter in a couple of lectures) is typically attributed to Alonzo Church and Alan Turing, and in that context is known as the Church Turing Thesis. As we will discuss in future lectures, the Church-Turing Thesis is not a mathematical theorem or conjecture. Rather, like theories in physics, the Church-Turing Thesis is about mathematically modelling the real world. In the context of finite functions, we can make the following informal hypothesis or prediction:

If a function $$F:\{0,1\}^n \rightarrow \{0,1\}^m$$ can be computed in the physical world using $$s$$ amount of “physical resources” then it can be computed by a NAND program of roughly $$s$$ lines.

We call this hypothesis the “Physical Extended Church-Turing Thesis”, or PECTT for short. A priori it might seem rather extreme to hypothesize that our meager NAND model captures all possible physical computation. Yet, in more than a century of computing technologies, no one has built a scalable computing device that challenges this hypothesis.

We now discuss the “fine print” of the PECTT in more detail, as well as the (so far unsuccessful) challenges that have been raised against it. There is no single universally-agreed-upon formalization of “roughly $$s$$ physical resources”, but we can approximate this notion by considering the size of any physical computing device and the time it takes to compute the output, and ask that any such device can be simulated by a NAND program with a number of lines that is a polynomial (with not too large exponent) in the size of the system and the time it takes it to operate.

In other words, we can phrase the PECTT as stipulating that any function that can be computed by a device of volume $$V$$ and time $$t$$, must be computable by a NAND program that has at most $$\alpha(Vt)^\beta$$ lines for some constants $$\alpha,\beta$$. The exact values for $$\alpha,\beta$$ are not so clear, but it is generally accepted that if $$F:\{0,1\}^n \rightarrow \{0,1\}$$ is an exponentially hard function, in the sense that it has no NAND program of fewer than, say, $$2^{n/2}$$ lines, then a demonstration of a physical device that can compute $$F$$ for moderate input lengths (e.g., $$n=500$$) would be a violation of the PECTT.

Advanced note: making things concrete: We can attempt a more exact phrasing of the PECTT as follows. Suppose that $$Z$$ is a physical system that accepts $$n$$ binary stimuli and has a binary output, and can be enclosed in a sphere of volume $$V$$. We say that the system $$Z$$ computes a function $$F:\{0,1\}^n \rightarrow \{0,1\}$$ within $$t$$ seconds if whenever we set the stimuli to some value $$x\in \{0,1\}^n$$ and measure the output after $$t$$ seconds, we obtain $$F(x)$$. We can phrase the PECTT as stipulating that whenever such a system $$Z$$ computes $$F$$ within $$t$$ seconds, there exists a NAND program that computes $$F$$ of at most $$\alpha(Vt)^2$$ lines, where $$\alpha$$ is some normalization constant. (We can also consider variants where we use surface area instead of volume, or use a different power than $$2$$. However, none of these choices makes a qualitative difference to the discussion below.) In particular, suppose that $$F:\{0,1\}^n \rightarrow \{0,1\}$$ is a function that requires $$2^n/(100n)>2^{0.8n}$$ lines for any NAND program (we saw in the last lecture that such functions exist). Then the PECTT would imply that either the volume or the time of a system that computes $$F$$ would have to be at least $$2^{0.2 n}/\sqrt{\alpha}$$. To make this fully concrete, we need to decide on the units for measuring time and volume, and on the normalization constant $$\alpha$$. One conservative choice is to assume that we could squeeze computation to the absolute physical limits (which are many orders of magnitude beyond current technology). This corresponds to setting $$\alpha=1$$ and using Planck units for volume and time. The Planck length $$\ell_P$$ (which is, roughly speaking, the shortest distance that can theoretically be measured) is roughly $$2^{-120}$$ meters. The Planck time $$t_P$$ (which is the time it takes for light to travel one Planck length) is about $$2^{-150}$$ seconds.
In the above setting, if a function $$F$$ takes, say, 1KB of input (roughly $$10^4$$ bits, which can encode a $$100$$ by $$100$$ bitmap image) and requires at least $$2^{0.8 n}= 2^{0.8 \cdot 10^4}$$ NAND lines to compute, then any physical system that computes it would either require a volume of $$2^{0.2\cdot 10^4}$$ cubic Planck lengths, which is more than $$2^{1500}$$ cubic meters, or take at least $$2^{0.2 \cdot 10^4}$$ Planck time units, which is more than $$2^{1500}$$ seconds. To get a sense of how big these numbers are, note that the universe is only about $$2^{60}$$ seconds old, and its observable radius is only roughly $$2^{90}$$ meters. This suggests that it is possible to empirically falsify the PECTT by presenting a smaller-than-universe-size system that solves such a function. (There are of course several hurdles to refuting the PECTT in this way, one of which is that we can’t actually test the system on all possible inputs. However, it turns out that we can get around this issue using notions such as interactive proofs and program checking that we will see later in this course. Another, perhaps more salient, problem is that while we know many hard functions exist, at the moment there is no single explicit function $$F:\{0,1\}^n \rightarrow \{0,1\}$$ for which we can prove an $$\omega(n)$$ (let alone $$\Omega(2^n/n)$$) lower bound on the number of lines that a NAND program needs to compute it.)
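The unit conversions above are easy to check by working with base-2 logarithms, using the text's own approximate constants (Planck length $$\approx 2^{-120}$$ meters, Planck time $$\approx 2^{-150}$$ seconds):

```python
# Back-of-the-envelope check of the arithmetic above, using the text's
# approximate constants (not precise physical values).
n = 10**4                     # ~1KB of input
planck_units_log2 = 0.2 * n   # = 2000: required volume OR time, in Planck units

# One cubic Planck length is (2^-120 m)^3; one Planck time is 2^-150 s.
volume_m3_log2 = planck_units_log2 - 3 * 120
time_sec_log2 = planck_units_log2 - 150

print(volume_m3_log2)   # 1640.0 -> more than 2^1500 cubic meters
print(time_sec_log2)    # 1850.0 -> more than 2^1500 seconds
```

Both exponents comfortably exceed $$1500$$, and dwarf the $$2^{60}$$-second age and $$2^{90}$$-meter radius of the observable universe.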

### Attempts at refuting the PECTT

One of the admirable traits of mankind is the refusal to accept limitations. In the best case this is manifested by people achieving longstanding “impossible” challenges such as heavier-than-air flight, putting a person on the moon, circumnavigating the globe, or even resolving Fermat’s Last Theorem. In the worst case it is manifested by people continually following in the footsteps of previous failures, attempting proven-impossible tasks such as building a perpetual motion machine, trisecting an angle with a compass and straightedge, or refuting Bell’s inequality. The Physical Extended Church-Turing Thesis (in its various forms) has attracted both types of people. Here are some physical devices that have been speculated to achieve computational tasks that cannot be done by not-too-large NAND programs:

• Spaghetti sort: One of the first lower bounds that Computer Science students encounter is that sorting $$n$$ numbers requires making $$\Omega(n \log n)$$ comparisons. The “spaghetti sort” is a description of a proposed “mechanical computer” that would do this faster. The idea is that to sort $$n$$ numbers $$x_1,\ldots,x_n$$, we could cut $$n$$ spaghetti noodles into lengths $$x_1,\ldots,x_n$$, and then if we simply hold them together in our hand and bring them down to a flat surface, they will emerge in sorted order. There are a great many reasons why this is not truly a challenge to the PECTT hypothesis, and I will not ruin the reader’s fun in finding them out by her or himself.
• Soap bubbles: One function $$F:\{0,1\}^n \rightarrow \{0,1\}$$ that is conjectured to require a large number of NAND lines to compute is the Euclidean Steiner Tree problem. In this problem one is given $$m$$ points in the plane $$(x_1,y_1),\ldots,(x_m,y_m)$$ (say with integer coordinates ranging from $$1$$ to $$m$$, so that the list can be represented as a string of length $$n=O(m \log m)$$) and a number $$K$$. The goal is to figure out whether it is possible to connect all the points by line segments of total length at most $$K$$. This function is conjectured to be hard because it is NP-complete - a concept that we’ll encounter later in this course - and it is in fact reasonable to conjecture that as $$m$$ grows, the number of NAND lines required to compute this function grows exponentially in $$m$$, meaning that the PECTT would predict that if $$m$$ is sufficiently large (a few hundred or so) then no physical device could compute $$F$$. Yet, some people have claimed that there is in fact a very simple physical device that can solve this problem, constructed using some wooden pegs and soap. The idea is that if we take two glass plates and put $$m$$ wooden pegs between them in the locations $$(x_1,y_1),\ldots,(x_m,y_m)$$, then bubbles will form whose edges touch those pegs in a way that minimizes the total energy, which turns out to be a function of the total length of the line segments. The problem with this device, of course, is that nature, just like people, often gets stuck in “local optima”. That is, the resulting configuration will not be one that achieves the absolute minimum of the total energy but rather one that cannot be improved with local changes. Aaronson has carried out actual experiments (see Reference:aaronsonsoapfig), and found that while this device is often successful for three or four pegs, it starts yielding suboptimal results once the number of pegs grows beyond that.
• DNA computing. People have suggested using the properties of DNA to solve hard computational problems. The main advantage of DNA is the ability to potentially encode a lot of information in relatively small physical space, as well as to compute on this information in a highly parallel manner. At the time of this writing, it has been demonstrated that one can use DNA to store about $$10^{16}$$ bits of information in a region of radius about a millimeter, as opposed to about $$10^{10}$$ bits with the best known hard disk technology. This does not pose a real challenge to the PECTT, but it does suggest that one should be conservative about the choice of constants and not assume that current hard disk + silicon technologies are the absolute best possible. (We were extremely conservative in the suggested parameters for the PECTT, having assumed that as many as $$\ell_P^{-2}10^{-6} \sim 10^{61}$$ bits could potentially be stored in a millimeter-radius region.)
• Continuous/real computers. The physical world is often described using continuous quantities such as time and space, and people have suggested that analog devices might have direct access to computing with real-valued quantities and so would be inherently more powerful than discrete models such as NAND machines. Whether the “true” physical world is continuous or discrete is an open question. In fact, we do not even know how to precisely phrase this question, let alone answer it. Yet, regardless of the answer, it seems clear that the effort required to measure a continuous quantity grows with the level of accuracy desired, and so there is no “free lunch” or way to bypass the PECTT using such machines (see also this paper). Related to these are proposals known as “hypercomputing” or “Zeno’s computers”, which attempt to use the continuity of time by doing the first operation in one second, the second operation in half a second, the third operation in a quarter second, and so on. These fail for a similar reason to the one guaranteeing that Achilles will eventually catch the tortoise despite Zeno’s original paradox.
• Relativity computer and time travel. The formulation above assumed the notion of time, but under the theory of relativity time is in the eye of the observer. One approach to solving hard problems is to leave the computer to run for a long time from its perspective, while ensuring that only a short while passes from our perspective. One way to do so is for the user to start the computer and then go for a quick jog at close to the speed of light before checking on its status. Depending on how fast one goes, a few seconds from the point of view of the user might correspond to centuries in computer time (it might even finish updating its Windows operating system!). Of course the catch here is that the energy required from the user is proportional to how close one needs to get to the speed of light. A more interesting proposal is to use time travel via closed timelike curves (CTCs). In this case we could run an arbitrarily long computation by doing some calculations, remembering the current state, and then travelling back in time to continue where we left off. Indeed, if CTCs exist then we’d probably have to revise the PECTT (though in this case I will simply travel back in time and edit these notes, so I can claim I never conjectured it in the first place…)
• Humans. Another computing system that has been proposed as a counterexample to the PECTT is a 3-pound computer of about 0.1m radius: the human brain. Humans can walk around, talk, feel, and do other things that are not commonly done by NAND programs, but can they compute partial functions that NAND programs cannot? There are certainly computational tasks that at the moment humans do better than computers (e.g., playing some video games), but based on our current understanding of the brain, humans (or other animals) have no inherent computational advantage over computers. The brain has about $$10^{11}$$ neurons, each operating at a speed of about $$1000$$ operations per second. Hence a rough first approximation is that a NAND program of about $$10^{14}$$ lines could simulate one second of a brain’s activity. (This is a very rough approximation that could be off by a few orders of magnitude in either direction. For one, there are other structures in the brain apart from neurons that one might need to simulate, hence requiring higher overhead. On the other hand, it is by no means clear that we need to fully clone the brain in order to achieve the same computational tasks that it does.) Note that the fact that such a NAND program (likely) exists does not mean it is easy to find it. After all, constructing this program took evolution billions of years. Much of the recent effort in artificial intelligence research is focused on finding programs that replicate some of the brain’s capabilities, and while such programs take massive computational effort to discover, they often turn out to be much smaller than the pessimistic estimates above. For example, at the time of this writing, Google’s neural network for machine translation has about $$10^4$$ nodes (and can be simulated by a NAND program of comparable size).
Philosophers, priests and many others have since time immemorial argued that there is something about humans that cannot be captured by mechanical devices such as computers; whether or not that is the case, the evidence is thin that humans can perform computational tasks that are inherently impossible to achieve by computers of similar complexity. (There are some well-known scientists who have advocated that humans have inherent computational advantages over computers. See also this.)
• Quantum computation. The most compelling attack on the Physical Extended Church-Turing Thesis comes from the notion of quantum computing. The idea was initiated by the observation that systems with strong quantum effects are very hard to simulate on a computer. Turning this observation on its head, people have proposed using such systems to perform computations that we do not know how to do otherwise. At the time of this writing, scalable quantum computers have not yet been built, but they are a fascinating possibility, and one that does not seem to contradict any known law of nature. We will discuss quantum computing in much more detail later in this course. Modeling it will essentially involve extending the NAND programming language to the “QNAND” programming language, which has one more (very special) operation. The main takeaway, however, is that while quantum computing does suggest we need to amend the PECTT, it does not require a complete revision of our worldview. Indeed, almost all of the content of this course remains the same whether the underlying computational model is the “classical” model of NAND programs or the quantum model of QNAND programs (also known as quantum circuits).
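The brain-simulation estimate in the list above is just a product of two order-of-magnitude guesses, which we can spell out explicitly:

```python
# Rough arithmetic behind the brain-simulation estimate above.
# Both figures are the text's order-of-magnitude guesses, not measurements.
neurons = 10**11             # approximate number of neurons in the brain
ops_per_neuron_sec = 10**3   # approximate operations per neuron per second

lines_per_second = neurons * ops_per_neuron_sec
print(lines_per_second == 10**14)   # True: ~10^14 NAND lines per simulated second
```

As the text cautions, this estimate could be off by several orders of magnitude in either direction.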

While even the precise phrasing of the PECTT, let alone understanding its correctness, is still a subject of research, some variant of it is already implicitly assumed in practice. A statement such as “this cryptosystem provides 128 bits of security” really means that (a) it is conjectured that there is no Boolean circuit (or, equivalently, NAND program) of size much smaller than $$2^{128}$$ that can break the system (we say “conjectured” and not “proved” because, while we can phrase such a statement as a precise mathematical conjecture, at the moment we are unable to prove it for any cryptosystem; this is related to the P vs NP question we will discuss in future lectures), and (b) we assume that no other physical mechanism can do better, and hence it would take roughly a $$2^{128}$$ amount of “resources” to break the system.
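To see why $$2^{128}$$ operations is considered out of reach, a quick calculation helps. Even granting an attacker a (hypothetical, very generous) machine that tests $$10^{18}$$ keys per second, enumerating $$2^{128}$$ keys takes far longer than the age of the universe:

```python
# Why 128 bits of security is considered unbreakable by brute force.
keys = 2**128
rate = 10**18                   # keys tried per second -- a generous assumption
universe_age_sec = 4 * 10**17   # ~13.8 billion years, in seconds

seconds_needed = keys // rate   # about 3.4 * 10^20 seconds
print(seconds_needed > universe_age_sec)   # True
```

Of course, under the PECTT this bound applies not just to brute-force key search but to any physical process whatsoever.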

## Lecture summary

• NAND gates can be implemented by a variety of physical means.
• NAND programs are equivalent (up to constants) to Boolean circuits using any finite universal basis.
• By a leap of faith, we could hypothesize that the number of lines in the smallest NAND program for a function $$F$$ captures roughly the amount of physical resources required to compute $$F$$. This statement is known as the Physical Extended Church-Turing Thesis (PECTT).
• NAND programs capture a surprisingly wide array of computational models. The strongest currently known challenge to the PECTT comes from the potential for using quantum mechanical effects to speed-up computation, a model known as quantum computers.

## Exercises

For every one of the following sets, either prove that it is a universal basis or prove that it is not.

1. $$B = \{ \wedge, \vee, \neg \}$$. (To make all of them functions on two inputs, define $$\neg(x,y)=\overline{x}$$.)
2. $$B = \{ \wedge, \vee \}$$.
3. $$B= \{ \oplus,0,1 \}$$ where $$\oplus:\{0,1\}^2 \rightarrow \{0,1\}$$ is the XOR function and $$0$$ and $$1$$ are the constant functions that output $$0$$ and $$1$$.
4. $$B = \{ LOOKUP_1,0,1 \}$$ where $$0$$ and $$1$$ are the constant functions as above and $$LOOKUP_1:\{0,1\}^3 \rightarrow \{0,1\}$$ satisfies $$LOOKUP_1(a,b,c)$$ equals $$a$$ if $$c=0$$ and equals $$b$$ if $$c=1$$.

Prove that for every subset $$B$$ of the functions from $$\{0,1\}^k$$ to $$\{0,1\}$$, if $$B$$ is universal then there is a $$B$$-circuit of at most $$O(k)$$ gates to compute the $$NAND$$ function (you can start by showing that there is a $$B$$-circuit of at most $$O(k^{16})$$ gates). (Thanks to Alec Sun for solving this problem.)

Prove that for every $$w,t$$, the function $$T_{w,t}$$ can be computed by a NAND program of at most $$O(k^3)$$ lines.

## Bibliographical notes

Scott Aaronson’s blog post on how information is physical is a good discussion of issues related to the Physical Extended Church-Turing Thesis. Aaronson’s survey on NP-complete problems and physical reality is also a great source for some of these issues, though it might be easier to read after we reach the lectures on NP and NP-completeness.

## Further explorations

Some topics related to this lecture that might be accessible to advanced students include:

• The fundamental limits of information and their interplay with physics are still not well understood.