Introduction to Theoretical Computer Science — Boaz Barak


Physical implementations of NAND programs

  • Understand how NAND programs can map to physical processes in a variety of ways.
  • Learn the model of Boolean circuits and get proficient in moving between descriptions of a NAND program as code and as a circuit or labeled graph.
  • See that NAND is a universal basis for circuits, and see examples of universal and non-universal families of gates.
  • Understand the physical extended Church-Turing thesis that NAND programs capture all feasible computation in the physical world, and its physical and philosophical implications.

“In existing digital computing devices various mechanical or electrical devices have been used as elements: Wheels, which can be locked … which on moving from one position to another transmit electric pulses that may cause other similar wheels to move; single or combined telegraph relays, actuated by an electromagnet and opening or closing electric circuits; combinations of these two elements;—and finally there exists the plausible and tempting possibility of using vacuum tubes”, John von Neumann, first draft of a report on the EDVAC, 1945

We have defined NAND programs as a model for computation, but is this model only a mathematical abstraction, or is it connected in some way to physical reality? For example, if a function \(F:\{0,1\}^n \rightarrow \{0,1\}\) can be computed by a NAND program of \(s\) lines, is it possible, given an actual input \(x\in \{0,1\}^n\), to compute \(F(x)\) in the real world using an amount of resources that is roughly proportional to \(s\)?

In some sense, we already know that the answer to this question is Yes. We have seen a Python program that can evaluate NAND programs, and so if we have a NAND program \(P\), we can use any computer with Python installed on it to evaluate \(P\) on inputs of our choice. But do we really need modern computers and programming languages to run NAND programs? And can we understand more directly how we can map such programs to actual physical processes that produce an output from an input? This is the content of this lecture.
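Concretely, here is a minimal Python sketch of such an evaluator. The representation of a program as a list of (target, left, right) triples, and the names eval_nand and xor_prog, are just for illustration and are not the exact code we have seen:

```python
# A minimal sketch of a NAND-program evaluator.  A program is a list of
# lines, each a triple (target, left, right) meaning `target := left NAND right`.
# Inputs are named "x_0".."x_{n-1}" and outputs "y_0".."y_{m-1}".
def eval_nand(program, x, num_outputs):
    vals = {f"x_{i}": b for i, b in enumerate(x)}    # input assignments
    for target, left, right in program:
        a = vals.get(left, 0)     # uninitialized variables default to 0
        b = vals.get(right, 0)
        vals[target] = 1 - a * b  # NAND(a, b)
    return [vals[f"y_{j}"] for j in range(num_outputs)]

# The standard 4-line NAND program for XOR_2 in this representation:
xor_prog = [("u", "x_0", "x_1"),
            ("v", "x_0", "u"),
            ("w", "x_1", "u"),
            ("y_0", "v", "w")]
assert [eval_nand(xor_prog, [a, b], 1)[0]
        for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```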

We will also talk about the following “dual” question. Suppose we have some way to compute a function \(F:\{0,1\}^n \rightarrow \{0,1\}\) using roughly \(s\) units of “physical resources” such as material, energy, or time. Does this mean that there is also a NAND program to compute \(F\) using a number of lines that is not much bigger than \(s\)? This might seem like a wishful fantasy, but we will see that the answer to this question might be (up to some important caveats) essentially Yes as well.

Physical implementation of computing devices

Computation is an abstract notion that is distinct from its physical implementations. While most modern computing devices are obtained by mapping logical gates to semiconductor-based transistors, throughout history people have computed using a huge variety of mechanisms, including mechanical systems, gas and liquid (known as fluidics), biological and chemical processes, and even living creatures (e.g., see Reference:crabfig or this video for how crabs or slime mold can be used to do computations).

In this lecture we review some of these implementations, both so you can appreciate how it is possible to directly translate NAND programs to the physical world, without going through the entire stack of architecture, operating systems, and compilers, and to emphasize that silicon-based processors are by no means the only way to perform computation. Indeed, as we will see much later in this course, a very exciting recent line of work involves using different media for computation that would allow us to take advantage of quantum mechanical effects to enable different types of algorithms.

Crab-based logic gates from the paper “Robust soldier-crab ball gate” by Gunji, Nishiyama and Adamatzky. This is an example of an AND gate that relies on the tendency of two swarms of crabs arriving from different directions to combine to a single swarm that continues in the average of the directions.

Transistors and physical logic gates

A transistor can be thought of as an electric circuit with two inputs, known as the source and the gate, and an output, known as the sink. The gate controls whether current flows from the source to the sink. In a standard transistor, if the gate is “ON” then current can flow from the source to the sink, and if it is “OFF” then it can’t. In a complementary transistor this is reversed: if the gate is “OFF” then current can flow from the source to the sink, and if it is “ON” then it can’t.

We can implement the logic of transistors using water. The water pressure from the gate closes or opens a faucet between the source and the sink.

There are several ways to implement the logic of a transistor. For example, we can use faucets to implement it using water pressure (e.g., Reference:transistor-water-fig). This might seem like a curiosity, but there is a field known as fluidics concerned with implementing logical operations using liquids or gases. Some of the motivations include operating in extreme environmental conditions such as in space or on a battlefield, where standard electronic equipment would not survive. However, the standard implementation uses electrical current. One of the original implementations used vacuum tubes. As its name implies, a vacuum tube is a tube containing nothing (i.e., vacuum), in which a priori electrons could freely flow from the source (a wire) to the sink (a plate). However, there is a gate (a grid) between the two, and modulating its voltage can block the flow of electrons.

Early vacuum tubes were roughly the size of lightbulbs (and looked very much like them too). In the 1950’s they were supplanted by transistors, which implement the same logic using semiconductors: materials that normally do not conduct electricity but whose conductivity can be modified and controlled by inserting impurities (“doping”) and applying an external electric field (this is known as the field effect). In the 1960’s computers started to be built from integrated circuits, which enabled much greater density. In 1965, Gordon Moore predicted that the number of transistors per circuit would double every year (see Reference:moorefig), and that this would lead to “such wonders as home computers —or at least terminals connected to a central computer— automatic controls for automobiles, and personal portable communications equipment”. Since then, (adjusted versions of) this so-called “Moore’s law” has been running strong, though exponential growth cannot be sustained forever, and some physical limitations are already becoming apparent.

The number of transistors per integrated circuits from 1959 till 1965 and a prediction that exponential growth will continue at least another decade. Figure taken from “Cramming More Components onto Integrated Circuits”, Gordon Moore, 1965
Gordon Moore’s cartoon “predicting” the implications of radically improving transistor density.
The exponential growth in computing power over the last 120 years. Graph by Steve Jurvetson, extending a prior graph of Ray Kurzweil.

Gates and circuits

We can use transistors to implement a NAND gate, which would be a system with two input wires \(x,y\) and one output wire \(z\), such that if we identify high voltage with “\(1\)” and low voltage with “\(0\)”, then the wire \(z\) will equal “\(1\)” if and only if the NAND of the values of the wires \(x\) and \(y\) is \(1\) (see Reference:transistor-nand-fig).

Implementing a NAND gate using transistors.

More generally, we can use transistors to implement the model of Boolean circuits. We list the formal definition below, but let us start with the informal one:

Let \(B\) be some set of functions (known as “gates”) from \(\bits^k\) to \(\{0,1\}\). A Boolean circuit with the basis \(B\) is obtained by connecting “gates” which compute functions in \(B\) together by “wires” where each gate has \(k\) wires going into it and one wire going out of it. We have \(n\) special wires known as the “input wires” and \(m\) special wires known as the “output wires”. To compute a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) using a circuit, we feed the bits of \(x\) to the \(n\) input wires, and then each gate computes the corresponding function, and we “read off” the output \(y\in \{0,1\}^m\) from the \(m\) output wires.

The number \(k\) is known as the arity of the basis \(B\). We think of \(k\) as a small number (such as \(k=2\) or \(k=3\)), and so the idea behind a Boolean circuit is that we can compute complex functions by combining the simple components, namely the functions in \(B\). It turns out that NAND programs correspond to circuits where the basis is the single function \(NAND:\{0,1\}^2 \rightarrow \{0,1\}\). We now show this more formally.

Representing programs as graphs

We now define NAND programs as circuits, using the notion of directed acyclic graphs (DAGs).

If you are not comfortable with the definitions of graphs, and in particular directed acyclic graphs (DAGs), now would be a great time to go back to the “mathematical background” lecture, as well as some of the resources here, and review these notions.

A NAND circuit with \(n\) inputs and \(m\) outputs is a labeled directed acyclic graph (DAG) in which every vertex has in-degree at most two. We require that there are \(n\) vertices with in-degree zero, known as input variables, that are labeled with x_\(\expr{i}\) for \(i\in [n]\). Every vertex apart from the input variables is known as a gate. We require that there are \(m\) vertices of out-degree zero, known as the output gates, that are labeled with y_\(\expr{j}\) for \(j\in [m]\). While not all vertices are labeled, no two vertices get the same label. We denote the circuit as \(C=(V,E,L)\) where \(V,E\) are the vertices and edges of the circuit, and \(L:V \rightarrow_p S\) is the (partial) one-to-one labeling function that maps vertices into the set \(S=\{\) x_0,\(\ldots\),x_\(\expr{n-1}\),y_0,\(\ldots\), y_\(\expr{m-1}\) \(\}\). The size of a circuit \(C\), denoted by \(|C|\), is the number of gates that it contains.

The definition of NAND circuits is ultimately not that complicated, but it may take a second or third read to fully parse. It might help to look at Reference:XORcircuitfig, which describes the NAND circuit that corresponds to the 4-line NAND program we presented above for the \(XOR_2\) function.

A NAND circuit for computing the \(XOR_2\) function. Note that it has exactly four gates, corresponding to the four lines of the NAND program we presented above. The green labels \(u,v,w\) for non-output gates are just for illustration and comparison with the NAND program, and are not formally part of the circuit.

A NAND circuit corresponds to computation in the following way. To compute some output on an input \(x\in \{0,1\}^n\), we start by assigning to the input vertex labeled with x_\(\expr{i}\) the value \(x_i\), and then proceed by assigning to every gate \(v\) the value that is the NAND of the values assigned to its in-neighbors (if it has fewer than two in-neighbors, we replace the values of the missing neighbors by zero). The output \(y\in \{0,1\}^m\) corresponds to the values assigned to the output gates, with \(y_j\) equal to the value assigned to the gate labeled y_\(\expr{j}\) for every \(j\in [m]\). Formally, this is defined as follows:

Let \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and let \(C=(V,E,L)\) be a NAND circuit with \(n\) inputs and \(m\) outputs. We say that \(C\) computes \(F\) if there is a map \(Z:V \rightarrow \{0,1\}\), such that for every \(x\in \{0,1\}^n\), if \(y=F(x)\) then:
* For every \(i\in [n]\), if \(v\) is labeled with x_\(\expr{i}\) then \(Z(v)=x_i\).
* For every \(j\in[m]\), if \(v\) is labeled with y_\(\expr{j}\) then \(Z(v)=y_j\).
* For every gate \(v\) with in-neighbors \(u,w\), if \(a=Z(u)\) and \(b=Z(w)\), then \(Z(v)=NAND(a,b)\). (If \(v\) has fewer than two in-neighbors then we replace either \(b\) or both \(a\) and \(b\) with zero in the condition above.)

You should make sure you understand why Reference:NANDcirccomputedef captures the informal description above. This might require reading the definition a second or third time, but it is crucial for the rest of this course. Moreover, a priori it is not clear that for every circuit \(C\) and assignment \(x\) there is a map \(Z:V \rightarrow \{0,1\}\) that satisfies the conditions of Reference:NANDcirccomputedef. However, this follows from Reference:circuitprogequivthm.
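To see how such a map \(Z\) can be computed directly, here is a Python sketch of evaluating a NAND circuit, under the simplifying assumption that the gates are numbered in topological order (inputs first); the representation and names are hypothetical:

```python
# Sketch: computing the map Z of the definition above.  We assume vertices
# 0..n-1 are the inputs x_0..x_{n-1}, the remaining vertices are gates
# numbered in topological order, in_nbrs[v] lists the (at most two)
# in-neighbors of gate v, and `outputs` lists the gates labeled y_0..y_{m-1}.
def eval_circuit(n, in_nbrs, outputs, x):
    Z = {i: x[i] for i in range(n)}       # Z(v) = x_i on input vertices
    for v in sorted(in_nbrs):             # gates in topological order
        nbrs = in_nbrs[v]
        a = Z[nbrs[0]] if len(nbrs) > 0 else 0   # missing neighbors -> 0
        b = Z[nbrs[1]] if len(nbrs) > 1 else 0
        Z[v] = 1 - a * b                  # NAND of the in-neighbors' values
    return [Z[v] for v in outputs]

# The XOR_2 circuit: gates 2,3,4,5 play the roles of u,v,w,y_0.
xor_circ = {2: [0, 1], 3: [0, 2], 4: [1, 2], 5: [3, 4]}
assert [eval_circuit(2, xor_circ, [5], [a, b])[0]
        for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```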

The following theorem says that these two notions of computing a function are actually equivalent: we can transform a NAND program into a NAND circuit computing the same function, and vice versa.

For every \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(s\in \N\), \(F\) can be computed by an \(s\)-line NAND program if and only if \(F\) can be computed by an \(n\)-input \(m\)-output NAND circuit of \(s\) gates.

The transformation of a NAND program to a circuit, demonstrated on the program for the \(ATLEASTTWO\) function. Given a program of \(s\) lines and \(n\) inputs, we construct a circuit that is a graph with \(s+n\) vertices, \(s\) of which are gates and \(n\) of which are inputs. The in-neighbors of the vertex corresponding to a line of the program are the vertices corresponding to the lines in which the two variables on the right-hand side of its assignment operation were last written to.

The idea behind the proof is simple (see Reference:NANDcircuit_transfig for an example). Just like we did for the XOR program, if we have a NAND program \(P\) of \(s\) lines, \(n\) inputs, and \(m\) outputs, we can transform it into a NAND circuit with \(n\) inputs and \(s\) gates (i.e., a graph of \(n+s\) vertices, \(n\) of which are sources), where each gate corresponds to a line in the program \(P\). If line \(\ell\) involves the NAND of two variables foo and bar, and \(\ell'\) and \(\ell''\) are the lines where foo and bar were last assigned a value, then we add edges going into the gate corresponding to \(\ell\) from the gates corresponding to \(\ell',\ell''\). (If one of the variables was an input variable, then we add an edge from that variable; if one of them was uninitialized then we add no edge, and use the convention that its value defaults to zero.) In the other direction, we can transform a NAND circuit \(C\) of \(n\) inputs, \(m\) outputs and \(s\) gates into an \(s\)-line program by essentially inverting this process. For every gate in the circuit, we will have a line in the program which assigns to a variable the NAND of the variables corresponding to the in-neighbors of this gate. If the gate is an output gate labeled with y_\(\expr{j}\) then the corresponding line will assign the value to the variable y_\(\expr{j}\). Otherwise we will assign the value to a fresh “workspace” variable. We now show the formal proof.

We start with the “only if” direction. That is, we show how to transform a NAND program to a circuit. Suppose that \(P\) is an \(s\)-line program that computes \(F\). We will build a NAND circuit \(C=(V,E,L)\) that computes \(F\) as follows. The vertex set \(V\) will have the \(n+s\) elements \(\{ (0,0), \ldots, (0,n-1),(1,0),\ldots,(1,s-1) \}\). That is, it will have \(n\) vertices of the form \((0,i)\) for \(i\in [n]\) (corresponding to the \(n\) inputs), and \(s\) vertices of the form \((1,\ell)\) for \(\ell\in [s]\) (corresponding to the lines in the program). For every line \(\ell\) in the program \(P\) of the form foo := bar NAND baz, we put edges in the graph of the form \(\overrightarrow{(1,\ell')\;(1,\ell)}\) and \(\overrightarrow{(1,\ell'')\;(1,\ell)}\), where \(\ell'\) and \(\ell''\) are the last lines before \(\ell\) in which the variables bar and baz were assigned a value. If the variable bar and/or baz was not assigned a value prior to the \(\ell\)-th line and is not an input variable then we don’t add a corresponding edge. If the variable bar and/or baz is an input variable x_\(\expr{i}\) then we add the edge \(\overrightarrow{(0,i)\;(1,\ell)}\). We label the vertices of the form \((0,i)\) with x_\(\expr{i}\) for every \(i\in [n]\). For every \(j\in[m]\), let \(\ell\) be the last line in which the variable y_\(\expr{j}\) is assigned a value (as noted in the appendix, valid NAND programs must assign a value to all their output variables), and label the vertex \((1,\ell)\) with y_\(\expr{j}\). Note that the vertices of the form \((0,i)\) have in-degree zero, and all edges of the form \(\overrightarrow{(1,\ell')\;(1,\ell)}\) satisfy \(\ell>\ell'\). Hence this graph is a DAG, as in any cycle there would have to be at least one edge going from a vertex of the form \((1,\ell)\) to a vertex of the form \((1,\ell')\) for \(\ell'<\ell\) (can you see why?). Also, since we don’t allow a variable of the form y_\(\expr{j}\) on the right-hand side of a NAND operation, the output vertices have out-degree zero.

To complete the proof of the “only if” direction, we need to show that the circuit \(C\) we constructed computes the same function \(F\) as the program \(P\) we were given. Indeed, let \(x\in \{0,1\}^n\) and \(y = F(x)\). For every \(\ell\), let \(z_\ell\) be the value that is assigned by the \(\ell\)-th line in the execution of \(P\) on input \(x\). Now, as per Reference:NANDcirccomputedef, define the map \(Z:V \rightarrow \{0,1\}\) as follows: \(Z((0,i))=x_i\) for \(i\in [n]\) and \(Z((1,\ell))=z_\ell\) for every \(\ell \in [s]\). Then, by our construction of the circuit, the map satisfies the condition that for vertex \(v\) with in-neighbors \(u\) and \(w\), the value \(Z(v)\) is the NAND of \(Z(u)\) and \(Z(w)\) (replacing missing neighbors with the value \(0\)), and hence in particular for every \(j\in [m]\), the value assigned in the last line that touches y_\(\expr{j}\) equals \(y_j\). Thus the circuit \(C\) does compute the same function \(F\).
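This direction of the construction can itself be phrased as a short program. Here is a hedged Python sketch, using the same list-of-triples representation for programs as before (the vertex names ("in", i) and ("gate", l) are just for illustration):

```python
# Sketch of the program-to-circuit direction: one vertex ("in", i) per
# input and one vertex ("gate", l) per line, with edges from the vertex
# where each operand was last assigned.
def program_to_circuit(program, n):
    last = {f"x_{i}": ("in", i) for i in range(n)}  # where each var was last set
    edges, labels = [], {}
    for l, (target, left, right) in enumerate(program):
        for var in (left, right):
            if var in last:          # uninitialized variables add no edge
                edges.append((last[var], ("gate", l)))
        last[target] = ("gate", l)
    for var, v in last.items():      # label the vertices of output variables
        if var.startswith("y_"):
            labels[v] = var
    return edges, labels
```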

For the “if” direction, we need to transform an \(s\)-gate circuit \(C=(V,E,L)\) that computes \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) into an \(s\)-line NAND program \(P\) that computes the same function. We start by doing a topological sort of the graph \(C\). That is, we sort the vertex set \(V\) as \(\{v_0,\ldots,v_{n+s-1} \}\) such that if \(\overrightarrow{v_i v_j} \in E\) then \(i < j\). Such a sorting can be found for every DAG (see also Reference:topologicalsortex). Moreover, because the input vertices of \(C\) are “sources” (have in-degree zero), we can ensure they are placed first in this sorting, and that for every \(i\in [n]\), \(v_i\) is the input vertex labeled with x_\(\expr{i}\).

Now for \(\ell=0,1,\ldots,n+s-1\) we will define a variable \(var(\ell)\) in our resulting program as follows: If \(\ell<n\) then \(var(\ell)\) equals x_\(\expr{\ell}\). If \(v_\ell\) is an output gate labeled with y_\(\expr{j}\) then \(var(\ell)\) equals y_\(\expr{j}\). Otherwise \(var(\ell)\) will be a temporary workspace variable temp_\(\expr{\ell-n}\). Our program \(P\) will have \(s\) lines, where for every \(k\in [s]\), if the in-neighbors of \(v_{n+k}\) are \(v_i\) and \(v_j\) then the \(k\)-th line in the program will be \(var(n+k)\) := \(var(i)\) NAND \(var(j)\). If \(v_{n+k}\) has fewer than two in-neighbors then we replace the corresponding variable with the variable zero (which is never set to any value and hence retains its default value of \(0\)).

To complete the proof of the “if” direction we need to show that the program \(P\) we constructed computes the same function \(F\) as the circuit \(C\) we were given. Indeed, let \(x\in \{0,1\}^n\) and \(y=F(x)\). Since \(C\) computes \(F\), there is a map \(Z:V \rightarrow \{0,1\}\) as per Reference:NANDcirccomputedef. We claim that if we run the program \(P\) on input \(x\), then for every \(k\in [s]\) the value assigned by the \(k\)-th line corresponds to \(Z(v_{n+k})\). Indeed by construction the value assigned in the \(k\)-th line corresponds to the NAND of the value assigned to the in-neighbors of \(v_{n+k}\). Hence in particular if \(v_{n+k}\) is the output gate labeled y_\(\expr{j}\) then this value will equal \(y_j\), meaning that on input \(x\) our program will output \(y=F(x)\).
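Here is a sketch of this direction in Python, reusing the canonical numbering from the evaluator above (inputs 0..n-1 first, gates already sorted topologically), so the topological sort is implicit in the representation:

```python
# Sketch of the circuit-to-program direction.  Each gate becomes one line;
# non-output gates get fresh "temp_<l>" workspace variables, and missing
# in-neighbors are replaced by the never-assigned variable "zero".
def circuit_to_program(n, in_nbrs, out_labels):
    var = {i: f"x_{i}" for i in range(n)}
    program = []
    for v in sorted(in_nbrs):        # gates in topological order
        var[v] = out_labels.get(v, f"temp_{v - n}")
        nbrs = in_nbrs[v]
        left = var[nbrs[0]] if len(nbrs) > 0 else "zero"
        right = var[nbrs[1]] if len(nbrs) > 1 else "zero"
        program.append((var[v], left, right))
    return program

# Round-tripping the XOR_2 circuit recovers a 4-line XOR program:
print(circuit_to_program(2, xor_circ, {5: "y_0"}))
# [('temp_0', 'x_0', 'x_1'), ('temp_1', 'x_0', 'temp_0'),
#  ('temp_2', 'x_1', 'temp_0'), ('y_0', 'temp_1', 'temp_2')]
```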

Composition from graphs

Given Reference:circuitprogequivthm, we can reprove our composition theorems in the circuit formalism, which has the advantage of making them more intuitive. That is, we can prove Reference:seqcompositionthm and Reference:parcompositionthm by showing how to transform circuits for \(F\) and \(G\) into circuits for \(G \circ F\) and \(F \oplus G\). This is what we do now:

If \(C,D\) are NAND circuits such that \(C\) computes \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(D\) computes \(G:\{0,1\}^m \rightarrow \{0,1\}^k\) then there is a circuit \(E\) of size \(|C|+|D|\) computing the function \(G\circ F:\{0,1\}^n \rightarrow \{0,1\}^k\).

Given a circuit \(C\) computing \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and a circuit \(D\) computing \(G:\{0,1\}^m \rightarrow \{0,1\}^k\), we obtain a circuit \(E\) computing \(G\circ F\) by identifying the inputs of \(D\) with the outputs of \(C\). That is, the resulting circuit consists of the gates of both \(C\) and \(D\), where we replace every in-neighbor of \(D\) that was an input gate with the corresponding output gate of \(C\).

Let \(C\) be the \(n\)-input \(m\)-output circuit computing \(F\) and \(D\) be the \(m\)-input \(k\)-output circuit computing \(G\). The circuit to compute \(G \circ F\) is illustrated in Reference:serialcompfig. We simply “stack” \(D\) after \(C\), obtaining a combined circuit with \(n\) inputs and \(|C|+|D|\) gates. The gates of \(C\) remain the same, except that we identify the output gates of \(C\) with the input gates of \(D\). That is, for every edge that connected the \(i\)-th input of \(D\) to a gate \(v\) of \(D\), we now connect to \(v\) the output gate of \(C\) corresponding to y_\(\expr{i}\) instead. After doing so, we remove the output labels from \(C\) and keep only the outputs of \(D\). For every input \(x\), if we execute the composed circuit on \(x\) (i.e., compute a map \(Z\) from the vertices to \(\{0,1\}\) as per Reference:NANDcirccomputedef), then the output gates of \(C\) will get the values corresponding to \(F(x)\), and hence the output gates of \(D\) will get the values corresponding to \(G(F(x))\).
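By Reference:circuitprogequivthm, the same “stacking” can be carried out at the program level: run the lines of \(C\)'s program, then the lines of \(D\)'s program, with \(C\)'s outputs renamed to \(D\)'s inputs. A sketch (glossing over possible collisions between the two programs' workspace variables, which a full implementation would also rename apart):

```python
# Sketch of serial composition at the program level: rename C's outputs
# y_<j> and D's inputs x_<j> to shared wires mid_<j>, then concatenate.
def compose_serial(prog_C, prog_D):
    ren_C = lambda v: "mid_" + v[2:] if v.startswith("y_") else v
    ren_D = lambda v: "mid_" + v[2:] if v.startswith("x_") else v
    return ([tuple(map(ren_C, line)) for line in prog_C] +
            [tuple(map(ren_D, line)) for line in prog_D])

# XOR_2 followed by NOT (NOT(a) = NAND(a, a)) computes NOT(XOR(a, b)):
nxor = compose_serial(xor_prog, [("y_0", "x_0", "x_0")])
assert [eval_nand(nxor, [a, b], 1)[0]
        for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 0, 0, 1]
```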

If \(C,D\) are NAND circuits such that \(C\) computes \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and \(D\) computes \(G:\{0,1\}^{n'} \rightarrow \{0,1\}^{m'}\) then there is a circuit \(E\) of size \(|C|+|D|\) computing the function \(F\oplus G : \{0,1\}^{n+n'} \rightarrow \{0,1\}^{m+m'}\).

Given a circuit \(C\) computing \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and a circuit \(D\) computing \(G:\{0,1\}^{n'}\rightarrow \{0,1\}^{m'}\), we obtain a circuit \(E\) computing \(F \oplus G\) by simply putting the circuits “side by side”, and renaming the labels of the inputs and outputs of \(D\) to x_\(n\),\(\ldots\),x_\(n+n'-1\) and y_\(m\),\(\ldots\),y_\(m+m'-1\).

If \(C,D\) are circuits that compute \(F,G\) then we can transform them to a circuit \(E\) that computes \(F \oplus G\) as in Reference:parallelcompositioncircfig. The circuit \(E\) simply consists of two disjoint copies of the circuits \(C\) and \(D\), where we modify the labeling of the inputs of \(D\) from x_\(0\),\(\ldots\),x_\(n'-1\) to x_\(n\),\(\ldots\),x_\(n+n'-1\) and the labeling of the outputs of \(D\) from y_\(0\),\(\ldots\),y_\(m'-1\) to y_\(m\),\(\ldots\),y_\(m+m'-1\). By the fact that \(C\) and \(D\) compute \(F\) and \(G\) respectively, we see that \(E\) computes the function \(F \oplus G: \{0,1\}^{n+n'}\rightarrow \{0,1\}^{m+m'}\) that on input \(x \in \{0,1\}^{n+n'}\) outputs \(F(x_0,\ldots,x_{n-1})G(x_n,\ldots,x_{n+n'-1})\).

General Boolean circuits: a formal definition

We now define the notion of general Boolean circuits that can use any set \(B\) of gates and not just the NAND gate.

A basis for Boolean circuits is a finite set \(B = \{ g_0 , \ldots , g_{c-1} \}\) of finite Boolean functions, where each function \(g \in B\) maps strings of some finite length (which we denote by \(in(g)\)) to \(\{0,1\}\).

We now define the notion of a general Boolean circuit with gates from \(B\). Just as we defined canonical variables in Reference:NANDcanonical, it will be convenient for us to assume that the vertex set of such a circuit is the interval \([n+s]=\{0,1,\ldots,n+s-1\}\) for \(n,s \in \N\), where the first \(n\) vertices correspond to the inputs and the last \(m\) vertices correspond to the outputs.

Let \(B\) be a basis for Boolean circuits. A circuit over the basis \(B\) (or \(B\)-circuit for short) with \(n\) inputs and \(m\) outputs is a labeled directed acyclic graph (DAG) over the vertex set \([n+s]\) for some \(s\in \N\). The vertices \(\{0,\ldots, n-1\}\) are known as the “input variables” and have in-degree zero. Every vertex apart from the input variables is known as a gate. Each such vertex is labeled with a function \(g \in B\) and has in-degree \(in(g)\). The last \(m\) vertices \(\{ n+s-m,\ldots, n+s-1 \}\) have out-degree zero and are known as the output gates. We denote the circuit as \(C=([n+s],E,L)\) where \([n+s],E\) are the vertices and edges of the circuit, and \(L:\{n,\ldots,n+s-1\} \rightarrow B\) is the labeling function that maps each gate to a function in \(B\).

To make sure you understand this definition, stop and think how a Boolean circuit with AND, OR, and NOT gates corresponds to a \(B\)-circuit per Reference:circuits-def, where \(B= \{ AND, OR, NOT \}\), \(AND:\{0,1\}^2 \rightarrow \{0,1\}\) is the function \(AND(a,b)=a \cdot b\), \(OR:\{0,1\}^2 \rightarrow \{0,1\}\) is the function \(OR(a,b) = 1-(1-a)(1-b)\), and \(NOT:\{0,1\} \rightarrow \{0,1\}\) is the function \(NOT(a)=1-a\). (Another commonly used notation is \(x \wedge y\) for \(AND(x,y)\), \(x \vee y\) for \(OR(x,y)\), and \(\overline{x}\) or \(\neg x\) for \(NOT(x)\).)
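A tiny sanity check of these arithmetic identities (and of the fact, used below, that this basis can express NAND):

```python
# The arithmetic definitions of AND, OR, NOT over {0,1}:
AND = lambda a, b: a * b
OR  = lambda a, b: 1 - (1 - a) * (1 - b)
NOT = lambda a: 1 - a
for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a and b)
        assert OR(a, b) == (a or b)
        assert NOT(AND(a, b)) == 1 - a * b   # NAND expressed in this basis
```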

The size of a circuit \(C\), denoted by \(|C|\), is the number of gates that it contains. An \(n\)-input \(m\)-output circuit \(C=([n+s],E,L)\) computes a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) as follows. For every input \(x\in \{0,1\}^n\), we inductively define the value of every vertex based on its incoming edges:

  1. For the source vertices \(\{0,\ldots,n-1\}\) we define \(val(i) = x_i\) for all \(i\in [n]\).
  2. For a non-source vertex \(v\) that is labeled with \(g\in B\), if its incoming neighbors are vertices \(v_1,\ldots,v_k\) (sorted in order) and their values have all been set, then we let \(val(v)=g(val(v_1),\ldots,val(v_k))\).
  3. Go back to step 2 until all vertices have values.
  4. Output \(val(n+s-m),\ldots,val(n+s-1)\).

The output of the circuit \(C\) on input \(x\), denoted by \(C(x)\), is the string \(y\in \{0,1\}^m\) outputted by this process. We say that the circuit \(C\) computes the function \(F\) if for every \(x\in \{0,1\}^n\), \(C(x)=F(x)\).
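The four steps above translate directly into code. Here is a sketch, again assuming the canonical topological numbering of Reference:circuits-def; the dictionary-of-gates representation is just for illustration:

```python
# Sketch of the evaluation procedure for a general B-circuit: vertices
# 0..n+s-1 with inputs first and gates in topological order; `gates` maps
# each gate v to a pair (g, nbrs) of its labeling function and in-neighbors.
def eval_B_circuit(n, m, gates, x):
    val = {i: x[i] for i in range(n)}              # step 1: source vertices
    for v in sorted(gates):                        # steps 2-3
        g, nbrs = gates[v]
        val[v] = g(*(val[u] for u in nbrs))
    top = n + len(gates)
    return [val[v] for v in range(top - m, top)]   # step 4: last m vertices

# XOR_2 over the basis {AND, OR, NOT} (the lambdas defined above),
# using XOR(a,b) = (a OR b) AND NOT(a AND b):
xor_aon = {2: (OR, [0, 1]), 3: (AND, [0, 1]), 4: (NOT, [3]), 5: (AND, [2, 4])}
assert [eval_B_circuit(2, 1, xor_aon, [a, b])[0]
        for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```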

We have seen in Reference:NAND-univ-thm that every function \(f:\{0,1\}^k \rightarrow \{0,1\}\) has a NAND program with at most \(10\cdot 2^k\) lines, and hence Reference:NAND-circ-thm implies the following theorem (see Reference:NAND-all-circ-thm-ex; the bound that comes out of the proof of Reference:NAND-univ-thm is \(5\cdot 2^k\) and in fact can be easily optimized further, and as \(k\) grows we can also use the bound of \(O(2^k/k)\) mentioned in Reference:tight-upper-bound):

For every function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) and basis \(B\) consisting of functions from \(\{0,1\}^k\) to \(\{0,1\}\), if we let \(S_{NAND}(F)\) denote the smallest number of lines in a NAND program that computes \(F\) and \(S_B(F)\) denote the smallest number of gates in a Boolean circuit with the basis \(B\) that computes \(F\), then \[ S_{NAND}(F) \leq (10\cdot 2^k)\cdot S_B(F) \]

One can ask whether there is an equivalence here as well. However, this is not the case. For example, if the set \(B\) only consists of constant functions, then clearly a circuit whose gates are in \(B\) cannot compute any non-constant function. A slightly less boring example is if \(B\) consists only of the \(\wedge\) (i.e., AND) function (as opposed to the \(NAND\) function). One can show that such a circuit will always output \(0\) on the all-zeroes input, and hence it can never compute the simple negation function \(\neg:\{0,1\} \rightarrow \{0,1\}\) such that \(\neg(x)=1-x\).

We say that a subset \(B\) of functions from \(k\) bits to a single bit is a universal basis if there is a “\(B\)-circuit” (i.e., circuit all whose gates are labeled with functions in \(B\)) that computes the \(NAND\) function. Reference:universal-basis asks you to explore some examples of universal and non-universal bases.

The depth of a Boolean circuit is the length of the longest path in it. The notion of depth is tightly connected to the parallel complexity of the circuit: “shallow” circuits are easier to parallelize, since a path of length \(k\) means a sequence of \(k\) gates, each of which needs to wait for the output of the previous one before completing its computation.
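Since a circuit is a DAG, its depth can be computed in a single pass over the gates in topological order, as in this sketch (same representation as the NAND-circuit evaluator above):

```python
# Depth = length of the longest path, computed gate by gate.
def depth(n, in_nbrs):
    d = {i: 0 for i in range(n)}      # input vertices sit at depth 0
    for v in sorted(in_nbrs):         # gates in topological order
        d[v] = 1 + max((d[u] for u in in_nbrs[v]), default=0)
    return max(d.values(), default=0)

assert depth(2, xor_circ) == 3        # the XOR_2 NAND circuit has depth 3
```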

It is a good exercise for you to verify that every function \(F:\{0,1\}^n \rightarrow \{0,1\}\) has a circuit that computes it which is of \(O(2^n)\) (in fact even \(O(2^n/n)\)) size and \(O(n)\) depth. However, there are functions that require at least \(\log n/10\) depth (can you see why?). There are also functions for which the smallest known circuits that compute them require a much larger depth.

Neural networks

One particular basis we can use is that of threshold gates. For every vector \(w= (w_0,\ldots,w_{k-1})\) of integers and integer \(t\) (some or all of which could be negative), the threshold function corresponding to \(w,t\) is the function \(T_{w,t}:\{0,1\}^k \rightarrow \{0,1\}\) that maps \(x\in \{0,1\}^k\) to \(1\) if and only if \(\sum_{i=0}^{k-1} w_i x_i \geq t\). For example, the threshold function \(T_{w,t}\) corresponding to \(w=(1,1,1,1,1)\) and \(t=3\) is simply the majority function \(MAJ_5\) on \(\{0,1\}^5\). The function \(NAND:\{0,1\}^2 \rightarrow \{0,1\}\) is the threshold function corresponding to \(w=(-1,-1)\) and \(t=-1\), since \(NAND(x_0,x_1)=1\) if and only if \(x_0 + x_1 \leq 1\) or, equivalently, \(-x_0 - x_1 \geq -1\).
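In code, a threshold gate is a one-liner, and both examples above can be checked directly:

```python
# T_{w,t}(x) = 1 iff sum_i w_i * x_i >= t
def T(w, t):
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

MAJ5 = T((1, 1, 1, 1, 1), 3)          # majority of 5 bits
assert MAJ5((1, 1, 0, 1, 0)) == 1 and MAJ5((1, 0, 0, 1, 0)) == 0

NAND2 = T((-1, -1), -1)               # NAND as a threshold gate
assert [NAND2((a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]
```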

Threshold gates can be thought of as an approximation for neuron cells that make up the core of human and animal brains. To a first approximation, a neuron has \(k\) inputs and a single output, and the neuron “fires” or “turns on” its output when those signals pass some threshold. (Typically we think of an input to a neuron as being a real number rather than a binary string, but we can reduce to the binary case by representing a real number in the binary basis, and multiplying the weight of the bit corresponding to the \(i^{th}\) digit by \(2^i\).) Hence circuits with threshold gates are sometimes known as neural networks. Unlike the cases above, where we considered \(k\) to be a small constant, in such neural networks we often do not put any bound on the number of inputs. However, since any threshold function on \(k\) inputs can be computed by a NAND program of \(poly(k)\) lines (see Reference:threshold-nand-ex), the power of NAND programs and neural networks is not very different.

Biological computing

Computation can be based on biological or chemical systems. For example, the lac operon produces the enzymes needed to digest lactose only if the condition \(x \wedge (\neg y)\) holds, where \(x\) is “lactose is present” and \(y\) is “glucose is present”. Researchers have managed to create transistors, and from them the NAND function and other logic gates, based on DNA molecules (see also Reference:transcriptorfig). One motivation for DNA computing is to achieve increased parallelism or storage density; another is to create “smart biological agents” that could perhaps be injected into bodies, replicate themselves, and fix or kill cells that were damaged by a disease such as cancer. Computing in biological systems is of course not restricted to DNA. Even larger systems such as flocks of birds can be considered as computational processes.

Performance of DNA-based logic gates. Figure taken from paper of Bonnet et al, Science, 2013.

Cellular automata and the game of life

As we will discuss later, cellular automata such as Conway’s “Game of Life” can be used to simulate logical gates; see Reference:gameoflifefig.

An AND gate using a “Game of Life” configuration. Figure taken from Jean-Philippe Rennard’s paper.

Circuit evaluation algorithm

A Boolean circuit is a labeled graph, and hence we can use the adjacency list representation to represent an \(s\)-vertex circuit over an arity-\(k\) basis \(B\) by \(s\) elements of \(B\) (which can be identified with numbers in \([|B|]\)) and \(s\) lists of \(k\) numbers in \([s]\). Hence for every fixed basis \(B\) we can represent such a circuit by a string of length \(O(s \log s)\) (the implicit constant in the \(O\) notation can depend on the basis \(B\)). We can define \(CIRCEVAL_{B,s,n,m}\) to be the function that takes as input a pair \((C,x)\) where \(C\) is a string describing an \(s\)-size \(n\)-input \(m\)-output circuit over \(B\), and an input \(x\in \{0,1\}^n\), and returns the evaluation of \(C\) on the input \(x\).

Reference:NAND-all-circ-thm implies that every circuit \(C\) of \(s\) gates over a \(k\)-ary basis \(B\) can be transformed into a NAND program of \(s'=O(s\cdot 2^k)\) lines, and hence we can combine this transformation with last lecture’s evaluation procedure for NAND programs to conclude that \(CIRCEVAL\) for circuits of \(s\) gates over \(B\) can be computed by a NAND program of \(O(s'^2 \log s')= O(s^2 2^{2k}(\log s + k))\) lines. (In fact, as we mentioned, it is possible to improve this to \(O(s' \log^2 s')=O(s2^k(\log s + k)^2)\) lines.)

Advanced note: evaluating circuits in quasilinear time.

We can improve the evaluation procedure, and evaluate \(s\)-size constant fan-in circuits (or NAND programs) in \(O(s \cdot polylog(s))\) lines.

The physical extended Church-Turing thesis

We’ve seen that NAND gates can be implemented using very different systems in the physical world. What about the reverse direction? Can NAND programs simulate any physical computer?

We can take a leap of faith and stipulate that NAND programs do actually encapsulate every computation that we can think of. Such a statement (in the realm of infinite functions, which we’ll encounter in a couple of lectures) is typically attributed to Alonzo Church and Alan Turing, and in that context is known as the Church-Turing Thesis. As we will discuss in future lectures, the Church-Turing Thesis is not a mathematical theorem or conjecture. Rather, like theories in physics, the Church-Turing Thesis is about mathematically modelling the real world. In the context of finite functions, we can make the following informal hypothesis or prediction:

If a function \(F:\{0,1\}^n \rightarrow \{0,1\}^m\) can be computed in the physical world using \(s\) units of “physical resources” then it can be computed by a NAND program of roughly \(s\) lines.

We call this hypothesis the “Physical Extended Church-Turing Thesis” or PECTT for short. A priori it might seem rather extreme to hypothesize that our meager NAND model captures all possible physical computation. And yet, in more than a century of computing technologies, no one has built any scalable computing device that challenges this hypothesis.

We now discuss the “fine print” of the PECTT in more detail, as well as the (so far unsuccessful) challenges that have been raised against it. There is no single universally-agreed-upon formalization of “roughly \(s\) physical resources”, but we can approximate this notion by considering the size of any physical computing device and the time it takes to compute the output, and ask that any such device can be simulated by a NAND program with a number of lines that is a polynomial (with not too large exponent) in the size of the system and the time it takes it to operate.

In other words, we can phrase the PECTT as stipulating that any function that can be computed by a device of volume \(V\) and time \(t\), must be computable by a NAND program that has at most \(\alpha(Vt)^\beta\) lines for some constants \(\alpha,\beta\). The exact values for \(\alpha,\beta\) are not so clear, but it is generally accepted that if \(F:\{0,1\}^n \rightarrow \{0,1\}\) is an exponentially hard function, in the sense that it has no NAND program of fewer than, say, \(2^{n/2}\) lines, then a demonstration of a physical device that can compute \(F\) for moderate input lengths (e.g., \(n=500\)) would be a violation of the PECTT.

Advanced note: making things concrete: We can attempt a more exact phrasing of the PECTT as follows. Suppose that \(Z\) is a physical system that accepts \(n\) binary stimuli and has a binary output, and can be enclosed in a sphere of volume \(V\). We say that the system \(Z\) computes a function \(F:\{0,1\}^n \rightarrow \{0,1\}\) within \(t\) seconds if whenever we set the stimuli to some value \(x\in \{0,1\}^n\) and measure the output after \(t\) seconds, we obtain \(F(x)\). We can phrase the PECTT as stipulating that whenever there exists such a system \(Z\) that computes \(F\) within \(t\) seconds, there exists a NAND program that computes \(F\) of at most \(\alpha(Vt)^2\) lines, where \(\alpha\) is some normalization constant. (We can also consider variants where we use surface area instead of volume, or use a different power than \(2\); however, none of these choices makes a qualitative difference to the discussion below.) In particular, suppose that \(F:\{0,1\}^n \rightarrow \{0,1\}\) is a function that requires \(2^n/(100n)>2^{0.8n}\) lines for any NAND program (we have seen that such functions exist in the last lecture). Then the PECTT would imply that either the volume or the time of a system that computes \(F\) will have to be at least \(2^{0.2 n}/\sqrt{\alpha}\).

To fully make this concrete, we need to decide on the units for measuring time and volume, and the normalization constant \(\alpha\). One conservative choice is to assume that we could squeeze computation to the absolute physical limits (which are many orders of magnitude beyond current technology). This corresponds to setting \(\alpha=1\) and using the Planck units for volume and time. The Planck length \(\ell_P\) (which is, roughly speaking, the shortest distance that can theoretically be measured) is roughly \(2^{-120}\) meters. The Planck time \(t_P\) (which is the time it takes for light to travel one Planck length) is about \(2^{-150}\) seconds. In the above setting, if a function \(F\) takes, say, 1KB of input (e.g., roughly \(10^4\) bits, which can encode a \(100\) by \(100\) bitmap image), and requires at least \(2^{0.8 n}= 2^{0.8 \cdot 10^4}\) NAND lines to compute, then any physical system that computes it would either require a volume of \(2^{0.2\cdot 10^4}\) Planck lengths cubed, which is more than \(2^{1500}\) meters cubed, or take at least \(2^{0.2 \cdot 10^4}\) Planck time units, which is longer than \(2^{1500}\) seconds. To get a sense of how big those numbers are, note that the universe is only about \(2^{60}\) seconds old, and its observable radius is only roughly \(2^{90}\) meters. This suggests that it is possible to empirically falsify the PECTT by presenting a smaller-than-universe-size system that solves such a function. (There are of course several hurdles to refuting the PECTT in this way, one of which is that we can’t actually test the system on all possible inputs. However, it turns out we can get around this issue using notions such as interactive proofs and program checking that we will see later in this course. Another, perhaps more salient, problem is that while we know many hard functions exist, at the moment there is no single explicit function \(F:\{0,1\}^n \rightarrow \{0,1\}\) for which we can prove an \(\omega(n)\) (let alone \(\Omega(2^n/n)\)) lower bound on the number of lines that a NAND program needs to compute it.)
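The back-of-the-envelope conversion above is easy to reproduce. Here is a sketch working with base-2 logarithms, using the (rounded) Planck-unit values quoted in the note:

```python
# All quantities are exponents of 2, using the rounded values from the text.
log_planck_length = -120          # Planck length ~ 2^-120 meters
log_planck_time = -150            # Planck time   ~ 2^-150 seconds

n = 10**4                         # ~1KB of input
log_bound = 0.2 * n               # PECTT: volume or time >= 2^{0.2n} Planck units

log_volume_m3 = log_bound + 3 * log_planck_length  # Planck volumes -> meters^3
log_time_sec = log_bound + log_planck_time         # Planck times -> seconds
print(log_volume_m3, log_time_sec)                 # 1640.0 1850.0, both > 1500
# Compare: observable-universe radius ~ 2^90 meters, age ~ 2^60 seconds.
```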

Attempts at refuting the PECTT

One of the admirable traits of mankind is the refusal to accept limitations. In the best case this is manifested by people achieving longstanding “impossible” challenges such as heavier-than-air flight, putting a person on the moon, circumnavigating the globe, or even resolving Fermat’s Last Theorem. In the worst case it is manifested by people continually following the footsteps of previous failures to try to do proven-impossible tasks such as build a perpetual motion machine, trisect an angle with a compass and straightedge, or refute Bell’s inequality. The Physical Extended Church-Turing Thesis (in its various forms) has attracted both types of people. Here are some physical devices that have been speculated to achieve computational tasks that cannot be done by not-too-large NAND programs:

Scott Aaronson tests a candidate device for computing Steiner trees using soap bubbles.

While even the precise phrasing of the PECTT, let alone understanding its correctness, is still a subject of research, some variant of it is already implicitly assumed in practice. A statement such as “this cryptosystem provides 128 bits of security” really means that (a) it is conjectured that there is no Boolean circuit (or, equivalently, a NAND program) of size much smaller than \(2^{128}\) that can break the system (we say “conjectured” and not “proved” because, while we can phrase such a statement as a precise mathematical conjecture, at the moment we are unable to prove it for any cryptosystem; this is related to the P vs NP question we will discuss in future lectures), and (b) we assume that no other physical mechanism can do better, and hence it would take roughly \(2^{128}\) units of “resources” to break the system.

Lecture summary

Exercises

For every one of the following sets, either prove that it is a universal basis or prove that it is not.
1. \(B = \{ \wedge, \vee, \neg \}\). (To make all of them be functions on two inputs, define \(\neg(x,y)=\overline{x}\).)
2. \(B = \{ \wedge, \vee \}\).
3. \(B= \{ \oplus,0,1 \}\) where \(\oplus:\{0,1\}^2 \rightarrow \{0,1\}\) is the XOR function and \(0\) and \(1\) are the constant functions that output \(0\) and \(1\).
4. \(B = \{ LOOKUP_1,0,1 \}\) where \(0\) and \(1\) are the constant functions as above and \(LOOKUP_1:\{0,1\}^3 \rightarrow \{0,1\}\) satisfies \(LOOKUP_1(a,b,c)\) equals \(a\) if \(c=0\) and equals \(b\) if \(c=1\).

Prove that for every subset \(B\) of the functions from \(\{0,1\}^k\) to \(\{0,1\}\), if \(B\) is universal then there is a \(B\)-circuit of at most \(O(k)\) gates to compute the \(NAND\) function (you can start by showing that there is a \(B\)-circuit of at most \(O(k^{16})\) gates). (Thanks to Alec Sun for solving this problem.)

Prove that for every vector \(w \in \mathbb{Z}^k\) and integer \(t\), the function \(T_{w,t}\) can be computed by a NAND program of at most \(O(k^3)\) lines.

Bibliographical notes

Scott Aaronson’s blog post on how information is physical is a good discussion of issues related to the physical extended Church-Turing thesis. Aaronson’s survey on NP-complete problems and physical reality is also a great source for some of these issues, though it might be easier to read after we reach the lectures on NP and NP-completeness.

Further explorations

Some topics related to this lecture that might be accessible to advanced students include:

Acknowledgements