Introduction to Theoretical Computer Science — Boaz Barak


\[ \newcommand{\undefined}{} \newcommand{\hfill}{} \newcommand{\qedhere}{\square} \newcommand{\qed}{\square} \newcommand{\ensuremath}[1]{#1} \newcommand{\bbA}{\mathbb A} \newcommand{\bbB}{\mathbb B} \newcommand{\bbC}{\mathbb C} \newcommand{\bbD}{\mathbb D} \newcommand{\bbE}{\mathbb E} \newcommand{\bbF}{\mathbb F} \newcommand{\bbG}{\mathbb G} \newcommand{\bbH}{\mathbb H} \newcommand{\bbI}{\mathbb I} \newcommand{\bbJ}{\mathbb J} \newcommand{\bbK}{\mathbb K} \newcommand{\bbL}{\mathbb L} \newcommand{\bbM}{\mathbb M} \newcommand{\bbN}{\mathbb N} \newcommand{\bbO}{\mathbb O} \newcommand{\bbP}{\mathbb P} \newcommand{\bbQ}{\mathbb Q} \newcommand{\bbR}{\mathbb R} \newcommand{\bbS}{\mathbb S} \newcommand{\bbT}{\mathbb T} \newcommand{\bbU}{\mathbb U} \newcommand{\bbV}{\mathbb V} \newcommand{\bbW}{\mathbb W} \newcommand{\bbX}{\mathbb X} \newcommand{\bbY}{\mathbb Y} \newcommand{\bbZ}{\mathbb Z} \newcommand{\sA}{\mathscr A} \newcommand{\sB}{\mathscr B} \newcommand{\sC}{\mathscr C} \newcommand{\sD}{\mathscr D} \newcommand{\sE}{\mathscr E} \newcommand{\sF}{\mathscr F} \newcommand{\sG}{\mathscr G} \newcommand{\sH}{\mathscr H} \newcommand{\sI}{\mathscr I} \newcommand{\sJ}{\mathscr J} \newcommand{\sK}{\mathscr K} \newcommand{\sL}{\mathscr L} \newcommand{\sM}{\mathscr M} \newcommand{\sN}{\mathscr N} \newcommand{\sO}{\mathscr O} \newcommand{\sP}{\mathscr P} \newcommand{\sQ}{\mathscr Q} \newcommand{\sR}{\mathscr R} \newcommand{\sS}{\mathscr S} \newcommand{\sT}{\mathscr T} \newcommand{\sU}{\mathscr U} \newcommand{\sV}{\mathscr V} \newcommand{\sW}{\mathscr W} \newcommand{\sX}{\mathscr X} \newcommand{\sY}{\mathscr Y} \newcommand{\sZ}{\mathscr Z} \newcommand{\sfA}{\mathsf A} \newcommand{\sfB}{\mathsf B} \newcommand{\sfC}{\mathsf C} \newcommand{\sfD}{\mathsf D} \newcommand{\sfE}{\mathsf E} \newcommand{\sfF}{\mathsf F} \newcommand{\sfG}{\mathsf G} \newcommand{\sfH}{\mathsf H} \newcommand{\sfI}{\mathsf I} \newcommand{\sfJ}{\mathsf J} \newcommand{\sfK}{\mathsf K} \newcommand{\sfL}{\mathsf L} 
\newcommand{\sfM}{\mathsf M} \newcommand{\sfN}{\mathsf N} \newcommand{\sfO}{\mathsf O} \newcommand{\sfP}{\mathsf P} \newcommand{\sfQ}{\mathsf Q} \newcommand{\sfR}{\mathsf R} \newcommand{\sfS}{\mathsf S} \newcommand{\sfT}{\mathsf T} \newcommand{\sfU}{\mathsf U} \newcommand{\sfV}{\mathsf V} \newcommand{\sfW}{\mathsf W} \newcommand{\sfX}{\mathsf X} \newcommand{\sfY}{\mathsf Y} \newcommand{\sfZ}{\mathsf Z} \newcommand{\cA}{\mathcal A} \newcommand{\cB}{\mathcal B} \newcommand{\cC}{\mathcal C} \newcommand{\cD}{\mathcal D} \newcommand{\cE}{\mathcal E} \newcommand{\cF}{\mathcal F} \newcommand{\cG}{\mathcal G} \newcommand{\cH}{\mathcal H} \newcommand{\cI}{\mathcal I} \newcommand{\cJ}{\mathcal J} \newcommand{\cK}{\mathcal K} \newcommand{\cL}{\mathcal L} \newcommand{\cM}{\mathcal M} \newcommand{\cN}{\mathcal N} \newcommand{\cO}{\mathcal O} \newcommand{\cP}{\mathcal P} \newcommand{\cQ}{\mathcal Q} \newcommand{\cR}{\mathcal R} \newcommand{\cS}{\mathcal S} \newcommand{\cT}{\mathcal T} \newcommand{\cU}{\mathcal U} \newcommand{\cV}{\mathcal V} \newcommand{\cW}{\mathcal W} \newcommand{\cX}{\mathcal X} \newcommand{\cY}{\mathcal Y} \newcommand{\cZ}{\mathcal Z} \newcommand{\bfA}{\mathbf A} \newcommand{\bfB}{\mathbf B} \newcommand{\bfC}{\mathbf C} \newcommand{\bfD}{\mathbf D} \newcommand{\bfE}{\mathbf E} \newcommand{\bfF}{\mathbf F} \newcommand{\bfG}{\mathbf G} \newcommand{\bfH}{\mathbf H} \newcommand{\bfI}{\mathbf I} \newcommand{\bfJ}{\mathbf J} \newcommand{\bfK}{\mathbf K} \newcommand{\bfL}{\mathbf L} \newcommand{\bfM}{\mathbf M} \newcommand{\bfN}{\mathbf N} \newcommand{\bfO}{\mathbf O} \newcommand{\bfP}{\mathbf P} \newcommand{\bfQ}{\mathbf Q} \newcommand{\bfR}{\mathbf R} \newcommand{\bfS}{\mathbf S} \newcommand{\bfT}{\mathbf T} \newcommand{\bfU}{\mathbf U} \newcommand{\bfV}{\mathbf V} \newcommand{\bfW}{\mathbf W} \newcommand{\bfX}{\mathbf X} \newcommand{\bfY}{\mathbf Y} \newcommand{\bfZ}{\mathbf Z} \newcommand{\rmA}{\mathrm A} \newcommand{\rmB}{\mathrm B} \newcommand{\rmC}{\mathrm 
C} \newcommand{\rmD}{\mathrm D} \newcommand{\rmE}{\mathrm E} \newcommand{\rmF}{\mathrm F} \newcommand{\rmG}{\mathrm G} \newcommand{\rmH}{\mathrm H} \newcommand{\rmI}{\mathrm I} \newcommand{\rmJ}{\mathrm J} \newcommand{\rmK}{\mathrm K} \newcommand{\rmL}{\mathrm L} \newcommand{\rmM}{\mathrm M} \newcommand{\rmN}{\mathrm N} \newcommand{\rmO}{\mathrm O} \newcommand{\rmP}{\mathrm P} \newcommand{\rmQ}{\mathrm Q} \newcommand{\rmR}{\mathrm R} \newcommand{\rmS}{\mathrm S} \newcommand{\rmT}{\mathrm T} \newcommand{\rmU}{\mathrm U} \newcommand{\rmV}{\mathrm V} \newcommand{\rmW}{\mathrm W} \newcommand{\rmX}{\mathrm X} \newcommand{\rmY}{\mathrm Y} \newcommand{\rmZ}{\mathrm Z} \newcommand{\paren}[1]{( #1 )} \newcommand{\Paren}[1]{\left( #1 \right)} \newcommand{\bigparen}[1]{\bigl( #1 \bigr)} \newcommand{\Bigparen}[1]{\Bigl( #1 \Bigr)} \newcommand{\biggparen}[1]{\biggl( #1 \biggr)} \newcommand{\Biggparen}[1]{\Biggl( #1 \Biggr)} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\Abs}[1]{\left\lvert #1 \right\rvert} \newcommand{\bigabs}[1]{\bigl\lvert #1 \bigr\rvert} \newcommand{\Bigabs}[1]{\Bigl\lvert #1 \Bigr\rvert} \newcommand{\biggabs}[1]{\biggl\lvert #1 \biggr\rvert} \newcommand{\Biggabs}[1]{\Biggl\lvert #1 \Biggr\rvert} \newcommand{\card}[1]{\lvert #1 \rvert} \newcommand{\Card}[1]{\left\lvert #1 \right\rvert} \newcommand{\bigcard}[1]{\bigl\lvert #1 \bigr\rvert} \newcommand{\Bigcard}[1]{\Bigl\lvert #1 \Bigr\rvert} \newcommand{\biggcard}[1]{\biggl\lvert #1 \biggr\rvert} \newcommand{\Biggcard}[1]{\Biggl\lvert #1 \Biggr\rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\bignorm}[1]{\bigl\lVert #1 \bigr\rVert} \newcommand{\Bignorm}[1]{\Bigl\lVert #1 \Bigr\rVert} \newcommand{\biggnorm}[1]{\biggl\lVert #1 \biggr\rVert} \newcommand{\Biggnorm}[1]{\Biggl\lVert #1 \Biggr\rVert} \newcommand{\iprod}[1]{\langle #1 \rangle} \newcommand{\Iprod}[1]{\left\langle #1 \right\rangle} \newcommand{\bigiprod}[1]{\bigl\langle #1 \bigr\rangle} 
\newcommand{\Bigiprod}[1]{\Bigl\langle #1 \Bigr\rangle} \newcommand{\biggiprod}[1]{\biggl\langle #1 \biggr\rangle} \newcommand{\Biggiprod}[1]{\Biggl\langle #1 \Biggr\rangle} \newcommand{\set}[1]{\lbrace #1 \rbrace} \newcommand{\Set}[1]{\left\lbrace #1 \right\rbrace} \newcommand{\bigset}[1]{\bigl\lbrace #1 \bigr\rbrace} \newcommand{\Bigset}[1]{\Bigl\lbrace #1 \Bigr\rbrace} \newcommand{\biggset}[1]{\biggl\lbrace #1 \biggr\rbrace} \newcommand{\Biggset}[1]{\Biggl\lbrace #1 \Biggr\rbrace} \newcommand{\bracket}[1]{\lbrack #1 \rbrack} \newcommand{\Bracket}[1]{\left\lbrack #1 \right\rbrack} \newcommand{\bigbracket}[1]{\bigl\lbrack #1 \bigr\rbrack} \newcommand{\Bigbracket}[1]{\Bigl\lbrack #1 \Bigr\rbrack} \newcommand{\biggbracket}[1]{\biggl\lbrack #1 \biggr\rbrack} \newcommand{\Biggbracket}[1]{\Biggl\lbrack #1 \Biggr\rbrack} \newcommand{\ucorner}[1]{\ulcorner #1 \urcorner} \newcommand{\Ucorner}[1]{\left\ulcorner #1 \right\urcorner} \newcommand{\bigucorner}[1]{\bigl\ulcorner #1 \bigr\urcorner} \newcommand{\Bigucorner}[1]{\Bigl\ulcorner #1 \Bigr\urcorner} \newcommand{\biggucorner}[1]{\biggl\ulcorner #1 \biggr\urcorner} \newcommand{\Biggucorner}[1]{\Biggl\ulcorner #1 \Biggr\urcorner} \newcommand{\ceil}[1]{\lceil #1 \rceil} \newcommand{\Ceil}[1]{\left\lceil #1 \right\rceil} \newcommand{\bigceil}[1]{\bigl\lceil #1 \bigr\rceil} \newcommand{\Bigceil}[1]{\Bigl\lceil #1 \Bigr\rceil} \newcommand{\biggceil}[1]{\biggl\lceil #1 \biggr\rceil} \newcommand{\Biggceil}[1]{\Biggl\lceil #1 \Biggr\rceil} \newcommand{\floor}[1]{\lfloor #1 \rfloor} \newcommand{\Floor}[1]{\left\lfloor #1 \right\rfloor} \newcommand{\bigfloor}[1]{\bigl\lfloor #1 \bigr\rfloor} \newcommand{\Bigfloor}[1]{\Bigl\lfloor #1 \Bigr\rfloor} \newcommand{\biggfloor}[1]{\biggl\lfloor #1 \biggr\rfloor} \newcommand{\Biggfloor}[1]{\Biggl\lfloor #1 \Biggr\rfloor} \newcommand{\lcorner}[1]{\llcorner #1 \lrcorner} \newcommand{\Lcorner}[1]{\left\llcorner #1 \right\lrcorner} \newcommand{\biglcorner}[1]{\bigl\llcorner #1 \bigr\lrcorner} 
\newcommand{\Biglcorner}[1]{\Bigl\llcorner #1 \Bigr\lrcorner} \newcommand{\bigglcorner}[1]{\biggl\llcorner #1 \biggr\lrcorner} \newcommand{\Bigglcorner}[1]{\Biggl\llcorner #1 \Biggr\lrcorner} \newcommand{\expr}[1]{\langle #1 \rangle} \newcommand{\Expr}[1]{\left\langle #1 \right\rangle} \newcommand{\bigexpr}[1]{\bigl\langle #1 \bigr\rangle} \newcommand{\Bigexpr}[1]{\Bigl\langle #1 \Bigr\rangle} \newcommand{\biggexpr}[1]{\biggl\langle #1 \biggr\rangle} \newcommand{\Biggexpr}[1]{\Biggl\langle #1 \Biggr\rangle} \newcommand{\e}{\varepsilon} \newcommand{\eps}{\varepsilon} \newcommand{\from}{\colon} \newcommand{\super}[2]{#1^{(#2)}} \newcommand{\varsuper}[2]{#1^{\scriptscriptstyle (#2)}} \newcommand{\tensor}{\otimes} \newcommand{\eset}{\emptyset} \newcommand{\sse}{\subseteq} \newcommand{\sst}{\substack} \newcommand{\ot}{\otimes} \newcommand{\Esst}[1]{\bbE_{\substack{#1}}} \newcommand{\vbig}{\vphantom{\bigoplus}} \newcommand{\seteq}{\mathrel{\mathop:}=} \newcommand{\defeq}{\stackrel{\mathrm{def}}=} \newcommand{\Mid}{\mathrel{}\middle|\mathrel{}} \newcommand{\Ind}{\mathbf 1} \newcommand{\bits}{\{0,1\}} \newcommand{\sbits}{\{\pm 1\}} \newcommand{\R}{\mathbb R} \newcommand{\Rnn}{\R_{\ge 0}} \newcommand{\N}{\mathbb N} \newcommand{\Z}{\mathbb Z} \newcommand{\Q}{\mathbb Q} \newcommand{\mper}{\,.} \newcommand{\mcom}{\,,} \DeclareMathOperator{\Id}{Id} \DeclareMathOperator{\cone}{cone} \DeclareMathOperator{\vol}{vol} \DeclareMathOperator{\val}{val} \DeclareMathOperator{\opt}{opt} \DeclareMathOperator{\Opt}{Opt} \DeclareMathOperator{\Val}{Val} \DeclareMathOperator{\LP}{LP} \DeclareMathOperator{\SDP}{SDP} \DeclareMathOperator{\Tr}{Tr} \DeclareMathOperator{\Inf}{Inf} \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\argmax}{arg\,max} \DeclareMathOperator{\argmin}{arg\,min} \DeclareMathOperator{\qpoly}{qpoly} \DeclareMathOperator{\qqpoly}{qqpoly} \DeclareMathOperator{\conv}{conv} \DeclareMathOperator{\Conv}{Conv} 
\DeclareMathOperator{\supp}{supp} \DeclareMathOperator{\sign}{sign} \DeclareMathOperator{\mspan}{span} \DeclareMathOperator{\mrank}{rank} \DeclareMathOperator{\E}{\mathbb E} \DeclareMathOperator{\pE}{\tilde{\mathbb E}} \DeclareMathOperator{\Pr}{\mathbb P} \DeclareMathOperator{\Span}{Span} \DeclareMathOperator{\Cone}{Cone} \DeclareMathOperator{\junta}{junta} \DeclareMathOperator{\NSS}{NSS} \DeclareMathOperator{\SA}{SA} \DeclareMathOperator{\SOS}{SOS} \newcommand{\iprod}[1]{\langle #1 \rangle} \newcommand{\R}{\mathbb{R}} \newcommand{\cE}{\mathcal{E}} \newcommand{\E}{\mathbb{E}} \newcommand{\pE}{\tilde{\mathbb{E}}} \newcommand{\N}{\mathbb{N}} \renewcommand{\P}{\mathcal{P}} \notag \]
\[ \newcommand{\sleq}{\ensuremath{\preceq}} \newcommand{\sgeq}{\ensuremath{\succeq}} \newcommand{\diag}{\ensuremath{\mathrm{diag}}} \newcommand{\support}{\ensuremath{\mathrm{support}}} \newcommand{\zo}{\ensuremath{\{0,1\}}} \newcommand{\pmo}{\ensuremath{\{\pm 1\}}} \newcommand{\uppersos}{\ensuremath{\overline{\mathrm{sos}}}} \newcommand{\lambdamax}{\ensuremath{\lambda_{\mathrm{max}}}} \newcommand{\rank}{\ensuremath{\mathrm{rank}}} \newcommand{\Mslow}{\ensuremath{M_{\mathrm{slow}}}} \newcommand{\Mfast}{\ensuremath{M_{\mathrm{fast}}}} \newcommand{\Mdiag}{\ensuremath{M_{\mathrm{diag}}}} \newcommand{\Mcross}{\ensuremath{M_{\mathrm{cross}}}} \newcommand{\eqdef}{\ensuremath{ =^{def}}} \newcommand{\threshold}{\ensuremath{\mathrm{threshold}}} \newcommand{\vbls}{\ensuremath{\mathrm{vbls}}} \newcommand{\cons}{\ensuremath{\mathrm{cons}}} \newcommand{\edges}{\ensuremath{\mathrm{edges}}} \newcommand{\cl}{\ensuremath{\mathrm{cl}}} \newcommand{\xor}{\ensuremath{\oplus}} \newcommand{\1}{\ensuremath{\mathrm{1}}} \notag \]
\[ \newcommand{\transpose}[1]{\ensuremath{#1{}^{\mkern-2mu\intercal}}} \newcommand{\dyad}[1]{\ensuremath{#1#1{}^{\mkern-2mu\intercal}}} \newcommand{\nchoose}[1]{\ensuremath{{n \choose #1}}} \newcommand{\generated}[1]{\ensuremath{\langle #1 \rangle}} \notag \]

Code as data, data as code

  • Understand one of the most important concepts in computing: duality between code and data.
  • Build up comfort in moving between different representations of programs.
  • Follow the construction of a “universal NAND program” that can evaluate other NAND programs given their representation.
  • See and understand the proof of a major result that complements the result of the last lecture: some functions require an exponential number of NAND lines to compute.

“The term code script is, of course, too narrow. The chromosomal structures are at the same time instrumental in bringing about the development they foreshadow. They are law-code and executive power - or, to use another simile, they are architect’s plan and builder’s craft - in one.”, Erwin Schrödinger, 1944.

“The importance of the universal machine is clear. We do not need to have an infinity of different machines doing different jobs. … The engineering problem of producing various machines for various jobs is replaced by the office work of “programming” the universal machine”, Alan Turing, 1948.

A NAND program can be thought of as simply a sequence of symbols, each of which can be encoded with zeros and ones using (for example) the ASCII standard. Thus we can represent every NAND program as a binary string. This statement seems obvious but it is actually quite profound. It means that we can treat a NAND program both as instructions for carrying out a computation and also as data that could potentially serve as input to other computations.

This correspondence between code and data is one of the most fundamental aspects of computing. It underlies the notion of general purpose computers, that are not pre-wired to compute only one task, and it is also the basis of our hope for obtaining general artificial intelligence. This concept finds immense use in all areas of computing, from scripting languages to machine learning, but it is fair to say that we haven’t yet fully mastered it. Indeed many security exploits involve cases such as “buffer overflows” when attackers manage to inject code where the system expected only “passive” data. The idea of code as data reaches beyond the realm of electronic computers. For example, DNA can be thought of as both a program and data (in the words of Schrödinger, who wrote before DNA’s discovery a book that inspired Watson and Crick, it is both “architect’s plan and builder’s craft”).

As illustrated in this xkcd cartoon, many exploits, including buffer overflow, SQL injections, and more, utilize the blurry line between “active programs” and “static strings”.

A NAND interpreter in NAND

One of the most interesting consequences of the fact that we can represent programs as strings is the following theorem:

For every \(S,n,m \in \N\) there is a NAND program that computes the function \[ EVAL_{S,n,m}:\{0,1\}^{S+n} \rightarrow \{0,1\}^m \] defined as follows: For every string \((P,x)\) where \(P \in \{0,1\}^S\) and \(x\in\{0,1\}^n\), if \(P\) describes a NAND program with \(n\) input bits and \(m\) output bits, then \(EVAL_{S,n,m}(P,x)\) is the output of this program on input \(x\). (If \(P\) does not describe a program then we don’t care what \(EVAL_{S,n,m}(P,x)\) is, but for concreteness we will set it to be \(0^m\).) Note that in this theorem we use \(S\) to denote the number of bits describing the program, rather than the number of lines in it; however, these two quantities are very closely related.

Of course, to fully specify \(EVAL_{S,n,m}\), we need to fix a precise representation scheme for NAND programs as binary strings. We could simply use the ASCII representation, though below we will choose a more convenient one. But regardless of the choice of representation, Reference:bounded-univ is an immediate corollary of the fact that every finite function, and so in particular the function \(EVAL_{S,n,m}\) above, can be computed by some NAND program.

Reference:bounded-univ can be thought of as providing a “NAND interpreter in NAND”. That is, for a particular size bound, we give a single NAND program that can evaluate all NAND programs of that size. We call this NAND program \(U\) that computes \(EVAL_{S,n,m}\) a bounded universal program. “Universal” stands for the fact that this is a single program that can evaluate arbitrary code, while “bounded” stands for the fact that \(U\) only evaluates programs of bounded size. Of course this limitation is inherent for the NAND programming language, where an \(N\)-line program can never compute a function with more than \(N\) inputs. (We will later on introduce the concept of loops, which allows us to escape this limitation.)

It turns out that we don’t even need to pay that much of an overhead for universality:

For every \(S,n,m \in \N\) there is a NAND program of at most \(O(S \log S)\) lines that computes the function \(EVAL_{S,n,m}:\{0,1\}^{S+n} \rightarrow \{0,1\}^m\) defined above.

We will prove a weaker version of Reference:eff-bounded-univ, using \(O(S^2)\) lines instead of the \(O(S \log S)\) stated in the theorem; we will sketch how to improve this proof and obtain the \(O(S \log S)\) bound in a future lecture. Unlike Reference:bounded-univ, Reference:eff-bounded-univ is not a trivial corollary of the fact that every function can be computed, and it takes much more effort to prove. It requires us to present a concrete NAND program for the \(EVAL_{S,n,m}\) function. We will do so in several stages.

First, we will spell out precisely how to represent NAND programs as strings. We could prove Reference:eff-bounded-univ using the ASCII representation, but a “cleaner” representation will be more convenient for us. Then, we will show how to write a program that computes \(EVAL_{S,n,m}\) in Python. (We will not use much beyond basic Python, and a reader with familiarity with programming in any language should be able to follow along.) Finally, we will show how to transform this Python program into a NAND program.

Concrete representation for NAND programs

In the Harvard Mark I computer, a program was represented as a list of triples of numbers, which were then encoded by perforating holes in a control card.

We can use the canonical form of a NAND program (as per Reference:NANDcanonical) to represent it as a string. That is, if a NAND program has \(s\) lines and \(t\) distinct variables (where \(t \leq 3s\)) then we encode every line of the program, such as foo_54 := baz NAND blah_22, as the triple \((a,b,c)\) where \(a,b,c\) are the numbers corresponding to foo_54, baz, blah_22 respectively. We choose the ordering such that the numbers \(0,1,\ldots,n-1\) encode the variables x_0,\(\ldots\),x_\(\expr{n-1}\) and the numbers \(t-m,\ldots,t-1\) encode the variables y_0,\(\ldots\),y_\(\expr{m-1}\). Thus the representation of a program \(P\) of \(n\) inputs and \(m\) outputs is simply the list of triples of \(P\) in its canonical form. For example, the XOR program:

u_0   := x_0 NAND x_1
v_0   := x_0 NAND u_0
w_0   := x_1 NAND u_0
y_0   := v_0 NAND w_0

is represented by the following list of four triples:

[[2, 0, 1], [3, 0, 2], [4, 1, 2], [5, 3, 4]]

Note that even if we renamed u, v and w to foo, bar and blah, the representation of the program would remain the same (which is fine, since it does not change its semantics).
It is very easy to transform a string containing the program code to the list-of-triples representation; for example, it can be done in 15 lines of Python. (If you’re curious what these 15 lines are, see the appendix or the website.)
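For illustration, here is one way such a transformation might look (a sketch, not the appendix’s 15 lines; the function name parse and the assumption that input and output variables are literally named x_i and y_j are our own conventions here):

```python
import re

def parse(code, n, m):
    """Parse NAND source into the canonical list-of-triples representation.

    Inputs x_0..x_{n-1} get numbers 0..n-1, outputs y_0..y_{m-1} get the
    last m numbers, and the other variables are numbered in between, in
    order of first appearance."""
    names = []            # workspace variables, in order of first appearance
    triples_by_name = []
    for line in code.strip().splitlines():
        a, b, c = re.match(r'\s*(\S+)\s*:=\s*(\S+)\s+NAND\s+(\S+)', line).groups()
        triples_by_name.append((a, b, c))
        for v in (a, b, c):
            if v not in names and not v.startswith(('x_', 'y_')):
                names.append(v)
    t = n + len(names) + m   # total number of distinct variables
    def num(v):
        if v.startswith('x_'): return int(v[2:])
        if v.startswith('y_'): return t - m + int(v[2:])
        return n + names.index(v)
    return [[num(a), num(b), num(c)] for (a, b, c) in triples_by_name]
```

Running parse on the XOR program above with \(n=2\), \(m=1\) yields exactly the list of four triples shown below.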

We now show how to evaluate a NAND program given in this representation on an input \(x\).

The following Python function EVAL takes inputs \(L,n,m,x\), where \(L\) is a list of triples representing an \(n\)-input, \(m\)-output program and \(x\) is a list of \(0/1\) values, and returns the result of executing the NAND program represented by \(L\) on \(x\). (To keep things simple, we will not worry about the case that \(L\) does not represent a valid program of \(n\) inputs and \(m\) outputs. Also, there is nothing special about Python here; we could have just as easily presented a corresponding function in JavaScript, C, OCaml, or any other programming language.)

# Evaluates an n-input, m-output NAND program L on input x
# L is given in the canonical list of triples representation
# (first n variables are inputs and last m variables are outputs)
def EVAL(L,n,m,x):
    t = max([max(triple) for triple in L])+1 # number of distinct variables in L
    avars = [0]*t # initialize the variable array to zeroes
    avars[:n] = x # set the first n variables to x

    for (a,b,c) in L:  # evaluate each triple
        u = avars[b]
        v = avars[c]
        val = 1-u*v # i.e., the NAND of u and v
        avars[a] = val

    return avars[t-m:] # output the last m variables

For example, if we run

EVAL([[2, 0, 1], [3, 0, 2], [4, 1, 2], [5, 3, 4]], 2, 1, [0, 1])

then this corresponds to running our XOR program on the input \((0,1)\), and hence the resulting output is [1].
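As a sanity check, we can run the evaluator (restated compactly and self-contained below) on the XOR program’s representation for all four inputs and confirm that it reproduces the XOR truth table:

```python
def EVAL(L, n, m, x):
    t = max(max(triple) for triple in L) + 1  # number of distinct variables
    avars = [0] * t
    avars[:n] = x
    for (a, b, c) in L:
        avars[a] = 1 - avars[b] * avars[c]    # NAND of the two operands
    return avars[t - m:]

XOR = [[2, 0, 1], [3, 0, 2], [4, 1, 2], [5, 3, 4]]
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, EVAL(XOR, 2, 1, x))   # the outputs are [0], [1], [1], [0]
```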

Accessing an element of the array avars at a given index takes a constant number of basic operations. (Python does not distinguish between lists and arrays, but allows constant-time random access to indexed elements in both of them. One could argue that if we allowed programs of truly unbounded length (e.g., larger than \(2^{64}\)) then the price would not be constant but logarithmic in the length of the array/list, but the difference between \(O(1)\) and \(O(\log s)\) will not be important for our discussions.) Hence (since \(n,m \leq s\) and \(t \leq 3s\)), the program above uses \(O(s)\) basic operations.

A NAND interpreter in NAND

We now turn to actually proving Reference:eff-bounded-univ. To do this, it is of course not enough to give a Python program. We need to (a) give a precise representation of programs as binary strings, and (b) show how we compute the \(EVAL_{S,n,m}\) function on this representation by a NAND program.

First, if a NAND program has \(s\) lines, then since it can have at most \(3s\) distinct variables, it can be represented by a string of size \(S=3s\lambda\) where \(\lambda = \lceil \log(3s) \rceil\), by simply concatenating the binary representations of all the \(3s\) numbers (adding leading zeroes as needed to make each number represented by a string of exactly \(\lambda\) bits). So, our job is to transform, for every \(s,n,m\), the Python code above to a NAND program \(U_{s,n,m}\) that computes the function \(EVAL_{S,n,m}\) for \(S=3s\lambda\). That is, given any representation \(r \in \{0,1\}^S\) of an \(s\)-line \(n\)-input \(m\)-output NAND program \(P\), and string \(w \in \{0,1\}^n\), \(U_{s,n,m}(rw)\) outputs \(P(w)\).
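As a concrete sketch of this encoding (the function name encode is our own; the conventions are exactly those just described):

```python
from math import ceil, log2

def encode(L):
    """Encode a list of triples (an s-line program) as a binary string of
    length S = 3*s*lam, where lam = ceil(log(3s)), by concatenating the
    lam-bit binary representations of the 3s numbers (leading zeros added)."""
    s = len(L)
    lam = ceil(log2(3 * s))
    return ''.join(format(v, '0%db' % lam) for triple in L for v in triple)
```

For the XOR program, \(s=4\), so \(\lambda = \lceil \log 12 \rceil = 4\) and the representation is a string of \(3 \cdot 4 \cdot 4 = 48\) bits, beginning with 0010 0000 0001 for the first triple \((2,0,1)\).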

Before reading further, try to think how you could give a “constructive proof” of Reference:eff-bounded-univ. That is, think of how you would write, in the programming language of your choice, a function universal(s,n,m) that on input \(s,n,m\) outputs the code for the NAND program \(U_{s,n,m}\) such that \(U_{s,n,m}\) computes \(EVAL_{S,n,m}\). Note that there is a subtle but crucial difference between this function and the Python EVAL program described above. Rather than actually evaluating a given program \(P\) on some input \(w\), the function universal should output the code of a NAND program that computes the map \((P,w) \mapsto P(w)\).

Let \(n,m,s \in \N\) be some numbers satisfying \(s \geq n\) and \(s \geq m\). We now describe the NAND program \(U_{s,n,m}\) that computes \(EVAL_{S,n,m}\) for \(S = 3s\lambda\) and \(\lambda = \lceil \log(3s) \rceil\). Our construction will follow very closely the Python implementation of EVAL above. (We allow ourselves the use of syntactic sugar in describing the program; we can always “unsweeten” the program later.)

  1. \(U_{s,n,m}\) will contain variables avars_0,\(\ldots\),avars_\(\expr{2^\lambda-1}\). (This corresponds to the line avars = [0]*t in the Python function EVAL.)
  2. For \(i=0,\ldots,n-1\), we add the line avars_\(\expr{i}\) := x_\(\expr{3s\lambda+i}\) to \(U_{s,n,m}\). Recall that the input to \(EVAL_{S,n,m}\) is a string \(rw \in \{0,1\}^{3s\lambda + n}\) where \(r\in \{0,1\}^{3s\lambda}\) is the representation of the program \(P\) and \(w\in \{0,1\}^n\) is the input that the program should be applied on. Hence this step copies the input to the variables avars_0,\(\ldots\),avars_\(\expr{n-1}\). (This corresponds to the line avars[:n] = x in EVAL.)
  3. For \(\ell=0,\ldots,s-1\) we add the following code to \(U_{s,n,m}\):

    1. For all \(j\in [\lambda]\), add the code a_\(\expr{j}\) := x_\(\expr{3\ell\lambda+j}\), b_\(\expr{j}\) := x_\(\expr{3\ell\lambda+\lambda+j}\) and c_\(\expr{j}\) := x_\(\expr{3\ell\lambda+2\lambda+j}\). In other words, we add the code to copy to a, b, c the three \(\lambda\)-bit long strings containing the binary representation of the \(\ell\)-th triple \((a,b,c)\) in the input program. (This corresponds to the line for (a,b,c) in L: in EVAL.)
    2. Add the code u := LOOKUP(avars_0 ,\(\ldots\), avars_\(\expr{2^\lambda-1}\),b_0,\(\ldots\),b_\(\expr{\lambda-1}\)) and v := LOOKUP(avars_0 , \(\ldots\),avars_\(\expr{2^\lambda-1}\),c_0,\(\ldots\),c_\(\expr{\lambda-1}\)) where LOOKUP is the macro that computes \(LOOKUP_\lambda:\{0,1\}^{2^\lambda+\lambda}\rightarrow \{0,1\}\). Recall that we defined \(LOOKUP_\lambda(A,i)=A_i\) for every \(A\in \{0,1\}^{2^\lambda}\) and \(i\in \{0,1\}^\lambda\) (using the binary representation to identify \(i\) with an index in \([2^\lambda]\)). Hence this code means that u gets the value of avars_\(\expr{b}\) and v gets the value of avars_\(\expr{c}\). (This corresponds to the lines u = avars[b] and v = avars[c] in EVAL.)
    3. Add the code val := u NAND v (i.e., val gets the value that should be stored in avars_\(\expr{a}\)). (This corresponds to the line val = 1-u*v in EVAL.)
    4. Add the code newvars_0,\(\ldots\),newvars_\(\expr{2^\lambda-1}\) := UPDATE(avars_0,\(\ldots\),avars_\(\expr{2^\lambda-1}\), a_0,\(\ldots\),a_\(\expr{\lambda-1}\),val), where UPDATE is a macro that computes the function \(UPDATE_\lambda:\{0,1\}^{2^\lambda +\lambda +1} \rightarrow \{0,1\}^{2^\lambda}\) defined as follows: for every \(A \in \{0,1\}^{2^\lambda}\), \(i\in \{0,1\}^\lambda\) and \(v\in \{0,1\}\), \(UPDATE_\lambda(A,i,v)=A'\) such that \(A'_j = A_j\) for all \(j \neq i\) and \(A'_i = v\) (identifying \(i\) with an index in \([2^\lambda]\)). See below for a discussion of how to implement UPDATE and other macros.
    5. Add the code avars_\(\expr{j}\) := newvars_\(\expr{j}\) for every \(j \in [2^\lambda]\) (i.e., update avars to newvars). (Steps 3.c and 3.d together correspond to the line avars[a] = val in EVAL.)
  4. After adding all the \(s\) snippets above in Step 3, we add to the program the code t_0,\(\ldots\),t_\(\expr{\lambda-1}\) := INC(MAX(x_0,\(\ldots\),x_\(\expr{3s\lambda-1}\))), where MAX is a macro that computes the function \(MAX_{3s,\lambda}\), with \(MAX_{s,\lambda}:\{0,1\}^{s\lambda} \rightarrow \{0,1\}^\lambda\) defined to take the concatenation of the representation of \(s\) numbers in \([2^\lambda]\) and output the representation of the maximum number, and INC is a macro that computes the function \(INC_\lambda\) that increments a given number in \([2^\lambda]\) by one. (This corresponds to the line t = max([max(triple) for triple in L])+1 in EVAL.) We leave coming up with NAND programs for computing \(MAX_{s,\lambda}\) and \(INC_\lambda\) as an exercise for the reader.
  5. Finally we add for every \(j\in [m]\):

    1. The code idx_0,\(\ldots\),idx_\(\expr{\lambda-1}\) := SUBTRACT(t_0,\(\ldots\),t_\(\expr{\lambda-1}\),\(z_0,\ldots,z_{\lambda-1}\)), where SUBTRACT is the code for subtracting two numbers in \([2^\lambda]\) given in their binary representation, and each \(z_h\) is equal to either zero or one depending on the \(h\)-th digit in the binary representation of the number \(m-j\).
    2. y_\(\expr{j}\) := LOOKUP( avars_0,\(\ldots\), avars_\(\expr{2^\lambda-1}\), idx_0,\(\ldots\), idx_\(\expr{\lambda-1}\) ). (Steps 5.a and 5.b together correspond to the line return avars[t-m:] in EVAL.)
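One way to make this construction concrete is a (hypothetical) Python function universal(s, n, m), as challenged above, which emits the sugared NAND source of \(U_{s,n,m}\) as a string. This is only a sketch of steps 1–5: it treats LOOKUP, UPDATE, MAX, INC and SUBTRACT as unexpanded macros, and the exact argument conventions are ours.

```python
from math import ceil, log2

def universal(s, n, m):
    """Emit (as a string of sugared NAND code) the program U_{s,n,m}."""
    lam = ceil(log2(3 * s))
    T = 2 ** lam
    avars = ",".join("avars_%d" % k for k in range(T))
    out = []
    # Step 2: copy the input w into avars_0,...,avars_{n-1}
    for i in range(n):
        out.append("avars_%d := x_%d" % (i, 3 * s * lam + i))
    # Step 3: evaluate each of the s triples in turn
    for l in range(s):
        for j in range(lam):  # 3.a: copy the l-th triple into a, b, c
            out.append("a_%d := x_%d" % (j, 3 * l * lam + j))
            out.append("b_%d := x_%d" % (j, 3 * l * lam + lam + j))
            out.append("c_%d := x_%d" % (j, 3 * l * lam + 2 * lam + j))
        bs = ",".join("b_%d" % j for j in range(lam))
        cs = ",".join("c_%d" % j for j in range(lam))
        az = ",".join("a_%d" % j for j in range(lam))
        out.append("u := LOOKUP(%s,%s)" % (avars, bs))   # 3.b
        out.append("v := LOOKUP(%s,%s)" % (avars, cs))
        out.append("val := u NAND v")                    # 3.c
        newvars = ",".join("newvars_%d" % k for k in range(T))
        out.append("%s := UPDATE(%s,%s,val)" % (newvars, avars, az))  # 3.d
        for k in range(T):                               # 3.e
            out.append("avars_%d := newvars_%d" % (k, k))
    # Step 4: t := (largest variable index appearing in the program) + 1
    ts = ",".join("t_%d" % j for j in range(lam))
    nums = ",".join("x_%d" % i for i in range(3 * s * lam))
    out.append("%s := INC(MAX(%s))" % (ts, nums))
    # Step 5: output avars_{t-m},...,avars_{t-1}
    for j in range(m):
        idx = ",".join("idx_%d" % h for h in range(lam))
        zs = ",".join(str((m - j) >> h & 1) for h in range(lam))
        out.append("%s := SUBTRACT(%s,%s)" % (idx, ts, zs))
        out.append("y_%d := LOOKUP(%s,%s)" % (j, avars, idx))
    return "\n".join(out)
```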

To complete the description of this program, we need to show that we can implement the macros LOOKUP, UPDATE, MAX, INC and SUBTRACT.
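For reference, here are the intended semantics of these macros, written in Python with \(\lambda\)-bit strings read as integers (this is only a specification; the construction needs NAND implementations, and LOOKUP was constructed in an earlier lecture):

```python
def LOOKUP(A, i):        # A is a list of 2**lam bits; return A[i]
    return A[i]

def EQUAL(i, j):         # 1 iff two lam-bit numbers are equal
    return 1 if i == j else 0

def UPDATE(A, i, v):     # A' agrees with A except A'[i] = v;
    # built from 2**lam calls to EQUAL, one per position
    return [v if EQUAL(j, i) else A[j] for j in range(len(A))]

def MAX(nums):           # maximum of a list of numbers in [2**lam]
    m = 0
    for x in nums:
        m = x if x > m else m
    return m

def INC(i):              # increment a number in [2**lam]
    return i + 1

def SUBTRACT(i, j):      # subtract two numbers in [2**lam]
    return i - j
```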

The total number of lines in \(U_{s,n,m}\) is dominated by the cost of step 3 above. (It is a good exercise to verify that steps 1, 2, 4 and 5 above can be implemented in \(O(s \log s)\) lines.) In step 3 we repeat \(s\) times the following:

  1. Copying the \(\ell\)-th triple to the variables a,b,c. Cost: \(O(\lambda)\) lines.
  2. Perform LOOKUP on the \(2^\lambda=O(s)\) variables avars_0,\(\ldots\), avars_\(\expr{2^\lambda-1}\). Cost: \(O(2^\lambda)=O(s)\) lines.
  3. Perform the UPDATE to update the \(2^\lambda\) variables avars_0,\(\ldots\), avars_\(\expr{2^\lambda-1}\) to newvars_0,\(\ldots\), newvars_\(\expr{2^\lambda-1}\). Since UPDATE makes \(O(2^\lambda)\) calls to \(EQUAL_\lambda\), and each such call costs \(O(\lambda)\) lines, the total cost for \(UPDATE\) is \(O(2^\lambda \lambda) = O(s \log s)\) lines.
  4. Copy newvars_0,\(\ldots\), newvars_\(\expr{2^\lambda-1}\) to avars_0,\(\ldots\), avars_\(\expr{2^\lambda-1}\). Cost: \(O(2^\lambda)\) lines.

Since the loop of step 3 is repeated \(s\) times, the total number of lines in \(U_{s,n,m}\) is \(O(s^2 \log s)\), which (since \(S=\Omega(s \log s)\)) is \(O(S^2)\). (The website will hopefully eventually contain an implementation of the NAND program \(U_{s,n,m}\), where you can also play with it by feeding it various other programs as inputs.) The NAND program above is less efficient than its Python counterpart, since NAND does not offer arrays with efficient random access. Hence, for example, the LOOKUP operation on an array of \(s\) bits takes \(\Omega(s)\) lines in NAND even though it takes \(O(1)\) steps (or maybe \(O(\log s)\) steps, depending on how we count) in Python. We might see in a future lecture how to improve this to \(O(s \log s)\).

A Python interpreter in NAND

To prove Reference:eff-bounded-univ we essentially translated every line of the Python program for EVAL into an equivalent NAND snippet. It turns out that none of our reasoning was specific to the particular function \(EVAL\). It is possible to translate every Python program into an equivalent NAND program of comparable efficiency. (More concretely, if the Python program takes \(T(n)\) operations on inputs of length at most \(n\) then we can find a NAND program of \(O(T(n) \log T(n))\) lines that agrees with the Python program on inputs of length \(n\).) Actually doing so requires taking care of many details and is beyond the scope of this course, but let me convince you why you should believe it is possible in principle. We can use CPython (the reference implementation of Python) to evaluate every Python program using a C program. We can combine this with a C compiler to transform a Python program into various flavors of “machine language”.

So, to transform a Python program into an equivalent NAND program, it is enough to show how to transform a machine language program into an equivalent NAND program. One minimalistic (and hence convenient) family of machine languages is the ARM architecture, which powers a great many mobile devices including essentially all Android devices. (ARM stands for “Advanced RISC Machine”, where RISC in turn stands for “Reduced Instruction Set Computer”.)
There are even simpler machine languages, such as the LEG architecture, for which a backend for the LLVM compiler was implemented (and which hence can be the target of compiling any of the large and growing list of languages that this compiler supports). Other examples include the TinyRAM architecture (motivated by interactive proof systems that we will discuss much later in this course) and the teaching-oriented Ridiculously Simple Computer architecture. (The reverse direction, of compiling NAND to C code, is much easier; we show code for a NAND2C function in the appendix.)

Going one by one over the instruction sets of such computers and translating them to NAND snippets is no fun, but it is a feasible thing to do. In fact, ultimately this is very similar to the transformation that takes place in converting our high level code to actual silicon gates that (as we will see in the next lecture) are not so different from the operations of a NAND program.
Indeed, tools such as MyHDL that transform “Python to Silicon” can be used to convert a Python program to a NAND program.
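To get a feel for why such instruction-by-instruction translation is feasible, here is a small Python sketch (my illustration, not code from the text) showing how the standard Boolean operations, to which machine instructions ultimately reduce, can each be expressed using a constant number of NAND operations:

```python
def NAND(a, b):
    """The only primitive operation: NAND of two bits."""
    return 1 - (a & b)

# Each standard gate translates into a constant number of NAND lines:
def NOT(a):          # 1 NAND line
    return NAND(a, a)

def AND(a, b):       # 2 NAND lines
    return NOT(NAND(a, b))

def OR(a, b):        # 3 NAND lines
    return NAND(NOT(a), NOT(b))

def XOR(a, b):       # 4 NAND lines
    u = NAND(a, b)
    return NAND(NAND(a, u), NAND(b, u))
```

Since every gate costs only a constant number of NAND lines, a machine-language program of \(T\) instructions blows up by at most a constant factor per instruction.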

The NAND programming language is just a teaching tool, and by no means do I suggest that writing NAND programs, or compilers to NAND, is a practical, useful, or even enjoyable activity. What I do want is to make sure you understand why it can be done, and to have the confidence that if your life (or at least your grade in this course) depended on it, then you would be able to do this. Understanding how programs in high-level languages such as Python are eventually transformed into a concrete low-level representation such as NAND is fundamental to computer science.

The astute reader might notice that the above paragraphs only outlined why it should be possible to find, for every particular Python-computable function \(F\), a particular comparably efficient NAND program \(P\) that computes \(F\). But this still seems to fall short of our goal of writing a “Python interpreter in NAND”, which would mean that for every parameter \(n\) we come up with a single NAND program \(UNIV_n\) that, given a description of a Python program \(P\), an input \(x\), and a bound \(T\) on the number of operations (where the length of \(P\), the length of \(x\), and the magnitude of \(T\) are all at most \(n\)), returns the result of executing \(P\) on \(x\) for at most \(T\) steps. After all, the transformation above would transform every Python program into a different NAND program, but would not yield “one NAND program to rule them all” that can evaluate every Python program up to some given complexity. However, it turns out that it is enough to show such a transformation for a single Python program. The reason is that we can write a Python interpreter in Python: a Python program \(U\) that takes a bit string, interprets it as Python code, and then runs that code. Hence, we only need to show a NAND program \(U^*\) that computes the same function as the particular Python program \(U\), and this will give us a way to evaluate all Python programs.
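The self-interpretation step can be illustrated with a toy sketch in Python. This is my own minimal illustration (the names `U` and `main` are hypothetical conventions, and note that `exec` by itself does not enforce a step bound \(T\), which the real construction must):

```python
# A minimal "Python interpreter in Python": a program U that takes a
# string of Python source code and an input, and runs that code on the
# input. Sketch only: a real universal program would also count steps
# and stop after T operations.
def U(program_source, x):
    env = {}
    exec(program_source, env)   # interpret the string as Python code
    return env["main"](x)       # run its main function on the input

# U treats a program as data; here it runs a program that reverses
# its input string:
source = "def main(x): return x[::-1]"
print(U(source, "abc"))  # prints "cba"
```

The point is that \(U\) is itself just one particular Python program, so one NAND translation of \(U\) suffices to evaluate all Python programs.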

What we are seeing time and again is the notion of universality or self-reference of computation: all reasonably rich models of computation are expressive enough that they can “simulate themselves”. The importance of this phenomenon to both the theory and practice of computing, as well as far beyond it, including the foundations of mathematics and basic questions in science, cannot be overstated.

Counting programs, and lower bounds on the size of NAND programs

One of the consequences of our representation is the following:

\[|Size(s)| \leq 2^{O(s \log s)}.\] That is, there are at most \(2^{O(s\log s)}\) functions computed by NAND programs of at most \(s\) lines.

Moreover, the implicit constant in the \(O(\cdot)\) notation in Reference:program-count is at most \(10\). By this we mean that for all sufficiently large \(s\), \(|Size(s)|\leq 2^{10s\log s}\). The idea behind the proof is that we can represent every \(s\) line program by a binary string of \(O(s \log s)\) bits. Therefore the number of functions computed by \(s\)-line programs cannot be larger than the number of such strings, which is \(2^{O(s \log s)}\). In the actual proof, given below, we count the number of representations a little more carefully, talking directly about triples rather than binary strings, although the idea remains the same.

Every NAND program \(P\) with \(s\) lines has at most \(3s\) variables. Hence, using our canonical representation, \(P\) can be represented by the numbers \(n,m\) of \(P\)’s inputs and outputs, as well as by the list \(L\) of \(s\) triples of natural numbers, each of which is smaller or equal to \(3s\).

If two programs compute distinct functions then they have distinct representations. So we will simply count the number of such representations: for every \(s' \leq s\), the number of \(s'\)-long lists of triples of numbers in \([3s]\) is \((3s)^{3s'}\), which in particular is smaller than \((3s)^{3s}\). So, for every \(s' \leq s\) and \(n,m\), the total number of representations of \(s'\)-line programs with \(n\) inputs and \(m\) outputs is smaller than \((3s)^{3s}\).

Since a program of at most \(s\) lines has at most \(s\) inputs and outputs, the total number of representations of all programs of at most \(s\) lines is smaller than \[ s\times s \times s \times (3s)^{3s} = (3s)^{3s+3} \label{eqcountbound} \] (the factor \(s\times s\times s\) arises from taking all of the at most \(s\) options for the number of inputs \(n\), all of the at most \(s\) options for the number of outputs \(m\), and all of the at most \(s\) options for the number of lines \(s'\)). We claim that for \(s\) large enough, the righthand side of \eqref{eqcountbound} (and hence the total number of representations of programs of at most \(s\) lines) is smaller than \(2^{4 s \log s}\). Indeed, we can write \(3s = 2^{\log(3s)}=2^{\log 3 + \log s} \leq 2^{2+\log s}\), and hence the righthand side of \eqref{eqcountbound} is at most \(\left(2^{2+ \log s}\right)^{3s+3} = 2^{(2+\log s)(3s+3)} \leq 2^{4s\log s}\) for \(s\) large enough.
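As a numeric sanity check of this inequality (my own sketch, comparing exponents with floating-point logarithms rather than the exact integers):

```python
import math

# Compare log2 of the representation count (3s)^(3s+3) against
# log2 of the claimed bound 2^(4 s log2 s), for a few values of s.
def log2_reps(s):
    """log2 of (3s)^(3s+3), the number of representations."""
    return (3 * s + 3) * math.log2(3 * s)

def log2_bound(s):
    """log2 of 2^(4 s log2 s), the claimed upper bound."""
    return 4 * s * math.log2(s)

for s in [100, 1000, 10**6]:
    assert log2_reps(s) < log2_bound(s)
```

Already at \(s=100\) the bound holds comfortably, consistent with the “for \(s\) large enough” in the proof.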

For every function \(F \in Size(s)\) there is a program \(P\) of at most \(s\) lines that computes it, and we can map \(F\) to its representation as a tuple \((n,m,L)\). If \(F \neq F'\) then a program \(P\) that computes \(F\) must have an input on which it disagrees with any program \(P'\) that computes \(F'\), and hence in particular \(P\) and \(P'\) have distinct representations. Thus we see that the map of \(Size(s)\) to its representation is one to one, and so in particular \(|Size(s)|\) is at most the number of distinct representations, which is at most \(2^{4s\log s}\).

We can also establish Reference:program-count directly from the ASCII representation of the source code. Since an \(s\)-line NAND program has at most \(3s\) distinct variables, we can change all the workspace variables of such a program to have the form work_\(i\) for \(i\) between \(0\) and \(3s-1\) without changing the function that it computes. This means that after removing comments and extra whitespace, every line of such a program (which will have the form var := var' NAND var'' for variable identifiers of the form x_###, y_### or work_###, where ### is some number smaller than \(3s\)) will require at most, say, \(20 + 3\log_{10} (3s) \leq O(\log s)\) characters. Since each one of those characters can be encoded using seven bits in the ASCII representation, we see that the number of functions computed by \(s\)-line NAND programs is at most \(2^{O(s \log s)}\).

A function mapping \(\{0,1\}^2\) to \(\{0,1\}\) can be identified with the table of its four values on the inputs \(00,01,10,11\); a function mapping \(\{0,1\}^3\) to \(\{0,1\}\) can be identified with the table of its eight values on the inputs \(000,001,010,011,100,101,110,111\). More generally, every function \(F:\{0,1\}^n \rightarrow \{0,1\}\) can be identified with the table of its \(2^n\) values on the inputs \(\{0,1\}^n\). Hence the number of functions mapping \(\{0,1\}^n\) to \(\{0,1\}\) is equal to the number of such tables which (since we can choose either \(0\) or \(1\) for every row) is exactly \(2^{2^n}\). Note that this is double exponential in \(n\), and hence even for small values of \(n\) (e.g., \(n=10\)) the number of functions from \(\{0,1\}^n\) to \(\{0,1\}\) is truly astronomical. “Astronomical” here is an understatement: there are far fewer than \(2^{2^{10}}\) stars, or even particles, in the observable universe. This has the following interesting corollary:
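For \(n=2\) we can enumerate these tables explicitly; a short sketch of my own:

```python
from itertools import product

n = 2
inputs = list(product([0, 1], repeat=n))      # the 2^n possible inputs
# Every function {0,1}^n -> {0,1} is exactly a table of 2^n output bits,
# so enumerating tables enumerates functions:
tables = list(product([0, 1], repeat=len(inputs)))
print(len(tables))  # 2^(2^n) = 16 functions for n = 2
```

For \(n=2\) this is a manageable \(16\); for \(n=10\) the same count is \(2^{1024}\), which no enumeration could ever touch.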

There is a function \(F:\{0,1\}^n\rightarrow \{0,1\}\) such that the shortest NAND program to compute \(F\) requires \(2^n/(100n)\) lines.

Suppose, towards the sake of contradiction, that every function \(F:\{0,1\}^n\rightarrow\{0,1\}\) can be computed by a NAND program of at most \(s=2^n/(100n)\) lines. Then by Reference:program-count the total number of such functions would be at most \(2^{10s\log s} \leq 2^{10 \log s \cdot 2^n/(100 n)}\). Since \(\log s = n - \log (100 n) \leq n\) this means that the total number of such functions would be at most \(2^{2^n/10}\), contradicting the fact that there are \(2^{2^n}\) of them.
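We can check the inequality at the heart of this proof numerically (my own sketch, comparing exponents in floating point for a few illustrative values of \(n\)):

```python
import math

# With s = 2^n/(100 n) lines, the number of computable functions is at
# most 2^(10 s log2 s). We verify the exponent 10 s log2 s stays below
# 2^n / 10, which is vastly below the exponent 2^n of the total count.
for n in [20, 30, 40]:
    s = 2**n / (100 * n)
    assert 10 * s * math.log2(s) <= 2**n / 10 < 2**n
```

So for these parameters, at most a \(2^{2^n/10}/2^{2^n}\) fraction of all functions is computable by such short programs, which is the contradiction the proof exploits.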

We have seen before that every function mapping \(\{0,1\}^n\) to \(\{0,1\}\) can be computed by an \(O(2^n /n)\) line program. We now see that this is tight in the sense that some functions do require such an astronomical number of lines to compute. In fact, as we explore in the exercises below, this is the case for most functions. Hence functions that can be computed in a small number of lines (such as addition, multiplication, finding short paths in graphs, or even the \(EVAL\) function) are the exception, rather than the rule.

All functions mapping \(n\) bits to \(m\) bits can be computed by NAND programs of \(O(m 2^n/n)\) lines, but most functions cannot be computed using much smaller programs. However, there are many important exceptions: functions such as addition, multiplication, program evaluation, and many others can be computed in polynomial time with a small exponent.

The list of triples is not the shortest representation for NAND programs. As we will see in the next lecture, every NAND program of \(s\) lines and \(n\) inputs can be represented by a directed graph of \(s+n\) vertices, of which \(n\) have in-degree zero, and the \(s\) others have in-degree at most two. Using the adjacency list representation, such a graph can be represented using roughly \(2s\log(s+n) \leq 2s (\log s + O(1))\) bits. Using this representation we can reduce the implicit constant in Reference:program-count arbitrarily close to \(2\).
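A rough back-of-the-envelope comparison of the bit counts of the two representations (my own sketch, with illustrative parameters):

```python
import math

# A program of s lines and n inputs becomes a graph on s + n vertices,
# where each of the s line-vertices records its at most two in-neighbors:
def adjacency_bits(s, n):
    return 2 * s * math.ceil(math.log2(s + n))

# The list-of-triples representation stores 3 numbers in [3s] per line:
def triples_bits(s):
    return 3 * s * math.ceil(math.log2(3 * s))

s, n = 10**6, 1000
assert adjacency_bits(s, n) < triples_bits(s)
```

The adjacency-list count grows like \(2s\log(s+n)\) versus \(3s\log(3s)\) for triples, which is where the improvement of the constant toward \(2\) comes from.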

Lecture summary


Which one of the following statements is false:
a. There is an \(O(s^3)\) line NAND program that given as input a program \(P\) of \(s\) lines in the list-of-tuples representation computes the output of \(P\) when all its inputs are equal to \(1\).
b. There is an \(O(s^3)\) line NAND program that given as input a program \(P\) of \(s\) characters encoded as a string of \(7s\) bits using the ASCII encoding, computes the output of \(P\) when all its inputs are equal to \(1\).
c. There is an \(O(\sqrt{s})\) line NAND program that given as input a program \(P\) of \(s\) lines in the list-of-tuples representation computes the output of \(P\) when all its inputs are equal to \(1\).

For every \(k\), show that there is an \(O(k)\) line NAND program that computes the function \(EQUALS_k:\{0,1\}^{2k} \rightarrow \{0,1\}\) where \(EQUALS_k(x,x')=1\) if and only if \(x=x'\).
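One possible approach, sketched here in Python using only the NAND operation (this is my own sketch of one solution strategy, not necessarily the intended one): compare the two strings bit by bit and AND the per-bit results together, spending a constant number of NAND lines per bit for \(O(k)\) lines in total.

```python
def NAND(a, b):
    return 1 - (a & b)

def EQUALS(x, xp):
    """Compute EQUALS_k using a constant number of NANDs per bit pair."""
    result = 1
    for a, b in zip(x, xp):
        u = NAND(a, b)
        neq = NAND(NAND(a, u), NAND(b, u))   # XOR: 1 iff the bits differ
        eq = NAND(neq, neq)                  # NOT: 1 iff the bits agree
        # AND the running result with this bit's equality:
        result = NAND(NAND(result, eq), NAND(result, eq))
    return result
```

Each loop iteration uses seven NAND operations, so unrolling the loop gives a NAND program of \(7k = O(k)\) lines.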

Suppose \(n>1000\) and that we choose a function \(F:\{0,1\}^n \rightarrow \{0,1\}\) at random, choosing for every \(x\in \{0,1\}^n\) the value \(F(x)\) to be the result of tossing an independent unbiased coin. Prove that the probability that there is a \(2^n/(1000n)\) line program that computes \(F\) is at most \(2^{-100}\).Hint: An equivalent way to say this is that you need to prove that the set of functions that can be computed using at most \(2^n/(1000n)\) has fewer than \(2^{-100}2^{2^n}\) elements. Can you see why?

Prove that there is a constant \(c\) such that for every \(n\), there is some function \(F:\{0,1\}^n \rightarrow \{0,1\}\) s.t. (1) \(F\) can be computed by a NAND program of at most \(c n^5\) lines, but (2) \(F\) can not be computed by a NAND program of at most \(n^4 /c\) lines.Hint: Find an appropriate value of \(t\) and a function \(G:\{0,1\}^t \rightarrow \{0,1\}\) that can be computed in \(O(2^t/t)\) lines but can’t be computed in \(o(2^t/t)\) lines, and then extend this to a function mapping \(\{0,1\}^n\) to \(\{0,1\}\).

Bibliographical notes

Further explorations

Some topics related to this lecture that might be accessible to advanced students include: