Introduction to Theoretical Computer Science — Boaz Barak

# Loops and infinity

• Learn the model of NAND++ programs that involve loops.
• See some basic syntactic sugar for NAND++.
• Get comfortable with switching between representations of NAND++ programs as code and as tuples.
• Learn the notion of configurations for NAND++ programs.
• Understand the relation between NAND++ and NAND programs.

“We thus see that when $$n=1$$, nine operation-cards are used; that when $$n=2$$, fourteen Operation-cards are used; and that when $$n>2$$, twenty-five operation-cards are used; but that no more are needed, however great $$n$$ may be; and not only this, but that these same twenty-five cards suffice for the successive computation of all the numbers”, Ada Augusta, countess of Lovelace, 1843 (translation of “Sketch of the Analytical Engine” by L. F. Menabrea, Note G).

“It is found in practice that (Turing machines) can do anything that could be described as “rule of thumb” or “purely mechanical”… (Indeed,) it is now agreed amongst logicians that “calculable by means of (a Turing Machine)” is the correct accurate rendering of such phrases.”, Alan Turing, 1948

The NAND programming language has one very significant drawback: a finite NAND program $$P$$ can only compute a finite function $$F$$, and in particular the number of inputs of $$F$$ is always smaller than the number of lines of $$P$$. This does not capture our intuitive notion of an algorithm as a single recipe to compute a potentially infinite function. For example, the standard elementary school multiplication algorithm is a single algorithm that multiplies numbers of all lengths, yet we cannot express this algorithm as a single NAND program; rather, we need a different NAND program for every input length.

Let us consider the case of the simple parity or XOR function $$XOR:\{0,1\}^* \rightarrow \{0,1\}$$, where $$XOR(x)$$ equals $$1$$ iff the number of $$1$$’s in $$x$$ is odd. As simple as it is, the $$XOR$$ function cannot be computed by a NAND program. Rather, for every $$n$$, we can compute $$XOR_n$$ (the restriction of $$XOR$$ to $$\{0,1\}^n$$) using a different NAND program. For example, here is the NAND program to compute $$XOR_5$$:

u   := x_0 NAND x_1
v   := x_0 NAND u
w   := x_1 NAND u
s   := v   NAND w
u   := s   NAND x_2
v   := s   NAND u
w   := x_2 NAND u
s   := v   NAND w
u   := s   NAND x_3
v   := s   NAND u
w   := x_3 NAND u
s   := v   NAND w
u   := s   NAND x_4
v   := s   NAND u
w   := x_4 NAND u
y_0 := v   NAND w

This is rather repetitive, and more importantly, does not capture the fact that there is a single algorithm to compute the parity on all inputs. Typical programming languages use the notion of loops to express such an algorithm, and so we might have wanted to use code such as:

# s is the "running parity", initialized to 0
while i < length(x):
    u   := x_i NAND s
    v   := s   NAND u
    w   := x_i NAND u
    s   := v   NAND w
    i++
ns  := s  NAND s
y_0 := ns NAND ns
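
To make the "different NAND program for every input length" point concrete, here is a minimal Python sketch that generates the loop-free NAND program for $$XOR_n$$ for any given $$n \geq 2$$ and evaluates it. The helper names `xor_nand_program` and `eval_nand` are illustrative assumptions and are not part of the NAND language or its tooling.

```python
def xor_nand_program(n):
    """Return the lines of a NAND program computing XOR_n (assumes n >= 2)."""
    lines = []
    acc = "x_0"  # variable holding the parity of the bits consumed so far
    for j in range(1, n):
        out = "y_0" if j == n - 1 else "s"  # the last XOR writes the output
        lines += [
            f"u := {acc} NAND x_{j}",
            f"v := {acc} NAND u",
            f"w := x_{j} NAND u",
            f"{out} := v NAND w",
        ]
        acc = "s"
    return lines


def eval_nand(lines, x):
    """Evaluate a list of 'a := b NAND c' lines on the list of input bits x."""
    env = {f"x_{j}": bit for j, bit in enumerate(x)}
    for line in lines:
        target, expr = line.split(":=")
        left, right = expr.split("NAND")
        a, b = env.get(left.strip(), 0), env.get(right.strip(), 0)
        env[target.strip()] = 1 - a * b
    return env["y_0"]


program = xor_nand_program(5)
print(len(program))                        # 16 lines, as in the XOR_5 listing
print(eval_nand(program, [1, 0, 1, 1, 0]))  # parity of three ones: 1
```

The generator is a single piece of Python code, but each NAND program it emits still works for only one input length, which is the limitation the rest of this chapter addresses.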

We will now discuss how we can extend the NAND programming language so that it can capture this kind of a construct.

## The NAND++ Programming language

Keeping to our minimalist form, we will not add a while keyword to the NAND programming language. But we will extend this language in a way that allows for executing loops and accessing arrays of arbitrary length.
The main new ingredients are the following:

• We add a special variable loop with the following semantics: after executing the last line of the program, if loop is equal to one, then instead of halting, the program goes back to the first line. If loop is equal to zero after executing the last line then the program halts as is usual with NAND. This corresponds to wrapping the entire program in one big loop that is executed at least once and continues as long as loop is equal to $$1$$. For example, in the C programming language this would correspond to wrapping the entire program with the construct do { ...} while (loop);.
• We add a special integer valued variable i, and allow expressions of the form foo_i (for every variable identifier foo) which are evaluated to equal foo_$$\expr{i}$$ (where $$\expr{i}$$ denotes the current value of the variable i). For example, if the current value of i is equal to 15, then foo_i corresponds to foo_15. Note that the variable i, like all variables in NAND, is a global variable, and hence all expressions of the form foo_i, bar_i etc. refer to the same value of i. In the first loop of the program, i is assigned the value $$0$$, but each time the program loops back to the first line, the value of i is updated in the following manner: in the $$k$$-th iteration the value of i equals $$I(k)$$ where $$I=(I(0),I(1),I(2),\ldots)$$ is the following sequence (see Reference:indextimefig):

$0,1,0,1,2,1,0,1,2,3,2,1,0,\ldots$

• Because the input to NAND++ programs can have variable length, we also add a special read-only array validx such that validx_$$\expr{n}$$ is equal to $$1$$ if and only if $$n$$ is smaller than the length of the input. In particular, validx_i will equal $$1$$ if and only if the value of i is smaller than the length of the input.
• Like NAND programs, the output of a NAND++ program is the string y_$$0$$, $$\ldots$$, y_$$\expr{k}$$ where $$k$$ is the largest integer such that y_$$\expr{k}$$ was assigned a value. To allow control of the output length, we also add a write-only array invalidy. If there exists $$j<k$$ such that invalidy_$$\expr{j}$$=1 then we reduce the output length to $$j-1$$. However, we will hardly use this array in this course, since we will almost always be interested in programs with a fixed output length (and in fact most often in programs with one bit of output).

See the appendix for a more formal specification of the NAND++ programming language, and the website http://nandpl.org for an implementation. Here is the NAND++ program to compute the parity of inputs of arbitrary length (it is a good idea to verify for yourself that this program does indeed compute the parity):

# compute sum x_i (mod 2)
# s = running parity
# seen_i = 1 if this index has been seen before

# Do val := (NOT seen_i) AND x_i
tmp_1  := seen_i NAND seen_i
tmp_2  := x_i NAND tmp_1
val   :=  tmp_2 NAND tmp_2

# Do s := s XOR val
ns   := s   NAND s
y_0  := ns  NAND ns
u    := val NAND s
v    := s   NAND u
w    := val NAND u
s    := v   NAND w

seen_i := zero NAND zero
stop := validx_i NAND validx_i
loop := stop     NAND stop

When we invoke this program on the input $$010$$, we get the following execution trace:

... (complete this here)
End of iteration 0, loop = 1, continuing to iteration 1
...
End of iteration 9, loop = 0, halting program
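
The execution can also be explored with a small simulation. The following Python sketch mirrors the parity program line by line and prints one summary line per iteration; it is an illustration written for these notes, not the reference implementation from the website, and the names `index_schedule`, `nand` and `run_parity` are assumptions. The program halts the first time i reaches the input length, since only then is validx_i equal to $$0$$.

```python
def index_schedule():
    """Yield 0,1,0,1,2,1,0,1,2,3,2,1,0,... (the values taken by i)."""
    yield 0
    r = 1
    while True:
        yield from range(1, r + 1)        # climb from 1 up to r
        yield from range(r - 1, -1, -1)   # descend from r-1 back to 0
        r += 1


def nand(a, b):
    return 1 - a * b


def run_parity(x):
    """Run the parity program on the bit list x.

    Returns (y_0, k) where k is the iteration at whose end the program halts.
    """
    seen = {}  # the seen array; unassigned positions default to 0
    s = 0
    for k, i in enumerate(index_schedule()):
        x_i = x[i] if i < len(x) else 0
        validx_i = 1 if i < len(x) else 0
        seen_i = seen.get(i, 0)
        tmp_1 = nand(seen_i, seen_i)
        tmp_2 = nand(x_i, tmp_1)
        val = nand(tmp_2, tmp_2)
        ns = nand(s, s)
        y_0 = nand(ns, ns)
        u = nand(val, s)
        v = nand(s, u)
        w = nand(val, u)
        s = nand(v, w)
        seen[i] = nand(0, 0)
        stop = nand(validx_i, validx_i)
        loop = nand(stop, stop)
        print(f"End of iteration {k}, i = {i}, y_0 = {y_0}, loop = {loop}")
        if not loop:
            return y_0, k


y, k = run_parity([0, 1, 0])  # halts at the end of iteration 9 with y_0 = 1
```

Note that y_0 is assigned the value of s from before the update in the same iteration, so the output is correct only because the final iteration (with i equal to the input length) contributes nothing to s.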

### Computing the index location

We say that a NAND++ program completed its $$r$$-th round when the index variable i completed the sequence:

$0,1,0,1,2,1,0,1,2,3,2,1,0,\ldots,0,1,\ldots,r,r-1,\ldots,0$

This happens when the program completed

$1+2+4+6+\cdots+2r =r^2 +r + 1$

iterations of its main loop. (The last equality is obtained by applying the formula for the sum of an arithmetic progression.) This means that if we keep a “loop counter” $$k$$ that is initially set to $$0$$ and increases by one at the end of any iteration, then the “round” $$r$$ is the largest integer such that $$r(r+1) \leq k$$, which (as you can verify) equals $$\floor{\sqrt{k+1/4}-1/2}$$.

Thus the value of i in the $$k$$-th loop equals:

$index(k) = \begin{cases} k- r(r+1) & k \leq (r+1)^2 \\ (r+1)(r+2)-k & \text{otherwise} \end{cases} \label{eqindex}$

where $$r= \floor{\sqrt{k+1/4}-1/2}$$. (We ask you to prove this in Reference:computeidx-ex.)
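
As a quick sanity check of this formula, here is a minimal Python sketch. It computes $$r$$ with the integer square root `math.isqrt` rather than a floating-point square root; this is a substitution made here to avoid rounding issues for large $$k$$, and it relies on the identity $$\floor{\sqrt{k+1/4}-1/2} = \floor{(\floor{\sqrt{4k+1}}-1)/2}$$.

```python
from math import isqrt


def index(k):
    """Value of the index variable i in the k-th iteration (k = 0, 1, 2, ...)."""
    r = (isqrt(4 * k + 1) - 1) // 2  # largest r with r*(r+1) <= k
    if k <= (r + 1) ** 2:
        return k - r * (r + 1)       # on the way up
    return (r + 1) * (r + 2) - k     # on the way down


print([index(k) for k in range(13)])
# [0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0]
```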

In NAND we allowed variables to have names such as foo_17 but the numerical part of the identifier played essentially the same role as the alphabetical part. In particular, NAND would be just as powerful if we didn’t allow any numbers in the variable identifiers. With the introduction of the special index variable i, in NAND++ things are different. It is best to think of each NAND++ variable foo as an array, with its $$j$$-th position corresponding to foo_$$\expr{j}$$ (which in other programming languages would often be written as foo[$$\expr{j}$$]). Recall also our convention that a variable without an index such as bar is equivalent to bar_0, or the first position of the corresponding array. Of course we can think of variables as arrays in NAND as well, but since in NAND all indices are absolute numerical constants, this viewpoint does not make as much of a difference as it does in NAND++.

### Infinite loops and computing a function

One crucial difference between NAND and NAND++ programs is the following. Looking at a NAND program $$P$$, we can always tell how many inputs and how many outputs it has (by simply counting the number of x_ and y_ variables). Furthermore, we are guaranteed that if we invoke $$P$$ on any input then some output will be produced.
In contrast, given any particular NAND++ program $$P'$$, we cannot determine a priori the length of the output. In fact, we don’t even know if an output would be produced at all! For example, the following NAND++ program would go into an infinite loop if the first bit of the input is zero:

loop := x_0 NAND x_0

For a NAND++ program $$P$$ and string $$x\in \{0,1\}^*$$, if $$P$$ produces an output when executed with input $$x$$ then we denote this output by $$P(x)$$. If $$P$$ does not produce an output on $$x$$ then we say that $$P(x)$$ is undefined and denote this as $$P(x) = \bot$$.

We say that a NAND++ program $$P$$ computes a function $$F:\{0,1\}^* \rightarrow \{0,1\}^*$$ if $$P(x)=F(x)$$ for every $$x\in \{0,1\}^*$$.

If $$F$$ is a partial function then we say that $$P$$ computes $$F$$ if $$P(x)=F(x)$$ for every $$x$$ on which $$F$$ is defined.

We say that a function $$F$$ is NAND++ computable if there is a NAND++ program that computes it.
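
To illustrate the undefined value $$\bot$$, here is a small Python sketch that simulates the one-line program `loop := x_0 NAND x_0` from above under an iteration budget. The function name and the budget parameter are assumptions made for this illustration. In general, running out of budget does not prove that a program never halts; for this particular program, however, the state never changes between iterations, so failing to halt after one iteration means it never will.

```python
def run_one_line_program(x, max_iterations=1000):
    """Simulate `loop := x_0 NAND x_0` for at most max_iterations iterations.

    Returns "" (no y_ variable is ever assigned, so the output is empty) if
    the program halts, and None, standing in for the undefined value, if the
    budget runs out.
    """
    x_0 = x[0] if x else 0  # unassigned variables default to 0
    for _ in range(max_iterations):
        loop = 1 - x_0 * x_0
        if loop == 0:
            return ""
    return None


print(run_one_line_program([1]))  # prints an empty line: the program halted
print(run_one_line_program([0]))  # None: the program loops forever
```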

We will often drop the “NAND++” qualifier and simply call a function computable if it is NAND++ computable. This may seem “reckless” but, as we’ll see in future lectures, it turns out that being NAND++-computable is equivalent to being computable in essentially any reasonable model of computation.

If $$F:\{0,1\}^* \rightarrow \{0,1\}$$ is a Boolean function, then computing $$F$$ is equivalent to deciding membership in the set $$L=\{ x\in \{0,1\}^* \;|\; F(x)=1 \}$$. Subsets of $$\{0,1\}^*$$ are known as languages in the literature. Such a language $$L \subseteq \{0,1\}^*$$ is known as decidable or recursive if the corresponding function $$F$$ is computable.

## A spoonful of sugar

Just like NAND, we can add a bit of “syntactic sugar” to NAND++ as well. These are constructs that can help us in expressing programs, though ultimately do not change the computational power of the model, since any program using these constructs can be “unsweetened” to obtain a program without them.

### Inner loops via syntactic sugar

While NAND++ only has a single “outer loop”, we can use conditionals to implement inner loops as well. That is, we can replace code such as

PRELOOP_CODE
while (cond) {
    LOOP_CODE
}
POSTLOOP_CODE

by

// startedloop is initialized to 0
// finishedloop is initialized to 0
if NOT(startedloop)  {
    PRELOOP_CODE
    startedloop := 1
    temploop := loop
}
if NOT(finishedloop) {
    if (cond) {
        LOOP_CODE
        loop := 1
    }
    if NOT(cond) {
        finishedloop := 1
        loop := temploop
    }
}
if (finishedloop) {
    POSTLOOP_CODE
}

(Applying the standard syntactic sugar transformations to convert the conditionals into NAND code.) We can apply this transformation repeatedly to convert programs with multiple loops, and even nested loops, into a standard NAND++ program.

Please stop and verify that you understand why this transformation will result in a program that computes the same function as the original code with an inner loop.
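
One way to check your understanding is to mimic the transformation in an ordinary programming language. The Python sketch below is only an analogy, not NAND++ code: `original` uses an inner `while` loop, and `flattened` uses a single outer loop that repeats as long as the flag `loop` is set, together with the same `startedloop`, `finishedloop` and `temploop` flags as above. The concrete pre-loop, loop and post-loop code (summing the numbers below `n` and doubling the result) is a made-up example.

```python
def original(n):
    total, j = 0, 0          # PRELOOP_CODE
    while j < n:             # cond
        total += j           # LOOP_CODE
        j += 1
    return 2 * total         # POSTLOOP_CODE


def flattened(n):
    startedloop = finishedloop = False
    loop = False             # flag of the single outer loop
    result = None
    while True:              # emulates do { ... } while (loop)
        if not startedloop:
            total, j = 0, 0  # PRELOOP_CODE
            startedloop = True
            temploop = loop
        if not finishedloop:
            if j < n:        # cond
                total += j   # LOOP_CODE
                j += 1
                loop = True
            if not (j < n):  # NOT(cond), re-evaluated after LOOP_CODE
                finishedloop = True
                loop = temploop
        if finishedloop:
            result = 2 * total  # POSTLOOP_CODE
        if not loop:
            break
    return result


print(original(5), flattened(5))  # 20 20
```

Because the condition is re-evaluated after the loop body, the last pass through the outer loop can run the body, notice that the condition has become false, and execute the post-loop code all in the same iteration.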

### Controlling the index variable

NAND++ is an oblivious programming model, in the sense that it gives us no means of controlling the index variable i. Rather, to read, for example, the 1017-th index of the array foo (i.e., foo_1017) we need to wait until i equals $$1017$$. Note that we can use variables with absolute numerical indices in the program, but they can only let us access a fixed number of locations (in particular smaller than the number of lines in the program). Since in NAND++ we typically think of inputs that are much longer than the number of lines, in general we will have to use the index variable i to access most of the memory locations. However, we can use syntactic sugar to simulate the effect of incrementing and decrementing i. That is, rather than having i move according to a fixed schedule, we can assume that we have the operation i++ (foo) that increments i if foo is equal to $$1$$ (and otherwise leaves i in place), and similarly the operation i-- (bar) that decrements i if bar is $$1$$ and otherwise leaves i in place.

To achieve this, we start with the observation that in a NAND++ program we can know whether the index is increasing or decreasing. We achieve this using the Hansel and Gretel technique of leaving “breadcrumbs”. Specifically, we create an array atstart such that atstart_0 equals $$1$$ but atstart_$$\expr{j}$$ equals $$0$$ for all $$j>0$$, and an array breadcrumb where we set breadcrumb_i to $$1$$ in every iteration. Then we can set up a variable indexincreasing and set it to $$1$$ when we reach the zero index (i.e., when atstart_i is equal to $$1$$) and set it to $$0$$ when we reach the end point (i.e., when we see an index for which breadcrumb_i is $$0$$ and hence we have reached it for the first time). We can also maintain an array arridx that contains $$0$$ in all positions except the current value of i.

Now we can simulate incrementing and decrementing i by one by simply waiting until our desired outcome happens naturally. (This is similar to the observation that a bus is like a taxi if you’re willing to wait long enough.) That is, if we want to increment i and indexincreasing equals $$1$$ then we simply wait one step. Otherwise (if indexincreasing is $$0$$) then we go into an inner loop in which we do nothing until we reach again the point when arridx_i is $$1$$ and indexincreasing is equal to $$1$$. Decrementing i is done in the analogous way. It can be verified that this transformation converts a program with $$T$$ steps that used the i++ (foo) and i-- (bar) operations into a program with $$O(T^2)$$ steps that doesn’t use them.
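
To get a feeling for the cost of waiting, the following Python sketch counts how many iterations pass between iteration $$k$$ and the next iteration in which i is one larger than it was at iteration $$k$$. It works directly with the fixed index schedule rather than with the breadcrumb arrays, so it only models the waiting time and not the NAND++ bookkeeping; the function names are assumptions made for this illustration.

```python
from math import isqrt


def index(k):
    """Value of i in the k-th iteration of the fixed NAND++ schedule."""
    r = (isqrt(4 * k + 1) - 1) // 2
    if k <= (r + 1) ** 2:
        return k - r * (r + 1)
    return (r + 1) * (r + 2) - k


def iterations_until_increment(k):
    """Iterations to wait after iteration k until i equals index(k) + 1."""
    target = index(k) + 1
    wait = 1
    while index(k + wait) != target:
        wait += 1
    return wait


print(iterations_until_increment(0))  # 1: the index is about to increase
print(iterations_until_increment(1))  # 3: go down to 0, back up to 1, then 2
print(iterations_until_increment(4))  # 5: i = 2 is a turning point
```

When the index is on its way up the wait is a single iteration, while on the way down it grows with the current position, which is consistent with the quadratic overhead stated above.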

### “Simple” NAND++ programs

When analyzing NAND++ programs, it will sometimes be convenient for us to restrict our attention to programs of a somewhat nicer form.

We say that a NAND++ program $$P$$ is simple if it has the following properties:

• The only output variable it ever writes to is y_0 (and so it computes a Boolean function).
• The last line of the program has the form halted := loop NAND loop and so the variable halted gets the value $$1$$ when the program halts. Moreover, there is no other line in the program that writes to the variable halted.
• All lines that write to the variable loop or y_0 are “guarded” by halted in the sense that we replace a line of the form y_0 := foo NAND bar with the (unsweetened equivalent to) if NOT(halted) { y_0 := foo NAND bar } and similarly loop := blah NAND baz is replaced with if NOT(halted) {loop := blah NAND baz }.
• It has an indexincreasing variable that is equal to $$1$$ if and only if in the next iteration the value of i will increase by $$1$$.
• It contains variables zero and one that are initialized to be $$0$$ and $$1$$ respectively, by having the first line be one := zero NAND zero and having no other lines that assign values to them.

Note that if $$P$$ is a simple program then even if we continue its execution beyond the point at which it should have halted, the value of the y_0 and loop variables will not change. The following theorem shows that, in the context of Boolean functions, we can assume that every program is simple. (The restriction to Boolean functions is not very significant, as we can always encode a non-Boolean function $$F:\{0,1\}^* \rightarrow \{0,1\}^*$$ by the Boolean function $$G(x,i)=F(x)_i$$ where we treat the second input $$i$$ as representing an integer. The crucial point is that we still allow the functions to have an unbounded input length and hence in particular these are functions that cannot be computed by plain “loop less” NAND programs.)

Let $$F:\{0,1\}^* \rightarrow \{0,1\}$$ be a (possibly partial) Boolean function. If there is a NAND++ program that computes $$F$$ then there is a simple NAND++ program $$P'$$ that computes $$F$$ as well.

We only sketch the proof, leaving verifying the full details to the reader. We prove the theorem by transforming the code of the program $$P$$ to achieve a simple program $$P'$$ without modifying the functionality of $$P$$. If $$P$$ computes a Boolean function then it cannot write to any y_$$\expr{j}$$ variable other than y_0. If $$P$$ already used a variable named halted then we rename it. We then add the line halted := loop NAND loop to the end of the program, and replace all lines writing to the variables y_0 and loop with their “guarded” equivalents. Finally, we ensure the existence of the variable indexincreasing using the “breadcrumbs” technique discussed above.

## Uniformity, and NAND vs NAND++

While NAND++ adds an extra operation over NAND, it is not exactly accurate to say that NAND++ programs are “more powerful” than NAND programs. NAND programs, having no loops, are simply not applicable for computing functions with more inputs than they have lines. The key difference between NAND and NAND++ is that NAND++ allows us to express the fact that the algorithm for computing parities of length-$$100$$ strings is really the same one as the algorithm for computing parities of length-$$5$$ strings (or similarly the fact that the algorithm for adding $$n$$-bit numbers is the same for every $$n$$, etc.). That is, one can think of the NAND++ program for general parity as the “seed” out of which we can grow NAND programs for length $$10$$, length $$100$$, or length $$1000$$ parities as needed. This notion of a single algorithm that can compute functions of all input lengths is known as uniformity of computation and hence we think of NAND++ as a uniform model of computation, as opposed to NAND which is a nonuniform model, where we have to specify a different program for every input length.

Looking ahead, we will see that this uniformity leads to another crucial difference between NAND++ and NAND programs. NAND++ programs can have inputs and outputs that are longer than the description of the program and in particular we can have a NAND++ program that “self replicates” in the sense that it can print its own code.
This notion of “self replication”, and the related notion of “self reference”, is crucial to many aspects of computation, as well as, of course, to life itself, whether in the form of digital or biological programs.

### Growing a NAND tree

If $$P$$ is a NAND++ program and $$n,T\in \N$$ are some numbers, then we can easily obtain a NAND program $$P'=expand_{T,n}(P)$$ that, given any $$x\in \{0,1\}^n$$, runs $$T$$ loop iterations of the program $$P$$ and outputs the result. If $$P$$ is a simple program, then we are guaranteed that, if $$P$$ does not enter an infinite loop on $$x$$, then as long as we make $$T$$ large enough, $$P'(x)$$ will equal $$P(x)$$. To obtain the program $$P'$$ we can simply place $$T$$ copies of the program $$P$$ one after the other, doing a “search and replace” in the $$k$$-th copy of any instances of _i with the value $$index(k)$$, where the function $$index$$ is defined as in \eqref{eqindex}. For example, Reference:expandnandpng illustrates the expansion of the NAND++ program for parity.

We can also obtain such an expansion by using the for .. do { .. } syntactic sugar. For example, the NAND program below corresponds to running the parity program for 17 iterations, and computing $$XOR_5:\{0,1\}^5 \rightarrow \{0,1\}$$. Its standard “unsweetened” version will have $$17 \cdot 10$$ lines. (This is of course not the most efficient way to compute $$XOR_5$$. Generally, the NAND program to compute $$XOR_n$$ obtained by expanding out the NAND++ program will require $$\Theta(n^2)$$ lines, as opposed to the $$O(n)$$ lines that are possible to achieve directly in NAND. However, in most cases this difference will not be so crucial for us.)

for i in [0,1,0,1,2,1,0,1,2,3,2,1,0,1,2,3,4] do {
    tmp1  := seen_i NAND seen_i
    tmp2  := x_i NAND tmp1
    val   :=  tmp2 NAND tmp2
    ns   := s   NAND s
    y_0  := ns  NAND ns
    u    := val NAND s
    v    := s   NAND u
    w    := val NAND u
    s    := v   NAND w
    seen_i := zero NAND zero
}

In particular we have the following theorem:

For every simple NAND++ program $$P$$ and function $$F:\{0,1\}^* \rightarrow \{0,1\}$$, if $$P$$ computes $$F$$ then for every $$n\in\N$$ there exists $$T\in \N$$ such that $$expand_{T,n}(P)$$ computes $$F_n$$.

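# Expand a NAND++ program P (given as a string of code), a time bound T and an
# input length n into an n-input NAND program that runs T iterations of P.
# If P has s lines then the result has T*s lines.
import math

def expand(P,T,n):
    result = ""

    for k in range(T):
        i=index(k)
        validx = ('one' if i<n else 'zero')
        # order matters: handle validx_i first, then x_i, then every other _i
        result += P.replace('validx_i',validx).replace('x_i',('x_i' if i<n else 'zero')).replace('_i','_'+str(i))

    return result

def index(k):
    # value of the index variable i in the k-th iteration
    r = math.floor(math.sqrt(k+1/4)-1/2)
    return (k-r*(r+1) if k <= (r+1)*(r+1) else (r+1)*(r+2)-k)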
We’ll start with a “proof by code”. Above is a Python program expand to compute $$expand_{T,n}(P)$$. On input the code $$P$$ of a NAND++ program and numbers $$T,n$$, expand outputs the code of the NAND program $$P'$$ that works on length $$n$$ inputs and is obtained by running $$T$$ iterations of $$P$$.

If the original program had $$s$$ lines, then for every $$\ell \in [sT]$$, line $$\ell$$ in the output of expand(P,T,n) corresponds exactly to the line executed in step $$\ell$$ of the execution $$P(x)$$. (In the notation above, as elsewhere, we index both lines and steps from $$0$$.) Indeed, in step $$\ell$$ of the execution of $$P(x)$$, the line executed is $$k=\ell \bmod s$$, and line $$\ell$$ in the output of expand(P,T,n) is a copy of line $$k$$ in $$P$$. If that line involved unindexed variables, then it is copied as is in the returned program result. Otherwise, if it involved the index _i then we replace i with the current value of $$i$$. Moreover, we replace the variable validx_i with either one or zero depending on whether $$i < n$$.

Now, if a simple NAND++ program $$P$$ computes some function $$F:\{0,1\}^* \rightarrow \{0,1\}$$, then for every $$x\in \{0,1\}^*$$ there is some number $$T_P(x)$$ such that $$P$$ on input $$x$$ halts within $$T_P(x)$$ iterations of its main loop and outputs $$F(x)$$. Moreover, since $$P$$ is simple, even if we run it for more iterations than that, the output value will not change. For every $$n \in \N$$, define $$T_P(n) = \max_{x\in \{0,1\}^n} T_P(x)$$. Then $$P'=expand_{T_P(n),n}(P)$$ computes the function $$F_n:\{0,1\}^n \rightarrow \{0,1\}$$ which is the restriction of $$F$$ to $$\{0,1\}^n$$.
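
The following Python sketch tries this out on the parity program from earlier in the chapter, for input length $$n=3$$. The functions `expand` and `index` are repeated from the listing above so that the sketch is self-contained; the small NAND evaluator `eval_nand`, and the choice to preset the variables one and zero to $$1$$ and $$0$$, are assumptions made for this illustration. Ten iterations suffice here because the index first reaches $$3$$ in iteration $$9$$.

```python
import math

PARITY = """\
tmp_1  := seen_i NAND seen_i
tmp_2  := x_i NAND tmp_1
val   :=  tmp_2 NAND tmp_2
ns   := s   NAND s
y_0  := ns  NAND ns
u    := val NAND s
v    := s   NAND u
w    := val NAND u
s    := v   NAND w
seen_i := zero NAND zero
stop := validx_i NAND validx_i
loop := stop     NAND stop
"""


def index(k):
    r = math.floor(math.sqrt(k + 1/4) - 1/2)
    return k - r*(r+1) if k <= (r+1)*(r+1) else (r+1)*(r+2) - k


def expand(P, T, n):
    result = ""
    for k in range(T):
        i = index(k)
        validx = 'one' if i < n else 'zero'
        result += (P.replace('validx_i', validx)
                    .replace('x_i', 'x_i' if i < n else 'zero')
                    .replace('_i', '_' + str(i)))
    return result


def eval_nand(code, x):
    """Evaluate loop-free NAND code (one 'a := b NAND c' per line) on bits x."""
    env = {"one": 1, "zero": 0}  # assumption: constants are preset
    env.update({f"x_{j}": bit for j, bit in enumerate(x)})
    for line in code.splitlines():
        if not line.strip():
            continue
        target, expr = line.split(":=")
        a, b = (env.get(token.strip(), 0) for token in expr.split("NAND"))
        env[target.strip()] = 1 - a * b
    return env.get("y_0", 0)


code = expand(PARITY, 10, 3)
print(len(code.splitlines()))      # 120 = 10 iterations times 12 lines
print(eval_nand(code, [1, 1, 1]))  # 1
```

The loop variable plays no role in the expanded program: all $$T$$ copies are executed unconditionally, which is why it matters that extra iterations do not change the output.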

## NAND++ Programs as tuples

Just like we did with NAND programs, we can represent NAND++ programs as tuples. A minor difference is that since in NAND++ it makes sense to keep track of indices, we will represent a variable foo_$$\expr{j}$$ as a pair of numbers $$(a,j)$$ where $$a$$ corresponds to the identifier foo. Thus we will use a 6-tuple of the form $$(a,j,b,k,c,\ell)$$ to represent each line of the form foo_$$\expr{j}$$ := bar_$$\expr{k}$$ NAND baz_$$\expr{\ell}$$, where $$a,b,c$$ correspond to the variable identifiers foo, bar and baz respectively. (This difference between three-tuples and six-tuples is made for convenience and is not particularly important. We could have also represented NAND programs using six-tuples and NAND++ using three-tuples.) Also recall that we use the convention that an unindexed variable identifier foo is equivalent to foo_0. If one of the indices is the special variable i then we will use the number $$s$$ for it where $$s$$ is the number of lines (as no index is allowed to be this large in a NAND++ program). We can now define NAND++ programs in a way analogous to Reference:NANDprogram:

A NAND++ program is a 6-tuple $$P=(V,X,Y,VALIDX,LOOP,L)$$ of the following form:

• $$V$$ (called the variable identifiers) is some finite set.
• $$X\in V$$ is called the input identifier.
• $$Y\in V$$ is called the output identifier.
• $$VALIDX \in V$$ is the input length identifier.
• $$LOOP \in V$$ is the loop variable.
• $$L \in (V\times [s+1] \times V \times [s+1] \times V \times [s+1])^*$$ is a list of 6-tuples of the form $$(a,j,b,k,c,\ell)$$ where $$a,b,c \in V$$ and $$j,k,\ell \in [s+1]$$ for $$s=|L|$$. That is, $$L= ( (a_0,j_0,b_0,k_0,c_0,\ell_0),\ldots,(a_{s-1},j_{s-1},b_{s-1},k_{s-1},c_{s-1},\ell_{s-1}))$$ where for every $$t\in \{0,\ldots, s-1\}$$, $$a_t,b_t,c_t \in V$$ and $$j_t,k_t,\ell_t \in [s+1]$$. Moreover $$a_t \not\in \{X,VALIDX\}$$ for every $$t\in [s]$$ and $$b_t,c_t \not\in \{ Y,LOOP\}$$ for every $$t \in [s]$$.

This definition is long but ultimately translating a NAND++ program from code to tuples can be done in a fairly straightforward way. Please read the definition again to see that you can follow this transformation. Note that there is a difference between the way we represent NAND++ and NAND programs. In NAND programs, we used a different element of $$V$$ to represent, for example, x_17 and x_35. For NAND++ we will represent these two variables by $$(X,17)$$ and $$(X,35)$$ respectively where $$X$$ is the input identifier. For this reason, in our definition of NAND++, $$X$$ is a single element of $$V$$ as opposed to a tuple of elements as in Reference:NANDprogram. For the same reason, $$Y$$ is a single element and not a tuple as well.

Just as was the case for NAND programs, we can define a canonical form for NAND++ variables. Specifically in the canonical form we will use $$V=[t]$$ for some $$t>3$$, $$X=0$$, $$Y=1$$, $$VALIDX=2$$ and $$LOOP=3$$. Moreover, if $$P$$ is simple in the sense of Reference:simpleNANDpp then we will assume that the halted variable is encoded by $$4$$, and the indexincreasing variable is encoded by $$5$$. The canonical form representation of a NAND++ program is specified simply by a length $$s$$ list of $$6$$-tuples of natural numbers $$(a,j,b,k,c,\ell)$$ where $$a,b,c \in [t]$$ and $$j,k,\ell \in [s+1]$$.
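
As an illustration of the translation from code to tuples, here is a minimal Python sketch. The function name `to_tuples` is an assumption made for this illustration, as is the parsing rule that whatever follows the last underscore of a variable is its index. The identifiers x, y, validx and loop get the canonical numbers $$0,1,2,3$$, every other identifier gets the next free number in order of first appearance, and the index i is encoded by the number of lines $$s$$. The sketch does not handle the extra conventions for simple programs (halted and indexincreasing).

```python
def to_tuples(code):
    """Convert NAND++ code (one 'a := b NAND c' per line) to a list of 6-tuples."""
    lines = [ln for ln in code.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    s = len(lines)
    ids = {"x": 0, "y": 1, "validx": 2, "loop": 3}  # canonical identifiers

    def encode(token):
        token = token.strip()
        # foo_17 -> ("foo", "17"); an unindexed foo is treated as foo_0
        name, idx = token.rsplit("_", 1) if "_" in token else (token, "0")
        number = ids.setdefault(name, len(ids))
        return number, (s if idx == "i" else int(idx))

    tuples = []
    for ln in lines:
        target, expr = ln.split(":=")
        left, right = expr.split("NAND")
        a, j = encode(target)
        b, k = encode(left)
        c, l = encode(right)
        tuples.append((a, j, b, k, c, l))
    return tuples


PARITY = """
tmp_1  := seen_i NAND seen_i
tmp_2  := x_i NAND tmp_1
val   :=  tmp_2 NAND tmp_2
ns   := s   NAND s
y_0  := ns  NAND ns
u    := val NAND s
v    := s   NAND u
w    := val NAND u
s    := v   NAND w
seen_i := zero NAND zero
stop := validx_i NAND validx_i
loop := stop     NAND stop
"""

for tup in to_tuples(PARITY)[:3]:
    print(tup)
# (4, 1, 5, 12, 5, 12)
# (4, 2, 0, 12, 4, 1)
# (6, 0, 4, 2, 4, 2)
```

Since the parity program has $$s=12$$ lines, the index i appears as the number $$12$$ in the tuples, while tmp_1 and tmp_2 become positions $$1$$ and $$2$$ of the same array.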

Here is Python code to evaluate a NAND++ program given the list of 6-tuples representation:

# Evaluates a NAND++ program P on input x (a list of bits)
# P is given in the list of tuples representation
# untested code
import math

def EVALpp(P,x):
    vars = { 0:list(x) , 2: [1]*len(x) } # vars[var][idx] is value of var_idx.
    # special variables: 0:X, 1:Y, 2:VALIDX, 3:LOOP
    s = len(P) # number of lines; the index value s stands for the special variable i

    def index(k): # compute i at iteration k
        r = math.floor(math.sqrt(k+1/4)-1/2)
        return (k-r*(r+1) if k <= (r+1)*(r+1) else (r+1)*(r+2)-k)

    def getval(var,idx): # returns current value of var_idx
        if idx == s: idx = index(k)
        l = vars.get(var,[])
        return l[idx] if idx<len(l) else 0

    def setval(var,idx,v): # sets var_idx := v
        if idx == s: idx = index(k)
        l = vars.setdefault(var,[])
        l.extend([0]*(1+idx-len(l))) # pad with zeros up to position idx
        l[idx]=v

    k = 0
    while True:
        for line in P:
            setval(line[0],line[1], 1-getval(line[2],line[3])*getval(line[4],line[5]))
        if not getval(3,0): break
        k += 1

    return vars.get(1,[])

### Configurations

Just like we did for NAND programs, we can define the notion of a configuration and a next step function for NAND++ programs. That is, a configuration of a program $$P$$ records all the state of $$P$$ at a given point in the execution, and contains everything we need to know in order to continue from this state. The next step function of $$P$$ maps a configuration of $$P$$ into the configuration that occurs after executing one more line of $$P$$.

Before reading onwards, try to think how you would define the notion of a configuration of a NAND++ program.

While we can define configurations in full generality, for concreteness we will restrict our attention to configurations of “simple” NAND++ programs in the sense of Reference:simpleNANDpp, that are given in a canonical form. Let $$P$$ be a canonical form simple program, represented as a list of $$6$$-tuples $$L=((a_0,j_0,b_0,k_0,c_0,\ell_0),\ldots,(a_{s-1},j_{s-1},b_{s-1},k_{s-1},c_{s-1},\ell_{s-1}))$$. Let $$s$$ be the number of lines and $$t$$ be one more than the largest number appearing among the $$a$$’s, $$b$$’s or $$c$$’s.

Just like we did for NAND, a configuration of the program $$P$$ will denote the current line being executed and the current value of all variables. For our convenience we will use a somewhat different encoding than we did for NAND. We will encode the configuration as a string $$\sigma \in \{0,1\}^*$$, which is composed of blocks, that is, $$\sigma$$ will be the concatenation of $$\sigma^0,\ldots,\sigma^{r}$$ for some $$r\in \N$$ (that will represent the maximum among $$n-1$$, where $$n$$ is the input length, the largest numerical index appearing in the program, and the largest index that the program has ever reached in the execution). Each block $$\sigma^i$$ will be a string of length $$B$$ (for some constant $$B$$ depending on $$t,s$$) that encodes the following:

• The values of variables indexed by $$i$$ (e.g., foo_$$\expr{i}$$, bar_$$\expr{i}$$, etc.).
• Whether or not the block is “active” (i.e., whether the current value of the index variable i is $$i$$), and if it is, the current line that is being executed.
• Whether this is the first or last block.

For the sake of completeness, we will describe below precisely how configurations of NAND++ programs and the next-step function are defined. However, the details are not as important as the high level points, which are the following: A configuration encodes all the information of the state of the program at a given step in the computation, including the values of all variables (both the Boolean variables and the special index variable i) and the current line number that is to be executed. The next step function of a program $$P$$ updates that configuration by computing one line of the program, and updating the value of the variable that is assigned a value in this line. The variables involved in that line either have absolute numerical indices (in which case they are encoded in one of the first $$s$$ blocks, as numerical indices can’t be larger than the number of lines) or are indexed by the special variable i (in which case they are encoded in the active block). If the line is the last one in the program, the next step function also determines whether to halt based on the loop variable, and updates the active block based on whether the index will be increasing or decreasing.

We now describe a precise encoding for the configurations of a NAND++ program. Many of the choices below are made for convenience and other choices would be just as valid. We will think of encoding each block as using the alphabet $$\Sigma = \{ \mathtt{BB}, \mathtt{EB}, 0 , 1 \}$$. ($$\mathtt{BB}$$ and $$\mathtt{EB}$$ stand for “begin block” and “end block” respectively; we can later encode this as a binary string using the map $$0 \mapsto 00, 1\mapsto 11, \mathtt{BB} \mapsto 01, \mathtt{EB} \mapsto 10$$.) In this alphabet $$\Sigma$$, every block $$\sigma^i$$ will have the form

$\sigma^i = \mathtt{BB}\;\hat{\sigma}^{i} \; first \; last \; active \; p \; \mathtt{EB}$

where $$\hat{\sigma}^i$$ is a string in $$\{0,1\}^t$$ that encodes the values of all the variables in the program indexed by $$i$$. That is, the $$a$$-th coordinate of $$\hat{\sigma}^i$$ corresponds to the value of the variable represented by $$(a,i)$$. For example, if we encode foo by the number $$11$$ then $$\hat{\sigma}^{17}_{11}$$ corresponds to the value of foo_17 at the given point in the execution. We use the same indexing of variables as in representations and so in particular coordinates $$0,1,2,3,4,5$$ of $$\hat{\sigma}^i$$ correspond to the variables x_i, y_i, validx_i, loop_i, halted_i, indexincreasing_i respectively. (Recall that we identify an unindexed variable identifier such as foo with foo_0, and so in particular the values of loop, halted and indexincreasing are encoded in the block $$\sigma^0$$.)

The values $$active$$, $$first$$, and $$last$$ are each bits that are set to $$1$$ or $$0$$ depending on whether the current block is active (i.e. the current value of i is $$i$$), is the first block in the configuration, or is the last block, respectively. The parameter $$p$$ is a string in $$\{0,1\}^{\ceil{\log(s+1)}}$$, which (via the binary representation) we also think of as a number in $$[s+1]$$. The value of $$p$$ is equal to the current line that is about to be executed if the block is active, and to $$0$$ if the block is not active. If $$p=s$$ then this means that we have halted.

Note that in the alphabet $$\Sigma$$, our encoding takes $$2$$ symbols for $$\mathtt{BB}$$ and $$\mathtt{EB}$$, $$t$$ symbols for $$\hat{\sigma}^i$$, three symbols for $$first$$, $$last$$, $$active$$, and $$\ceil{\log(s+1)}$$ symbols for encoding $$p$$. Hence in the binary alphabet, each block $$\sigma^i$$ will be encoded as a string of length $$B=2(5+t+\ceil{\log(s+1)})$$ bits, and a configuration will be encoded as a binary string of length $$(r+1)B$$ where $$r$$ is the largest index that the variable i has reached so far in the execution. See Reference:configurationsnandpppng for an illustration of the configuration.
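As a quick sanity check of this length count, the following sketch builds a single block over $$\Sigma$$ and applies the binary map $$0 \mapsto 00, 1\mapsto 11, \mathtt{BB} \mapsto 01, \mathtt{EB} \mapsto 10$$. The names `make_block` and `to_binary`, and the choice to represent a block as a Python list of symbols, are assumptions made for illustration only.

```python
import math

# Binary code for each symbol of the alphabet, as given in the text.
SYMBOL_TO_BITS = {"0": "00", "1": "11", "BB": "01", "EB": "10"}


def make_block(var_bits, first, last, active, p, s):
    """Return one block as a list of symbols over {BB, EB, 0, 1}.

    var_bits: string of t bits (values of the variables with this index)
    first, last, active: 0/1 flags; p: line number in range(s + 1)
    """
    width = math.ceil(math.log2(s + 1))  # symbols used to encode p
    p_bits = format(p, "b").zfill(width)
    flags = f"{first}{last}{active}"
    return ["BB"] + list(var_bits + flags + p_bits) + ["EB"]


def to_binary(symbols):
    """Encode a list of symbols as a binary string, two bits per symbol."""
    return "".join(SYMBOL_TO_BITS[sym] for sym in symbols)


# Example: t = 6 variables, s = 4 lines; a block that is first, last and
# active, and is about to execute line 2.
block = make_block("100100", first=1, last=1, active=1, p=2, s=4)
encoded = to_binary(block)
```

With $$t=6$$ and $$s=4$$ the block has $$2+6+3+3=14$$ symbols, so `encoded` has $$28$$ bits.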

For a simple $$s$$-line $$t$$-variable NAND++ program $$P$$ the next configuration function $$NEXT_P:\{0,1\}^* \rightarrow \{0,1\}^*$$ is defined in the natural way. (We define $$NEXT_P$$ as a partial function that is only defined on strings that are valid encodings of a configuration; in particular, such strings have only a single block with its active bit set, and the first and last bits are set only for the first and last block respectively. It is of course possible to extend $$NEXT_P$$ to a total function by defining it on invalid configurations in some way.) On input a configuration $$\sigma$$, one can compute $$\sigma'=NEXT_P(\sigma)$$ as follows:

1. Scan the configuration $$\sigma$$ to find the index $$i$$ of the active block (the block whose active bit is set to $$1$$) and the current line $$p$$ that needs to be executed (which is encoded in the active block). We denote the new active block and current line in the configuration $$\sigma'$$ by $$(i',p')$$.
2. If $$p=s$$ then $$\sigma$$ is a halting configuration and $$NEXT_P(\sigma) = \sigma$$. Otherwise we continue to the following steps:
3. Execute the line $$p$$: if the $$p$$-th tuple in the program is $$(a,j,b,k,c,\ell)$$ then we update $$\sigma$$ to $$\sigma'$$ based on this tuple. That is, in the configuration $$\sigma'$$, we encode the value of the variable corresponding to $$(a,j)$$ as the NAND of the values of the variables corresponding to $$(b,k)$$ and $$(c,\ell)$$. (Recall that according to the way we represent NAND++ programs as 6-tuples, if $$a$$ is the number corresponding to the identifier foo then $$(a,j)$$ corresponds to foo_$$\expr{j}$$ if $$j<s$$, and corresponds to foo_$$\expr{i}$$ if $$j=s$$ where $$i$$ is the current value of the index variable i.)
4. Updating the value of $$i$$: if $$p=s-1$$ (i.e., $$p$$ corresponds to the last line of the program), then we check whether the value of the loop (i.e., loop_0) variable (which by our convention is encoded as the variable with index $$3$$ in the first block) is $$0$$, and if so we set in $$\sigma'$$ the value $$p'=s$$, which corresponds to a halting configuration. Otherwise, $$i$$ is either incremented or decremented based on indexincreasing (which we can read from the first block). That is, we let $$i'$$ be either $$i+1$$ or $$i-1$$ based on indexincreasing and modify the active block in $$\sigma'$$ to be $$i'$$. (If $$i$$ is the final block and $$i'=i+1$$ then we create a new block and mark it to be the last one.)
5. Unless we set $$p'=s$$ in the previous step, we let $$p'= p+1 \mod s$$. In either case we encode $$p'$$ in the active block of $$\sigma'$$.

One important property of $$NEXT_P$$ is that to compute it we only need to access the blocks $$0,\ldots,s-1$$ (since the largest absolute numerical index in the program is at most $$s-1$$) as well as the current active block and its immediate neighbors. Thus in each step, $$NEXT_P$$ only reads or modifies a constant number of blocks.

Here is some Python code for the next step function:

```python
import math

# compute the next-step configuration
# Inputs:
# P: NAND++ program in list of 6-tuples representation  (assuming it has an "indexincreasing" variable)
# conf: encoding of configuration as a string using the alphabet "B","E","0","1".
#       (assumed to contain a block for every absolute index used by P)
def next_step(P,conf):
    s = len(P) # number of lines
    t = max([max(tup[0],tup[2],tup[4]) for tup in P])+1 # number of variables
    line_enc_length = math.ceil(math.log2(s+1)) # num of bits to encode a line
    block_enc_length = t+3+line_enc_length # num of bits to encode a block (without bookends of "E","B")
    LOOP = 3
    INDEXINCREASING = 5
    ACTIVEIDX = block_enc_length -line_enc_length-1 # position of active flag
    FINALIDX =  block_enc_length  -line_enc_length-2 # position of final flag

    def getval(var,idx):
        # idx<s is an absolute index; idx==s means "indexed by i"
        if idx<s: return int(blocks[idx][var])
        return int(active[var])

    def setval(var,idx,v):
        if idx<s: blocks[idx][var]=str(v)
        else: blocks[i][var]=str(v)

    blocks = [list(b[1:]) for b in conf.split("E")[:-1]] # list of blocks w/o initial "B" and final "E"

    i = [j for j in range(len(blocks))  if blocks[j][ACTIVEIDX]=="1" ][0]
    active = blocks[i]

    p = int("".join(active[-line_enc_length:]),2) # current line to be executed

    if p==s: return conf # halting configuration

    (a,j,b,k,c,l) = P[p] #  6-tuple corresponding to current line
    setval(a,j,1-getval(b,k)*getval(c,l))

    new_p = p+1
    new_i = i
    if p==s-1: # last line
        if getval(LOOP,0)==0:
            new_p = s # loop is 0: halt and leave the index unchanged
        else:
            new_p = 0
            new_i = (i+1 if getval(INDEXINCREASING,0) else i-1)
            if new_i==len(blocks): # need to add another block and make it final
                blocks[len(blocks)-1][FINALIDX]="0"
                new_final = ["0"]*block_enc_length
                new_final[FINALIDX]="1"
                blocks.append(new_final)

    blocks[i][ACTIVEIDX]="0" # turn off "active" flag in old active block
    blocks[i][ACTIVEIDX+1:ACTIVEIDX+1+line_enc_length]=["0"]*line_enc_length # zero out line counter in old active block
    blocks[new_i][ACTIVEIDX]="1" # turn on "active" flag in new active block
    new_p_s = bin(new_p)[2:]
    new_p_s = "0"*(line_enc_length-len(new_p_s))+new_p_s
    blocks[new_i][ACTIVEIDX+1:ACTIVEIDX+1+line_enc_length] = list(new_p_s) # add binary representation of next line in new active block

    return "".join(["B"+"".join(block)+"E" for block in blocks]) # return new configuration
```
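To inspect a configuration string of the kind that `next_step` consumes and produces, it can help to split it back into named fields. The helper below is a hypothetical sketch (the name `decode_conf` and the dictionary layout are assumptions, not part of the original code); it relies only on the block layout described earlier: $$t$$ variable bits, then the first, last and active flags, then the line counter.

```python
import math


def decode_conf(conf, s, t):
    """Split a configuration over the alphabet B, E, 0, 1 into per-block fields.

    s: number of lines of the program, t: number of variables per block.
    """
    width = math.ceil(math.log2(s + 1))  # symbols used for the line counter
    blocks = []
    for raw in conf.split("E")[:-1]:     # every block ends with "E"
        body = raw[1:]                   # drop the leading "B"
        if len(body) != t + 3 + width:
            raise ValueError("block has unexpected length")
        blocks.append({
            "vars": body[:t],
            "first": body[t] == "1",
            "last": body[t + 1] == "1",
            "active": body[t + 2] == "1",
            "p": int(body[t + 3:], 2),
        })
    return blocks


# Two blocks for a program with s = 3 lines and t = 6 variables;
# the second block is active and about to execute line 1.
conf = "B" + "110000" + "100" + "00" + "E" + "B" + "000000" + "011" + "01" + "E"
decoded = decode_conf(conf, s=3, t=6)
```

Here `decoded[1]` describes the active block, and `decoded[1]["p"]` is the line that would be executed next.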

### Deltas

Sometimes it is easier to keep track of merely the changes (sometimes known as “deltas”) in the state of a NAND++ program, rather than the full configuration. Since every step of a NAND++ program assigns a value to a single variable, this motivates the following definition:

The modification log (or “deltas”) of an $$s$$-line simple NAND++ program $$P$$ on an input $$x\in \{0,1\}^n$$ is the string $$\Delta$$ of length $$sT+n$$, where $$T$$ is the number of iterations of the loop that $$P$$ does on input $$x$$, whose first $$n$$ bits are equal to $$x$$ and whose last $$sT$$ bits correspond to the value assigned in each step of the program. That is, for every $$i\in [n]$$, $$\Delta_i=x_i$$ and for every $$\ell \in [sT]$$, $$\Delta_{\ell+n}$$ equals the value that is assigned by the line executed in step $$\ell$$ of the execution of $$P$$ on input $$x$$.

If $$\Delta$$ is the “deltas” of an $$s$$-line program $$P$$ on input $$x \in \{0,1\}^n$$, and $$T$$ is the number of loop iterations that $$P$$ takes on input $$x$$, then for every $$T' \geq T$$ and every $$\ell\in [sT]$$, $$\Delta_{\ell+n}$$ is the same as the value assigned by line $$\ell$$ of the NAND program $$expand_{T',n}(P)$$.
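The indexing in the definition of the deltas can be made concrete with a small sketch. The function names `build_deltas` and `locate` are illustrative assumptions, and the list `assigned` (one bit per executed line) is taken as given, since producing it would require actually running the program. The only fact used is that every iteration executes all $$s$$ lines in order, so step $$\ell$$ executes line $$\ell \bmod s$$ of iteration $$\lfloor \ell/s \rfloor$$.

```python
def build_deltas(x, assigned, s):
    """Concatenate the input x (a bit string) with the bits assigned in each step.

    assigned: list of 0/1 values, one per executed line, so its length is s*T.
    """
    if len(assigned) % s != 0:
        raise ValueError("expected a whole number of loop iterations")
    return x + "".join(str(bit) for bit in assigned)


def locate(pos, n, s):
    """Describe what position pos of the deltas string refers to."""
    if pos < n:
        return ("input", pos)
    step = pos - n                        # the step number within [sT]
    return ("step", step // s, step % s)  # (kind, iteration, line)


# n = 2 input bits, s = 3 lines, T = 2 iterations: length s*T + n = 8.
delta = build_deltas("10", [1, 0, 1, 1, 1, 0], s=3)
where = locate(6, n=2, s=3)
```

In this made-up trace, position $$6$$ of `delta` holds the value assigned by line $$1$$ in iteration $$1$$ (counting from zero).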

The details of the definitions of configuration and deltas are not as important as the main points which are:
* A configuration is the full state of the program at a certain point in the computation. Applying the $$NEXT_P$$ function to the current configuration yields the next configuration.
* Each configuration can be thought of as a string which is a sequence of constant-size blocks. The $$NEXT_P$$ function only depends on and modifies a constant number of blocks: the first $$s$$ ones, the current active block, and its two adjacent neighbors.
* The “delta” or “modification log” of a computation is a succinct description of how the configuration changed in each step of the computation. It is simply the string $$\Delta$$ of length $$T$$ (where $$T$$ here counts steps, and we ignore the copy of the input) such that for every $$\ell \in [T]$$, $$\Delta_\ell$$ denotes the value assigned in the $$\ell$$-th step of the computation.

Both configurations and deltas are technical ways to capture the fact that computation is a complex process that is obtained as the result of a long sequence of simple steps.

## Lecture summary

• NAND++ programs introduce the notion of loops, and allow us to capture a single algorithm that can evaluate functions of any input length.
• Running a NAND++ program for any finite number of steps corresponds to a NAND program. However, the key feature of NAND++ is that the number of iterations can depend on the input, rather than being bounded by a number fixed in advance.
• A configuration of a NAND++ program encodes the state of the program at a given point in the computation. The next step function of the program maps the current configuration to the next one.

## Exercises

Suppose that $$t$$ is the “iteration counter” of a NAND++ program, in the sense that $$t$$ is initialized to zero, and is incremented by one each time the program finishes an iteration and goes back to the first line. Prove that the value of the variable i is equal to $$t-r(r+1)$$ if $$t \leq (r+1)^2$$ and equals $$(r+2)(r+1)-t$$ otherwise, where $$r = \floor{\sqrt{t+1/4}-1/2}$$.
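Before attempting the proof, the formula can be sanity-checked numerically. The sketch below assumes the index variable follows the zigzag pattern $$0,1,0,1,2,1,0,1,2,3,\ldots$$ (one move per iteration, with the peak growing by one each round); the function names are hypothetical, and passing this check is of course not a proof.

```python
from math import isqrt


def index_by_simulation(steps):
    """Return [i_0, ..., i_{steps-1}] for the assumed zigzag movement of i."""
    values, i, peak, direction = [], 0, 1, +1
    for _ in range(steps):
        values.append(i)
        i += direction
        if i == peak:      # reached the current peak: turn around
            direction = -1
        elif i == 0:       # back at zero: the next round goes one higher
            direction = +1
            peak += 1
    return values


def index_by_formula(t):
    """Closed form from the exercise, using exact integer arithmetic."""
    r = (isqrt(4 * t + 1) - 1) // 2  # equals floor(sqrt(t + 1/4) - 1/2)
    if t <= (r + 1) ** 2:
        return t - r * (r + 1)
    return (r + 2) * (r + 1) - t


simulated = index_by_simulation(1000)
matches = all(index_by_formula(t) == simulated[t] for t in range(1000))
```

Computing $$r$$ via `isqrt` avoids floating-point rounding, using $$\sqrt{t+1/4}-1/2 = (\sqrt{4t+1}-1)/2$$.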

## Bibliographical notes

The notion of “NAND++ programs” we use is nonstandard but (as we will see) they are equivalent to standard models used in the literature. Specifically, NAND++ programs are closely related (though not identical) to oblivious one-tape Turing machines, while NAND<< programs are essentially the same as RAM machines. As we’ve seen in these lectures, in a qualitative sense these two models are also equivalent to one another, though the distinctions between them matter if one cares (as is typically the case in algorithms research) about polynomial factors in the running time.

## Further explorations

Some topics related to this lecture that might be accessible to advanced students include: (to be completed)