Introduction to Theoretical Computer Science — Boaz Barak



Quantum computing

  • Main aspects in which quantum mechanics differs from local deterministic theories.
  • Model of quantum circuits, or equivalently QNAND programs
  • Simon’s Algorithm: an example of potential exponential speedup using quantum computers that predated and inspired Shor’s factoring algorithm.

“We always have had (secret, secret, close the doors!) … a great deal of difficulty in understanding the world view that quantum mechanics represents … It has not yet become obvious to me that there’s no real problem. … Can I learn anything from asking this question about computers–about this may or may not be mystery as to what the world view of quantum mechanics is?”, Richard Feynman, 1981

“The only difference between a probabilistic classical world and the equations of the quantum world is that somehow or other it appears as if the probabilities would have to go negative”, Richard Feynman, 1981

There were two schools of natural philosophy in ancient Greece. Aristotle believed that objects have an essence that explains their behavior, and a theory of the natural world has to refer to the reasons (or “final cause” to use Aristotle’s language) as to why they exhibit certain phenomena. Democritus believed in a purely mechanistic explanation of the world. In his view, the universe was ultimately composed of elementary particles (or Atoms) and our observed phenomena arise from the interactions between these particles according to some local rules. Modern science (arguably starting with Newton) has embraced Democritus’ point of view, of a mechanistic or “clockwork” universe of particles and forces acting upon them.

While the classification of particles and forces evolved with time, to a large extent the “big picture” has not changed from Newton till Einstein. In particular it was held as an axiom that if we knew fully the current state of the universe (i.e., the particles and their properties such as location and velocity) then we could predict its future state at any point in time. In computational language, in all these theories the state of a system with \(n\) particles could be stored in an array of \(O(n)\) numbers, and predicting the evolution of the system can be done by running some efficient (e.g., \(poly(n)\) time) computation on this array.

Alas, in the beginning of the 20th century, several experimental results were calling into question the “billiard ball” theory of the world. One such experiment is the famous double slit experiment. One way to describe it is as follows. Suppose that we buy one of those baseball pitching machines, and aim it at a soft plastic wall. If we shoot baseballs at the wall, then we will dent it. Now, if we use another machine, aimed at a slightly different part of the wall, and interleave between shooting at the wall with both machines, then we will now make two dents in the wall. Obviously, we expect the level of “denting” in any particular position of the wall to only be bigger when we shoot at it with two machines than when we shoot at it with one. (See Reference:doublebaseballfig.)

In the “double baseball experiment” we shoot baseballs from two guns at a wall. There is only “constructive interference” in the sense that the dent in each position in the wall when both guns operate is the sum of the dents when each gun operates on its own.

The above is (to my knowledge) an accurate description of what happens when we shoot baseballs at a wall. However, this is not the same when we shoot photons. Amazingly, if we shoot with two “photon guns” (i.e., lasers) at a wall equipped with photon detectors, then some of the detectors will see fewer hits when the two lasers operate than when only one of them does. (Normally, rather than shooting with two lasers, people use a single laser with a barrier between the laser and the detectors that has either one or two slits open in it, hence the name “double slit experiment”; see Reference:doubleslitfig and Reference:doubleslittwofig. The variant of the experiment we describe was first performed by Pfleegor and Mandel in 1967. A nice illustrated description of the double slit experiment appears in this video.) In particular there are positions in the wall that are hit when the first gun is turned on, and when the second gun is turned on, but are not hit at all when both guns are turned on! It’s almost as if the photons from both guns are aware of each other’s existence, and behave differently when they know that in the future a photon would be shot from another gun. (Indeed, we stress that we can modulate the rate of firing so that photons are not fired at the same time, and so there is no chance of “collision”.)

This and other experiments ultimately forced scientists to accept the following picture of the world. Let us go back to the baseball experiment, and consider a particular position in the wall. Suppose that the probability that a ball shot from the first machine hits that position is \(p\), and the probability that a ball shot from the second machine hits the same position is \(q\). Then, if we shoot \(N\) balls out of each machine, we expect that this position will be hit \((p+q)N\) times. In the quantum world, for photons, we have almost the same picture, except that the probabilities can be negative. In particular, it can be the case that \(p+q=0\), in which case the position would be hit with nonzero probability when gun number one is operating, and with nonzero probability when gun number two is operating, but with zero probability when both of them are operating. If we try to “catch photons in the act” and place detectors right next to the mouth of each gun so we can see exactly the path that the photons took, then something even more bizarre happens. The mere fact that we measure the path changes the photon’s behavior, and now this “destructive interference” pattern is gone and the detector will be hit a \(p+q\) fraction of the time.

You should read the paragraphs above more than once and make sure you appreciate how truly mind boggling these results are.

The setup of the double slit experiment
Results of the double slit experiment.

Negative probabilities

What does it mean for a probability to be negative? The physicists’ answer is that it does not mean much in isolation, but it can cause interference when a positive and negative probability interact. Specifically, let’s consider the simplest system of all: one that can be in only one of two states, call them “red” and “blue”. (If you have some physics background then you can think of an electron that can be in either “spin up” or “spin down” state.) In classical probability terms, we would model the state of such a system by a pair of two non-negative numbers \(p,q\) such that \(p+q=1\). If we observe (or measure, to use quantum mechanical parlance), the color of the system, we will see that it is red with probability \(p\) and blue with probability \(q\).

In quantum mechanics, we model the state of such a system by a pair of two (potentially negative) numbers \(\alpha,\beta\) such that \(\alpha^2 + \beta^2 = 1\). If we measure the color of the system then we will see that it is red with probability \(\alpha^2\) and blue with probability \(\beta^2\). (I should warn that we are making here many simplifications. In particular in quantum mechanics the “probabilities” can actually be complex numbers, though essentially all of the power and subtleties of quantum mechanics and quantum computing arise from allowing negative real numbers, and the generalization from real to complex numbers is much less important. We will also be focusing on so called “pure” quantum states, and ignore the fact that generally the states of a quantum subsystem are mixed states that are a convex combination of pure states and can be described by a so called density matrix. This issue does not arise as much in quantum algorithms precisely because the goal is for a quantum computer to be an isolated system that can evolve to continue to be in a pure state; in real world quantum computers however there will be interference from the outside world that causes the state to become mixed and increase its so called “von Neumann entropy”. Fighting this interference and the second law of thermodynamics is much of what the challenge of building quantum computers is all about. More generally, this lecture is not meant to be a complete or accurate description of quantum mechanics, quantum information theory, or quantum computing, but rather just give a sense of the main points where it differs from classical computing.) In isolation, these negative numbers don’t matter much, since we anyway square them to obtain probabilities. But as we mention above, the interaction of positive and negative probabilities can result in surprising cancellations where somehow combining two scenarios where a system is “blue” with positive probability results in a scenario where it is never blue.

Quantum mechanics is a mathematical theory that allows us to calculate and predict the results of the double-slit and many other experiments. If you think of quantum mechanics as an explanation as to what “really” goes on in the world, it can be rather confusing. However, if you simply “shut up and calculate” then it works amazingly well at predicting the experimental results. In particular, in the double slit experiment, for any position in the wall, we can compute numbers \(\alpha\) and \(\beta\) such that photons from the first and second gun hit that position with probabilities \(\alpha^2\) and \(\beta^2\) respectively. When we activate both guns, the probability that the position will be hit is proportional to \((\alpha+\beta)^2\), and so in particular, if \(\alpha=-\beta\) then it will be the case that, despite being hit when either gun one or gun two is working, the position is not hit at all when they both work. If you haven’t seen it before, it may seem like complete nonsense and at this point I’ll have to politely point you back to the part where I said we should not question quantum mechanics but simply “shut up and calculate”.
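To make the cancellation concrete, here is a minimal Python sketch of the calculation for a single wall position. The function name `hit_probabilities` and the sample amplitudes are my own illustrative choices, and since the text only says that the two-gun probability is proportional to \((\alpha+\beta)^2\), the third value is left unnormalized.

```python
import math

def hit_probabilities(alpha, beta):
    """Return (gun 1 only, gun 2 only, both guns) for one wall position.

    alpha and beta are real amplitudes. With a single gun the hit
    probability is the square of its amplitude. With both guns the value
    returned is (alpha + beta)**2, which is only proportional to the true
    probability (the normalizing constant is omitted in this sketch).
    """
    return alpha ** 2, beta ** 2, (alpha + beta) ** 2

amp = 1 / math.sqrt(2)  # hypothetical amplitude, chosen for illustration

same_sign = hit_probabilities(amp, amp)        # amplitudes reinforce each other
opposite_sign = hit_probabilities(amp, -amp)   # amplitudes cancel

print("same sign:    ", same_sign)
print("opposite sign:", opposite_sign)
```

With opposite signs, each gun alone hits the position about half the time, yet the combined value is exactly zero. This is the "destructive interference" that has no analogue when \(p\) and \(q\) are ordinary non-negative probabilities.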

Bell’s Inequality

There is something weird about quantum mechanics. In 1935 Einstein, Podolsky and Rosen (EPR) tried to pinpoint this issue by highlighting a previously unrealized corollary of this theory. People already understood that the fact that quantum measurement collapses a state to a definite state yields the uncertainty principle, whereby if you measure a quantum system in one orthogonal basis, you will not know how it would have measured in an incoherent basis to it (such as position vs. momentum). What EPR showed was that quantum mechanics results in so called “spooky action at a distance” where if you have a system of two particles then measuring one of them would instantaneously affect the state of the other. Since this “state” is just a mathematical description, as far as I know the EPR paper was considered to be a thought experiment showing troubling aspects of quantum mechanics, without bearing on experiment. This changed when in 1964 John Bell showed an actual experiment to test the predictions of EPR and hence pit intuitive common sense against the predictions of quantum mechanics. Quantum mechanics won. Nonetheless, since the results of these experiments are so obviously wrong to anyone that has ever sat in an armchair, there are still a number of Bell denialists arguing that quantum mechanics is wrong in some way.

So, what is this Bell’s Inequality? Suppose that Alice and Bob try to convince you they have telepathic ability, and they aim to prove it via the following experiment. Alice and Bob will be in separate closed rooms. (If you are extremely paranoid about Alice and Bob communicating with one another, you can coordinate with your associate to perform the experiment exactly at the same time, and make sure that the rooms are sufficiently far apart (e.g., are on two sides of the world, or maybe even one is on the moon and another is on earth) so that Alice and Bob couldn’t communicate to each other in time the results of the coin even if they do so at the speed of light.) You will interrogate Alice and your associate will interrogate Bob. You choose a random bit \(x\in\{0,1\}\) and your associate chooses a random \(y\in\{0,1\}\). We let \(a\) be Alice’s response and \(b\) be Bob’s response. We say that Alice and Bob win this experiment if \(a \oplus b = x \wedge y\). In other words, Alice and Bob need to output two bits that disagree if \(x=y=1\) and agree otherwise.

Now if Alice and Bob are not telepathic, then they need to agree in advance on some strategy. It’s not hard for Alice and Bob to succeed with probability \(3/4\): just always output the same bit. However, by doing some case analysis, we can show that no matter what strategy they use, Alice and Bob cannot succeed with higher probability than that:

For every two functions \(f,g:\{0,1\}\rightarrow\{0,1\}\), \(\Pr_{x,y \in \{0,1\}}[ f(x) \oplus g(y) = x \wedge y] \leq 3/4\).

Since the probability is taken over all four choices of \((x,y) \in \{0,1\}^2\), it is always a multiple of \(1/4\), and so the theorem can only be violated if there exist \(f,g\) that satisfy \(f(x) \oplus g(y) = x \wedge y\) for every \((x,y) \in \{0,1\}^2\). In other words, we assume for the sake of contradiction that \(f(x) = (x \wedge y) \oplus g(y)\;(*)\) for every \((x,y) \in \{0,1\}^2\). So if \(y=0\) it must be that \(f(x)=(x \wedge 0) \oplus g(0) = 0 \oplus g(0)= g(0)\) for both \(x=0\) and \(x=1\), and in particular \(f(0)=f(1)\). On the other hand, if \(y=1\) then plugging \(x=0\) and \(x=1\) into \((*)\) implies that \(f(0)= g(1)\) and \(f(1)=1 \oplus g(1)\). Yet this implies that \(f(0) \neq f(1)\), which is a contradiction.
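Since there are only four functions from \(\{0,1\}\) to \(\{0,1\}\), the case analysis can also be checked by brute force. The following Python sketch (the helper name `win_probability` is mine) enumerates all \(16\) pairs \((f,g)\) and reports the best winning probability.

```python
from itertools import product

def win_probability(f, g):
    """Fraction of the four (x, y) pairs with f[x] XOR g[y] == x AND y.

    f and g are pairs (value at 0, value at 1) representing functions
    from {0,1} to {0,1}.
    """
    wins = sum((f[x] ^ g[y]) == (x & y) for x, y in product((0, 1), repeat=2))
    return wins / 4

# All four functions {0,1} -> {0,1}, each written as (value at 0, value at 1).
strategies = list(product((0, 1), repeat=2))

best = max(win_probability(f, g) for f in strategies for g in strategies)
print(best)  # 0.75
```

The maximum is attained, for example, by the "always output the same bit" strategy \(f=g=0\), which loses only when \(x=y=1\).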

An amazing experimentally verified fact is that quantum mechanics allows for telepathy. (More accurately, one either has to give up on a “billiard ball type” theory of the universe or believe in telepathy; believe it or not, some scientists went for the latter option.) Specifically, it has been shown that using the weirdness of quantum mechanics, there is in fact a strategy for Alice and Bob to succeed in this game with probability at least \(0.8\) (see Reference:bellstrategy).
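For readers who want to see where a number above \(3/4\) can come from, here is a hedged Python sketch that computes the winning probability of one commonly used strategy. Everything specific in it is an assumption on my part rather than something stated in this chapter: Alice and Bob share two systems in the state with amplitude \(1/\sqrt{2}\) on each of "both red" and "both blue", and each measures their own system in a basis rotated by an angle depending on the question they receive (the angles \(0, \pi/4\) for Alice and \(\pi/8, -\pi/8\) for Bob are the usual textbook choice).

```python
import math
from itertools import product

def rotated_basis(theta):
    """Orthonormal basis of R^2 rotated by theta.

    Returns (vector for outcome 0, vector for outcome 1).
    """
    return ((math.cos(theta), math.sin(theta)),
            (-math.sin(theta), math.cos(theta)))

def outcome_probability(theta_a, theta_b, a, b):
    """Probability that Alice answers a and Bob answers b.

    Assumes the shared state (|00> + |11>)/sqrt(2), with Alice measuring
    at angle theta_a and Bob at angle theta_b.
    """
    u = rotated_basis(theta_a)[a]
    v = rotated_basis(theta_b)[b]
    # Amplitude = inner product of the shared state with (u tensor v).
    amplitude = (u[0] * v[0] + u[1] * v[1]) / math.sqrt(2)
    return amplitude ** 2

# Assumed measurement angles, indexed by the question bit.
ALICE_ANGLE = {0: 0.0, 1: math.pi / 4}
BOB_ANGLE = {0: math.pi / 8, 1: -math.pi / 8}

def quantum_win_probability():
    """Average over the four questions of Pr[a XOR b == x AND y]."""
    total = 0.0
    for x, y in product((0, 1), repeat=2):
        for a, b in product((0, 1), repeat=2):
            if (a ^ b) == (x & y):
                total += outcome_probability(ALICE_ANGLE[x], BOB_ANGLE[y], a, b)
    return total / 4

print(quantum_win_probability())
```

Under these assumptions the printed value equals \(\cos^2(\pi/8)\), which is above both the classical bound of \(3/4\) and the \(0.8\) quoted above. Note that the amplitudes involved are real but some are negative, in line with the simplification made earlier in this lecture.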

Quantum weirdness

Some of the counterintuitive properties that arise from these negative probabilities include:

Again, as counter-intuitive as these concepts are, they have been experimentally confirmed, so we just have to live with them.

The discussion in this lecture is quite brief and somewhat superficial. The chapter on quantum computation in my book with Arora (see draft here) is one relatively short resource that contains essentially everything we discuss here and more. See also this blog post of Aaronson for a high level explanation of Shor’s algorithm which ends with links to several more detailed expositions, and this lecture of Aaronson for a great discussion of the feasibility of quantum computing (Aaronson’s course lecture notes and the book that they spawned are fantastic reads as well). The videos of Umesh Vazirani’s edX course are an accessible and recommended introduction to quantum computing. See the “bibliographical notes” section at the end of this lecture for more resources.

Quantum computing and computation - an executive summary.

One of the strange aspects of the quantum-mechanical picture of the world is that unlike in the billiard ball example, there is no obvious algorithm to simulate the evolution of \(n\) particles over \(t\) time periods in \(poly(n,t)\) steps. In fact, the natural way to simulate \(n\) quantum particles will require a number of steps that is exponential in \(n\). This is a huge headache for scientists that actually need to do these calculations in practice.
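To get a feel for the gap, the short Python sketch below compares the size of the two descriptions. It assumes (this is not spelled out in the paragraph above) that the "natural" simulation stores one number for every string in \(\{0,1\}^n\) when each particle is a two-state system; the function names are mine.

```python
def classical_state_size(n):
    """Numbers needed to describe n classical two-state systems: one per system."""
    return n

def quantum_state_size(n):
    """Numbers in the natural description of n two-state quantum systems.

    Assumption of this sketch: one (possibly negative) number is stored
    for every string in {0,1}^n.
    """
    return 2 ** n

for n in (10, 50, 300):
    print(n, classical_state_size(n), quantum_state_size(n))
```

Already for \(n=300\) the second count is a number with more than ninety decimal digits, so simply writing down the state is out of reach, let alone updating it at every time step.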

In 1981, physicist Richard Feynman proposed to “turn this lemon to lemonade” by making the following almost tautological observation:

If a physical system cannot be simulated by a computer in \(T\) steps, the system can be considered as performing a computation that would take more than \(T\) steps.

So, he asked whether one could design a quantum system such that its outcome \(y\) based on the initial condition \(x\) would be some function \(y=f(x)\) such that (a) we don’t know how to efficiently compute in any other way, and (b) is actually useful for something. (As its title suggests, Feynman’s lecture was actually focused on the other side of simulating physics with a computer. However, he mentioned that as a “side remark” one could wonder if it’s possible to simulate physics with a new kind of computer, a “quantum computer” which would “not [be] a Turing machine, but a machine of a different kind”. As far as I know, Feynman did not suggest that such a computer could be useful for computations completely outside the domain of quantum simulation, and in fact he found the question of whether quantum mechanics could be simulated by a classical computer to be more interesting.) In 1985, David Deutsch formally suggested the notion of a quantum Turing machine, and the model has been since refined in works of Deutsch and Jozsa and Bernstein and Vazirani. Such a system is now known as a quantum computer.

For a while these hypothetical quantum computers seemed useful for one of two things. First, to provide a general-purpose mechanism to simulate a variety of the real quantum systems that people care about. Second, as a challenge to the Extended Church Turing hypothesis which says that every physically realizable computation device can be modeled (up to polynomial overhead) by Turing machines (or equivalently, NAND++ / NAND<< programs). However, unless you care about quantum chemistry, it seemed like a challenge that might have little bearing on practice, given that this theoretical “extra power” of quantum computers seemed to offer little advantage in the majority of the problems people want to solve in areas such as combinatorial optimization, machine learning, data structures, etc.

To a significant extent, this is still true today. We have no real evidence that quantum computers, if built, will offer truly significant advantage in 99% of the applications of computing. (I am using the theorist’s definition of conflating “significant” with “super-polynomial”. Grover’s algorithm does offer a very generic quadratic advantage in computation. Whether that quadratic advantage will ever be good enough to offset in practice the significant overhead in building a quantum computer remains an open question. We also don’t have evidence that super-polynomial speedups can’t be achieved for some problems outside the Factoring/Dlog or quantum simulation domains, and there is at least one company banking on such speedups actually being feasible.) In particular, as far as we know, quantum computers will not help us solve \(\mathbf{NP}\) complete problems in polynomial or even sub-exponential time. (This “99 percent” is a figure of speech, but not completely so. It seems that for many web servers, the TLS protocol, which based on the current non-lattice based systems would be completely broken by quantum computing, is responsible for about 1 percent of the CPU usage.) However, there is one cryptography-sized exception: In 1994 Peter Shor showed that quantum computers can solve the integer factoring and discrete logarithm problems in polynomial time. This result has captured the imagination of a great many people, and completely energized research into quantum computing. This is both because the hardness of these particular problems provides the foundations for securing such a huge part of our communications (and these days, our economy), and because it was a powerful demonstration that quantum computers could turn out to be useful for problems that a priori seemed to have nothing to do with quantum physics. As we’ll discuss later, at the moment there are several intensive efforts to construct large scale quantum computers. It seems safe to say that, as far as we know, in the next five years or so there will not be a quantum computer large enough to factor, say, a \(1024\) bit number, but it is quite possible that some quantum computer will be built that is strong enough to achieve some task that is too inefficient to achieve with a non-quantum or “classical” computer (or at least requires far more resources classically than it would for this computer). When and if a computer is built that can break reasonable parameters of Diffie Hellman, RSA and elliptic curve cryptography is anybody’s guess. It could also be a “self destroying prophecy” whereby the existence of a small-scale quantum computer would cause everyone to shift away to lattice-based crypto which in turn will diminish the motivation to invest the huge resources needed to build a large scale quantum computer. (Of course, given that we’re still hearing of attacks exploiting “export grade” cryptography that was supposed to disappear in the 1990s, I imagine that we’ll still have products running 1024 bit RSA when everyone has a quantum laptop.)

Despite popular accounts of quantum computers as having variables that can take “zero and one at the same time” and therefore can “explore an exponential number of possibilities simultaneously”, their true power is much more subtle and nuanced. In particular, as far as we know, quantum computers do not enable us to efficiently solve \(\mathbf{NP}\) complete problems such as 3SAT. More precisely, it is believed that for every large enough \(n\), the restriction of 3SAT to length \(n\) cannot be solved by quantum circuits (or equivalently QNAND programs) of polynomial, or even subexponential size. However, Grover’s search algorithm does give a more modest advantage (namely, quadratic) for quantum computers over classical ones for problems in \(\mathbf{NP}\). Specifically, due to Grover’s search algorithm, we know that the \(k\)-SAT problem for \(n\) variables can be solved in time \(O(2^{n/2}poly(n))\) on a quantum computer for every \(k\). In contrast, the best known algorithms for \(k\)-SAT on a classical computer take roughly \(2^{(1-\tfrac{1}{k})n}\) steps.

The computational model

Before we talk about quantum computing, let us recall how we physically realize “vanilla” or classical computing. We model a logical bit that can equal \(0\) or a \(1\) by some physical system that can be in one of two states. For example, it might be a wire with high or low voltage, charged or uncharged capacitor, or even (as we saw) a pipe with or without a flow of water, or the presence or absence of a soldier crab. A classical system of \(n\) bits is composed of \(n\) such “basic systems”, each of which can be in either a “zero” or “one” state. We can model the state of such a system by a string \(s \in \{0,1\}^n\). If we perform an operation such as writing to the 17-th bit the NAND of the 3rd and 5th bits, this corresponds to applying a local function to \(s\) such as setting \(s_{17} = 1 - s_3\cdot s_5\).

In the probabilistic setting, we would model the state of the system by a distribution. For an individual bit, we could model it by a pair of non-negative numbers \(\alpha,\beta\) such that \(\alpha+\beta=1\), where \(\alpha\) is the probability that the bit is zero and \(\beta\) is the probability that the bit is one. For example, applying the negation (i.e., NOT) operation to this bit corresponds to mapping the pair \((\alpha,\beta)\) to \((\beta,\alpha)\) since the probability that \(NOT(\sigma)\) is equal to \(1\) is the same as the probability that \(\sigma\) is equal to \(0\). This means that we can think of the NOT function as the linear map \(N:\R^2 \rightarrow \R^2\) such that \(N(\alpha,\beta)=(\beta,\alpha)\).

If we think of the \(n\)-bit system as a whole, then since the \(n\) bits can take one of \(2^n\) possible values, we model the state of the system as a vector \(p\) of \(2^n\) probabilities \(p_0,\ldots,p_{2^n-1}\), where for every \(s\in \{0,1\}^n\), \(p_s\) denotes the probability that the system is in the state \(s\), identifying \(\{0,1\}^n\) with \([2^n]\). Applying the operation above of setting the \(17\)-th bit to the NAND of the 3rd and 5th bits corresponds to transforming the vector \(p\) to the vector \(Fp\) where \(F:\R^{2^n} \rightarrow \R^{2^n}\) is the map such that \[ F(p)_s = \begin{cases}0 & s_{17} \neq 1 - s_3\cdot s_5 \\ p_{s}+p_{s'} & \text{otherwise} \end{cases} \label{eqprobnandevolution} \] where \(s'\) is the string that agrees with \(s\) on all but the 17th coordinate.

It might not be immediate to see why \eqref{eqprobnandevolution} describes the progression of such a system, so you should pause here and make sure you understand that. Understanding the evolution of probabilistic systems is a prerequisite to understanding the evolution of quantum systems. You should also make sure that you understand why the function \(F:\R^{2^n} \rightarrow \R^{2^n}\) described in \eqref{eqprobnandevolution} is linear, in the sense that for every pair of vectors \(p,q \in \R^{2^n}\) and numbers \(a,b\in \R\), \(F(ap+bq)=aF(p)+bF(q)\).
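One way to check your understanding is to run the map on a tiny system. The following Python sketch is only an illustration under simplifying assumptions: it uses \(n=3\) bits and writes the NAND of bits 0 and 1 into bit 2 (instead of bits 3, 5 and 17), and the names `nand_step`, `uniform` and `point` are made up for this example.

```python
from itertools import product

def nand_step(p, i, j, k):
    """Return F(p): the distribution after setting bit k to NAND(bit i, bit j).

    p maps each string s (a tuple of bits) to its probability.
    """
    q = {s: 0.0 for s in p}
    for s, prob in p.items():
        t = list(s)
        t[k] = 1 - s[i] * s[j]   # overwrite bit k with the NAND value
        q[tuple(t)] += prob      # the mass of s and of its twin s' lands on the same string
    return q

n = 3
states = list(product([0, 1], repeat=n))
uniform = {s: 1 / len(states) for s in states}
point = {s: (1.0 if s == (1, 1, 1) else 0.0) for s in states}

after = nand_step(uniform, 0, 1, 2)
# Strings whose last bit is not the NAND of the first two get probability 0.
assert all(after[s] == 0.0 for s in states if s[2] != 1 - s[0] * s[1])
assert abs(sum(after.values()) - 1.0) < 1e-12

# Linearity: F(a*p + b*q) = a*F(p) + b*F(q).
a, b = 0.3, 0.7
mix = {s: a * uniform[s] + b * point[s] for s in states}
lhs = nand_step(mix, 0, 1, 2)
rhs_p, rhs_q = nand_step(uniform, 0, 1, 2), nand_step(point, 0, 1, 2)
assert all(abs(lhs[s] - (a * rhs_p[s] + b * rhs_q[s])) < 1e-12 for s in states)
```

The loop moves each string's probability to the string obtained by overwriting the target bit, which is exactly the rule \(F(p)_s = p_s + p_{s'}\) for the consistent strings and \(0\) for the others.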

If your linear algebra is a bit rusty, now would be a good time to review it, and in particular make sure you are comfortable with the notions of matrices, vectors, (orthogonal and orthonormal) bases, and norms.

Quantum probabilities

In the quantum setting, the state of an individual bit (or “qubit”, to use quantum parlance) is modeled by a pair of numbers \((\alpha,\beta)\) such that \(\alpha^2 + \beta^2 = 1\). As before, we think of \(\alpha^2\) as the probability that the bit equals \(0\) and \(\beta^2\) as the probability that the bit equals \(1\). Therefore, as before, we can model the NOT operation by the map \(N:\R^2 \rightarrow \R^2\) where \(N(\alpha,\beta)=(\beta,\alpha)\).

Following quantum tradition, we will denote the vector \((1,0)\) by \(|0\rangle\) and the vector \((0,1)\) by \(|1\rangle\) (and moreover, think of these as column vectors). So NOT is the unique linear map \(N:\R^2 \rightarrow \R^2\) that satisfies \(N(|0\rangle)=|1\rangle\) and \(N(|1\rangle)=|0\rangle\). (This is known as the Dirac “ket” notation.)

In classical computation, we typically think that there are only two operations that we can do on a single bit: keep it the same or negate it. In the quantum setting, a single bit operation corresponds to any linear map \(OP:\R^2 \rightarrow \R^2\) that is norm preserving in the sense that for every \(\alpha,\beta\) such that \(\alpha^2+\beta^2=1\), if \((\alpha',\beta')= OP(\alpha,\beta)\) then \(\alpha'^2 + \beta'^2 = 1\). Such a linear map \(OP\) corresponds to a unitary two by two matrix. (As we mentioned, quantum mechanics actually models states as vectors with complex coordinates. However, this does not make any qualitative difference to our discussion.) Keeping the bit the same corresponds to the matrix \(I = \left( \begin{smallmatrix} 1&0\\ 0&1 \end{smallmatrix} \right)\) and the NOT operation corresponds to the matrix \(N = \left( \begin{smallmatrix} 0&1\\ 1&0 \end{smallmatrix} \right)\). But there are other operations we can use as well. One such useful operation is the Hadamard operation, which corresponds to the matrix \(H = \tfrac{1}{\sqrt{2}} \left( \begin{smallmatrix} +1&+1\\ +1&-1 \end{smallmatrix} \right)\). In fact it turns out that Hadamard is all that we need to add to a classical universal basis to achieve the full power of quantum computing.
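To get a feel for these matrices, here is a small Python sketch (the helper names `apply` and `norm_sq` are assumptions made for this example) that applies \(N\) and \(H\) to the basis states, checks that the norm is preserved, and shows that applying \(H\) twice returns the original state because the amplitudes on \(|1\rangle\) cancel.

```python
import math

def apply(M, v):
    """Multiply a 2x2 matrix M by a length-2 column vector v."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def norm_sq(v):
    """Sum of squared amplitudes, which must equal 1 for a valid state."""
    return v[0] ** 2 + v[1] ** 2

r = 1 / math.sqrt(2)
I = [[1, 0], [0, 1]]
N = [[0, 1], [1, 0]]
H = [[r, r], [r, -r]]

ket0, ket1 = [1.0, 0.0], [0.0, 1.0]

plus = apply(H, ket0)    # (1/sqrt2, 1/sqrt2): measuring gives 0 or 1 with probability 1/2 each
minus = apply(H, ket1)   # (1/sqrt2, -1/sqrt2): same probabilities, but one negative amplitude
back = apply(H, plus)    # the two contributions to |1> cancel, so we return to |0>

for v in (plus, minus, back, apply(N, plus)):
    assert abs(norm_sq(v) - 1) < 1e-12
assert abs(back[0] - 1) < 1e-12 and abs(back[1]) < 1e-12
```

Note that `plus` and `minus` give identical measurement statistics and yet behave differently under a further application of \(H\); this is the sense in which amplitudes carry more information than probabilities.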

Quantum circuits and QNAND

A quantum circuit is analogous to a Boolean circuit, and can be described as a directed acyclic graph. One crucial difference is that the out degree of every vertex in a quantum circuit is at most one. This is because we cannot “reuse” quantum states without measuring them (which collapses their “probabilities”). Therefore, we cannot use the same bit as input for two different gates. (This is known as the No Cloning Theorem.) Another more technical difference is that to express our operations as unitary matrices, we will need to make sure all our gates are reversible. This is not hard to ensure. For example, in the quantum context, instead of thinking of \(NAND\) as a (non reversible) map from \(\{0,1\}^2\) to \(\{0,1\}\), we will think of it as the reversible map on three qubits that maps \(a,b,c\) to \(a,b,c\oplus NAND(a,b)\). Equivalently, the NAND operation corresponds to the unitary map \(U_{NAND}\) on \(\R^{2^3}\) such that (identifying \(\{0,1\}^3\) with \([8]\)) for every \(a,b,c \in \{0,1\}\), if \(|abc\rangle\) is the basis element with \(1\) in the \(abc\)-th coordinate and zero elsewhere, then \(U_{NAND}(|abc\rangle)=|ab(c \oplus NAND(a,b))\rangle\). (\(U_{NAND}\) is a close variant of the so called Toffoli gate and so QNAND programs correspond to quantum circuits with the Hadamard and Toffoli gates.)

Just like in the classical case, there is an equivalence between circuits and straightline programs, and so we can define the programming language QNAND that is the quantum analog of our NAND programming language. To do so, we only add a single operation: HAD(foo) which applies the single-bit operation \(H\) to the variable foo. We also use the following interpretation to make NAND reversible: foo := bar NAND blah means that we modify foo to be the XOR of its original value and the NAND of bar and blah. (In other words, apply the \(8\) by \(8\) unitary transformation \(U_{NAND}\) defined above to the three qubits corresponding to bar, blah and foo, in this order.) If foo is initialized to zero then this makes no difference.
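The matrix of \(U_{NAND}\) is small enough to write down explicitly. The Python sketch below assumes the indexing convention that \(|abc\rangle\) sits at coordinate \(4a+2b+c\) (one natural way to identify \(\{0,1\}^3\) with \([8]\)); the function names are made up for this illustration.

```python
def u_nand():
    """Return the 8x8 matrix of U_NAND, with |abc> indexed by 4a + 2b + c."""
    U = [[0] * 8 for _ in range(8)]
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                src = 4 * a + 2 * b + c
                dst = 4 * a + 2 * b + (c ^ (1 - a * b))
                U[dst][src] = 1   # column src has its single 1 in row dst
    return U

def matmul(A, B):
    """Plain product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

U = u_nand()
identity = [[int(i == j) for j in range(8)] for i in range(8)]
transpose = [list(row) for row in zip(*U)]

assert matmul(transpose, U) == identity   # U is orthogonal, i.e. a real unitary
assert matmul(U, U) == identity           # applying it twice undoes it
```

Since \(U_{NAND}\) merely permutes the eight basis states, it is a permutation matrix, which is why both checks succeed; the irreversible map \((a,b)\mapsto NAND(a,b)\) has no such matrix.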

If \(P\) is a QNAND program with \(n\) input variables, \(\ell\) workspace variables, and \(m\) output variables, then running it on the input \(x\in \{0,1\}^n\) corresponds to setting up a system with \(n+m+\ell\) qubits and performing the following process:

  1. We initialize the input variables to \(x_0,\ldots,x_{n-1}\) and all other variables to \(0\).
  2. We execute the program line by line, applying the corresponding physical operation \(H\) or \(U_{NAND}\) to the qubits that are referred to by the line.
  3. We measure the output variables \(y_0,\ldots,y_{m-1}\) and output the result.

This seems quite simple, but maintaining the qubits in a way that we can apply the operations on one hand, but we don’t accidentally measure them or corrupt them in another way, is a significant engineering challenge.

Just like NAND, QNAND is a non uniform model of computation, where we consider a different program for every input length. One can define quantum Turing machines or QNAND++ programs to capture the notion of uniform quantum computation where a single algorithm is specified for all input lengths. The quantum analog of \(\mathbf{P}\), known as \(\mathbf{BQP}\), is defined using such notions. For the sake of simplicity, we omit the discussion of those models in this lecture. However, all of the discussion in this section holds equally well for the uniform and non-uniform models.

Analyzing QNAND execution

The state of an \(n+\ell+m\)-qubit system can be modeled, as we discussed, by a vector in \(\R^{2^{n+\ell+m}}\). For an input \(x\), the initial state corresponds to the fact that we initialize the system to \(x0^{\ell+m}\) (i.e., the input variables get the values of \(x\) and all other variables get the value zero). We can think of this initial state as saying that if we measured all variables in the system then we’ll get the value \(x0^{\ell+m}\) with probability one, and hence it is modeled by the vector \(s^0 \in \R^{2^{n+\ell+m}}\) such that (identifying the coordinates with strings of length \(n+\ell+m\)) \(s^0_{x0^{\ell+m}}=1\) and \(s^0_z = 0\) for every \(z \neq x0^{\ell+m}\). (Please pause here and see that you understand what the notation above means.)

Every line in the program corresponds to applying a (rather simple) unitary map on \(\R^{2^{n+\ell+m}}\), and so if \(L_i\) is the map corresponding to the \(i\)-th line, then \(s^{i+1}=L_i(s^i)\). If the program has \(t\) lines then \(s^* = L_{t-1}(L_{t-2}(\cdots L_0(s^0)))\) is the final state of the system. At the end of the process we measure the bits, and so we get a particular Boolean assignment \(z\) for its variables with probability \((s^*_z)^2\). Since we output the bits corresponding to the output variables, for every string \(y\in \{0,1\}^m\), the output will equal \(y\) with probability \(\sum_{z\in S_y} (s^*_z)^2\) where \(S_y\) is the set of all \(z\in \{0,1\}^{n+\ell+m}\) that agree with \(y\) in the last \(m\) coordinates.
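The process just described can be simulated directly (in exponential time) by tracking the state vector. The following Python sketch is a toy illustration and not an official QNAND interpreter: the tuple encoding of program lines, the use of variable indices instead of names, and the function names `run_qnand` and `output_distribution` are all assumptions made for this example.

```python
import math
from collections import defaultdict

def run_qnand(program, num_vars, x):
    """Simulate a QNAND-style program; return the final state as {bit tuple: amplitude}.

    program: list of ("HAD", i) or ("NAND", target, a, b) over variable indices,
             where target, a and b are assumed to be distinct.
    x: input bits, placed in variables 0..len(x)-1; all other variables start at 0.
    """
    start = tuple(x) + (0,) * (num_vars - len(x))
    state = {start: 1.0}
    r = 1 / math.sqrt(2)
    for op in program:
        new = defaultdict(float)
        for z, amp in state.items():
            if op[0] == "HAD":
                i = op[1]
                for bit in (0, 1):
                    # H|0> = r(|0> + |1>) and H|1> = r(|0> - |1>)
                    sign = -1 if (z[i] == 1 and bit == 1) else 1
                    new[z[:i] + (bit,) + z[i + 1:]] += sign * r * amp
            else:
                _, t, a, b = op
                flipped = z[t] ^ (1 - z[a] * z[b])   # target XOR NAND(a, b)
                new[z[:t] + (flipped,) + z[t + 1:]] += amp
        state = dict(new)
    return state

def output_distribution(state, m):
    """Probability of each output y: sum of squared amplitudes over all z ending in y."""
    dist = defaultdict(float)
    for z, amp in state.items():
        dist[z[-m:]] += amp ** 2
    return dict(dist)

# Variables: 0 = input x_0, 1 = workspace, 2 = output y_0.
coin = [("HAD", 1), ("NAND", 2, 0, 1)]
twice = [("HAD", 1), ("HAD", 1), ("NAND", 2, 0, 1)]

dist_coin = output_distribution(run_qnand(coin, 3, (1,)), 1)
dist_twice = output_distribution(run_qnand(twice, 3, (1,)), 1)
```

On input \(x=1\), the program `coin` outputs a uniformly random bit, while `twice` outputs \(1\) with probability one: the second Hadamard makes the two paths leading to workspace value \(1\) cancel, which has no analog in a program that merely tosses two coins.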

A priori it might seem “obvious” that quantum computing is exponentially powerful, since to perform a quantum computation on \(n\) bits we need to maintain the \(2^n\) dimensional state vector and apply \(2^n\times 2^n\) matrices to it. Indeed popular descriptions of quantum computing (too) often say something along the lines that the difference between quantum and classical computers is that a classical bit can either be zero or one while a qubit can be in both states at once, and so with many qubits a quantum computer can perform exponentially many computations at once.

Depending on how you interpret it, this description is either false or would apply equally well to probabilistic computation, even though we’ve already seen that every randomized algorithm can be simulated by a similar-sized circuit, and in fact we conjecture that \(\mathbf{BPP}=\mathbf{P}\).

Moreover, this “obvious” approach for simulating a quantum computation will take not just exponential time but exponential space as well, while it is not hard to show that using a simple recursive formula one can calculate the final quantum state using polynomial space (in physics this is known as “Feynman path integrals”). So, the exponentially long vector description by itself does not imply that quantum computers are exponentially powerful. Indeed, we cannot prove that they are (i.e., as far as we know, every QNAND program could be simulated by a NAND program with polynomial overhead), but we do have some problems (integer factoring most prominently) for which they do provide exponential speedup over the currently best known classical (deterministic or probabilistic) algorithms.

Complexity classes

If \(F:\{0,1\}^n \rightarrow \{0,1\}\) is a finite function and \(s\in \N\) then we say that \(F\in QSIZE(s)\) if there exists a QNAND program \(P\) of at most \(s\) lines that computes \(F\), in the sense that for every \(x\in \{0,1\}^n\), \(\Pr[ P(x)=F(x) ] \geq 2/3\). (The number \(2/3\) is arbitrary. As in the case of \(\mathbf{BPP}\), we can amplify the success probability of a quantum algorithm to our liking.) Equivalently, \(F\in QSIZE(s)\) if there is a quantum circuit of at most \(s\) gates that computes it. (Recall that we use circuits over the basis consisting of the Hadamard gate and the “reversible NAND” or “shifted Toffoli” gate \(abc \mapsto ab(c \oplus (1-ab))\). However, using any other universal basis only changes the number of gates by a constant factor.) For an infinite function \(F:\{0,1\}^* \rightarrow \{0,1\}\), we say that \(F\in \mathbf{BQP_{/poly}}\) if there is some polynomial \(p:\N \rightarrow \N\) and a sequence \(\{ Q_n \}_{n\in \N}\) of QNAND programs such that for every \(n\in \N\), \(Q_n\) has less than \(p(n)\) lines and \(Q_n\) computes the restriction of \(F\) to inputs in \(\{0,1\}^n\). We can also define the class \(\mathbf{BQP}\) to be the uniform analog of \(\mathbf{BQP_{/poly}}\). It can be defined using QNAND++ programs, but also has the following equivalent definition: \(F:\{0,1\}^* \rightarrow \{0,1\}\) is in \(\mathbf{BQP}\) if there is a polynomial-time (classical) NAND++ program \(P\) such that for every \(n\in \N\), \(P(1^n)\) is a string representing a QNAND program \(Q_n\) such that \(Q_n\) computes the restriction of \(F\) to inputs in \(\{0,1\}^n\).

Parsing the above definitions can take a bit of time, but they are ultimately not very deep. One way to verify that you’ve understood these definitions is to see that you can prove (1) \(\mathbf{P} \subseteq \mathbf{BQP}\) and in fact the stronger statement \(\mathbf{BPP} \subseteq \mathbf{BQP}\), (2) \(\mathbf{BQP} \subseteq \mathbf{EXP}\), and (3) For every \(\mathbf{NP}\)-complete function \(F\), if \(F\in \mathbf{BQP}\) then \(\mathbf{NP} \subseteq \mathbf{BQP}\).

The relation between \(\mathbf{NP}\) and \(\mathbf{BQP}\) is not known. It is believed that they are incomparable, in the sense that \(\mathbf{NP} \nsubseteq \mathbf{BQP}\) (and in particular no \(\mathbf{NP}\)-complete function belongs to \(\mathbf{BQP}\)) but also \(\mathbf{BQP} \nsubseteq \mathbf{NP}\) (and there are some interesting candidates for such problems).

It can be shown that \(QNANDEVAL\) (evaluating a quantum circuit on an input) is computable by a polynomial size QNAND program, and moreover this program can even be generated uniformly and hence \(QNANDEVAL\) is in \(\mathbf{BQP}\). This allows us to “port” many of the results of classical computational complexity into the quantum realm as well.

Physically realizing quantum computation

To realize quantum computation one needs to create a system with \(n\) independent binary states (i.e., “qubits”), and be able to manipulate small subsets of two or three of these qubits to change their state. While by the way we defined operations above it might seem that one needs to be able to perform arbitrary unitary operations on these two or three qubits, it turns out that there are several choices for universal sets: a small constant number of gates that generate all others. The biggest challenge is how to keep the system from being measured and collapsing to a single classical combination of states. The length of time for which a system can avoid this is sometimes known as its coherence time. The threshold theorem says that there is some absolute constant level of errors \(\tau\) so that if errors are created at every gate at rate smaller than \(\tau\) then we can recover from those and perform arbitrarily long computations. (Of course there are different ways to model the errors and so there are actually several threshold theorems corresponding to various noise models.)

There have been several proposals to build quantum computers (the text below was written in early 2016 and likely needs to be updated):

These approaches are not mutually exclusive and it could be that ultimately quantum computers are built by combining all of them together. I am not at all an expert on this matter, but it seems that progress has been slow but steady and it is quite possible that we’ll see a 50 qubit computer sometime in the next 5 years.

Analysis of Bell’s Inequality

Now that we have the notation in place, we can show the strategy for showing “quantum telepathy”.

There is a 2-qubit quantum state \(s\in \R^4\) so that if Alice has access to the first qubit of \(s\), can manipulate and measure it and output \(a\in \{0,1\}\), and Bob has access to the second qubit of \(s\) and can manipulate and measure it and output \(b\in \{0,1\}\), then \(\Pr[ a \oplus b = x \wedge y ] \geq 0.8\), where \(x\) and \(y\) are the uniformly random input bits that Alice and Bob receive, respectively.

The main idea is for Alice and Bob to first prepare a 2-qubit quantum system in the state (up to normalization) \(|00\rangle+|11\rangle\) (this is known as an EPR pair). Alice takes the first qubit in this system to her room, and Bob takes the second qubit to his room. Now, when Alice receives \(x\), if \(x=0\) she does nothing and if \(x=1\) she applies the unitary map \(R_{\pi/8}\) to her qubit, where \(R_\theta = \left( \begin{smallmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{smallmatrix} \right)\) is the unitary operation corresponding to rotation in the plane with angle \(\theta\). When Bob receives \(y\), if \(y=0\) he does nothing and if \(y=1\) he applies the unitary map \(R_{-\pi/8}\) to his qubit. Then each one of them measures their qubit and sends this as their response. Recall that to win the game Bob and Alice want their outputs to be more likely to differ if \(x=y=1\) and to be more likely to agree otherwise.

If \(x=y=0\) then the state does not change and Alice and Bob always output either both \(0\) or both \(1\), and hence in both cases \(a\oplus b = x \wedge y\). If \(x=0\) and \(y=1\) then after Alice measures her bit, if she gets \(0\) then Bob’s state is equal to \(\cos (\pi/8)|0\rangle-\sin(\pi/8)|1\rangle\), which when measured will equal \(0\) with probability \(\cos^2 (\pi/8)\). The case that Alice gets \(1\), or that \(x=1\) and \(y=0\), is symmetric, and so in all the cases where \(x\neq y\) (and hence \(x \wedge y=0\)) the probability that \(a=b\) will be \(\cos^2(\pi/8) \geq 0.85\). For the case that \(x=1\) and \(y=1\), direct calculation via trigonometric identities yields that all four options for \((a,b)\) are equally likely and hence in this case \(a=b\) with probability \(0.5\). The overall probability of winning the game is at least \(\tfrac{1}{4}\cdot 1 + \tfrac{1}{2}\cdot 0.85 + \tfrac{1}{4} \cdot 0.5 =0.8\).
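The calculation above can be double-checked numerically. The Python sketch below (function names are made up for this example) writes out the four amplitudes of the EPR pair after Alice applies \(R_{\theta_a}\) and Bob applies \(R_{\theta_b}\), and averages the winning probability over the four equally likely input pairs.

```python
import math

def rotation(theta):
    """The 2x2 rotation matrix R_theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def agree_probability(theta_a, theta_b):
    """Pr[a = b] when Alice and Bob rotate their halves of the EPR pair and measure."""
    A, B = rotation(theta_a), rotation(theta_b)
    r = 1 / math.sqrt(2)
    # Start from (|00> + |11>)/sqrt(2); amp[a][b] is the amplitude of |ab> after both rotations.
    amp = [[r * (A[a][0] * B[b][0] + A[a][1] * B[b][1]) for b in (0, 1)]
           for a in (0, 1)]
    return amp[0][0] ** 2 + amp[1][1] ** 2

def win_probability():
    """Average success probability of the strategy over uniform inputs x, y."""
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            agree = agree_probability(x * math.pi / 8, -y * math.pi / 8)
            # They win when a XOR b equals x AND y: disagree if x = y = 1, agree otherwise.
            total += 0.25 * ((1 - agree) if x and y else agree)
    return total

assert win_probability() > 0.8
```

With these angles the agreement probability depends only on the difference \(\theta_a-\theta_b\) and equals \(\cos^2(\theta_a-\theta_b)\), which matches the three cases in the text.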

It is instructive to understand what it is about quantum mechanics that enabled this gain in Bell’s Inequality. For this, consider the following analogous probabilistic strategy for Alice and Bob. They agree that each one of them outputs \(0\) if he or she gets \(0\) as input and outputs \(1\) with probability \(p\) if he or she gets \(1\) as input. In this case one can see that their success probability would be \(\tfrac{1}{4}\cdot 1 + \tfrac{1}{2}(1-p)+\tfrac{1}{4}[2p(1-p)]=0.75 -0.5p^2 \leq 0.75\). The quantum strategy we described above can be thought of as a variant of the probabilistic strategy with the parameter \(p\) set to \(\sin^2 (\pi/8)\approx 0.15\). But in the case \(x=y=1\), instead of disagreeing only with probability \(2p(1-p)=1/4\), Alice and Bob can use the “negative probabilities” of the quantum world to rotate the state in opposite directions, and hence the probability of disagreement ends up being \(\sin^2 (\pi/4)=0.5\).
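For comparison, the success probability of the probabilistic strategy can be computed case by case. This short Python sketch (with the made-up name `classical_success`) confirms the closed form \(0.75-0.5p^2\) on a grid of values of \(p\).

```python
def classical_success(p):
    """Success probability when each player outputs 0 on input 0 and 1 w.p. p on input 1."""
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            pa, pb = p * x, p * y                      # Pr[a = 1] and Pr[b = 1]
            disagree = pa * (1 - pb) + (1 - pa) * pb   # outputs are independent
            total += 0.25 * (disagree if x and y else 1 - disagree)
    return total

grid = [i / 100 for i in range(101)]
assert all(abs(classical_success(p) - (0.75 - 0.5 * p * p)) < 1e-12 for p in grid)
assert max(classical_success(p) for p in grid) <= 0.75
```

The best choice is \(p=0\), which gives exactly \(0.75\), so no choice of \(p\) reaches the \(0.8\) achieved by the quantum strategy.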

Shor’s Algorithm

Bell’s Inequality is a powerful demonstration that there is something very strange going on with quantum mechanics. But could this “strangeness” be of any use to solve computational problems not directly related to quantum systems? A priori, one could guess the answer is no. In 1994 Peter Shor showed that one would be wrong:

Let \(FACT_n:\{0,1\}^n \rightarrow \{0,1\}^*\) be the function that on input a number \(M\) (represented in base two), outputs the prime factorization of \(M\). Then \(FACT_n\) can be computed by a QNAND program of \(O(n^3)\) lines.

This is an exponential improvement over the best known classical algorithms, which take roughly \(2^{\tilde{O}(n^{1/3})}\) time, where the \(\tilde{O}\) notation hides factors that are polylogarithmic in \(n\). In the rest of this section we will sketch some of the ideas behind Shor’s algorithm.

While we will define the concepts we use, some background in group or number theory might be quite helpful for fully understanding this section.

We will not use anything more than the basic properties of finite Abelian groups. Specifically we use the following notions: A finite group \(\mathbb{G}\) can be thought of as simply a set of elements and some binary operation \(\star\) on these elements (i.e., if \(g,h \in \mathbb{G}\) then \(g \star h\) is an element of \(\mathbb{G}\) as well). The operation satisfies the sort of properties that a product operation does. It is associative (i.e., \((g \star h)\star f = g \star (h \star f)\)), there is some element \(1\) such that \(g \star 1 = g\) for all \(g\), and for every \(g\in \mathbb{G}\) there exists an element \(g^{-1}\) such that \(g \star g^{-1} = 1\). Moreover, we assume the group is Abelian or commutative, which means that \(g \star h = h \star g\) for all \(g,h \in \mathbb{G}\). We denote by \(g^2\) the element \(g\star g\), by \(g^3\) the element \(g \star g \star g\), and so on and so forth. The order of \(g\in \mathbb{G}\) is the smallest positive natural number \(a\) such that \(g^a = 1\). (It can be shown that such a number exists for every \(g\) in a finite group, and moreover that it is always at most the size of the group.)

Period finding

The heart of Shor’s algorithm is the following result:

For every (efficiently presented) Abelian group \(\mathbb{G}\), there is a quantum polynomial time algorithm that given a periodic function \(f:\mathbb{G} \rightarrow \{0,1\}^*\) finds a period of \(f\).

If \(\mathbb{G}\) is an Abelian group with operation \(\star\) and \(z\in \mathbb{G}\) is not the \(1\) element, then a function \(f:\mathbb{G} \rightarrow \{0,1\}^*\) is \(z\) periodic if \(f(x\star z)=f(x)\) for every \(x\in \mathbb{G}\). The period finding algorithm of the result above is “given” the group \(\mathbb{G}\) in the form of a classical algorithm (e.g., a NAND program) that computes the function \(\star : \{0,1\}^n \times \{0,1\}^n \rightarrow \{0,1\}^n\), where \(n\) is the number of bits that are required to represent an element of the group (which is logarithmic in the size of the group itself). Similarly, it is given the function \(f\) in the form of a NAND program computing it.

From order finding to factoring and discrete log

Using the function \(f(a)=g^a\) one can use period finding (for the group \(\Z_{|\mathbb{G}|}= \{0,1,2,\ldots,|\mathbb{G}|-1\}\) with modular addition) to find the order of any element in a group \(\mathbb{G}\). We can then use order finding to both factor integers in polynomial time and solve the discrete logarithm over arbitrary Abelian groups. This shows that quantum computers will break RSA, Diffie Hellman and Elliptic Curve Cryptography, which are by far the most widely deployed public key cryptosystems today. We merely sketch how one reduces the factoring and discrete logarithm problems to order finding: (see some of the sources above for the full details)
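For the factoring direction, the standard textbook reduction can be illustrated in Python as follows. This is a sketch under loud assumptions: the `order` function below finds the order by brute force (taking exponential time) and merely stands in for the quantum period-finding subroutine, the input \(M\) is assumed to be an odd composite number that is not a prime power, and the function names are made up for this example.

```python
import math
import random

def order(a, M):
    """Order of a in the multiplicative group mod M, by brute force.

    Stand-in for quantum order finding; assumes gcd(a, M) = 1.
    """
    r, power = 1, a % M
    while power != 1:
        power = (power * a) % M
        r += 1
    return r

def find_factor(M, seed=0):
    """Return a nontrivial factor of an odd composite M that is not a prime power."""
    rng = random.Random(seed)
    while True:
        a = rng.randrange(2, M - 1)
        g = math.gcd(a, M)
        if g > 1:
            return g                   # lucky guess: a already shares a factor with M
        r = order(a, M)
        if r % 2 == 1:
            continue                   # we need an even order
        y = pow(a, r // 2, M)          # y*y = 1 (mod M), and y != 1 since r is the order
        if y == M - 1:
            continue                   # y = -1 only yields the trivial factorization
        return math.gcd(y - 1, M)      # M divides (y-1)(y+1) but divides neither factor

for M in (15, 21, 35, 91):
    d = find_factor(M)
    assert 1 < d < M and M % d == 0
```

The key point is that a square root \(y\) of \(1\) modulo \(M\) other than \(\pm 1\) immediately reveals a factor of \(M\), and for a random \(a\) the order is even and yields such a \(y\) with probability at least one half, so the loop is expected to finish after a few attempts.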

Finding periods of a function: Simon’s Algorithm

How do we find the period of a function? Let us consider the simplest case, where \(f\) is a function from \(\R\) to \(\R\) that is \(h^*\) periodic for some number \(h^*\), in the sense that \(f\) repeats itself on the intervals \([0,h^*]\), \([h^*,2h^*]\), \([2h^*,3h^*]\), etc. How do we find this number \(h^*\)? A standard technique in finding the period of a function \(f\) is to transform \(f\) from the time to the frequency domain. That is, we use the Fourier transform to represent \(f\) as a sum of wave functions. In this representation, wavelengths that divide the period \(h^*\) would get significant mass, while wavelengths that do not would “cancel out”.

If \(f\) is a periodic function then when we represent it in the Fourier transform, we expect the coefficients corresponding to wavelengths that do not evenly divide the period to be very small, as they would tend to “cancel out”.
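This cancellation is easy to observe on a discrete example. The Python sketch below is an illustration only: it replaces the function on \(\R\) by a sequence of \(N=24\) samples with period \(6\), uses an arbitrary made-up pattern of values, and computes the discrete Fourier transform by the direct quadratic-time formula rather than the FFT.

```python
import cmath

def dft(values):
    """Discrete Fourier transform of a finite sequence (direct O(N^2) formula)."""
    N = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / N)
                for t, v in enumerate(values))
            for k in range(N)]

N, period = 24, 6
pattern = [3, 1, 4, 1, 5, 9]      # arbitrary values, repeated every `period` steps
signal = [pattern[t % period] for t in range(N)]

coeffs = dft(signal)
# Frequencies whose coefficient is not (numerically) zero.
support = [k for k, c in enumerate(coeffs) if abs(c) > 1e-9]

# Only frequencies that are multiples of N / period survive.
assert all(k % (N // period) == 0 for k in support)
```

The frequency \(k\) corresponds to wavelength \(N/k\), so the surviving frequencies (the multiples of \(N/6=4\)) are exactly those whose wavelength evenly divides the period; all other coefficients vanish up to rounding error.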

Similarly, the main idea behind Shor’s algorithm is to use a tool known as the quantum Fourier transform that given a circuit computing the function \(f:\mathbb{H}\rightarrow\R\), creates a quantum state over roughly \(\log |\mathbb{H}|\) qubits (and hence dimension \(|\mathbb{H}|\)) that corresponds to the Fourier transform of \(f\). Hence when we measure this state, we get a group element \(h\) with probability proportional to the square of the corresponding Fourier coefficient. One can show that if \(f\) is \(h^*\)-periodic then we can recover \(h^*\) from this distribution.

Shor carried out this approach for the group \(\mathbb{H}=\Z^*_m\) for some \(m\), but we will show this for the group \(\mathbb{H} = \{0,1\}^n\) with the XOR operation. (For a number \(m\in \N\), the group \(\Z^*_m\) is the set \(\{ k \in [m]\;|\; gcd(k,m)=1 \}\) with the operation being multiplication modulo \(m\). It is known as the multiplicative group modulo \(m\).) This case is known as Simon’s algorithm (given by Dan Simon in 1994) and actually preceded (and inspired) Shor’s algorithm:

If \(f:\{0,1\}^n\rightarrow\{0,1\}^*\) is polynomial time computable and satisfies the property that for some \(h^* \neq 0^n\), \(f(x)=f(y)\) iff \(x\oplus y \in \{0^n, h^*\}\), then there exists a quantum polynomial-time algorithm that outputs a random \(h\in \{0,1\}^n\) such that \(\sum_{i=0}^{n-1}h_i h^*_i =0 \mod 2\).

Note that given \(O(n)\) such samples, we can recover \(h^*\) with high probability by solving the corresponding linear equations.

The idea behind the proof is that the Hadamard operation corresponds to the Fourier transform over the group \(\{0,1\}^n\) (with the XOR operation). We can use that to create a quantum state over \(n\) qubits where the probability of obtaining some value \(h\) is proportional to the square of the coefficient corresponding to \(h\) in the Fourier transform of (a real-valued function related to) \(f\). We can show that this coefficient will be zero if \(h\) is not orthogonal to the period \(h^*\) modulo \(2\), and hence when we measure this state we will obtain some \(h\) satisfying \(\sum_{i=0}^{n-1}h_ih^*_i = 0 \mod 2\).

We can express the Hadamard operation \(HAD\) as follows:

\[ HAD|a\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle+(-1)^a|1\rangle) \;.\]

Given the state \(|0^{n+m}\rangle\) we can apply this map to each one of the first \(n\) qubits to get the state \[2^{-n/2}\sum_{x\in\{0,1\}^n}|x\rangle|0^m\rangle \] and then we can apply the gates of \(f\) to map this to the state \[2^{-n/2}\sum_{x\in\{0,1\}^n}|x\rangle|f(x)\rangle \;.\]

Now suppose that we apply the \(HAD\) operation again to the first \(n\) qubits. We can see that we get the state \[2^{-n}\sum_{x\in\{0,1\}^n}\prod_{i=0}^{n-1}(|0\rangle+(-1)^{x_i}|1\rangle)|f(x)\rangle \;. \] We can use the distributive law and express a product of the form \(\varphi(x)= \prod_{i=0}^{n-1}(|0\rangle+(-1)^{x_i}|1\rangle)|f(x)\rangle\) as the sum of \(2^n\) terms, where each term corresponds to picking, for every \(i\), either \(|0\rangle\) or \((-1)^{x_i}|1\rangle\). Another way to say it is that this product \(\varphi(x)\) is equal to \[ \sum_{y\in \{0,1\}^n} \prod_{i=0}^{n-1}(-1)^{x_i y_i}|y_i\rangle |f(x)\rangle \;. \] Using the equality \((-1)^a(-1)^b = (-1)^{a+b}\) (and the fact that when raising \(-1\) to an integer, we only care if it’s odd or even) we get that \[ \varphi(x)=\sum_{y \in \{0,1\}^n} (-1)^{\sum_{i=0}^{n-1}x_iy_i \mod 2} |y\rangle |f(x) \rangle \] and therefore the overall state is equal to \[ 2^{-n}\sum_{x\in \{0,1\}^n} \varphi(x) = 2^{-n}\sum_{x\in\{0,1\}^n}\sum_{y\in\{0,1\}^n}(-1)^{{\sum_{i=0}^{n-1}x_iy_i \mod 2}}|y\rangle|f(x)\rangle \;. \] Now under our assumptions for every particular \(z\) in the image of \(f\), there exist exactly two preimages \(x\) and \(x\oplus h^*\) such that \(f(x)=f(x\oplus h^*)=z\), and so the coefficient of \(|y\rangle|z\rangle\) in the overall state is \(2^{-n}\) times the sum of the two corresponding signs. So, if \(\sum_{i=0}^{n-1}h^*_iy_i =0 \mod 2\), the two signs are equal and we get that \(|(-1)^{\sum_{i=0}^{n-1}x_iy_i \mod 2}+(-1)^{\sum_{i=0}^{n-1}(x_i+h^*_i)y_i \mod 2}|=2\) (positive interference), and otherwise the two signs are opposite and we get \(|(-1)^{\sum_{i=0}^{n-1}x_iy_i \mod 2}+(-1)^{\sum_{i=0}^{n-1}(x_i+h^*_i)y_i \mod 2}|=0\) (negative interference, i.e., cancellation). Therefore, if we measure the end state then with probability one the first \(n\) bits will be a string \(y\) such that \(\sum_{i=0}^{n-1}y_ih^*_i = 0 \mod 2\).
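The interference pattern derived above can be verified by brute force for small \(n\). The Python sketch below computes the amplitude of every \(|y\rangle|z\rangle\) directly from the final formula (so it takes time exponential in \(n\) and is in no way a quantum algorithm); the example function `f`, the hidden string `h_star` and the name `simon_distribution` are assumptions made for this illustration.

```python
from itertools import product

def simon_distribution(f, n):
    """Distribution of the first n qubits measured at the end of Simon's circuit.

    f maps n-bit tuples to hashable values. Returns {y: probability}.
    """
    inputs = list(product([0, 1], repeat=n))
    # Amplitude of |y>|z> is 2^{-n} times the sum, over x with f(x) = z, of (-1)^{<x,y>}.
    amp = {}
    for x in inputs:
        z = f(x)
        for y in inputs:
            sign = -1 if sum(a * b for a, b in zip(x, y)) % 2 else 1
            amp[(y, z)] = amp.get((y, z), 0.0) + sign / 2 ** n
    dist = {}
    for (y, _), a in amp.items():
        dist[y] = dist.get(y, 0.0) + a * a
    return dist

n = 3
h_star = (1, 0, 1)

def f(x):
    # Hypothetical example: x and x XOR h_star get the same value, and nothing else collides.
    partner = tuple(a ^ b for a, b in zip(x, h_star))
    return min(x, partner)

dist = simon_distribution(f, n)
observed = [y for y, pr in dist.items() if pr > 1e-12]

# Every string that can be measured is orthogonal to h_star modulo 2.
assert all(sum(a * b for a, b in zip(y, h_star)) % 2 == 0 for y in observed)
assert abs(sum(dist.values()) - 1) < 1e-12
```

In this example each of the \(2^{n-1}=4\) strings orthogonal to \(h^*\) is measured with probability \(1/4\), so a handful of samples suffices to pin down \(h^*\) by solving linear equations modulo \(2\).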

Simon’s algorithm seems to really use the special bit-wise structure of the group \(\{0,1\}^n\), so one could wonder if it has any relevance for the group \(\Z^*_m\) for some exponentially large \(m\). It turns out that the same insights that underlie the well known Fast Fourier Transform (FFT) algorithm can be used to essentially follow the same strategy for this group as well.

Lecture summary


Bibliographical notes

Chapters 9 and 10 in the book Quantum Computing Since Democritus give an informal but highly informative introduction to the topics of this lecture and much more. Shor’s and Simon’s algorithms are also covered in Chapter 10 of my book with Arora on computational complexity.

There are many excellent videos available online covering some of these materials. The Fourier transform is covered in these videos of Dr. Chris Geoscience, Clare Zhang and Vi Hart. More specifically to quantum computing, the videos of Umesh Vazirani on the Quantum Fourier Transform and Kelsey Houston-Edwards on Shor’s Algorithm are highly recommended.

Chapter 10 in Avi Wigderson’s book gives a high level overview of quantum computing. Andrew Childs’ lecture notes on quantum algorithms, as well as the lecture notes of Umesh Vazirani, John Preskill, and John Watrous, are also recommended.

Regarding quantum mechanics in general, this video illustrates the double slit experiment, and this Scientific American video is a nice exposition of Bell’s Theorem. This talk and panel moderated by Brian Greene discusses some of the philosophical and technical issues around quantum mechanics and its so called “measurement problem”. The Feynman lectures on the Fourier Transform and on quantum mechanics in general are very much worth reading.

The Fast Fourier Transform, used as a component in Shor’s algorithm, is one of the most useful algorithms across many application areas. The stories of its discovery by Gauss in trying to calculate asteroid orbits and rediscovery by Tukey during the cold war are fascinating as well.

Further explorations

Some topics related to this lecture that might be accessible to advanced students include: (to be completed)