\todo[inline]{Mention that transition functions / matrices need to be effectively computable by a polynomial algorithm. Otherwise, the hard computations can be outsourced to the preparation of the circuit. Consider a decision problem. The decision could be computed in advance for each input and the transition matrix just writes a designated bit: $1$ for \texttt{ACCEPT} or $0$ for \texttt{REJECT}}
\section{A Simple Computational Model}
What are Qubits? That's usually the first question addressed in any introduction to quantum computing, and for good reason. If we want to construct a new computational model, we first need to define the most basic building block: a single \emph{bit} of information. In classical computer science, the decision on how to define this smallest building block of information seems quite straightforward. We just take the most basic logical fact: either something is \emph{true} or \emph{false}, either 1 or 0. We have a name for an object holding this information: a \textbf{Bit}. Let's envision a computational model based on logical gates. Such a gate has one or more inputs and an output, each being either \emph{true} or \emph{false}. Now consider a bit $b$ and a gate $f : \{0, 1\} \to \{0, 1\}$. We have a \emph{bit} of information $b$ and can get another \emph{bit} of information $b' \coloneqq f(b)$. In a final third step, we introduce a timescale, which means that our \emph{bit} of information is now time dependent: it can have different values at different times. To keep things simple, we choose a discrete timescale. Our bit $b$ has a distinct value at each point on the timescale. The value of a bit can only be changed in between time steps, by applying a logical gate to it:
$$
\begin{matrix}
\text{Bit} & b &\stackrel{f_1}{\to} &b &\stackrel{f_2}{\to} &\cdots &\to &b &\stackrel{f_k}{\to} &b \\
\text{time} & t_0 &\to &t_1 &\to &\cdots &\to &t_{k-1} &\to &t_k \\
\end{matrix}
$$
Of course, we need more than one bit of information if we want to be able to perform meaningful computations. For this, we simply look at a list, vector, or register of bits $\mathbf{b} \in \{0,1\}^n$ and modify our gates to be functions $f: \{0,1\}^n \to \{0,1\}^n$ mapping bit vectors to bit vectors.
Let's recap: We've now designed a computational model with just three components.
\begin{itemize}
\item A notion of Information: bits and registers.
\item A way of reasoning: logical gates.
\item A dimension to do the reasoning in: the timescale.
\end{itemize}
Notice how the system described above is fully deterministic. The state $\mathbf{b}_l$ of our system at time $t_l$ is recursively defined by:
$$
\mathbf{b}_l = \begin{cases}
f_l(\mathbf{b}_{l-1}) &\text{if} \quad l > 0 \\
\mathbf{b}_0 &\text{otherwise}
\end{cases}
$$
Equivalently, it is given by the composition of all gate applications up to that point: $(f_l \circ f_{l-1} \circ \cdots \circ f_1)(\mathbf{b}_0)$. Actually, a composition of gates is also just another logical gate $F \coloneqq (f_l \circ f_{l-1} \circ \cdots \circ f_1) : \{0,1\}^n \to \{0,1\}^n$. If we are not interested in intermediate states, we can thus define our computation in the form $\mathbf{b}_{\text{out}} \coloneqq F(\mathbf{b}_{\text{in}})$, with $F: \{0,1\}^n \to \{0,1\}^n$.
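This view of a deterministic computation as nothing more than the composition of gate functions can be sketched in a few lines of Python (the gate names and the \texttt{run} helper are purely illustrative, not part of the model):
\begin{verbatim}
from functools import reduce

# A gate is a function from n-bit tuples to n-bit tuples.
def flip_first(b):          # f1: flips the first bit
    return (1 - b[0],) + b[1:]

def swap_first_two(b):      # f2: swaps the first two bits
    return (b[1], b[0]) + b[2:]

def run(gates, b_in):
    """F = f_k o ... o f_1 applied to b_in, one gate per time step."""
    return reduce(lambda b, f: f(b), gates, b_in)

b_out = run([flip_first, swap_first_two], (0, 0, 1))  # -> (0, 1, 1)
\end{verbatim}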
\section{A Bit of Randomness}
\label{sec:probabilistic_model}
\subsection{Single Bits in Superposition}
\label{sec:oneBitInSuperposition}
Many real-world problems are believed not to be efficiently solvable on fully deterministic computers like the model described above (assuming $\mathbf{P} \neq \mathbf{NP}$). Fortunately, it turns out that if we allow for some randomness in our algorithms, we are often able to efficiently find solutions to such hard problems with sufficiently large success probabilities. Oftentimes, the error probabilities can even be made exponentially small. For this reason, we also want to introduce randomness into our model. Algorithms and computational models harnessing the power of randomness are usually called \emph{probabilistic}.
Again, we start with simple one-bit systems. Later, we'll see how to expand the following methods to full bit vectors/registers. In the deterministic single-bit model above, the state transition of a bit $b$ in step $t$ is defined by $f_t(b) \in \{0,1\}$. Now, the transition function (or gate) is simply allowed to flip an unfair coin and output 0 or 1 for heads or tails respectively. Of course, the state of $b$ prior to the transition should have an effect on the computation. That is why we allow different (unfair) coins for $b = 0$ and $b = 1$. To distinguish between deterministic and probabilistic transition functions, we will denote the latter by $\ptrans(b) \in \{0,1\}$. Or to reformulate this idea: depending on the value of $b$, the output of $\ptrans(b)$ follows one of two Bernoulli trials. There are 4 possible transitions with probabilities $p_{00}$, $p_{01}$, $p_{10}$ and $p_{11}$, where $p_{ij}$ is the probability of $b$ transitioning from $i$ to $j$. Obviously, $\sum_j p_{ij} = 1$ always needs to be satisfied.
$$
\begin{aligned}
p_{00} \coloneqq P(\ptrans(b) = 0 \:|\: b = 0) \\
p_{01} \coloneqq P(\ptrans(b) = 1 \:|\: b = 0) \\
p_{10} \coloneqq P(\ptrans(b) = 0 \:|\: b = 1) \\
p_{11} \coloneqq P(\ptrans(b) = 1 \:|\: b = 1) \\
\end{aligned}
$$
Note that we regain our deterministic transition function $f$ from $\ptrans$ if we restrict the probabilities to $p_{00}, p_{10} \in \{0,1\}$. At this point, we can randomize our computation from above as follows:
$$
\begin{matrix}
\text{Bit} & b &\stackrel{\ptrans_1}{\to} &b &\stackrel{\ptrans_2}{\to} &\cdots &\to &b &\stackrel{\ptrans_k}{\to} &b \\
\text{time} & t_0 &\to &t_1 &\to &\cdots &\to &t_{k-1} &\to &t_k \\
\end{matrix}
$$
Let's have a look at the state of $b$ after the first transition. In the deterministic model, we know with certainty that at this point in time, $b$ will have the value $f_1(b)$. In a probabilistic model, we cannot predict the value of $b$ at time $t_1$ with 100\% certainty. In the terminology of probability theory, a probabilistic state transition or even the whole computation would be an \emph{experiment}, and the value of bit $b$ at time $t$ would be described by a \emph{random variable} $X_t$. Random variables are defined to take a value out of a set of predefined value options $\Omega = \{\omega_1, \dots, \omega_n\}$ with certain probabilities $p_1,\dots,p_n$ for each value. Only after we perform the experiment and \emph{observe} its outcome do we get a specific value $x_t$ of the random variable $X_t$. We say that $x_t$ is a \emph{random sample} or realization of $X_t$. If we don't want to or can't sample (perform) the experiment, we can still compute the \emph{expected value} $E(X_t) = \sum_i p_i\omega_i$ (if $\Omega$ mathematically allows for such operations).
Let's return to our example: just as in the deterministic case, we would like to predict the state of $b$ after the transition $\ptrans_t$. For this, we want to calculate the expected state of $b$ at time $t$. Let $p^t_{ij}$ be the transition probabilities of $\ptrans_t$; furthermore, $p^t_{b=x}$ denotes the probability of $b$ being in state $x$ at time $t$. Now we have:
\begin{gather}
E\parens*{\ptrans_t(b)} = p^t_{b=0} \cdot \mathbf{0} + p^t_{b=1} \cdot \mathbf{1} \label{eq:exp_state_single_bit}\\
p^t_{b=x} = \begin{cases}
p^t_{0x} \cdot p^{t-1}_{b=0} + p^t_{1x} \cdot p^{t-1}_{b=1} &\text{if}\quad t > 0 \\
p^0_{b=x} \in \{0, 1\} &\text{otherwise}
\end{cases}
\end{gather}
It is important to note that $\mathbf{0}$ and $\mathbf{1}$ in \cref{eq:exp_state_single_bit} are not the scalar values of $b$. They are abstract objects denoting the fact that $b$ is in state $0$ or $1$; as such, they are just arbitrary labels. For instance, the same states could also be labeled $\{\mathbf{T}, \mathbf{F}\}$ or $\{\top, \bot\}$. But if $\mathbf{0}$ and $\mathbf{1}$ are some kind of abstract object and not scalar values, how can \cref{eq:exp_state_single_bit} be evaluated? As of now, it can't. Later, we will define representations of these abstract states which are closed under addition and scalar multiplication, making \cref{eq:exp_state_single_bit} also (a representation of) an abstract state.
From \cref{eq:exp_state_single_bit}, we will now derive a standard form of our random bit $b$. We don't view $b$ as being either in state $\mathbf{0}$ OR $\mathbf{1}$ anymore. From now on, we think of $b$ as being in $\mathbf{0}$ AND $\mathbf{1}$ simultaneously with certain probabilities $p_{b=0}$ and $p_{b=1}$. The one-bit system $b$ is in a \emph{superposition} of two \emph{basis states} $\mathbf{0}$ and $\mathbf{1}$:
$$
b = p_0 \mathbf{0} + p_1 \mathbf{1} \quad \text{with}\:\: p_0 + p_1 = 1
$$
Until now, we have not given an explicit definition of the transition function $\ptrans$, apart from describing its effect. This is partly because we were lacking a formalism to describe uncertain states, so there was no direct way to describe the output of $\ptrans\parens{b}$. The other big problem would have been the question of how to handle an uncertain input state. Building on the superposition formalism, $\ptrans\parens*{b}$ can be defined as a linear function:
\begin{align*}
\ptrans(b) &= \ptrans\parens*{p_0 \mathbf{0} + p_1 \mathbf{1}} \\
&= p_0\ptrans(\mathbf{0}) + p_1\ptrans(\mathbf{1}) \\
&= p_0\parens*{p_{00}\mathbf{0} + p_{01}\mathbf{1}} + p_1\parens*{p_{10}\mathbf{0} + p_{11}\mathbf{1}} \\
&= \underbrace{\parens*{p_0 p_{00} + p_1 p_{10}}}_{\eqqcolon p'_0}\mathbf{0} +
\underbrace{\parens*{p_0 p_{01} + p_1 p_{11}}}_{\eqqcolon p'_1}\mathbf{1} \\
\end{align*}
A simple calculation verifies that
\begin{align*}
p'_0 + p'_1 &= \parens*{p_0 p_{00} + p_1 p_{10}} + \parens*{p_0 p_{01} + p_1 p_{11}} \\
&= p_0\underbrace{\parens*{p_{00} + p_{01}}}_{= 1} + p_1\underbrace{\parens*{p_{10} + p_{11}}}_{= 1} = p_0 + p_1 = 1
\end{align*}
and thus $\ptrans$ preserves valid superpositions, which finally makes predictions of the full computation through all steps possible. In line with the fully deterministic model, the state of $b$ at time $t$ can be described by:
\begin{equation}
\label{eq:deterministic_register_at_time_t}
\begin{aligned}
b_t &= \begin{cases}
\ptrans_t\parens*{b_{t-1}} &\text{if} \quad t > 0 \\
b_0 \in \{\mathbf{0}, \mathbf{1}\} &\text{otherwise} \\
\end{cases} \\
&= \parens*{\ptrans_t \circ \ptrans_{t-1} \circ \cdots \circ \ptrans_1}(b_0)
\end{aligned}
\end{equation}
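As a quick illustration, the recursion above can be traced numerically. The following is a minimal Python sketch (the transition probabilities are made-up example values): the superposition of $b$ is tracked as the pair $(p_{b=0}, p_{b=1})$, and each $\ptrans_t$ maps it linearly to the next pair.
\begin{verbatim}
# Track the superposition (p_0, p_1) of one bit under a probabilistic
# transition with probabilities p_ij of going from state i to state j.
def p_trans(state, p00, p11):
    p0, p1 = state
    p01, p10 = 1 - p00, 1 - p11
    return (p0 * p00 + p1 * p10,    # p'_0
            p0 * p01 + p1 * p11)    # p'_1

state = (1.0, 0.0)                          # b_0 = 0 deterministically
state = p_trans(state, p00=0.9, p11=0.8)    # -> (0.9, 0.1)
state = p_trans(state, p00=0.9, p11=0.8)    # -> (0.83, 0.17), sums to 1
\end{verbatim}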
\subsection{Collapsing Superpositions}
\label{sec:superposition}
Extending this formalism to bit registers is actually fairly straightforward: systems can be in a superposition of arbitrarily many basis states. But first, it is time to talk a bit more about the concept of superposition.
\begin{definition}[Superposition of Probabilities]
If $\mathbf{E} \coloneqq \parensc*{E_1, E_2, \dots, E_n}$ is the set of all possible outcomes of an experiment, then a superposition of probable outcomes is defined by:
\begin{equation}
E \coloneqq \sum_{i=1}^n p_i E_i \quad \text{with}\:\: p_i = P\parens*{E_i} \:\text{and}\:\: \sum_{i=1}^n p_i = 1
\end{equation}
The states (outcomes) in $\mathbf{E}$ are called basis states (outcomes).
\end{definition}
As mentioned above, a superposition cannot immediately be evaluated. It should rather be seen as a mathematical object holding incomplete knowledge about a certain property of some (stochastic) process, described by a probability distribution $(p_i)_{i=1}^n$. To actually evaluate a superposition, the missing information needs to be filled in by some kind of extra process, e.g.\ performing an experiment or measuring an observable. After this extra information is filled in, the property under consideration is fully known and the superposition \emph{collapses} to one of the actually realizable outcomes in $\mathbf{E}$. In this model, a system can be in an uncertain state which can only be made concrete by some external influence like measuring an observable. This sounds quite abstract, and especially the fact that a measurement could alter the state of a real physical system seems quite counterintuitive, but we will later see that this principle is actually grounded in reality.
Let's consider the experiment of rolling a die. Of course, for the observable \emph{number of eyes} the expected outcomes are $\mathbf{E} = \parensc{1, 2, \dots, 6}$. While the die is still in the cup and being shaken, the number of eyes cannot reasonably be determined, even if a transparent cup is used. The die is in a superposition $E = \sum_{i=1}^6 \frac{1}{6} \mathbf{i}$ of showing all numbers of eyes 1 to 6 with uniform probability $\frac{1}{6}$. In order to determine the number of eyes thrown, the die needs to rest on a solid base, such that one side is evidently showing up. So by \emph{throwing the die} we interfere with the system: we stop shaking the cup and place the die on a solid base (a table). With the die now lying on the table, it clearly shows only one number of eyes. The superposition has collapsed!
\begin{definition}[Collapse of Superposition]
A state in superposition of basis states $\mathbf{E} = \parensc*{E_1, E_2, \dots, E_n}$ can be evaluated by collapsing it onto one of its basis states. This is done by a measuring operator
\begin{equation}
M_{\mathbf{E}}\parens*{\sum_{i=1}^n p_i E_i} \coloneqq E_i \quad\: \text{with probability}\:\: p_i
\end{equation}
\end{definition}
\begin{remark}
The basis states are not unique. To see this, consider the experiment of rolling a die. If the observable is \emph{the number of eyes}, we have the basis states $\mathbf{E}_{\text{eye}} = \parensc*{\mathbf{i}}_{i=1}^6$. On the other hand, if the measurement is only supposed to distinguish between \emph{even or odd} numbers of eyes, we have $\mathbf{E}_{\text{parity}} = \parensc*{\text{even}, \text{odd}}$. The corresponding measuring operators are $M_{\mathbf{E_{\text{eye}}}}$ and $M_{\mathbf{E_{\text{parity}}}}$.
\end{remark}
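The collapse itself can be simulated by sampling from the distribution over basis outcomes. A minimal Python sketch of the two measuring operators for the die example (outcome labels chosen freely):
\begin{verbatim}
import random

# M_E: collapse a superposition sum_i p_i E_i by sampling one basis
# outcome E_i with probability p_i.
def measure(outcomes, probs):
    return random.choices(outcomes, weights=probs, k=1)[0]

eyes   = measure([1, 2, 3, 4, 5, 6], [1/6] * 6)   # M_{E_eye}
parity = measure(["even", "odd"], [1/2, 1/2])     # M_{E_parity}
\end{verbatim}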
\subsection{Bit Registers in Superposition}
Extending the probabilistic one-bit model from \cref{sec:oneBitInSuperposition} to bit registers is almost trivial given the definitions from \cref{sec:superposition}. An $n$-bit register can be in $N = 2^n$ possible states, giving rise to a superposition of $N$ basis states for probabilistic register states.
\begin{definition}
\label{def:nbitRegister}
The state of an $n$-bit register in a probabilistic computation is defined by a superposition of all possible basis states $\mathbf{B} = \parensc*{\mathbf{0}, \mathbf{1}}^n = \parensc*{\mathbf{0}, \mathbf{1}, \dots, \mathbf{N-1}}$.
\begin{equation}
\mathbf{b} \coloneqq \sum_{i=0}^{N-1} p_i \cdot \mathbf{i} \quad \:\text{with}\:\: P\parens*{\mathbf{b} = \mathbf{i}} = p_i
\end{equation}
\end{definition}
\begin{remark}
It should be noted that the number representation $\parensc*{\mathbf{i}}_{i=0}^{N-1}$ is just the bit string from $\{\mathbf{0}, \mathbf{1}\}^n$ written in base 10. So it is merely a shorter label for the state of an $n$-bit register and NOT a scalar value.
\end{remark}
Similar to \cref{sec:oneBitInSuperposition}, the transition function $\ptrans$ can be defined by its effect on basis states. For each transition, the probabilities of transitioning from basis state $\mathbf{i}$ to basis state $\mathbf{j}$ must be defined. The mapping between states in superposition is then defined linearly.
\begin{definition}
\label{def:probabilisticTransitionFunction}
Let $\mathbf{b} = \sum_{i=0}^{N-1} p_i \mathbf{i}$ be an $n$-bit register as defined in \cref{def:nbitRegister} and let $p_{\mathbf{ij}}$ be the probability of transitioning from basis state $\mathbf{i}$ to basis state $\mathbf{j}$; then the transition function is defined by:
\begin{equation}
\label{eq:ptrans_on_register}
\ptrans\parens*{\mathbf{b}} \coloneqq \sum_{i=0}^{N-1} p_i \ptrans(\mathbf{i}) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} p_i p_{ij} \mathbf{j}
\end{equation}
\end{definition}
\begin{theorem}
\label{thm:superpositionsClosedUnderProbabilisticTransition}
A transition function as defined by \cref{def:probabilisticTransitionFunction} maps superpositions to valid superpositions.
\end{theorem}
\begin{proof}
Let $\ptrans$ be a probabilistic transition function and let $\mathbf{b}$ be a register state in superposition. By \cref{def:probabilisticTransitionFunction} we get $\ptrans(\mathbf{b}) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} p_i p_{ij} \mathbf{j}$; a simple reordering leads to
$$
\ptrans(\mathbf{b}) = \sum_{j=0}^{N-1} \parens*{\sum_{i=0}^{N-1} p_i p_{ij}} \mathbf{j}
$$
Obviously, $p_i p_{ij} = P\parens{\mathbf{b} = \mathbf{i}} P\parens{\ptrans(\mathbf{b}) = \mathbf{j} \:|\: \mathbf{b} = \mathbf{i}}$. It follows directly from the law of total probability that $\sum_{j=0}^{N-1}\sum_{i=0}^{N-1} p_i p_{ij} = \sum_{j=0}^{N-1} P\parens{\ptrans(\mathbf{b}) = \mathbf{j}} = 1$.
\end{proof}
A direct consequence of \cref{thm:superpositionsClosedUnderProbabilisticTransition} is that the space of probabilistic transition functions is also closed under composition. In accordance with \cref{eq:deterministic_register_at_time_t}, the state of a register $\mathbf{b}$ in a probabilistic computation at time $t$ can be described by:
\begin{equation}\begin{aligned}
\mathbf{b}_t &= \begin{cases}
\ptrans_t\parens*{\mathbf{b}_{t-1}} &\text{if}\:\: t > 0 \\
\mathbf{b}_0 \in \parensc*{\mathbf{0}, \mathbf{1}}^n &\text{otherwise}
\end{cases} \\
&= \parens*{\ptrans_t \circ \ptrans_{t-1} \circ \cdots \circ \ptrans_1}(\mathbf{b}_0)
\end{aligned}\end{equation}
\section{Introducing: Linear Algebra}
\label{sec:stocastic_matrix_model}
The definitions of \cref{sec:probabilistic_model} fully describe a probabilistic computational model. Unfortunately, working with them can be quite cumbersome. This section introduces an algebraic apparatus based on the definitions from above, with many helpful tools to describe computations and state evolutions. As some terminology and especially the linear properties of \cref{def:probabilisticTransitionFunction} already suggest, the mathematical framework of choice will be linear algebra. Let's start by assessing the components of the model described above. We have:
\begin{itemize}
\item States (in superposition)
\item State transitions
\item Measurements (collapse of superposition)
\end{itemize}
As it turns out, all three components and their interactions can be expressed in the language of linear algebra. Readers familiar with that field of mathematics probably already noticed that $\ptrans$ is a linear function and the space of states in superposition looks a lot like a vector space.
\subsection{The State Space}
The defining property of a superposition is the probability distribution of its basis states. Given an enumeration of all basis states, the superposition is fully defined by the list of probabilities $\parens{p_0, p_1, \dots, p_{N-1}}$.
\begin{definition}[State Spaces of Probabilistic Computations]
\label{def:state_space_probabilistic_computation}
Given a state basis $\mathbf{B}$ of a $n$-bit register, the state space of probabilistic computations on this register is defined as:
\begin{equation*}
\mathbf{B}^n \coloneqq \parensc*{\mathbf{b} = \sum_{i=0}^{N-1} p_i \mathbf{i} \:\middle|\: p_i \in \R_+ \:,\: \sum_{i=0}^{N-1} p_i = 1}
\end{equation*}
\end{definition}
\begin{definition}
The coordinate map is a linear function $\Psi_{\mathbf{B}} : \mathbf{B}^{n} \to \R^N$ mapping the state space to $\R^N$:
\begin{equation*}
\forall \mathbf{b} \in \mathbf{B}^n :\quad \Psi_{\mathbf{B}}\parens{\mathbf{b}} = \parens{p_0, p_1, \dots, p_{N-1}}^T = \sum_{i=1}^N p_{i-1} \mathbf{e}_i
\end{equation*}
Often $\Psi_{\mathbf{B}}(\mathbf{b})$ is called the coordinate vector of $\mathbf{b}$ with respect to the basis $\mathbf{B}$.
\end{definition}
\begin{lemma}
\label{thm:state_space_unit_sphere_surface_isomorphism}
The state space of probabilistic computations is isomorphic to the part of the surface of the unit sphere lying in the non-negative orthant of $\R^N$.
\begin{equation*}
\mathbf{B}^n \cong \parensc*{\mathbf{v} \in \R_+^N \:\middle|\: \norm{\mathbf{v}} = 1}
\end{equation*}
\end{lemma}
\begin{proof}
For an arbitrary state $\mathbf{b}$, the coordinate vector $\Psi_{\mathbf{B}}(\mathbf{b}) = \mathbf{v}$ is the direction of a ray in the non-negative orthant of $\R^N$ starting from the origin. Rescaling $\mathbf{v}$ yields the point where this ray intersects the unit sphere, $\varphi(\mathbf{v}) = \frac{\mathbf{v}}{\norm{\mathbf{v}}} = \mathbf{v}'$, which can be inverted by $\varphi^{-1}(\mathbf{v}') = \frac{\mathbf{v}'}{\norm{\mathbf{v}'}_1} = \mathbf{v}$. Thus, $\varphi \circ \varphi^{-1} = \varphi^{-1} \circ \varphi = \text{id}$ and
$$
\Psi_{\mathbf{B}}\parens*{\mathbf{B}^n} = \parensc{\mathbf{v} \in \R_+^N \:|\: \norm{\mathbf{v}}_1 = 1}\cong \parensc{\mathbf{v} \in \R_+^N \:|\: \norm{\mathbf{v}} = 1}
$$
\end{proof}
\subsection{Transition Matrices}
It follows directly from \cref{eq:exp_state_single_bit} that $\ptrans : \spanspace\parens{\mathbf{B}} \to \spanspace\parens{\mathbf{B}}$ is a linear transformation on the space spanned by the state basis $\mathbf{B}$, and \cref{thm:superpositionsClosedUnderProbabilisticTransition} even states that $\ptrans : \mathbf{B}^n \to \mathbf{B}^n$, i.e.\ $\mathbf{B}^n$ is closed under $\ptrans$. It is well known that the space of all linear maps $\Hom_{\R}\parens{V,W}$ between two finite-dimensional real vector spaces $V$ and $W$ is isomorphic to $\R^{\parens{\dim(W), \dim(V)}}$. So, there must exist an isomorphism between the space of transition functions $\ptrans$ and $\R^{\parens{N,N}}$.
\begin{theorem}%[see \cite{Knabner}]
\label{thm:probabilistic_matrix_representation}
Let $\mathbf{B} = \parensc{\mathbf{b}_i}_{i=1}^N$ be an $n$-bit state basis and $\mathscr{B} = \parensc{\mathbf{v}_j}_{j=1}^N$ a basis of $\R^N$, then there exists a matrix $A = (a_{ij}) \in \R^{\parens{N,N}}$ such that
\begin{itemize}
\item $\forall \mathbf{b}_i \in \mathbf{B} :\quad \ptrans(\mathbf{b}_i) = \sum_{j=1}^N a_{ji} \mathbf{v}_j$
\item $\ptrans\parens*{\sum_{i=1}^N x_i \mathbf{b}_i} = \sum_{j=1}^N y_j \mathbf{v}_j \iff A\parens{x_1, x_2, \dots, x_N}^T = \parens{y_1, y_2, \dots, y_N}^T$
\end{itemize}
\end{theorem}
\begin{remark}
It is customary to choose the standard basis $\parensc{\mathbf{e}_i}_{i=1}^N$ for $\R^N$; then \cref{thm:probabilistic_matrix_representation} describes how $A$ expresses the action of $\ptrans$ on basis states in coordinate space. The $j$-th column vector $A\mathbf{e}_j = \mathbf{a}^j = \parens{a_{1j}, a_{2j}, \dots, a_{Nj}}^T$ represents the probability distribution of $\ptrans(\mathbf{b_j})$. It follows that $a_{ij} = p_{ji}$, with $p_{ji}$ being the probability of transitioning from $\mathbf{b}_j$ to $\mathbf{b}_i$. Consequently, $A = P^T$ with $P = (p_{ij})$.
\end{remark}
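A small NumPy sketch of this correspondence (the transition probabilities are arbitrary example values): the matrix acting on coordinate vectors is the transpose of the table of transition probabilities, and its columns sum to 1.
\begin{verbatim}
import numpy as np

P = np.array([[0.9, 0.1],     # from state 0: stay / flip
              [0.2, 0.8]])    # from state 1: flip / stay
A = P.T                       # acts on coordinate (column) vectors

v = np.array([1.0, 0.0])      # coordinate vector of basis state b_0
print(A @ v)                  # -> [0.9, 0.1], distribution of T(b_0)
print(A.sum(axis=0))          # -> [1., 1.], A is column-stochastic
\end{verbatim}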
\subsection{Measurements}
\label{sec:measurements_probabilistic}
The final component that still needs to be expressed in the framework of linear algebra are measurements. Let's go back and think about what measuring actually means in our case. The computational model described in \cref{sec:probabilistic_model} provides a macroscopic view of randomized computations. The result of such a randomized computation will be a random state. Usually it is only of interest whether a computation outputs a desired state given a specific input, which entails the correctness of said computation. For randomized computations, such an analysis requires the final random state distribution. Superposition states are exactly that. Asking ``How likely is it to end up in state $\mathbf{k}$?'' corresponds to measuring the coefficient of $\mathbf{k}$ in the final superposition $\mathbf{b}$. In the framework of linear algebra, this means calculating the scalar product $\parens{\mathbf{k} \:.\: \mathbf{b}}$.
\begin{definition}
\label{def:measurment_operator_probabilistic}
Let $\mathbf{b} \coloneqq \sum_{i=1}^N p_i \mathbf{b}_i \in \mathbf{B}^n$ with basis states $\parensc*{\mathbf{b}_i}_{i=1}^N$, then there exist $N$ operators $\hat{M}_k : \mathbf{B}^n \to [0,1]$
\begin{itemize}
\item in state space: $\hat{M}_k\parens{\mathbf{b}} = \parens*{\mathbf{b}_k \:.\: \mathbf{b}}= p_k$
\item in coordinate space: $M_k = \mathbf{b}_k^t \in \R^{(1, N)}, \quad M_k \mathbf{b} = \mathbf{b}_k^t \mathbf{b}$, with $\mathbf{b}^t$ being the transposed vector of $\mathbf{b} \in \R^N$
\end{itemize}
\end{definition}
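In coordinate space such a measurement is just a dot product with a basis vector, as this minimal sketch (reusing the state from the single-bit example above) shows:
\begin{verbatim}
import numpy as np

b  = np.array([0.83, 0.17])   # a 1-bit state in coordinate space
M1 = np.array([0.0, 1.0])     # row vector b_1^t
p1 = M1 @ b                   # M_1 b = p_1 = 0.17
\end{verbatim}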
\subsection{Tensor Product}
\contentsketch{motivate, combining (state) vector spaces, stochastic matrices closed under kronecker product}
\section{Making it Quantum}
\Cref{sec:stocastic_matrix_model} formulates mathematical tools to algebraically describe the abstract model of probabilistic computations defined in \cref{sec:probabilistic_model}. This section takes a reverse approach. The tools developed in \cref{sec:stocastic_matrix_model} are based on stochastic matrices, an obvious choice to model probabilistic state transitions. Unfortunately, this model has some shortcomings. This section first highlights these inconveniences and then fixes them. By doing so, the model will gain computational power, demonstrated by the implementation of Deutsch's algorithm. Finally, it will be shown that this extended model is indeed physically realizable.
\subsection{Cleaning Up}
The straightforward and rather simplistic choice of using probability coefficients in \cref{def:nbitRegister,def:state_space_probabilistic_computation} results in quite unwieldy state objects, especially in the linear algebra representation. Of course, the probability mass of a complete sample space must always sum up to 1, demanding the normalization of state vectors by the $\norm{.}_1$ norm. The state space $\mathbf{B}^n$ defined in this way is an affine combination of its basis vectors. For a 1-bit system this corresponds to the line segment from $\mathbf{0}$ to $\mathbf{1}$ (see \cref{fig:affine_comb}). As \cref{thm:state_space_unit_sphere_surface_isomorphism} already suggests, randomized computations could be viewed as rotating a ray around the origin. If computations essentially are rotations, then angles between state vectors seem somewhat important. Of course, with $\mathbf{a}, \mathbf{b} \in \mathbf{B}^n$ it would be possible to calculate the angle between both states by rescaling their dot product by their lengths, $\mathbf{a}^t \mathbf{b} \parens{\abs{\mathbf{a}} \abs{\mathbf{b}}}^{-1}$. State vectors with unit length would greatly simplify angle calculations: the dot product alone would suffice. Fortunately, \cref{thm:state_space_unit_sphere_surface_isomorphism} states that $\mathbf{B}^n$ is isomorphic to a subset of the surface of the unit sphere. Therefore, it should also be possible to represent the state space by vectors with unit length. To distinguish between both representations, we will write state vectors with coordinates on the unit sphere as $\ket{b}$. This notation is the standard notation of quantum states. For now, $\bra{b}$ will be the transposed vector of $\ket{b}$ and $\braket{b_1}{b_2} = \bra{b_1}\,\ket{b_2}$ is the dot product of $\ket{b_1}$ and $\ket{b_2}$. By definition, the length of $\ket{b} = \sum_{i=1}^N \alpha_i \ket{b_i}$ is 1. The linear coefficients $\alpha_i$ are not probabilities but so-called probability amplitudes, and the Pythagorean theorem states that $1 = \sum_{i=1}^N \alpha_i^2$. This means squaring the amplitudes or taking the square root of probabilities maps between affine combinations of basis vectors and points on the unit sphere in the state space. As it turns out, negative amplitudes must be allowed; thus this mapping is ambiguous and NOT an isomorphism. Each point on the unit sphere describes exactly one possible state of a probabilistic computation. This means moving on the $2^n$-dimensional unit sphere can be viewed as a kind of computation on a state space $\mathcal{B}_{\R}^n$.
\begin{definition}[Real State Space]
Given a mapping $\pi : \mathscr{B} \to \parensc{0,1}^n$ assigning each vector of an orthonormal basis $\mathscr{B}$ of $\R^N$, with $N = 2^n$, to a configuration of an $n$-bit register, points on the $N$-dimensional unit sphere form a computational state space:
\begin{equation*}
\mathcal{B}_{\R}^n \coloneqq \parensc*{\ket{b} = \sum_{i=1}^N a_i \mathbf{b}_i \in \R^N \:\middle|\: \norm{\ket{b}} = 1,\: \mathbf{b}_i \in \mathscr{B}}
\end{equation*}
with
\begin{equation*}
a_i^2 = P(\pi(\mathbf{b}_i))
\end{equation*}
\end{definition}
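The amplitude--probability relation described above can be checked numerically. A minimal NumPy sketch, illustrating both the unit length of amplitude vectors and the sign ambiguity that prevents the mapping from being an isomorphism:
\begin{verbatim}
import numpy as np

p  = np.array([0.25, 0.75])        # a probabilistic 1-bit state
a1 = np.sqrt(p)                    # amplitudes with signs (+, +)
a2 = a1 * np.array([1.0, -1.0])    # amplitudes with signs (+, -)

print(np.linalg.norm(a1), np.linalg.norm(a2))  # both 1.0 (unit sphere)
print(a1**2, a2**2)                # identical probabilities: ambiguous
\end{verbatim}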
\begin{figure}
\centering
\begin{subfigure}[][][c]{0.4\linewidth}
\includestandalone[width=\textwidth]{figures/affine_state_space}
\caption{todo}
\label{fig:affine_comb}
\end{subfigure}
\hfill
\begin{subfigure}[][][c]{0.5\linewidth}
\includestandalone[width=\textwidth]{figures/amplitudes_to_affine}
\caption{todo}
\label{fig:amplitudes_to_affine}
\end{subfigure}
\end{figure}
\subsection{Orthogonal Operators}
The set of operations mapping $\mathcal{B}_{\R}^n$ to $\mathcal{B}_{\R}^n$ is exactly the set of all rotations and rotoreflections, which together form the orthogonal group. Every element of the orthogonal group can be represented by an orthogonal matrix.
\begin{definition}
A matrix $A \in \R^{(n,n)}$ is orthogonal iff
\begin{equation*}
A^{-1} = A^t \quad \Leftrightarrow \quad AA^t = A^tA = \idmat
\end{equation*}
\end{definition}
\begin{remark}
It is important that orthogonal matrices form a group, because this means the composition of two orthogonal operators is orthogonal again. So, orthogonal computations can be composed of, or decomposed into, other orthogonal computations. This is extremely useful for describing and developing algorithms in this model.
\end{remark}
\begin{definition}[Orthogonal Computation]
\label{def:orthogonal_computation}
A computation on the state space $\mathcal{B}_{\R}^n$ is defined by an orthogonal matrix $A \in \R^{(N,N)}$ with $N = 2^n$.
\end{definition}
What does it mean if a matrix is orthogonal? Let $A = (a_{ij}) = (\mathbf{a}_1,\dots,\mathbf{a}_n) \in \R^{(n,n)}$ be orthogonal. Then, it follows directly from $A^tA = (b_{ij}) = \idmat$ that $b_{ij} = \mathbf{a}_i^t \mathbf{a}_j = \delta_{ij}$. Hence, the columns (and rows) of $A$ form an orthonormal basis of $\R^n$. It is also easy to check that $A$ preserves the dot product, making it angle and length preserving. Another direct consequence of $AA^t = \idmat$ is the reversibility of orthogonal computations.
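These properties are easy to verify numerically; a minimal NumPy sketch with a $2 \times 2$ rotation as the orthogonal computation:
\begin{verbatim}
import numpy as np

theta = np.pi / 3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation

v = np.array([1.0, 0.0])                    # a unit state vector
w = A @ v                                   # one computation step
print(np.allclose(A @ A.T, np.eye(2)))      # A A^t = I
print(np.isclose(np.linalg.norm(w), 1.0))   # length is preserved
print(np.allclose(A.T @ w, v))              # A^t reverses the step
\end{verbatim}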
\subsection{Measurements}
The measurement operators of \cref{sec:measurements_probabilistic} obviously also need to be adjusted to the new state space, and they suffered from some shortcomings anyway. The dot product operators of \cref{def:measurment_operator_probabilistic} completely leave the state space when applied to a state vector. But the most important reason for redesigning measurements are the newly introduced probability amplitudes. Just extracting an amplitude and squaring it would of course be a possible, but alien, solution within the linear framework developed so far, because squaring is not linear. The desired goal is to design a linear operator nicely fitting into the framework at hand.
\begin{definition}[(Real) Measurement Operators]
\label{def:measurment_operator_orthogonal_state_space}
Let $\Omega$ be the set of possible outcomes of an experiment. Then the corresponding measurement is described by a set of linear operators $\parensc{M_m}_{m \in \Omega} \subset \R^{(N,N)}$ with:
\begin{itemize}
\item $P(m) = \ev{M_m^t M_m}{b}$
\item $\sum_m P(m) = \sum_m \ev{M_m^t M_m}{b} = 1 \quad\Leftrightarrow\quad \sum_m M_m^t M_m = \idmat$
\end{itemize}
and $\ket{b} \in \mathcal{B}_{\R}^n$
\end{definition}
\subsubsection{(Real) Projective Measurements}
One of the most important special cases of measurement operators are projective measurements. As the name already suggests, projective measurements are linear projections onto subspaces of $\mathcal{B}_{\R}^n$.
\todo[inline]{content}
\subsection{Interference - Computational Power}
So far, moving computations on affine combinations to points on the unit sphere had merely subjective and rather esoteric reasons, namely to \emph{clean up} an abstract description of a computational model. In short: it's mathematically nicer to move around on the unit sphere. This section shows that, by utilizing the power of probability amplitudes, one actually gains computational power compared to the previous model.
\subsubsection{Reversing a Coin Flip}
%\begin{figure}
% \centering
% \includestandalone{figures/coin_flip_probabilistic}
%\end{figure}
In the classical probabilistic model a coin flip destroys any information stored in a bit, even in superposition states. It is easy to verify that $P_{\sfrac{1}{2}} = \frac{1}{2} \big(\begin{smallmatrix}1 & 1 \\ 1 & 1\end{smallmatrix}\big)$ indeed implements a 1-bit coin flip and satisfies the conditions of \cref{thm:probabilistic_matrix_representation}. Two consecutive coin flips are independent, which is illustrated by $P_{\sfrac{1}{2}} P_{\sfrac{1}{2}} = P_{\sfrac{1}{2}}$. The coin flip applied to an arbitrary superposition $\mathbf{b} = p_0 \mathbf{b}_0 + p_1 \mathbf{b}_1$ yields:
$$
P_{\sfrac{1}{2}} \mathbf{b} = p_0\frac{1}{2}(\mathbf{b}_0 + \mathbf{b}_1) + p_1 \frac{1}{2}(\mathbf{b}_0 + \mathbf{b}_1)
= (p_0 + p_1) \frac{1}{2}(\mathbf{b}_0 + \mathbf{b}_1) = \frac{1}{2}(\mathbf{b}_0 + \mathbf{b}_1)
$$
Given any input state, $P_{\sfrac{1}{2}}$ reaches a fixed point after one application. With $P_{\sfrac{1}{2}}$ not being orthogonal, a different operator is needed for the orthogonal model: the \emph{Hadamard} operator.
\begin{definition}[Hadamard Operator]
\label{def:hadamard_gate}
\begin{equation}
H = \frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 1 \\
1 & -1 \\
\end{pmatrix}
\end{equation}
\end{definition}
The Hadamard operator is basically the same as $P_{\sfrac{1}{2}}$, but the global factor is adjusted for probability amplitudes and the matrix columns are made orthogonal by introducing a negative phase in the second column. Exactly this negative phase will have some surprisingly useful effects. It is easy to see that
$$
P\parens{H\ket{b} = \ket{0}} = P\parens{H\ket{b} = \ket{1}} = \bra{b}H^\dag M_{\ket{0}} H\ket{b} = \bra{b}H^\dag M_{\ket{1}} H\ket{b} = \frac{1}{2}
$$
for $\ket{b} \in \parensc{\ket{0}, \ket{1}}$, $M_{\ket{0}} = \ketbra{0}$ and $M_{\ket{1}} = \ketbra{1}$. So, $H$ does indeed implement a fair coin flip. But contrary to $P_{\sfrac{1}{2}}$ the Hadamard operator does not destroy the information stored in a superposition.
\begin{equation}
\label{eq:hadamard_on_superposition}
\begin{aligned}
H\parens*{a_0 \ket{0} + a_1 \ket{1}} &= \frac{a_0}{\sqrt{2}}\parens{\ket{0} + \ket{1}} + \frac{a_1}{\sqrt{2}}\parens{\ket{0} - \ket{1}} \\
&= \frac{1}{\sqrt{2}}\parens{(a_0 + a_1)\ket{0} + (a_0 - a_1)\ket{1}}
\end{aligned}
\end{equation}
Actually, applying $H$ a second time completely reverses the computation. This is the result of $HH = \idmat$ and might seem strange at first, but it is in fact a trivial consequence of orthogonal operators representing rotations and rotoreflections, both of which are obviously reversible. Thus, the first $H$ cannot have destroyed any information. It is interesting to look into how something like that happens. After the second application of $H$ the state is:
\begin{multline}
\label{eq:2_hadamards_on_superposition}
\frac{1}{2}\parens{(a_0 + a_1)(\ket{0} + \ket{1}) + (a_0 - a_1)(\ket{0} - \ket{1})} \\
= \frac{1}{2}\parens{(a_0 + a_0 + a_1 - a_1)\ket{0} + (a_0 - a_0 + a_1 + a_1)\ket{1}} =
a_0 \ket{0} + a_1 \ket{1}
\end{multline}
The key observation can already be made in \cref{eq:hadamard_on_superposition}: probability amplitudes can destructively interfere with each other. In \cref{eq:hadamard_on_superposition} this can be seen in the term $(a_0 - a_1)$, and in \cref{eq:2_hadamards_on_superposition} the amplitudes cancel each other out just perfectly to restore the original input state. It can't be mentioned enough: probability amplitudes are not probabilities. Destructive interference is not possible with the stochastic matrices from \cref{sec:stocastic_matrix_model}, as all their entries are non-negative. The next section shows how interference effects can be utilized effectively to outperform any probabilistic computation.
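The coin-flip comparison can be reproduced in a few lines of NumPy; a minimal sketch:
\begin{verbatim}
import numpy as np

H = np.array([[1,  1],
              [1, -1]]) / np.sqrt(2)

psi = np.array([0.6, 0.8])            # amplitudes: P(0)=0.36, P(1)=0.64
print((H @ np.array([1.0, 0.0]))**2)  # -> [0.5, 0.5]: a fair coin flip
print(H @ H @ psi)                    # -> [0.6, 0.8]: HH = I, reversed
\end{verbatim}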
\subsubsection{Deutsch's Algorithm}
\label{sec:deutschs_algorithm}
Given a function $f : \parensc{0,1} \to \parensc{0,1}$, the problem at hand is to determine whether $f(0) \stackrel{?}{=} f(1)$. Obviously, deterministic and even probabilistic computations need to evaluate $f$ twice, once for each input, in order to answer this question. Surprisingly, orthogonal computations only need one call to $f$. But how is that possible?
First, function evaluation needs to be addressed in the context of orthogonal computations. The requirement of orthogonality forces all computations to be reversible. But what if $f$ is not injective, e.g.\ $f(0) = f(1)$? Well, a simple trick solves this dilemma: instead of evaluating $f$ directly, $f$ will be wrapped in an orthogonal operator $O_f$. Note that the $\XOR$ operator:
\begin{center}
\begin{tabular}{c c}
\begin{tikzcd}
\lstick{$\ket{x}$} & \ctrl{1} & \qw \rstick{$\ket{x}$} \\
\lstick{$\ket{y}$} & \targ{} & \qw \rstick{$\ket{y \oplus x}$}
\end{tikzcd} &
$
\hat{\XOR} = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}
$ \\
\end{tabular}
\end{center}
is reversible, as $\hat{\XOR} \ket{x, y \oplus x} = \ket{x, y \oplus x \oplus x} = \ket{x,y}$. From the matrix form it is apparent that $\hat{\XOR}$ is even orthogonal. Similarly, it follows that $O_f \ket{x,y} \coloneqq \ket{x, y \oplus f(x)}$ is reversible. A closer look reveals that $O_f$ just permutes the basis states depending on $f$, and again it is easily verifiable that $O_f O_f = \idmat$, making $O_f$ orthogonal.
\begin{center}
\begin{tabular}{c c}
\begin{tikzcd}
\lstick{$\ket{x}$} & \gate[wires=2]{O_f} & \qw \rstick{$\ket{x}$} \\
\lstick{$\ket{y}$} & & \qw \rstick{$\ket{y \oplus f(x)}$}
\end{tikzcd}
&
$
O_f = \begin{pmatrix}
1-f(0) & f(0) & 0 & 0 \\
f(0) & 1-f(0) & 0 & 0 \\
0 & 0 & 1-f(1) & f(1) \\
0 & 0 & f(1) & 1-f(1) \\
\end{pmatrix}
$
\end{tabular}
\end{center}
So it is indeed possible to evaluate a function $f : \parensc{0, 1} \to \parensc{0, 1}$ in the orthogonal computational model. If the second qubit $\ket{y}$ is initialized as $\ket{0}$, measuring that qubit in the computational basis after the application of $O_f$ returns the value of $f(x)$. The trick that makes it possible to answer $f(0) \stackrel{?}{=} f(1)$ with one call to $O_f$ is to initialize both qubits in a perfect superposition of $\ket{0}$ and $\ket{1}$. Then, both values $f(0)$ and $f(1)$ enter the superposition and can interfere.
\begin{center}
\begin{quantikz}
\lstick{$\ket{0}$} & \gate{H}\slice{$\ket{z_1}$} & \gate[2]{O_f}\slice{$\ket{z_2}$} & \gate{H} & \meter{}\\
\lstick{$\ket{1}$} & \gate{H} & & \qw & \qw
\end{quantikz}
\end{center}
So, now it is time to examine this circuit in detail to understand its inner workings. After the first Hadamard gates the system is in state
\begin{align*}
\ket{z_1} &= \frac{1}{2} \parens{(\ket{0} + \ket{1}) \otimes (\ket{0} - \ket{1})} \\
&= \frac{1}{2} \parens{\ket{00} - \ket{01} + \ket{10} - \ket{11}}
\end{align*}
applying $O_f$ changes the system state to
$$
\begin{array}{c c c}
O_f\ket{z_1} &= \frac{1}{2}\parens*{\begin{aligned}
&(1 - f(0) - f(0))\ket{00} + \\
&(f(0) - 1 + f(0))\ket{01} + \\
&(1 - f(1) - f(1))\ket{10} + \\
&(f(1) - 1 + f(1))\ket{11}
\end{aligned}}
&= \frac{1}{2}\parens*{\begin{aligned}
&(-1)^{f(0)}\ket{00} - \\
&(-1)^{f(0)}\ket{01} + \\
&(-1)^{f(1)}\ket{10} - \\
&(-1)^{f(1)}\ket{11} \\
\end{aligned}} \\
&\multicolumn{2}{l}{=\frac{1}{2}\parens*{\parens*{(-1)^{f(0)}\ket{0} + (-1)^{f(1)}\ket{1}} \otimes
\parens*{\ket{0} - \ket{1}}}} \\
&\multicolumn{2}{l}{=\ket{z_2}}
\end{array}
$$
That is a lot to digest, but what happened is: both qubits interfered with each other, and although $f$ was applied to the second qubit, the effect of this operation moved through the amplitudes to the first qubit. From now on, the second qubit is not important anymore, so only the first one will be considered. Viewed as an isolated system, the first qubit is in state $\sfrac{1}{\sqrt{2}}\parens{(-1)^{f(0)}\ket{0} + (-1)^{f(1)}\ket{1}}$. Applying a Hadamard gate results in
\begin{align*}
&H\frac{1}{\sqrt{2}}\parens*{(-1)^{f(0)}\ket{0} + (-1)^{f(1)}\ket{1}} = \\
&\frac{1}{2}\parens*{(-1)^{f(0)}(\ket{0} + \ket{1}) + (-1)^{f(1)}(\ket{0} - \ket{1})} = \\
&\frac{1}{2}\parens*{(-1)^{f(0)}\ket{0} + (-1)^{f(0)}\ket{1} +(-1)^{f(1)}\ket{0} - (-1)^{f(1)}\ket{1}} = \\
&\frac{1}{2}\parens*{\parens*{(-1)^{f(0)} + (-1)^{f(1)}}\ket{0} + \parens*{(-1)^{f(0)} - (-1)^{f(1)}}\ket{1}} = \\
&\frac{1}{2}(-1)^{f(0)}\parens*{\parens*{1 + (-1)^{f(0) \oplus f(1)}}\ket{0} + \parens*{1 - (-1)^{f(0) \oplus f(1)}}\ket{1}}
\end{align*}
Now measuring the first qubit returns $\ket{0}$ if $f(0) = f(1)$, and $\ket{1}$ otherwise.
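The complete circuit is small enough to simulate directly. The following is a minimal NumPy sketch (the helper names are my own, not from the formal model); it builds $O_f$ as the permutation matrix given above and runs the algorithm for an arbitrary $f$:
\begin{verbatim}
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def O_f(f):
    """Permutation matrix |x,y> -> |x, y XOR f(x)>, basis index 2x+y."""
    M = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            M[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return M

def deutsch(f):
    state = np.kron([1.0, 0.0], [0.0, 1.0])   # |0,1>
    state = np.kron(H, H) @ state             # |z_1>
    state = O_f(f) @ state                    # |z_2>
    state = np.kron(H, I2) @ state            # final H on first qubit
    p0 = state[0]**2 + state[1]**2            # P(first qubit yields 0)
    return "constant" if np.isclose(p0, 1.0) else "balanced"

print(deutsch(lambda x: 0))       # -> constant  (f(0) = f(1))
print(deutsch(lambda x: 1 - x))   # -> balanced  (f(0) != f(1))
\end{verbatim}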
Apparently, interference of amplitudes is truly a remarkable feature. There is a key difference between probabilistic and orthogonal computations. In the probabilistic model, the concept of superposition merely models the lack of knowledge about the true state of the system. If a system is in a probabilistic superposition of two states, the true state is just unknown, but the system is only in one of the two states at any time. The interference effects in orthogonal computations paint a different picture. In order to interfere with each other, the two states in an orthogonal superposition must somehow be present at the same time. The system has to be in both states, resulting in a kind of inherent parallelism built into the model! In Deutsch's algorithm, the application of $O_f$ to a superposition does in fact not equal one call to $f$, but evaluates $f$ in parallel for both inputs.
\todo[inline]{rename qubits as the term is not yet defined}
\subsection{Entering the Real World: Quantum Computing}
The best theoretical model is of no value if it relies on some kind of magic that cannot be realized in the real world. Luckily, if the real probability amplitudes of the orthogonal model are replaced by complex numbers, one ends up with a computational model which can be perfectly described by quantum mechanics, thus making it a real-world physical system, fittingly called \emph{quantum computing}. Of course, real probability amplitudes are a subset of complex probability amplitudes, and likewise orthogonal computations are a subset of quantum computations. Only minor adjustments need to be made to the mathematical framework to handle complex amplitudes. The fundamental unit of information is the \emph{qubit}; this special terminology is intended to highlight the special properties of quantum superpositions. Also, oftentimes Greek letters are used in the quantum case.
\begin{definition}[Qubit]
\label{def:qubit}
A qubit is a quantum superposition with unit length of two orthonormal basis vectors $\ket{0}$ and $\ket{1}$:
\begin{equation}
\ket{\psi} = \alpha \ket{0} + \beta \ket{1}
\end{equation}
With the transposed complex conjugate of $\ket{\psi}$ denoted as:
\begin{equation}
\parens*{\ket{\psi}^*}^t = \ket{\psi}^\dagger = \bra{\psi} = \bra{0} \alpha^* + \bra{1} \beta^*
\end{equation}
Furthermore, every qubit satisfies the normalization criterion:
\begin{equation}
\braket{\psi} = \alpha^*\alpha + \beta^*\beta = \abs{\alpha}^2 + \abs{\beta}^2 = 1
\end{equation}
\end{definition}
\begin{remark}
In accordance with the standard in quantum computing literature, the computational basis states $\ket{0}$ and $\ket{1}$ were chosen in \cref{def:qubit}. However, similarly to orthogonal matrices, unitary matrices (their complex extension) also map any complex orthonormal basis to another complex orthonormal basis of the same dimension. Also, every bijective mapping between two complex orthonormal bases is a unitary operation. So, any qubit can be transformed to another basis by simply applying a unitary transformation to it, making the basis choice of a qubit irrelevant.
\end{remark}
\begin{definition}[Computational Basis]
\label{def:computational_basis}
The standard basis of quantum computing is the computational basis. The $2^n$ basis values of an $n$-bit register are mapped to an orthonormal basis $\mathcal{B} = \parensc*{\ket{i}}_{i=0}^{2^n - 1}$, with $i$ being the decimal value of the $n$-bit register. In coordinate space, $\ket{i}$ is represented by $e_i \in \C^{2^n}$:
\begin{equation}
\ket{i} \cong e_i = \parens{\underbrace{0,\dots,0}_{i},1,0,\dots,0}^t
\end{equation}
\end{definition}
\begin{definition}[Quantum State Space]
\label{def:quantum_state_space}
Spanning a complex vector space over an arbitrary set of $2^n$ orthonormal basis vectors, we get the state space of an $n$-qubit system. The standard basis of choice is the computational basis (\cref{def:computational_basis}), giving the standard $n$-qubit state space $\mathcal{B}^n$
\begin{equation*}
\mathcal{B}^n = \parensc*{\ket{\psi} = \sum_{i=0}^{2^n -1} \alpha_i \ket{i} \:\middle|\: \alpha_i \in \C ,\: \braket{\psi} = 1}
\end{equation*}
\end{definition}
\begin{definition}[Unitary Operator]
\label{def:unitary_operator}
Unitary operators and matrices are the complex extensions of orthogonal operators and matrices. An operator $\Phi$ is unitary if its matrix representation $U_\Phi$ is a unitary matrix, which is the case iff
\begin{equation*}
U_\Phi^{-1} = U_\Phi^\dagger \quad \Leftrightarrow \quad U_\Phi U_\Phi^\dagger = U_\Phi^\dagger U_\Phi = \idmat
\end{equation*}
with $U_\Phi^\dagger$ being the transposed complex conjugate of $U_\Phi$, defined as
\begin{equation*}
\forall n \in \N,\: A = (a_{ij}) \in \C^{(n,n)} \::\quad A^\dagger = (a_{ji}^*)
\end{equation*}
\end{definition}
\begin{definition}[Unitary Computation]
\label{def:unitary_computation}
In accordance with \cref{def:orthogonal_computation}, a computation on the state space $\mathcal{B}^n$ is defined by a unitary matrix $U \in \C^{(2^n, 2^n)}$.
\end{definition}
\begin{definition}[Measurement Operators]
\label{def:measurment_quantum}
The definition of measurements on quantum state spaces follows the definition of measurements on orthogonal state spaces (\cref{def:measurment_operator_orthogonal_state_space}), but with states $\ket{\psi} \in \mathcal{B}^n$ and complex operators $\parensc*{M_m}_{m \in \Omega} \subset \C^{(N,N)}$, where $N = 2^n$.
\end{definition}
This wraps up the framework of quantum computing. Fortunately, \cref{def:quantum_state_space,def:unitary_operator,def:measurment_quantum} correspond to the fundamental postulates of quantum mechanics, meaning that quantum computing with all its seemingly strange properties is in fact a physically realizable computational model!
\subsection{Quantum Circuits}
\label{sec:quantum_circuits}
In theory, the framework introduced above should be enough to do quantum computation. All that needs to be done in order to develop a new algorithm is to come up with its unitary matrix. At second glance, however, one quickly notices the impracticality of this approach. It implies finding a $2^n \times 2^n$ complex matrix with all required and desired properties needed to solve an $n$-qubit problem. The standard practice in quantum algorithm design is to formulate the computation as a quantum circuit. The classical pendant to this is the relationship between boolean functions $f : \parensc{0,1}^n \to \parensc{0,1}^m$ and boolean circuits, where $f$ is realized by a circuit of cascading primitive boolean gates. A similar situation arises in quantum computing, already illustrated by the description of Deutsch's algorithm in \cref{sec:deutschs_algorithm}. There, the complete algorithm is composed of just Hadamard gates, the function application gate $O_f$ and one measurement. This makes it much easier to understand than one big unitary matrix. A quantum circuit simply allows thought processes of the likes of: what operation is performed when on which qubit? This is much more natural to the kind of algorithmic thinking people are usually used to.
A quantum gate is just a unitary operator on a select set of qubits. Because they are unitary, quantum gates always have the same number of input and output qubits, unlike their boolean counterparts. Conceptual wires are used to connect the right qubits to the inputs and outputs of each gate. Let $U$ be a $k$-qubit gate in an $n$-qubit system, with $k < n$. Then $U$ can be extended to an $n$-qubit gate by calculating the tensor product of $U$ and an $(n-k)$-qubit identity matrix.
\begin{definition}
\label{def:gate_nbit_extension}
Let $\ket{\mathbf{x}} = \ket{x_1, \dots, x_n}$ be the $n$ qubits of a quantum system and let $U$ be a $k$-qubit quantum gate, with $k < n$. The $k$-tuple $I = (i_1, \dots, i_k)$ with $i_l, i_g \in [1,n]$ and $i_l \neq i_g$ for all $l \neq g$ maps $k$ qubits from $\ket{\mathbf{x}}$ to the inputs of $U$. Then,
\begin{equation*}
G_n(U,I) \coloneqq V_{\pi_I}^\dagger (U \otimes \idmat_{n-k}) V_{\pi_I}
\end{equation*}
is the $n$-qubit extension of $U$, with $V_{\pi_I}$ being a permutation matrix that satisfies
\begin{equation*}
V_{\pi_I}\ket{x_1,\dots,x_n} = \ket{x_{i_1},\dots,x_{i_k},\dots}
\end{equation*}
and $\idmat_{n-k}$ is the identity operator on the remaining qubits.
\end{definition}
\begin{remark}
Two gates $U_1 \in \C^{2^{k_1}}$ and $U_2 \in \C^{2^{k_2}}$ acting on $I_1 = (i_1^1,\dots,i_{k_1}^1)$ and $I_2 = (i_1^2,\dots,i_{k_2}^2)$ respectively can be computed in parallel iff $I_1 \cap I_2 = \emptyset$ and thus
\begin{equation*}
G_n(U_1,I_1) G_n(U_2,I_2) = G_n(U_1 \otimes U_2, I_1I_2) = V_{\pi_{I_1 I_2}}^\dagger(U_1 \otimes U_2 \otimes \idmat_{n - (k_1 + k_2)}) V_{\pi_{I_1 I_2}}
\end{equation*}
with $I_1I_2$ being the concatenation of $I_1$ and $I_2$ and
\begin{equation*}
V_{\pi_{I_1 I_2}}\ket{x_1,\dots,x_n} = \bigotimes_{i^1 \in I_1}\ket{x_{i^1}} \bigotimes_{i^2 \in I_2}\ket{x_{i^2}} \bigotimes_{i \in [1,n] \setminus (I_1 \cup I_2)} \ket{x_i}
\end{equation*}
\end{remark}
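Under the convention fixed above (with $V_{\pi_I}$ moving the selected qubits to the front), the extension can be written down directly. A minimal NumPy sketch (helper names are illustrative):
\begin{verbatim}
import numpy as np

def permutation_matrix(n, I):
    """V with V|x_1..x_n> = |x_{i_1},..,x_{i_k}, remaining in order>."""
    order = list(I) + [q for q in range(1, n + 1) if q not in I]
    V = np.zeros((2**n, 2**n))
    for b in range(2**n):
        bits = [(b >> (n - q)) & 1 for q in range(1, n + 1)]
        new = [bits[q - 1] for q in order]
        V[int("".join(map(str, new)), 2), b] = 1
    return V

def extend(U, I, n):
    """G_n(U, I) = V^dagger (U tensor Id) V for a k-qubit gate U."""
    k = len(I)
    V = permutation_matrix(n, I)
    return V.T @ np.kron(U, np.eye(2**(n - k))) @ V

# Example: the XOR gate from above acting on qubits (3, 1) of 3 qubits.
XOR = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]])
G = extend(XOR, (3, 1), 3)
\end{verbatim}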
\begin{definition}
An $n$-qubit quantum circuit is formally described by a sequence of gates and input mappings $C = (U_l, I_l)_{l=1}^m$ with $n = \abs{\bigcup_{l=1}^m I_l}$. Using \cref{def:gate_nbit_extension}, the circuit description corresponds to a unitary matrix according to \cref{def:unitary_operator}:
\begin{equation}
\label{eq:unitary_matrix_of_circuit}
U_C = G_n(U_m,I_m) \cdots G_n(U_1,I_1)
\end{equation}
The circuit $C$ is said to be maximally parallelized if all neighboring gate extensions $G_n(U_l,I_l)$ and $G_n(U_{l+1}, I_{l+1})$ with $I_l \cap I_{l+1} = \emptyset$ are reduced to $G_n(U_l \otimes U_{l+1}, I_l I_{l+1})$.
\end{definition}
\begin{remark}
If a quantum circuit $C = (U_l,I_l)_{l=1}^m$ is used in combination with bra-ket notation, sometimes $C \ket{\psi}$ is written instead of $U_C \ket{\psi}$ out of convenience, with the latter being the formally correct application of $C$'s unitary matrix representation $U_C$ according to \cref{eq:unitary_matrix_of_circuit}.
\end{remark}
\begin{definition}
\label{def:circuit_size_depth}
Given a quantum circuit $C=(U_l,I_l)_{l=1}^m$ then
\begin{itemize}
\item the size of $C$ is the total number of gates $m$
\item the depth of $C$ is the number of unitary operators $G_n$ in the maximally parallelized form of $C$
\end{itemize}
\end{definition}
In \cref{sec:deutschs_algorithm}, Deutsch's algorithm was already illustrated using an informal notion of quantum circuits. Let $U_D$ be the unitary operator and $C_D$ the corresponding circuit description of Deutsch's algorithm before measuring the first qubit. The circuit $C_D$ has a size of 4 and a depth of 3.
\begin{equation*}
\begin{aligned}
C_D &= \underbrace{(H, 1)}_{(U_1,I_1)},\underbrace{(H,2)}_{(U_2,I_2)},\underbrace{(O_f, (1,2))}_{(U_3,I_3)},\underbrace{(H,1)}_{(U_4,I_4)} \\
U_D &= G_2(H,1) G_2(O_f,(1,2)) \underbrace{G_2(H, 2)G_2(H, 1)}_{G_2(H \otimes H, (1,2))}
\end{aligned}
\end{equation*}
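With the sketches from above (reusing the \verb|O_f| helper and the matrices \verb|H| and \verb|I2|), $U_D$ can be assembled exactly as in this equation and applied to the input state:
\begin{verbatim}
import numpy as np

f = lambda x: x                               # a balanced f
U_D = np.kron(H, I2) @ O_f(f) @ np.kron(H, H)

psi = U_D @ np.kron([1.0, 0.0], [0.0, 1.0])   # U_D |0,1>
print(psi[0]**2 + psi[1]**2)                  # P(first qubit = 0) -> 0.0
\end{verbatim}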