Source code to machine instruction

Tracing the ubiquitous “Hello World!” as far as we can

When you compile the following code,


#include <stdio.h>

int main()
{
  printf ("Hello World!");
  return 0;
}

the compiler translates the text above into machine code. For now, simply take it that the compiler is a program that reads the source file we see above and emits it in another form that the specific computer architecture you are running on can execute. The output is an executable: a sequence of ones and zeros, where each computer architecture has agreed that particular bit patterns mean specific instructions for the CPU. An excerpt of the above “Hello World!” executable would thus look like

0000620 ffff e8ff ff48 ffff 05c6 09e1 0020 5d01
0000630 0fc3 801f 0000 0000 c3f3 0f66 441f 0000
0000640 4855 e589 e95d ff66 ffff 4855 e589 8d48
0000650 9f3d 0000 b800 0000 0000 c1e8 fffe b8ff
0000660 0000 0000 c35d 2e66 1f0f 0084 0000 0000
0000670 5741 5641 8949 41d7 4155 4c54 258d 0736

The first column contains offsets, and the remaining columns are the file’s contents shown as hexadecimal values. For instance, 0x55, when decoded by the CPU as an opcode, means “push rbp”. The excerpt above was carefully selected from the executable’s hexdump because it actually contains the instructions for the CPU to print the string “Hello World!” to the standard output!

In radare2’s disassembly of the same executable, we see the following:


0x0000064a      55             push rbp
0x0000064b      4889e5         mov rbp, rsp
0x0000064e      488d3d9f0000.  lea rdi, str.Hello_World    ; 0x6f4 ; "Hello World!"
0x00000655      b800000000     mov eax, 0
0x0000065a      e8c1feffff     call sym.imp.printf         ; int printf(const char *format)
0x0000065f      b800000000     mov eax, 0
0x00000664      5d             pop rbp
0x00000665      c3             ret


Note that the bytes in the middle column, starting with 55 and ending with c3, are also present in the hexdump lines 0000640 through 0000660. Within each group of four hex digits the byte order appears flipped, because the hexdump displays the bytes as 16-bit little-endian words.
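To make the grouping concrete, here is a small Python sketch (illustrative only; the byte values are copied from the disassembly above) that packs the instruction bytes into 16-bit little-endian words, reproducing the pairs seen in the hexdump:


# Group a byte sequence into 16-bit little-endian words, the way the
# hexdump above displays them. These are the first bytes of main as
# shown by the disassembly: push rbp; mov rbp, rsp; lea rdi, ...
code = bytes([0x55, 0x48, 0x89, 0xe5, 0x48, 0x8d, 0x3d, 0x9f])

words = []
for i in range(0, len(code), 2):
    lo, hi = code[i], code[i + 1]      # little endian: low byte is stored first
    words.append(f"{hi:02x}{lo:02x}")  # displayed with the high byte first

print(" ".join(words))  # 4855 e589 8d48 9f3d, matching the end of line 0000640 and start of 0000650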

The mnemonics on the right, such as “push rbp”, are symbolic representations of the binary values. They are called assembly language and serve to give the human reader an understanding of the instructions.

Next, we look at how the CPU is capable of performing tasks such as arithmetic or reading and writing memory when presented with instructions such as “55 48 89 e5 48 83 … 5d c3”.

Any computer algorithm can be performed by a Turing-complete machine. Almost all modern deterministic computers adhere to the von Neumann architecture, which is a Turing-complete design.

A typical CPU of the von Neumann architecture contains the components shown in the diagram below.

Image: Von Neumann architecture

When the CPU is presented with machine instructions such as “55 48 89 … 5d c3”, this triggers various patterns of electrical signals. Similar to assembly, the electrical signal patterns are based on agreed-upon conventions.

In the diagram above, the control unit (CU) within the CPU contains a program counter that allows the instructions to be operated on sequentially. Each instruction is sent to the arithmetic logic unit (ALU), which then returns certain electrical signals as output. The ALU is made from combinations of logic gates and transistors, which enable its deterministic response patterns of electrical signals. The electrical signals are physical representations of binary values.

Full derivation of Kalman-Filter algorithm

Specifying the Kalman-Filter algorithm

The Bayes filter elaborated in the previous post gave us the basis of evolving a posterior state distribution across time. However, there are several issues we have to sort out before we can implement a Bayes filter for practical usage,

  • the exact distributions of initial state {p(x_0)}, measurement function {p(z_t|x_t)}, and transition function {p(x_t|x_{t-1},u_t)} are often unknown, and
  • even if they are known, the evaluation of products of distributions (e.g., the transition density function times {\overline{bel}(x_t)}), and the integration of distributions (e.g., the transition function over the range of the prior state) generally require numerical methods, as there are no closed-form solutions. These increase the computational requirements and could make our robot unsuitable for real-time use.

One way to address the issues above is to make some assumptions that allow our Bayes algorithm to be more tractable. Let’s assume that our state evolves according to a linear Gaussian process. This allows us to reformulate our functions as follows,

\displaystyle \begin{array}{rcl}  \text{Transition function, } x_t & = & A_t x_{t-1} + B_t u_t + \epsilon_t  \\ \text{Measurement function, } z_t & = & C_t x_t + \delta_t \end{array}\ \ \ \ \ (1)

where,

\displaystyle  \begin{array}{rcl}  \text{Transition error, } \epsilon_t & \sim & \mathcal{N}(0, R_t) \\ \text{Measurement error, } \delta_t & \sim & \mathcal{N}(0, Q_t) \\ \text{Initial state distribution, } x_0 & \sim & \mathcal{N}(\mu_0, \Sigma_0) \end{array}
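As a concrete illustration of this model, here is a short Python (NumPy) sketch that samples a single step of such a linear Gaussian system. The matrices below describe a made-up one-dimensional position/velocity example; they are only placeholders, not values prescribed by the model itself.


import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state x = [position, velocity], control u = acceleration
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])   # transition matrix A_t
B = np.array([[0.5 * dt**2], [dt]])     # control matrix B_t
C = np.array([[1.0, 0.0]])              # measurement matrix C_t (position is observed)
R = 0.01 * np.eye(2)                    # transition noise covariance R_t
Q = np.array([[0.25]])                  # measurement noise covariance Q_t

x_prev = np.array([0.0, 1.0])           # x_{t-1}
u = np.array([0.2])                     # u_t

# x_t = A_t x_{t-1} + B_t u_t + eps_t,  eps_t ~ N(0, R_t)
x = A @ x_prev + B @ u + rng.multivariate_normal(np.zeros(2), R)
# z_t = C_t x_t + delta_t,              delta_t ~ N(0, Q_t)
z = C @ x + rng.multivariate_normal(np.zeros(1), Q)

print("x_t =", x, "z_t =", z)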

A Bayes-filter that follows such a linear Gaussian process is also known as a Kalman-filter. Our interest today is to show that in a Kalman-filter, the conditional state posterior distribution is also Gaussian. In other words, that the following holds,

\displaystyle  x_t|z_{1:t}, u_{1:t} \sim \mathcal{N}(\mu_t, \Sigma_t)  \ \ \ \ \ (2)

Restating the Bayes algorithm under a Kalman-filter gives us,

\displaystyle  \begin{array}{rcl}  && \mkern-60mu \text{Kalman-filter algorithm: Input} \left(\mu_{t-1}, \Sigma_{t-1}, u_t, z_t\right) \\ \bar{\mu_t} & = & A_t \mu_{t-1} + B_t u_t \\ \bar{\Sigma_t} & = & A_t \Sigma_{t-1}{A_t}^{T} + R_t \\ K_t & = & \bar{\Sigma_t} {C_t}^T {(Q_t + C_t \bar{\Sigma_t} {C_t}^T)}^{-1} \\ \Sigma_t & = & (I - K_t C_t)\bar{\Sigma_t} \\ \mu_t & = & \bar{\mu_t} + K_t(z_t - C_t \bar{\mu_t}) \\ && \mkern-60mu \text{return } \mu_t, \Sigma_t \end{array}
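Below is a minimal Python (NumPy) sketch of a single iteration of this algorithm, following the lines above one by one. The function name kalman_step and the example matrices at the bottom (the same hypothetical position/velocity setup as earlier) are my own illustrative choices, not part of the algorithm specification.


import numpy as np

def kalman_step(mu_prev, Sigma_prev, u, z, A, B, C, R, Q):
    """One Kalman-filter iteration: Input(mu_{t-1}, Sigma_{t-1}, u_t, z_t)."""
    # Prediction: bar mu_t = A_t mu_{t-1} + B_t u_t,  bar Sigma_t = A_t Sigma_{t-1} A_t^T + R_t
    mu_bar = A @ mu_prev + B @ u
    Sigma_bar = A @ Sigma_prev @ A.T + R
    # Kalman gain: K_t = bar Sigma_t C_t^T (Q_t + C_t bar Sigma_t C_t^T)^{-1}
    K = Sigma_bar @ C.T @ np.linalg.inv(Q + C @ Sigma_bar @ C.T)
    # Correction: Sigma_t = (I - K_t C_t) bar Sigma_t,  mu_t = bar mu_t + K_t (z_t - C_t bar mu_t)
    Sigma = (np.eye(len(mu_prev)) - K @ C) @ Sigma_bar
    mu = mu_bar + K @ (z - C @ mu_bar)
    return mu, Sigma

# Hypothetical 1D position/velocity example (same shapes as the earlier sketch)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
C = np.array([[1.0, 0.0]])
R, Q = 0.01 * np.eye(2), np.array([[0.25]])

mu, Sigma = kalman_step(np.zeros(2), np.eye(2), u=np.array([0.2]), z=np.array([1.1]),
                        A=A, B=B, C=C, R=R, Q=Q)
print(mu, Sigma)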

Derivation of Kalman-filter algorithm

We shall now prove that the Kalman-filter algorithm results in the state posterior distribution (2) by induction. For this, we need to show that,

\displaystyle  \begin{array}{rcl}  x_0 & \sim & \mathcal{N}(\mu_0, \Sigma_0) \qquad \text{and,} \\ x_{t-1}|z_{1:t-1}, u_{1:t-1} & \sim & \mathcal{N}(\mu_{t-1}, \Sigma_{t-1}) \end{array}

implies,

\displaystyle  x_t|z_{1:t}, u_{1:t} \sim \mathcal{N}(\mu_t, \Sigma_t)  \ \ \ \ \ (3)

The first point is the base case; it holds because the initial state distribution is assumed to be normal within the context of the Kalman-filter. The second point is the induction hypothesis; that it implies (3) is what the derivation of the Kalman-filter algorithm below establishes.

Our goal is to express {\mu_t } and {\Sigma_t } as functions of {\mu_{t-1}, \Sigma_{t-1}, u_t, \text{ and } z_t }. Recall from the previous post that,

\displaystyle  \begin{array}{rcl}  bel(x_t) & = & p(x_t|z_{1:t}, u_{1:t}) \\ & = & \eta\ p(z_t|x_t) \ \overline{bel}(x_t) \\ & = & \eta\ p(z_t|x_t) \int_{x_{t-1}} p(x_t| x_{t-1}, u_t) \ p(x_{t-1}|z_{1:t-1}, u_{1:t-1}) \ \mathrm{d}x_{t-1} \end{array}

We first show that the integral above evaluates to a normal distribution. From (1), we know that the transition density is

\displaystyle  x_t | x_{t-1}, u_t \sim \mathcal{N}(A_t x_{t-1}+B_t u_t, R_t)

Now, we can express {\overline{bel}(x_t) } as,

\displaystyle \begin{array}{rcl}  \overline{bel}(x_t) & = & \int_{x_{t-1}}p(x_t|x_{t-1},u_t)\ p(x_{t-1}|z_{1:t-1},u_{1:t-1}) \mathrm{d}x_{t-1} \nonumber \\ & = & \eta \int_{x_{t-1}} \text{exp } \{ - \frac{1}{2} {(x_t - A_t x_{t-1} - B_t u_t)}^T {R_t}^{-1} (x_t - A_t x_{t-1} - B_t u_t) \} \nonumber \\ && \quad \qquad \text{exp }\{ - \frac{1}{2} {(x_{t-1} - \mu_{t-1})}^T {\Sigma_{t-1}}^{-1} (x_{t-1} - \mu_{t-1}) \} \mathrm{d}x_{t-1}  \end{array}\ \ \ \ \ (4)

{\eta } is a normalizing factor that collects all constants not expressed, such that our probability density function still integrates to 1.

We want to rearrange the terms in the exponential in (4) in a way that allows us to integrate over {x_{t-1}}. (For convenience, let’s denote the exponent in (4) as {-L_t}.) Our strategy is to attempt to,

  1. decompose {L_t } into two functions, one with the terms {x_{t-1}}, and one without, which we shall label as {L_t(x_{t-1}, x_t) \text{ and } L_t(x_t)} respectively, and
  2. identify a form for {L_t(x_{t-1}, x_t)} such that the integral over {x_{t-1}} evaluates to a constant, which we can then subsume into the normalizing constant {\eta }.

To do the above, we observe that the function {L_t } is,

  1. quadratic in {x_{t-1} }, and
  2. placed inside an exponential in the integrand of (4)

This suggests that if we can collect the terms with {x_{t-1} } and rearrange them so that they match a normal probability density, then the integral would evaluate to 1 (or a constant, which then gets collected together with {\eta }).

We can construct a normal distribution from an exponential quadratic (see Appendix). By applying first and second order differentiation to {L_t(x_{t-1}, x_t) }, we get,

\displaystyle \begin{array}{rcl}  \frac{\delta L_t(x_{t-1})}{\delta x_{t-1}} & = & - {A_t}^T {R_t}^{-1} (x_t-A_t x_{t-1} -B_t u_t) + {\Sigma_{t-1}}^{-1}(x_{t-1}-\mu_{t-1}) \\ \frac{\delta^2 L_t(x_{t-1})}{\delta {x_{t-1}}^2} & = & {A_t}^T {R_t}^{-1} A_t + {\Sigma_{t-1}}^{-1} \\ &=:& {\Psi_t}^{-1}  \end{array}\ \ \ \ \ (5)

Setting the first derivative to 0 and solving for {x_{t-1} } gives us (after some algebraic manipulation),

\displaystyle  x_{t-1} = \Psi_t[{A_t}^T {R_t}^{-1}(x_t-B_t u_t) + {\Sigma_{t-1}}^{-1} \mu_{t-1}] \ \ \ \ \ (6)

Thus using the {x_{t-1} } we solved for, and {\Psi_t } as the mean and variance of a Gaussian pdf, we have

\displaystyle  \begin{array}{rcl}  L_t(x_{t-1},x_t) & = & \frac{1}{2} {(x_{t-1} - \Psi_t[{A_t}^T {R_t}^{-1}(x_t-B_t u_t) + {\Sigma_{t-1}}^{-1} \mu_{t-1}])}^T {\Psi_t}^{-1} \\ && \negmedspace (x_{t-1} - \Psi_t[{A_t}^T {R_t}^{-1}(x_t-B_t u_t) + {\Sigma_{t-1}}^{-1} \mu_{t-1}]) \end{array}

With this, let us find {L_t(x_t) } by subtracting {L_t(x_{t-1}, x_t) } from {L_t }

\displaystyle \begin{array}{rcl}  L_t(x_t) & = & L_t - L_t(x_{t-1}, x_t) \\ & = & \frac{1}{2} {(x_t - A_t x_{t-1} - B_t u_t)}^T {R_t}^{-1} (x_t - A_t x_{t-1} - B_t u_t)\\ && \negmedspace + \ \frac{1}{2}{(x_{t-1}-\mu_{t-1})}^T{\Sigma_{t-1}}^{-1}(x_{t-1}-\mu_{t-1})\\ && \negmedspace - \ \frac{1}{2} { ( x_{t-1} - \Psi_t[{A_t}^T {R_t}^{-1}(x_t-B_t u_t) + {\Sigma_{t-1}}^{-1} \mu_{t-1}] ) }^T {\Psi_t}^{-1} \\ && \negmedspace ( x_{t-1} - \Psi_t[{A_t}^T {R_t}^{-1}(x_t-B_t u_t) + {\Sigma_{t-1}}^{-1} \mu_{t-1}] )  \end{array}\ \ \ \ \ (7)

Substituting {\Psi_t } from (5) into (7), and after more algebraic manipulation, it turns out that all terms with {x_{t-1} } cancel out and we are left with

\displaystyle  \begin{array}{rcl}  L_t(x_t) & = & \frac{1}{2}{(x_t - B_t u_t)}^T {R_t}^{-1}(x_t - B_t u_t) \\ && \negmedspace +\ \frac{1}{2}{\mu_{t-1}}^T {\Sigma_{t-1}}^{-1}\mu_{t-1} \\ && \negmedspace -\ \frac{1}{2}{ [{A_t}^T{R_t}^{-1}(x_t-B_t u_t) +{\Sigma_{t-1}}^{-1}\mu_{t-1} ]}^T {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1}) }^{-1} \\ && \negmedspace [{A_t}^T{R_t}^{-1}(x_t-B_t u_t) +{\Sigma_{t-1}}^{-1}\mu_{t-1}] \end{array}

The forms of {L_t(x_{t-1}, x_t)} and {L_t(x_t)} that we now have tell us two things:

  1. {L_t(x_{t-1}, x_t) } indeed has a Gaussian distribution form which would result in {\int \textnormal{exp } \{-L_t(x_{t-1}, x_t)\}\ \mathrm{d} x_{t-1} = \eta }, and
  2. {L_t(x_t)} indeed does not contain any terms with {x_{t-1} }

With this decomposition of {L_t }, we can now continue developing equation (4).

\displaystyle \begin{array}{rcl}  \overline{bel}(x_t) & = & \int_{x_{t-1}}p(x_t|x_{t-1}, u_t)p(x_{t-1}|z_{1:t-1}, u_{1:t-1}) \ \mathrm{d}x_{t-1} \\ & = & \eta \int_{x_{t-1}} \text{exp }\{ - \frac{1}{2} {(x_t - A_t x_{t-1} - B_t u_t)}^T {R_t}^{-1} (x_t - A_t x_{t-1} - B_t u_t)\} \\ && \negmedspace \text{exp } \{ -\frac{1}{2} {(x_{t-1} - \mu_{t-1})}^T {\Sigma_{t-1}}^{-1} (x_{t-1} - \mu_{t-1})\} \ \mathrm{d}x_{t-1} \\ & = & \eta \ \text{exp } \{-L_t(x_t)\} \int_{x_{t-1}} \text{exp } \{ -L_t(x_{t-1}, x_t)\} \ \mathrm{d}x_{t-1} \\ & = & \eta \ \text{exp } \{-L_t(x_t)\}  \end{array}\ \ \ \ \ (8)

We notice from (7) that {L_t(x_t)} is also an exponential quadratic in {x_t}; as such, we again apply the first- and second-order differentiation technique to construct a normal distribution (see Appendix). We get,

\displaystyle \begin{array}{rcl}  \frac{\delta L_t(x_t)}{\delta {x_t}} & = & {R_t}^{-1}(x_t-B_t u_t) - {R_t}^{-1} A_t {({A_t}^T {R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1} \\ && \negmedspace [{A_t}^T {R_t}^{-1}(x_t-B_t u_t)+{\Sigma_{t-1}}^{-1}\mu_{t-1}] \\ \frac{\delta^2 L_t(x_t)}{\delta {x_t}^2} & = & {R_t}^{-1} - {R_t}^{-1}A_t{({A_t}^T{R_t}^{-1}A_t+{\Sigma_{t-1}}^{-1})}^{-1}{A_t}^T{R_t}^{-1}  \end{array}\ \ \ \ \ (9)

Using the inversion lemma (see Appendix), we can rewrite (9) as,

\displaystyle \begin{array}{rcl}  \frac{\delta^2 L_t(x_t)}{\delta {x_t}^2} & = & {R_t}^{-1} - {R_t}^{-1}A_t{({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}{A_t}^T{R_t}^{-1} \\ & = & {(R_t+A_t\Sigma_{t-1}{A_t}^T)}^{-1} \\ & =: & {{\bar{\Sigma}}_t}^{-1}  \end{array}\ \ \ \ \ (10)

The intuitive interpretation of (10) is that the variance of our normal distribution, {{\bar{\Sigma}}_t}, is the variance of the transition process (1), which is the sum of

  • the error in the transition process (1), {R_t }, and
  • the variance of the prior period’s belief, {bel(x_{t-1})}, evolved through the linear process {A_t }

Next, we solve for {x_t } in (9),

\displaystyle  \begin{array}{rcl}  0 &=& {R_t}^{-1}(x_t-B_t u_t) \\ &-& {R_t}^{-1} A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}[{A_t}^T {R_t}^{-1}(x_t-B_t u_t) + {\Sigma_{t-1}}^{-1}\mu_{t-1}] \\ &\iff& {R_t}^{-1}(x_t-B_t u_t) \\ &=& {R_t}^{-1} A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}[{A_t}^T {R_t}^{-1}(x_t-B_t u_t) + {\Sigma_{t-1}}^{-1}\mu_{t-1}] \\ &\iff& [{R_t}^{-1} - {R_t}^{-1} A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}{A_t}^T {R_t}^{-1}](x_t-B_t u_t) \\ &=& {R_t}^{-1} A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}{\Sigma_{t-1}}^{-1}\mu_{t-1} \\ &\iff& {(R_t+A_t\Sigma_{t-1}{A_t}^T)}^{-1}(x_t-B_t u_t) \\ &=& {R_t}^{-1} A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}{\Sigma_{t-1}}^{-1}\mu_{t-1} \text{ by inversion lemma} \\ &\iff& (x_t-B_t u_t) \\ &=& (R_t+A_t\Sigma_{t-1}{A_t}^T){R_t}^{-1} A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}{\Sigma_{t-1}}^{-1}\mu_{t-1} \\ &\iff& x_t = B_t u_t + (R_t+A_t\Sigma_{t-1}{A_t}^T){R_t}^{-1} A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}{\Sigma_{t-1}}^{-1}\mu_{t-1} \\ &\iff& x_t = B_t u_t + (I+A_t\Sigma_{t-1}{A_t}^T{R_t}^{-1}) A_t {({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}^{-1}{\Sigma_{t-1}}^{-1}\mu_{t-1} \\ &\iff& x_t = B_t u_t + (I+A_t\Sigma_{t-1}{A_t}^T{R_t}^{-1}) A_t [{\Sigma_{t-1}}{({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}]^{-1}\mu_{t-1} \\ &\iff& x_t = B_t u_t + (A_t+A_t\Sigma_{t-1}{A_t}^T{R_t}^{-1}A_t) [{\Sigma_{t-1}}{({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}]^{-1}\mu_{t-1} \\ &\iff& x_t = B_t u_t + A_t(I+\Sigma_{t-1}{A_t}^T{R_t}^{-1}A_t) [{\Sigma_{t-1}}{({A_t}^T{R_t}^{-1}A_t + {\Sigma_{t-1}}^{-1})}]^{-1}\mu_{t-1} \\ &\iff& x_t = B_t u_t + A_t(I+\Sigma_{t-1}{A_t}^T{R_t}^{-1}A_t) {(I+{\Sigma_{t-1}}{A_t}^T{R_t}^{-1}A_t)}^{-1}\mu_{t-1} \\ &\iff& x_t = B_t u_t + A_t\mu_{t-1} \\ &\iff& x_t = A_t\mu_{t-1} + B_t u_t \end{array}

Thus, we obtain the mean of the normal distribution as,

\displaystyle  x_t = A_t\mu_{t-1} + B_t u_t  \ \ \ \ \ (11)

With (11) as the mean, and (10) as the variance parameter of the normal distribution of (8), we can now say that,

\displaystyle  \overline{bel}(x_t) \sim \mathcal{N}(A_t\mu_{t-1} + B_t u_t, A_t\Sigma_{t-1}{A_t}^T + R_t) =: \mathcal{N}(\bar{\mu}_t, \bar{\Sigma}_t)  \ \ \ \ \ (12)

Now the last remaining step is to combine the transition distribution with the measurement distribution,

\displaystyle  bel(x_t) \\ = p(x_t|z_{1:t}, u_{1:t}) \\ = \eta \ p(z_t|x_t)\ \overline{bel}(x_t)  \ \ \ \ \ (13)

Since we now know that both {p(z_t|x_t)} and {\overline{bel}(x_t) } are normal distributions with parameters (1) and (12) respectively, we can write (13) as,

\displaystyle \begin{array}{rcl}  bel(x_t) &=& p(x_t|z_{1:t}, u_{1:t}) \\ &=& \eta \ p(z_t|x_t)\ \overline{bel}(x_t) \\ &=& \eta \text{ exp } \{-\frac{1}{2}{(z_t - C_t x_t)}^T{Q_t}^{-1}(z_t - C_t x_t) \} \\ && \negmedspace \text{ exp } \{-\frac{1}{2}{ (x_t - \bar{\mu}_t)}^T{\bar{\Sigma}_t}^{-1}(x_t - \bar{\mu}_t) \}  \end{array}\ \ \ \ \ (14)

Let us collect the exponential terms in (14) as {J_t}, such that,

\displaystyle  bel(x_t) = \eta \text{ exp } \{-J_t\}

Again we notice that (14) is an exponential quadratic in {x_t}, which means we can apply the first- and second-order differentiation technique to {J_t } to determine the mean and variance parameters of its distribution, which we also know at this point is Gaussian,

\displaystyle \begin{array}{rcl}  \frac{\delta J_t}{\delta {x_t}} &=& -{C_t}^T{Q_t}^{-1}(z_t - C_t x_t) + {\bar{\Sigma}_t}^{-1}(x_t-\bar{\mu}_t) \\ \frac{\delta^2 J_t}{\delta {x_t}^2} &=& {C_t}^T{Q_t}^{-1}C_t + {\bar{\Sigma}_t}^{-1} \\ &=:& {\Sigma_t}^{-1}  \end{array}\ \ \ \ \ (15)

Setting the first derivative in (15) to zero, and substituting {\mu_t } for {x_t },

\displaystyle \begin{array}{rcl}  0 &=& -{C_t}^T{Q_t}^{-1}(z_t - C_t x_t) + {\bar{\Sigma}_t}^{-1}(x_t-\bar{\mu}_t) \\ 0 &=& -{C_t}^T{Q_t}^{-1}(z_t - C_t \mu_t) + {\bar{\Sigma}_t}^{-1}(\mu_t-\bar{\mu}_t) \\ &\iff& {C_t}^T{Q_t}^{-1}(z_t - C_t \mu_t) = {\bar{\Sigma}_t}^{-1}(\mu_t-\bar{\mu}_t) \\ &\iff& {C_t}^T{Q_t}^{-1}z_t + {\bar{\Sigma}_t}^{-1}\bar{\mu}_t = ({C_t}^T{Q_t}^{-1}C_t + {\bar{\Sigma}_t}^{-1})\mu_t \\ &\iff& \mu_t = {({C_t}^T{Q_t}^{-1}C_t + {\bar{\Sigma}_t}^{-1})}^{-1}({C_t}^T{Q_t}^{-1}z_t + {\bar{\Sigma}_t}^{-1}\bar{\mu}_t)  \end{array}\ \ \ \ \ (16)

We now have a complete specification of {bel(x_t) } as a normal distribution with mean and variance as specified in (16) and (15), which proves (3). Our proof by induction of (2) is now complete.

Kalman Gain

We could have used (16) and (15) in our Kalman-filter algorithm as below, and called it a day.

\displaystyle \begin{array}{rcl}  \mu_t &=& {({C_t}^T{Q_t}^{-1}C_t + {\bar{\Sigma}_t}^{-1})}^{-1}({C_t}^T{Q_t}^{-1}z_t + {\bar{\Sigma}_t}^{-1}\bar{\mu}_t) \\ \Sigma_t &=& ({C_t}^T{Q_t}^{-1}C_t + {\bar{\Sigma}_t}^{-1})^{-1}  \end{array}\ \ \ \ \ (17)

However, another, more commonly used representation exists. It is also more efficient, as it allows us to avoid evaluating nested inverses. By introducing a Kalman gain term, we can express (17) as,

\displaystyle \begin{array}{rcl}  \mu_t &=& \bar{\mu}_t+K_t(z_t-C_t \bar{\mu}_t) \\ \Sigma_t &=& (I - K_t C_t)\bar{\Sigma}_t  \end{array}\ \ \ \ \ (18)

To go from the variance formulation in (17) to that in (18), we write

\displaystyle \begin{array}{rcl}  \Sigma_t &=& {({C_t}^T{Q_t}^{-1}C_t + {\bar{\Sigma}_t}^{-1})}^{-1} \\ &=& \bar{\Sigma}_t - \bar{\Sigma}_t {C_t}^T {(Q_t + C_t \bar{\Sigma}_t {C_t}^T)}^{-1}C_t \bar{\Sigma}_t \\ &=& (I - \bar{\Sigma}_t {C_t}^T {[Q_t + C_t \bar{\Sigma}_t {C_t}^T]}^{-1}C_t )\bar{\Sigma}_t \\ &=& (I - K_t C_t )\bar{\Sigma}_t  \end{array}\ \ \ \ \ (19)

Where we define the Kalman gain {K_t } as

\displaystyle  K_t = \bar{\Sigma}_t {C_t}^T [Q_t + C_t \bar{\Sigma}_t {C_t}^T]^{-1}  \ \ \ \ \ (20)

Next, we show how to get from the mean formulation in (17) to that in (18),

\displaystyle \begin{array}{rcl}  \mu_t &=& {({C_t}^T{Q_t}^{-1}C_t + {\bar{\Sigma}_t}^{-1})}^{-1}({C_t}^T{Q_t}^{-1}z_t + {\bar{\Sigma}_t}^{-1}\bar{\mu}_t) \\ &=& (I-K_t C_t)\bar{\Sigma}_t ({C_t}^T{Q_t}^{-1}z_t + {\bar{\Sigma}_t}^{-1}\bar{\mu}_t) \\ &=& \bar{\Sigma}_t {C_t}^T{Q_t}^{-1}z_t - K_t C_t \bar{\Sigma}_t {C_t}^T{Q_t}^{-1}z_t + (I-K_t C_t)\bar{\mu}_t \\ &=& K_t (Q_t + C_t \bar{\Sigma}_t {C_t}^T) {Q_t}^{-1}z_t - K_t C_t \bar{\Sigma}_t {C_t}^T{Q_t}^{-1}z_t + (I-K_t C_t)\bar{\mu}_t \\ &=& K_t (I + C_t \bar{\Sigma}_t {C_t}^T {Q_t}^{-1} - C_t \bar{\Sigma}_t {C_t}^T{Q_t}^{-1})z_t + (I-K_t C_t)\bar{\mu}_t \\ &=& K_t z_t + (I-K_t C_t)\bar{\mu}_t \\ &=& \bar{\mu}_t + K_t (z_t - C_t \bar{\mu}_t)  \end{array}\ \ \ \ \ (21)

Finally, combining (12), (19), (20), and (21), we obtain the Kalman-filter algorithm.
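As a quick numerical sanity check of the equivalence between the double-inverse form (17) and the Kalman-gain form (18), here is a short Python (NumPy) sketch with arbitrary, made-up matrices:


import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                                    # state and measurement dimensions
inv = np.linalg.inv

C = rng.normal(size=(m, n))                    # measurement matrix C_t
Q = 0.5 * np.eye(m)                            # measurement noise covariance Q_t
Sigma_bar = np.eye(n) + 0.1 * np.ones((n, n))  # predicted covariance bar Sigma_t
mu_bar = rng.normal(size=n)                    # predicted mean bar mu_t
z = rng.normal(size=m)                         # a measurement z_t

# Form (17): double-inverse (information) form
Sigma_17 = inv(C.T @ inv(Q) @ C + inv(Sigma_bar))
mu_17 = Sigma_17 @ (C.T @ inv(Q) @ z + inv(Sigma_bar) @ mu_bar)

# Form (18)/(19)/(20): Kalman-gain form
K = Sigma_bar @ C.T @ inv(Q + C @ Sigma_bar @ C.T)
Sigma_18 = (np.eye(n) - K @ C) @ Sigma_bar
mu_18 = mu_bar + K @ (z - C @ mu_bar)

print(np.allclose(Sigma_17, Sigma_18), np.allclose(mu_17, mu_18))  # True True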

Appendix

Constructing Normal distribution

For a normal distribution {\mathcal{N}(\mu, \Sigma) }, its pdf has the form

\displaystyle  f(x) = \frac{1}{\sqrt{\det (2\pi\Sigma)}} \text{ exp }\{- \frac{1}{2} {(x - \mu)}^T {\Sigma}^{-1} (x - \mu)\}

Let the negated exponent be denoted as {L(x)}, i.e., the pdf is proportional to {\text{exp }\{-L(x)\}}. Then the parameters of the normal distribution can be obtained as follows (a one-dimensional example is worked out after the list),

  • {\mu} is the value of {x} obtained by setting {\frac{\delta L(x)}{\delta x}} to 0, and
  • {\Sigma^{-1} = \frac{\delta^2 L(x)}{\delta x^2} }
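As a one-dimensional illustration of this recipe, consider the exponent {L(x) = \frac{(x-a)^2}{2s}} for some constants {a} and {s > 0} (values chosen only for illustration). Then,

\displaystyle  \begin{array}{rcl}  \frac{\delta L(x)}{\delta x} &=& \frac{x-a}{s} = 0 \iff x = a \quad \Rightarrow \quad \mu = a \\ \frac{\delta^2 L(x)}{\delta x^2} &=& \frac{1}{s} \quad \Rightarrow \quad \Sigma = s \end{array}

so the density {\eta \ \text{exp }\{-L(x)\}} is that of {\mathcal{N}(a, s)}.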

Inversion Lemma

The inversion lemma (also known as the Woodbury matrix identity) states that for matrices {A, U, C, V } of compatible sizes, with {A} and {C} invertible, the following holds

\displaystyle  (A+UCV)^{-1} = A^{-1} - A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}
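Here is a brief NumPy check of the identity on small, arbitrarily chosen matrices (a sketch for intuition only):


import numpy as np

rng = np.random.default_rng(2)
inv = np.linalg.inv

A = np.diag([2.0, 3.0, 4.0, 5.0])   # invertible 4 x 4
C = np.diag([1.5, 2.5])             # invertible 2 x 2
U = rng.normal(size=(4, 2))
V = rng.normal(size=(2, 4))

lhs = inv(A + U @ C @ V)
rhs = inv(A) - inv(A) @ U @ inv(inv(C) + V @ inv(A) @ U) @ V @ inv(A)
print(np.allclose(lhs, rhs))  # True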

References

Thrun, Sebastian, et al. Probabilistic Robotics. MIT Press, 2010.

Derivation of Bayes-Filter algorithm

Why is Bayes-filter useful?
For many autonomous robotic systems, the action to take at each time period depends on the environment. For instance, a Roomba (robotic vacuum cleaner) needs to know its position in a room before it can decide which way to move.
The more a robot knows about the environment, the better the decisions it can make. However, measuring the environment is often uncertain, as it can be affected by, among others,

  • noise in measurement data (e.g., sensors such as LIDAR/RADAR having limited accuracy),
  • faulty sensors,
  • faulty actuators (e.g., wheel slippage on the robot may cause the actual travelled distance to be less than the calculated distance, which could then affect its estimation of environment variables),
  • fuzzy and dynamic environment (e.g., fog or surrounded by moving objects), and
  • the robot’s own actions (e.g., movements, or interaction with the environment)

To incorporate these uncertainties, one approach is to model the state variables probabilistically. The probability distribution of the state variables conditioned on historical measurements and controls is known as the belief. What we want is to describe the probability distribution of the state variables given measurements and controls (note: control is the term used to mean the robot’s actions).

Bayes-filter
Before we venture further, we need to define the symbols we will be using.
Let,

\displaystyle  \begin{array}{rcl}  x_{t} && \text{ be the state variables} \\ z_{t} && \text{ be the measurements obtained from sensors} \\ u_{t} && \text{ be the control/action variables the robot takes} \\ && \text{(e.g., steering at an angle for some duration)} \\ u_{1:t} && \text{ be the set of controls from time 1 to t} \end{array}

We can then state our motivation as “how do we evolve an initial boundary condition {p(x_0)} into {p(x_t|z_{1:t}, u_{1:t})}?”
Going from one time period to another, we assume the robot will execute two operations: 1) perform a control action using its actuators (note: we consider doing nothing as a valid control operation), and 2) measure its environment.
Let’s illustrate these two operations from t = 0 to 2,
At t = 0,

\displaystyle  p(x_0)

With whatever knowledge that we have, we form an opinion about the environment by specifying {p(x_0)}, which could simply be a uniform distribution if we have zero knowledge.
At t = 1,
After the robot performs a control action, our belief becomes

\displaystyle  p(x_1|u_1) = \int_{x_0}  p(x_1 | x_0, u_1) p(x_0) \ \mathrm{d}x_0  \ \ \ \ \ (1)


Then the robot performs a measurement on its environment. Its belief will then be updated by

\displaystyle  p(x_1|u_1, z_1) = \frac{p(z_1 | x_1, u_1)p(x_1|u_1)}{p(z_1|u_1)}  \ \ \ \ \ (2)


At t = 2,
With the next control action performed,

\displaystyle  p(x_2|z_1, u_{1:2}) = \int_{x_1}  p(x_2 | x_1, z_1, u_{1:2}) p(x_1|z_1, u_1) \ \mathrm{d}x_1  \ \ \ \ \ (3)


With the next measurement update performed

\displaystyle  p(x_2|u_{1:2}, z_{1:2}) = \frac{p(z_2 | x_2, u_{1:2}, z_1)p(x_2|u_{1:2}, z_1)} {p(z_2|u_{1:2}, z_1)}  \ \ \ \ \ (4)


At this point, we observe that equations (2) and (4) are similar except for the time subscripts. The same is true for equations (1) and (3). This leads us to suspect that we could formulate a recursive algorithm to evolve our robot’s belief forward.
We shall introduce more terms for convenience.
Let,

\displaystyle  \begin{array}{rcl}  \overline{bel}(x_t) &=& p(x_t|z_{1:t-1}, u_{1:t}) \text{,} \\ bel(x_t) &=& p(x_t|z_{1:t}, u_{1:t}) \text{, and} \\ \eta &=& \text{normalizing constant such that probability expression integrates to 1} \end{array}

Our posterior distribution becomes

\displaystyle  bel(x_{t}) = \eta \ p(z_{t} | x_{t}, u_{1:t}, z_{1:t-1}) \ \overline{bel}(x_{t})  \ \ \ \ \ (5)


Inspecting (5), we list the distributions we need

  1. initial probability of environment, {p(x_0)},
  2. transition probability, {p(x_t|x_{t-1}, u_{1:t})},
  3. measurement probability, {p(z_t|x_t, u_{1:t}, z_{1:t-1})}

As long as we have the transition function and measurement function, we could implement an algorithm that evolves our robot’s state probability distribution over time. However, our specification requires us to keep track of all historical controls and measurements (as can be seen from the {1:t} subscripts in (5)). This is rather impractical, as our machines would quickly run out of memory. In fact, it generally makes sense that future predictions depend less on historical information the farther back in time it is. One way to capture this sentiment formally is to state the following assumption,

\displaystyle  p(x_t|x_{t-1}, x_{t-2},\ldots,x_1) = p(x_t|x_{t-1}, x_{t-2},\ldots,x_{t-m}) \qquad \text{ for } t > m

With this assumption, we can now say that the prediction of the current state depends only on the past {m} states. A process that obeys this property is also known as a Markov chain of order {m}. For our purposes, we will assume that our process is a first-order Markov chain. We can then restate our posterior belief in a simplified recursive form as below,

\displaystyle  \begin{array}{rcl}  bel(x_t) &=& p(x_t|z_{1:t}, u_{1:t}) \\ &=& \frac{p(z_t|x_t, z_{1:t-1}, u_{1:t})\ p(x_t|z_{1:t-1}, u_{1:t})} {p(z_t|z_{1:t-1}, u_{1:t})} \\ &=& \eta \ p(z_t|x_t, z_{1:t-1}, u_{1:t})\ p(x_t|z_{1:t-1}, u_{1:t}) \\ &=& \eta \ p(z_t|x_t)\int_{x_{t-1}}\ p(x_t| x_{t-1}, z_{1:t-1}, u_{1:t}) \ p(x_{t-1}|z_{1:t-1}, u_{1:t}) \ \mathrm{d}x_{t-1}\\ &=& \eta \ p(z_t|x_t)\int_{x_{t-1}} p(x_t| x_{t-1}, u_t) \ p(x_{t-1}|z_{1:t-1}, u_{1:t-1}) \ \mathrm{d}x_{t-1}\\ &=& \eta \ p(z_t|x_t)\int_{x_{t-1}}\ p(x_t| x_{t-1}, u_t)\ bel(x_{t-1}) \ \mathrm{d}x_{t-1}\\ &=& \eta \ p(z_t|x_t)\ \overline{bel}(x_t) \end{array}

The derivation above shows that we can now state a recursive state distribution with

  • initial state distribution
  • measurement distribution
  • state transition distribution

This is also known as the Bayes-filter. To implement the filter, one can use the pseudocode presented below

\displaystyle  \begin{array}{rcl}  && \mkern-95mu \text{Input } (bel(x_{t-1}), u_t, z_t) \text{ for all } x_t \\ \overline{bel}(x_t) &=& \int_{x_{t-1}} p(x_t|x_{t-1},u_t)\ bel(x_{t-1}) \ \mathrm{d}x_{t-1} \\ bel(x_t) &=& \eta \ p(z_t|x_t)\ \overline{bel}(x_t) \\ && \mkern-95mu \text{return } bel(x_t) \end{array}
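Below is a minimal Python (NumPy) sketch of this pseudocode for a discrete (histogram) state space. The three-cell world, the transition table, and the measurement table are made-up toy values used only to exercise the filter; they are not part of the Bayes-filter itself.


import numpy as np

def bayes_filter_step(bel_prev, p_trans, p_meas, u, z):
    """One Bayes-filter step over a discrete state space.

    bel_prev : shape (S,), belief bel(x_{t-1}) over S states
    p_trans  : p_trans[u][x_t, x_prev] = p(x_t | x_prev, u)
    p_meas   : p_meas[z, x_t]          = p(z | x_t)
    """
    # Prediction: bar bel(x_t) = sum over x_{t-1} of p(x_t | x_{t-1}, u_t) bel(x_{t-1})
    bel_bar = p_trans[u] @ bel_prev
    # Correction: bel(x_t) = eta p(z_t | x_t) bar bel(x_t)
    bel = p_meas[z] * bel_bar
    return bel / bel.sum()              # eta: normalize so the belief sums to 1

# Toy example: three cells on a line; control u=1 means "try to move right"
p_trans = {1: np.array([[0.1, 0.0, 0.0],
                        [0.9, 0.1, 0.0],
                        [0.0, 0.9, 1.0]])}   # column = previous state, row = next state
p_meas = np.array([[0.8, 0.1, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]])         # row = observed cell, column = true state

bel = np.array([1/3, 1/3, 1/3])              # uniform prior p(x_0)
bel = bayes_filter_step(bel, p_trans, p_meas, u=1, z=1)
print(bel)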

References
Thrun, Sebastian, et al. Probabilistic Robotics. MIT Press, 2010.