Specifying the Kalman-Filter algorithm
The Bayes filter elaborated in the previous post gave us the basis for evolving a posterior state distribution across time. However, there are several issues we have to sort out before we can implement a Bayes filter for practical usage:
- the exact distributions of the initial state $p(x_0)$, the measurement function $p(z_t \mid x_t)$, and the transition function $p(x_t \mid x_{t-1}, u_t)$ are often unknown, and
- even if they are known, evaluating products of distributions (e.g., the transition density $p(x_t \mid x_{t-1}, u_t)$ * the prior belief $bel(x_{t-1})$) and integrating distributions (e.g., the transition function over the range of the prior state) generally requires numerical methods, as there are no closed-form solutions. This increases the computational requirements and could make our robot inappropriate for real-time usage.
One way to address the issues above is to make some assumptions that allow our Bayes algorithm to be more tractable. Let's assume that our state evolves according to a linear Gaussian process. This allows us to reformulate our functions as follows,

$$x_t = A_t x_{t-1} + B_t u_t + \varepsilon_t, \qquad z_t = C_t x_t + \delta_t \quad (1)$$

where,

$$\varepsilon_t \sim \mathcal{N}(0, R_t), \qquad \delta_t \sim \mathcal{N}(0, Q_t)$$

and $A_t$, $B_t$, $C_t$ are the matrices describing the linear state transition, control, and measurement processes respectively, while $R_t$ and $Q_t$ are the covariances of the transition and measurement noise.
A Bayes-filter that follows a Gaussian process is also known as a Kalman-filter. Our interest today is to show that in a Kalman-filter, the conditional state posterior distribution is also Gaussian. In other words, that the following holds,

$$bel(x_t) = p(x_t \mid z_{1:t}, u_{1:t}) = \mathcal{N}(x_t;\, \mu_t, \Sigma_t) \quad (2)$$
Restating the Bayes algorithm under a Kalman-filter gives us,

$$\bar{\mu}_t = A_t \mu_{t-1} + B_t u_t$$
$$\bar{\Sigma}_t = A_t \Sigma_{t-1} A_t^T + R_t$$
$$K_t = \bar{\Sigma}_t C_t^T \left(C_t \bar{\Sigma}_t C_t^T + Q_t\right)^{-1}$$
$$\mu_t = \bar{\mu}_t + K_t (z_t - C_t \bar{\mu}_t)$$
$$\Sigma_t = (I - K_t C_t)\, \bar{\Sigma}_t$$
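To make the restated algorithm concrete, here is a minimal sketch in Python, assuming a 1-D (scalar) state so that every matrix collapses to a float; the function names and the numeric values ($A=1$, $B=1$, $R=0.5$, $C=1$, $Q=1$) are illustrative assumptions, not part of the derivation.

```python
# Scalar (1-D) Kalman filter: every matrix in the algorithm collapses to a float.
def predict(mu, sigma, u, A=1.0, B=1.0, R=0.5):
    """Prediction step: evolve the belief through the linear process."""
    mu_bar = A * mu + B * u           # predicted mean
    sigma_bar = A * sigma * A + R     # predicted variance
    return mu_bar, sigma_bar

def update(mu_bar, sigma_bar, z, C=1.0, Q=1.0):
    """Measurement step: correct the predicted belief with observation z."""
    K = sigma_bar * C / (C * sigma_bar * C + Q)   # Kalman gain
    mu = mu_bar + K * (z - C * mu_bar)            # corrected mean
    sigma = (1 - K * C) * sigma_bar               # corrected variance
    return mu, sigma

mu_bar, sigma_bar = predict(mu=0.0, sigma=1.0, u=2.0)   # gives 2.0, 1.5
mu, sigma = update(mu_bar, sigma_bar, z=2.5)
print(mu, sigma)
```

The vector case is the same five lines with matrix products and one matrix inverse in the gain.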
Derivation of Kalman-filter algorithm
We shall now prove that the Kalman-filter algorithm results in the state posterior distribution (2) by induction. For this, we need to show that,

- the initial belief $bel(x_0)$ is normally distributed, and
- $bel(x_{t-1}) = \mathcal{N}(x_{t-1};\, \mu_{t-1}, \Sigma_{t-1})$ implies,

$$bel(x_t) = \mathcal{N}(x_t;\, \mu_t, \Sigma_t) \quad (3)$$
The first point is true because our initial state distribution is assumed to be normal within the context of the Kalman-filter. The second point will be shown as part of our derivation of the Kalman-filter algorithm.
Our goal is to express $\mu_t$ and $\Sigma_t$ as functions of $\mu_{t-1}$ and $\Sigma_{t-1}$. Recall from the previous post that,

$$\overline{bel}(x_t) = \int p(x_t \mid x_{t-1}, u_t)\, bel(x_{t-1})\, dx_{t-1}$$
We first show that the integrand above evaluates to a normal distribution. From (1), we know that

$$p(x_t \mid x_{t-1}, u_t) = \mathcal{N}(x_t;\, A_t x_{t-1} + B_t u_t,\, R_t)$$

so that, together with the inductive hypothesis on $bel(x_{t-1})$,

$$\overline{bel}(x_t) = \eta \int \exp\left\{-\tfrac{1}{2}(x_t - A_t x_{t-1} - B_t u_t)^T R_t^{-1}(x_t - A_t x_{t-1} - B_t u_t) - \tfrac{1}{2}(x_{t-1} - \mu_{t-1})^T \Sigma_{t-1}^{-1}(x_{t-1} - \mu_{t-1})\right\} dx_{t-1} \quad (4)$$

where $\eta$ is a normalizing factor that collects all constants not expressed, such that our probability density function still integrates to 1.
We want to rearrange the terms in the exponential in (4) in such a way that allows us to integrate over $x_{t-1}$. (For convenience, let's denote the negated exponent in (4) as $L_t$.) Our strategy is to attempt to,

- decompose $L_t$ into two functions, one with the terms in $x_{t-1}$ and one without, which we shall label as $L_t(x_{t-1}, x_t)$ and $L_t(x_t)$ respectively, and
- identify a form for $L_t(x_{t-1}, x_t)$ such that the integral over $x_{t-1}$ evaluates to a constant, which we can then subsume into the normalizing constant $\eta$.
To do the above, we observe that the function $L_t$ is,

- quadratic in $x_{t-1}$, and
- exponentiated in the integrand of (4).

This suggests that if we can collect the terms with $x_{t-1}$ and rearrange them such that they match a normal probability density, then the integral would evaluate to 1 (or a constant, which then gets collected together with $\eta$).
We can construct a normal distribution from an exponential quadratic (see Appendix). By applying first and second order differentiation to $L_t$ with respect to $x_{t-1}$, we get,

$$\frac{\partial L_t}{\partial x_{t-1}} = -A_t^T R_t^{-1}(x_t - A_t x_{t-1} - B_t u_t) + \Sigma_{t-1}^{-1}(x_{t-1} - \mu_{t-1})$$

$$\frac{\partial^2 L_t}{\partial x_{t-1}^2} = A_t^T R_t^{-1} A_t + \Sigma_{t-1}^{-1} =: \Psi_t^{-1}$$

Setting the first derivative to 0 and solving for $x_{t-1}$ gives us (after some algebraic manipulation),

$$x_{t-1} = \Psi_t \left[A_t^T R_t^{-1}(x_t - B_t u_t) + \Sigma_{t-1}^{-1}\mu_{t-1}\right] \quad (5)$$

Thus, using the $x_{t-1}$ we solved for as the mean, and $\Psi_t$ as the variance of a Gaussian pdf, we have

$$L_t(x_{t-1}, x_t) = \tfrac{1}{2}\left(x_{t-1} - \Psi_t\left[A_t^T R_t^{-1}(x_t - B_t u_t) + \Sigma_{t-1}^{-1}\mu_{t-1}\right]\right)^T \Psi_t^{-1} \left(x_{t-1} - \Psi_t\left[A_t^T R_t^{-1}(x_t - B_t u_t) + \Sigma_{t-1}^{-1}\mu_{t-1}\right]\right) \quad (6)$$
With this, let us find $L_t(x_t)$ by subtracting $L_t(x_{t-1}, x_t)$ from $L_t$,

$$L_t(x_t) = L_t - L_t(x_{t-1}, x_t) \quad (7)$$

Substituting from (5) into (7), and after more algebraic manipulation, it turns out that all terms with $x_{t-1}$ cancel out and we are left with

$$L_t(x_t) = \tfrac{1}{2}(x_t - B_t u_t)^T R_t^{-1}(x_t - B_t u_t) + \tfrac{1}{2}\mu_{t-1}^T \Sigma_{t-1}^{-1}\mu_{t-1} - \tfrac{1}{2}\left[A_t^T R_t^{-1}(x_t - B_t u_t) + \Sigma_{t-1}^{-1}\mu_{t-1}\right]^T \Psi_t \left[A_t^T R_t^{-1}(x_t - B_t u_t) + \Sigma_{t-1}^{-1}\mu_{t-1}\right]$$
The forms of $L_t(x_{t-1}, x_t)$ and $L_t(x_t)$ that we have now tell us two things, that

- $L_t(x_{t-1}, x_t)$ indeed has a Gaussian distribution form, which means $\int \exp\{-L_t(x_{t-1}, x_t)\}\, dx_{t-1}$ evaluates to a constant, and
- $L_t(x_t)$ indeed does not contain any terms with $x_{t-1}$.

With the decomposition of $L_t$ we have now, we can continue developing equation (4),

$$\overline{bel}(x_t) = \eta \int \exp\{-L_t(x_{t-1}, x_t) - L_t(x_t)\}\, dx_{t-1} = \eta \exp\{-L_t(x_t)\} \int \exp\{-L_t(x_{t-1}, x_t)\}\, dx_{t-1} = \eta \exp\{-L_t(x_t)\} \quad (8)$$
We notice from (7) that $L_t(x_t)$ is also an exponential quadratic, this time in $x_t$; as such, we again apply the first and second order differentiation technique to construct a normal distribution (see Appendix). We get,

$$\frac{\partial L_t(x_t)}{\partial x_t} = R_t^{-1}(x_t - B_t u_t) - R_t^{-1} A_t \Psi_t \left[A_t^T R_t^{-1}(x_t - B_t u_t) + \Sigma_{t-1}^{-1}\mu_{t-1}\right], \qquad \frac{\partial^2 L_t(x_t)}{\partial x_t^2} = R_t^{-1} - R_t^{-1} A_t \Psi_t A_t^T R_t^{-1} \quad (9)$$

Using the inversion lemma (see Appendix), we can rewrite the second derivative in (9) as,

$$\frac{\partial^2 L_t(x_t)}{\partial x_t^2} = \left(R_t + A_t \Sigma_{t-1} A_t^T\right)^{-1} \quad (10)$$
The intuitive interpretation of (10) is that the variance of our normal distribution $\overline{bel}(x_t)$ is the sum of,

- the noise in the transition process (1), $R_t$, and
- the variance of the prior period's belief, $\Sigma_{t-1}$, evolved through the linear process $A_t$, i.e., $A_t \Sigma_{t-1} A_t^T$.
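This interpretation can be sanity-checked by simulation: sampling the scalar process $x_t = A\,x_{t-1} + \varepsilon_t$ with $x_{t-1} \sim \mathcal{N}(0, \Sigma_{t-1})$ should give a sample variance close to $A\,\Sigma_{t-1}\,A + R$. A quick Monte-Carlo sketch (the numbers $A=2$, $\Sigma_{t-1}=1$, $R=0.5$ are arbitrary example values):

```python
import random

random.seed(0)
A, sigma_prev, R = 2.0, 1.0, 0.5   # linear process, prior variance, transition noise

# Sample x_t = A * x_{t-1} + eps, with x_{t-1} ~ N(0, sigma_prev), eps ~ N(0, R).
samples = [A * random.gauss(0.0, sigma_prev ** 0.5) + random.gauss(0.0, R ** 0.5)
           for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(var)   # close to A * sigma_prev * A + R = 4.5
```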
Next, we solve for $x_t$ by setting the first derivative in (9) to 0,

$$R_t^{-1}(x_t - B_t u_t) - R_t^{-1} A_t \Psi_t \left[A_t^T R_t^{-1}(x_t - B_t u_t) + \Sigma_{t-1}^{-1}\mu_{t-1}\right] = 0$$

Thus, we obtain the mean of the normal distribution as,

$$\bar{\mu}_t = A_t \mu_{t-1} + B_t u_t \quad (11)$$

With (11) as the mean, and the inverse of (10) as the variance parameter of the normal distribution of (8), we can now say that,

$$\overline{bel}(x_t) = \mathcal{N}\left(x_t;\; A_t \mu_{t-1} + B_t u_t,\; A_t \Sigma_{t-1} A_t^T + R_t\right) \quad (12)$$
Now the last remaining step is to combine the transition distribution with the measurement distribution,

$$bel(x_t) = \eta\, p(z_t \mid x_t)\, \overline{bel}(x_t) \quad (13)$$

Since we now know that both $p(z_t \mid x_t)$ and $\overline{bel}(x_t)$ are normal distributions, with parameters given by (1) and (12) respectively, we can write (13) as,

$$bel(x_t) = \eta \exp\left\{-\tfrac{1}{2}(z_t - C_t x_t)^T Q_t^{-1}(z_t - C_t x_t) - \tfrac{1}{2}(x_t - \bar{\mu}_t)^T \bar{\Sigma}_t^{-1}(x_t - \bar{\mu}_t)\right\} \quad (14)$$

where $\bar{\mu}_t$ and $\bar{\Sigma}_t$ denote the mean and variance in (12).
Let us collect the exponential terms in (14) as $J_t$, such that,

$$J_t = \tfrac{1}{2}(z_t - C_t x_t)^T Q_t^{-1}(z_t - C_t x_t) + \tfrac{1}{2}(x_t - \bar{\mu}_t)^T \bar{\Sigma}_t^{-1}(x_t - \bar{\mu}_t)$$

Again we notice that (14) is exponential quadratic in $x_t$, which means we can apply the first and second order differentiation technique to $J_t$ to determine the mean and variance parameters of its distribution, which we also know at this point is Gaussian,

$$\frac{\partial J_t}{\partial x_t} = -C_t^T Q_t^{-1}(z_t - C_t x_t) + \bar{\Sigma}_t^{-1}(x_t - \bar{\mu}_t), \qquad \frac{\partial^2 J_t}{\partial x_t^2} = C_t^T Q_t^{-1} C_t + \bar{\Sigma}_t^{-1} \quad (15)$$

Setting the first derivative in (15) to zero, and substituting $\Sigma_t = \left(C_t^T Q_t^{-1} C_t + \bar{\Sigma}_t^{-1}\right)^{-1}$ for the inverse of the second derivative,

$$\mu_t = \Sigma_t \left(C_t^T Q_t^{-1} z_t + \bar{\Sigma}_t^{-1} \bar{\mu}_t\right) \quad (16)$$
We now have a complete specification of $bel(x_t)$ as a normal distribution, with mean and variance as specified in (16) and (15), which proves (3). Our proof by induction of (2) is now complete.
Kalman Gain
We could have used (16) and (15) in our Kalman-filter algorithm as below, and called it a day.

$$\Sigma_t = \left(C_t^T Q_t^{-1} C_t + \bar{\Sigma}_t^{-1}\right)^{-1}, \qquad \mu_t = \Sigma_t \left(C_t^T Q_t^{-1} z_t + \bar{\Sigma}_t^{-1} \bar{\mu}_t\right) \quad (17)$$
However, another, more commonly used representation exists. It is also more efficient, as it allows us to avoid evaluating nested inverses. By introducing a Kalman gain term, we can express (17) as,

$$\mu_t = \bar{\mu}_t + K_t(z_t - C_t \bar{\mu}_t), \qquad \Sigma_t = (I - K_t C_t)\, \bar{\Sigma}_t \quad (18)$$
To go from the variance formulation of (17) to that of (18), we first define the Kalman gain as

$$K_t = \bar{\Sigma}_t C_t^T \left(C_t \bar{\Sigma}_t C_t^T + Q_t\right)^{-1} \quad (19)$$

Applying the inversion lemma (see Appendix) to the variance in (17) then gives,

$$\Sigma_t = \left(C_t^T Q_t^{-1} C_t + \bar{\Sigma}_t^{-1}\right)^{-1} = \bar{\Sigma}_t - \bar{\Sigma}_t C_t^T \left(C_t \bar{\Sigma}_t C_t^T + Q_t\right)^{-1} C_t \bar{\Sigma}_t = (I - K_t C_t)\, \bar{\Sigma}_t \quad (20)$$

Next, we show how we can get the mean formulation from (17) to (18). Using $\Sigma_t C_t^T Q_t^{-1} = K_t$ and $\Sigma_t \bar{\Sigma}_t^{-1} = I - K_t C_t$, both of which follow from (19) and (20),

$$\mu_t = \Sigma_t C_t^T Q_t^{-1} z_t + \Sigma_t \bar{\Sigma}_t^{-1} \bar{\mu}_t = K_t z_t + (I - K_t C_t)\bar{\mu}_t = \bar{\mu}_t + K_t(z_t - C_t \bar{\mu}_t) \quad (21)$$
Finally, combining (12), (19), (20), and (21), we obtain the Kalman-filter algorithm.

$$\bar{\mu}_t = A_t \mu_{t-1} + B_t u_t$$
$$\bar{\Sigma}_t = A_t \Sigma_{t-1} A_t^T + R_t$$
$$K_t = \bar{\Sigma}_t C_t^T \left(C_t \bar{\Sigma}_t C_t^T + Q_t\right)^{-1}$$
$$\mu_t = \bar{\mu}_t + K_t(z_t - C_t \bar{\mu}_t)$$
$$\Sigma_t = (I - K_t C_t)\, \bar{\Sigma}_t$$
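The equivalence of the two parameterizations (17) and (18) can also be double-checked numerically. In the scalar case every matrix inverse is just a reciprocal; the numbers below are arbitrary example values.

```python
# Scalar check that the double-inverse form (17) and the Kalman-gain form (18) agree.
bar_mu, bar_sigma = 2.0, 1.5       # predicted belief (arbitrary example)
C, Q, z = 1.0, 1.0, 2.5            # measurement model and observation

# Form (17): nested-inverse parameterization.
sigma_17 = 1.0 / (C / Q * C + 1.0 / bar_sigma)
mu_17 = sigma_17 * (C / Q * z + bar_mu / bar_sigma)

# Form (18): Kalman-gain parameterization, no nested inverses.
K = bar_sigma * C / (C * bar_sigma * C + Q)
sigma_18 = (1.0 - K * C) * bar_sigma
mu_18 = bar_mu + K * (z - C * bar_mu)

print(abs(sigma_17 - sigma_18) < 1e-12, abs(mu_17 - mu_18) < 1e-12)  # True True
```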
Appendix
Constructing Normal distribution
For a normal probability density function $\mathcal{N}(x;\, \mu, \Sigma)$, its pdf has the form

$$p(x) = \det(2\pi\Sigma)^{-1/2} \exp\left\{-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right\}$$

Let the expression in the exponential be denoted as $L(x) = \tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)$; then the parameters of the normal distribution can be obtained as,

- $\mu$: the solution of $\frac{\partial L}{\partial x}$ set to 0, and
- $\Sigma$: the inverse of $\frac{\partial^2 L}{\partial x^2}$.
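In one dimension this recipe is easy to verify numerically: take the exponent of a $\mathcal{N}(m, s^2)$ density, find the root of its first derivative, and invert its second derivative. The sketch below uses finite differences and bisection; $m = 3$ and $s^2 = 4$ are arbitrary example values.

```python
# Verify the appendix recipe in 1-D for L(x) = (x - m)^2 / (2 * s2),
# the exponent of a N(m, s2) density.
m, s2 = 3.0, 4.0

def L(x):   return (x - m) ** 2 / (2 * s2)
def dL(x):  return (L(x + 1e-5) - L(x - 1e-5)) / 2e-5            # first derivative
def d2L(x): return (L(x + 1e-5) - 2 * L(x) + L(x - 1e-5)) / 1e-10  # second derivative

# Solve dL(x) = 0 by bisection on a bracketing interval.
lo, hi = -100.0, 100.0
for _ in range(200):
    mid = (lo + hi) / 2
    if dL(lo) * dL(mid) <= 0:
        hi = mid
    else:
        lo = mid

mean = (lo + hi) / 2        # recovers m = 3.0
variance = 1.0 / d2L(mean)  # recovers s2 = 4.0
print(mean, variance)
```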
Inversion Lemma
The inversion lemma (also known as the Woodbury matrix identity) states that for invertible matrices of the correct sizes, the following holds,

$$\left(A + UCV\right)^{-1} = A^{-1} - A^{-1} U \left(C^{-1} + V A^{-1} U\right)^{-1} V A^{-1}$$
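A quick scalar sanity check of the identity, where every "matrix" is a float (the values are arbitrary):

```python
# Scalar instance of the Woodbury identity:
# (A + U*C*V)^-1 == A^-1 - A^-1 * U * (C^-1 + V * A^-1 * U)^-1 * V * A^-1
A, U, C, V = 2.0, 1.0, 3.0, 1.0

lhs = 1.0 / (A + U * C * V)
rhs = 1.0 / A - (1.0 / A) * U * (1.0 / (1.0 / C + V * (1.0 / A) * U)) * V * (1.0 / A)

print(abs(lhs - rhs) < 1e-12)  # True
```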
References
Thrun, Sebastian, et al. Probabilistic Robotics. MIT Press, 2010.
Comments

In equation (4) and the terms thereafter, shouldn't R_t be replaced by the total covariance in the expression above it, i.e. (A \Sigma_{t-1} A^T + R_t)? Because if (x_t | x_{t-1}, u_t) is distributed as mentioned in the expression above eq. (4), then the covariance term should correspond to that, right?
Good catch! The equation directly above (4) is inaccurate. “A \Sigma_{t-1} A^T + R_t” is the variance of “x_t”, not “x_t | x_{t-1}, u_t”. The covariance for “x_t | x_{t-1}, u_t” should have been stated as R instead. To see this, evaluate the conditional variance using https://en.wikipedia.org/wiki/Conditional_variance#Definition, the x_{t-1} term will cancel out and you’ll be left with R.
Oh yeah, that makes sense. I didn’t think too much about that conditional expression above eq. (4). I took that as the reference and then started working out the remaining equations. I should have thought about that.
Thank you for your prompt response, and, of course, for this derivation!