Let’s start with what a Kalman filter is: it’s a method of predicting the future state of a system based on its previous states. To understand what it does, take a look at the following data. If you were given the data in blue, it may be reasonable to predict that the green dot should follow, simply by extrapolating the linear trend from the previous few samples. However, how confident would you be predicting the dark red point on the right using that method? And how confident would you be about predicting the green point if you were given the red series instead of the blue?
From this simple example, we can learn three important principles:
Now, let’s try and use the above to model our prediction.
The first thing we need is a state. The state is a description of all the parameters we will need to describe the current system and perform the prediction. For the example above, we’ll use two numbers: the current vertical position (y), and our best estimate of the current slope (let’s call it m). Thus, the state is in general a vector, commonly denoted x, and you can of course include many more parameters in it if you wish to model more complex systems.
The next thing we need is a model: The model describes how we think the system behaves. In an ordinary Kalman filter, the model is always a linear function of the state. In our simple case, our model is:
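As a minimal sketch of such a linear model, assume the samples are one time unit apart and the slope stays constant between samples, so that the position advances by m at each step while m itself is unchanged. Under that assumption the model is a matrix applied to the state. The names F and predict_state are illustrative choices, not notation from the text above.

```python
# State is [y, m]: current vertical position and current slope estimate.
# Assumed linear model (unit time step): y_next = y + m, m_next = m.
F = [[1.0, 1.0],
     [0.0, 1.0]]

def predict_state(x):
    """Apply the linear model F to a state vector x = [y, m]."""
    return [sum(F[i][j] * x[j] for j in range(len(x))) for i in range(len(F))]

x = [2.0, 0.5]           # position 2.0, slope 0.5
print(predict_state(x))  # [2.5, 0.5]
```

Writing the model as a matrix rather than as two separate formulas is what lets the same machinery handle states with many more parameters.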
Of course, our model isn’t perfect (else we wouldn’t need a Kalman filter!), so we add an additional term to the model: the process noise, v_t, which is assumed to be normally distributed. Although we don’t know the actual value of the noise, we assume we can estimate how “large” the noise is, as we shall presently see. All this gives us the state equation, which is simply:
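To make the role of the process noise concrete, here is a small simulation sketch of a state equation of the form "next state = model applied to the current state, plus noise". It assumes the constant-slope model with a unit time step and independent zero-mean Gaussian noise on each state component. The matrix F, the function step, and the noise sizes are illustrative assumptions.

```python
import random

# Assumed model matrix for the state [y, m] with a unit time step.
F = [[1.0, 1.0],
     [0.0, 1.0]]

def step(x, noise_std, rng):
    """Advance the state one step: apply F, then add Gaussian process noise.

    noise_std holds one standard deviation per state component; this is
    the "size" of the noise we assume we can estimate.
    """
    predicted = [sum(F[i][j] * x[j] for j in range(len(x))) for i in range(len(F))]
    return [p + rng.gauss(0.0, s) for p, s in zip(predicted, noise_std)]

rng = random.Random(0)   # seeded so the run is repeatable
x = [0.0, 1.0]           # start at position 0 with slope 1
for _ in range(5):
    x = step(x, noise_std=[0.1, 0.01], rng=rng)
    print(x)
```

With the noise sizes set to zero the simulation reduces to the pure linear model, which is one way to check that the noise term is the only source of randomness.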
The third and final part we are missing is the measurement. When we get new data, our parameters should change slightly to refine our current model and the next predictions. What is important to understand is that one does not have to measure exactly the same parameters as those in the state. For instance, a Kalman filter describing the motion of a car may want to predict the car’s acceleration, velocity, and position, but only measure, say, the wheel angle and rotational velocity. In our example, we only “measure” the vertical position of the new points, not the slope. That is:
In the more general case, we may have more than one measurement, so the measurement is a vector, denoted by z. Also, the measurements themselves are noisy, so the general measurement equation takes the form:
Where w is the measurement noise, and H is, in general, a matrix whose width is the number of state variables and whose height is the number of measurement variables.
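For the running example, where only the vertical position is measured, H has one row (one measurement) and two columns (two state variables), and simply picks out y. The sketch below assumes zero-mean Gaussian measurement noise. The function name measure and the noise size are illustrative assumptions.

```python
import random

# One measurement (the vertical position) from two state variables [y, m]:
# H is 1 row by 2 columns and selects y, ignoring the slope.
H = [[1.0, 0.0]]

def measure(x, meas_std, rng):
    """Return a noisy measurement vector z = H x + w, with w ~ N(0, meas_std^2)."""
    return [
        sum(H[i][j] * x[j] for j in range(len(x))) + rng.gauss(0.0, meas_std)
        for i in range(len(H))
    ]

rng = random.Random(0)   # seeded so the run is repeatable
z = measure([2.5, 0.5], meas_std=0.2, rng=rng)
print(z)                 # a single noisy reading near 2.5
```

Note that the slope never appears in the measurement: the filter has to infer it from how successive position readings change.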
Now that we understand what goes into modeling the system, we can start with the prediction stage, the heart of the Kalman filter.
The difference y (also called the innovation) represents how wrong our current estimation is – if everything was perfect, the difference would be zero! To incorporate this into our model, we add the innovation to our state equation, multiplied by a matrix factor that tells us how much the state should change based on this difference between the expected and actual measurements:
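The correction step can be sketched as follows, taking the innovation to be the actual measurement minus the measurement expected from the predicted state. The gain matrix W below is a fixed, made-up value chosen only to show the mechanics; in a real Kalman filter it is computed from the uncertainties rather than hard-coded. The function name update is likewise an illustrative assumption.

```python
# Measurement matrix for the running example: we observe y only.
H = [[1.0, 0.0]]

# Hypothetical fixed gain (2 state variables x 1 measurement), for illustration only.
W = [[0.5],
     [0.25]]

def update(x_pred, z):
    """Correct a predicted state using the innovation z - H x_pred."""
    expected = [sum(H[i][j] * x_pred[j] for j in range(len(x_pred)))
                for i in range(len(H))]
    innovation = [z[i] - expected[i] for i in range(len(z))]
    return [x_pred[i] + sum(W[i][k] * innovation[k] for k in range(len(innovation)))
            for i in range(len(x_pred))]

# Predicted position 2.5, but we measured 3.5: the innovation is 1.0.
print(update([2.5, 0.5], [3.5]))  # [3.0, 0.75]
```

Even though only the position is measured, the second row of the gain lets the innovation adjust the slope estimate as well. When the measurement matches the expectation exactly, the state is left unchanged.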
The matrix W is known as the Kalman gain, and its determination is where things get messy, but understanding why the prediction takes this form is the really important part. But before we get into the formula for W, we should give thought to what it should look like:
One way to evaluate the uncertainty of a value is to look at its variance. The first variance we care about is the variance of our prediction of the state:
https://math.stackexchange.com/questions/840662/an-explanation-of-the-kalman-filter
https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/