
What are eigenvectors and eigenvalues?


Introduction

Eigenvectors and eigenvalues have many important applications in computer vision and machine learning in general. Well-known examples are PCA (Principal Component Analysis) for dimensionality reduction or EigenFaces for face recognition. An interesting use case of eigenvectors and eigenvalues is also illustrated in my post about error ellipses. Furthermore, eigendecomposition forms the basis of the geometric interpretation of covariance matrices, discussed in a more recent post. In this article, I will provide a gentle introduction to this mathematical concept, and will show how to manually obtain the eigendecomposition of a 2×2 square matrix.

An eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it. Consider the image below in which three vectors are shown. The green square is only drawn to illustrate the linear transformation that is applied to each of these three vectors.

Eigenvectors (red) do not change direction when a linear transformation (e.g. scaling) is applied to them. Other vectors (yellow) do.

The transformation, in this case, is a simple scaling with factor 2 in the horizontal direction and factor 0.5 in the vertical direction, such that the transformation matrix A is defined as:

A=\begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}.

A vector \vec{v} = (x, y) is then scaled by applying this transformation as \vec{v}\,' = A\vec{v}. The above figure shows that the direction of some vectors (shown in red) is not affected by this linear transformation. These vectors are called eigenvectors of the transformation, and uniquely define the square matrix A. This unique, deterministic relation is exactly the reason that those vectors are called ‘eigenvectors’ (Eigen means ‘specific’ in German).
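As a small illustration (my own sketch, not part of the original figure), the following NumPy snippet applies A to a vector along the x-axis, which is an eigenvector of this scaling, and to a diagonal vector, which is not; comparing normalized directions shows that only the eigenvector keeps its direction:

import numpy as np

# Scaling transformation from the text: factor 2 horizontally, factor 0.5 vertically
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])

def direction(v):
    """Return the unit vector pointing in the direction of v."""
    return v / np.linalg.norm(v)

eigvec = np.array([1.0, 0.0])   # lies along the x-axis: an eigenvector of A
other  = np.array([1.0, 1.0])   # a diagonal vector: not an eigenvector of A

print(direction(A @ eigvec), direction(eigvec))  # same direction, only the length changed
print(direction(A @ other),  direction(other))   # the direction has changed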

In general, the eigenvector \vec{v} of a matrix A is the vector for which the following holds:

(1) A \vec{v} = \lambda \vec{v}

where \lambda is a scalar value called the ‘eigenvalue’. This means that the linear transformation A on vector \vec{v} is completely defined by \lambda.

We can rewrite equation (1) as follows:

(2) A \vec{v} - \lambda \vec{v} = 0 \\ \Rightarrow (A - \lambda I) \vec{v} = 0 

where I is the identity matrix of the same dimensions as A.

However, assuming that \vec{v} is not the null-vector, equation (2) can only have a solution if A-\lambda I is not invertible. If a square matrix is not invertible, its determinant must equal zero. Therefore, to find the eigenvalues of A, we simply have to solve the following equation:

(3) Det(A - \lambda I) = 0

In the following sections, we will determine the eigenvectors and eigenvalues of a matrix A, by solving equation (3). Matrix A in this example is defined by:

(4) A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}

Calculating the eigenvalues

To determine the eigenvalues for this example, we substitute A in equation (3) with equation (4) and obtain:

(5) Det\begin{pmatrix}2-\lambda&3\\2&1-\lambda\end{pmatrix}=0

Calculating the determinant gives:

(6) &(2-\lambda)(1-\lambda) - 6 = 0\\ \Rightarrow &2 - 2 \lambda - \lambda + \lambda^2 -6 = 0\\ \Rightarrow &{\lambda}^2 - 3 \lambda -4 = 0 

To solve this quadratic equation in \lambda, we find the discriminant:

D = b^2 -4ac = (-3)^2 -4*1*(-4) = 9+16 = 25.

Since the discriminant is strictly positive, this means that two different values for \lambda exist:

(7) \lambda _1 &= \frac{-b - \sqrt{D}}{2a} = \frac{3-5}{2} = -1,\\ \lambda _2 &= \frac{-b + \sqrt{D}}{2a} = \frac{3+5}{2} = 4.

We have now determined the two eigenvalues \lambda_1 and \lambda_2. Note that a square matrix of size N \times N always has exactly N eigenvalues (counted with multiplicity, and possibly complex), each with a corresponding eigenvector. Each eigenvalue specifies the factor by which its eigenvector is scaled under the transformation.
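As a quick sanity check (my addition, not part of the original derivation), the roots of this characteristic polynomial and NumPy's eigenvalue routine agree on these two values:

import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# Roots of the characteristic polynomial λ² - 3λ - 4 = 0
print(np.roots([1, -3, -4]))   # [ 4. -1.]
print(np.linalg.eigvals(A))    # same two eigenvalues (order may differ)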

Calculating the first eigenvector

We can now determine the eigenvectors by plugging the eigenvalues from equation (7) into equation (1) that originally defined the problem. The eigenvectors are then found by solving this system of equations.

We first do this for eigenvalue \lambda_1, in order to find the corresponding first eigenvector:

\begin{bmatrix}2&3\\2&1\end{bmatrix} \begin{bmatrix}x_{11}\\x_{12}\end{bmatrix} = -1 \begin{bmatrix}x_{11}\\x_{12}\end{bmatrix}.

Since this is simply the matrix notation for a system of equations, we can write it in its equivalent form:

(8) \left\{ \begin{array}{lr} 2x_{11} + 3x_{12} = -x_{11}\\ 2x_{11} + x_{12} = -x_{12} \end{array} \right.

and solve the first equation as a function of x_{12}, resulting in:

(9) x_{11} = -x_{12}.

Since an eigenvector simply represents an orientation (the corresponding eigenvalue represents the magnitude), all scalar multiples of the eigenvector are vectors that are parallel to this eigenvector and are therefore equivalent (If we would normalize the vectors, they would all be equal). Thus, instead of further solving the above system of equations, we can freely choose a real value for either x_{11} or x_{12}, and determine the other one by using equation (9).

For this example, we arbitrarily choose x_{12}=1, such that x_{11}=-1. Therefore, the eigenvector that corresponds to eigenvalue \lambda_{1}=-1 is

(10) \vec{v}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. 

Calculating the second eigenvector

Calculations for the second eigenvector are similar to those needed for the first eigenvector. We now substitute eigenvalue \lambda_2=4 into equation (1), yielding:

(11) \begin{bmatrix}2&3\\2&1\end{bmatrix} \begin{bmatrix}x_{21}\\x_{22}\end{bmatrix} = 4 * \begin{bmatrix}x_{21}\\x_{22}\end{bmatrix}.

Written as a system of equations, this is equivalent to:

(12) \left\{ \begin{array}{lr} 2x_{21} + 3x_{22} = 4x_{21}\\ 2x_{21} + x_{22} = 4x_{22} \end{array} \right.

Solving the first equation as a function of x_{22} results in:

(13) x_{21} = \frac{3}{2}x_{22}

We then arbitrarily choose x_{22}=2 and find x_{21}=3. Therefore, the eigenvector that corresponds to eigenvalue \lambda_2=4 is

(14) \vec{v}_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}.
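For completeness, here is a small numerical check (my addition) that both eigenpairs indeed satisfy equation (1):

import numpy as np

A  = np.array([[2.0, 3.0],
               [2.0, 1.0]])
v1 = np.array([-1.0, 1.0])   # eigenvector for λ1 = -1
v2 = np.array([ 3.0, 2.0])   # eigenvector for λ2 = 4

print(np.allclose(A @ v1, -1 * v1))   # True
print(np.allclose(A @ v2,  4 * v2))   # True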

A large set of data can be represented as a matrix, and we often need to compress such a matrix to speed up our calculations. Multiplying a matrix by a vector produces a new vector; a matrix used in this way is called a transformation matrix, and the result is the transformed vector. Hold that thought for now!

The new vector can be considered to be in two forms:

  1. Sometimes, the new transformed vector is just a scaled form of the original vector. This means that the new vector can be re-calculated by simply multiplying the original vector by a scalar (number), just as with the eigenvectors in the worked example above.
  2. And other times, the transformed vector has no direct scalar relationship with the original vector that we multiplied by the matrix.

If the new transformed vector is just a scaled form of the original vector, then the original vector is an eigenvector of the matrix. Vectors with this characteristic are special, and they are known as eigenvectors. Eigenvectors can be used to represent, in compact form, the information contained in a large matrix.

Therefore, if our input is a large sparse matrix M, then we can find a vector o that can replace the matrix M. The criterion is that the product of matrix M and vector o should equal the product of vector o and a scalar n: M * o = n * o

This means that a matrix M and a vector o can be replaced by a scalar n and a vector o.

In this instance, o is the eigenvector and n is the eigenvalue and our target is to find o and n.

Therefore, an eigenvector is a vector whose direction does not change when a transformation is applied to it; it only becomes a scaled version of the original vector.

Eigenvectors can help us calculate an approximation of a large matrix as a smaller vector. There are many other uses which I will explain later on in the article.

Eigenvectors are used to make linear transformations understandable. Think of eigenvectors as directions that are only stretched or compressed by a transformation, without changing direction.

Eigenvalue: the scalar by which an eigenvector is stretched (or compressed) by the transformation.

Eigenvalues and eigenvectors are all about representing a large matrix with a few vectors and their associated values. If a square matrix has size n × n, then it has n eigenvalues, and n corresponding eigenvectors can be computed to represent the matrix.

Visual Explanation:

Linear Transformations

Suppose A is a matrix of size m×n. Given a vector x in R^n, define T(x) = Ax.

Then T is a linear transformation from R^n to R^m.

How is this used? Suppose you want to scale a 2d vector by a factor of 2 along the x-axis and 3 along the y-axis. Say the vector v is [1, 4] then after scaling it should be [2, 12]. This can be done as follows:
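\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 \cdot 1 + 0 \cdot 4 \\ 0 \cdot 1 + 3 \cdot 4 \end{bmatrix} = \begin{bmatrix} 2 \\ 12 \end{bmatrix}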

This may look trivial for one vector. But suppose you have n 2D vectors that you want to scale; you can transform all of them at once with only one matrix multiplication operation. Linear transformations are widely used in the fields of computer graphics, game engines, statistics, etc.
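As a small illustration of this batching idea (my own sketch, with arbitrarily chosen vectors), stacking the vectors as columns lets a single matrix multiplication scale them all:

import numpy as np

# Scale by 2 along x and by 3 along y
S = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# Each column is one 2D vector: (1, 4), (2, 1) and (0, 5)
vectors = np.array([[1.0, 2.0, 0.0],
                    [4.0, 1.0, 5.0]])

# One matrix multiplication transforms all vectors at once
print(S @ vectors)
# [[ 2.  4.  0.]
#  [12.  3. 15.]]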

This operation is not limited to scaling: we can also use linear transformation matrices for flipping vectors, rotating vectors, shearing vectors, etc.
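For reference (a textbook result, not specific to this article), the standard 2D matrices for counter-clockwise rotation by an angle \theta, horizontal shear by a factor k, and reflection about the x-axis are:

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad S(k) = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad F = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}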

Going back to Eigenvectors and Eigenvalues

Suppose we have a square represented in 2D space, where every point on the square is a vector. Of these, I will use only the three vectors shown below.

Suppose we scale the square by a factor of 2 along the y-axis as shown below

Scaling by a factor of 2 along the y-axis

If you notice, the red vector has the same scale and direction after the linear transformation. The green vector changes in scale but still has the same direction. The yellow vector, however, not only changes in scale but also in direction: its angle with the x-axis increases. If we look closely, apart from the red and green vectors, all the other vectors’ directions changed. Hence we can say the red and green vectors are special: they are characteristic of this linear transformation. These vectors are called eigenvectors of the transformation, and their change in scale due to the transformation is called their eigenvalue. For the red vector the eigenvalue is 1, since its scale is the same before and after the transformation, whereas for the green vector the eigenvalue is 2, since it is scaled up by a factor of 2.

Let’s have a look at another linear transformation where we shear the square along the x-axis.

Shear along x-axis

If you guessed the red vector to be the eigenvector, you’re right and its eigenvalue is 1.

What if we rotate the square 90 degrees clockwise?

Rotate by 90 degrees clockwise

Here there are no real eigenvectors (academically speaking, there are complex eigenvectors in this case, but they are beyond the scope of this article, so for simplicity let’s say this case has no eigenvectors). What if instead of 90 degrees we rotated the square by 180 degrees?
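As a quick numerical aside (my addition), NumPy confirms that the 90-degree clockwise rotation matrix has only complex eigenvalues, i.e. no real eigenvectors:

import numpy as np

# 90-degree clockwise rotation in 2D
R = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

eigenvalues, eigenvectors = np.linalg.eig(R)
print(eigenvalues)   # a complex-conjugate pair (±1j): no real eigenvalues or eigenvectors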

Rotate by 180 degrees clockwise

Here all the vectors, including the three colored ones, are eigenvectors, with an eigenvalue of -1.

Let’s have a look at a special case where we scale the square equally along the x and y-axis.

Scaling equally along the x and y-axis

Here all the vectors are eigenvectors and their eigenvalue would be the scale factor.

Now let’s go back to Wikipedia’s definition of eigenvectors and eigenvalues:

If T is a linear transformation from a vector space V over a field F into itself and v is a vector in V that is not the zero vector, then v is an eigenvector of T if T(v) is a scalar multiple of v. This condition can be written as the equation

T ( v ) = λ v

where λ is a scalar in the field F, known as the eigenvalue, characteristic value, or characteristic root associated with the eigenvector v.

Let’s see how the equation works for the first case we saw, where we scaled a square by a factor of 2 along the y-axis and the red and green vectors were the eigenvectors.

Scaling by a factor of 2 along the y-axis

Before the linear transformation, the red vector lies along the x-axis and the green vector lies along the y-axis.

After the linear transformation, the red vector is unchanged, while the green vector is stretched to twice its length.

In terms of the equation above, T(v_red) = 1 · v_red and T(v_green) = 2 · v_green, so their eigenvalues are 1 and 2 respectively.

Let’s scale the problem up to 3 dimensions. Suppose we rotate a cube about the z-axis; then the vector along the z-axis would be an eigenvector with an eigenvalue of 1.
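Here is a minimal NumPy sketch of that 3D case (my own example, using an arbitrary 45-degree rotation about the z-axis):

import numpy as np

theta = np.pi / 4   # rotate 45 degrees about the z-axis
c, s = np.cos(theta), np.sin(theta)
Rz = np.array([[c, -s, 0.0],
               [s,  c, 0.0],
               [0.0, 0.0, 1.0]])

z_axis = np.array([0.0, 0.0, 1.0])
print(np.allclose(Rz @ z_axis, 1 * z_axis))   # True: the z-axis is an eigenvector with eigenvalue 1
print(np.linalg.eigvals(Rz))                  # one eigenvalue is 1, the other two are complex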

It gets hard to visualize and figure out the eigenvectors if we go above 3 dimensions. Even in some cases in 2d or 3d, it’s not that easy. So how do we figure out the eigenvectors?

We know that for any eigenvector v,

T(v) = λv

Let’s say the transformation matrix is A. Therefore,

T(v) = Av

From both of the above equations, we can deduce that

Av = λv

Moving the terms together,

Av - λv = 0

Now λ is just a scalar, so we cannot factor v out directly, since A is a matrix. To do so we can multiply λ with an identity matrix I. Therefore,

Av - λIv = 0  ⇒  (A - λI)v = 0

Now, for this product to be the zero vector, either (A - λI) must be singular or v must be the zero vector. But remember from the definition that an eigenvector is a non-zero vector, so (A - λI) must be singular for v to be an eigenvector. We can check whether a matrix is singular by calculating its determinant: a matrix is singular exactly when its determinant is 0.

Therefore,

Det(A - λI) = 0

Let’s see if this works using the same example of scaling a square by a factor of 2 along the y-axis.

Here the transformation matrix A can be written as:

A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}

Now we know that

Det(A - λI) = 0

Putting in the values of A and solving further:

Det\begin{pmatrix} 1-\lambda & 0 \\ 0 & 2-\lambda \end{pmatrix} = (1-\lambda)(2-\lambda) = 0

This gives us the two eigenvalues:

λ = 1 and λ = 2

Solving (A - λI)v = 0 for λ = 1 we get:

\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0 \Rightarrow v_2 = 0

This means that any vector where v2 = 0 is an eigenvector with eigenvalue 1. It’s true for any horizontal vector, which in our case was the red vector.

Solving for λ = 2 we get:

\begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0 \Rightarrow v_1 = 0

This means that any vector where v1 = 0 is an eigenvector with eigenvalue 2. It’s true for any vertical vector, which in our case was the green vector.
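As a numerical double-check of this example (my addition, assuming the same matrix A used above):

import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [1. 2.]
print(eigenvectors)   # columns: [1, 0] for λ = 1 (horizontal), [0, 1] for λ = 2 (vertical)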

Applications:

1. There are multiple uses of eigenvalues and eigenvectors. They are important in linear differential equations, where you want to find a rate of change or maintain relationships between two variables. Think of eigenvalues and eigenvectors as providing a summary of a large matrix.

2. We can represent a large set of information in a matrix, but performing computations on a large matrix is a very slow process. One of the key methodologies for improving efficiency in computationally intensive tasks is to reduce the number of dimensions while retaining most of the key information. Eigenvalues and eigenvectors are used to capture that key information stored in a large matrix. This technique can also be used to improve the performance of data-churning components.

3. Principal component analysis (PCA) is one of the key strategies used to reduce the dimensionality of a dataset without losing valuable information. Its core is built on the concept of eigenvalues and eigenvectors: it computes the eigenvectors and eigenvalues of the covariance matrix of the features (a short PCA sketch is given after this list).

4. Additionally, eigenvectors and eigenvalues are used in facial recognition techniques such as EigenFaces.

5. They are used to reduce dimension space. The technique of Eigenvectors and Eigenvalues is used to compress the data. As mentioned above, many algorithms such as PCA rely on eigenvalues and eigenvectors to reduce the dimensions.

6. Eigenvalues are also used in regularisation and they can be used to prevent overfitting.

Eigenvectors and eigenvalues are used to reduce noise in data. They can help us improve efficiency in computationally intensive tasks. They also eliminate features that have a strong correlation between them and also help in reducing over-fitting.

7. Occasionally we gather data that contains a large amount of noise. Finding important or meaningful patterns within the data can be extremely difficult. Eigenvectors and eigenvalues can be used to construct spectral clustering. They are also used in singular value decomposition.

8. We can also use eigenvectors to rank items in a dataset. They are heavily used in search engines and calculus.

9. Lastly, in non-linear motion dynamics, eigenvalues and eigenvectors can be used to help us understand the data better, as they can be used to transform and represent data as manageable sets.

Having said that, it can be slow to compute eigenvectors and eigenvalues: the computation is O(n³) for an n × n matrix.
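As a rough illustration of point 3 above (a minimal sketch using randomly generated toy data; the array shapes and the choice of two components are my own assumptions, not from the article), PCA can be performed by eigendecomposing the covariance matrix of the features and projecting onto the top eigenvectors:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # 200 samples, 5 features (toy data)

X_centered = X - X.mean(axis=0)         # center each feature
cov = np.cov(X_centered, rowvar=False)  # 5x5 covariance matrix of the features

# Covariance matrices are symmetric, so eigh is the appropriate routine
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Keep the 2 eigenvectors with the largest eigenvalues (most variance)
top = eigenvectors[:, np.argsort(eigenvalues)[::-1][:2]]
X_reduced = X_centered @ top            # project the data down to 2 dimensions
print(X_reduced.shape)                  # (200, 2)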

Calculate Eigenvalues and Eigenvectors in Python

Although we don’t have to calculate the eigenvalues and eigenvectors by hand, it is important to understand the inner workings to be able to confidently use the algorithms. Furthermore, it is very straightforward to calculate eigenvalues and eigenvectors in Python.

We can use the numpy.linalg.eig function. It takes a square matrix as input and returns its eigenvalues and eigenvectors. It raises a LinAlgError if the eigenvalue computation does not converge.

import numpy as np
from numpy import linalg as LA

# Build the square matrix whose eigenpairs we want
matrix = np.array([[2, -1],
                   [4, 3]])

# w holds the eigenvalues, v holds the corresponding eigenvectors (as columns)
w, v = LA.eig(matrix)
print(w)
print(v)
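Note that for this particular matrix the characteristic equation is λ² - 5λ + 10 = 0, whose discriminant is negative, so eig returns a pair of complex-conjugate eigenvalues (approximately 2.5 ± 1.94i) along with complex eigenvectors. Replacing the matrix with the one from equation (4) would reproduce the real eigenvalues -1 and 4 that we computed by hand above.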

Conclusion

In this article, we reviewed the theoretical concepts of eigenvectors and eigenvalues. These concepts are of great importance in many techniques used in computer vision and machine learning, such as dimensionality reduction using PCA, or face recognition using EigenFaces.

Reference:

https://www.visiondummy.com/

https://medium.com/fintechexplained/what-are-eigenvalues-and-eigenvectors-a-must-know-concept-for-machine-learning-80d0fd330e47

https://towardsdatascience.com/eigenvectors-and-eigenvalues-all-you-need-to-know-df92780c591f
