Here is a mathematical picture that I was looking at recently:
On the left-hand side of the ‘approximately equal to’ sign (≈) we have a picture of a cube, which represents a tensor.
A matrix is a 2-dimensional grid of numbers and a tensor is a higher-dimensional version of a matrix. In this case the tensor pictured has three dimensions: the two usual directions contained within the screen and the one extra direction that appears to go into the screen.
The equation says that this tensor can be approximated by a smaller 3D cube together with three 2D rectangles.
What’s the mathematics going on behind this idea?
The idea is to approximate a large, cumbersome tensor by a smaller, more manageable one. Most of the important information contained in the larger tensor’s numbers can be represented more concisely by the smaller one, and its smaller size makes it more amenable to computation and analysis.
Then there are the three rectangles. One looks like a parallelogram but is actually a rectangle going ‘into the page’ seen in perspective. In fact, each of the three rectangles follows one of the directions of the smaller tensor: its short side matches the length of the small cube in that direction, and its long side matches the length of the big cube. They represent matrices (linear maps) which convert us from the small tensor back to the large tensor. They are the translation tools from the world of the small cube to the world of the big cube. When we apply each matrix along its corresponding direction of the small tensor, we get back a tensor the same size as the original big one, and with similar entries.
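To make ‘applying a matrix along a direction’ concrete, here is a minimal NumPy sketch. The sizes and the names core, A, B and C are made up for illustration and the entries are random, so this only shows how the shapes fit together, not a meaningful approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a small 2 x 3 x 4 cube and a big 10 x 11 x 12 cube.
core = rng.standard_normal((2, 3, 4))

# One rectangle (matrix) per direction: long side = big cube, short side = small cube.
A = rng.standard_normal((10, 2))  # acts along the first direction
B = rng.standard_normal((11, 3))  # acts along the second direction
C = rng.standard_normal((12, 4))  # acts along the third ('into the screen') direction

# big[i, j, k] = sum over a, b, c of core[a, b, c] * A[i, a] * B[j, b] * C[k, c]
big = np.einsum("abc,ia,jb,kc->ijk", core, A, B, C)

print(big.shape)  # (10, 11, 12)
```

Each matrix stretches exactly one direction of the small cube, which is why the picture draws each rectangle lined up with one edge of it.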
As well as being mathematically accurate, the picture is intuitively pleasing because the three rectangles look like they are unpacking the small tensor to get us back to the large tensor.
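More mathematically speaking: this approximation is called a low multilinear rank approximation, and writing a tensor as a small cube with one matrix per direction is known as a Tucker decomposition. It is a higher-dimensional analogue of the ubiquitous singular value decomposition (SVD). Truncating the singular value decomposition of a matrix, that is, keeping only its largest singular values, gives us the best low-rank (‘small’) approximation to that matrix.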
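For the matrix case, the ‘best low-rank approximation’ can be sketched in a few lines of NumPy. The helper name best_rank_k and the random test matrix are illustrative choices; ‘best’ here is measured in the Frobenius norm (the square root of the sum of squared entries).

```python
import numpy as np

def best_rank_k(M, k):
    """Keep the k largest singular values of M and discard the rest."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 6))   # an arbitrary example matrix
M2 = best_rank_k(M, 2)            # its rank-2 approximation

# The approximation error is exactly the size of what was thrown away:
# the square root of the sum of squares of the discarded singular values.
s = np.linalg.svd(M, compute_uv=False)
err = np.linalg.norm(M - M2)
print(np.isclose(err, np.sqrt(np.sum(s[2:] ** 2))))
```

No other rank-2 matrix gets closer to M in this norm, which is the guarantee that does not carry over cleanly to tensors.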
Here in the higher-dimensional tensor situation, things are more challenging. We cannot make guarantees about our approximation being the best, but in practice this process gives us a smaller tensor whose information is a decent summary. It is a useful tool to understand big tensors of data.
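The picture itself does not say which algorithm produces the small cube and the three rectangles. One standard choice, assumed here purely for illustration, is the truncated higher-order SVD: flatten the tensor along each direction in turn, take an ordinary SVD, and keep the leading singular vectors as that direction’s matrix. The function names and the test tensor below are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Flatten a tensor into a matrix whose rows run along the given direction."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_hosvd(T, ranks):
    """Sketch of a truncated higher-order SVD for a 3-dimensional tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])  # leading r left singular vectors
    A, B, C = factors
    # Small cube: project the big tensor onto the chosen vectors in each direction.
    core = np.einsum("ijk,ia,jb,kc->abc", T, A, B, C)
    return core, factors

def reconstruct(core, factors):
    """Apply the three matrices to the small cube to get back a big tensor."""
    A, B, C = factors
    return np.einsum("abc,ia,jb,kc->ijk", core, A, B, C)

# Build a 10 x 11 x 12 test tensor that is nearly of multilinear rank (2, 3, 4).
rng = np.random.default_rng(0)
hidden_core = rng.standard_normal((2, 3, 4))
hidden_factors = [rng.standard_normal((n, r)) for n, r in [(10, 2), (11, 3), (12, 4)]]
T = reconstruct(hidden_core, hidden_factors) + 0.01 * rng.standard_normal((10, 11, 12))

core, factors = truncated_hosvd(T, (2, 3, 4))
approx = reconstruct(core, factors)
rel_err = np.linalg.norm(T - approx) / np.linalg.norm(T)
print(core.shape, rel_err)
```

The result is generally not the best possible approximation of that size, but it is usually a good one, and it is often used as the starting point for iterative methods that refine it.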