Tensor approximation of a colour photo

This post is about using low-rank tensor approximations to roughly capture the information in a colour photograph. As far as I know, it is not used in practice as part of a method for compressing colour photos. However, it gives a nice illustration of what’s happening when you find a low-rank tensor approximation, such as is done for compressing tensor data of other kinds.

First, we’ll talk about the simpler case of a greyscale (black and white) photograph. Such a photo has one number for each pixel which describes the brightness of that pixel. The numbers range from 0=black to 1=white to indicate how bright that part of the photo should be.

A standard size for a photo is 1000 x 667 pixels. This means the height of the photo is 1000 pixels and the width is 667 pixels. The information of the photo is a grid of numbers (a matrix) of size 1000 x 667, where each number in the matrix is the brightness of the corresponding pixel.

For a colour photograph, each pixel is described by three numbers, ranging from 0 to 1, that describe the redness, blueness and greenness of that pixel. The photo is described by a 3D array (a tensor) of size 1000 x 667 x 3. The height and width dimensions of the image have stayed the same, but we now have an extra “RGB” coordinate that tells us if we are referring to red intensity, blue intensity or green intensity.

A low-rank approximation of an array of data compresses the information it contains. For the matrix (greyscale) case, the best low-rank approximation is found using the Singular Value Decomposition (SVD). People learn this method in an undergraduate linear algebra class. For biological matrix data, using the SVD to view the most variance in the data (Principal Component Analysis) is widely used.

If we want a low-rank approximation of a colour photo (a tensor), we can no longer in general use the SVD (although sometimes we can). We use tensor methods instead. The numerics, and theory behind these method is much less well-understood and less widely taught. People tend to focus on the 2D case. This might explain why some people find linear algebra to be a little grey.

Using the software package Tensorlab, I took an original color photograph and found its best rank 10 tensor approximation.

Jun7

The tensor approximation has captured quite a few features of the original image. Rather than the 1000 x 667 x 3 = 2,001,000 numbers needed to store the original image, the new one requires only (1000 + 667 + 3 – 2) x 10 = 16,680 numbers. Of course, storage concerns are not likely to be a problem for a single photo but for large data this data compression becomes important. If the original ‘photo’ had been a billion by a billion pixels, we’d get a compression factor of one hundred million.

Turning the tensor into three matrices. For comparison, I will talk about the option of considering the color photo (tensor) as three matrices: a red matrix, a blue matrix and a green matrix, each of size 1000 x 667. We divide up the cat photo into three pieces, by extracting the parts of each colour:

Jun8

We can find a low rank approximation of each slice, one by one, using the SVD. If we wanted to store the image in 16,000 numbers overall, as we did in the tensor case, we take a rank 3 approximation in each slice: this requires (1000 + 667 – 1) x 3 x 3 numbers. To get back the original picture, we stack the three slices in their original order. A comparison of the two methods is here:

Jun8no2

We see that the tensor method has recovered more features of the original image for the same amount of data storage! This post resulted from a fun conversation with Daniel Lowengrub yesterday, and the cat photo is from Piotr Achinger.

Here is the code I used:

[code language="matlab"]
% best rank r tensor approximation
r = 10;
A = imread('fig2.jpg');
A = single(A); m = max(max(max(A))); A = A./m;
U = cpd(A,r);
Tensor_approx = cpdgen(U);

% best rank 3 * (r/3) matrix approximation
k = floor(r/3);
for i = 1:3;
    CC{i} = A(:,:,i);
    [U,S,V] = svd(CC{i});
    Ck{i} = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';
end
sz = size(Ck{1});
D = zeros(sz(1),sz(2),3);
for i = 1:3;
    D(:,:,i) = Ck{i};
end
Matrix_approx = D;
[/code]
Advertisements

8 thoughts on “Tensor approximation of a colour photo

  1. lukehutch June 9, 2016 / 6:43 am

    Hi, could you please also show the results for separated then recombined low-rank approximations in the HLS and/or HSV colorspace? I think you’ll get better results than with RGB.

    Like

    • anna June 9, 2016 / 6:38 pm

      Thanks for your comment!

      I used RGB because of the nice visual representation of each slice. But at your request I tried HSV as well. Actually the results were quite bad – surprisingly, they were worse than the RGB approach. I would be happy to share them with you! (sorry that I can’t reply to your comment with an image)

      Like

      • lukehutch June 9, 2016 / 7:10 pm

        Huh, fascinating, I would have expected improved quality! What about YUV? It was designed to correspond roughly with aspects of human perception. Feel free to update the post, or if you don’t want to do that, I’d love to see the results, you can email the username on this comment followed by @gmail.com. Also, any chance you could post your code, please?

        Like

    • anna June 9, 2016 / 7:32 pm

      I’ll email it to you. I updated the post to include my code 🙂

      Like

  2. adam ginensky June 12, 2016 / 6:30 pm

    Can you give an expository reference for the tensor svd ? I don’t understand where the ‘-2’ in (10000+667 +3 -2) comes from.

    Like

    • anna June 13, 2016 / 7:29 pm

      Hi, thanks for your comment! The number (1000 + 667 + 3 – 2) gives the count for the number of independent entries in a rank 1 tensor of size 1000 x 667 x 3.

      A rank 1 tensor is the outer product of three vectors of lengths 1000, 667 and 3: it is specified by 1000 + 667 + 3 entries. But taking the outer product is unchanged under some scaling operations. To control for this, we can set the first two vectors to be of norm 1, and incorporate all scaling information into the vector of length 3. This gives the “-2” in the expression, since we remove one degree of freedom from each of the first two vectors.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s