A duality of pictures

Duality relates objects that seem different at first but turn out to be similar. The concept of duality occurs almost everywhere in maths. If two objects seem different but are actually the same, we can view each object in a “usual” way and in a “dual” way – the new vantage point gives us a fresh understanding of the object. In this blog post we’ll see a pictorial example of a mathematical duality.

How are these two graphs related?


In the first graph, we have five vertices, the five black dots, and six green edges which connect them. For example, the five vertices could represent cities (San Francisco, Oakland, Sausalito, etc.) and the edges could be bridges between them.

In the second graph, the role of the cities and the bridges has swapped. Now the bridges are the vertices, and the edges (or hyperedges) are the cities. For example, we can imagine that the cities are large metropolises and the green vertices are the bridge tolls between one city and the next.

Apart from swapping the role of the vertices and the edges, the information in the two graphs is the same. If we shrink each city down to a dot in the second graph, and grow each bridge toll into a full bridge, we get the first graph. We will see that the graphs are dual to each other.

We represent each graph by a labeled matrix: we label the rows by the vertices and the columns by the edges, and we put a 1 in the matrix whenever the vertex is in the edge. For example, the entry for vertex 1 and edge a is 1, because edge a contains vertex 1. The matrix on the left is for the first graph, and the one on the right is for the second graph.
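
For instance (this is a small made-up hypergraph, not the one in the pictures above), take three vertices 1, 2, 3 and two edges a = \{1,2\} and b = \{2,3\}. Labelling the rows of the first matrix by the vertices 1, 2, 3 and its columns by the edges a, b, the incidence matrix of this hypergraph and of its dual are

\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix},

one the transpose of the other.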


We can see that the information in the two graphs is the same from looking at the two matrices – they are the same matrix, transposed (or flipped). The matrix of a hypergraph is the transpose of the matrix of the dual hypergraph.

Mathematicians are always on the look-out for hidden dualities between seemingly different objects, and we are happy when we find them. For example, in a recent project we studied the connection between graphical models, from statistics, and tensor networks, from physics. We showed that the two constructions are the duals of each other, using the hypergraph duality we saw in this example.


Flattening a cube

If you conduct a survey among some friends, consisting of three YES/NO questions, how can you summarize the responses?

I conducted a survey recently at a conference. The three questions were:

  • Is it your first time at the Mathematisches Forschungsinstitut Oberwolfach?
  • Do you like the weather?
  • Have you played any games?


There are eight options for how someone could respond to three YES/NO questions. Taking YES=1, and NO=0, the eight options are labelled by the binary strings: 000, 001, 010, 100, 011, 101, 110, 111.

We can think of 0 and 1 as coordinates in space, and arrange the eight numbers into a cube:


This 3D arrangement reflects the fact that there are three questions in the survey. Since our dataset is small, there’s not much need for further analysis to compress or visualize the data. But for a larger survey, we will summarize the structural information in the data using principal components.

The first step of principal component analysis is to restructure the 3D cube of data into a 2D matrix. This is called “flattening” the cube. We combine two YES/NO questions from the survey into a single question with four possible responses. There are three choices for which questions to combine, so there are three possible ways to flatten the cube into a matrix:

\begin{bmatrix} p_{000} & p_{001} & p_{010} & p_{011} \\ p_{100} & p_{101} & p_{110} & p_{111} \end{bmatrix} \qquad \begin{bmatrix} p_{000} & p_{001} & p_{100} & p_{101} \\ p_{010} & p_{011} & p_{110} & p_{111} \end{bmatrix} \qquad \begin{bmatrix} p_{000} & p_{010} & p_{100} & p_{110} \\ p_{001} & p_{011} & p_{101} & p_{111} \end{bmatrix}
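
As a rough sketch of how these flattenings can be computed (the array p below is a made-up 2 x 2 x 2 table of survey proportions, not the conference data, and the column ordering may differ from the matrices above):

[code language="matlab"]
% hypothetical 2 x 2 x 2 array of proportions p(i,j,k),
% where i, j, k = 1 (NO) or 2 (YES) index the answers to the three questions
p = rand(2,2,2); p = p/sum(p(:));

% three ways to flatten the cube into a 2 x 4 matrix:
% keep one question as the row index and merge the other two into the columns
F1 = reshape(p, 2, 4);                   % rows indexed by question 1
F2 = reshape(permute(p, [2 1 3]), 2, 4); % rows indexed by question 2
F3 = reshape(permute(p, [3 1 2]), 2, 4); % rows indexed by question 3
[/code]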

Our analysis of the data depends on which flattening we choose! Generally speaking, it’s bad news if an arbitrary decision has an impact on the conclusions of an analysis.

So we need to understand…

How do the principal components depend on the choice of flattening?

This picture gives an answer to that question:


All points inside the star-shaped surface correspond to valid combinations of principal components from the three flattenings, while points outside are the invalid combinations. More details can be found here.

Tea with (Almond) Milk

Making a cup of tea in a hurry is a challenge. I want the tea to be as drinkable (cold) as possible after a short amount of time. Say, 5 minutes. What should I do: should I add milk to the tea at the beginning of the 5 minutes or at the end?


The rule we will use to work this out is Newton’s Law of Cooling. It says “the rate of heat loss of the tea is proportional to the difference in temperature between the tea and its surroundings”.

This means the temperature of the tea follows the differential equation T' = -k (T - T_s), where k is a positive constant of proportionality. The minus sign is there because the tea is warmer than the room – so it is losing heat. Solving this differential equation, we get T = T_s + (A - T_s) e^{-kt}, where A is the initial temperature of the tea.
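
As a quick check that this formula solves the differential equation: differentiating gives

T' = -k (A - T_s) e^{-kt} = -k (T - T_s),

and setting t = 0 recovers T = A, the initial temperature.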

We’ll start by defining some variables, to set the question up mathematically. Most of them will cancel out by the end. Let’s say the tea, straight from the kettle, has temperature T_0. The cold milk has temperature m. We want to mix tea and milk in the ratio L:l. The temperature of the surrounding room is T_s.

Option 1: Add the milk at the start

We begin by immediately mixing the tea with the milk. This leaves us with a mixture whose temperature is \frac{T_0 L + m l }{L + l}. Now we leave the tea to cool. Its cooling follows the equation T = T_s +\left( \frac{T_0 L + m l }{L + l} - T_s \right) e^{-kt}. After five minutes, the temperature is

Option 1 = T_s +\left( \frac{T_0 L + m l }{L + l}- T_s \right) e^{-5k} .

Option 2: Add the milk at the end

For this option, we first leave the tea to cool. Its cooling follows the equation T = T_s + (T_0 - T_s) e^{-kt}. After five minutes, it has temperature T = T_s + (T_0 - T_s) e^{-5k}. Then, we add the milk in the specified ratio. The final concoction has temperature

Option 2 = \frac{(T_s + (T_0 - T_s) e^{-5k}) L + m l }{L + l}.

So which temperature is lower: the “Option 1” temperature or the “Option 2” temperature?

It turns out that most of the terms in the two expressions cancel out, and the inequality boils down to a comparison of e^{-5k} (T_s l - ml) (from Option 2) with (T_s l - ml) (from Option 1). The answer depends on whether T_s l - ml > 0. For our cup of tea it is positive, because the milk is colder than the surroundings (m < T_s). [What does this quantity represent? It is the amount of milk, l, times the temperature gap between the room and the milk.] Hence, since k is positive, we have e^{-5k} < 1, and Option 2 wins: add the milk at the end.
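
To see the cancellation explicitly, multiply both expressions by (L + l) and subtract:

(L + l)\,(\text{Option 1} - \text{Option 2}) = (T_s l - m l) - e^{-5k} (T_s l - m l) = l \, (T_s - m)(1 - e^{-5k}),

which is positive exactly when T_s > m, so Option 2 gives the colder cup.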

But, does it really make a difference? (What’s the point of calculus?)

Well, we could plug in reasonable values for all the letters (T_0 = 95^o C, etc.) and see how different the two expressions are.
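
Here is a rough numerical check; the constants below are plausible guesses rather than measurements:

[code language="matlab"]
% plausible guesses for the constants (not measured values)
T0 = 95;  Ts = 20;  m = 5;   % tea, room and milk temperatures, in degrees C
L = 9;  l = 1;               % tea-to-milk ratio of 9:1
k = 0.3;  t = 5;             % cooling constant (per minute) and waiting time (minutes)

option1 = Ts + ((T0*L + m*l)/(L + l) - Ts)*exp(-k*t);    % milk added at the start
option2 = ((Ts + (T0 - Ts)*exp(-k*t))*L + m*l)/(L + l);  % milk added at the end
option1 - option2    % positive: adding the milk at the end gives the colder cup
[/code]

With these particular numbers the difference comes out at just over one degree.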

So, why tea with Almond milk?

My co-blogger Rachael is vegan. She inspires me to make my tea each morning with almond milk.

Finally, here’s a picture of an empirical experiment from other people (thenakedscientists) tackling this important question:


Planes, trains and Kummer Surfaces

Here’s a short blog post for the holiday season, inspired by this article from Wolfram MathWorld. The topic is Kummer surfaces, which are a particular family of algebraic varieties in 3-dimensional space. They make beautiful mathematical pictures, like these from their Wikipedia page:


A Kummer surface is the set of points in space where a particular equation is satisfied. One way to describe them is as the zero sets of polynomials like:

{(x^2 + y^2 + z^2 - \mu^2 )}^2 - \lambda (-z-\sqrt{2} x) ( -z + \sqrt{2} x) ( z + \sqrt{2} y ) ( z - \sqrt{2} y ).

The variables x, y , z are coordinates in 3-dimensional space, and \lambda and \mu are two parameters, related by the equation \lambda ( 3 - \mu^2) = 3 \mu^2 - 1. As we change the value of the parameter, the equation changes, and its zero set changes too.
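
Solving this relation for \lambda gives the coefficient that appears in the Mathematica code at the end of this post:

\lambda = \frac{3 \mu^2 - 1}{3 - \mu^2}.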

What does the Kummer Surface look like as the parameter \mu changes?

When the parameter \mu^2 = 3, the relation \lambda ( 3 - \mu^2 ) = 3 \mu^2 - 1 forces \lambda to become infinite. Dividing the equation by \lambda leaves only the product of the four linear factors, so the non-linearity of the Kummer surface disappears and the surface degenerates to a union of four planes.


When the parameter is close to 3, we’re between planes and Kummer surfaces:


And for \mu^2 = 1.5, we see the 16 singular points surrounding five almost-tetrahedra in the center. A zoomed-in version is in my other blog post that featured Kummer surfaces.


Ok, I can see “planes” and “Kummer surface”, but what about “trains”? Well, I guess you could say that when a parameter is changing, often something is being trained. Though, er, not here.

This equation is not for a Kummer surface, but it’s not so dissimilar either. It came up recently in one of my research projects:

{\left( x^2 + y^2 + z^2 - 2( x y + x z + y z ) \right)}^2  - 2(x + y - z )( x - y + z ) ( - x + y + z )

P.S. The Mathematica code that I used to make the video is here:

sq2 = Sqrt[2]; (* the Sqrt[2] appearing in the four plane equations *)
anim = Animate[
 ContourPlot3D[{(x^2 + y^2 + z^2 - musq)^2 -
     ((3*musq - 1)/(3 - musq))*(1 - z - sq2*x)*(1 - z + sq2*x)*
     (1 + z + sq2*y)*(1 + z - sq2*y) == 0},
  {x, -5, 5}, {y, -5, 5}, {z, -5, 5},
  PerformanceGoal -> "Quality", BoxRatios -> 1,
  PlotRange -> 1], {musq, 3.001, 1, 0.0002}];

Un-knotting an Un-knot

In a recent project, I was thinking about curves in 3D space. For example, think of the slow-motion replays of the trajectory of a tennis ball that they show at Wimbledon.


The idea is to take a fixed curve in space, then to stand at a particular location in space and photograph the curve. This gives a 2D image of it. We are interested in whether this 2D image curve intersects itself.

One example is this 3D curve:

Seen from most perspectives, the curve does not self-intersect.
From some vantage points, we see a loop: the curve crosses itself.











We want to map out all places in 3D space where there is a crossing. This question is tangentially related to real tensor decomposition (reading the paper will reveal that this last sentence is true and also a pun!).

To separate the viewpoints for which there is a crossing, from those for which there isn’t, we need to find the boundary cases between the two. There turn out to be only two ways, locally speaking, that a curve can transition from having a crossing to not having one, as we change the viewpoint slightly.

The first move, the T-move, gradually untwists a single loop, which we can see happening for the curve above. The second move, the E-move, starts with two arcs of the curve that are overlapping, and our position changes so that they cease to overlap. This second case is more elusive than the first case.

For some curves, both moves occur:

Looking carefully at this example, we see that a crossing appears for an E-move reason, but that the curve becomes un-twisted again for a T-move reason.

So far, we’ve seen a curve which transitions from crossed to uncrossed, or vice-versa, only via a T-move. We also saw a curve that crosses/un-crosses itself both via a T-move and via an E-move. What about the other case? Does there exist a curve that can only cross and un-cross itself via E-moves?

If so, what would this curve look like?

  • T-moves could still exist: we can have loops that appear and then untwist themselves. The crucial thing is that such an untwisting cannot cause there to be no crossings. It can only happen if there is another crossing elsewhere on the curve that stops this from being a true transition point.
  • The curve has to have some viewpoints from which it looks completely un-tangled (no crossings). If a curve crosses over itself as seen from every possible angle, then there is no boundary at all between the self-intersecting and the non-self-intersecting viewpoints, so no E-move transition. One example of a 3D curve that has crossings, regardless of which way you look, is a knot, such as your shoelaces.


I thought about the question of making an “E-move-only” curve for a day or two. One morning I sat in a cafe with a friend and constructed possibilities using plastic straws. So, if you see someone playing with straws at a cafe, just think: they could be a maths PhD student. Or, you know, a child.

And here it is!

This example is different from the ones above – it’s not anything like the trajectory of a tennis ball (unless the tennis ball is navigating a complex architectural construction in zero-gravity). The curve is made of a collection of straight lines. If we wanted a smooth curve, we could smooth the corners slightly without changing the E-move property. But it remains to be seen whether we can find a low-degree algebraic example like the ones above.

To download a .cdf version of the curve, click here.

And here’s some code I used to generate it.

[sourcecode language="mathematica"]
szl = -1; sz = 4;
a = ParametricPlot3D[{{v, 0, 0}, {v + 2, 0, 0}, {1, 1.5*v, 0}, {v + 1,
 1.5, 0}, {2, 1.5*v, 0}, {3, v, 0}, {3, v + 2, 0}, {3, 1, 
 1.5*v}, {3, v + 1, 1.5}, {3, 2, 1.5*v}, {3, 3, v}, {3, 3, 
 v + 2}, {1.5*(v + 1), 3, 1}, {1.5, 3, v + 1}, {1.5*(v + 1), 3, 
 2}, {v, 3, 3}, {v + 2, 3, 3}, {1, 1.5*(v + 1), 3}, {v + 1, 1.5, 
 3}, {2, 1.5*(v + 1), 3}, {0, v, 3}, {0, v + 2, 3}, {0, 1, 
 1.5*(v + 1)}, {0, v + 1, 1.5}, {0, 2, 1.5*(v + 1)}, {0, 0, 
 v}, {0, 0, v + 2}, {1.5*v, 0, 1}, {1.5, 0, v + 1}, {1.5*v, 0, 
 2}}, {v, 0, 1}, PlotRange -> {{szl, sz}, {szl, sz}, {szl, sz}}, 
 ViewPoint -> {1.3, -2.4, 2}];
[/sourcecode]

P.S. I know, I know, there are 101 ways to export an animation/gif/table/3dplot… from mathematica and embed it into a WordPress post. All of them are better than an embedded YouTube video, but none of them work. If you know one that works, get in touch!

ANNA’s notebook -> AMS notices

Here at “Picture this Maths”, we were very lucky last month to be featured by the American Mathematical Society (AMS) on their Blog on Math Blogs! It is wonderful to have people reading and sharing our blog in the mathematical community and beyond.

This blog post also tells an exciting AMS story. It is on the topic of tensors (like this post, and this one, and this one too). It’s about a mathematical picture which started out as a cartoon in my notebook.


Version 1: ANNA’s notebook

It ended up — much souped up, with help from computer graphics experts — on the front cover of the June/July issue of the Notices of the AMS. The full issue is available here.

Version 2: AMS Notices

So, what’s going on in this picture?

The story begins, as with many stories (ok, many of my stories), with singular vectors and singular values of matrices. To understand mathematical concepts, it’s useful to have a picture in mind. Luckily, singular vectors and singular values of matrices lend themselves extremely well to visual description. Just take a look at this Wikipedia gif.

A matrix can be thought of in complementary ways, either as a two-dimensional grid of data, or as the information that encodes a linear transformation of a space. The gif is about matrices as linear maps. Below are a couple of still images from it. They show how a linear transformation of space


can be decomposed as the combination of three “building block” transformations, each of which is far easier to understand: a rotation V^*, a coordinate scaling \Sigma, and then another rotation U.
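
As a minimal numerical illustration of this decomposition (the 2 x 2 matrix below is an arbitrary example, not the one in the gif):

[code language="matlab"]
M = [3 1; 1 2];          % any matrix, viewed as a linear map
[U, S, V] = svd(M);      % U and V are rotations (or reflections), S is a diagonal scaling
x = [1; 1];
norm(M*x - U*(S*(V'*x))) % zero up to rounding: apply the rotation V', then the scaling S, then the rotation U
[/code]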


What about visualizing the singular vectors and singular values of tensors?

Here, the story is more complicated, not least because the greater number of dimensions makes visualizing things harder. Usually a matrix has finitely many singular vectors, and the same is true of tensors. But, just as in the matrix case, some tensors have infinitely many singular vectors, and those singular vectors themselves form an interesting structure.

The picture shows the structure of the singular vectors of a four-dimensional orthogonally decomposable tensor of size 2 \times 2 \times 3 \times 3. For more on the ‘maths behind the picture’, see this About The Cover article from the AMS.


Tensor approximation of a colour photo

This post is about using low-rank tensor approximations to roughly capture the information in a colour photograph. As far as I know, it is not used in practice as part of a method for compressing colour photos. However, it gives a nice illustration of what’s happening when you find a low-rank tensor approximation, such as is done for compressing tensor data of other kinds.

First, we’ll talk about the simpler case of a greyscale (black and white) photograph. Such a photo has one number for each pixel, which describes the brightness of that pixel: the numbers range from 0 (black) to 1 (white).

A standard size for a photo is 1000 x 667 pixels. This means the height of the photo is 1000 pixels and the width is 667 pixels. The information of the photo is a grid of numbers (a matrix) of size 1000 x 667, where each number in the matrix is the brightness of the corresponding pixel.

For a colour photograph, each pixel is described by three numbers, ranging from 0 to 1, that describe the redness, greenness and blueness of that pixel. The photo is described by a 3D array (a tensor) of size 1000 x 667 x 3. The height and width dimensions of the image have stayed the same, but we now have an extra “RGB” coordinate that tells us whether we are referring to red, green or blue intensity.

A low-rank approximation of an array of data compresses the information it contains. For the matrix (greyscale) case, the best low-rank approximation is found using the Singular Value Decomposition (SVD). People learn this method in an undergraduate linear algebra class. For biological matrix data, using the SVD to capture the directions of greatest variance in the data (Principal Component Analysis) is widespread.

If we want a low-rank approximation of a colour photo (a tensor), we can no longer in general use the SVD (although sometimes we can). We use tensor methods instead. The numerics and theory behind these methods are much less well understood and less widely taught. People tend to focus on the 2D case. This might explain why some people find linear algebra to be a little grey.

Using the software package Tensorlab, I took an original colour photograph and found its best rank 10 tensor approximation.


The tensor approximation has captured quite a few features of the original image. Rather than the 1000 x 667 x 3 = 2,001,000 numbers needed to store the original image, the new one requires only (1000 + 667 + 3 - 2) x 10 = 16,680 numbers. Of course, storage concerns are not likely to be a problem for a single photo, but for larger datasets this compression becomes important. If the original ‘photo’ had been a billion by a billion pixels, we’d get a compression factor of roughly one hundred million.
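
In general, a rank r approximation of an I \times J \times K tensor can be stored using

r \, ( I + J + K - 2 )

numbers (the -2 accounts for the scaling freedom within each rank-one term), compared with I J K numbers for the full tensor; this is the count used above.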

Turning the tensor into three matrices. For comparison, I will talk about the option of considering the colour photo (tensor) as three matrices: a red matrix, a green matrix and a blue matrix, each of size 1000 x 667. We divide up the cat photo into three pieces by extracting the parts of each colour:


We can find a low rank approximation of each slice, one by one, using the SVD. To store the image in roughly 16,000 numbers overall, as we did in the tensor case, we take a rank 3 approximation of each slice: this requires (1000 + 667 - 1) x 3 x 3 = 14,994 numbers. To get back the original picture, we stack the three slices in their original order. A comparison of the two methods is here:


We see that the tensor method has recovered more features of the original image for the same amount of data storage! This post resulted from a fun conversation with Daniel Lowengrub yesterday, and the cat photo is from Piotr Achinger.

Here is the code I used:

[code language="matlab"]
% best rank r tensor approximation
r = 10;
A = imread('fig2.jpg');                        % load the colour photo
A = single(A); m = max(max(max(A))); A = A./m; % rescale the entries to lie in [0,1]
U = cpd(A,r);              % canonical polyadic decomposition (Tensorlab)
Tensor_approx = cpdgen(U); % reassemble the rank r approximation

% best rank 3 * (r/3) matrix approximation
k = floor(r/3);
for i = 1:3
    CC{i} = A(:,:,i);                       % red, green and blue slices
    [U,S,V] = svd(CC{i});
    Ck{i} = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';  % best rank k approximation of each slice
end
sz = size(Ck{1});
D = zeros(sz(1),sz(2),3);
for i = 1:3
    D(:,:,i) = Ck{i};                       % stack the slices back together
end
Matrix_approx = D;
[/code]