Visualizing statistical models – it’s child’s play

Before you ask a mathematician if they can visualize the fourth dimension, ask them if they can truly visualize a three-dimensional object, like the boundary of a four-dimensional football. If they tell you it’s easy, and their name isn’t Maryna Viazovska, they’re probably lying.

Making an accurate picture of an object from a high dimensional space is very challenging. In this blog post we’ll see a surprising case where it turns out to be possible. We’ll visualize an interesting seven-dimensional object, which comes from a question in statistics.

Let’s consider the probability that each of the teams in the quarter-finals of the Men’s FIFA 2018 World Cup would win. The teams were (Uruguay, France, Brazil, Belgium, Russia, Croatia, Sweden, England). Today we know the probabilities of the teams winning, in that order, are (0,1,0,0,0,0,0,0), because France has already won. Back on 3rd July the probabilities (according to FiveThirtyEight) were (0.06, 0.15, 0.3, 0.11, 0.05, 0.12, 0.07, 0.14), and on 7th July the probabilities were (0,0.29,0,0.26,0,0.18,0,0.27).

In a recent project we were studying which probability distributions lie in a particular statistical model. We found out that our statistical model is given by inequalities that the eight probabilities need to satisfy. If we call the probabilities (a,b,c,d,e,f,g,h), the inequalities are:

(ad-bc)(eh-fg) \geq 0, \quad (af-be)(ch - dg) \geq 0, \quad (ag-ce)(bh-df) \geq 0 .

The probabilities have to sum to 1, so a + b + c + d + e + f + g + h = 1. We want to visualize the part of seven-dimensional space in which the inequalities hold. How can we do it?
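Before visualizing anything, it can help to test a few points by hand. Here is a minimal Python sketch that checks the three inequalities for the probability vectors quoted above. The function name `in_model` and the tolerance argument are my own choices for illustration, not part of the original project.

```python
def in_model(p, tol=0.0):
    """Return True if the probabilities (a, ..., h) satisfy all three inequalities."""
    a, b, c, d, e, f, g, h = p
    products = [
        (a * d - b * c) * (e * h - f * g),
        (a * f - b * e) * (c * h - d * g),
        (a * g - c * e) * (b * h - d * f),
    ]
    # tol allows a little slack for floating-point rounding (assumed 0 by default)
    return all(value >= -tol for value in products)


# Probability vectors quoted in the post, in the order
# (Uruguay, France, Brazil, Belgium, Russia, Croatia, Sweden, England)
july3 = (0.06, 0.15, 0.3, 0.11, 0.05, 0.12, 0.07, 0.14)
july7 = (0, 0.29, 0, 0.26, 0, 0.18, 0, 0.27)
final = (0, 1, 0, 0, 0, 0, 0, 0)

for name, p in [("3rd July", july3), ("7th July", july7), ("final", final)]:
    print(name, in_model(p))
```

With these numbers, the 3rd July forecast violates the second inequality, while the two later vectors satisfy all three, because so many of their entries are zero that every product vanishes.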

The first step is to notice that some combinations of letters do not affect whether the inequalities hold or not. They are:

(a + b + c + d) - (e + f + g + h) , \quad (a + c + e + g) - (b + d + f + h) , \quad (a + b + e + f) - (c + d + g + h)

So we can apply a change of coordinates that removes these three directions, leaving something four-dimensional. Finally, to get something three-dimensional we can assume that the four remaining coordinates lie on the sphere.

We end up with a picture that looks like this:


The part of space that lies inside the statistical model consists of the points outside either the blue blob, the green blob, or the yellow blob.

These days, we have an even better way to visualize the statistical model, truly in 3D. It even doubles up as a handmade toy for children.

Order yours here


We can’t help but wonder – which other children’s toys are really statistical models in disguise?


A duality of pictures

Duality relates objects that seem different at first but turn out to be similar. The concept of duality occurs almost everywhere in maths. If two objects seem different but are actually the same, we can view each object in a “usual” way, and in a “dual” way – the new vantage point is helpful for new understanding of the object. In this blog post we’ll see a pictorial example of a mathematical duality.

How are these two graphs related?


In the first graph, we have five vertices, the five black dots, and six green edges which connect them. For example, the five vertices could represent cities (San Francisco, Oakland, Sausalito, etc.) and the edges could be bridges between them.

In the second graph, the roles of the cities and the bridges have swapped. Now the bridges are the vertices, and the edges (or hyperedges) are the cities. For example, we can imagine that the cities are large metropolises and the green vertices are the bridge tolls between one city and the next.

Apart from swapping the role of the vertices and the edges, the information in the two graphs is the same. If we shrink each city down to a dot in the second graph, and grow each bridge toll into a full bridge, we get the first graph. We will see that the graphs are dual to each other.

We represent each graph by a labeled matrix: we label the rows by the vertices and the columns by the edges, and we put a 1 in the matrix whenever the vertex is in the edge. For example, the entry for vertex 1 and edge a is 1, because edge a contains vertex 1. The matrix on the left is for the first graph, and the one on the right is for the second graph.


We can see that the information in the two graphs is the same from looking at the two matrices – they are the same matrix, transposed (or flipped). The matrix of a hypergraph is the transpose of the matrix of the dual hypergraph.
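To see the transpose relationship concretely, here is a small Python sketch. The graph in it is a hypothetical one with five vertices and six edges, since the pictures are not reproduced here. The edge list and the helper names are assumptions made for illustration only.

```python
# Hypothetical graph: vertices 1..5, edges a..f (each edge is a set of vertices)
vertices = [1, 2, 3, 4, 5]
edges = {
    "a": {1, 2}, "b": {2, 3}, "c": {3, 4},
    "d": {4, 5}, "e": {1, 5}, "f": {2, 5},
}


def incidence_matrix(vertices, edges):
    """Rows are vertices, columns are edges; entry 1 when the vertex is in the edge."""
    return [[1 if v in members else 0 for members in edges.values()]
            for v in vertices]


def transpose(matrix):
    """Flip rows and columns."""
    return [list(row) for row in zip(*matrix)]


def dual(vertices, edges):
    """Swap roles: old edges become vertices, old vertices become (hyper)edges."""
    dual_vertices = list(edges)
    dual_edges = {v: {name for name, members in edges.items() if v in members}
                  for v in vertices}
    return dual_vertices, dual_edges


matrix = incidence_matrix(vertices, edges)
dual_matrix = incidence_matrix(*dual(vertices, edges))
print(dual_matrix == transpose(matrix))
```

In the dual, vertex 2 becomes a hyperedge containing the three old edges a, b and f. This is why the second picture needs hyperedges rather than ordinary edges.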

Mathematicians are always on the look-out for hidden dualities between seemingly different objects, and we are happy when we find them. For example, in a recent project we studied the connection between graphical models, from statistics, and tensor networks, from physics. We showed that the two constructions are the duals of each other, using the hypergraph duality we saw in this example.

Flattening a cube

If you conduct a survey among some friends, consisting of three YES/NO questions, how can you summarize the responses?

I conducted a survey recently at a conference. The three questions were:

  • Is it your first time at the Mathematisches Forschungsinstitut Oberwolfach?
  • Do you like the weather?
  • Have you played any games?


There are eight options for how someone could respond to three YES/NO questions. Taking YES=1, and NO=0, the eight options are labelled by the binary strings: 000, 001, 010, 100, 011, 101, 110, 111.

We can think of 0 and 1 as coordinates in space, and arrange the eight numbers into a cube:


This 3D arrangement reflects the fact that there are three questions in the survey. Since our dataset is small, there’s not much need for further analysis to compress or visualize the data. But for a larger survey, we will summarize the structural information in the data using principal components.

The first step of principal component analysis is to restructure the 3D cube of data into a 2D matrix. This is called “flattening” the cube. We combine two YES/NO questions from the survey into a single question with four possible responses. There are three choices for which questions to combine, so there are three possible ways to flatten the cube into a matrix:

\begin{bmatrix} p_{000} & p_{001} & p_{010} & p_{011} \\ p_{100} & p_{101} & p_{110} & p_{111} \end{bmatrix} \qquad \begin{bmatrix} p_{000} & p_{001} & p_{100} & p_{101} \\ p_{010} & p_{011} & p_{110} & p_{111} \end{bmatrix} \qquad \begin{bmatrix} p_{000} & p_{010} & p_{100} & p_{110} \\ p_{001} & p_{011} & p_{101} & p_{111} \end{bmatrix}
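The three flattenings can be generated mechanically. The Python sketch below stores the cube as a dictionary keyed by the binary strings. It builds the 2 x 4 matrix whose rows are indexed by one chosen question and whose columns are indexed by the other two. The function name `flatten` and the placeholder entries are my own, used only for illustration.

```python
from itertools import product


def flatten(p, axis):
    """Flatten a 2x2x2 cube into a 2x4 matrix.

    Rows are indexed by the answer to question `axis` (0, 1 or 2);
    columns are indexed by the answers to the other two questions, in order.
    """
    others = [i for i in range(3) if i != axis]
    matrix = []
    for row_bit in (0, 1):
        row = []
        for col_bits in product((0, 1), repeat=2):
            bits = [None, None, None]
            bits[axis] = row_bit
            for position, bit in zip(others, col_bits):
                bits[position] = bit
            row.append(p["".join(str(b) for b in bits)])
        matrix.append(row)
    return matrix


# Placeholder cube: each entry is just its own label, e.g. "p010"
labels = {"".join(str(b) for b in bits): "p" + "".join(str(b) for b in bits)
          for bits in product((0, 1), repeat=3)}

for axis in range(3):
    print(flatten(labels, axis))
```

Swapping the placeholder labels for the observed response counts (or proportions) gives the three matrices that principal component analysis would be applied to.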

Our analysis of the data depends on which flattening we choose! Generally speaking, it’s bad news if an arbitrary decision has an impact on the conclusions of an analysis.

So we need to understand…

How do the principal components depend on the choice of flattening?

This picture gives an answer to that question:


All points inside the star-shaped surface correspond to valid combinations of principal components from the three flattenings, while points outside are the invalid combinations. More details can be found here.

Tea with (Almond) Milk

Making a cup of tea in a hurry is a challenge. I want the tea to be as drinkable (cold) as possible after a short amount of time. Say, 5 minutes. What should I do: should I add milk to the tea at the beginning of the 5 minutes or at the end?


The rule we will use to work this out is Newton’s Law of Cooling. It says “the rate of heat loss of the tea is proportional to the difference in temperature between the tea and its surroundings”.

This means the temperature of the tea follows the differential equation T' = -k (T - T_s), where the constant k is a positive constant of proportionality. The minus sign is there because the tea is warmer than the room – so it is losing heat. Solving this differential equation, we get T = T_s + (A - T_s) e^{-kt}, where A is the initial temperature of the tea.

We’ll start by defining some variables, to set the question up mathematically. Most of them we won’t end up needing. Let’s say the tea, straight from the kettle, has temperature T_0. The cold milk has temperature m. We want to mix tea and milk in the ratio L:l. The temperature of the surrounding room is T_s.

Option 1: Add the milk at the start

We begin by immediately mixing the tea with the milk. This leaves us with a mixture whose temperature is \frac{T_0 L + m l }{L + l}. Now we leave the tea to cool. Its cooling follows the equation T = T_s +\left( \frac{T_0 L + m l }{L + l} - T_s \right) e^{-kt}. After five minutes, the temperature is

Option 1 = T_s +\left( \frac{T_0 L + m l }{L + l}- T_s \right) e^{-5k} .

Option 2: Add the milk at the end

For this option, we first leave the tea to cool. Its cooling follows the equation T = T_s + (T_0 - T_s) e^{-kt}. After five minutes, it has temperature T = T_s + (T_0 - T_s) e^{-5k}. Then, we add the milk in the specified ratio. The final concoction has temperature

Option 2 = \frac{(T_s + (T_0 - T_s) e^{-5k}) L + m l }{L + l}.

So which temperature is lower: the “Option 1” temperature or the “Option 2” temperature?

It turns out that most of the terms in the two expressions cancel out. Explicitly, Option 1 - Option 2 = \frac{(T_s l - ml)(1 - e^{-5k})}{L + l}, so the inequality boils down to a comparison of e^{-5k} (T_s l - ml) with (T_s l - ml). The answer depends on whether T_s l - ml > 0, that is, on whether the milk is colder than the surroundings (m < T_s). For our cup of tea, it is. This quantity is the amount of milk multiplied by the temperature gap between the room and the milk. Hence, since k is positive, we have e^{-5k} < 1, so Option 1 gives the warmer cup and Option 2 wins: add the milk at the end.

But, does it really make a difference? (What’s the point of calculus?)

Well, we could plug in reasonable values for all the letters (T_0 = 95^\circ C, etc.) and see how different the two expressions are.
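Here is one way to do that plugging in, as a short Python sketch. Only T_0 = 95 comes from the post. The other numbers (room at 20 degrees, milk at 5 degrees, nine parts tea to one part milk, and k = 0.1 per minute) are guesses for illustration, not measurements.

```python
import math


def milk_first(T0, m, L, l, Ts, k, minutes=5.0):
    """Option 1: mix straight away, then let the mixture cool."""
    mixed = (T0 * L + m * l) / (L + l)
    return Ts + (mixed - Ts) * math.exp(-k * minutes)


def milk_last(T0, m, L, l, Ts, k, minutes=5.0):
    """Option 2: let the tea cool on its own, then add the milk."""
    cooled = Ts + (T0 - Ts) * math.exp(-k * minutes)
    return (cooled * L + m * l) / (L + l)


# Assumed values: everything except T0 is a guess for illustration
T0, m, L, l, Ts, k = 95.0, 5.0, 9.0, 1.0, 20.0, 0.1

option1 = milk_first(T0, m, L, l, Ts, k)
option2 = milk_last(T0, m, L, l, Ts, k)
print(f"milk first: {option1:.2f} C, milk last: {option2:.2f} C")
print(f"difference: {option1 - option2:.2f} C")
```

With these particular guesses the two cups differ by well under one degree, so the effect is real but small. A larger proportion of milk or a colder fridge would widen the gap.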

So, why tea with Almond milk?

My co-blogger Rachael is vegan. She inspires me to make my tea each morning with Almond milk.

Finally, here’s a picture of an empirical experiment from other people (thenakedscientists) tackling this important question:


Planes, trains and Kummer Surfaces

Here’s a short blog post for the holiday season, inspired by this article from Wolfram MathWorld. The topic is Kummer Surfaces, which are a particular family of algebraic varieties in 3-dimensional space. They make beautiful mathematical pictures, like these from their wikipedia page:


A Kummer surface consists of the points in space where a particular equation is satisfied. One way to describe them is as the zero-sets of equations like:

{(x^2 + y^2 + z^2 - \mu^2 )}^2 - \lambda (1 - z-\sqrt{2} x) ( 1 - z + \sqrt{2} x) ( 1 + z + \sqrt{2} y ) ( 1 + z - \sqrt{2} y ).

The variables x, y , z are coordinates in 3-dimensional space, and \lambda and \mu are two parameters, related by the equation \lambda ( 3 - \mu^2) = 3 \mu^2 - 1. As we change the value of the parameter, the equation changes, and its zero set changes too.
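To experiment with the equation numerically, here is a small Python sketch. It follows the form used in the Mathematica code at the end of this post, where each of the four linear factors has constant term 1. The function names are made up for illustration.

```python
import math


def kummer_lambda(mu_sq):
    """Solve lambda * (3 - mu^2) = 3 * mu^2 - 1 for lambda (undefined at mu^2 = 3)."""
    return (3 * mu_sq - 1) / (3 - mu_sq)


def kummer_value(x, y, z, mu_sq):
    """Evaluate the quartic at (x, y, z); the surface is where this equals zero."""
    r2 = math.sqrt(2)
    planes = ((1 - z - r2 * x) * (1 - z + r2 * x)
              * (1 + z + r2 * y) * (1 + z - r2 * y))
    return (x * x + y * y + z * z - mu_sq) ** 2 - kummer_lambda(mu_sq) * planes


# A point lying on both the sphere of radius 1 and the plane 1 - z - sqrt(2) x = 0,
# so both terms of the quartic vanish there when mu^2 = 1
point = (2 * math.sqrt(2) / 3, 0.0, -1.0 / 3)
print(kummer_value(*point, mu_sq=1.0))
```

As mu_sq approaches 3 the denominator in `kummer_lambda` goes to zero, so the product of the four planes dominates the equation.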

What does the Kummer Surface look like as the parameter \mu changes?

When the parameter \mu^2 = 3, the non-linearity of the Kummer surface disappears: the surface degenerates to a union of four planes.


When the parameter is close to 3, we’re between planes and Kummer surfaces:


And for \mu^2 = 1.5, we see the 16 singular points surrounding five almost-tetrahedra, in the center. A zoomed in version is in my other blog post that featured Kummer Surfaces.


Ok, I can see “planes” and “Kummer surface”, but what about “trains”? Well, I guess you could say that when a parameter is changing, often something is being trained. Though, er, not here.

This equation is not for a Kummer surface, but it’s not so dissimilar either. It came up recently in one of my research projects:

{\left( x^2 + y^2 + z^2 - 2( x y + x z + y z ) \right)}^2  - 2(x + y - z )( x - y + z ) ( - x + y + z )

P.S. The code (language=Mathematica) that I used to make the video is here:

sq2 = Sqrt[2]; (* the constant used in the four plane factors below *)
anim = Animate[
 ContourPlot3D[{(x^2 + y^2 + z^2 - 
 musq)^2 - ((3*musq - 1)/(3 - musq))*(1 - z - 
 sq2*x)*(1 - z + sq2*x)*(1 + z + sq2*y)*(1 + z - sq2*y) == 
 0}, {x, -5, 5}, {y, -5, 5}, {z, -5, 5}, 
 PerformanceGoal -> "Quality", BoxRatios -> 1, 
 PlotRange -> 1], {musq, 3.001, 1, 0.0002}];

Un-knotting an Un-knot

In a recent project, I was thinking about curves in 3D space. For example, think of the slow-motion replays of the trajectory of a tennis ball that they show at Wimbledon.


The idea is to take a fixed curve in space, then stand at a particular location in space and photograph the curve. This gives a 2D image of it. We are interested in whether this 2D image curve intersects itself.
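For a curve made of straight pieces, this test can be sketched in a few lines of Python. Two points of the curve land on the same spot in the photograph exactly when they lie on a common ray from the viewpoint, so for each pair of non-adjacent segments we solve a 3 x 3 linear system. This is my own illustrative formulation with made-up function names. It treats the curve as an open polyline and simply ignores degenerate viewpoints where the system is singular.

```python
def sub(p, q):
    """Componentwise difference of two 3D points."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix with columns c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c3[1] * c2[2])
            - c2[0] * (c1[1] * c3[2] - c3[1] * c1[2])
            + c3[0] * (c1[1] * c2[2] - c2[1] * c1[2]))


def segments_cross(A, B, C, D, V, eps=1e-12):
    """Do segments AB and CD overlap in the image seen from viewpoint V?

    Solves (A - V) + s (B - A) = lam * ((C - V) + t (D - C)) via Cramer's rule.
    """
    a, u, c, w = sub(A, V), sub(B, A), sub(C, V), sub(D, C)
    c1, c2, c3 = u, tuple(-x for x in c), tuple(-x for x in w)
    rhs = tuple(-x for x in a)
    det = det3(c1, c2, c3)
    if abs(det) < eps:
        return False  # degenerate view: ignored in this sketch
    s = det3(rhs, c2, c3) / det
    lam = det3(c1, rhs, c3) / det
    mu = det3(c1, c2, rhs) / det  # mu = lam * t
    return 0 <= s <= 1 and lam > 0 and 0 <= mu <= lam


def has_crossing(points, V):
    """Check every pair of non-adjacent segments of an open polyline."""
    n = len(points) - 1
    return any(segments_cross(points[i], points[i + 1],
                              points[j], points[j + 1], V)
               for i in range(n) for j in range(i + 2, n))


# Hypothetical polyline: a bar along the x-axis joined to a bar one unit above it
curve = [(-1, 0, 0), (1, 0, 0), (0, -1, 1), (0, 1, 1)]
print(has_crossing(curve, (0, 0, 5)), has_crossing(curve, (0, 10, 5)))
```

Seen from directly above, the two bars of this toy curve overlap. Seen from off to the side, they do not.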

One example is this 3D curve:

Seen from most perspectives, the curve does not self-intersect.
From some vantage points, we see a loop: the curve crosses itself.











We want to map out all places in 3D space where there is a crossing. This question is tangentially related to real tensor decomposition (reading the paper will reveal that this last sentence is true and also a pun!).

To separate the viewpoints for which there is a crossing, from those for which there isn’t, we need to find the boundary cases between the two. There turn out to be only two ways, locally speaking, that a curve can transition from having a crossing to not having one, as we change the viewpoint slightly.

The first move, the T-move, gradually untwists a single loop, which we can see happening for the curve above. The second move, the E-move, starts with two arcs of the curve that are overlapping, and our position changes so that they cease to overlap. This second case is more elusive than the first case.

For some curves, both moves occur:

Looking carefully at this example, we see that a crossing appears for an E-move reason, but that the curve becomes un-twisted again for a T-move reason.

So far, we’ve seen a curve which transitions from crossed to uncrossed, or vice-versa, only via a T-move. We also saw a curve that crosses/un-crosses itself both via a T-move and via an E-move. What about the other case? Does there exist a curve that can only cross and un-cross itself via E-moves?

If so, what would this curve look like?

  • T-moves could still exist: we can have loops that appear and then untwist themselves. The crucial thing is that such an untwisting cannot cause there to be no crossings. It can only happen if there is another crossing elsewhere on the curve that stops this from being a true transition point.
  • The curve has to have some viewpoints from which it looks completely un-tangled (no crossings). If a curve crosses over itself, as seen from every possible angle, then we wouldn’t have an E-move boundary point between the self-intersecting and non-self-intersecting parts. One example of a 3D curve that has crossings, regardless of which way you look, is a knot such as your shoelaces.


I thought about the question of making an “E-move-only” curve for a day or two. One morning I sat in a cafe with a friend and constructed possibilities using plastic straws. So, if you see someone playing with straws at a cafe: just think! They could be a maths PhD student. Or, you know, a child.

And here it is!

This example is different than the ones above – it’s not anything like the trajectory of a tennis ball (unless the tennis ball is navigating a complex architectural construction in zero-gravity). The curve is made of a collection of straight lines. If we wanted a smooth curve, we could smooth the corners slightly without changing the E-move property. But it remains to be seen if we can find a low-degree algebraic example like the ones above.

To download a .cdf version of the curve, click here.

And here’s some code I used to generate it.

[sourcecode language="mathematica"]
szl = -1; sz = 4;
a = ParametricPlot3D[{{v, 0, 0}, {v + 2, 0, 0}, {1, 1.5*v, 0}, {v + 1,
 1.5, 0}, {2, 1.5*v, 0}, {3, v, 0}, {3, v + 2, 0}, {3, 1, 
 1.5*v}, {3, v + 1, 1.5}, {3, 2, 1.5*v}, {3, 3, v}, {3, 3, 
 v + 2}, {1.5*(v + 1), 3, 1}, {1.5, 3, v + 1}, {1.5*(v + 1), 3, 
 2}, {v, 3, 3}, {v + 2, 3, 3}, {1, 1.5*(v + 1), 3}, {v + 1, 1.5, 
 3}, {2, 1.5*(v + 1), 3}, {0, v, 3}, {0, v + 2, 3}, {0, 1, 
 1.5*(v + 1)}, {0, v + 1, 1.5}, {0, 2, 1.5*(v + 1)}, {0, 0, 
 v}, {0, 0, v + 2}, {1.5*v, 0, 1}, {1.5, 0, v + 1}, {1.5*v, 0, 
 2}}, {v, 0, 1}, PlotRange -> {{szl, sz}, {szl, sz}, {szl, sz}}, 
 ViewPoint -> {1.3, -2.4, 2}];
[/sourcecode]

P.S. I know, I know, there are 101 ways to export an animation/gif/table/3dplot… from Mathematica and embed it into a WordPress post. All of them are better than an embedded YouTube video, but none of them work. If you know one that works, get in touch!

ANNA’s notebook -> AMS notices

Here at “Picture this Maths”, we were very lucky last month to be featured by the American Mathematical Society (AMS) on their Blog on Math Blogs! It is wonderful to have people reading and sharing our blog in the mathematical community and beyond.

This blog post also tells an exciting AMS story. It is on the topic of tensors (like this post, and this one, and this one too). It’s about a mathematical picture which started out as a cartoon in my notebook.


Version 1: ANNA’s notebook

It ended up — much souped up, with help from computer graphics experts — on the front cover of the June/July issue of the Notices of the AMS. The full issue is available here.

Version 2: AMS Notices

So, what’s going on in this picture?

The story begins, as with many stories (ok, many of my stories), with singular vectors and singular values of matrices. To understand mathematical concepts, it’s useful to have a picture in mind. Luckily, singular vectors and singular values of matrices lend themselves extremely well to visual description. Just take a look at this Wikipedia gif.

A matrix can be thought of in complementary ways, either as a two-dimensional grid of data, or as the information that encodes a linear transformation of a space. The gif is about matrices as linear maps. Below are a couple of still images from it. They show how a linear transformation of space


can be decomposed as the combination of three “building block” transformations, each of which is far easier to understand: a rotation V^*, a coordinate scaling \Sigma, and then another rotation U.


What about visualizing the singular vectors and singular values of tensors?

Here, the story is more complicated, not least because the greater number of dimensions makes visualizing things harder. Usually, matrices have a finite number of singular vectors, and the same is true of tensors. But, as in the matrix case, some tensors have infinitely many singular vectors, and the singular vectors themselves form an interesting structure.

The picture shows the structure of the singular vectors of a four-dimensional orthogonally decomposable tensor of size 2 \times 2 \times 3 \times 3. For more on the ‘maths behind the picture’, see this About The Cover article from the AMS.