## Causality

In my last post I showed a picture of a surface in 3D space that gave us information about a probability distribution. This week’s post also finishes with such an image!

Across all walks of science, it is a problem of central importance to be able to say whether or not “X causes Y”, and just as important to know when we have enough information to make such a declaration.

One way to look at causality is via a “directed acyclic graph”, or “DAG”. This is a collection of vertices joined by arrows, arranged so that there is no way to follow a sequence of arrows and return to your starting point (the graph has no “cycles”). Here is an example, from this paper about the transgenerational impact of nicotine exposure, i.e. whether being around smokers makes you want to smoke:

Given some observations, we would like to be able to build such a graph. One condition which allows us to do this is called the “faithfulness assumption”: it requires that the conditional independences among the observed variables are exactly those implied by the structure of the graph. A statement of the form “X is independent of Y given Z” says that the only way X and Y are related is via Z.
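To make “X is independent of Y given Z” concrete, here is a small simulated sketch (the variables and coefficients are illustrative, not taken from the paper): X and Y are both driven by Z, so they are correlated, yet once we account for Z the remaining (partial) correlation is roughly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Z causes both X and Y; there is no direct link between X and Y.
Z = rng.normal(size=n)
X = 2.0 * Z + rng.normal(size=n)
Y = -1.5 * Z + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after regressing out c from each."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(X, Y)[0, 1])  # strongly nonzero: X and Y are dependent
print(partial_corr(X, Y, Z))    # near zero: X is independent of Y given Z
```

The first number is far from zero while the second is essentially sampling noise, which is exactly the pattern “related, but only via Z”.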

This condition is explained in greater detail in the paper “Geometry of the Faithfulness Assumption in Causal Inference” by Caroline Uhler et al., on which this blog post is based and from which both of the following two images are taken.

They consider the following graph, which is much smaller and simpler than the one pictured above:

We have arrows $1 \to 2$ , $2 \to 3$ and $1 \to 3$. Note that whilst it might look as though this graph contains a cycle, it is not possible to follow the directions of the edges and travel in a full loop around the triangle.
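Acyclicity can also be checked mechanically, for instance with a depth-first search; the following is a generic sketch (not code from the paper), applied to the triangle above and to the cyclic graph obtained by reversing the edge $1 \to 3$:

```python
def has_cycle(edges):
    """Detect a directed cycle via depth-first search with vertex colouring."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    colour = {v: WHITE for v in graph}

    def visit(u):
        colour[u] = GREY
        for w in graph[u]:
            if colour[w] == GREY:  # reached a vertex on the current path: cycle
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[u] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in graph)

print(has_cycle([(1, 2), (2, 3), (1, 3)]))  # False: the triangle is a DAG
print(has_cycle([(1, 2), (2, 3), (3, 1)]))  # True: reversing 1 -> 3 gives a cycle
```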

The strength of the causal relation between two of the vertices 1, 2 and 3 is given by the weight of the edge connecting them. Since three edges appear above, we can regard the distribution as a point, $(x,y,z)$, in 3-dimensional space, where

1. $x$ is the weight of the edge $1 \to 2$
2. $y$ is the weight of the edge $1 \to 3$
3. and $z$ is the weight of the edge $2 \to 3$.
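One concrete way to turn these three weights into a distribution is a linear structural equation model, where each variable is a weighted sum of its parents plus independent noise. This is an assumption of the sketch below, in the spirit of the paper’s linear Gaussian set-up rather than its exact construction:

```python
import numpy as np

def sample(x, y, z, n=100_000, seed=0):
    """Draw n samples from a linear SEM for the DAG 1->2, 2->3, 1->3."""
    rng = np.random.default_rng(seed)
    X1 = rng.normal(size=n)                    # vertex 1: no parents
    X2 = x * X1 + rng.normal(size=n)           # vertex 2: parent 1, weight x
    X3 = y * X1 + z * X2 + rng.normal(size=n)  # vertex 3: parents 1 and 2
    return X1, X2, X3

X1, X2, X3 = sample(x=0.8, y=0.5, z=-0.3)
# Substituting X2 into X3 shows Cov(X1, X3) = y + x*z: the direct effect
# plus the indirect effect through vertex 2.
print(np.cov(X1, X3)[0, 1])
```

With these example weights the covariance comes out near $y + xz = 0.5 + 0.8 \times (-0.3) = 0.26$, up to sampling noise.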

Since “faithful” combinations of $(x,y,z)$ allow us to make inferences, we want to identify the problem areas where we are close to an “unfaithful” combination of $(x,y,z)$. The pictures from the paper show when we are in a problem area for each of the three problems (green, blue and red) which may occur, and the last picture combines these to show the points in 3D space which suffer from at least one of the three possible problems:
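To see how unfaithfulness can arise, consider a linear structural equation model in which each variable is a weighted sum of its parents plus noise (my illustrative assumption, not necessarily the paper’s exact parametrisation). The direct effect $1 \to 3$ can then exactly cancel the indirect effect through vertex 2: whenever $y + xz = 0$, $X_1$ and $X_3$ look independent even though both paths between them are present.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x, z = 0.8, -0.5
y = -x * z  # chosen so that y + x*z == 0: an "unfaithful" point

X1 = rng.normal(size=n)
X2 = x * X1 + rng.normal(size=n)
X3 = y * X1 + z * X2 + rng.normal(size=n)

# Despite the edges 1->3 and 1->2->3, the marginal correlation vanishes,
# so a method relying on faithfulness would wrongly drop the edge 1->3.
print(np.corrcoef(X1, X3)[0, 1])  # approximately 0
```

Points near this cancellation surface, and the analogous surfaces for the other correlations, are exactly the problem areas pictured above.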

In order to draw accurate conclusions in applications, we would have to ensure that our distribution does not lie close to any of these problem areas.