I met Rachael at the LMS Women in Mathematics Day in Edinburgh in April, where I gave a talk about using graph theory to understand the spread of disease in livestock, and she invited me to write a blog on this topic. Everything I’m going to say here is based on joint work with Dr Jess Enright from the University of Stirling (supported by Scottish Government as part of EPIC: Scotland’s Centre of Expertise on Animal Disease Outbreaks), who knows a lot more about cattle than I do!
It seems obvious that, when we want to understand how a disease spreads through a population, we should consider which members of the population have contact with each other – this simple observation has given rise to the field of Network Epidemiology. If we’re trying to understand the spread of disease in humans, one of the key challenges is to guess the underlying contact network – who has contact with whom – and a lot of interesting work has been done on this. Looking at the spread of disease in livestock, however, we don’t have to worry about this: all movements of farm animals have to be reported to the relevant government agencies, so we (well, people I collaborate with, who provide advice to the Scottish government on controlling animal disease outbreaks) know exactly which animals have contact with each other.
Depending on the specific disease we are interested in, there are several different kinds of contact we might need to consider. If animals need to be in close contact to transmit the disease, then there are two main ways the disease could spread from one farm to another: it could spread between farms that are geographically adjacent (if cows on opposite sides of the fence decide to have a chat) or by trade when an animal is sold from an infected farm to another farm that is not yet infected. Some other diseases are spread by biting insects, in which case any farm within a certain radius of an infected farm would be at risk.
Mathematically, we can represent these potentially infectious contacts by means of a graph: a vertex is used to represent each farm, and two farms are connected by an edge if there is the potential for the disease to spread from one to the other (in the case of infection due to trade, this would normally be one-way, so we could capture more information by using a directed edge in the direction of the trade). My first picture illustrates what part of this graph might look like if we are just considering the spread of disease across shared fence lines. The graph we get in this situation will normally be planar, that is it can be drawn in the plane without any edges crossing (although there are some circumstances where this is not necessarily true, for example if one farmer owns several fields that are not connected to each other).
Now we can add to this graph the contacts that come from trade. It’s not so obvious what we would expect the graph of trade links to look like, but this is exactly what Jess and I have been trying to understand.
First, though, I should explain why we want to know about the structural properties of these contact graphs. We’re interested in restricting the spread of disease, and Jess and her colleagues are asked by policy makers to investigate how it might be possible, in theory, to modify the network to make it less vulnerable to an epidemic. When we talk about modifying the network, we’re considering actions that correspond roughly to deleting edges and/or vertices in the graph: to “delete” a vertex, we might vaccinate all animals at the particular farm (so that it can no longer play any role in the transmission of disease), whereas “deleting” an edge might correspond to putting up a double fence-line between two farms (keeping animals on adjacent farms just a little bit further apart) or introducing extra monitoring on a specific trade route. What do we want to achieve by modifying the network? Well, there are lots of different parameters we could look at which relate to how vulnerable a particular graph is to an epidemic, but the simplest one we could think of to start with is the maximum component size (the largest number of vertices that can be reached from any single starting vertex in the graph by traveling along edges). In the picture below, if the farm circled in red is infected, then all of the others are potentially at risk.
However, if we delete one edge from this network, we can be sure that the disease will not spread to the three farms on the right hand side (now highlighted in green).
Of course, all of these possible ways to modify the network are costly, so we want to do this in the most efficient way possible, i.e. make the fewest modifications required to achieve our goal. This motivates the following question: given a graph, what is the minimum number of edges (or vertices) we need to delete so that the maximum components size of the resulting graph is at most h? This is an example of a graph modification problem, a kind of problem which comes up a lot on theoretical computer science. It turns out that, like many graph modification problems, this particular problem is NP-complete: unless most mathematicians and computer scientists are wrong, and P=NP, there is no efficient algorithm to answer this question for all possible inputs. But we don’t have to give up there! Just because a problem is intractable on arbitrary inputs doesn’t mean we can’t find a fast way to solve it on input graphs with specific structural properties – most of my research is based on exactly this idea – so in order to solve the problem efficiently on the real data we want to understand the structure of the livestock contact networks.
Jess and I haven’t yet considered the general problem as I described it above: instead, we’ve been looking only at the trade network, and only at edge deletion. This special case is relevant in the management of a particular kind of persistent cattle trade link that exists in Scotland, so it seemed as good a place to start as any. Thinking about this problem, we did what any graph theorist does when faced with a computationally hard problem: we tried to solve it on trees (graphs that don’t have any cycles, so there’s exactly one way to get between any two vertices). And, after dealing with a few slightly awkward details, it worked in exactly the way any graph theorist would expect: we can solve the problem recursively, starting with the leaves (vertices with only one neighbour) and working our way up the tree – the fact there are no cycles means we never have to rethink the answers we’ve already calculated.
But cattle trade networks aren’t trees, so is this any use in the real world? Somewhat surprisingly, the answer seems to be yes! Although the trade networks aren’t exactly trees, many of them seem to have small treewidth – this is a hard-to-define graph parameter that essentially captures how “tree-like” a particular graph is. And the great thing about graphs of small treewidth is that algorithms that work on trees can usually be adapted (taking just a bit more time) to work on graphs of small treewidth in exactly the same way. The next picture shows how the treewidth of the Scottish cattle trade network grows as we add in more trades, based on real data from 2009. The trades that took place that year are added to the graph one day at a time, and the treewidth (in fact, for technical reasons, an upper bound on the treewidth) of the resulting graph is shown.
We can see from this graph that, even if we look at trades taking place in the whole year (much longer than the timescale we would need to consider for many disease outbreaks) the treewidth of the trade network is still less than 20, for a graph that has nearly 3000 vertices and 8000 edges – this is much lower than we would expect for a random graph with the same number of vertices and edges.
Initial experiments suggest that our algorithm that exploits the treewidth performs significantly better on some real datasets than a generic approach based on constraint programming (which doesn’t take into account any structural properties of the data): there were examples where our algorithm found the optimal solution in under a minute, whereas the alternative couldn’t solve the problem in 6 hours.
We’ve really only scratched the surface of this problem, though. The planar graphs that arise from geographical proximity definitely don’t have small treewidth, so we will need a different approach here – but they certainly have other distinctive structural properties that we hope to exploit. What interests me most to investigate next, though, is why many cattle trade networks have surprisingly low treewidth: farmers aren’t looking at the horrible definition of this parameter when they decide where to sell their animals, so there’s something else going on here – and if we could find a good way to model this “something else” mathematically, would it let us devise even better algorithms?