If we roll a dice and then flip a coin, the outcome of the dice roll shouldn’t affect whether the coin lands head or tails: the two events are independent. This is characterized by the equation
where could be any of the possible outcomes of the die, , and could be either heads or tails. Now assume that we have two events, both of which could take 3 possible outcomes. For example, the weather is sunny, cloudy or rainy and I could arrive early, on-time or late. I want to work out whether my arrival time is independent of the weather.
To do this, we collect data on all the possible 9 outcomes, and obtain estimates for the probabilities of these 9 events. We depict them in a table:
We see from the shape that the data can naturally be given the structure of a matrix,
where is the probability that I am early and it is sunny, is the probability that I am late and it is rainy, etc.
If the two events really are independent, then the probabilities will satisfy the multiplicative relation above, where is the probability that I am early, etc. In this case, we can write matrix as
This is a rank 1 matrix, because all the rows are scalar multiples of each other, and likewise so are all the columns. So, to test to see if the events are independent, we could see if our computed probabilities in the table above make a rank 1 matrix. This is a test for the events being independent.
What if instead we have the incomplete data
- Is it possible to show that our events must be independent? (No, think about why!)
- Is it possible to show the events cannot be independent? (Yes, see below)
It might be that there are no possible values for the remaining entries of the table such that the overall matrix of probabilities will be rank 1. This will depend on the known diagonal entries of the matrix.
The picture below (taken from this paper from 2014 called “Matrix Completion for the Independence Model” by Kaie Kubjas and Zvi Rosen) shows one restriction on what the possible diagonal entries could be in order that the rest of the entries can be filled to get something rank one.
The three axes are the possible values for the three diagonal entries and the filled in part of the shape gives the possible combinations that might come from two independent events.