If you survey some friends with three YES/NO questions, how can you summarize the responses?
I conducted a survey recently at a conference. The three questions were:
- Is it your first time at the Mathematisches Forschungsinstitut Oberwolfach?
- Do you like the weather?
- Have you played any games?
There are eight options for how someone could respond to three YES/NO questions. Taking YES=1 and NO=0, the eight options are labelled by the binary strings 000, 001, 010, 100, 011, 101, 110, 111.
We can think of 0 and 1 as coordinates in space, and arrange the eight numbers into a cube:
This 3D arrangement reflects the fact that there are three questions in the survey. Since our dataset is small, there’s not much need for further analysis to compress or visualize the data. But for a larger survey, we will summarize the structural information in the data using principal components.
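To make the cube concrete, here is a minimal sketch in Python with NumPy. The responses below are made up for illustration and are not the real survey data; the names `responses` and `cube` are my own choices.

```python
import numpy as np

# Hypothetical responses, invented for illustration (not the real survey data).
# Each tuple is (first time?, like the weather?, played games?) with YES=1, NO=0.
responses = [(1, 1, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1)]

# cube[i, j, k] counts the people whose answers were (i, j, k).
cube = np.zeros((2, 2, 2), dtype=int)
for i, j, k in responses:
    cube[i, j, k] += 1

# Number of people who answered YES, YES, NO.
print(cube[1, 1, 0])
```

Each of the eight entries of `cube` sits at one corner of the cube, indexed by its binary string.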
The first step of principal component analysis is to restructure the 3D cube of data into a 2D matrix. This is called “flattening” the cube. We combine two YES/NO questions from the survey into a single question with four possible responses. There are three choices for which questions to combine, so there are three possible ways to flatten the cube into a matrix:
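In code, the three flattenings can be sketched as follows. This assumes the cube is stored as a 2×2×2 NumPy array; the helper `flatten` and the example values are hypothetical, chosen only so the rearrangement is easy to follow.

```python
import numpy as np

# Hypothetical 2x2x2 cube; the entries 0..7 are placeholders, not survey counts.
cube = np.arange(8).reshape(2, 2, 2)

def flatten(cube, axis):
    """Keep question `axis` as the two rows and merge the other two
    questions into a single question with four possible responses."""
    return np.moveaxis(cube, axis, 0).reshape(2, 4)

# One flattening per choice of which question stays on its own.
flattenings = [flatten(cube, axis) for axis in range(3)]
for axis, matrix in enumerate(flattenings):
    print(f"question {axis + 1} as rows:")
    print(matrix)
```

All three matrices contain the same eight numbers, only arranged differently into rows and columns.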
Our analysis of the data depends on which flattening we choose! Generally speaking, it’s bad news if an arbitrary decision has an impact on the conclusions of an analysis.
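One way to see this dependence is to compare the singular values of the three flattenings. The sketch below uses singular values as a stand-in for the principal components and skips the mean-centering step that is often part of principal component analysis; both are simplifying assumptions on my part, and the cube entries are invented.

```python
import numpy as np

# Hypothetical cube of response counts (illustrative values only).
cube = np.array([[[3, 1], [0, 2]],
                 [[1, 4], [2, 5]]], dtype=float)

spectra = []
for axis in range(3):
    # Flatten with question `axis` as the rows, as described above.
    matrix = np.moveaxis(cube, axis, 0).reshape(2, 4)
    # Singular values only; no centering is applied (a simplification).
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    spectra.append(singular_values)
    print(f"flattening {axis + 1}: {singular_values}")
```

The singular values generally differ from one flattening to the next. They are not independent, though: the sum of their squares is the same for every flattening, since it equals the sum of the squared entries of the cube.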
So we need to understand…
How do the principal components depend on the choice of flattening?
This picture gives an answer to that question:
All points inside the star-shaped surface correspond to valid combinations of principal components from the three flattenings, while points outside are the invalid combinations. More details can be found here.