From Rubik's Cube to Khuluma Data - A Mathematician's View on 160 Characters

In August, the 160 Characters Project workshop was held in London. The project is an innovative model for analysing text message support data using a ‘six voices’ framework, which combines insights from medical science, literature, and socio-cultural, implementation, technology and participatory perspectives. Experts representing each of the six voices analysed the text messages exchanged between the participants. The workshop sparked intense debate on what constitutes good communication and how each discipline measures successful interaction in the lives of people living with HIV. Over the course of the workshop, the echoes between disciplines became clear: each voice re-shaped and reiterated ideas expressed by the others, yet each new formulation brought rich divergences in understanding the data, its limitations, and the prospects for better implementation of the technology.

Following the success of this dialogue, we have collected a series of blog posts, one from each of the six voices. These posts provide an insight into the findings of the workshop and foreground the potential of each discipline for thinking through the key questions: what is good communication, how can technology be used to create supportive communication, and can this be leveraged to help improve the lives of people living with HIV?

For the second instalment, mathematician and Natural Language Processing engineer, Hector Durham, explores what the abstract field of group theory can do with the Khuluma text message data.

By Hector Durham:

My academic background is in mathematics, and I am nearing the formal completion of my PhD. My particular specialism is in the abstract field of group theory, a branch of algebra that formalises the notion of symmetry. This viewpoint unifies ideas from modern particle physics all the way to the (surprisingly deep!) mathematics of the Rubik’s cube, and is a rich and active area of research.

I now work as a data scientist, and it is in this capacity that I am part of the six voices framework and 160 Characters Project. My work involves looking at large amounts of data and extracting patterns, with a view to finding meaningful insights and making predictions about future data, by building mathematical models which reflect these insights.

I found the workshop fascinating, and the experience of working with an interdisciplinary team was extremely rewarding. It is a rare opportunity and a privilege for a mathematician to work with experts from the humanities, and the insights from literature were particularly refreshing. For example, the idea that miscommunication is central to genuine communication, captured in the analogy of water swirling around the drain before disappearing down the plughole, struck me not only for its acuity but also for its deceptive simplicity.

The distinction between listening and waiting to speak also stayed with me. The idea has played on my mind in the days and weeks since the workshop, and in this context I can see that it is a crucial distinction to make. In particular, I am looking forward to formalising it mathematically.

Let me briefly outline how one might approach this data from an analytical perspective. It is useful, first of all, to distinguish between exploratory and explanatory data analysis. Exploratory analysis involves trying to see what patterns are present in the dataset: we may have some initial assumptions, but we should form our opinions based on what the data is saying. Ideally, we will notice correlations or patterns which seem surprising, or which change our view of the data in some way. It is then the role of explanatory data analysis to communicate those findings in a compelling manner.

Thus, upon first encountering a dataset, one needs to get a sense of the data by repeatedly performing certain initial tasks and by asking the most basic of questions, questions that we already know the answer to. If we can satisfactorily answer these questions from a purely analytical perspective, then we might have some initial faith in our models and can start to delve a little deeper: can we build more complex hypotheses, or establish correlations between hidden variables?

Take, for example, the really simple question: which users are most active? One might first approach this by visualising a user's participation over time. But how do we quantify participation? Perhaps the user posts frequent short messages, or longer messages less frequently, or perhaps neither. We could, for example, capture these different modes of engagement by assigning different weightings to the posts a user makes over time, and then check whether the results correspond to what we think the answer should be. If they don’t: why not? Was our model too simplistic, or were we mistaken? If the former: add complexity, or start again. The latter is our goal at this stage: to find something that surprises us.
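As a rough illustration of the weighting idea, here is a minimal Python sketch. It assumes each message is available as a record of user, timestamp and text; the `Message` type, the function names and the default weights are hypothetical choices made for this example, not the model actually used on the Khuluma data. Each post contributes a fixed amount (rewarding frequent posting) plus an amount proportional to its length (rewarding longer posts), and a second helper counts a user's posts per week for plotting participation over time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    user: str          # hypothetical anonymised user ID
    sent_at: datetime  # time the message was sent
    text: str          # message body


def participation_scores(messages, per_post=1.0, per_char=0.01):
    """Score each user as a weighted sum over their posts.

    Every post adds `per_post` (rewarding frequency) plus `per_char`
    times its length (rewarding longer posts). The default weights are
    illustrative assumptions, not calibrated values.
    """
    scores = defaultdict(float)
    for m in messages:
        scores[m.user] += per_post + per_char * len(m.text)
    return dict(scores)


def weekly_activity(messages, user):
    """Count one user's posts per ISO (year, week), sorted by week."""
    counts = defaultdict(int)
    for m in messages:
        if m.user == user:
            year, week, _ = m.sent_at.isocalendar()
            counts[(year, week)] += 1
    return dict(sorted(counts.items()))
```

For instance, if one user sends three short messages ("hi", "how are you?", "ok") and another sends a single 150-character message, the default weights rank the frequent poster first (3.16 against 2.5), while lowering `per_post` to 0.5 reverses the ranking (1.66 against 2.0). That sensitivity to the choice of weights is exactly what should be checked against our intuition before trusting the model.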

Hopefully, by the end of this process of asking simple questions and gradually building up to more nuanced ones, we can find patterns in the data that were not visible at first glance. This is the stage at which explanatory data analysis takes over, as we try to communicate these (hopefully surprising) findings effectively.

The results of the 160 Characters Workshop will be presented by Director Anna Kydd on Thursday 11th October at the UCL Adolescent Lives Symposium. You can find details about the event here, or follow along on Twitter @SHMFoundation.

Text: Hector Durham

Illustration: Maggie Li

Published on: 06-10-2018

Get Involved

Help us empower adolescents living with HIV in South Africa by donating towards our Khuluma Khulisani Mentor Programme.


The SHM Foundation is a charitable foundation (registration number 1126568).