Facebook's 2.2 billion active users use the platform to share all kinds of things: Engagements. Group plans. Political misinformation. Cat photos. But as researchers reported this week, the words you post in your status updates could also contain hidden information about your mental health.
In research described in this week's issue of the Proceedings of the National Academy of Sciences, scientists analyzed language from study participants' Facebook status updates to predict future diagnoses of depression. The researchers say their technique could lead to a screening tool that identifies people in need of mental health support and formal diagnosis, even as it raises serious questions about health privacy.
If this line of inquiry sounds familiar, you're not imagining things: Scientists have been studying the association between Facebook and the mental state of its users for years—often without the consent of the people being examined. Earlier this decade, scientists at Facebook and Cornell conducted an infamous emotional contagion study, which manipulated the moods of more than half a million Facebook users without their knowledge. More recently, Cambridge Analytica used ill-gotten data from some 87 million Facebook users to develop personality profiles it claimed would enable marketers and political campaigns to deliver more effective advertisements.
But many scientists continue to use above-board research methods to access Facebook's data: asking study participants, in person, to provide their consent, log into their accounts, and grant one-time access to their data. The overhead is tremendous; it can take years to amass a large enough sample population through in-person recruitment. Yet the effort can be worth it to social science researchers, many of whom regard Facebook's trove of user information as the most significant data repository in the history of their field.
"We're increasingly understanding that what people do online is a form of behavior we can read with machine learning algorithms, the same way we can read any other kind of data in the world," says University of Pennsylvania psychologist Johannes Eichstaedt, first author of the new PNAS study and cofounder of the World Well-Being Project, a research organization investigating how the words people use on social media reflects their psychological state.
To study whether language on Facebook could predict a depression diagnosis, Eichstaedt and his colleagues needed access to two forms of personal data: social media accounts and electronic medical records (EMRs). Over the course of 26 months, they approached more than 11,000 patients in a Philadelphia emergency department and asked if they'd be willing to share their EMRs and up to seven years' worth of Facebook status updates.
Some 1,200 patients agreed. Of those, 114 had medical records indicating a depression diagnosis. Roughly one in six Americans suffers from depression at some point in their lives. To reproduce that ratio in their final research population, the researchers matched each person with a depression diagnosis to five who had none, for a final pool of 684 participants. Using those individuals' more than half a million Facebook status updates, the researchers tallied the most frequently used words and phrases and developed an algorithm to spot what they call depression-associated language markers.
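The paper's full modeling pipeline isn't spelled out here, but the core idea, turning each participant's status updates into word and phrase counts and fitting a linear classifier to them, can be sketched in a few lines of Python. Everything below (the example updates, the labels, the choice of logistic regression) is illustrative rather than the study's actual code:

```python
# A minimal sketch, not the study's actual pipeline. Each participant's
# status updates are joined into one document; `diagnosed` marks whether
# a depression diagnosis later appeared in their medical record.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

updates = [
    "feeling so alone again tonight",       # hypothetical participant 1
    "great day at the park with the kids",  # hypothetical participant 2
]
diagnosed = [1, 0]  # 1 = later depression diagnosis, 0 = none

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram and bigram counts
    LogisticRegression(max_iter=1000),
)
model.fit(updates, diagnosed)

# The largest positive coefficients point toward candidate
# depression-associated language markers; the largest negative
# coefficients point toward language associated with controls.
```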
They found that people who went on to be diagnosed with depression used more "I" language (i.e., first-person singular pronouns) and more words reflecting hostility and loneliness in the months preceding their clinical diagnosis. By training their algorithm on these language patterns, the researchers could predict a future depression diagnosis as much as three months before it appeared in a patient's medical record as a formal condition.
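One of those markers, the rate of first-person singular pronouns, is simple enough to compute directly. A minimal sketch (the function name and word list are mine, not the paper's):

```python
import re

FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def i_language_rate(status_updates):
    """Fraction of words that are first-person singular pronouns,
    one of the markers the study associates with later diagnoses."""
    words = re.findall(r"[a-z']+", " ".join(status_updates).lower())
    if not words:
        return 0.0
    return sum(w in FIRST_PERSON_SINGULAR for w in words) / len(words)

# i_language_rate(["I feel like I am always on my own"])  # -> 3/9 ≈ 0.33
```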
The researchers' observation that people with depression use "I" language more often jibes with findings from past studies, including ones that have related social media usage patterns to self-reported depression. But this is the first study to compare the language people use on Facebook against clinical diagnoses drawn from medical records. "That's an important advance," says Matthias Mehl, a research psychologist at the University of Arizona who studies how language use reflects a person's psychological condition, "but the predictions are still far from perfect." The algorithm's probability of flagging a true case of depression, he says, was higher than its probability of raising a false alarm, but nowhere near high enough for it to replace a formal diagnosis.
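Mehl's detection-versus-false-alarm framing is the standard language of ROC analysis: the area under the ROC curve (AUC) is the probability that a randomly chosen true case scores higher than a randomly chosen non-case, with 0.5 being chance and 1.0 perfect. A sketch with invented risk scores (the study's real scores aren't reproduced here):

```python
from sklearn.metrics import roc_auc_score

# Hypothetical classifier risk scores and true outcomes
# (1 = depression diagnosis in the medical record).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.81, 0.64, 0.38, 0.55, 0.30, 0.22, 0.47, 0.15]

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # well above chance, but not diagnostic-grade
```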
Eichstaedt agrees. "It would be irresponsible to take this tool and use it to say: You’re depressed, you're not depressed," he says. What it could be suitable for is finding people who should follow up with more formal—and often more costly—screening methods. He adds that future studies will need to reproduce his team's findings in larger, more diverse populations (the participants in this study were predominantly black women).
That's assuming people will be willing to have the language they use on social media dissected in search of mental health signatures—one hell of an assumption, in light of Facebook's ongoing string of privacy scandals. And even if people do consent to share their personal information, Eichstaedt says you can't unlock its true predictive power until you combine it with other forms of data: heart rate, activity, or sleep patterns, for example, all of which activity trackers increasingly record.
"A benevolent dictator would connect all these data streams and use them for the public good," Eichstaedt says. But the moral and ethical alignments of tech's biggest companies face greater scrutiny today than at any point in recent history. If a screening tool like this one ever debuts on Facebook or any other social media platform, it's hard to envision it happening anytime soon.