The dark side of data

“He’s got fancy charts. He must be right.”

Trevor Butterworth

“Complete bollox—does that phrase translate here?”

— David Spiegelhalter, Winton Professor of the Public Understanding of Risk in the Statistical Laboratory at the University of Cambridge on California’s decision to warn the public that acrylamide in coffee is carcinogenic.

“Nix was the best salesperson I ever met,” said Scott Tranter, one of the founding partners of Øptimus Consulting, a data analytics company that worked on political campaigns, notably Sen. Marco Rubio’s (R-FL) bid to become the Republican presidential nominee.

Tranter and his team had listened quietly as Alexander Nix, founder of Cambridge Analytica, explained why his company’s personality data, mined (as the world now knows) from hundreds of quizzes on Facebook, gave it the analytic edge in decoding the psychology of the American voter. And then Tranter’s team began to ask questions. Detailed questions. Under pressure, Nix said, “No one checks our work, we’re PhDs.”

“He was very scary—almost like a Bond villain,” Tranter told the audience of statisticians. “His British accent added 15 IQ points. He’s got fancy charts. He must be right.”

But there was no peer-reviewed research on psychographic modeling. If this was Cambridge Analytica’s “secret sauce,” they were serving up waffle to credulous politicians and their campaign staff. This was obvious to Tranter and his team: they were data scientists; they could see the nothingness behind the curtain of fancy graphics.

But that was precisely what made Cambridge Analytica so disturbing and so dangerous to the future of data science: It was all math. Cambridge Analytica had math; Øptimus had math; and to those with no or little math or statistics, it all just looked the same, except Cambridge Analytica were, in effect, claiming to have the best math. “When do clients push back?” asked Tranter. “They have no critical skills to push back. There is no safety net to stop inaccuracies.”

Tranter was speaking at the world’s largest annual conference of statisticians, JSM 2018, this year held in Vancouver, Canada. The topic for the panel, sponsored by the International Statistical Institute and chaired by Liberty Vittert, was “The Good, the Bad, and The Ugly. The Future of Statistics and the Public.”

If Tranter described the ugly side of the data revolution, David Spiegelhalter, statistician and chair of the Winton Centre for Risk and Evidence Communication at the University of Cambridge, had described the bad (or, to use a Britishism, “the complete bollox”) that people misinterpreting data end up foisting on the public.

A prime example of “bollox” inflicted on Americans was the decision by California that coffee should be labeled a carcinogen because a chemical created by the roasting process, acrylamide, has been found to be carcinogenic at very high doses in animal studies. It simply wasn’t a meaningful risk, Spiegelhalter said, and it distracted the public from risks they should pay attention to. (To get a sense of how meaningless the risk is, read his Medium post “Coffee and cancer: what Starbucks might have argued.”)

Messed-up and misleading numbers are the work of many hands in the pipeline from research to the public, said Spiegelhalter, but “charitable NGOs are the biggest risk mongers.” Citing Hans Rosling, the Swedish physician and statistician who transformed our understanding of the world using data visualization, Spiegelhalter said that “we need to distinguish what is frightening from what is dangerous.” This was the public role that statisticians needed to play.

 

David Spiegelhalter, Liberty Vittert
Photo: Trevor Butterworth
Rita Ko, Liberty Vittert
Photo: Trevor Butterworth

With so much that was bad, ugly, or both, where then was the good in the future of statistics for the public? Spiegelhalter pointed to Uganda, where schoolchildren were being taught how to question health claims; he had visited himself and come away enormously enthused.

Rita Ko, Director of the Hive, the UN’s Refugee Agency data innovation lab, showed how data science and statistics were transforming the UN’s ability to tackle refugee crises. Mark Hansen, a statistician and Director of the David and Helen Gurley Brown Institute for Media Innovation at Columbia and Stanford Universities, showed how Columbia journalism students had exposed social media’s black market for fake followers. Richard Coffin, Director of USA Facts, showed how the nonprofit, founded and funded by Steve Ballmer, former CEO of Microsoft, was systematically working to make government data available.

In many ways, the “good” of statistics was a work in progress. “Everybody loves the idea of more data, but there are huge barriers,” said Coffin. The US is, in effect, a network of 90,000 ‘governments;’ data is scattered, incomplete, and confusingly presented—if it’s presented at all. You go to the U.S. Census homepage, said Coffin, and there isn’t even an image of population change over time.

Coffin’s biggest lesson from trying to make all of this accessible and assessable? Design is critical, he said. “It turns out that a lot of people are not a fan of spreadsheets.”
