Computers are people, too.
It’s something we’ve been thinking about over at Technical.ly Brooklyn HQ for a bit and it’s come up in the news recently a few times, most notably last month with the Facebook Trending Topic Imbroglio of 2016. In the end, the big deal was that people thought that the trending topics that show up on the sidebar on Facebook were value-neutral and reflected only what people were talking about on FB. Well, turns out, no, people have an editorial role, too, and there were serious claims of liberal bias.
But to take a step back: liberal bias, sensational bias, or otherwise, maybe we shouldn’t have expected the stories on Facebook, or any machine-generated content, to be some magical, neutral, mechanical thing anyway. People make this stuff.
"Mathwashing can be thought of using math terms (algorithm, model, etc.) to paper over a more subjective reality."
Fred Benenson was the second hire at Kickstarter, and served as the crowdfunding company’s Vice President of Data. He coined a term online last week, “mathwashing,” which explains a bit about the complexities of data. We were fortunate enough to interview him about his thoughts on the current zeitgeist of data worship.
Technical.ly Brooklyn: So I’ve come across your use of mathwashing and it’s a concept I’ve been thinking about a lot recently. What does it mean?
Fred Benenson: Mathwashing can be thought of as using math terms (algorithm, model, etc.) to paper over a more subjective reality. For example, a lot of people believed Facebook was using an unbiased algorithm to determine its trending topics, even though Facebook had previously admitted that humans were involved in the process.
TB: Where does it come from? What made you say it?
FB: When news broke that Facebook editors were squelching some stories and injecting others, I thought it was important to examine why we believed these features were objective to begin with. So I coined mathwashing in an attempt to describe the tendency by technologists (and reporters!) to use the objective connotations of math terms to describe products and features that are probably more subjective than their users might think. This habit goes way back to the early days of computers when they were first entering businesses in the ’60s and ’70s: everyone hoped the answers they supplied were more true than what humans could come up with, but they eventually realized computers were only as good as their programmers.
I don’t want to get into the etymological history of who called it an algorithm and when, but I don’t think Facebook’s Trending Topics feature really qualifies. It’s important for us (and Facebook) to acknowledge that. It’s more like a data-influenced editorial feature.
The portmanteau (i.e., X-washing) was inspired by “pinkwashing” or “greenwashing,” which themselves are derived from the more common “whitewashing.” In the case of pinkwashing, it was originally used to describe tactics used by companies to sell products to the breast cancer community. In terms of greenwashing, think about BP or Chevy using green or environmentally focused campaigns to sell their products: they’re benefitting from the connotations of “green” while selling products like gas and cars that aren’t really compatible with conscientious environmentalism.
TB: You are a data scientist. Why is it not heretical to not stick strictly to the numbers? Should we all just start making decisions based on our gut?
FB: I think the term mathwashing should be more of a warning to fellow technologists: don’t overlook the inherent subjectivity of building things with data just because you’re using math. Algorithm- and data-driven products will always reflect the design choices of the humans who built them, and it’s irresponsible to assume otherwise.
"There's no such thing as perfect data."
So while math can help describe underlying data in an objective way, data science is so much more than that — it’s about developing products using our observations about the real world by measuring messy things like human behavior, natural phenomena, etc. There’s no such thing as perfect data, and if you collect data incorrectly, no math can prevent you from making a bad or dangerous product. In other words, anything we build using data is going to reflect the biases and decisions we make when collecting that data.
So I think if we want to “stick to the numbers,” we have to be intellectually honest about understanding how we recorded those numbers. Who recorded them and what criteria did they use?
TB: What is certainty even? Have you found certainty in any of your data projects or other projects in your career?
FB: First, I think certainty is different from objectivity. It’s fine to use data to feel certain about all kinds of scenarios, but we shouldn’t confuse that with having necessarily discovered some objective truth. Especially when it comes to building data-driven products, feeling mostly certain is sometimes all we have. And that’s OK! I think it is important to acknowledge that limitation. I’ve tried to do that throughout my career working with data, and I don’t think that outlook is fundamentally different from how a lot of scientists go about describing their theories: they’re true to the extent that they can explain (and not be disproven by) the data we’ve collected.
But to step back and try to answer what you’re getting at: one of my favorite subjects in college was incompleteness. Gödel’s incompleteness theorem states that even integer math can’t prove its own consistency (i.e., it relies on assumptions about axioms that we just have to take for granted). His theorem is beyond the kind of day-to-day data science I do, but I think it proves an important point about how illusive perceived objectivity in math is.