
When it comes to data, let’s just agree to disaggregate

Nonprofits often choose over-aggregated data because it’s easier to access and understand, but this results in harmful oversimplification, writes this data editor.

ZIP codes suck, and I want to tell you why.

They over-aggregate areas in ways that don't make sense for many of the ways we use them in data work. Many organizations, including the one I work at, Resolve Philly, use ZIP codes to decide where to focus their efforts. They're useful in surveys because they give us insight into where respondents are from, and they're easy for people to recognize, which makes it simple to relate data or trends to their own community.

But ZIP codes weren't designed for this. The U.S. Postal Service introduced them in 1963 to speed up mail delivery. Their boundaries reflect population density and street layout, but ultimately they are just aggregations of area: they don't account for socioeconomic factors, and they don't even contain a standard number of people.

For example, Philadelphia's 19103 ZIP code covers a population in which 17% of households earn more than $200,000 a year while another 17% earn less than $25,000. Yet that range gets lost when, for instance, nonprofits decide where to deploy services based on a ZIP code's median income.
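To make the point concrete, here is a minimal Python sketch. The incomes below are invented for two hypothetical ZIP codes and are not real Philadelphia or Census data. Both areas report the same median, but only counting households at each end of the distribution reveals that one of them is split between very low and very high earners.

```python
from statistics import median

# Hypothetical household incomes (USD) for two made-up ZIP codes.
# These numbers are invented for illustration; they are NOT real data.
POLARIZED = [15_000, 18_000, 22_000, 24_000, 70_000, 210_000, 230_000, 250_000, 300_000]
CLUSTERED = [55_000, 60_000, 62_000, 66_000, 70_000, 74_000, 78_000, 80_000, 85_000]


def summarize(incomes):
    """Return the median plus the shares of households at each extreme."""
    n = len(incomes)
    return {
        "median": median(incomes),
        # Share of households below the low-income cutoff.
        "share_under_25k": sum(1 for x in incomes if x < 25_000) / n,
        # Share of households above the high-income cutoff.
        "share_over_200k": sum(1 for x in incomes if x > 200_000) / n,
    }


if __name__ == "__main__":
    for name, incomes in [("polarized", POLARIZED), ("clustered", CLUSTERED)]:
        print(name, summarize(incomes))
```

Both hypothetical areas have a median of $70,000, yet in the polarized one four of nine households fall below $25,000 and four sit above $200,000. The cutoffs mirror the figures cited for 19103; reporting shares like these, or the full distribution, alongside the median is one simple way to disaggregate.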

This is over-aggregation: lumping together people with very different lived experiences, socioeconomic status, race, gender and more to create a simplified figure. It happens often at nonprofits, which frequently lack the resources to conduct in-depth analysis for their work.

And it doesn't happen only with locations. In "Invisible Women," author Caroline Criado Perez documents dozens of examples of data that over-aggregate gender to a "default male," resulting in continued harm and disadvantage for women. These issues range from medicine to corporate and academic policies to city planning. In one example, Criado Perez highlights what happens when crash test dummies are modeled on an "average" human who is, in practice, male: women are more likely to be seriously injured in car crashes.


Over-aggregation also creates cultural data gaps. For example, most accessible data on Asian American and Pacific Islander (AAPI) communities aggregates racial and ethnic information into a single "Asian" category. This oversimplifies people who represent 40 nationalities, more than 40 languages and more than 54 ethnicities, each with complex and essential cultural and political differences, including U.S. policies that have targeted specific immigrant and refugee groups. As a result, data on AAPI communities is almost always collapsed into a single category or omitted from the discussion entirely.

Consider a National Academy of Sciences study that explored the reduction in U.S. life expectancy, focusing on Black and Latinx populations compared with white people. The study never mentions Asian populations, even though they make up nearly 6% of the U.S. population. This study and many others on COVID-19 excluded data on AAPI communities, and the organizations trying to serve those communities have even less access to data that could help them.

Nonprofits often choose this over-aggregated data because it's easier to access and understand. But the result is a harmful oversimplification that omits life-sustaining and life-affirming information about the communities they serve. The nonprofit world routinely excludes the people it serves from decision-making and strategic conversations, so those people are only ever represented by inaccurate, one-dimensional data.

The simple answer would be to increase nonprofits' capacity to collect data and to better analyze what's already out there. However, funders often hold the necessary money hostage on the condition of … data.

Funders should recognize that meaningful, accurate and ethical data requires strong investment in nonprofits' ability to practice data science hand in hand with community engagement. This is doubly essential for marginalized communities, for whom data collection is often extractive and has historically been used to deepen discrimination.

Democratizing the data a nonprofit uses not only creates transparency and builds trust; it will inevitably improve the quality of the data we use and the conversations we have about it. Funders hold the financial power, and they can push the nonprofit world toward a practice rooted in data equity.

This is a guest post written by Julie Christie, data and impact editor at Resolve Philly. It originally appeared on sister site Generocity and is republished here with permission.
