Why Pittsburgh’s Innovation and Performance team takes an open-source approach to open data

Picture it: Pittsburgh, Pennsylvania. October 2015.

It was the birth of what would be named the Western Pennsylvania Regional Data Center (also known as the WPRDC, also known as “Whopper Duck”), an all-star collaboration between the City of Pittsburgh, Allegheny County and the University of Pittsburgh.

This put us in the unique position of hosting not just city and county data, but also data from non-governmental organizations such as the Carnegie Library and Bike PGH, as well as other local service providers such as the Port Authority of Allegheny County. That mix required a specialized setup, which is why the WPRDC is built on CKAN, an open-source data management system that allows for a completely custom configuration.

The Data Center launch coincided with the kickoff of the city’s Open Data program, managed by the city’s Department of Innovation and Performance. When the Open Data program was in its infancy, I was an intern tasked with crowdsourcing ideas for our first open datasets. We asked a simple question: “What data, owned by the City of Pittsburgh, would you most like to see?” This project, known as the Open Data Forum, received dozens of suggestions and hundreds of votes, and it helped shape our data priorities moving forward.

Three years later, I was handed the keys to the Open Data program — which has grown significantly since launch. One thing that hasn’t changed, however, is our commitment to a customized, DIY approach using open-source tools.

From the beginning, it was important that executive management gave us the freedom to develop our own tools and build exactly what we wanted. This allowed us to focus our initial efforts on an effective ETL (Extract, Transform and Load) system, so that we could easily host a variety of datasets from all kinds of systems, from complex databases to simple spreadsheets.
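
For readers who haven't met the term, here is a minimal sketch of the three ETL steps in Python. It is an illustration only, not the city's actual pipeline: the sample rows, the column names, the `requests` table and the function names are all made up for the example, and an in-memory SQLite database stands in for a real data portal.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a messy spreadsheet export (not real city data).
SAMPLE_CSV = """Request ID, Type ,Neighborhood
101,Pothole, Bloomfield
102,  Graffiti,Shadyside
"""


def extract(csv_text):
    """Extract: read raw rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: normalize column names and trim stray whitespace."""
    cleaned = []
    for row in rows:
        cleaned.append({
            key.strip().lower().replace(" ", "_"): (value or "").strip()
            for key, value in row.items()
        })
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows to a table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(request_id INTEGER, type TEXT, neighborhood TEXT)"
    )
    conn.executemany(
        "INSERT INTO requests VALUES (:request_id, :type, :neighborhood)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM requests").fetchone()[0]


def run_pipeline(csv_text):
    """Run extract, transform and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    count = load(transform(extract(csv_text)), conn)
    return conn, count
```

Calling `run_pipeline(SAMPLE_CSV)` returns the database connection and the number of rows loaded. The same three-step shape applies whether the source is a spreadsheet or a complex database; only the `extract` step would need to change.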

But how useful is published data if no one’s using it?

Our second priority was to democratize our data: to make it available in a way that anyone, whether a seasoned power user or a total novice, could engage with through maps, dashboards and interactive tools. We tried a variety of solutions, but none of them gave us what we needed, which was the ability to build something tailored exactly to our needs while adapting quickly to feedback from city departments and members of the public.

This led to the creation of Burgh’s Eye View (BEV), our flagship tool for visualizing city operations data. It was developed and maintained by Geoffrey Arnold, who built it entirely in R Shiny, using the free program RStudio.

The initial customers for this tool were city departments — BEV provided a simple way to show their work on an interactive map. It became clear that this would be a useful tool for the public as well. We wanted to know what people actually wanted, so we took our show on the road and visited over 30 neighborhood meetings all over the city to get community feedback. We heard a lot of requests and suggestions, many of which became new features.

The tool has grown from a map of 311 complaints to an entire suite of tools that contain information ranging from incidents of crime to the tax delinquency status of a property to the location of all city-affiliated recycling sites, all with a cute penguin mascot.

While our next major initiative in the pipeline doesn’t face the general public in the same way, it is just as important, and it is called Data Rivers. Returning to our priority of making data easily accessible, we are redesigning our entire data delivery process from the ground up, using open-source tools for a streamlined solution that is managed in-house. The plan for Data Rivers is to transform our old data delivery model, which we affectionately built out of duct tape and bubble gum, into a professionalized, elegant system that will make it easier to publish the data that our users want to see.

While Data Rivers is still in the development phase, we want to keep our work as transparent as possible. In the coming months, we will be publishing our Data Publisher Resources, as well as the source code to many of our applications on our GitHub page.

In the meantime, we plan to keep that do-it-yourself attitude as we continue to build our digital services team. If we’re hoping that our residents can take data into their own hands, why can’t we?

Senior Digital Services Analyst Tara Matthews on the city department's commitment to democratizing the outcomes of its work.
