Using Data for Social Good
“Data for Social Good” partners with WPRDC to teach students how to create human-centered algorithms influenced by real-world problems
In today's digital age, datasets about everything are everywhere, from research labs and libraries to governments and industry. But in raw form, databases are useless without understanding what they contain and how to use them.
One novel use is mining data to study and drive positive social change. Amin Rahimian, an assistant professor of industrial engineering at the University of Pittsburgh Swanson School of Engineering, is exploring this potential in his new class, “Data for Social Good.” His students’ findings could benefit city and county agencies in Pittsburgh and elsewhere.
“The class not only teaches students about algorithms, but how to apply them to real-world datasets while becoming more conscious of their societal consequences,” Rahimian said.
The class includes three modules: Essence of Data, AI in the Fabrics of Society, and Algorithms in the Wild. Every class session is paired with an outside speaker who is influential in the region, whether through policy or data.
As students learn about these topics in class, they work outside of class on a real-life data project that applies these principles. Early in the semester, they were introduced to the Western Pennsylvania Regional Data Center (WPRDC), a leading open data portal serving Western Pennsylvania that is based at Pitt in partnership with Allegheny County and the City of Pittsburgh. From there, they began crafting algorithms that leverage data provided by the WPRDC.
“These algorithms that students ultimately create are supposed to be human-centered,” Rahimian explained. “That’s why using data from the WPRDC is so important for the function of this class; it’s data that affects daily lives.”
David Walker, a developer at the WPRDC, was the bridge between the WPRDC and the students.
When he first met with the class, he described the eight principles of open government data: complete, primary, timely, accessible, machine-processable, non-discriminatory, non-proprietary, and license-free. He also demonstrated the capabilities of the WPRDC’s data portal and the many interactive data visualization tools that the WPRDC has developed to help people use the region’s civic data.
“The WPRDC’s role is to supply the data, but we like seeing the data used by other entities, which is why collaborating with Pitt faculty is such an important part of what we do,” Walker said. “When this type of data is applied, it can create real change for real people. We hope more groups leverage our data in this way in the future.”
Students ultimately chose projects that either interested them or personally affected them as private citizens of Pittsburgh. To complete their projects, they were taught how predictive models can be trained on real data and how inferences from the data can inform realistic scenarios, such as predicting the risk of a disease, the click probability of an online ad, the probability of a loan repayment, or wine quality. Students worked through practice problems on these methods before applying them to their larger project.
“We covered canonical problem formulations in machine learning and common methods to address those using parametric and non-parametric models,” Rahimian explained.
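To make the distinction between parametric and non-parametric models concrete, here is a minimal sketch built around the loan repayment scenario. It is not taken from the course materials: the data is invented, the single feature (a debt-to-income ratio) is an assumption, and the function names are hypothetical. Logistic regression stands in for a parametric model that learns a fixed set of numbers, and k-nearest neighbors stands in for a non-parametric model that keeps the training data and consults it at prediction time.

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Parametric: learn one weight and one bias by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_logistic(params, x):
    """Estimated probability of repayment from the learned weight and bias."""
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def predict_knn(xs, ys, x, k=3):
    """Non-parametric: average the labels of the k closest training points."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# Invented data: debt-to-income ratio and whether the loan was repaid (1) or not (0).
ratios = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
repaid = [1, 1, 1, 1, 0, 0, 0, 0]

params = fit_logistic(ratios, repaid)
print("logistic estimate at 0.15:", round(predict_logistic(params, 0.15), 3))
print("3-nearest-neighbor estimate at 0.15:", predict_knn(ratios, repaid, 0.15))
```

The logistic model compresses the data into two numbers, while the nearest-neighbor estimate needs the full training set every time it makes a prediction.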
The fall 2023 projects were:
Identifying Pittsburgh Neighborhoods’ Risk for Gentrification
By: Jessica Kneller, Maggie Kuehn, and Lauren Lenherr
Gentrification refers to drastic changes to a neighborhood that increase its market value and appeal while decreasing affordability and livability for current residents. Using indicators found through research, pre-existing models, and data from both 2012 and 2020, the students identified Allegheny County neighborhoods at risk for gentrification. The indicators were age, educational attainment level, parcels-to-sales ratio, vacant addresses, housing choice vouchers, and median housing prices. Using these indicators, the students created a random forest regressor to predict market values for neighborhoods at risk for gentrification within the next five years.
The results confirmed gentrification in the East End neighborhoods where the group anticipated it (Lawrenceville, East Liberty, and Bloomfield). The group also flagged a few other neighborhoods, such as Brighton Heights and Windgap, scattered on the city’s West End. Their findings could help identify neighborhoods likely to experience gentrification in the future.
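For readers unfamiliar with the technique, the sketch below shows the general idea of a random forest regressor: many small regression trees, each trained on a bootstrap sample and restricted to a random subset of features at every split, with their predictions averaged. It is a simplified illustration and not the students’ model. The three feature columns, the dollar figures, and all function names are invented for the example.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets, features):
    """Find the (feature, threshold) pair that minimizes squared error."""
    best, best_err = None, float("inf")
    for f in features:
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, targets) if row[f] <= t]
            right = [y for row, y in zip(rows, targets) if row[f] > t]
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (f, t), err
    return best

def build_tree(rows, targets, rng, max_depth=3, n_features=2):
    """Grow one regression tree; a leaf is just the mean target value."""
    if max_depth == 0 or len(set(targets)) == 1:
        return mean(targets)
    k = min(n_features, len(rows[0]))
    split = best_split(rows, targets, rng.sample(range(len(rows[0])), k))
    if split is None:
        return mean(targets)
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       rng, max_depth - 1, n_features),
            build_tree([r for r, _ in right], [y for _, y in right],
                       rng, max_depth - 1, n_features))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Invented rows: [vacancy rate, share with a bachelor's degree, earlier median price ($1000s)]
rows = [[0.12, 0.20, 60], [0.10, 0.25, 75], [0.08, 0.35, 90], [0.05, 0.45, 120],
        [0.15, 0.15, 50], [0.07, 0.40, 110], [0.04, 0.55, 150], [0.09, 0.30, 85]]
later_values = [70, 95, 130, 180, 55, 160, 210, 115]  # invented later market values ($1000s)

forest = fit_forest(rows, later_values)
print("predicted later value:", round(predict_forest(forest, [0.06, 0.42, 115]), 1))
```

In practice a library implementation would usually be preferred over hand-written trees. The point here is only to show the bootstrap, random feature selection, and averaging steps.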
Understanding Synthetic Data in the Context of Social Services
By: Andrew Fox, Jack Kaye, and Declan Kelly
This project compared real aggregate count data to synthetic data (fake data generated to model real data without the privacy issues) for datasets about social services that could prevent tragedy in Allegheny County, particularly overdoses, suicides and homicides. The goal was to understand how resources should be allocated to areas within the county and to determine what information can be obtained from the synthetic data that may be difficult to interpret from the less granular real data. In the process, the group also analyzed how the synthetic data was generated, along with its privacy and its utility for providing insights into social issues.
The group concluded that synthetic data was a great method for protecting privacy, and that machine learning methods could be used in many cases to predict what services are needed for certain people.
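As a rough illustration of what comparing synthetic data to real aggregate counts can look like, the sketch below generates synthetic records by sampling each column independently from its observed frequencies, then measures how far the synthetic shares drift from the real ones. This is a deliberately simple stand-in and not the method behind the datasets the students studied. The records, column names, and function names are all invented, and independent sampling of this kind carries no formal privacy guarantee.

```python
import random
from collections import Counter

def fit_marginals(records, columns):
    """Count how often each value appears in each column."""
    return {col: Counter(rec[col] for rec in records) for col in columns}

def sample_synthetic(marginals, n, seed=0):
    """Draw n synthetic records, sampling every column independently."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts)
            weights = [counts[v] for v in values]
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic

def aggregate_gap(real, synthetic, column):
    """Largest absolute difference in share for any value of `column`."""
    real_counts = Counter(r[column] for r in real)
    syn_counts = Counter(r[column] for r in synthetic)
    return max(abs(real_counts[v] / len(real) - syn_counts[v] / len(synthetic))
               for v in set(real_counts) | set(syn_counts))

# Invented records: an area label and the type of service used.
real_records = (
    [{"area": "area_A", "service": "housing"}] * 30
    + [{"area": "area_A", "service": "counseling"}] * 10
    + [{"area": "area_B", "service": "housing"}] * 15
    + [{"area": "area_B", "service": "counseling"}] * 45
)

marginals = fit_marginals(real_records, ["area", "service"])
synthetic_records = sample_synthetic(marginals, n=500)
print("gap in area shares:", round(aggregate_gap(real_records, synthetic_records, "area"), 3))
print("gap in service shares:", round(aggregate_gap(real_records, synthetic_records, "service"), 3))
```

Because each column is sampled on its own, this sketch preserves single-column counts reasonably well but loses relationships between columns, such as which services are used most in which areas. That trade-off between privacy and utility is the kind of question the project examined.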
Is the City of Pittsburgh truly a Bike Friendly City?
By: Anthony Robol, Jorge Cervantes Rodriguez, and Ruochong Zhu
In recent years, bike lanes have been built throughout the City of Pittsburgh, and POGOH has installed over 50 bike rental stations, allowing people who do not own bikes to use those lanes. These two developments are part of a broader goal of making the city more bike friendly and giving commuters an alternative transportation option. However, how many of the bike lanes are truly bike friendly when venturing outside of the immediate area around Pitt’s Oakland campus?
The City of Pittsburgh has made great strides towards bike friendliness in recent years. However, bike usage varies significantly across neighborhoods, and factors such as geography, slope and proximity to stations are important. The students also found that people associated with Pitt account for a large share of all POGOH rides.