Oakland

How data can prevent pandemic-related homelessness

In communities around the globe, low-income families have found themselves especially vulnerable to both the health and economic risks presented by COVID-19 and the resulting shutdowns.

Countless families face a looming eviction crisis because of lost jobs and income over the past several months. As part of our global response to the impacts of the pandemic, our Advanced Digital Engineering team has been working with an American NGO called New Story to target relief funds to those most in danger of losing their homes. The project has demonstrated how innovative data analytics can address community issues.  

Addressing homelessness at home

By New Story’s estimates, nearly a billion people worldwide have insufficient shelter. Although much of New Story’s work has been in Latin America, its latest project, ‘The Neighborhood’, aims to address housing issues in the US. The project aims to provide direct financial assistance to marginalised, often undocumented communities that are most in danger of losing their homes and are ineligible for federal support.

Early action against homelessness has big impacts later. According to New Story, preventing an eviction might require $2,000 to $5,000, whereas one year of rapid rehousing support costs $14,000 and a year of homelessness costs a community $100,000. Since New Story couldn’t build new housing during the lockdown, they chose to provide a safety net to those in need with direct funding.

A data-driven process to identify the vulnerable

Our work was oriented around two questions: which communities are most at risk, and when does that risk occur? New Story had begun to address these questions with an initial analysis in the Atlanta area, but wanted our help to dig deeper into the sheer volume and variety of relevant data for the nine counties of the Bay Area in California. We assembled a small team with data engineering, data science, and GIS skills to manage and analyze the variety of data sources.

Given the wave of unemployment hitting the USA, we knew we needed to move quickly. First, we needed to understand the existing policies at the federal, state, and local levels and turn those into a timeline for each county in the Bay Area. These timelines acted like a countdown clock, showing when people would start to become at risk of eviction. We were able to get data about the percentage of households in poverty in each county, as well as other demographic and economic data, including the overall use of homeless shelters and a measure of how vulnerable each county was to COVID-19.
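The countdown idea can be illustrated with a minimal Python sketch. It is not the published analysis code: the county names and expiry dates are placeholders, the function name is hypothetical, and it assumes that a household stays protected until the latest of the overlapping federal, state, and local protections lapses.

```python
from datetime import date

# Placeholder expiry dates for overlapping protections in each county.
# These are illustrative values, not real policy data.
policy_expiry = {
    "County A": [date(2020, 7, 31), date(2020, 9, 30)],
    "County B": [date(2020, 8, 31)],
}


def days_until_risk(expiries, today):
    """Days until the last applicable protection lapses (0 if already lapsed).

    Assumption: renters remain protected while any one policy is in force,
    so the countdown runs to the latest expiry date.
    """
    latest = max(expiries)
    return max((latest - today).days, 0)


today = date(2020, 7, 1)
countdown = {
    county: days_until_risk(expiries, today)
    for county, expiries in policy_expiry.items()
}

for county, days in sorted(countdown.items(), key=lambda item: item[1]):
    print(f"{county}: {days} days until protections lapse")
```

Sorting by the remaining days puts the counties whose protections expire soonest at the top of the list.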

Using 11 datasets, we created a relative risk index to understand which counties were most vulnerable to an increase in evictions. We presented the risk for each county alongside its respective countdown clock to show which counties were most at risk relative to the time left before policies expired. When these two metrics were combined, we were able to see which counties to prioritize based on their existing economic and demographic conditions and their own policy responses.
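One common way to build a relative index of this kind is to rescale each indicator to a 0 to 1 range across counties and then average the rescaled values. The sketch below illustrates that approach under stated assumptions. The indicator names, county names, values, equal weighting, and the tie-breaking rule against the countdown are all hypothetical and are not taken from the published analysis.

```python
# Placeholder indicator values for three counties (illustrative only).
indicators = {
    "poverty_rate_pct": {"County A": 8, "County B": 12, "County C": 10},
    "shelter_use": {"County A": 100, "County B": 300, "County C": 200},
}

# Placeholder countdown values: days left before protections expire.
days_left = {"County A": 30, "County B": 60, "County C": 30}


def min_max(values):
    """Rescale one indicator so the lowest county is 0 and the highest is 1."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def relative_risk_index(indicators):
    """Average the rescaled indicators, weighting each one equally."""
    scaled = [min_max(values) for values in indicators.values()]
    counties = scaled[0].keys()
    return {c: sum(s[c] for s in scaled) / len(scaled) for c in counties}


risk = relative_risk_index(indicators)

# One possible way to combine the two metrics: highest risk first,
# then the shortest countdown as a tie-breaker.
priority = sorted(risk, key=lambda c: (-risk[c], days_left[c]))
```

Because the index is relative, a score only has meaning in comparison with the other counties in the same analysis. Adding or removing a county changes the rescaling.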

Sharing our data and analysis

It is clear to us that this data-driven approach would also be useful in other areas of the United States, so we chose to open source our data analysis work for other organizations to use. We have published a repository with our analysis code and information about our database on GitHub.  

The published repository has several key features. Firstly, we published an open database for eviction-related datasets. This builds on our work with New Story and includes up-to-date datasets from the Federal Reserve (FRED), the Department of Housing and Urban Development (HUD), the US Census, and others. This data is now available to anyone in a single database at the county level. Users can query the database and pull as much data as they need. The database can also be expanded to include more datasets that users would find useful.
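To give a feel for what a county-level query might look like, here is a self-contained sketch that uses an in-memory SQLite database as a stand-in. The table name, column names, county codes, and values are all hypothetical. The repository documentation describes the actual database and how to connect to it.

```python
import sqlite3

# In-memory stand-in for a county-level indicator database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE county_indicators (
           county_code TEXT,
           source      TEXT,
           indicator   TEXT,
           value       REAL
       )"""
)
conn.executemany(
    "INSERT INTO county_indicators VALUES (?, ?, ?, ?)",
    [
        # Placeholder rows, not real data.
        ("00001", "census", "poverty_rate_pct", 8.0),
        ("00001", "fred", "unemployment_rate_pct", 11.0),
        ("00002", "census", "poverty_rate_pct", 12.0),
    ],
)


def indicators_for_county(conn, county_code):
    """Return every indicator stored for one county as a name-to-value dict."""
    rows = conn.execute(
        "SELECT indicator, value FROM county_indicators "
        "WHERE county_code = ? ORDER BY indicator",
        (county_code,),
    ).fetchall()
    return dict(rows)


result = indicators_for_county(conn, "00001")
print(result)
```

Storing one row per county and indicator, rather than one column per dataset, is one way to let new datasets be added without changing the table structure.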

Based on our work with New Story to specifically address direct relief to renters, we’ve updated our Python analysis and published workflows to evaluate the relative risk of eviction between counties. This analysis aims to give context around the socioeconomic indicators at the county level and show a direct comparison between counties to help decision makers prioritize how to direct aid. We’ve also documented our process to evaluate policies at a county level, using a methodology from EvictionLab, an organization devoted to publishing national eviction data, and provided a template for users. 

Develop together, benefit many

Like many other industries, the engineering industry does not traditionally publish work for others to use and build upon. However, open-sourcing socially valuable projects such as this is an opportunity to share what we’ve learned and allow others to use and improve upon what we’ve done. Taking contributions from the community makes the code more secure, efficient, stable and usable. It’s a great model for communities to address significant social issues, taking advantage of and building on tools they couldn’t have developed on their own.

Since publishing, we have already begun adding new features and data, and we’re working on ways to make the data accessible to people without digital skills. Our team has plans for new analyses, datasets, and visualizations that we would like to add to this repository, but we also hope that as people use the data for their own problems, they will have new ideas to contribute. Our aim is for this repository to serve as a resource to NGOs, municipalities, and other firms as we all try to help people remain in their homes.

If you are interested in exploring the code and data, you can find documentation and instructions about how to report issues and contribute your own improvements at our GitHub repository.