I recently read Virginia Eubanks’ wonderful book, “Automating Inequality,” and came away compelled to share some key takeaways on principles for designing more inclusive and just technology. In the book, Eubanks discusses how software implemented by our government has recreated a digital version of the poorhouse, entrapping low-income Americans in a system of digital surveillance and automated decision-making. She concludes that these digital tools, whether deterministic or stochastic in nature, have only served to deepen income inequality despite the best of intentions.
I think the book offers valuable lessons for all technologists, whether or not the products we build directly impact low-income or marginalized communities. Successful software eventually touches a diverse base of users and, more often than not, attempts to automate decisions that were previously made by a human. The practices I share below are critical to think through at any stage of technology development – in fact, the earlier the better!
Design Principles from “Automating Inequality”
- Algorithmic transparency – In 2006, the government of Indiana signed a $1 billion-plus agreement with IBM and the now-defunct Affiliated Computer Services to build a tool to automate eligibility determination for food stamps, Medicaid, and TANF (Temporary Assistance for Needy Families). The program was a disaster. Most notably, the system denied more than a million applications for vital public benefits, often citing nothing more specific than “failure to cooperate.” Without any transparency into the eligibility criteria or the reasons for their denials, people were outraged. They lost trust in their government and flooded their benefits offices with complaints. Ironically, the entire system ended up increasing the burden on caseworkers rather than improving efficiency as intended. The common critique of fully transparent algorithms is that they allow systems to be “gamed,” since people know exactly how decisions will be made. That is a fair concern, but gaming is preferable to having users lose trust in the system altogether. In the best of cases, users churn and businesses lose money. In the worst of cases, when opaque algorithms are deployed by governments as in Indiana, lives are put at risk. While there is no standard guide to achieving algorithmic transparency, I think the ACM’s seven principles for algorithmic transparency and accountability are a good starting point. For situations like Indiana’s, a simple first step is to give users a thorough explanation of every output or error message, an easy way to escalate questions to a human, and upfront answers to frequently asked questions (the first sketch after this list illustrates the idea).
- A human UI layer – In the same example from Indiana, Eubanks’ interviews with both benefits recipients and caseworkers make it clear that having a human caseworker in the loop was critical. Recipients benefited greatly from having another person to tell their story to, to empathize with their struggles, and to guide them through the often-complex details of social services. And contrary to the assumption that human judgment only introduces bias, caseworkers made less biased, more informed decisions when they had personal interactions with their clients. As Eubanks states, “justice sometimes requires an ability to bend the rules. By removing human discretion from frontline social servants and moving it instead to engineers and private contractors, the Indiana experiment supercharged discrimination.” Even if a product has the explicit goal of automating human busywork, it needs a UI layer that makes users feel cared for. Whether that takes the form of exceptional customer service that is reachable around the clock or thoughtful design that explains concepts simply and clearly, the best software feels human.
- Redemption – As the homelessness crisis in Los Angeles escalated in the early 2010s, the city decided in 2013 to implement a coordinated entry system. The idea was to collect as much data as possible on the homeless population in order to better match people with the city’s public resources like temporary shelter, food, and healthcare. Though well-intentioned at first, LA’s data collection efforts soon turned sinister, sweeping up personal information about people’s mental and sexual health, immigration status, and criminal activity. That data was then shared with law enforcement agencies and used against people in their search for basic resources. The homeless often faced a conundrum: admit to risky or criminal behavior to earn a higher “score” and gain access to resources, but invite further scrutiny from law enforcement. Eubanks discusses several examples in the book where data collected about personal or even family history detrimentally impacts a person’s ability to access public resources. This is why a “right to be forgotten” – or at the very least, transparency into what data is being collected and who it is shared with – is an important design principle for any system that tracks user data (the second sketch after this list shows one way a system might “forget”). Eubanks puts it best: “justice requires the possibility of redemption and the ability to start over. It requires that we find ways to encourage our data collection systems to forget. No one’s past should entirely delimit their future.”
- Smart variable selection – In Allegheny County, Pennsylvania in the mid-2010s, a risk assessment model (what some might call “machine learning”) was implemented to predict child abuse and neglect and to assist in screening maltreatment referrals. But rather than predicting actual child maltreatment, the algorithm used a proxy outcome variable: whether a complaint would be made to a child abuse hotline. Using such an outcome variable was clearly misguided; lower-income families and families of color disproportionately have child abuse reports filed against them, many of which are unfounded. Moreover, the Allegheny County algorithm was trained on a dataset composed only of families that had accessed welfare services in the past, further increasing bias against lower-income communities of color. When an algorithm’s output can result in a child being removed from their family, this kind of variable selection is truly unconscionable. Unfortunately, these sorts of risk assessment tools are in use all over the public sector – most notably in policing, where similar tools aim to predict where crime will occur based on records of past arrests and reports, data that reflects where police have looked as much as where crime has actually happened. When any sort of predictive model is being built, the most fundamental questions to ask are “what are our predictive variables?”, “what are we trying to predict?”, and “how do we measure success?” The best software uses unbiased training data, predicts real outcome variables rather than proxies, and carefully measures and benchmarks performance so as not to replicate existing biases (the last sketch after this list shows one simple check).
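To make these principles a bit more concrete, here are a few short Python sketches. They are illustrative only – every rule, threshold, name, and URL in them is hypothetical, and a real system would derive its logic from statute and policy rather than from code constants. The first sketch addresses transparency: an eligibility check that records machine-readable reason codes and plain-language explanations for every denial, instead of a catch-all “failure to cooperate.”

```python
from dataclasses import dataclass, field

# Hypothetical reason codes; a real benefits system would map these to the
# statutory eligibility rules it actually applies.
REASONS = {
    "INCOME_ABOVE_LIMIT": "Reported household income exceeds the program limit.",
    "MISSING_DOCUMENT": "A required document (proof of residency) was not received.",
}

@dataclass
class Decision:
    eligible: bool
    reasons: list = field(default_factory=list)        # machine-readable codes
    explanations: list = field(default_factory=list)   # plain-language text shown to the applicant
    appeal_url: str = "https://example.gov/appeal"     # hypothetical escalation path to a human

def determine_eligibility(income: float, income_limit: float, documents: set) -> Decision:
    """Toy eligibility check that records *why* it decided, not just the outcome."""
    decision = Decision(eligible=True)
    if income > income_limit:
        decision.eligible = False
        decision.reasons.append("INCOME_ABOVE_LIMIT")
    if "proof_of_residency" not in documents:
        decision.eligible = False
        decision.reasons.append("MISSING_DOCUMENT")
    decision.explanations = [REASONS[code] for code in decision.reasons]
    return decision

print(determine_eligibility(income=2400.0, income_limit=2000.0, documents={"pay_stub"}))
```

The point is not the rules themselves but the shape of the output: every denial carries its specific reasons and a route to a human reviewer.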
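The second sketch addresses redemption. One simple mechanism for a system that “forgets” is a retention policy that expires sensitive records after a fixed window; the categories and windows below are invented for illustration, and in practice they would be set by law and by the people the data describes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per data category (illustrative values only).
RETENTION = {
    "service_history": timedelta(days=365),
    "health_notes": timedelta(days=90),
    "contact_info": timedelta(days=730),
}

def purge_expired(records, now=None):
    """Drop any record whose retention window has elapsed, so the system forgets it."""
    now = now or datetime.now(timezone.utc)
    kept = []
    for record in records:
        # Unknown categories get a zero-length window, i.e. forgotten by default.
        window = RETENTION.get(record["category"], timedelta(0))
        if now - record["collected_at"] < window:
            kept.append(record)
    return kept

records = [
    {"category": "health_notes", "collected_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"category": "contact_info", "collected_at": datetime.now(timezone.utc)},
]
print(purge_expired(records))  # the stale health note is gone; recent contact info remains
```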
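The last sketch addresses variable selection. A minimal sanity check before shipping any predictive model is to compare error rates across groups: if one group’s false-positive rate is far higher, the label or features are likely acting as a proxy for group membership rather than for the outcome you actually care about. The data below is made up purely to show the calculation.

```python
from collections import defaultdict

def false_positive_rate_by_group(examples):
    """False-positive rate per group: of the people who did NOT have the outcome,
    how many did the model flag anyway?"""
    counts = defaultdict(lambda: {"fp": 0, "negatives": 0})
    for ex in examples:
        if ex["actual"] == 0:              # only true negatives can become false positives
            counts[ex["group"]]["negatives"] += 1
            if ex["predicted"] == 1:
                counts[ex["group"]]["fp"] += 1
    return {g: c["fp"] / c["negatives"] for g, c in counts.items() if c["negatives"]}

# Toy predictions: if group A is flagged far more often despite identical actual
# outcomes, the model has likely learned reporting patterns, not real risk.
examples = [
    {"group": "A", "actual": 0, "predicted": 1},
    {"group": "A", "actual": 0, "predicted": 0},
    {"group": "B", "actual": 0, "predicted": 0},
    {"group": "B", "actual": 0, "predicted": 0},
]
print(false_positive_rate_by_group(examples))  # {'A': 0.5, 'B': 0.0}
```

A gap like this would not prove bias on its own, but it is exactly the kind of measurement that needs to happen before a score is allowed to influence families’ lives.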
The promise of software is that it allows us to reimagine complex systems from the ground up. But all too often, we design systems that merely replicate existing human decision-making processes – and, with them, existing human biases. It is disheartening that large systems integrators and consultancies like IBM are paid billions to build tools that end up hurting our most marginalized communities. There is a clear opportunity for mission-driven entrepreneurs to build point solutions that address some of the problem areas discussed above. But the reality is that for many of these problems, only policy solutions will fix years of systemic injustice and mismanagement. I will leave you with a wonderful quote from Eubanks on this very point: “It is a mere fantasy to think that a statistical model or a ranking algorithm will magically upend culture, policies, and institutions built over centuries.” Until those broader policy changes arrive, though, the small steps we take to improve our software can still make a big difference.