Published on Aug 07, 2018
Dangers of blind faith in data

Asimov’s Foundation series warns of a world in which statistics can predict the future. This future is closer than ever due to machine learning (ML) technology, which enables computer algorithms to make predictions using observed data with little human input.

These futuristic technologies have inched their way into India’s civic space to measure economic disparity and urbanisation, and even to predict natural calamities, but their potential remains largely untapped. Before that potential is pursued further, a fair assessment must be made of their positive and negative outcomes.

Many Western countries are capitalising on the world’s exploding availability of data to apply ML to urban governance decisions, particularly in law enforcement and justice. Digital programmes such as Aadhaar, Digital India, and the Smart Cities Mission position India as a potential ground-breaker in data-driven governance.

Prime Minister Narendra Modi believes that technology “should be used for the betterment of the humankind” and that “science and technology are value-neutral; the human objective guides the outcome of the technology.” Unfortunately, good intentions are often not enough to ensure just outcomes and eliminate potential pitfalls in using ML technology.

Predictive policing

The Delhi Police recently launched CMAPS (Crime Mapping, Analytics and Predictive System), joining the international ranks of cities using ML for “predictive policing.” Aiming at crime prevention, predictive policing initiatives distribute police resources strategically, using records of crimes, police interactions, and social networks to forecast the places or people most likely to commit, or fall victim to, crime. Despite good intentions, this use of ML can come dangerously close to profiling.

Discrimination in these algorithms’ predictions originates from social bias implicit in the data used to train them. Being “value-neutral” in their learning processes, ML algorithms treat their training data as perfectly just and thus replicate any skews it contains. Places or individuals flagged as dangerous by the algorithm are patrolled more heavily, which inflates the recorded crime rates associated with those places or individuals in police reports. The algorithm, biases and all, is then retrained on the results of its own forecasts, and the skew in its predictions grows. A study of Oakland showed a drug-crime prediction algorithm devolving into de facto racial profiling in exactly this way; in India, similar patterns could arise from urban inequality surrounding religion, caste, or class. Whereas targeting by human officers can be addressed through training, ML’s veneer of objectivity makes such biases far harder to recognise and correct.
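
This feedback loop can be made concrete with a toy simulation. The sketch below is purely illustrative, not a model of CMAPS, PredPol, or any deployed system, and every number in it is an assumption: two districts have identical true crime rates, one starts with more recorded crime, each day’s patrol is dispatched in proportion to recorded crime, and crime is recorded only where police are present.

```python
# Toy simulation of a predictive-policing feedback loop (illustrative only).
import random

random.seed(1)

true_rate = {"A": 0.3, "B": 0.3}   # identical underlying crime rates
records = {"A": 60, "B": 40}       # historical records start out skewed

for day in range(5000):
    # "Predictive" allocation: today's patrol goes to a district with
    # probability proportional to its share of recorded crime.
    total = records["A"] + records["B"]
    district = "A" if random.random() < records["A"] / total else "B"

    # Crime is recorded only where police are present to observe it.
    if random.random() < true_rate[district]:
        records[district] += 1

share_a = records["A"] / (records["A"] + records["B"])
print(f"Share of all recorded crime attributed to district A: {share_a:.2f}")
# In expectation the initial 60/40 skew persists indefinitely, even though
# the two districts' true crime rates are identical: the model keeps
# "confirming" its forecasts with data its own deployments generated.
```

Because the records that drive tomorrow’s deployment are produced by today’s deployment, the initial imbalance feeds on itself rather than averaging out.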

Risk scores

Researchers from Dehradun Institute of Technology created an algorithm that predicted New York parole decisions with over 75 percent accuracy using demographic and crime data. Similar technology is now used to create “risk scores” that inform sentencing, bail, and parole decisions. Advocates hope that this data-driven approach will trace subtle patterns that optimally balance the community’s safety against the resources required to imprison people, giving second chances only to those not predicted to become repeat offenders. Risk score algorithms, too, fall victim to social bias in their input data. ProPublica, an American non-profit investigative newsroom, found that in Broward County, Florida, the risk score algorithm flagged black defendants as likely future criminals 77 percent more often than white defendants, even after controlling for criminal history, recidivism, age, and gender. The scores were also inaccurate: only a fifth of those predicted to commit violent crimes went on to do so. Although the algorithm never explicitly asked about race, questions such as “Was one of your parents ever sent to prison?” allowed structural inequality to influence its results.
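
How a seemingly race-blind questionnaire can still carry race into a model is easy to demonstrate on synthetic data. The sketch below uses an ordinary scikit-learn classifier and invented numbers, not the actual risk-scoring tool: race never appears as a feature, yet a structurally correlated proxy question transmits the skew in recorded outcomes straight into the predictions.

```python
# Synthetic illustration of proxy bias (all numbers are invented assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

group = rng.choice(["group_1", "group_2"], size=n)

# Assumed structural inequality: group_2 faces both heavier policing and a
# higher family incarceration rate, independent of individual behaviour.
parent_incarcerated = rng.random(n) < np.where(group == "group_2", 0.50, 0.05)
offended = rng.random(n) < 0.30                        # identical behaviour
caught = rng.random(n) < np.where(group == "group_2", 0.70, 0.20)
rearrested = offended & caught                         # the recorded label

X = parent_incarcerated.reshape(-1, 1).astype(float)   # race never appears
model = LogisticRegression().fit(X, rearrested)

scores = model.predict_proba(X)[:, 1]
for g in ["group_1", "group_2"]:
    print(g, "mean predicted risk:", round(float(scores[group == g].mean()), 3))
# Both groups offend at the same rate, yet the proxy question lets the skew
# in recorded outcomes flow straight into the risk predictions.
```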

Risk score algorithms can also fall victim to reinforcement bias, in which the way data are collected to update the algorithm magnifies the original social biases. If the algorithm assigns a low risk score to a defendant who later commits a crime, it adjusts accordingly. However, no empirical data can ever show that an individual assigned a high risk score, and therefore not released, would have behaved lawfully. The algorithm thus cannot unlearn unjust negative associations as easily as it forms them. Reinforcement bias can be addressed through experimentation, for example by ignoring the risk scores of randomly selected individuals, but such high-stakes experiments meet understandable resistance.
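
The one-sided nature of this feedback can likewise be shown in a few lines of synthetic data. In the sketch below (hypothetical numbers, not any real risk tool), two groups have identical true reoffence rates, but one group receives inflated scores; because high-scoring defendants are detained, almost no outcome labels are ever produced that could contradict those scores.

```python
# Synthetic illustration of the "selective labels" problem: outcomes are
# observed only for released (low-score) defendants.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

group = rng.choice(["A", "B"], size=n)
true_risk = np.full(n, 0.20)                      # identical for everyone
score = np.where(group == "A", 0.25, 0.55)        # group B is scored higher
score = score + rng.normal(0.0, 0.10, size=n)

released = score < 0.50                           # high scores mean detention
reoffended = rng.random(n) < true_risk            # what each person would do

for g in ["A", "B"]:
    in_group = group == g
    labelled = released & in_group
    print(
        g,
        "| share of group with an observed outcome:",
        round(labelled.sum() / in_group.sum(), 2),
        "| observed reoffence rate among released:",
        round(float(reoffended[labelled].mean()), 2),
    )
# Most of group B is detained, so the retraining data contain almost no
# evidence that its inflated scores were wrong, while any released person
# who reoffends only pushes scores upward.
```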

Doing it right

While social and reinforcement biases are the direct causes of the injustice algorithms perpetuate, it is the inordinate power granted to algorithms that enables it. The clash between the natures of good governance and ML is what makes powerful algorithms dangerous. Good governance and justice rely on accountability: every decision can be questioned on its merits, explained, and appealed, and Indian institutions are accordingly built to hold human decision-makers accountable. The newest and most powerful algorithms, however, derive their ability to trace patterns undetectable by simpler methods precisely from the opacity and complexity of their formulae. This intricacy means they cannot explain or justify their results in human terms, nor can their skews be easily adjusted, so they cannot be held accountable. While bias in policing or the justice system can be mitigated through training, algorithms lack such correction measures and therefore lack accountability. Though ML has the potential to be a powerful tool for governance, its propensity towards bias and its lack of accountability make it crucial to understand algorithms’ limits, restrain their power, and question their findings.

My Choices Foundation (MCF) is an Indian charity intervening in modern slavery through local campaigns informed by predictive technology. Its partner Quantium built an algorithm that incorporates factors such as poverty levels, job opportunities, drought risk, and health statistics to predict which neighbourhoods across India are most at risk of being targeted by human traffickers. MCF uses these predictions to direct grassroots initiatives that educate communities about trafficking and prevent ensnarement. The alignment of intentions between MCF and the communities analysed by its algorithm is crucial to ensuring a positive result. By intelligently distributing support and resources to at-risk communities, MCF pursues crime prevention more ethically than the initiatives described above. Mis-targeting could lower the campaign’s effectiveness, but because the resources are supplemental and beneficial, the potential damage is limited. The input data may still be biased, but the following year’s trafficking data depends far less on the algorithm’s predictions than recorded crime depends on police deployments, so any algorithmic bias does not compound. This initiative is currently pursued by an NGO, but the government’s access to more precise data could make a similar programme valuable for trafficking prevention and the identification of risk factors in urban areas.
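
For readers curious about the mechanics, the following is a minimal, hypothetical sketch of an area-level risk ranking of this kind. It is not Quantium’s system: the feature names, figures, and choice of model are all invented for illustration.

```python
# Hypothetical sketch of an area-level trafficking-risk ranking.
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["poverty_rate", "job_opportunity_index", "drought_risk", "health_index"]

# Invented historical data: district indicators plus whether trafficking
# cases were recorded there.
history = pd.DataFrame({
    "district":              ["D1", "D2", "D3", "D4", "D5", "D6"],
    "poverty_rate":          [0.42, 0.18, 0.35, 0.10, 0.50, 0.22],
    "job_opportunity_index": [0.20, 0.70, 0.30, 0.80, 0.10, 0.60],
    "drought_risk":          [0.80, 0.20, 0.60, 0.10, 0.90, 0.30],
    "health_index":          [0.30, 0.80, 0.40, 0.90, 0.20, 0.70],
    "trafficking_recorded":  [1, 0, 1, 0, 1, 0],
})

model = LogisticRegression().fit(history[FEATURES], history["trafficking_recorded"])

# Score districts and rank them so awareness campaigns go to the riskiest.
history["predicted_risk"] = model.predict_proba(history[FEATURES])[:, 1]
ranked = history.sort_values("predicted_risk", ascending=False)
print(ranked[["district", "predicted_risk"]].to_string(index=False))
```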

ML’s tendency to replicate biases can also be inverted and used as a tool to quantify the leanings in training data. Stanford researchers used the word-embedding framework to find correlations between gendered or racialised words and occupation words across a century of text data. The correlations aligned closely with independent measures of social bias, suggesting that the framework can accurately measure stereotypes. By leveraging ML’s intrinsic weakness, its susceptibility to bias, these researchers created a tool that can help address social inequality rather than perpetuate it. Though this research could be difficult to replicate in India, given the diversity of languages and the dearth of parseable text data, the principle of utilising bias to measure complex social phenomena can be applied in a host of other settings.
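
A minimal sketch of how such a measurement works is shown below. It follows the spirit of the approach rather than the researchers’ actual code or corpora: it loads a small public GloVe embedding through gensim and compares each occupation word’s similarity to illustrative female- and male-associated word lists.

```python
# Sketch of embedding-based stereotype measurement (illustrative word lists,
# not the Stanford study's exact methodology or corpora).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # small public model; downloads on first use

FEMALE = ["she", "her", "woman", "mother", "daughter"]
MALE = ["he", "his", "man", "father", "son"]
OCCUPATIONS = ["nurse", "engineer", "teacher", "carpenter", "librarian", "lawyer"]

def gender_lean(word: str) -> float:
    """Mean similarity to female words minus mean similarity to male words."""
    f = sum(vectors.similarity(word, w) for w in FEMALE) / len(FEMALE)
    m = sum(vectors.similarity(word, w) for w in MALE) / len(MALE)
    return float(f - m)

for occ in sorted(OCCUPATIONS, key=gender_lean):
    print(f"{occ:>10}: {gender_lean(occ):+.3f}")
# Positive values lean "female" and negative values lean "male" in the
# training corpus; tracking such scores across corpora from different eras
# is how historical shifts in stereotypes can be quantified.
```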

ML algorithms are, as Prime Minister Modi described, “value-neutral” only before they are implemented. Once trained, they learn the values encoded in their data regardless of those values’ merit. Recklessly replacing human decisions with algorithmic power makes this bias more dangerous. ML approaches can still trace patterns brilliantly and improve measurements and predictions, and research is under way on ways to “de-bias” algorithms. Until that work matures, however, implementers must consciously overcome the problems that create unjust algorithms. By becoming informed about the strengths and weaknesses of ML algorithms as a governance tool, governments can leverage these attributes to improve and modernise policies in a just and accountable way.


Shohini Stout is a Research Intern at Observer Research Foundation, Mumbai. She is a student at the Massachusetts Institute of Technology, studying mathematics, computer science, and public policy.

The views expressed above belong to the author(s).