Author : Anulekha Nandi

Expert Speak Raisina Debates
Published on Jul 10, 2024

Data silences pervade both national population statistics and Big Data, leading to a mode of unquantified exclusion. Overcoming it requires recognition and acknowledgement for redressal.

The fault in our data: Data silences and inequalities

This is part of the essay series: World Population Day 2024


Nearly 850 million people globally do not have access to any form of legal identification. They are predominantly from low- and lower-middle-income countries in Sub-Saharan Africa and South Asia with inadequate civil registration systems. For nearly half of this population, the lack of birth registration prevented them from obtaining a national ID. These exclusions are further intersected by gender and socio-economic status, with women 8 percent less likely to have an ID than men. The gap widens for adults below 25 years and for those with low education, without employment, or with incomes in the bottom 40 percent of the income distribution. Identity and identification systems are essential for individuals to participate in the economy and society, whether through employment, voting, opening a bank account, or accessing social protection. They remove friction and reduce transaction costs in an individual’s interactions with the state and the market. Nearly 40 percent of adults without an ID reported difficulties in obtaining a SIM card or a mobile phone, and 25 percent faced barriers to accessing healthcare.

The United Nations Population Fund (UNFPA) estimates that tens of thousands of births and two-thirds of global deaths go unregistered. This is compounded by outdated population data. According to the United Nations Statistics Division (UNSD), during the COVID-19 pandemic, out of the 121 countries scheduled to conduct a population and housing census in 2020 or 2021, nearly 50 percent postponed it to 2021 and 15 percent postponed it to 2022 or beyond. Access to health, social protection, and other basic services is determined on the basis of enumeration exercises. Without new census data, old census data tends to be used for disbursements under welfare programmes, which leaves a substantial proportion of the population excluded. For example, in India, under the National Food Security Act, 2013, the Public Distribution System providing subsidised food grains is supposed to cover 75 percent of the population in rural areas and 50 percent in urban areas. Because the number of people to be covered is calculated from the most recent published census, population growth since that count is not reflected in coverage. Public data collection and national statistical estimates are the means by which citizens become legible or visible to the government, enabling the administration and provision of public services. As the data remains silent about these uncounted and excluded populations, datasets come to carry persistent blind spots and gaps that structure a form of unquantified exclusion.

Big data, small data: Paradox of potential and pitfall

The increased penetration of digital technologies gave rise to vast quantities of digital-born data, or Big Data. Big Data, in many ways, was deemed to be the answer to enduring data silences in national population statistics. However, what Big Data contributes in volume, it lacks in granularity, and developers are faced with missing data on parameters of interest. This often leads to the identification of spurious causal relationships. During the 2014-16 Ebola crisis, call data records were heralded as the Big Data solution for containment, with cell phone signals used to track the geographical spread of the disease. A study in Sierra Leone showed how this failed because experts and development professionals from the Global North did not appreciate cell phone usage patterns within the country. Cell phones tended to be a shared commodity, traded and borrowed between friends and family, with individuals holding multiple connections to optimise data costs.

Concurrently, Big Data tends to be messy and noisy, and taming it for use engenders a different set of silencing practices. To make it fit for modelling and analysis, it needs to undergo cleaning, wrangling, curating, and feature engineering. These entail decisions about how the data is prepared, that is, about what is included and what is excluded. For example, the reduction of gender identity to a male/female binary has led to exclusions for gender minorities. This has translated to the silencing of non-conformist data categories, where data may be distorted to fit existing preconceptions. Sometimes these non-conformist data categories comprise a relatively small proportion of a dataset and are considered insignificant. As a result, they are not converted from raw data into data categories with semantics and meaning. These syntactic silences lead to the undercounting or even uncounting of certain populations.
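The cleaning decisions described above can be made concrete with a minimal sketch. The responses, counts, function names, and the 5 percent threshold below are all invented for illustration and are not drawn from any dataset discussed in this essay. The sketch shows two routine preparation steps, a binary gender schema and a rare-category filter, and then counts the records each step removes.

```python
from collections import Counter

# Hypothetical raw survey responses; values and counts are invented for illustration.
RAW_RESPONSES = (
    ["female"] * 480 + ["male"] * 500 + ["non-binary"] * 12 + ["transgender"] * 8
)


def binary_clean(responses):
    """Keep only records that fit a male/female schema; drop everything else."""
    return [r for r in responses if r in {"male", "female"}]


def drop_rare(responses, min_share=0.05):
    """Drop categories whose share of records falls below min_share."""
    counts = Counter(responses)
    total = len(responses)
    keep = {category for category, n in counts.items() if n / total >= min_share}
    return [r for r in responses if r in keep]


def silenced(before, after):
    """Return the number of records per category that vanished during cleaning."""
    return dict(Counter(before) - Counter(after))


lost_to_schema = silenced(RAW_RESPONSES, binary_clean(RAW_RESPONSES))
lost_to_filter = silenced(RAW_RESPONSES, drop_rare(RAW_RESPONSES))
```

In this toy case both steps remove the same 20 records, and neither leaves any trace of them in the cleaned data. Logging the output of a check like `silenced` alongside each preparation step is one way, among others, to keep such exclusions visible.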

Listening to silences: Interoperability and contextual fit

Data gaps are often filled by spurious relationships, as highlighted above. However, sometimes the correlations drawn can lead to heightened levels of bias and discrimination, for example, when a person’s zip code or language is used to evaluate their creditworthiness or ability to hold down a job. Leading up to the global economic meltdown in 2008, African American and Hispanic applicants were prime targets for sub-prime loans. Compared to a white applicant, African Americans were 2.8 times and Hispanics 2 times more likely to be denied a loan, and both were 2.4 times more likely to receive a sub-prime loan. This highlights the importance of recognising the nature of data silences to mitigate their adverse fallouts.

Data silences stem from long-standing systemic issues and need careful evaluation, recognition, and acknowledgement for redressal. Analysts, policymakers, and development professionals need better contextual awareness to identify appropriate parameters of interest for a given development outcome or policy scenario and to avoid modelling spurious relationships. Missing data tends to be an inescapable reality, and it is often handled with mathematical approaches like regression imputation, which can obfuscate error rates. One way to mitigate data silences is to link existing repositories of data more effectively. This means instituting data standards and protocols for efficient data exchange between different departments in the public sector, as well as with the data that companies and multilateral organisations can provide in the public domain.
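The point about regression imputation obscuring error can be illustrated with a small sketch. The data are synthetic, and the helper names `fit_line` and `residual_sd` are hypothetical. The sketch fits a line on the observed cases, fills the gaps with fitted values, and compares the residual spread before and after. Because imputed values sit exactly on the fitted line, they add no residual error while increasing the apparent sample size.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residual_sd(xs, ys, a, b):
    """Residual standard deviation around the line, using n - 2 degrees of freedom."""
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return (rss / (len(xs) - 2)) ** 0.5


# Synthetic data, invented for illustration: y depends on x plus random noise.
rng = random.Random(0)
xs = [float(i) for i in range(100)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 5) for x in xs]

# Assume every third y value was never recorded.
observed = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 3 != 0]
missing_x = [x for i, x in enumerate(xs) if i % 3 == 0]

obs_x, obs_y = zip(*observed)
a, b = fit_line(obs_x, obs_y)

# Regression imputation: fill each gap with the fitted value, which carries no noise.
imputed_y = [a + b * x for x in missing_x]
full_x = list(obs_x) + missing_x
full_y = list(obs_y) + imputed_y

sd_observed = residual_sd(obs_x, obs_y, a, b)
sd_completed = residual_sd(full_x, full_y, a, b)
```

Here `sd_completed` comes out smaller than `sd_observed` even though no new information was collected, so the completed dataset looks more precise than the evidence warrants. This is a sketch of the general mechanism only, not of any specific statistical pipeline referred to in this essay.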


Anulekha Nandi is a Fellow at the Observer Research Foundation.

The views expressed above belong to the author(s).
