Author : Anulekha Nandi

Expert Speak India Matters
Published on Mar 07, 2024

The AI development process, from design to deployment, perpetuates gender bias. Future legislation should recognise these gender-based risks in order to mitigate them.

From data to deployment: Gender bias in the AI development lifecycle

This article is part of the series — International Women's Day


While developing Artificial Intelligence (AI) systems, raw data gets transformed into practical solutions through algorithmic systems. This process can broadly be divided into three phases: design, development, and deployment. While the design phase involves problem definition, data collection, processing, and preparation, the development phase involves experimenting with data to determine the right model followed by testing and evaluation. Once the model has met expected performance criteria, it is deployed in the intended application scenario and monitored to ensure it continues to produce expected results as it takes in live data.

Research on AI fairness, i.e., the goal of minimising unfavourable outcomes across demographic groups inflected by multiple intersectional identity attributes, highlights the interfaces or touch-points within the AI life-cycle where exclusionary conditions enter systems: biased datasets and processing parameters, gender-insensitive classifications, and gender-unresponsive evaluation and testing, all of which amplify existing biases. This stands to be exacerbated by the skewed gender balance in the AI workforce, which has significant implications for the design and implementation of AI algorithms.


Systematic exclusion 

Algorithms represent the computational codification and mathematical abstraction of complex social realities. As a result, algorithmic assumptions and parameters come to encode the knowledge and value systems of the designers who build them. Because this depends on the interpretive choices of developers, design conditions instil certain perspectives, world views, and lived experiences to the exclusion of others.

UNESCO's 2019 report “I'd blush if I could” found that only 12 percent of AI researchers and 6 percent of professional software developers are women. Women are largely absent from frontier technology innovation, with only 21 percent of technology roles at Google filled by women; within that, only 10 percent work on machine learning. This under-representation entails the risk of developing new technologies that do not meet the needs of nearly half the population. The over-representation of men in designing AI technologies could undo decades of advancement and advocacy for gender equality, resulting in systems that place women and other gender minorities at a disadvantage.

Consequently, computational design comes to embody social inequalities, which then operate automatically and repeatedly. Delineating the contours of the operational domain or problem space in which AI aims to intervene, what is included in the dataset, and what goes unquestioned or ignored becomes a form of power. This is further compounded by the lack of gender-representative and gender-disaggregated datasets. Moreover, AI outcomes, like social outcomes, depend on intersectionality: the multiple intersecting attributes of identity, such as caste, gender, or religion, that define haves and have-nots.
Landmark research on bias in facial recognition has shown how commercial systems misgender and misidentify female faces, misclassifying darker-skinned female faces the most, with error rates of up to 34.7 percent as opposed to 0.8 percent for lighter-skinned males. In Natural Language Processing, word embeddings trained on Google News articles reinforced gender stereotypes in which a doctor is a man and a nurse or receptionist is a woman. Even when datasets capture a single attribute of identity accurately, accuracy diminishes at the intersection of attributes: research on emergent intersectional biases found that Mexican-American females experienced the worst algorithmic performance. The exclusion of women from the technology pipeline amplifies adverse outcomes: AI assistants built into smartphones are not programmed to help women in a crisis, failing to understand words like ‘rape’ or situations of intimate partner violence, in contrast to cases of, say, cardiac arrest.
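The kind of intersectional audit described above amounts to disaggregating a model's error rate by combinations of identity attributes rather than by single attributes. The sketch below illustrates the idea with invented toy data (the records and groups are hypothetical, not the actual Gender Shades dataset):

```python
# Illustrative sketch: disaggregating a classifier's error rate across
# intersectional subgroups (gender x skin tone), in the spirit of
# intersectional audits such as Gender Shades. All data here is invented.
from collections import defaultdict

# Each record: (gender, skin_tone, was_prediction_correct) — toy data only
results = [
    ("female", "darker", False), ("female", "darker", False),
    ("female", "darker", True),  ("female", "lighter", True),
    ("female", "lighter", False),
    ("male", "darker", True),    ("male", "lighter", True),
    ("male", "lighter", True),
]

totals = defaultdict(int)   # evaluation count per subgroup
errors = defaultdict(int)   # misclassification count per subgroup

for gender, tone, correct in results:
    group = (gender, tone)
    totals[group] += 1
    if not correct:
        errors[group] += 1

# An aggregate accuracy figure would hide the disparity that the
# per-subgroup breakdown makes visible.
for group in sorted(totals):
    rate = errors[group] / totals[group]
    print(group, f"error rate: {rate:.1%}")
```

A single headline accuracy number over this data would look acceptable, while the subgroup breakdown reveals that errors concentrate in one intersectional group, which is precisely the pattern the research cited above documented.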


Risks of misrecognition of systemic harms

Prior choices of human decision-makers get reified in legacy datasets that become the basis of training for AI models, as in the case of Amazon's résumé-screening model, which screened out women's résumés. The system had faithfully learnt from data on past successful candidates, who were predominantly male as a result of hiring practices, historical disadvantages in women's access to education, and the contemporary challenge of penetrating male-dominated fields. Similarly, millimetre-wave body scanners at large international airports cannot recognise trans bodies because they are encoded to recognise either male or female, subjecting trans persons to invasive and humiliating security checks.

Issues around systemic social contexts do not generally figure in conversations about data and algorithmic discrimination. Skewed datasets reveal pre-existing biases in legacy data collected by industries: according to widely cited statistics, women are 47 percent more likely to be seriously injured and 17 percent more likely to die in a comparable car crash, because the seatbelts, airbags, and headrests assessed in crash tests are based on male dummies and their seating positions, with standard measurements that do not account for women's breasts or pregnant bodies. Moreover, clinical trials have largely excluded women, including pregnant women, menopausal women, and women on birth control pills; cardiovascular disease was long considered a man's illness, while depression statistics were predominantly women's. These conditions reflect extant social biases in datasets and highlight systemic conditions beyond the immediate development context of AI. The pervasive nature of these harms tends to lead to their misrecognition, i.e., their acceptance through the institutionalisation and integration of AI systems across consumer-facing business processes and public administration. This perpetuates androcentric norms by either devaluing aspects coded as feminine or excluding them from the AI development exercise altogether.


Operationalising principles and mainstreaming practices

The Global Dialogue on Gender Equality and AI organised by UNESCO highlighted the inadequacy of AI normative instruments or principles that treat gender as a standalone issue. While advocacy groups have been working to raise awareness, third-party intersectional evaluations and audits remain limited to the AI research community and are not adopted for consumer-facing applications. Even though commercial AI companies have been ramping up fairness efforts, these still focus on single identity attributes rather than intersectional ones. Women are missing from consequential decision-making across the AI pipeline, from data preparation to algorithmic design and governance.

While proposed AI bills plan for risk mitigation, a number of critical questions remain unaddressed around definitions of harm, particularly from high-impact AI systems, and around individual and collective rights. There is limited global discussion on gender-sensitive AI policy, regulation, and legislation; developing them would require a detailed analysis of challenges, opportunities, and discriminatory outcomes. The gendered effects produced by algorithms in operation tend to be less visible due to their inherent opacity and lack of transparency. This results in a time lag before direct gender effects surface, if at all, in the form of violations of anti-discrimination laws, such as the exclusion of female candidates from jobs or the statistical association of women with lower creditworthiness.

As the proposed Digital India Act aims to take a risk-based approach to AI regulation, it would be helpful to develop gender-sensitive and responsible AI standards, qualify gender-based risks of harm with clear guardrails for their redressal, promote the representation of women and gender minorities in technology development, and ensure both the inclusion of bias detection and evaluation methodologies and public access to their results for consumer- and public-facing technologies.
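One simple, widely used bias-detection methodology of the kind the article calls for is a disparate-impact check, which compares selection rates between groups (the four-fifths rule of thumb from US employment practice is one common threshold). The sketch below is a minimal illustration with invented numbers, not a prescribed regulatory standard:

```python
# Illustrative sketch of a disparate-impact check on a selection system
# (e.g. résumé screening). Data and the 0.8 threshold are assumptions
# borrowed from the "four-fifths" rule of thumb, for demonstration only.

def selection_rate(outcomes):
    """Fraction of candidates selected (1 = selected, 0 = rejected)."""
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower group selection rate to the higher one."""
    rate_a, rate_b = selection_rate(group_a), selection_rate(group_b)
    low, high = sorted([rate_a, rate_b])
    return low / high if high else 1.0

# Toy screening outcomes for two groups of ten candidates each
women = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # 20% selected
men   = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]   # 50% selected

ratio = disparate_impact_ratio(women, men)
print(f"disparate impact ratio: {ratio:.2f}")  # 0.40, well below 0.8
flagged = ratio < 0.8
```

Publishing the results of checks like this for consumer- and public-facing systems, as the article recommends, would make such disparities visible before, rather than after, they harden into institutional practice.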


Anulekha Nandi is a Fellow at Observer Research Foundation.

The views expressed above belong to the author(s).