In 2024, OpenAI saw high-profile exits by co-founder John Schulman and researcher Jan Leike, both of whom joined its competitor Anthropic to focus on AI alignment research. AI alignment has been gaining traction as a branch of AI research amid growing concerns around AI safety, which is proving difficult to tame through legal and governance mechanisms alone. While definitions and approaches differ, AI alignment essentially refers to encoding human values into AI systems to mitigate the risk of harm arising from them, including attention to the broader social and contextual nuances within which these systems operate.
The need for AI alignment is reinforced by cases of misalignment that continue to plague implementation across sectors: healthcare (AI suggesting unsafe and incorrect cancer treatments), AI-driven content moderation on social media (balancing freedom of expression against harmful content), financial markets (rapid algorithmic trading contributing to the 2010 ‘Flash Crash’ in US stock markets), and criminal justice (algorithms assigning skewed recidivism risk scores based on race), among others.
AI alignment aims for scalable oversight to proactively monitor and address misalignment; generalisation to ensure unbiased, context-appropriate responses; robustness to prevent erratic behaviour in unanticipated situations; and interpretability to enable humans to trace the sequence of a system’s decision-making. It also seeks controllability, ensuring human control over AI systems’ development and course correction, and governance mechanisms that set standards and guidelines for AI development and use within principled and ethical boundaries.
Despite the seeming simplicity of these operating principles, encoding the variety and variability of context, preferences, languages, and sensitivities is notoriously difficult. The inscrutable, opaque nature of AI algorithms further complicates alignment, and there is always a trade-off between transparency and interpretability on the one hand and the computational economy and performance of models on the other. Moreover, computational and mathematical evaluation techniques for rigour and relevance cannot always fully encapsulate the diverse and dynamic nature of social reality.
As AI capabilities continue to advance, there remain dangers of models developing overconfidence, hallucinations, or sycophancy, where the AI system overly agrees with the user irrespective of factual accuracy. This highlights the need for continuous oversight. Even with methods like Reinforcement Learning from Human Feedback (RLHF), models remain susceptible to bias in the feedback itself, which heightens their tendency to seek approval from human annotators.
Responsible innovation as a techno-institutional approach
Research in the management of information systems has long viewed information technology alignment as a dynamic, continuous process of adjustment between technical systems and their contexts, seeking a fit between technological capabilities on the one hand and organisational strategy and operations, existing practices, and the wider social context on the other; in essence, a technological-contextual fit. The responsible innovation paradigm has its roots in discussions around the social responsibility of science: it speaks of collective responsibility for futures created, transformed, or disrupted by scientific and technological innovation. However, acting in the present on behalf of the future involves trade-offs and tensions in balancing precaution, scientific autonomy, and the risk of missed opportunities. Instituting responsible innovation therefore requires deeper systemic transformation across policies, processes, and institutions, alongside technological guardrails.
Research highlights that responsibility is straightforward when there is low uncertainty between action and impact; algorithms discriminating against women candidates in hiring, for instance, present a clearly identifiable case of bias to be mitigated. Conversely, responsibility becomes complex under high uncertainty between action and impact, such as AI-mediated determination of the boundary between freedom of expression and harmful content. This is further compounded when issues of responsibility challenge or run counter to what counts as professional excellence in the field, indicating the need for responsible alignment initiatives to be congruent with operational and feasibility imperatives. Moreover, since AI development increasingly occurs within ecosystems involving multiple stakeholders, understanding the locus of responsibility and control becomes essential.
Responsible AI ecosystems
AI alignment as responsible innovation is proactive rather than reactive: it extends beyond accountability for potentially unwanted outcomes to encompass anticipation, responsiveness, and iterative engagement with design and product, ensuring constant reflection, inclusion, and alignment with expected values. Iterative value alignment becomes imperative because AI systems are developed through interdependencies among a wide range of resources and stakeholders with distributed ownership and control. Understanding how these stakeholders are arranged in an ecosystem helps ascribe responsibilities according to the segment of the value chain they contribute to. For example, large foundational models are increasingly becoming the basis for AI application development across sectors, where biases in their composition can creep into the applications built on top of them; both developers and deployers of such technology have important roles to play in risk mitigation. Ascribing responsibility based on stakeholder position within the ecosystem helps establish not only external congruency on norms and values but also internal fit with organisational capabilities, i.e. whether responsible AI initiatives are in line with operational imperatives so that they can be sustained. Tiered responsibility structures can then build upwards into higher-order obligations depending on ownership and control of the product. This highlights the need for trilateral management of AI alignment as responsible innovation: technical management, an ecosystem approach, and institutional capability.
Anulekha Nandi is a Fellow at the Observer Research Foundation.
The views expressed above belong to the author(s).