Attribution: Gaurav Sharma, “AI Systems as Digital Public Goods: Exploring the Potential of Open-Source AI,” ORF Issue Brief No. 612, February 2023, Observer Research Foundation.
Introduction
Addressing current global challenges requires innovative solutions, and Artificial Intelligence (AI)-enabled systems could be among them. The design and development of such innovations, however, is occurring mostly in developed economies, as countries of the Global South grapple with questions such as assigning accountability and the protection of privacy.
This brief makes a case for assigning AI systems the characteristics of a Digital Public Good (DPG), while underlining the impediments to such a strategy. In the financial sector, AI tools are already being used for repetitive task automation, fraud detection, and faster processing and response times for various transactions. The ease of adoption of AI in the financial sector can be attributed to the strict rules and regulations on customer data protection. In India, for example, the Reserve Bank of India (RBI) issues directives that require all banks and payment systems to safeguard customer information. Furthermore, the Public Financial Institutions (Obligation as to Fidelity and Secrecy) Act, 1983[1] prohibits financial institutions from divulging any information relating to the affairs of their clients.
This is not yet the case in social sectors such as healthcare and education. This brief explores the potential of open-source AI to make AI-enabled systems functional in the various social sectors and scalable to larger populations. The merit of open-source, high-value datasets has been showcased in projects such as the High Energy Physics (HEP) experiments at the Large Hadron Collider at CERN, Geneva, where data are open to the public.[2] Does open source provide a similar opportunity to build collaborative advancements in AI for public good? Is the creation of open-source AI initiatives in the social sector feasible, say in healthcare or agriculture?
In the study of economics, public goods are defined as “non-excludable and non-rivalrous”.[3] This means that public goods adhere to two broad principles: (i) people cannot be “excluded from consuming” public goods; and (ii) “one person’s consumption does not reduce the amount available to other consumers.” A Digital Public Good (DPG), meanwhile, is defined as “open-source software, open data, open Artificial Intelligence models, open standards and open content that adhere to privacy and other applicable international and domestic laws, standards and best practices and do no harm and help attain the sustainable development goals (SDGs).”[4] Therefore, to integrate the definition of ‘public good’ with the ‘digital’ domain is to suggest the free and open availability of digital products and services, and their free distribution, use, and reuse by all.
The UN high-level panel on digital cooperation sought to simplify the definition of DPGs by providing a set of indicators in the form of open standards.[5] These standards state that DPGs are platform-independent; use approved open licenses; produce detailed technical and software documentation such as source code, use cases, and functional requirements; have clear ownership and defined mechanisms for extracting data; and adhere to guidelines on the protection of data privacy and security.[6]
Pushing Artificial Intelligence (AI) systems into the domain of DPGs is a difficult task. To begin with, most AI systems are governed by rules on intellectual property (IP), and the free and open use of algorithms and datasets is minimal. As the ‘public’ character of digital public goods lies in free use, distribution, and adoption, there is a lacuna in legislation and regulation; this is true in many parts of the globe, and more so in developing countries. Whether an AI system qualifies as a DPG also depends on its use case in a particular sector. For example, an AI system in healthcare, to be designated a DPG, must be able to provide healthcare service accessibility to all under strict data-protection guidelines. AI systems carry an additional requirement: control over the original code of the digital good, to avoid alteration and misuse. This is also where AI systems suffer, as algorithmic logic and learnings are mostly owned by private enterprises, and AI systems evolve as their datasets grow. Thus, AI systems are bound by IP rights and are also in a perpetual state of change.
AI for Social Good
The principle of ‘AI for social good’ is interpreted to mean the use of AI technology for applications that redound to the welfare of communities. It encompasses AI applications, design and assessment frameworks and policy initiatives that are focused on benefitting not individuals, but societies as a whole. ‘AI for social good’ also refers to a set of principles that can inspire the design and assessment of AI systems, and provide a means to advance development of AI policies that prioritise action plans for the adoption of AI in the public interest. Table 1 lists some initiatives that aim to promote ‘AI for social good’.
Table 1. AI With a Social Impact: Examples

| Organisation | Area | Social Purpose | Impact Sectors |
| --- | --- | --- | --- |
| Climate Change AI[7] | Climate change | Use the power of AI and machine learning to help reduce greenhouse gas (GHG) emissions | Sectors such as energy and urban infrastructure development, scalable to other sectors |
| Organisation for Economic Co-operation and Development (OECD)[8] | AI policy guidelines | Recommendation of the Council on Artificial Intelligence | All sectors |
| Bill and Melinda Gates Foundation (BMGF) and the German Development Cooperation (GIZ)[9] | Vernacular languages: democratising voice technology | Preserve local languages and make AI services available in local languages | Multiple sectors: healthcare, education, agriculture, financial inclusion |
| Lacuna Fund,[10] with multiple international partners: The Rockefeller Foundation, Google.org, International Development Research Centre (IDRC), and FAIR Forward: AI for All, an initiative of the German Development Cooperation | Open datasets: funding the creation of open datasets for social impact | The world’s first collaborative effort to provide data scientists, researchers, and social entrepreneurs in low- and middle-income contexts with the resources they need to produce labelled datasets that address urgent problems in their communities | Currently language, agriculture, and health; scalable to other sectors |
| UNESCO: Ethics of AI[11] | Policy recommendations on the ethics of AI | The first global standard-setting instrument on the ethical use of AI | All sectors |
Healthcare is one important area where AI-enabled systems hold promise. For an AI system to be used in healthcare, the first imperative is to recognise that people are on both the supply side and the demand side of the healthcare sector. People generate the datasets (e.g., through X-rays, ECG reports, and retina scans), or the data is generated around their environment; and people are also at the receiving end of the output generated by AI systems, for example, predictive analysis by an AI system, based on X-ray data, of whether a patient has tuberculosis (TB). Simply put, an AI system for social good would treat ‘data creation’ as a human-centric process, as most data is created by people or is implicitly about people; or is created by people for other people; or is a measurement of the environment that people live in (e.g., tracing a virus outbreak).[12]
At present, most large AI models scrape datasets from the internet for voice, text, images, videos, and other data. This is of little use for social-impact sectors such as healthcare, because targeted solutions such as TB detection require X-ray datasets, and assembling these is a meticulous exercise demanding the organised collection of data from people who donate their health data based on trust, with appropriate data-protection mechanisms in place. A country that has experience in this regard is the United States, with MIMIC-III,[13] a critical care database of over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC) in Boston. Access to this critical information is via cloud platforms, but requires becoming a credentialed user of PhysioNet and adherence to a strict data-use agreement. Such detailed, secure, and massive efforts of data standardisation, coupled with strict personal data-protection schemes, are often absent in countries of the Global South.
The application of AI systems for social good in sectors that collect datasets from human subjects—such as healthcare, education, nutrition, and poverty alleviation—is labour-intensive. The task of comprehensive data collection, tailored to solve a particular problem, has yet to gain adequate attention in countries of the Global South.
The Missing Global South Representation
A number of large AI models exist at present, with the most prominent ones developed in the ‘Silicon Valley’ of the United States. These include: GPT-3 and ChatGPT (natural language models) by OpenAI in San Francisco;[14] BERT (a natural language model) and ViT-G/14[15] (a computer vision model) by Google, based in California;[16] and Galactica by Meta (Facebook), also in California. These AI models are focused on the supply side of data generation, feeding on enormous amounts of data crawled from the internet for training and fine-tuning. As data generation is a generic activity of online users, and data such as text, images, audio, and video are widely generated and shared openly on the internet, it is easy for large AI systems to crunch these digital datasets and train big AI models.
The problem, however, is that these current, large AI model datasets represent mostly the populations of the Global North. They exclude the three billion people who do not have access to the internet, 96 percent of whom reside in developing countries.[17] There is, therefore, a yawning gap in who is represented in large AI models. This is one reason why very large AI models are difficult to classify as DPGs: there is simply no equity in data representation.
Moreover, there is little sharing of knowledge regarding the deployment and use of AI systems in socio-economic sectors. This can be attributed to the fact that most AI use cases are narrow in application and designed to solve a specific problem, and are therefore difficult to replicate in other contexts. For example, Project ‘Sunroof’,[18] which helps estimate rooftop solar potential in different parts of the United States, may work well there but may not be easily deployable in, for example, unorganised settlements in cities like Delhi. The question remains, therefore, whether these successful Global North AI systems are scalable and replicable.
Furthermore, current knowledge and technical research on most AI systems studies contexts in developed countries. Such scholarly research, demonstrations, and conferences do not account for the impact of AI-enabled systems on the populations of developing economies.
AI Systems as Digital Public Goods: Obstacles
As discussed briefly earlier, AI systems are being used extensively in banking, financial services, and insurance in countries like the United Kingdom and Switzerland, where regulatory and legislative frameworks are in place. The European Union, with its General Data Protection Regulation in place, is also advancing the use of AI in the financial sector. Thus, AI has already, to a certain extent, earned credit as a ‘digital public good’ in the banking and financial sectors. Its applicability in social sectors such as healthcare, however, remains largely unexplored.
Much of the primary research on AI systems and their development and deployment takes place in countries of the Global North, via either public-sector institutions or the industrial laboratories of Big Tech companies such as Google, Meta, Apple, IBM, Huawei, and Microsoft. These institutions are generally governed by proprietary practices and work with closed datasets and proprietary AI models; there is very limited open sharing, if any, of these AI models or datasets. Most AI systems, therefore, do not qualify as a ‘digital public good’. The following points outline the most crucial challenges that inhibit the infusion of DPG characteristics into AI systems.
Recommendations
This brief makes the following recommendations to push AI systems into the realm of Digital Public Goods.
Conclusion
Advancements made by AI systems, especially with the explosion of large AI models, are bound to have an ever more pervasive effect on significant populations across the globe. Infusing notions of DPG into AI systems demands greater assurances, standards of care, and governance structures, beginning from the conception stage of designing an AI system for the social sectors. Open AI datasets and open AI models, and the open sharing and cross-transfer of knowledge, rooted in the responsible integration of AI systems into the public sector, can pave the way for AI systems to align with the notion of a digital public good. Open-source AI is one alternative approach that can be explored, coupled with governmental accountability, as open-source AI could provide a greater level of transparency and make it easier for AI systems to be accepted as DPGs by end users.
Furthermore, ethical and responsible AI practices must be put in place that are inclusive and incorporate the aspirations of developing countries. A thoughtful process that embeds societal beliefs and undertakes risk assessment of AI systems is imperative in all discussions. The public-goods foundation of ‘non-excludability and non-rivalry’ and support for an open-source AI definition can serve as the pillars for discussions on AI systems as digital public goods. An approach based on use cases, displaying examples from developing countries, could further strengthen the case for open-source AI. The AI community needs to invest in open-source AI from a public-use perspective.
Gaurav Sharma is Artificial Intelligence Fellow at the Academy of International Affairs.
Endnotes
[1] The Public Financial Institutions (Obligation as to Fidelity and Secrecy) Act, 1983; December 1983.
[2] The European Organization for Nuclear Research (CERN) provides open datasets from particle physics.
[3] Stanford Encyclopedia of Philosophy, “Public Goods”.
[4] United Nations, Report of the Secretary-General, “Roadmap for Digital Cooperation,” June 2020.
[5] Digital Public Goods Alliance, “Digital Public Goods Standard”.
[6] Digital Public Goods Alliance, “Digital Public Goods Standard”.
[7] Climate Change AI is a global non-profit that catalyses impactful work at the intersection of climate change and machine learning.
[8] The Organisation for Economic Co-operation and Development (OECD) – Legal Instruments, “Recommendation of the Council on Artificial Intelligence,” 2019.
[9] Mozilla Foundation blog, “Mozilla Common Voice Receives $3.4 Million Investment to Democratize and Diversify Voice Tech in East Africa,” May 24, 2021 (accessed December 28, 2022).
[10] Lacuna Fund is a global collaborative effort to fund labelled data for social impact in various domains – Language, Agriculture, Healthcare. See: https://lacunafund.org/about/
[11] Ethics of Artificial Intelligence, UNESCO.
[12] Rishi Bommasani, et al., “On the Opportunities and Risks of Foundation Models,” Stanford University Human-Centered Artificial Intelligence and Centre for Research on Foundation Models, 2022, https://arxiv.org/abs/2108.07258
[13] Alistair Johnson, Lucas Bulgarelli, et al., “MIMIC-IV,” PhysioNet – The Research Resource for Complex Physiologic Signals (2022), https://physionet.org/content/mimiciv/2.1/
[14] GPT-3 is a set of large AI models that can understand and generate natural language. See: https://beta.openai.com/docs/models/overview
[15] Xiaohua Zhai, et al., Google Research, Brain Team, Zürich, Scaling Vision Transformers, 2022, https://arxiv.org/pdf/2106.04560v2.pdf
[16] “BERT 101: State Of The Art NLP Model Explained,” Huggingface blog, comment posted March 2, 2022, https://huggingface.co/blog/bert-101 (accessed November 25, 2022)
[17] International Telecommunication Union, “Facts and Figures 2021: 2.9 Billion People Still Offline”.
[18] Project Sunroof, https://sunroof.withgoogle.com/
[19] GPT-3 is a third-generation, large-scale language model that can understand and generate natural, human-like text output. Not only can it produce text, but it can also generate code, stories, poems, and more. GPT-3 is trained on nearly 45 terabytes (TB) of text data, which is still primarily English (93 percent by word count). See: https://arxiv.org/pdf/2005.14165.pdf, p. 14
[20] Chinasa T. Okolo, Nicola Dell, and Aditya Vashistha, “Making AI Explainable in the Global South: A Systematic Review,” Paper presented at the Conference on Computing and Sustainable Societies, June 29 – July 01, 2022.
[21] Greg Corrado, “Partnering with iCAD to Improve Breast Cancer Screening,” Google Blog, November 28, 2022.
[22] Daniel Kpienbaareh, et al., “Crop Type and Land Cover Mapping in Northern Malawi using the Integration of Sentinel-1, Sentinel-2, and PlanetScope Satellite Data,” Special Issue, Environmental Mapping Using Remote Sensing, 13 (4), 700, (2021).
[23] Open Source Initiative, “Open-source software is software that can be freely accessed, used, changed, and shared (in modified or unmodified form) by anyone”.
[24] Chiradeep Basu Mallick, “Top 10 Open Source AI Software in 2021,” Spiceworks.
[25] Mallick, “Top 10 Open Source AI Software in 2021”.
[26] Global Forest Watch offers the latest data, technology, and tools that empower people everywhere to better protect forests.
[27] Open-Source Initiative, “Frequently Answered Questions”.
[28] Alexandra Theben, et. al., “Challenges and Limits of an Open Source Approach to Artificial Intelligence,” Artificial Intelligence in a Digital Age, May 2021.
[29] The Global Partnership on Artificial Intelligence (GPAI) is a multi-stakeholder initiative which aims to bridge the gap between theory and practice on AI by supporting cutting-edge research and applied activities on AI-related priorities.
[30] Mark D. Wilkinson, et al., “The FAIR Guiding Principles for Scientific Data Management and Stewardship,” National Library of Medicine: National Centre for Biotechnology Information USA, Sci Data, 3:160018, (2016).
The views expressed above belong to the author(s).