Toward a polycentric or distributed approach to artificial intelligence & science
Dr. Stefaan G. Verhulst
Co-Founder and Chief Research and Development Officer of The Governance Lab (The GovLab) at New York University (NYU)
Co-Founder and Principal Scientific Advisor of The Data Tank
Research Professor, NYU Center for Urban Science + Progress
DOI: https://doi.org/10.25453/plabs.25964740.v1
Published on May 22nd, 2024
Even as enthusiasm grows over the potential of artificial intelligence (AI), concerns have arisen in equal measure that the field could come to be dominated by Big Tech. Such an outcome would replicate many of the mistakes of preceding decades, when a handful of companies accumulated unprecedented market power and often acted as de facto regulators in the global digital ecosystem. In response, the European Group of Chief Scientific Advisors [1] has recently proposed establishing a “state-of-the-art facility for academic research,” to be called the European Distributed Institute for AI in Science (EDIRAS). According to the Group, the facility would be modeled on CERN, the high-energy physics laboratory near Geneva, with the goal of creating a “CERN for AI” [2] to counterbalance the growing AI prowess of the US and China.
While the comparison to CERN is flawed in some respects–see below–the overall emphasis on a distributed, decentralized approach to AI is highly commendable. In what follows, I outline three key areas where such an approach can help advance the field. These areas–access to computational resources, access to high-quality data, and access to purposeful modeling–represent three current pain points (“friction”) in the AI ecosystem. Addressing them through a distributed approach can not only help meet these immediate challenges but also, more generally, advance the cause of open science and ensure that AI and data serve the broader public interest.
Distributed Solutions to Three Challenges
The European Organization for Nuclear Research (now widely known as CERN, an acronym derived from its original French name, Conseil Européen pour la Recherche Nucléaire) has in many respects been a notable success. Yet the center, which was established in 1954, is in some ways an institution from another era. In particular, as I have written elsewhere in the context of global digital governance, [3] a single actor is insufficient to manage 21st-century complexity and to scale scientific progress in an inclusive and open manner.
Instead, borrowing from Elinor Ostrom, I argue that the challenges of our era require “polycentric” [4] governance, in which multiple, overlapping, and autonomous centers collaboratively manage and provide access to shared resources; we cannot rely on any single institution or center, no matter how well-funded or otherwise impressive. Hence the potential of a globally and sectorally distributed model that would facilitate coordination, access, and collaboration across all geographies and stakeholders.
As noted, a distributed model may be helpful in addressing three particular challenges or pain points in the current AI research and development ecosystem:
Access to Computational Resources: AI is a resource-intensive field, a reality that poses hurdles to the broader goal of inclusive innovation. The growing need for data storage compounds the problem, bringing ever-increasing demands for storage space and significant costs, especially for researchers in the Global South. Wider availability of computational resources–including data, algorithms, cloud infrastructure, and processing power–is therefore one of the chief potential benefits of a more distributed approach. Distributed computational resources, modeled for instance on CERN's LHC Computing Grid, [5] can help democratize AI development, in the process promoting greater competition and innovation.
One approach that could be helpful in this area is to set up decentralized infrastructure that allows researchers to access the computational resources they need irrespective of their geographic location or local technical capacity. In addition, an independent assessment of the emerging field of open-source federated learning frameworks [6] (such as Flower, Substra, FATE (Federated AI Technology Enabler), PySyft, OpenFL, and TensorFlow Federated) would help researchers determine which might be most fit for purpose.
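To make the federated approach concrete, the sketch below illustrates federated averaging, the basic pattern the frameworks listed above build upon: each participating site trains on data that never leaves its premises, and only model parameters are pooled centrally. This is a minimal, illustrative sketch in plain Python and NumPy, using synthetic data and hypothetical function names; it does not reproduce the API of Flower, FATE, PySyft, or any other specific framework.

```python
# Minimal sketch of federated averaging (FedAvg). Each "site" keeps its data
# local, performs a few gradient steps on a shared linear model, and sends
# back only its updated weights; the server averages them, weighted by data
# size. Illustrative only -- not the API of any specific framework.
import numpy as np

rng = np.random.default_rng(0)

def make_local_dataset(n, d=5):
    """Synthetic data standing in for one site's private records."""
    X = rng.normal(size=(n, d))
    true_w = np.arange(1, d + 1, dtype=float)   # ground-truth weights
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

def local_update(w, X, y, lr=0.05, epochs=5):
    """One site's local training: gradient descent on its own data only."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_average(updates, sizes):
    """Server-side aggregation: average updates, weighted by dataset size."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

# Three participating sites, e.g. research groups in different regions,
# holding unequal amounts of local data.
sites = [make_local_dataset(n) for n in (150, 300, 80)]
global_w = np.zeros(5)

for _ in range(20):                              # communication rounds
    updates = [local_update(global_w, X, y) for X, y in sites]
    global_w = federated_average(updates, [len(y) for _, y in sites])

print("learned weights:", np.round(global_w, 2))  # approaches [1 2 3 4 5]
```

In practice, the aggregation step would also need to handle secure communication, stragglers, and privacy safeguards such as secure aggregation or differential privacy; these are among the concerns the frameworks mentioned above aim to address.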
Data Access and Collaboration: Data is the lifeblood of AI, and access to data is a vital underpinning of any healthy AI ecosystem. Yet even as we live through an era of unprecedented data abundance, the emergence of new silos and chokepoints threatens to limit and effectively privatize access. This is especially challenging for researchers from low-resource regions, who often face significant barriers to accessing and utilizing high-quality data. Once again, a distributed approach can prove helpful, allowing for the sharing and reuse of vast data sets (e.g., through initiatives like the European Open Science Cloud) [7].
Data collaboratives [8] are an essential structure that can help achieve the goal of decentralized data access. A relatively new form of private-public collaboration, data collaboratives break down silos and allow researchers and other stakeholders, with the support of data stewards, [9] to share and reuse private data for the public good. Recent European legislation, such as the Data Governance Act [10] and the Data Act, [11] seeks to provide the policy foundation to accelerate these collaboratives.
In order to facilitate such sharing, we also need to broaden existing notions of consent to encompass the concept of a social license [12]. Social licenses go beyond the current–and limited–focus on individual consent to enable sharing by considering the preferences and expectations of communities. Building notions of social license into the data ecosystem can help ensure that a distributed approach to AI is attuned to the diverse needs and societal values of various populations, thereby promoting responsible and inclusive AI development.
Access to Purposeful AI Modeling: AI is not just a function of access to data and computational resources. A third, equally important component of distributed AI is access to purposeful modeling–the platforms and algorithms that provide an interface or “middle layer” between the underlying data and end users. Access to these models, which are technically sophisticated and resource-intensive, is crucial to ensuring that the benefits of AI are equitably distributed and that AI responds to genuine social and public needs.
Purposeful, in this context, refers to the relevance and genuine applicability of AI and AI models. In an era of constrained public budgets and apparently unrestrained public problems, prioritization of resources–financial and technical–is essential. Distributed access to AI modeling should be focused on those areas (e.g., health, climate, etc.) that most directly impact the public good and that are often underserved by the private sector. Identifying such areas is a key challenge. One successful model can be found in The GovLab’s 100 Questions Initiative, [13] which sought to enlist domain and data specialists to identify the most pressing public challenges that would be amenable to data-driven solutions.
Toward Distributed Open Science
The approach outlined here can help harness the potential of AI to address a range of wicked public problems. Equally, it’s worth pointing out that a distributed, decentralized AI institute of the type currently under discussion can help advance the cause of open science.
Open science is, in essence, aimed at establishing a more inclusive, innovative, and transparent research environment. A distributed approach of the kind outlined here would open up foundational AI models, [14] ensure greater transparency in methodologies and data, foster global collaboration, and set new standards for scientific inquiry, both in relation to AI and more generally (e.g., in drug development, combating global warming, and more). Advancing open science through distributed AI is therefore not only an ethical imperative in its own right; it could have enormous ripple effects for humanity, helping to solve a wide range of our most pressing–and apparently intractable–public problems.
[1] https://op.europa.eu/en/publication-detail/-/publication/2a6e3d4f-fae0-11ee-a251-01aa75ed71a1/language-en
[2] https://sciencebusiness.net/news/ai/eu-science-advisers-back-call-cern-ai-aid-research
[3] https://www.gp-digital.org/publication/a-distributed-model-of-internet-governance/
[4] https://www.routledge.com/Global-Digital-Data-Governance-Polycentric-Perspectives/Aguerre-Campbell-Verduyn-Scholte/p/book/9781032483108
[5] https://home.cern/science/computing/grid
[6] https://www.apheris.com/resources/blog/top-7-open-source-frameworks-for-federated-learning
[7] https://eosc-portal.eu/
[8] https://datacollaboratives.org/
[9] https://www.linkedin.com/learning/global-data-stewardship/the-growing-role-of-data-stewardship
[10] https://digital-strategy.ec.europa.eu/en/policies/data-governance-act
[11] https://digital-strategy.ec.europa.eu/en/policies/data-act
[12] https://ssir.org/articles/entry/the_urgent_need_to_reimagine_data_consent#
[13] https://the100questions.org/
[14] https://www.technologyreview.com/2024/03/25/1090111/tech-industry-open-source-ai-definition-problem/
Copyright: © 2024 Verhulst. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in Frontiers Policy Labs is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.