Does the hype of Generative AI need top-down regulation, or will it implode?
Barend Mons
Professor of BioSemantics, Human Genetics Department
Leiden University Medical Center
Introduction
The large language model (LLM) tools we see today essentially perform a form of advanced autocompletion based on massive input, which is itself potentially of questionable validity. The infamous ‘hallucinations’ these tools produce are at least in part a result of poor inputs, as well as of a lack of validated conceptual models to constrain the LLM’s algorithms and output. Attempts to regulate these tools, and the concomitant hype, may only play into the commercial interests of their creators.
The ‘blind’ use of computational models to analyze anything (data or information), without the proper underpinning of conceptual modelling (data and algorithms), is dangerous and leads to all kinds of meaningless extrapolations, including the famous ‘hallucinations’ of LLM outputs.
Regulation or leading by example?
Two basic good practices in data stewardship and analysis could be more effective than regulation and could mitigate the well-known downsides of so-called ‘AI’ models for human knowledge acquisition and validation. First, it has been well established that unless machine reasoning models are fed and constrained by foundational ontologies and conceptual models, their output will contain many violations of the conceptual models in the minds of recipients (humans, for the near future). The underlying information science of this phenomenon has been thoroughly studied and described in Guizzardi et al. (2023).[1]
Second, the substrate for reasoning, both for humans and machines, is data and information of various kinds and origins. The term ‘data’ covers a variety of concepts. To begin with, there is raw data coming from sensors. These data are rarely fed raw into analytical models; they are processed, at least minimally, to specify their intended ‘meaning.’ So, models ingest processed data, based on a minimal conceptual model regarding the meaning of the individual data points, their values, and their internal relations. Some people refer to this process as ‘curation.’
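As a purely illustrative sketch (not drawn from the cited work, and with hypothetical field names), such minimal curation can be thought of as wrapping each raw value in just enough conceptual context to make its meaning explicit to a downstream model:

```python
# Hypothetical sketch of minimal 'curation': a raw sensor value is wrapped
# with the metadata a downstream model needs to interpret it.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CuratedDatum:
    value: float           # the raw measurement itself
    quantity: str          # what the value means, ideally an ontology term IRI
    unit: str              # unit of measurement, e.g. a UCUM code
    source: str            # which sensor or instrument produced it
    observed_at: datetime  # when the observation was made

datum = CuratedDatum(
    value=37.2,
    quantity="https://example.org/terms/body-temperature",  # hypothetical term
    unit="Cel",
    source="ward-3-thermometer-07",
    observed_at=datetime.now(timezone.utc),
)
print(datum)
```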
We should also distinguish between three categories of such pre-processed input. The first two are experimental data, created in the context of a directed experiment, and real-world observations, collected through routine observation (of a hospital patient, for example), without any particular discovery in mind, but rather to gain insight into the immediate situation.
Both processed experimental data (ED) and real-world observations (RWO) contribute to the growth of a third category, established knowledge (EK). This latter term is chosen for lack of a better one, as it suggests ‘truth’ beyond the mere fact that these insights (or associations) have been published in typical EK communication channels for most people’s benefit (such as narrative, figures, tables, videos, or podcasts). It should be intuitive that if we feed any model with a mix of these categories, machines will do what they are good at: namely, recognizing myriad patterns and correlations without any conceptual model ‘in mind.’
A first corrective bias could thus be to feed models only with (peer-reviewed) EK. However, as we are increasingly learning, even the formal scientific literature, including published clinical trials, can be plagued by inconsistencies, errors and, in some cases, fraud.[2, 3] Unless we handle the system much better than we do now, machine-generated output will greatly aggravate this problem and will ultimately, when ‘fed with its own dog poop’, lead to model collapse.[4]
Figure 1 summarizes the problem of the current ‘indiscriminate’ input mix of ED, RWO and EK. This problem applies to small language models (SLMs) and LLMs alike, and essentially to any unsupervised model. Without a proper distinction between the three categories of input, machines are fed as if everything were EK (panel A). Without a conceptual model constraining the analysis, we face the likelihood of many hallucinations, here defined as patterns that make perfect sense to a machine operating without any constraints based on a conceptual model, but that do not contain any actionable knowledge for people.[5]
In these cases, any output will require extensive human post hoc evaluation of the results and of their provenance in the analytical procedure. If we accept that a substantial part of EK is highly ambiguous (certainly for machines), and in some cases plain wrong, we also know that any model based purely on EK will propagate and potentially amplify those ambiguities and errors.
I would argue this is why, above all, input should be Fully AI Ready (an alternative wordplay on the original FAIR acronym, which stands for Findable, Accessible, Interoperable and Reusable, for machines as well as people).[6]
The first step is to realize that experimental data (ED) and real-world observations (RWO) can augment, and in some cases correct, what we call established knowledge (EK). Even ED (analyzed through the particular conceptual model underlying the experiment and its methods) and RWO (filtered through the conceptual framework of the observer) are not free of bias and error, but they are to be kept separate from each other, and certainly from EK, for several reasons (see figure 1, panel B). Above all, if they are mixed (see figure 1, panel A), machines will generate an unmanageable number of hypotheses.
Mutatis mutandis, when massive computation is used by machines to reveal ever more associations (in essence by simply spitting out recognizable patterns), we will end up with an unimaginable number of hypotheses that cannot be properly validated as established knowledge. That obviously does not make them untrue per se, but their networks of associations are far too complex for the brain of H. sapiens to comprehend or meaningfully share, and they will never lead to actionable knowledge for humans. If we leave machines to play freely with the 99% of potential knowledge we do not understand, the results may look exciting and even amazing at first sight, and indeed there may be jewel-needles hidden in the immense haystack of associations. But what we are ultimately looking for in science and innovation is actionable and verified knowledge.
It follows that the substrate we feed any model is at least as important for the ultimate output of a knowledge modelling pipeline as the model(s) used. It also implies that we should treat, and publish, ED, RWO and EK separately, each with their own inconsistency problems, and never mix them in the analytical process. We should use these distinct categories of information as mutually correcting and enhancing, and do so in a controlled manner. This approach is depicted in panel C (figure 1).
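As a minimal, hypothetical sketch of what ‘never mixing’ the categories could mean in practice (the names, identifiers and structure below are illustrative, not taken from any cited specification), every unit of input can carry an explicit category and provenance tag, and the analysis can then consume each category separately:

```python
# Hypothetical sketch: keep ED, RWO and EK as explicitly tagged, separate
# inputs so they can corroborate or correct one another in a controlled way,
# instead of being pooled as if everything were established knowledge.
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    ED = "experimental data"
    RWO = "real-world observation"
    EK = "established knowledge"

@dataclass(frozen=True)
class InputUnit:
    content: str         # the assertion or record itself
    category: Category   # ED, RWO or EK; never mixed implicitly
    provenance: str      # where it came from (experiment id, ward, DOI, ...)

corpus = [
    InputUnit("gene X upregulated under condition Y", Category.ED, "assay-2024-017"),
    InputUnit("patient 123: temperature 37.2 C", Category.RWO, "ward-3-monitor"),
    InputUnit("gene X is associated with disease Z", Category.EK, "doi:10.1234/example"),
]

# Feed each category to the analysis separately (cf. figure 1, panel C).
by_category = {c: [u for u in corpus if u.category is c] for c in Category}
for category, units in by_category.items():
    print(category.name, [u.content for u in units])
```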
We should publish ED, RWO and EK wherever possible as machine-actionable units of information (also known as FAIR digital objects, or FDOs).[7] For the atomic unit of meaningful information with rich provenance, formatted in machine-readable elements, the term ‘nanopublication’ was coined in 2009. The accumulation of ‘cardinal assertions’ (multiple assertions with identical content but different provenance), and ultimately of ‘knowlets’ as FAIR digital twins, is described in detail in a recent Frontiers-published article.[8]
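As an illustrative sketch only (the example.org identifiers and the claim below are hypothetical placeholders, not taken from the cited articles), a nanopublication bundles one atomic assertion with its provenance and publication information as separate named graphs; a general-purpose RDF library such as rdflib can express this:

```python
# Illustrative sketch of a nanopublication-style record: one atomic assertion
# plus its provenance and publication info, kept in separate named graphs.
# All example.org URIs are hypothetical placeholders.
from rdflib import Dataset, Namespace, RDF

NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("https://example.org/np1#")

ds = Dataset()
head = ds.graph(EX.Head)
assertion = ds.graph(EX.assertion)
provenance = ds.graph(EX.provenance)
pubinfo = ds.graph(EX.pubinfo)

# Head graph: ties the three parts together as one nanopublication.
head.add((EX.np1, RDF.type, NP.Nanopublication))
head.add((EX.np1, NP.hasAssertion, EX.assertion))
head.add((EX.np1, NP.hasProvenance, EX.provenance))
head.add((EX.np1, NP.hasPublicationInfo, EX.pubinfo))

# Assertion graph: the atomic, machine-readable claim.
assertion.add((EX.geneX, EX.isAssociatedWith, EX.diseaseY))

# Provenance graph: how the assertion was derived (e.g. from which study).
provenance.add((EX.assertion, PROV.wasDerivedFrom, EX.clinicalStudy42))

# Publication-info graph: who published this nanopublication.
pubinfo.add((EX.np1, PROV.wasAttributedTo, EX.someResearcher))

print(ds.serialize(format="trig"))
```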
Conclusion
Misinformation is likely to plague us for decades to come, so we need to use machines and human-designed models in a responsible way. We should not throw unfinished products into the wild and abuse humans as experimental subjects to improve the output of tools that, in turn, compound the misinformation pandemic.
However, seeking a top-down approach to curbing such outcomes would drive machine learning into areas where regulations cannot reach. Trustworthy input and the consistent exposure and critique of hype are vital – as are the principles of leading by example and feeding models with proper substrates and conceptual constraints.
[1] https://arxiv.org/abs/2304.11124
[2] https://www.theguardian.com/science/2017/jun/05/dozens-of-recent-clinical-trials-contain-wrong-or-falsified-data-claims-study
[3] Cyranoski, D. Retraction record rocks community. Nature 489, 346–347 (2012). https://doi.org/10.1038/489346a
[4] https://arxiv.org/abs/2305.17493
[5] Nota bene: this does not mean they are necessarily ‘wrong’, now or in the future.
[6] https://www.nature.com/articles/sdata201618 and for updates: https://www.gofair.foundation/interpretation
[7] https://research.utwente.nl/files/300207928/2302.11894.pdf