Friday, May 5, 2023

GIGO and evil computers

The rise and rise of Generative Artificial Intelligence (AI) has been the main technology talking point of the last 6 months. Text services like ChatGPT and Google Bard are now able to engage in dialogue that may even pass the Turing Test. And image generation services such as DALL-E and Stable Diffusion can create strikingly realistic pictures from nothing more than a text description.

Performing these feats means training large language models (LLMs) on vast amounts of data, a process so computationally expensive that it is prohibitive for many organizations.

For example, OpenAI's GPT-3 LLM has 175 billion parameters and was trained on a dataset of hundreds of billions of words, collected from a variety of sources including books, articles, websites, and code repositories. Google's PaLM LLM is larger still, at 540 billion parameters.

But as the initial flurry of excitement about Generative AI dies down a bit, focus is turning to the sources of the data used. The worry is that unless that data is reputable and trustworthy, these systems will have the wrong inputs on which to base their machine learning.

Perhaps more than ever, the old computing phrase “Garbage in, garbage out” is worth more than a passing consideration.
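The effect is easy to demonstrate on a toy problem (a sketch of my own, not from any real LLM pipeline): train a trivially simple "model", a learned integer threshold, first on clean labels and then on a dataset where a slice of the labels has been systematically corrupted, and compare the results.

```python
# "Garbage in, garbage out" on a toy classifier.
# Task: label a number 1 if it is >= 50, else 0. Our "model" is just a
# learned integer threshold, found by brute-force search.

# Clean dataset: the true labelling rule.
data = [(x, int(x >= 50)) for x in range(100)]

def train_threshold(samples):
    """Pick the threshold that best fits the training samples."""
    best_t, best_acc = 0, 0.0
    for t in range(101):
        acc = sum(int(x >= t) == y for x, y in samples) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, samples):
    return sum(int(x >= t) == y for x, y in samples) / len(samples)

# Train on clean data: the model recovers the true rule.
clean_t = train_threshold(data)

# "Garbage in": a systematically mislabelled slice of the training data
# (every x in [50, 75) wrongly labelled 0, as if scraped from a bad source).
garbage = [(x, 0 if 50 <= x < 75 else y) for x, y in data]
dirty_t = train_threshold(garbage)

# Evaluate both models against the true labels.
print(accuracy(clean_t, data))  # 1.0
print(accuracy(dirty_t, data))  # 0.75 -- "garbage out"
```

The model trained on garbage fits its corrupted inputs perfectly, yet gets a quarter of the true answers wrong: it has no way to know its data was bad.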



Image created using Stable Diffusion and the prompt "an evil computer plotting the downfall of civilisation"
