The rise and rise of Generative Artificial Intelligence (AI) has been the main technology talking point of the last six months. Text services like ChatGPT and Google Bard can now engage in dialogue that may even pass the Turing Test. And image generation services such as DALL-E and Stable Diffusion can create realistic, almost life-like pictures from nothing more than a text description.
Performing these feats means training large language models (LLMs) on vast amounts of data, a process that is computationally expensive and therefore prohibitive for many organizations.
For example, OpenAI's GPT-3 LLM has 175 billion parameters and was trained on data collected from a variety of sources, including books, articles, websites, and code repositories. Google's PaLM LLM is larger still, at 540 billion parameters.
But as the initial flurry of excitement about Generative AI dies down a bit, focus is turning to the sources of data used. The worry is that unless that data is reputable and trustworthy, these systems will have the wrong inputs on which to base their machine learning algorithms.
Perhaps now more than ever, the old computing adage "Garbage in, garbage out" is worth more than a passing consideration.
Image created using Stable Diffusion and the prompt "an evil computer plotting the downfall of civilisation"