Generative AI is an umbrella term for any kind of automated process that uses algorithms to produce, manipulate, or synthesize data, often in the form of images or human-readable text. It’s called generative because the AI creates something that didn’t previously exist. That’s what makes it different from discriminative AI, which draws distinctions between different kinds of input. To say it differently, discriminative AI tries to answer a question like “Is this image a drawing of a rabbit or a lion?” whereas generative AI responds to prompts like “Draw me a picture of a lion and a rabbit sitting next to each other.”
This article introduces you to generative AI and its uses with popular models like ChatGPT and DALL-E. We’ll also consider the limitations of the technology, including why “too many fingers” has become a dead giveaway for artificially generated art.
The emergence of generative AI
Generative AI has been around for years, arguably since ELIZA, a chatbot that simulates talking to a therapist, was developed at MIT in 1966. But years of work on AI and machine learning have recently come to fruition with the release of new generative AI systems. You’ve almost certainly heard about ChatGPT, a text-based AI chatbot that produces remarkably human-like prose. DALL-E and Stable Diffusion have also drawn attention for their ability to create vibrant and realistic images based on text prompts. We often refer to these systems and others like them as models because they represent an attempt to simulate or model some aspect of the real world based on a subset (sometimes a very large one) of information about it.
Output from these systems is so uncanny that it has many people asking philosophical questions about the nature of consciousness—and worrying about the economic impact of generative AI on human jobs. But while all these artificial intelligence creations are undeniably big news, there is arguably less going on beneath the surface than some may assume. We’ll get to some of those big-picture questions in a moment. First, let’s look at what’s going on under the hood of models like ChatGPT and DALL-E.
How does generative AI work?
Generative AI uses machine learning to process a huge amount of visual or textual data, much of which is scraped from the internet, and then determine what things are most likely to appear near other things. Much of the programming work of generative AI goes into creating algorithms that can distinguish the “things” of interest to the AI’s creators—words and sentences in the case of chatbots like ChatGPT, or visual elements for DALL-E. But fundamentally, generative AI creates its output by assessing an enormous corpus of data on which it’s been trained, then responding to prompts with something that falls within the realm of probability as determined by that corpus.
Autocomplete—when your cell phone or Gmail suggests what the remainder of the word or sentence you’re typing might be—is a low-level form of generative AI. Models like ChatGPT and DALL-E just take the idea to significantly more advanced heights.
Training generative AI models
The process by which models are developed to accommodate all this data is called training. A couple of underlying techniques are at play here for different types of models. ChatGPT uses what’s called a transformer (that’s what the T stands for). A transformer derives meaning from long sequences of text to understand how different words or semantic components might be related to one another, then determine how likely they are to occur in proximity to one another. These transformers are run unsupervised on a vast corpus of natural language text in a process called pretraining (that’s the Pin ChatGPT), before being fine-tuned by human beings interacting with the model.
Another technique used to train models is what’s known as a generative adversarial network, or GAN. In this technique, you have two algorithms competing against one another. One is generating text or images based on probabilities derived from a big data set; the other is a discriminative AI, which has been trained by humans to assess whether that output is real or AI-generated. The generative AI repeatedly tries to “trick” the discriminative AI, automatically adapting to favor outcomes that are successful. Once the generative AI consistently “wins” this competition, the discriminative AI gets fine-tuned by humans and the process begins anew.
One of the most important things to keep in mind here is that, while there is human intervention in the training process, most of the learning and adapting happens automatically. So many iterations are required to get the models to the point where they produce interesting results that automation is essential. The process is quite computationally intensive.
Is generative AI sentient?
The mathematics and coding that go into creating and training generative AI models are quite complex, and well beyond the scope of this article. But if you interact with the models that are the end result of this process, the experience can be decidedly uncanny. You can get DALL-E to produce things that look like real works of art. You can have conversations with ChatGPT that feel like a conversation with another human. Have researchers truly created a thinking machine?
Chris Phipps, a former IBM natural language processing lead who worked on Watson AI products, says no. He describes ChatGPT as a “very good prediction machine.”
It’s very good at predicting what humans will find coherent. It’s not always coherent (it mostly is) but that’s not because ChatGPT “understands.” It’s the opposite: humans who consume the output are really good at making any implicit assumption we need in order to make the output make sense.
Phipps, who’s also a comedy performer, draws a comparison to a common improv game called Mind Meld.
Two people each think of a word, then say it aloud simultaneously—you might say “boot” and I say “tree.” We came up with those words completely independently and at first, they had nothing to do with each other. The next two participants take those two words and try to come up with something they have in common and say that aloud at the same time. The game continues until two participants say the same word.
Maybe two people both say “lumberjack.” It seems like magic, but really it’s that we use our human brains to reason about the input (“boot” and “tree”) and find a connection. We do the work of understanding, not the machine. There’s a lot more of that going on with ChatGPT and DALL-E than people are admitting. ChatGPT can write a story, but we humans do a lot of work to make it make sense.
Testing the limits of computer intelligence
Certain prompts that we can give to these AI models will make Phipps’ point fairly evident. For instance, consider the riddle “What weighs more, a pound of lead or a pound of feathers?” The answer, of course, is that they weigh the same (one pound), even though our instinct or common sense might tell us that the feathers are lighter.
ChatGPT will answer this riddle correctly, and you might assume it does so because it is a coldly logical computer that doesn’t have any “common sense” to trip it up. But that’s not what’s going on under the hood. ChatGPT isn’t logically reasoning out the answer; it’s just generating output based on its predictions of what should follow a question about a pound of feathers and a pound of lead. Since its training set includes a bunch of text explaining the riddle, it assembles a version of that correct answer. But if you ask ChatGPT whether two pounds of feathers are heavier than a pound of lead, it will confidently tell you they weigh the same amount, because that’s still the most likely output to a prompt about feathers and lead, based on its training set. It can be fun to tell the AI that it’s wrong and watch it flounder in response; I got it to apologize to me for its mistake and then suggest that two pounds of feathers weigh four times as much as a pound of lead.