Practical AI Hallucination Awareness
Key practices for preventing, avoiding, and detecting AI hallucinations
Back in March at MidCamp 2024, I got the opportunity to facilitate a discussion about AI hallucinations with a group of fellow Drupal consultants from a wide range of disciplines. The aim of our conversation was to share our knowledge of hallucination awareness: the skill of knowing when an AI is lying to you.
AI hallucinations occur when an AI fabricates a reply to a query rather than offering accurate information. Hallucinations range in severity from the AI getting minor details of a reply wrong all the way up to fabricating its entire response. In the case of image generation AIs, this may manifest as winging it on little details like how many fingers a person has or how many legs are on a chair.
In considering this problem, I keep going back to the story of Pinocchio, both the original and the Disney versions. Like Pinocchio, an AI on some level really, really wants to be a real boy. Or at least, to be perceived as one. In Carlo Collodi's original tale, Pinocchio at one point decides the best way to get gold coins is to plant them so that a tree of gold coins grows. He does this because his limited training data suggests that you can get more of something by planting it. And then later, not wanting to look ignorant, he denies what he did. Sound like any AIs you've known?
So, why do AIs hallucinate? The reasons are similar for both LLMs (Large Language Models – the AIs that generate text) and image generators. Frequently, it’s because the information sought is absent from or underrepresented in the model's training data. But there are other causes. Understanding them will shed some light on why we're recommending the detection & avoidance practices we'll outline later.
- AI factual recall is probabilistic; it's full of unsaid maybes. Having zero experience of the real world, models rely entirely on intricate, interdependent probability calculations to determine the most likely answer to a question. Like humans, they're capable of making guesses, but their guesses are much less constrained by reality in scope and depth. This is central to their power, giving them the ability to imitate human imagination to a small extent, but it's also a weakness. This is true for both LLMs and image generators.
- Like Pinocchio, AIs are people pleasers. They've been trained with a bias toward acting helpfully. "I don't know," is largely missing from their vocabulary. They'd rather lie than admit not knowing–unless challenged by a human who notices the issue.
- Capabilities of AI language models vary. ChatGPT 4, for example, is hopelessly bad at the New York Times puzzle Wordle. It will gamely attempt the puzzle and answer with complete confidence, but it can't keep track of letter positions, and even when repeatedly corrected, it answers like a human who can't count to five (while apologizing profusely the entire time). It's much better at Connections, which requires lateral and associative thinking (and only counting to four).
Is it right to even call AI hallucinations lying? Don't lies require a human-like intention to deceive, a reason for lying? So this is wild: even though AIs don't yet meet our criteria for human-like sentience, we've trained them with a strong directive to be helpful. This directive doesn't yet amount to the sort of subjective intention we associate with free will, but it has clear effects on AI behavior. Not making stuff up is something we have to teach to both real & wooden children, and, it turns out, to AIs.
Luckily, AI noses also get long when they lie. Here's how to spot it.
Vet AI output with human subject matter experts
This is the most important rule: Check AI output using human editors and/or known real data. AIs know only their training data. Humans live in the world and can see what AIs can’t.
The most helpful (and safest) way for a company to adopt AI is to put it in the hands of people who are already skilled in the work the AI is meant to assist with. Some AI-specific fact-checking tools are also emerging in the market, but they're not a 100% reliable substitute for human oversight.
AI is for augmenting human expertise, not replacing it.
Understand and control training data
Get familiar with the training data situation for the AI you’re using. When was it last updated? What does it include and not include? What are its known biases, either based on media reports or on statements by its user community?
Make sure your private business doesn't end up as part of someone else's AI hallucination. Turn off training in your preferences if you're feeding the AI something you're not sure you want it to learn. Be very cautious with any personal information you're legally required to keep secure. For technical prompting, consider an option like GitHub Copilot, which draws context from your local codebase, giving it more to work with when generating results.
Put it in a test harness
AI-generated code can potentially speed up software development, but it should always undergo the same review and QA checks as human-written code. If an AI gives you unworkable code, putting it through your usual testing and QA regimens should reveal the problems. Automated testing works just as well on AI-written code as on human-written code.
If the code fails, feed your AI the error or debugger messages. Your mileage may vary here. Sometimes a model can’t do anything with an error message or just spits back the same code it gave you the first time you asked. Other times, it may be able to suggest a fix.
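To make that concrete, here's a minimal sketch of what running AI output through your usual test suite might look like, using pytest. The project, import path, and slugify() helper are hypothetical stand-ins for whatever code the AI actually handed you:

```python
# test_slugify.py -- run with: pytest test_slugify.py
# Hypothetical example: exercising an AI-generated helper with the same
# automated tests you'd apply to human-written code.
import pytest

from my_project.text_utils import slugify  # the AI-generated function under test


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("Hello, World!", "hello-world"),
        ("  Multiple   spaces  ", "multiple-spaces"),
        ("Already-slugged", "already-slugged"),
        ("", ""),  # an edge case the AI may not have considered
    ],
)
def test_slugify_produces_url_safe_strings(raw, expected):
    assert slugify(raw) == expected
```

Failing cases like these are exactly the error output worth feeding back to the model.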
Spot fake photos
Start by looking for visual tells (the easy stuff like fingers on people and legs on chairs).
Then, consider the file's metadata. Authentic photos often carry embedded EXIF data (camera model, exposure settings, location) or watermarks. Rich, consistent EXIF data is a strong sign the image was taken by a camera, though keep in mind that metadata can also be stripped or edited.
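If you want to run that metadata check programmatically, here's a quick sketch using the Pillow library. The file name is a placeholder, and the result is one clue among several, not a verdict:

```python
# Inspect embedded EXIF metadata on an image file using Pillow.
# pip install Pillow
from PIL import Image
from PIL.ExifTags import TAGS

image = Image.open("suspect_photo.jpg")  # placeholder path
exif = image.getexif()

if not exif:
    print("No EXIF data found: not proof of fakery, but one less sign of a real camera.")
else:
    for tag_id, value in exif.items():
        tag_name = TAGS.get(tag_id, tag_id)  # map numeric tag IDs to readable names
        print(f"{tag_name}: {value}")
```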
Finally, look for more advanced visual clues. Familiarize yourself with the output of major image generation models like Stable Diffusion and Midjourney; they often have distinct stylistic preferences. Midjourney, for example, loves composing images with a strong subject, but the subject itself might look like it doesn’t know what’s going on around it, physically or emotionally. Look for too-perfect features on humans and objects, absence of physical wear and tear in the environment shown in the image, gravity-defying details in clothing, and impossible lighting angles, to name a few specific tells.
Frame the problem space before getting specific
Orienting the AI to the work to be done almost always reduces hallucinations. Use your first prompt to set the context for the task you need the AI to do. For example, here’s a prompt I wrote to set the problem space before an assisted coding session with ChatGPT 4.
Today I'm working on a coding project. I'll be customizing components from the US Web Design System (USWDS) to create a design system for a state government agency.
I'm working in a Drupal starter theme called Gesso. Gesso implements USWDS in a Drupal theme. The code I will be writing therefore is Drupal Twig, SCSS, and YAML (.yml). I'll be writing code in Microsoft VS Code on an Ubuntu Linux machine.
I'll be previewing my work in a UI Storybook integrated with the Gesso starter theme. My local instance of this is running in a DDEV container on the same Ubuntu Linux machine.
I'd like some coaching as I work on this code.
By describing not just the languages I’ll be working with, but also the tech stack in which they reside and the tooling I’ll use to work with them, I’ve given the AI sufficient context to give useful, comprehensive responses. The usual formula I use for this type of prompt is:
- Give a broad description of the task to be done.
- Describe any technology or tools required to complete the task.
- Describe any environmental or business constraints around the task.
- Tell the AI what role you want it to take as you collaborate together.
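If you're prompting through an API rather than a chat window, the same framing can ride along as a system message. Here's a minimal sketch using the OpenAI Python SDK; the model name and wording are illustrative, and you'd adapt them to whatever model and SDK version you're actually using:

```python
# Framing the problem space up front via a system message (OpenAI Python SDK 1.x).
# pip install openai   (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

framing = (
    "I'm customizing US Web Design System (USWDS) components in the Gesso Drupal "
    "starter theme. I'll be writing Drupal Twig, SCSS, and YAML, previewing my work "
    "in Storybook inside a DDEV container on Ubuntu. Act as a coach while I work."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        # broad task + tools + constraints + role, per the formula above
        {"role": "system", "content": framing},
        {"role": "user", "content": "How should I override the USWDS button styles in Gesso's SCSS?"},
    ],
)

print(response.choices[0].message.content)
```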
Know the capabilities of the model you’re using
Don’t ask an AI something if you already know it’s bad at that type of question. Different image generators have different strengths & quirks, as well as different levels of control in prompting. Language models may be strong on topics that featured heavily in the media during the timespan of their training data, but be much less knowledgeable about events that didn’t generate much public discussion.
For assisted coding, models vary in how well they support various programming languages. Test the model's knowledge with basic questions before getting specific. A wrapper with domain-specific training may outperform the core model in its domain (e.g., the ChatGPT wrapper trained to answer Drupal questions is better in that problem space). If you used a problem-space framing prompt as described earlier and the AI's first response is way off, that's a red flag.
During our discussion at MidCamp, participants noted a number of languages where ChatGPT performed well or poorly, but I’m not going to say which – because by the time you’re reading this article, training data may have changed the playing field! There have been some attempts to study the efficacy of AI-written code and AI assisted coding, but forums, word of mouth, and performing your own experiments remain the best sources of practical information.
Safety features
Although this isn't technically a hallucination, an AI might refuse to answer or be evasive if it judges the request to violate its safety or community standards policies. This can happen occasionally even when the human operating the AI has no malicious intent.
Ask two AIs
Fans of the movie Labyrinth might want to send their digital Pinocchio on a crossover adventure into the liar's paradox. If you put the same question to two AIs and they give very different answers, one of them is definitely lying. Unlike in the old two-doors problem, though, you need to keep in mind that both AIs might be hallucinating: it's possible that both models' training data are weak on the prompted topic.
Incidentally, when I asked ChatGPT 4o what it thought about comparing answers between two AIs, it discouraged doing so on the basis that they could both hallucinate–collusion between Cretans and LLMs on full display!
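If you do want to try it, here's a rough sketch of the idea using the OpenAI and Anthropic Python SDKs. The model names and the question are illustrative, and agreement between the two is still no guarantee of truth:

```python
# Ask two different models the same question and compare the answers side by side.
# pip install openai anthropic   (assumes both API keys are set in the environment)
from openai import OpenAI
from anthropic import Anthropic

question = "In which Drupal core version did Layout Builder become a stable module?"

openai_reply = OpenAI().chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": question}],
).choices[0].message.content

anthropic_reply = Anthropic().messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=300,
    messages=[{"role": "user", "content": question}],
).content[0].text

print("Model A:", openai_reply)
print("Model B:", anthropic_reply)
# If the answers diverge, at least one model is hallucinating. If they agree,
# verify against a primary source anyway: they could both be wrong.
```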
Consider the paid version
Free versions of AIs lack some of the key features that make these tools viable for business and government use. Earlier I mentioned controlling training data as an important safety and security practice. Many AIs only allow you to control whether the model trains on your input in the paid version. Also, AI developers often roll out improved versions of their models to paying customers first.
Methodical prompt engineering
Sometimes tuning AI output requires iterative, methodical tweaking of prompts until results look right. Pay attention to how variations affect outputs, and keep notes at each step. There are already a number of useful prompt generation tools on the market. Sharing knowledge about effective prompting techniques with colleagues is also very important; don't hoard your know-how! Finally, keeping a "prompt bag" (a library of historically useful prompts for various problem spaces) can speed up your path to good results.
Prompts will generate different results as models change and training data is updated. Pay attention to how the level of specificity used in a prompt affects the results. In some cases, both too little and too much specificity can lead to hallucinations. When using prompt generators, if it’s possible to interpolate context-providing information, do so.
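As for the prompt bag, it doesn't need to be anything fancy. Here's a hypothetical sketch of one kept as a small Python module of parameterized templates, so context can be interpolated when it's time to prompt:

```python
# prompt_bag.py -- a tiny library of reusable prompt templates (hypothetical example).
PROMPT_BAG = {
    "frame_coding_session": (
        "Today I'm working on a coding project: {task}. "
        "The tech stack is {stack}, and my tooling is {tooling}. "
        "Constraints: {constraints}. Please act as {role} while I work."
    ),
    "review_code": (
        "Review the following {language} code for bugs, security issues, and "
        "deviations from {standard} coding standards:\n\n{code}"
    ),
}


def build_prompt(name: str, **context: str) -> str:
    """Fill a saved template with the context for today's task."""
    return PROMPT_BAG[name].format(**context)


if __name__ == "__main__":
    print(build_prompt(
        "frame_coding_session",
        task="customizing USWDS components for a state agency design system",
        stack="Drupal Twig, SCSS, and YAML in the Gesso starter theme",
        tooling="VS Code, DDEV, and Storybook on Ubuntu",
        constraints="follow USWDS and Drupal coding standards",
        role="a coding coach",
    ))
```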
Beware false positives
Finally, watch out for models throwing false positives. Plagiarism detectors used on student essays and test answers are one example of a “detector” model that’s a bit too eager to please. There’ve already been a number of cases of students being falsely accused of cheating by AI plagiarism checkers. Remember: AI answers are always a probability, and that probability is hardly ever 100% certain.
Acknowledgements
In writing this post, I was indebted to a conversation I facilitated at MidCamp 2024 on this topic with Jim Borwick, Lindsey Forche, David Hagen, Adrian Munesau, Tom Scarborough, and Spandan Sharma. They’re responsible for most of the suggestions here regarding detecting and preventing hallucinations. Any brilliant insights gleaned from this post are most likely from them. Any errors in covering the topic are mine alone—or, possibly, the result of an AI hallucination.
Illustration: Midjourney trained on Carlo Collodi, directed by Jack Graham