In a new technical report from OpenAI (via The New York Times), the company details how its latest o3 and o4-mini models show hallucination rates of 51 percent and 79 percent, respectively, on an AI benchmark known as SimpleQA. For the earlier o1 model, the SimpleQA hallucination rate stands at 44 percent.
False facts are by no means limited to OpenAI and ChatGPT. When I tested Google's AI Overviews search feature, for example, it didn't take long to catch it making a mistake, and AI's inability to reliably pull accurate information from the web has been well documented. Recently, a support bot for the AI coding app Cursor announced a policy change that hadn't actually been made.
Anecdotally, I haven't noticed too many inaccuracies when using AI search and bots—the error rate is certainly nowhere near 79 percent, though mistakes are made. However, it looks like this is a problem that might never go away, particularly as the teams working on these AI models don't fully understand why hallucinations happen.
In tests run by AI platform developer Vectara, the results are much better, though not perfect: there, many models show hallucination rates of one to three percent. OpenAI's o3 model stands at 6.8 percent, with the newer (and smaller) o4-mini at 4.6 percent. That's more in line with my experience interacting with these tools, but even a very low hallucination rate can add up to a big problem, especially as we hand more and more tasks and responsibilities over to these AI systems.
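To make the percentages above concrete, here is a minimal, hypothetical sketch of how a hallucination rate can be tallied: a grader labels each of a model's answers as correct, incorrect, or not attempted, and the rate is the share of answers that came back confidently wrong. This is an illustration only, not OpenAI's or Vectara's actual evaluation code, and the exact denominator (all questions versus only attempted ones) varies from report to report.

```python
# Illustrative sketch only -- not OpenAI's or Vectara's evaluation code.
# Assumes a grader has already labeled each answer as "correct",
# "incorrect" (a hallucination), or "not_attempted" (the model declined).

def hallucination_rate(labels: list[str], among_attempted: bool = True) -> float:
    """Return the fraction of answers labeled "incorrect"."""
    pool = [l for l in labels if l != "not_attempted"] if among_attempted else labels
    if not pool:
        return 0.0
    return sum(l == "incorrect" for l in pool) / len(pool)

# Hypothetical example: 6 wrong answers out of 10 attempted -> 60 percent.
sample = ["correct"] * 4 + ["incorrect"] * 6 + ["not_attempted"] * 2
print(f"{hallucination_rate(sample):.0%}")                          # 60%
print(f"{hallucination_rate(sample, among_attempted=False):.0%}")   # 50%
```

The two print lines show why the choice of denominator matters: the same set of answers reads as 60 percent or 50 percent depending on whether declined questions are counted.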
Finding the causes of hallucinations
ChatGPT knows not to put glue on pizza, at least. Credit: Lifehacker

University of Washington professor Hannaneh Hajishirzi, who is working on ways to reverse engineer answers from AI, told the NYT that "we still don't know how these models work exactly." Just like troubleshooting a problem with your car or your PC, you need to know what's gone wrong to do something about it.
OpenAI's own performance report, meanwhile, mentions the issue of "less world knowledge," and notes that the o3 model tends to make more claims than its predecessor, which in turn leads to more hallucinations. Ultimately, though, "more research is needed to understand the cause of these results," according to OpenAI.
While letting AI models check their facts on the web can help in certain situations, they're not particularly good at this either. They lack (and will never have) simple human common sense that says glue shouldn't be put on a pizza or that $410 for a Starbucks coffee is clearly a mistake.
Disclosure: Lifehacker’s parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.