AI Models Are Hallucinating More (and It's Not Clear Why)

Hallucinations have always been an issue for generative AI models: The same structure that enables them to be creative and produce text and images also makes them prone to making stuff up. And the hallucination problem isn't getting better as AI models progress—in fact, it's getting worse.

In a new technical report from OpenAI (via The New York Times), the company details how its latest o3 and o4-mini models hallucinate 51 percent and 79 percent of the time, respectively, on an AI benchmark known as SimpleQA. For the earlier o1 model, the SimpleQA hallucination rate stands at 44 percent.

Those are surprisingly high figures, and heading in the wrong direction. These models are known as reasoning models because they think through their answers and deliver them more slowly. Clearly, based on OpenAI's own testing, this mulling over of responses is leaving more room for mistakes and inaccuracies to be introduced.

False facts are by no means limited to OpenAI and ChatGPT. For example, it didn't take me long when testing Google's AI Overview search feature to get it to make a mistake, and AI's inability to properly pull out information from the web has been well-documented. Recently, a support bot for AI coding app Cursor announced a policy change that hadn't actually been made.

But you won't find many mentions of these hallucinations in the announcements AI companies make about their latest and greatest products. Together with energy use and copyright infringement, hallucinations are something that the big names in AI would rather not talk about.

Anecdotally, I haven't noticed too many inaccuracies when using AI search and bots; the error rate is certainly nowhere near 79 percent, though mistakes are made. However, it looks like this is a problem that might never go away, particularly as the teams working on these AI models don't fully understand why hallucinations happen.

In tests run by AI platform developer Vectara, the results are much better, though not perfect: Here, many models are showing hallucination rates of one to three percent. OpenAI's o3 model stands at 6.8 percent, with the newer (and smaller) o4-mini at 4.6 percent. That's more in line with my experience interacting with these tools, but even a very low rate of hallucinations can mean a big problem, especially as we transfer more and more tasks and responsibilities to these AI systems.
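To put those percentages in perspective, here's a quick back-of-the-envelope sketch in Python. The hallucination rates are the ones quoted above; the daily query volume is a purely hypothetical figure chosen for illustration.

```python
# Why even "low" hallucination rates add up at scale.
# Rates: figures quoted in this article (Vectara tests); the daily query
# volume is an assumed, hypothetical workload, not a real statistic.
daily_queries = 1_000_000

rates = {
    "o3 (Vectara test)": 0.068,
    "o4-mini (Vectara test)": 0.046,
    "typical 1-3 percent model": 0.02,
}

for model, rate in rates.items():
    bad_answers = int(daily_queries * rate)
    print(f"{model}: roughly {bad_answers:,} hallucinated responses per day")
```

Even at the low end, that's tens of thousands of confidently wrong answers a day.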

Finding the causes of hallucinations

[Image: ChatGPT knows not to put glue on pizza, at least. Credit: Lifehacker]

No one really knows how to fix hallucinations, or fully identify their causes: These models aren't built to follow rules set by their programmers, but to choose their own way of working and responding. Vectara chief executive Amr Awadallah told the New York Times that AI models will "always hallucinate," and that these problems will "never go away."

University of Washington professor Hannaneh Hajishirzi, who is working on ways to reverse engineer answers from AI, told the NYT that "we still don't know how these models work exactly." Just like troubleshooting a problem with your car or your PC, you need to know what's gone wrong to do something about it.

According to researcher Neil Chowdhury, from AI analysis lab Transluce, the way reasoning models are built may be making the problem worse. "Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines," he told TechCrunch.

OpenAI's own performance report, meanwhile, mentions the issue of "less world knowledge," and notes that the o3 model tends to make more claims than its predecessor, which in turn leads to more hallucinations. Ultimately, though, "more research is needed to understand the cause of these results," according to OpenAI.

And there are plenty of people undertaking that research. For example, Oxford University academics have published a method for detecting the probability of hallucinations by measuring the variation, or "semantic entropy," between multiple AI outputs. However, this costs more in terms of time and processing power, and doesn't really solve the issue of hallucinations; it just tells you when they're more likely.
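As a rough illustration of that idea (a minimal sketch, not the Oxford team's actual method), you can ask a model the same question several times and measure how much the answers disagree. The toy version below groups answers by simple string matching; the published approach clusters them by meaning instead.

```python
import math
from collections import Counter

def hallucination_risk(answers: list[str]) -> float:
    """Score how much a set of sampled answers disagrees with itself (0 to 1)."""
    # Crude grouping: normalize whitespace and case. The published method
    # clusters answers by meaning (e.g. using an entailment model) instead.
    groups = Counter(" ".join(a.lower().split()) for a in answers)
    total = sum(groups.values())
    # Shannon entropy of the answer distribution: 0 when every sample agrees,
    # higher when the model keeps changing its story.
    entropy = -sum((n / total) * math.log(n / total) for n in groups.values())
    max_entropy = math.log(total) if total > 1 else 1.0
    return entropy / max_entropy

# Consistent answers look safe; conflicting answers look like a likely hallucination.
print(hallucination_risk(["Paris", "Paris", "paris", "Paris"]))      # 0.0
print(hallucination_risk(["1947", "1952", "1947", "1961", "1939"]))  # ~0.83
```

The intuition: when a model actually knows the answer, repeated samples tend to agree; when it's making something up, they tend to scatter.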

While letting AI models check their facts on the web can help in certain situations, they're not particularly good at this either. They lack (and will never have) simple human common sense that says glue shouldn't be put on a pizza or that $410 for a Starbucks coffee is clearly a mistake.

What's definite is that AI bots can't be trusted all of the time, despite their confident tone, whether they're giving you news summaries, legal advice, or interview transcripts. That's important to remember as these AI models show up more and more in our personal and work lives, and it's a good idea to limit AI to use cases where hallucinations matter less.

Disclosure: Lifehacker’s parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
