In a new technical report from OpenAI (via The New York Times), the company details how its latest o3 and o4-mini models show hallucination rates of 51 percent and 79 percent, respectively, on an AI benchmark known as SimpleQA. For the earlier o1 model, the SimpleQA hallucination rate stands at 44 percent.
False facts are by no means limited to OpenAI and ChatGPT. When I tested Google's AI Overviews search feature, for example, it didn't take long to catch it making a mistake, and AI's inability to reliably pull accurate information from the web has been well documented. Recently, a support bot for the AI coding app Cursor announced a policy change that hadn't actually been made.
Anecdotally, I haven't noticed too many inaccuracies when using AI search and bots—the error rate is certainly nowhere near 79 percent, though mistakes are made. However, it looks like this is a problem that might never go away, particularly as the teams working on these AI models don't fully understand why hallucinations happen.
In tests run by AI platform developer Vectara, the results are much better, though not perfect: there, many models show hallucination rates of one to three percent. OpenAI's o3 model stands at 6.8 percent, with the newer (and smaller) o4-mini at 4.6 percent. That's more in line with my experience interacting with these tools, but even a very low hallucination rate can add up to a big problem, especially as we hand more and more tasks and responsibilities over to these AI systems.
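To make the percentages above concrete, here is a minimal, hypothetical sketch of how a hallucination rate can be tallied: a grader labels each of a model's answers as correct, incorrect, or not attempted, and the rate is the share of answers that came back confidently wrong. This is an illustration only, not OpenAI's or Vectara's actual evaluation code, and the exact denominator (all questions versus only attempted ones) varies from report to report.

```python
# Illustrative sketch only -- not OpenAI's or Vectara's evaluation code.
# Assumes a grader has already labeled each answer as "correct",
# "incorrect" (a hallucination), or "not_attempted" (the model declined).

def hallucination_rate(labels: list[str], among_attempted: bool = True) -> float:
    """Return the fraction of answers labeled "incorrect"."""
    pool = [l for l in labels if l != "not_attempted"] if among_attempted else labels
    if not pool:
        return 0.0
    return sum(l == "incorrect" for l in pool) / len(pool)

# Hypothetical example: 6 wrong answers out of 10 attempted -> 60 percent.
sample = ["correct"] * 4 + ["incorrect"] * 6 + ["not_attempted"] * 2
print(f"{hallucination_rate(sample):.0%}")                          # 60%
print(f"{hallucination_rate(sample, among_attempted=False):.0%}")   # 50%
```

The two print lines show why the choice of denominator matters: the same set of answers reads as 60 percent or 50 percent depending on whether declined questions are counted.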
Finding the causes of hallucinations
ChatGPT knows not to put glue on pizza, at least. Credit: Lifehacker

University of Washington professor Hannaneh Hajishirzi, who is working on ways to reverse engineer answers from AI, told the NYT that "we still don't know how these models work exactly." Just like troubleshooting a problem with your car or your PC, you need to know what's gone wrong to do something about it.
OpenAI's own performance report, meanwhile, mentions the issue of "less world knowledge," and notes that the o3 model tends to make more claims than its predecessor, which in turn leads to more hallucinations. Ultimately, though, "more research is needed to understand the cause of these results," according to OpenAI.
While letting AI models check their facts on the web can help in certain situations, they're not particularly good at this either. They lack (and will never have) simple human common sense that says glue shouldn't be put on a pizza or that $410 for a Starbucks coffee is clearly a mistake.
Disclosure: Lifehacker’s parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.