
AI industry battles rising hallucination rates

AI hallucinations

The newest and most powerful AI systems from companies like OpenAI, Google, and the Chinese start-up DeepSeek generate more errors, not fewer. Although their math skills have markedly improved, their handling of factual information has become less reliable. It is not entirely clear why this is happening.

According to OpenAI’s internal tests, its newest o3 and o4-mini reasoning models hallucinate between 33% and 79% of the time, depending on the benchmark. The o3 model hallucinated 33% of the time on PersonQA, a benchmark that asks questions about public figures.

https://x.com/natalieben/status/1919723316513763631

On SimpleQA, which focuses on short fact-based questions, o3’s hallucination rate was 51%. The o4-mini model performed worse, hallucinating 41% of the time on PersonQA and 79% on SimpleQA.

https://x.com/realDanWagner/status/1919473701541552253

Chinese company DeepSeek’s R1 reasoning model also hallucinates more than the company’s conventional models, according to tests by the AI research firm Vectara.

These models can hallucinate at each step of their multi-step “thinking” processes, compounding the chances of an incorrect final answer. Some suggest that the training behind these reasoning models may be the root cause. AI models are trained on large datasets and respond to queries with the most statistically likely answer.

Addressing AI hallucination challenges

When asked questions that fall outside their training data, models can fabricate information. Incomplete or biased datasets and flaws in the training process further exacerbate the issue.

OpenAI’s o3 model, for instance, is designed to maximize the chance of giving an answer, which makes it more likely to produce an incorrect response than to admit it doesn’t know. OpenAI has acknowledged its models’ hallucination rates in research papers and stated that more research is needed to address the problem. OpenAI’s CEO has even suggested that hallucinations add value to AI systems, though this perspective is widely debated.


Companies are actively working on potential fixes. Microsoft and Google have released products aimed at flagging potentially incorrect information in AI responses, though experts doubt these will fully eliminate hallucinations. Many researchers consider stopping AI bots from hallucinating impossible, but efforts are ongoing to reduce these rates.

Some propose teaching AI models to express uncertainty so they decline to answer rather than fabricate one. Others employ “retrieval-augmented generation,” a technique in which the AI retrieves relevant documents and uses them as reference material before generating an answer. Not all researchers agree on terminology: some criticize the term “hallucination,” arguing that it needlessly anthropomorphizes AI models by attributing intent and consciousness they do not possess.
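
To make the retrieval-augmented generation idea concrete, here is a minimal sketch in Python. It is not any vendor’s implementation: the document list, the word-overlap retrieval, and the prompt-printing step are toy stand-ins for a real vector database and language model API.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Illustrative only: retrieval uses simple word-overlap scoring instead of a
# real vector index, and the language-model call is left as a placeholder.

from collections import Counter

# A tiny in-memory "knowledge base" the model is allowed to cite from.
DOCUMENTS = [
    "PersonQA is a benchmark that asks models questions about public figures.",
    "SimpleQA is a benchmark of short, fact-seeking questions.",
    "Retrieval-augmented generation grounds answers in retrieved documents.",
]

def score(query: str, doc: str) -> int:
    """Count how many query words also appear in the document (toy relevance)."""
    query_words = Counter(query.lower().split())
    doc_words = set(doc.lower().split())
    return sum(count for word, count in query_words.items() if word in doc_words)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    ranked = sorted(DOCUMENTS, key=lambda d: score(query, d), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Assemble a prompt that tells the model to answer only from the context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer the question using only the context below. "
        'If the context does not contain the answer, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

if __name__ == "__main__":
    # In a real system this prompt would be sent to an LLM API;
    # here we simply print it to show the grounding step.
    print(build_prompt("What does the SimpleQA benchmark measure?"))
```

Note that the prompt also folds in the “express uncertainty” idea from the same paragraph, instructing the model to say “I don’t know” when the retrieved context does not contain the answer.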

Understanding and addressing AI hallucinations remains a critical challenge as these systems become more integrated into everyday life. Until these issues are resolved, users should approach AI-generated responses with caution and double-check critical information.

Image Credits: Photo by David Pupăză on Unsplash

Johannah Lopez is a versatile professional who seamlessly navigates two worlds. By day, she excels as a SaaS freelance writer, crafting informative and persuasive content for tech companies. By night, she showcases her vibrant personality and customer service skills as a part-time bartender. Johannah's ability to blend her writing expertise with her social finesse makes her a well-rounded and engaging storyteller in any setting.
