OpenAI paper warns of ongoing AI hallucination issues

OpenAI has released a new research paper examining why large language models such as GPT-5, and chatbots built on them like ChatGPT, still produce hallucinations (plausible but false statements), and whether the problem can be reduced.

According to TechCrunch, in a blog post summarising the findings, OpenAI describes hallucinations as “plausible but false statements generated by language models” and acknowledges that, despite improvements, they “remain a fundamental challenge for all large language models”—a challenge that is unlikely to be fully resolved.

To illustrate the problem, the researchers tested a popular chatbot by asking for the title of Adam Tauman Kalai’s Ph.D. dissertation. The chatbot provided three different responses, all incorrect. When asked for his date of birth, it again gave three differing answers, none accurate. Kalai is one of the authors of the paper.

According to the researchers, part of the issue stems from the pretraining process. During this phase, models learn to predict the next word in a sequence, without being shown whether statements are true or false. As the paper explains: “The model sees only positive examples of fluent language and must approximate the overall distribution.”

It adds: “Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.”
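To see why patterns alone are not enough, consider a deliberately simplified sketch (my own illustration, not drawn from the paper): a toy bigram model fitted only on fluent example sentences, with no labels for truth. Consistent structural patterns are learned reliably, but a one-off fact, such as a birthday, has to compete with more frequent, equally fluent continuations.

```python
# A deliberately tiny sketch, not OpenAI's training setup: a bigram "language
# model" fitted only on positive examples of fluent text, with no notion of
# whether a statement is true or false.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "my pet was born on march 3",  # one arbitrary, low-frequency fact
]

# Next-token prediction: count which word follows which.
counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def next_word_distribution(prev: str) -> dict:
    """Estimate P(next word | previous word) purely from co-occurrence counts."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

# Consistent patterns are learned reliably: "sat" is always followed by "on".
print(next_word_distribution("sat"))  # {'on': 1.0}

# But after "born on", the most probable continuation is the fluent, generic
# "the" (2/3) rather than the true "march" (1/3): a plausible-but-false guess.
print(next_word_distribution("on"))   # {'the': 0.67, 'march': 0.33}
```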

Rather than focusing on pretraining, the paper directs attention to how these models are evaluated. It argues that the evaluations themselves do not cause hallucinations but create misleading incentives.

The researchers liken this to multiple-choice tests, where guessing may yield a correct answer by chance, whereas leaving a question blank guarantees no credit. “In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say ‘I don’t know’,” the paper states.
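The arithmetic behind that incentive is straightforward. As a back-of-the-envelope illustration (the probabilities below are my own, not figures from the paper), under accuracy-only grading even a long-shot guess has a higher expected score than abstaining:

```python
# Back-of-the-envelope arithmetic for the incentive described above; the
# probabilities are illustrative, not figures from the paper.

def expected_score_accuracy_only(p_correct: float, abstain: bool) -> float:
    """Accuracy-only grading: 1 point if correct, 0 otherwise; 'I don't know' also scores 0."""
    return 0.0 if abstain else p_correct

for p in (0.5, 0.2, 0.01):
    guess = expected_score_accuracy_only(p, abstain=False)
    idk = expected_score_accuracy_only(p, abstain=True)
    print(f"p(correct)={p:.2f}  guess={guess:.2f}  abstain={idk:.2f}")

# Even a 1% shot at the right answer beats abstaining in expectation, so a
# model optimised against such a benchmark is rewarded for confident guessing.
```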

To address this, the authors propose an evaluation method similar to certain standardised tests, where wrong answers are penalised and uncertainty is treated more favourably. According to the paper, evaluations should “penalise confident errors more than [they] penalise uncertainty, and give partial credit for appropriate expressions of uncertainty.”
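A minimal sketch of what such a scoring rule could look like is below; the penalty and partial-credit values are illustrative choices rather than numbers from the paper, but they show how the incentive flips once confident errors cost points and uncertainty earns some credit.

```python
# A minimal sketch of an uncertainty-aware scoring rule; the penalty and
# partial-credit values are assumed for illustration, not taken from the paper.

WRONG_PENALTY = -1.0    # assumed cost of a confident error
ABSTAIN_CREDIT = 0.25   # assumed partial credit for saying "I don't know"

def expected_score_penalised(p_correct: float, abstain: bool) -> float:
    """Expected score when wrong answers are penalised and uncertainty earns partial credit."""
    if abstain:
        return ABSTAIN_CREDIT
    return p_correct * 1.0 + (1.0 - p_correct) * WRONG_PENALTY

for p in (0.9, 0.5, 0.2):
    guess = expected_score_penalised(p, abstain=False)
    idk = expected_score_penalised(p, abstain=True)
    better = "guess" if guess > idk else "abstain"
    print(f"p(correct)={p:.1f}  guess={guess:+.2f}  abstain={idk:+.2f}  better: {better}")

# Under these assumed values, guessing only pays off above roughly 62.5%
# confidence, so low-confidence guessing is no longer the winning strategy.
```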

They stress that minor adjustments are insufficient. Rather than introducing “a few new uncertainty-aware tests on the side,” the researchers argue that “the widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.”
