

To: Johnny Canuck who wrote (64658), 7/22/2025 3:35:59 AM
From: Johnny Canuck  Respond to of 67710
 


The ‘hallucinations’ that haunt AI: why chatbots struggle to tell the truth
Tech groups step up efforts to reduce fabricated responses but eliminating them appears impossible

© Alex Wheeler/FT montage



Melissa Heikkilä in London

Published an hour ago


The world’s leading artificial intelligence groups are stepping up efforts to reduce the number of “hallucinations” in large language models, as they seek to solve one of the big obstacles limiting take-up of the powerful technology.
Google, Amazon, Cohere and Mistral are among those trying to bring down the rate of these fabricated answers by rolling out technical fixes, improving the quality of the data in AI models, and building verification and fact-checking systems across their generative AI products.
The move to reduce these so-called hallucinations is seen as crucial to increase the use of AI tools across industries such as law and health, which require accurate information, and help boost the AI sector’s revenues.
It comes as chatbot errors have already resulted in costly mistakes and litigation. Last year, a tribunal ordered Air Canada to honour a discount that its customer service chatbot had made up, and lawyers who have used AI tools to prepare court documents have faced sanctions after the tools fabricated citations.
But AI experts warn that eliminating hallucinations completely from large language models is impossible because of how the systems operate.
“Hallucinations are a very hard problem to fix because of the probabilistic nature of how these models work,” said Amr Awadallah, a former Google executive and founder of Vectara, a generative AI agent start-up. “You will never get them to not hallucinate.”
These errors occur because large language models are designed to predict the next likely word in a sentence based on statistics they have learned from their training data.
These mistakes can look like either factual inaccuracies or the model failing to follow instructions by, for example, summarising events from the wrong year. The data that goes into an AI model's training set matters, because the more often a piece of information appears, the more likely it is that the model will repeat it.



At its simplest, the AI model's aim is to predict the next word in a sequence, and to do this repeatedly until the output is complete.

To do this, the model gives a probability score to each token, which represents the likelihood of it being the next word in the sequence.

And it continues to do this until it is “happy” with the text it has produced.

But this method of predicting the following word in isolation — known as “greedy search” — can introduce problems. Sometimes, while each individual token might be the next best fit, the full phrase can be less relevant.

Transformers, the architecture powering large language models, use a number of approaches to address this problem and enhance the quality of their output. One example is called beam search.

With beam search, the model is able to consider multiple routes and find the best option.

This produces better results, ultimately leading to more coherent, human-like text.
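To make the difference concrete, here is a minimal Python sketch of the two strategies over a hand-coded next-token table. The table, the token names and their probabilities are invented for illustration; this is a toy stand-in, not a real language model or any company's actual decoder. Greedy search locks in the locally most likely word at each step, while beam search keeps several candidate sequences alive and can surface a sentence with a higher overall probability.

import math

# Toy next-token model: for each current token, a distribution over possible
# next tokens. Purely illustrative -- real models condition on the whole
# sequence and cover tens of thousands of tokens.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the":     {"dog": 0.5, "cat": 0.3, "end": 0.2},
    "a":       {"cat": 0.9, "end": 0.1},
    "dog":     {"barked": 0.4, "end": 0.6},
    "cat":     {"purred": 0.8, "end": 0.2},
    "barked":  {"end": 1.0},
    "purred":  {"end": 1.0},
}

def greedy_decode(start="<start>"):
    """Always pick the single most likely next token (greedy search)."""
    sequence, token = [], start
    while token != "end":
        probs = NEXT_TOKEN_PROBS[token]
        token = max(probs, key=probs.get)
        if token != "end":
            sequence.append(token)
    return sequence

def beam_search(start="<start>", beam_width=2):
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams = [([start], 0.0)]  # (tokens so far, log-probability)
    finished = []
    while beams:
        candidates = []
        for tokens, score in beams:
            for nxt, p in NEXT_TOKEN_PROBS[tokens[-1]].items():
                new = (tokens + [nxt], score + math.log(p))
                (finished if nxt == "end" else candidates).append(new)
        # Prune to the best `beam_width` unfinished candidates.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    best_tokens, _ = max(finished, key=lambda b: b[1])
    return best_tokens[1:-1]  # strip the <start> and end markers

print("greedy:", greedy_decode())  # -> ['the', 'dog']
print("beam:  ", beam_search())    # -> ['a', 'cat', 'purred']

In this toy table, greedy search settles for "the dog" because "the" is the likeliest first word, while beam search finds "a cat purred", which has a higher probability as a complete sequence.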

The amount that AI models hallucinate varies significantly. Vectara, which has created a leaderboard tracking these errors, found that some hallucinate as little as 0.8 per cent of the time, while others do so as much as 29.9 per cent of the time, when asked to summarise a document.
The rate of hallucinations initially went up with a new generation of AI models that are able to “reason”, or solve problems step by step. This is likely because they iterate internally for much longer, working through problems in different ways, which raises the probability of making mistakes, said Awadallah. The rate has since come down, however, as companies have learned how to build better safeguards for these models, he added.
But its research showed that when AI systems are “grounded” in other sources of information, such as online search, news articles or internal company documents — rather than just relying on their training data — the number of errors is reduced significantly.

Vectara’s leaderboard evaluates how frequently models hallucinate information not in source material when summarising a document

AI groups have been focusing on this “grounded” approach to work out the best methods to help reduce hallucinations as much as possible.
A common technique used by labs is retrieval-augmented generation (RAG), which retrieves information from outside sources that can then be used to fact-check claims made by AI systems.
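At its core, RAG amounts to retrieving relevant passages and placing them in the prompt so the model answers from them rather than from memory. The sketch below is purely illustrative: the document store, the keyword-overlap retriever and the call_model stub are hypothetical stand-ins, not the pipelines used by the companies named here, which rely on embedding models, vector databases and real LLM APIs.

# A minimal, illustrative sketch of retrieval-augmented generation (RAG).
DOCUMENTS = [
    "Air Canada was ordered by a tribunal to honour a discount its chatbot invented.",
    "Beam search keeps several candidate sequences alive during decoding.",
    "Retrieval-augmented generation grounds model answers in external sources.",
]

def retrieve(question, docs, top_k=2):
    """Rank documents by naive word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question, docs):
    """Stuff the retrieved passages into the prompt so the model can cite them."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (f"Answer using only the sources below and cite them by number.\n"
            f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:")

def call_model(prompt):
    # Placeholder: in practice this would call a hosted or local LLM.
    return f"(model response grounded in a prompt of {len(prompt)} characters)"

question = "What did the tribunal order Air Canada to do?"
print(call_model(build_prompt(question, retrieve(question, DOCUMENTS))))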
Chatbots from companies such as Cohere, Mistral, Google DeepMind, OpenAI and Anthropic offer citations, showing users the sources on which their generated text is based.
In January, French AI start-up Mistral struck a multimillion-euro deal with Agence France-Presse to incorporate thousands of the newswire’s articles into its chatbot to boost its fact checking.
Both Mistral and Canadian AI group Cohere also allow their models to be plugged into customers' internal data sources, so the models can refer to internal documents for fact-checking. That way, the model is grounded in the context and information that customers want it to process.
“If you want the model to not hallucinate and to be very precise . . . the best way is to plug it to the internal databases,” said Alexandre Sablayrolles, who leads work on agents and RAG at Mistral.
Byron Cook, vice-president and scientist at Amazon Web Services, believes that applying mathematical logic and reasoning can also help minimise hallucinations.
This led the group to introduce a new safeguard in December, called automated reasoning checks, which acts as an additional test to validate the accuracy of its models' responses.
Some companies, such as Google DeepMind, Cohere, Mistral and Vectara, also use smaller “evaluator” language models specifically trained to check the outputs of another language model for errors.
Most models also give developers the option to adjust the so-called temperature, which controls how randomly the model picks the next likely word.
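As a rough sketch of what that setting does, the snippet below applies a temperature-scaled softmax to hypothetical raw scores; the token names and numbers are invented for illustration. Real models work over vast vocabularies, but the effect is the same: low temperatures make the top-scoring token dominate, while high temperatures spread probability across alternatives.

import math, random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Turn raw model scores (logits) into a sampled token.

    Lower temperatures sharpen the distribution (more deterministic); higher
    temperatures flatten it (more random). Values near zero approximate
    greedy decoding.
    """
    scaled = [score / temperature for score in logits.values()]
    m = max(scaled)                                # subtract max for stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(list(logits.keys()), weights=probs, k=1)[0]

# Hypothetical scores for the next token after "The capital of France is".
logits = {"Paris": 9.0, "Lyon": 5.0, "purple": 2.0}

for t in (0.2, 1.0, 2.0):
    picks = [sample_with_temperature(logits, t) for _ in range(1000)]
    print(t, {tok: picks.count(tok) for tok in logits})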
But even the most rigorous technical fixes will not ensure complete truthfulness, said Nick Frosst, co-founder of Cohere.
“It’s not like we can train a model to only say true things because what is true is dependent on what is going on in the world,” said Frosst. “And that’s changing and it’s dependent on your own point of view in some situations.”
Allowing AI models to search the internet for information also makes them susceptible to an attack called prompt injection, said Vectara’s Awadallah.

This is where a third party can insert false information into places where large language models check their information, such as websites, Reddit or Wikipedia. This information can then make the AI model present false information as facts, or disregard its guard rails and “misbehave”.
When Google launched its new AI search tool last year, it told users to put glue on pizza and eat rocks after someone had jokingly posted that on Reddit.
Part of the challenge for AI developers is reaching a balance between verifying information for accuracy and enabling the model to be “creative”.
“Building creativity into models makes them more useful but then it can also lead to more creative, rather than factual, answers,” said Google DeepMind.
AWS’s Cook noted that hallucinations could also sometimes be “desired” by users.
“If you’re trying to write poetry with them, then that’s actually pretty good. You are looking for weird answers that find weird connections between things,” said Cook. “And that’s what the transformer models are really good at.”
Cohere’s Frosst warned that even the term hallucination was misleading. “It sounds too much like [the model] working the way a human brain works, and it does not,” he said. “It works in a completely different way.”

Copyright The Financial Times Limited 2025. All rights reserved.


