AI Accuracy Concerns: Top Chatbots Produce False Information 15% of the Time

According to a December 2025 report on AI chatbots’ role in everyday jobs, ChatGPT ranks as the least reliable tool for work tasks. The study by casino games aggregator Relum examined 10 major chatbots to reveal how they perform in real workplace situations.

  • ChatGPT scores worst for workplace reliability, making up incorrect information 35% of the time despite having the highest market share at 81%.
  • Google Gemini has the highest hallucination rate at 38%, giving false information more often than any chatbot tested.
  • Grok and DeepSeek are the most reliable options for work, recording almost no downtime.

The research evaluated each chatbot based on four main factors: how often they make up false information (hallucination rate), what customers think of them (product ratings), how well they maintain quality across different tasks (response consistency), and how often the service goes down (downtime rate). Each chatbot received a reliability risk score from 0 to 99, with higher scores showing bigger problems. The hallucination rate mattered most in the scoring, counting for 50% of the total.
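To make the weighting concrete, here is a minimal sketch of how a four-factor risk score of this kind could be combined. Only the 50% hallucination weight comes from the report as summarized above. The even split of the remaining 50%, the normalization of each factor, the `max_downtime_pct` cap, and the names `WEIGHTS` and `risk_score` are all assumptions made for illustration, so this is not Relum's formula and it will not reproduce the published scores.

```python
# Illustrative only: the article states the hallucination weight (50%).
# The other three weights are ASSUMED to split the remainder evenly.
WEIGHTS = {
    "hallucination": 0.50,  # stated in the article
    "rating": 1 / 6,        # assumption
    "consistency": 1 / 6,   # assumption
    "downtime": 1 / 6,      # assumption
}


def risk_score(hallucination_pct, rating, consistency, downtime_pct,
               max_downtime_pct=1.0):
    """Return a hypothetical 0-99 risk score; higher means riskier.

    Each factor is first mapped to a 0..1 "risk" value (assumed
    normalization): percentages are scaled down, and 0-5 ratings are
    inverted so that a low rating means high risk.
    """
    risks = {
        "hallucination": hallucination_pct / 100.0,
        "rating": (5.0 - rating) / 5.0,
        "consistency": (5.0 - consistency) / 5.0,
        # Downtime is capped at max_downtime_pct (an assumed ceiling).
        "downtime": min(downtime_pct / max_downtime_pct, 1.0),
    }
    weighted = sum(WEIGHTS[name] * risks[name] for name in WEIGHTS)
    return round(99 * weighted)


if __name__ == "__main__":
    # Inputs taken from the article's figures for two chatbots.
    print("ChatGPT:", risk_score(35, 4.7, 4.0, 0.81))
    print("Grok:", risk_score(8, 4.5, 3.5, 0.07))
```

Under these assumed weights a chatbot with a high hallucination rate always ranks as high risk, so any difference from the published ranking points to a different weighting or normalization in the actual study.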

Here’s a look at the 10 chatbots ranked by their reliability risk scores:

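| Chatbot | Hallucination Rate (%) | Product Rating According to Customers (0-5) | Quality and Response Consistency (0-5) | Downtime Rate (%) | Reliability Risk Score |
|---|---|---|---|---|---|
| ChatGPT | 35 | 4.7 | 4.0 | 0.81 | 99 |
| Claude | 17 | 4.4 | 3.5 | 0.77 | 75 |
| Meta AI | 15 | 3.4 | 3.4 | 0.10 | 70 |
| Botpress | 15 | 4.5 | 4.5 | 0.37 | 41 |
| Perplexity AI | 13 | 4.6 | 3.5 | 0.18 | 31 |
| Google Gemini | 38 | 4.4 | 4.0 | 0.05 | 13 |
| Microsoft Copilot | 27 | 4.4 | 4.0 | 0.10 | 11 |
| Grok | 8 | 4.5 | 3.5 | 0.07 | 6 |
| DeepSeek | 14 | 4.7 | 3.5 | 0.00 | 4 |
| Kimi | 13 | 4.5 | 4.3 | 0.06 | 3 |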

You can access the complete research findings here.

1. ChatGPT

  • Hallucination rate: 35%
  • Product rating: 4.7 out of 5
  • Quality and response consistency: 4 out of 5
  • Downtime rate: 0.81%

ChatGPT is the least reliable chatbot for work tasks. Despite having the highest customer rating at 4.7 and controlling 81% of the market, ChatGPT makes up false information in more than one-third of its responses. The chatbot also experiences more downtime than most competitors, going offline 0.81% of the time over the past two months.

2. Claude

Claude is the second least reliable chatbot. It hallucinates 17% of the time, less than half ChatGPT’s rate, but still gets things wrong often enough to cause problems at work. Claude holds only 0.99% of the market and scores 4.4 for customer satisfaction. The service goes down 0.77% of the time, nearly matching ChatGPT’s downtime.

3. Meta AI

While Meta AI’s hallucination rate sits at 15%, the chatbot struggles with overall performance quality. Customers give it just 3.4 out of 5 stars, the lowest rating among major chatbots tested. Response consistency is also low (3.4 score), meaning users get answers of varying quality depending on what they ask. Meta AI still holds a 20% market share and rarely goes down.

4. Botpress

Botpress is next on the list. The chatbot hallucinates 15% of the time and maintains a 4.5 rating for both customer satisfaction and response consistency. Botpress also charges $89 per month, making it the priciest option among business chatbots. The service experiences 0.37% downtime, which works out to roughly 2.7 hours of unavailability in a 30-day month.
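Downtime percentages are easier to judge once they are converted into hours. The short sketch below does that conversion for the three highest downtime rates in the study. It assumes a 30-day month of round-the-clock availability, and the function name `downtime_hours` is made up for this example.

```python
def downtime_hours(downtime_pct, days=30):
    """Convert a downtime percentage into hours over a period of `days` days.

    Assumes the service is expected to be available 24 hours a day.
    """
    return downtime_pct / 100.0 * days * 24


if __name__ == "__main__":
    # Downtime rates (%) as reported in the article.
    for name, pct in [("ChatGPT", 0.81), ("Claude", 0.77), ("Botpress", 0.37)]:
        print(f"{name}: {downtime_hours(pct):.1f} hours per 30-day month")
```

The same function can be called with `days=60` to approximate the two-month window the article mentions.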

5. Perplexity AI

Perplexity AI is the most reliable of the five chatbots profiled here. The chatbot has a 13% hallucination rate and earns a 4.6 customer rating. Response quality sits at 3.5 out of 5, showing room for improvement in consistency. Perplexity AI holds more than 10% of the market despite charging $40 monthly, double what ChatGPT costs. The service rarely experiences problems, showing only 0.18% downtime during the two-month testing period.

Razvan-Lucian Haiduc, Chief Product Officer at Relum, commented on the study.

“About 65% of US companies now use AI chatbots in their daily work, and nearly 45% of employees admit they’ve shared sensitive company information with these tools. These numbers show well how important chatbots have become in everyday work. Dependence on AI tools will likely increase even more, so companies should choose their chatbots based on how reliable and fit they are for their specific business needs. A chatbot that everyone uses isn’t necessarily the one that works best for your industry or gives accurate answers for your tasks.”

VoM News Desk

VoM News is an online news portal in Jammu and Kashmir that offers regional, national, and global news.
