Is GPT-5 really a more advanced LLM for everyday users?

GPT-5: The Gap Between Hype and Reality

GPT-5 has emerged as a significant improvement over GPT-4o across all major benchmarks. In mathematics (AIME 2025), it scored 94.6%, surpassing GPT-4o’s 88% by 6.6 percentage points. In coding performance (SWE-bench Verified), it achieved 74.9%, a 6.9 percentage point improvement over GPT-4o’s 68%.

Most notably, hallucination issues have been dramatically reduced—45% fewer when using web search functionality compared to GPT-4o, and an impressive 80% reduction compared to o3 in reasoning mode. Factual error rates also improved significantly from GPT-4o’s 20.6% to GPT-5’s 4.8%, representing a major leap in reliability.

User Experience Improvements

Convenience Through Model Integration

GPT-5’s biggest change is that users no longer need to choose between models. Previously, users had to select between general models and reasoning models (o-series), but GPT-5 introduces a router system that automatically selects the optimal model based on question complexity.
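The actual router is proprietary and its internals have not been published, but the general idea can be sketched as a dispatcher that estimates question complexity and picks a model tier accordingly. Everything below (the scoring heuristic, the keyword list, and the model-tier names) is a hypothetical illustration, not OpenAI's implementation.

```python
# Hypothetical sketch of a complexity-based model router.
# Scoring heuristic and tier names are illustrative assumptions only.

def estimate_complexity(prompt: str) -> int:
    """Crude proxy: longer prompts and reasoning keywords score higher."""
    score = len(prompt.split()) // 50
    for keyword in ("prove", "step by step", "debug", "optimize", "why"):
        if keyword in prompt.lower():
            score += 2
    return score

def route(prompt: str) -> str:
    """Pick a model tier based on the complexity estimate."""
    score = estimate_complexity(prompt)
    if score >= 4:
        return "reasoning-model"   # slow, thorough (o-series style)
    if score >= 2:
        return "standard-model"    # balanced default
    return "fast-model"            # cheap, low-latency

print(route("What is the capital of France?"))  # fast-model
print(route("Prove step by step why this sort is optimal, then debug it."))  # reasoning-model
```

A production router would of course use a learned classifier rather than keyword matching, but the user-facing benefit is the same: one entry point, with the cost/latency trade-off handled automatically.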

Enhanced Speed and Accuracy

GPT-5 features a doubled context window of 256K tokens while maintaining faster response times. It provides more accurate responses without losing context, even during long document processing or complex conversations.

The Reality Behind the Marketing Glitz

User reactions have been predominantly disappointing, with a stark gap emerging between OpenAI’s flashy marketing and actual user experiences.
OpenAI heavily promoted GPT-5 as delivering “expert-level intelligence in everyone’s hands,” with major media outlets echoing praise like “having a PhD-level expert by your side at all times” and celebrating its “top-tier 74.9% benchmark score in coding.” CEO Sam Altman even claimed on a pre-release podcast that GPT-5 instantly solved problems he couldn’t, making him feel “powerless compared to AI.”
However, the reality users experienced was quite different. On Reddit’s ChatGPT community, a post titled “GPT-5 is horrible” received the highest upvotes, and Bloomberg reported “mixed reviews” from day one. Even in communities most passionate about AGI and technological singularity discussions, there was significant disappointment that GPT-5 represented incremental improvement rather than revolutionary advancement.
Many users found it difficult to perceive breakthrough differences compared to previous models, with some even pointing to regressions in certain areas. Technical issues like repetitive follow-up questions and malfunctioning reasoning triggers were continuously reported, leading users to conclude that the model failed to live up to its “next-generation” billing. Criticisms included “short and insufficient responses,” “overly AI-like tone,” and “lack of personality,” with some users cynically calling it “shrinkflation for cost reduction.”
Positive evaluations were limited to specific areas like coding, particularly CSS writing, and some improvements in deep research functionality. While some users acknowledged reduced misinformation generation and more honest admissions of uncertainty, these partial improvements were deemed insufficient to meet the expectations raised by the “GPT-5” name.
Most concerning of all, many users suspected that the low pricing reflected a lack of confidence in the model's performance, raising fundamental questions about the company's technological leadership and suggesting competitors may have already surpassed OpenAI. Sam Altman eventually had to acknowledge in a Reddit AMA that GPT-5 did not function properly on launch day due to router bugs, and he promised to restore access to GPT-4o in response to user demand.

Why the “Shrinkflation” Label?

The suspicion is that GPT-5 is an operating-cost play: under the guise of a "unified model," the router steers users toward the relatively cheap GPT-5 base model and away from the expensive high-performance reasoning models.

GPT-5: Evolution, Not Revolution

GPT-5's launch offers important insights for the AI industry. While objective performance metrics show real improvements, it was clearly not the breakthrough "next-generation AI" leap users expected.
This release reveals several key points. First, AI technology development has likely entered a phase of gradual improvement rather than explosive growth. Second, the gap between flashy marketing and actual user experience can damage credibility.
Nevertheless, GPT-5’s reduced hallucinations, improved factual accuracy, and automatic router system implementation represent meaningful progress. Particularly, enabling general users to access optimal AI performance without complex configurations is significant advancement for AI democratization.
Ultimately, GPT-5 is better evaluated as an evolution toward “a more stable and practical AI tool” rather than “a giant leap toward AGI.” OpenAI’s ability to quickly incorporate user feedback and focus on substantial improvements rather than over-promising will be crucial for maintaining leadership in the AI market.

What is AGI (Artificial General Intelligence)?

AGI is a theoretical goal of AI research: software with human-like general intelligence, able to perform tasks it was never specifically trained or developed for. Current AI systems like ChatGPT and Claude are "narrow AI": excellent at specific tasks but limited outside their trained domains. An AGI system, by contrast, would solve problems across many domains as humans do, without manual intervention, learning independently and tackling problems it was never trained on.