Advancements in AI: A Closer Look at GPT-4.5

Unlike reasoning models such as o1 and o3, which work through answers step by step, most large language models (LLMs), including GPT-4.5, return the first response they generate. OpenAI positions GPT-4.5 as a more general-purpose model, and it has been tested on a range of benchmarks, outperforming earlier OpenAI models on several of them.

Performance Metrics of GPT-4.5

GPT-4.5 was evaluated on SimpleQA, a general-knowledge quiz designed by OpenAI that covers topics ranging from science and technology to entertainment and gaming. GPT-4.5 scored 62.5%, well ahead of GPT-4o at 38.6% and o3-mini at 15%. The result points to broad factual knowledge across diverse subject matter.

GPT-4.5 also generates false information, commonly referred to as “hallucinations,” less often. On SimpleQA, the model produced incorrect answers 37.1% of the time, compared with 59.8% for its predecessor GPT-4o and 80.3% for o3-mini. The lower rate suggests that GPT-4.5 is more reliable at providing accurate information.
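
The SimpleQA figures quoted above can be set side by side in a short Python sketch. The percentages come from this article; the table layout and the function names are illustrative assumptions, not part of any OpenAI tooling. The sketch computes the relative drop in incorrect answers and the share left over once correct and incorrect answers are added together. The article does not say what that remainder represents, so it is reported here only as arithmetic.

```python
# SimpleQA figures quoted in the article, in percent.
# The dictionary layout is an illustrative assumption.
SIMPLEQA = {
    "GPT-4.5": {"correct": 62.5, "incorrect": 37.1},
    "GPT-4o": {"correct": 38.6, "incorrect": 59.8},
    "o3-mini": {"correct": 15.0, "incorrect": 80.3},
}


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100


def remainder(correct: float, incorrect: float) -> float:
    """Share of questions counted as neither correct nor incorrect."""
    return round(100 - correct - incorrect, 1)


if __name__ == "__main__":
    reference = SIMPLEQA["GPT-4.5"]["incorrect"]
    for model, scores in SIMPLEQA.items():
        rest = remainder(scores["correct"], scores["incorrect"])
        print(f"{model}: correct {scores['correct']}%, "
              f"incorrect {scores['incorrect']}%, remainder {rest}%")
        if model != "GPT-4.5":
            drop = relative_reduction(scores["incorrect"], reference)
            print(f"  GPT-4.5 gives {drop:.1f}% fewer incorrect answers")
```

Reading the two percentages together matters because they do not sum to exactly 100% for any of the three models, so the incorrect-answer rate cannot be derived from the score alone.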

Beyond SimpleQA: The Evaluation Landscape

However, SimpleQA is only one of several benchmarks used to evaluate large language models. Other assessments, such as MMLU, are more widely recognized in the AI community. On these tests, GPT-4.5 outperformed previous OpenAI models, but by a smaller margin. On standard science and mathematics benchmarks, GPT-4.5 scored lower than o3-mini, which raises questions about its versatility across domains.

The Conversational Advantage of GPT-4.5

One of the standout features of GPT-4.5 is its improved conversational ability. Human testers preferred GPT-4.5 over GPT-4o in a variety of scenarios, including everyday questions and creative tasks such as writing poetry. For instance, if someone tells GPT-4.5 that they are having a hard time, the model may respond with empathy: “Would you like to discuss what happened, or do you need a distraction? I’m here for either choice.”

In comparison, GPT-4o often misses emotional cues and tends to offer unsolicited advice. For example, it might produce a list of suggestions for easing the user’s distress without first establishing rapport or asking what the user needs. This improvement in emotional intelligence makes GPT-4.5 a more appealing choice for users who want support as well as problem solving.

The Verdict: Is GPT-4.5 a True Revolution?

Despite these advances, OpenAI faces a discerning audience that expects more than incremental gains. Waseem Alshikh, co-founder and CTO of a startup that develops LLMs for businesses, expressed skepticism about the changes in GPT-4.5. He commented, “The emphasis on emotional intelligence and creativity is intriguing for specific applications, such as writing assistance and idea generation; however, ultimately, GPT-4.5 appears to be just improved aesthetics on a familiar framework.”

Alshikh also criticized the trend of increasing computational power and data consumption without fundamentally changing the user experience. He argued that the marginal improvements offered by GPT-4.5 may not justify its substantial energy demands. Instead, he advocated building more efficient models or addressing niche problems, suggesting that this direction could deliver tangible benefits to users rather than simply stretching existing capabilities.