The GPT-4.5 Preview model, codenamed Orion, was released as a research preview on February 27, 2025. At release it was OpenAI’s largest general-purpose large language model (LLM). Rather than adding step-by-step reasoning, it scales up unsupervised learning, which gives it a broader knowledge base, stronger conversational abilities, and improved emotional intelligence (EQ). GPT-4.5 is designed for natural, human-like interaction and is positioned primarily as a conversational model.
Capabilities and Strengths: GPT-4.5 excels at tasks that require creativity, nuanced understanding, and natural conversation. It outperforms its predecessor, GPT-4o, in writing assistance, programming support, and practical problem-solving because it recognizes patterns, draws connections, and follows user intent more precisely. In early testing it scored 62.5% on OpenAI’s SimpleQA benchmark, compared with 38.6% for GPT-4o, and showed a lower hallucination rate (37.1% vs. 59.8% for GPT-4o), which makes it more reliable for factual queries. Its conversational tone is warmer and more intuitive, and human testers preferred it over GPT-4o for everyday and creative tasks such as crafting compelling headlines or giving empathetic advice. In ChatGPT, the model supports file and image uploads, real-time search, and the canvas tool, which makes it useful for applications such as tutoring, marketing copy, and light coding.
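To put the SimpleQA figures quoted above in perspective, the short Python sketch below computes the gap between the two models in percentage points and as a relative change. The numbers are copied from the text as given. The function name `compare` and the dictionary layout are illustrative choices, not part of any OpenAI tooling.

```python
# Figures as quoted in the text above (percentages).
simpleqa_accuracy = {"GPT-4.5": 62.5, "GPT-4o": 38.6}
hallucination_rate = {"GPT-4.5": 37.1, "GPT-4o": 59.8}


def compare(scores, new="GPT-4.5", old="GPT-4o"):
    """Return (percentage-point difference, relative change in %) of new vs. old."""
    diff = scores[new] - scores[old]
    relative = diff / scores[old] * 100
    return round(diff, 1), round(relative, 1)


accuracy_gap = compare(simpleqa_accuracy)
hallucination_gap = compare(hallucination_rate)

print("SimpleQA accuracy (points, relative %):", accuracy_gap)
print("Hallucination rate (points, relative %):", hallucination_gap)
```

On these figures, accuracy is about 24 points higher and the hallucination rate about 23 points lower. The relative change is reported separately because a percentage-point gap and a percentage change are easy to confuse when both values are already percentages.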
Limitations and Weaknesses: Despite its advancements, GPT-4.5 is not a frontier model, and it falls short of OpenAI’s reasoning models, such as o1 and o3-mini, on complex reasoning tasks. It scores lower than o3-mini on math (36.7% on AIME) and science benchmarks, which limits its effectiveness for logic-heavy or technical workflows. The model is compute-intensive and therefore expensive to run, raising questions about its long-term API availability. At launch it also does not support ChatGPT features such as Voice Mode, video, and screen sharing, which restricts its versatility. Its performance on coding benchmarks such as SWE-Bench (38%) is decent but trails competitors such as Anthropic’s Claude 3.7 Sonnet. GPU shortages have delayed broader access, highlighting the difficulty of serving a model of this size at scale.
Conclusion: GPT-4.5 is a conversational powerhouse, well suited to creative and human-facing tasks, but its high cost and weaker reasoning make it a poorer fit for specialized technical applications. Its future role, including whether it remains available through the API, depends on OpenAI’s ongoing evaluation.