GPT-5.4 Unleashed: AI Masters Computer Use, Outperforms Humans

by John Digweed · 2 months ago · 6 mins read

GPT-5.4 Arrives with Native Computer Control, Shocks Industry

OpenAI has released GPT-5.4, a new flagship model that appears to be rewriting the rules of AI capability. Early reports and benchmarks indicate a significant leap forward, particularly in the model's ability to understand and interact with computer systems. That capability is described as natively built in, a first for a general-purpose model.

The implications of this development are profound, with industry observers noting that AI progress is accelerating at an unprecedented pace. As one commentator put it, “we see no wall,” suggesting that the limits of AI’s current trajectory are not yet apparent, especially in economically valuable tasks. This rapid advancement comes alongside other significant industry shifts, including a controversial designation for Anthropic’s Claude AI and new research on AI’s impact on the labor market.

GPT-5.4 Tackles Complex Tasks, Outperforming Human Experts

One of the most striking revelations about GPT-5.4 is its performance on the GDPval benchmark. This evaluation compares AI models against human experts across various industries, using rubrics developed by professionals with an average of 12 to 14 years of experience. These experts, often from top-tier companies like Deloitte, Wells Fargo, and Google, create grading systems for completed projects representative of real-world tasks.

GPT-5.4 Pro has demonstrated an 82% win-or-tie rate against human experts on this benchmark. More remarkably, its outright win rate, where its work is judged superior to the human's, stands at around 70%, which implies ties in roughly 12% of comparisons. This level of performance suggests that GPT-5.4 is not merely assisting with complex tasks but can exceed the output of experienced human professionals in specific domains.

Revolutionary Computer Interaction: AI Navigates Desktops

Perhaps the most groundbreaking feature of GPT-5.4 is its native computer use and vision capabilities. This allows the AI to act as an agent, capable of completing tasks across websites and software systems. Developers can leverage this to build agents that can execute actions, write code for browser automation using libraries like Playwright, and even issue mouse and keyboard commands in response to visual input from screenshots.
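The screenshot-in, action-out loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's API: FakeDesktop, Action, scripted_policy, and run_agent are hypothetical names, the "screenshot" is a text description rather than pixels, and a scripted function stands in for the model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One input event the agent wants to perform (hypothetical schema)."""
    kind: str          # "type", "click", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen: one text box and a Submit button at (100, 200)."""
    typed: str = ""
    submitted: bool = False
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; a text description is used here.
        return f"textbox={self.typed!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        # Record the action, then update the simulated screen state.
        self.log.append(action.kind)
        if action.kind == "type":
            self.typed += action.text
        elif action.kind == "click" and (action.x, action.y) == (100, 200):
            self.submitted = True


def scripted_policy(observation: str) -> Action:
    """Placeholder for the model: maps an observation to the next action."""
    if "textbox=''" in observation:
        return Action("type", text="hello")
    if "submitted=False" in observation:
        return Action("click", x=100, y=200)
    return Action("done")


def run_agent(env: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act; return the number of actions performed."""
    for step in range(max_steps):
        action = policy(env.screenshot())
        if action.kind == "done":
            return step
        env.apply(action)
    return max_steps


if __name__ == "__main__":
    desktop = FakeDesktop()
    steps = run_agent(desktop, scripted_policy)
    print(steps, desktop.log, desktop.submitted)
```

In a real agent, the policy would send the screenshot to the model and parse its reply, and apply would forward the action to a browser or operating system. The max_steps cap is a design choice that keeps a confused agent from looping forever.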

The OSWorld benchmark, which tests a model’s ability to navigate desktop environments via screenshots and simulated input, highlights GPT-5.4’s prowess. It achieved a state-of-the-art 75% success rate, a significant jump from GPT-5.2’s 47%.

GPT-5.4 also surpassed the human baseline on this benchmark, which currently sits at 72.4%. This capability opens up new avenues for AI-driven troubleshooting, game testing, and automated software development, areas that have historically been challenging for AI.
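To put the reported scores side by side, the short Python sketch below computes the gaps between them. The three percentages are the ones quoted in this article; the helper names point_gap and relative_gain are illustrative only.

```python
def point_gap(a: float, b: float) -> float:
    """Difference between two scores, in percentage points."""
    return a - b


def relative_gain(new: float, old: float) -> float:
    """Improvement of new over old, as a percentage of old."""
    return (new - old) / old * 100


# Success rates as reported in this article.
GPT_5_4 = 75.0
GPT_5_2 = 47.0
HUMAN = 72.4

if __name__ == "__main__":
    print(f"vs GPT-5.2: +{point_gap(GPT_5_4, GPT_5_2):.1f} points")
    print(f"relative gain over GPT-5.2: {relative_gain(GPT_5_4, GPT_5_2):.1f}%")
    print(f"vs human baseline: +{point_gap(GPT_5_4, HUMAN):.1f} points")
```

On these figures, the jump over GPT-5.2 is large (28 points), while the margin over the human baseline is under 3 points.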

Early demonstrations showcase this power. One developer, Cory Ching, reportedly used GPT-5.4 and other AI tools to build a tactical turn-based RPG, employing Playwright for testing and image generation for visuals. This points to a new era in which AI can not only generate code but also visually inspect and interact with the output, iterating and improving without constant human intervention. That is a significant departure from previous models, which often struggled with even basic visual feedback loops.

Anthropic Faces Scrutiny, Labeled Supply Chain Risk

In parallel to OpenAI’s advancements, AI company Anthropic has faced a significant setback. The company has been officially designated as a supply chain risk, a move that has raised concerns within the AI community. Anthropic has stated its intention to challenge this designation in court.

The company clarified that the designation applies mainly to customers' use of its Claude AI as a direct part of contracts with the Department of War, rather than to all use of Claude by customers that hold such contracts. While this limits the scope of the impact, Anthropic is now engaged in renewed negotiations with the government. This development highlights the complex regulatory and geopolitical considerations surrounding advanced AI technologies.

AI’s Evolving Impact on the Labor Market

Anthropic also released new research titled “Labor Market Impacts of AI: New Measures and Early Evidence.” The report suggests that while widespread, immediate impacts on the labor market are not yet evident, there is a noticeable slowdown in hiring for individuals in the early stages of their careers. Those just graduating college and entering the workforce appear to be disproportionately affected, with job growth slowing in these entry-level positions.

This aligns with previous research, including a notable Stanford paper that utilized Anthropic’s data. The current level of workplace automation is considered a small fraction of what is technically possible, indicating that the full impact of AI on employment is likely yet to be seen.

OpenAI Enhances Offerings with Financial Tools and Priority Mode

OpenAI appears to be adopting strategies pioneered by Anthropic, introducing features like “skills” and tools that facilitate migration from Claude to OpenAI platforms. OpenAI has launched a suite of financial service tools, mirroring Anthropic’s move into specialized industry applications.

OpenAI has identified the financial sector as a key area for AI integration, with a spokesperson stating that finance, after software engineering, is expected to benefit most acutely from AI advancements. This includes tools for financial modeling, scenario analysis, data extraction, and long-form research, tasks that can consume hours or days for human analysts. GPT-5.4 reportedly scores 87% on an internal investment banking benchmark, significantly outperforming other models such as GPT-5.2 Pro (71%) and Opus 4.6 (64%).

Additional features in the GPT-5.4 release include a “priority mode” for faster responses, potentially leveraging advanced hardware like Cerebras chips, and the ability to interrupt model generation mid-stream for redirection. This level of user control and responsiveness signals a more integrated and interactive AI experience.

Talent Movement: Key OpenAI Researcher Joins Anthropic

In a significant talent shift, Max Schwarzer, a prominent researcher from OpenAI who worked on GPT-5 and reasoning paradigms, has joined Anthropic. Schwarzer cited a desire to work with trusted colleagues who have moved to Anthropic over the past couple of years. This move highlights the dynamic and competitive nature of AI talent acquisition.

Looking Ahead: A New Era of AI Interaction

The release of GPT-5.4 marks a key moment, signifying a potential paradigm shift in how humans interact with and utilize AI. Its native computer interaction capabilities, coupled with enhanced performance on complex benchmarks, suggest that AI is moving beyond information processing to active participation in digital environments. While challenges remain, including regulatory hurdles and the ongoing impact on the workforce, the pace of innovation suggests that the “wall” of AI progress is still a distant horizon.

Source: GPT 5.4 "we see no wall" (YouTube)
