OpenAI just dropped GPT-5.4, and it’s not a minor update. This is the first general-purpose AI model with native computer use capabilities that actually beats human performance on desktop tasks. It also packs a 1 million token context window, a new tool search system that cuts token costs by 47% on tool-heavy agent workflows, and benchmark scores that put serious pressure on Claude and Gemini.
We dug into the features, tested the claims, and compared it head-to-head with the competition. Here’s everything you need to know about GPT-5.4 in 2026.
What Is GPT-5.4?
GPT-5.4 is OpenAI’s latest frontier AI model, released on March 5, 2026. It’s available across three platforms simultaneously: ChatGPT (as GPT-5.4 Thinking), the OpenAI API, and Codex.
In plain terms, GPT-5.4 combines the best of OpenAI’s recent advances in reasoning, coding, and agentic workflows into one model. It takes the coding chops from GPT-5.3-Codex, improves how the model handles professional tasks like spreadsheets and presentations, and adds a major new capability: it can control your computer.
OpenAI also released a premium tier called GPT-5.4 Pro for users who need maximum performance on the hardest problems.
Key specs at a glance:
- Context window: up to 1 million tokens (standard 272K, extended 1M)
- Max output: 128K tokens
- Native computer use (API and Codex)
- Tool search for efficient tool selection
- Available to Plus, Team, and Pro ChatGPT users
Key Features That Matter
1. Native Computer Use
This is the big one. GPT-5.4 is the first general-purpose model OpenAI has shipped with built-in computer use. It can look at screenshots, figure out what’s on screen, and return structured actions — clicks, typing, scrolling — for an agent harness to execute.
This isn’t theoretical. On the OSWorld benchmark, which tests whether AI can actually navigate operating systems and complete real desktop tasks, GPT-5.4 scored 75.0%. Human experts score 72.4%. That makes it the first AI model to beat human performance on this benchmark.
2. 1 Million Token Context Window
GPT-5.4 supports up to 1.05 million tokens of context. That’s roughly 750,000 words — enough to feed it entire codebases, legal document sets, or multi-year financial reports in a single call.
There’s a catch: the 1M context is an experimental feature you enable explicitly through API parameters. Without configuration, you get the standard 272K window. And once your prompt crosses 272K tokens, the input token rate doubles from $2.50 to $5.00 per million.
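To make the threshold concrete, here is a minimal sketch of what a long-context request costs, using the rates quoted above. One assumption is ours: we apply the doubled rate to the entire prompt once it crosses 272K tokens, since the article doesn’t say whether only the overflow is billed at the higher rate.

```python
# Sketch: estimate GPT-5.4 input cost under the long-context surcharge.
# Rates come from this article; applying the doubled rate to the whole
# prompt (rather than just the overflow) is our assumption.

STANDARD_RATE = 2.50   # $ per 1M input tokens, prompt <= 272K tokens
EXTENDED_RATE = 5.00   # $ per 1M input tokens, prompt > 272K tokens
THRESHOLD = 272_000

def input_cost(prompt_tokens: int) -> float:
    """Dollar cost for a single prompt's input tokens."""
    rate = EXTENDED_RATE if prompt_tokens > THRESHOLD else STANDARD_RATE
    return prompt_tokens / 1_000_000 * rate

print(f"${input_cost(100_000):.2f}")   # standard pricing
print(f"${input_cost(800_000):.2f}")   # extended-context pricing
```

A 100K-token prompt costs $0.25; push the same call to 800K tokens and you pay $4.00, so it’s worth checking whether a task really needs the full window.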
3. Tool Search
When you’re building agents that work with dozens of tools, GPT-5.4’s tool search feature is a game-changer. Instead of stuffing every tool definition into the prompt (burning tokens), the model intelligently searches and selects the right tools on the fly.
The result: 47% reduction in token costs for tool-heavy workflows with zero loss in accuracy.
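OpenAI hasn’t published the internals of tool search, but the token-saving idea is easy to illustrate: rank the tool catalog against the task and send only the top few definitions instead of all of them. The sketch below is our own toy illustration of that idea, not OpenAI’s mechanism; the tool registry and keyword scoring are hypothetical.

```python
# Conceptual illustration of why tool search saves tokens: select only
# the most relevant tool definitions for a task, rather than sending
# the whole catalog with every request. The registry and the naive
# keyword scoring here are hypothetical, not OpenAI's implementation.

TOOLS = {
    "create_invoice": "Create and send a customer invoice",
    "query_database": "Run a read-only SQL query against the warehouse",
    "send_email":     "Send an email to one or more recipients",
    "resize_image":   "Resize or crop an image file",
}

def select_tools(task: str, k: int = 2) -> list[str]:
    """Rank tools by keyword overlap with the task, keep the top k."""
    task_words = set(task.lower().split())
    scored = [
        (len(task_words & set(desc.lower().split())), name)
        for name, desc in TOOLS.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]

print(select_tools("send the monthly invoice email to the customer"))
```

With dozens of tools, each carrying a multi-hundred-token JSON schema, trimming the list to two or three relevant definitions per call is where the claimed savings come from.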
4. GPT-5.4 Thinking (ChatGPT)
In ChatGPT, the model shows you its plan before executing. You can see what it intends to do and redirect it mid-response. This is particularly useful for complex multi-step tasks where you want to course-correct early rather than wait for a wrong answer.
5. Improved Professional Work
OpenAI put serious effort into document handling:
- Spreadsheet modeling tasks: 87.3% accuracy (up from 68.4% with GPT-5.2)
- Presentations: Human raters preferred GPT-5.4 output 68% of the time over GPT-5.2
- Legal analysis on BigLaw Bench: 91% score
Benchmark Breakdown
Here’s how GPT-5.4 performs across the benchmarks that matter:
| Benchmark | GPT-5.4 | GPT-5.3-Codex | GPT-5.2 | What It Tests |
|---|---|---|---|---|
| GDPval (wins or ties) | 83.0% | 70.9% | 70.9% | Professional knowledge work across 44 occupations |
| SWE-Bench Pro (Public) | 57.7% | 56.8% | 55.6% | Real-world software engineering tasks |
| OSWorld-Verified | 75.0% | 74.0% | 47.3% | Desktop computer use tasks |
| Toolathlon | 54.6% | 51.9% | 46.3% | Multi-tool agent performance |
| BrowseComp | 82.7% | 77.3% | 65.8% | Web browsing comprehension |
| ARC-AGI-2 | 73.3% | — | 52.9% | Abstract reasoning |
The standout numbers: GDPval jumped from 70.9% to 83%, a 17% relative improvement in matching human professionals. OSWorld went from 47.3% to 75%, a 27.7-point jump (roughly 59% relative) on computer use tasks.
OpenAI also reports GPT-5.4 is 33% less likely to make factual errors in individual claims and 18% less likely to produce responses with any errors at all, compared to GPT-5.2.
GPT-5.4 Computer Use: The Headline Feature
Computer use is what sets GPT-5.4 apart from every other frontier model right now. Here’s how it actually works:
The Process:
- Your application sends a task description plus a screenshot of the current screen to GPT-5.4
- The model analyzes the screenshot — identifying buttons, text fields, menus, and other UI elements
- It returns structured actions: click at coordinates (x, y), type “this text”, scroll down, press a key
- Your agent harness executes those actions, captures a new screenshot, and sends it back
- The cycle repeats until the task is complete
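The loop above can be sketched in a few lines. In production, step 1 is a real API call carrying a screenshot; here a toy “model” and an in-memory “screen” stand in so the control flow is runnable end to end. Every name in this sketch is ours, not part of OpenAI’s API.

```python
# Minimal mock of the computer-use loop described above. A toy "model"
# and "screen" replace the real API call and screenshot so the
# observe -> decide -> act cycle is runnable. All names are hypothetical.

def mock_model(task: str, screen: dict) -> dict:
    """Stand-in for GPT-5.4: inspect the screen, return one action."""
    if not screen["search_box_focused"]:
        return {"type": "click", "target": "search_box_focused"}
    if screen["typed"] != task:
        return {"type": "type", "text": task}
    return {"type": "done"}

def execute(action: dict, screen: dict) -> None:
    """Stand-in for the agent harness executing a structured action."""
    if action["type"] == "click":
        screen[action["target"]] = True
    elif action["type"] == "type":
        screen["typed"] = action["text"]

def run(task: str) -> dict:
    screen = {"search_box_focused": False, "typed": ""}
    for _ in range(10):                       # safety cap on iterations
        action = mock_model(task, screen)     # steps 1-3: observe, decide
        if action["type"] == "done":
            break
        execute(action, screen)               # step 4: harness acts
    return screen                             # step 5: repeat until done

print(run("quarterly sales report"))
```

The safety cap matters in real harnesses too: a model that misreads a screenshot can loop on the same action, so production agents bound iterations and cost per task.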
What it’s good for:
- Navigating websites and filling out forms
- Testing web applications
- Automating repetitive desktop workflows
- Creating spreadsheets from web data
- Cross-application tasks (copy from browser, paste into Excel, etc.)
What it’s not (yet):
- Not available in regular ChatGPT — API and Codex only for now
- Best suited for UI-based workflows, not raw computation
- Requires a developer harness to execute the actions
GPT-5.4 Pricing: What It Costs
| Tier | Input (per 1M tokens) | Output (per 1M tokens) | Cached Input |
|---|---|---|---|
| GPT-5.4 | $2.50 | $15.00 | $0.25 |
| GPT-5.4 Pro | $30.00 | $180.00 | — |
| GPT-5 mini | $0.25 | $2.00 | $0.025 |
Important pricing notes:
- Context above 272K tokens: input rate doubles to $5.00/1M
- Batch API: Save 50% on inputs and outputs (async, 24-hour turnaround)
- Data residency/regional processing: additional 10% surcharge
- Tool search saves ~47% on token costs for tool-heavy workflows
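Here is a small calculator that puts those notes together for a single request, using the GPT-5.4 rates from the table above. How the discounts combine (cached tokens billed at the cached rate, then the flat 50% batch discount on the total) is our reading of the notes, not an official formula.

```python
# Sketch: per-request cost from the pricing table above. The way the
# cached-input rate and the 50% Batch API discount combine here is our
# assumption, not a published OpenAI formula.

INPUT_RATE, OUTPUT_RATE, CACHED_RATE = 2.50, 15.00, 0.25  # $ per 1M tokens

def request_cost(input_tok, output_tok, cached_tok=0, batch=False):
    fresh = input_tok - cached_tok
    cost = (fresh * INPUT_RATE + cached_tok * CACHED_RATE
            + output_tok * OUTPUT_RATE) / 1_000_000
    return cost * 0.5 if batch else cost   # Batch API: 50% off

# 50K input / 5K output tokens, realtime vs. batch:
print(f"realtime: ${request_cost(50_000, 5_000):.4f}")
print(f"batch:    ${request_cost(50_000, 5_000, batch=True):.4f}")
```

At these rates a 50K-in / 5K-out call costs $0.20 realtime and $0.10 via batch, which is why batch is attractive for anything that can tolerate the 24-hour turnaround.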
For ChatGPT users: GPT-5.4 Thinking is included with Plus ($20/month), Team ($30/month), and Pro ($200/month) subscriptions. Pro users also get access to GPT-5.4 Pro for the hardest tasks.
GPT-5.4 vs Claude Opus 4.6 vs Gemini 3.1 Pro
March 2026 is the most competitive frontier AI landscape we’ve ever seen. Here’s how the three flagship models stack up:
| Category | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| Knowledge Work (GDPval) | 83.0% | — | — |
| Coding (SWE-bench) | 57.7% | 80.8% | 80.6% |
| Computer Use (OSWorld) | 75.0% | — | — |
| PhD-Level Science (GPQA Diamond) | — | — | 94.3% |
| Context Window | 1M tokens | 200K tokens | 1M tokens |
| Multimodal | Text + Image | Text + Image | Text + Image + Audio + Video |
| API Input Cost (per 1M) | $2.50 | $15.00 | $1.25 |
| API Output Cost (per 1M) | $15.00 | $75.00 | $10.00 |
The takeaway:
- GPT-5.4 wins on knowledge work, computer use, and professional document tasks. If you need an AI that can operate software and produce business deliverables, this is currently the best choice.
- Claude Opus 4.6 wins on coding quality and complex agentic tasks. For software engineering workflows, Claude still holds the edge.
- Gemini 3.1 Pro wins on value, multimodal capabilities, and PhD-level science reasoning. It’s the cheapest of the three and the only one with native audio/video understanding.
There’s no single “best model” anymore. The right choice depends on your use case.
Who Should Use GPT-5.4?
Use GPT-5.4 if you:
- Build agents that need to interact with desktop or web UIs
- Work with long documents, codebases, or data sets (1M context)
- Need high-quality spreadsheets, presentations, or legal analysis
- Want the most token-efficient reasoning model from OpenAI
- Are already in the OpenAI/ChatGPT ecosystem
Consider alternatives if you:
- Primarily need coding assistance (Claude Opus 4.6 is stronger)
- Need audio/video understanding (Gemini 3.1 Pro is your only option)
- Are cost-sensitive for high-volume API usage (Gemini is cheaper)
- Need the deepest scientific reasoning (Gemini leads on GPQA)
FAQ
How do I access GPT-5.4?
GPT-5.4 Thinking is available now in ChatGPT for Plus, Team, and Pro subscribers. API access is available through the OpenAI developer platform. Computer use is currently API and Codex only.
Is GPT-5.4 better than Claude?
It depends on the task. GPT-5.4 leads on knowledge work (83% GDPval) and computer use (75% OSWorld). Claude Opus 4.6 leads on coding (80.8% SWE-bench). For most professional workflows, GPT-5.4 has the edge. For software engineering, Claude is still ahead.
What is GPT-5.4 computer use?
Computer use lets GPT-5.4 look at screenshots of your screen and return structured actions (clicks, typing, scrolling) that an agent can execute. It works via the API — you send a screenshot, the model tells you what to click, your code executes it, and the cycle repeats.
How much does GPT-5.4 cost?
API pricing is $2.50 per million input tokens and $15.00 per million output tokens. ChatGPT Plus ($20/month) includes GPT-5.4 Thinking. The premium GPT-5.4 Pro tier costs $30.00 per million input tokens and $180.00 per million output tokens via the API.
Is GPT-5.4 worth upgrading to?
If you’re currently on GPT-5.2, yes. The 33% reduction in errors alone justifies the switch, and you get computer use, tool search, and a massive context window upgrade. If you’re on GPT-5.3-Codex, the upgrade is more incremental unless you need computer use or the 1M context.
The Bottom Line
GPT-5.4 is a significant release. The computer use capability isn’t a gimmick — it’s the first time any frontier model has beaten human performance on real desktop tasks. Combined with the 1M context window and tool search, it makes GPT-5.4 the best model available for building AI agents that need to interact with the real world.
That said, it’s not the best at everything. Claude still codes better, and Gemini offers more for less money. The AI model landscape in 2026 isn’t about one model winning — it’s about choosing the right tool for the job.
Ready to try GPT-5.4? Access it through ChatGPT or the OpenAI API.
