OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT‑5.1-Codex-Max will now replace GPT‑5.1-Codex as the default model across Codex-integrated surfaces.The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.It comes on the heels of Google releasing its powerful new Gemini 3 Pro model yesterday, yet still outperforms or matches it on key coding benchmarks: On SWE-Bench Verified, GPT‑5.1-Codex-Max achieved 77.9% accuracy at extra-high reasoning effort, edging past Gemini 3 Pro’s 76.2%. It also led on Terminal-Bench 2.0, with 58.1% accuracy versus Gemini’s 54.2%, and matched Gemini’s score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark.When measured against Gemini 3 Pro’s most advanced configuration — its Deep Thinking model — Codex-Max holds a slight edge in agentic coding benchmarks, as well. Performance Benchmarks: Incremental Gains Across Key TasksGPT‑5.1-Codex-Max demonstrates measurable improvements over GPT‑5.1-Codex across a range of standard software engineering benchmarks. On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase from GPT‑5.1-Codex’s 66.3%. In SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, outperforming GPT‑5.1-Codex’s 73.7%.Performance on Terminal Bench 2.0 (n=89) showed more modest improvements, with GPT‑5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT‑5.1-Codex. All evaluations were run with compaction and extra-high reasoning effort enabled.These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under extended reasoning loads.Technical Architecture: Long-Horizon Reasoning via CompactionA major architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effec …