| Specification | Value |
|---|---|
| Parameters | - |
| Context Length | 131,072 tokens (128K) |
| Modality | Code |
| Architecture | Dense |
| License | Proprietary |
| Release Date | 12 Feb 2026 |
| Training Data Cutoff | - |
| Attention Structure | Multi-Head Attention |
| Hidden Dimension | - |
| Layers | - |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | - |
| Normalization | - |
| Position Embedding | Absolute Position Embedding |
GPT-5.3-Codex-Spark is a specialized, low-latency large language model designed for real-time, interactive software development. Developed through a collaboration between OpenAI and Cerebras Systems, it functions as a streamlined variant of the broader GPT-5.3-Codex family. The model is engineered to provide a responsive experience during live coding sessions, enabling immediate feedback for tasks such as targeted logic adjustments, interface refinements, and incremental refactoring. By prioritizing inference speed, the model supports a collaborative workflow where developers can steer code generation in real time, narrowing the gap between intent and execution.
The technical foundation of GPT-5.3-Codex-Spark is its deployment on the Cerebras Wafer-Scale Engine 3 (WSE-3). Unlike traditional distributed GPU architectures, which are often constrained by the memory wall and interconnect latency, the WSE-3 places on a single, massive silicon wafer both integrated high-bandwidth on-wafer memory (SRAM) and hundreds of thousands of optimized cores. This hardware allows the model to achieve a throughput exceeding 1,000 tokens per second. To further minimize end-to-end latency, the system uses persistent WebSocket connections and a revised inference stack that accelerates session initialization and reduces network overhead by approximately 80 percent compared to standard RESTful API implementations.
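The latency framing above can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the per-request overhead value is an invented assumption, while the ~1,000 tokens/s throughput and the ~80 percent overhead reduction are the figures quoted in the text.

```python
def end_to_end_latency_s(tokens: int,
                         tokens_per_s: float,
                         network_overhead_s: float) -> float:
    """Total wall-clock time: fixed network overhead + token generation time."""
    return network_overhead_s + tokens / tokens_per_s

# Assumed numbers, for illustration only.
REST_OVERHEAD_S = 0.25                         # hypothetical per-request HTTP/TLS setup cost
WS_OVERHEAD_S = REST_OVERHEAD_S * (1 - 0.80)   # ~80% lower overhead, per the text
THROUGHPUT = 1000.0                            # tokens per second, per the text

# A short 200-token edit: generation time is identical, so the
# fixed connection overhead dominates the difference.
rest = end_to_end_latency_s(200, THROUGHPUT, REST_OVERHEAD_S)
ws = end_to_end_latency_s(200, THROUGHPUT, WS_OVERHEAD_S)
print(f"per-request REST: {rest:.2f}s, persistent WebSocket: {ws:.2f}s")
```

Under these assumed numbers the fixed overhead accounts for more than half of the REST round trip, which is why a persistent connection matters most for the short, frequent edits this model targets.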
Architecturally, the model is a dense transformer optimized for high-velocity text-only generation. It supports a 128k token context window, which is tailored to handle significant portions of active files and immediate project dependencies. The model's behavior is specifically tuned for a lightweight interaction style, favoring precise, minimal edits over extensive, autonomous code rewrites. This design choice ensures that the developer remains the primary driver of the logic while the model serves as an ultra-fast, interruptible completion engine. It is delivered via a dedicated latency-first serving tier that operates alongside OpenAI's existing GPU infrastructure.
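The "interruptible completion engine" interaction style described above can be sketched as a consumer loop over a token stream that the developer may cancel mid-generation. This is a minimal, self-contained simulation; the token generator and the stop condition are invented for illustration, and no real API is called:

```python
from typing import Iterable, Iterator

def fake_token_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Stand-in for a low-latency streaming completion response."""
    yield from tokens

def consume_until(stream: Iterator[str], stop_after: int) -> list[str]:
    """Accept tokens until the developer interrupts (modeled here as a token budget)."""
    accepted: list[str] = []
    for tok in stream:
        accepted.append(tok)
        if len(accepted) >= stop_after:
            break  # interrupt mid-stream; remaining tokens are never consumed
    return accepted

# The developer cuts generation off after three tokens and keeps the partial edit.
edit = consume_until(fake_token_stream(["def", " add", "(a", ",", " b", "):"]),
                     stop_after=3)
print("".join(edit))  # "def add(a"
```

Because the stream is lazy, breaking out of the loop corresponds to abandoning the rest of the generation, which matches the minimal-edit, developer-driven workflow the model is tuned for.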
GPT-5.3 Codex is OpenAI's ultra-fast coding model family, developed in partnership with Cerebras and deployed on the Wafer-Scale Engine 3, purpose-built for real-time interactive coding with minimal latency.
Ranking
#58
| Benchmark | Score | Rank |
|---|---|---|
| Instruction Following (IFEval) | 0.92 | 🥈 2 |
| Grade School Math (GSM8K) | 0.97 | ⭐ 4 |
| Software Engineering (SWE-bench Verified) | 0.66 | 20 |
| General Knowledge (MMLU) | 0.89 | ⭐ 21 |
| Graduate-Level QA (GPQA) | 0.51 | 84 |
Coding Ranking
#73