Parameters
-
Context Length
272K
Modality
Multimodal
Architecture
Dense
License
Proprietary
Release Date
5 Mar 2026
Knowledge Cutoff
-
Attention Structure
Multi-Head Attention
Hidden Dimension Size
-
Number of Layers
-
Attention Heads
-
Key-Value Heads
-
Activation Function
-
Normalization
-
Position Embedding
Absolute Position Embedding
GPT-5.4 is OpenAI's most capable and efficient frontier model for professional work, bringing together advances in reasoning, coding, and agentic workflows in a single model. It features industry-leading coding capabilities inherited from GPT-5.3-Codex, native state-of-the-art computer-use capabilities, and improved tool use across large ecosystems, and it excels at professional tasks involving spreadsheets, presentations, and documents. It achieves 83.0% on GDPval, 75.0% on OSWorld-Verified, 82.7% on BrowseComp, 57.7% on SWE-Bench Pro, and 81.2% on MMMU Pro. The model supports up to 272K tokens of context (1M experimental) and delivers OpenAI's most token-efficient reasoning yet.
OpenAI's latest generation of language models featuring advanced reasoning capabilities, extended context windows up to 400K tokens, and specialized variants for coding, general intelligence, and efficiency. GPT-5 series introduces improved thinking modes, superior performance across benchmarks, and variants optimized for different use cases from high-capacity Pro models to efficient Nano models. Features native multimodal understanding, enhanced mathematical reasoning, and state-of-the-art coding abilities through Codex variants.
No evaluation benchmarks are available for GPT-5.4.
Overall Rank
-
Coding Rank
-
Total Score
35
/ 100
GPT-5.4 exhibits a high degree of opacity regarding its internal architecture, parameter density, and training data provenance. While it provides clear versioning and consistent self-identification, the lack of reproducible evaluation methodologies and compute disclosures significantly hinders independent verification. The model's transparency profile is characterized by detailed performance claims that lack the underlying technical documentation required for a frontier system.
Architectural Provenance
OpenAI identifies GPT-5.4 as a 'unified frontier model' that integrates capabilities from previous iterations like GPT-5.3-Codex and GPT-5.2 Thinking. However, the underlying architecture is described only as 'dense' in the provided metadata, and official documentation lacks specific technical details regarding layer counts, attention mechanisms, or the specific methodology used to 'absorb' the specialist coding model into the mainline architecture. The training methodology is described in vague marketing terms such as 'advances in reasoning' without public technical papers or architectural diagrams.
Dataset Composition
There is no public disclosure of the specific datasets used to train GPT-5.4. Documentation mentions general categories like 'web research,' 'professional work,' and 'coding,' but provides no percentage breakdown, source naming, or detailed filtering/cleaning methodologies. Claims of being 'factually grounded' and '33% less likely to contain false claims' are assertions without verifiable data provenance or access to training samples.
Tokenizer Integrity
The model supports a standard 272K context window and an experimental 1M (1,050,000) token window in the API and Codex. While the API documentation provides a 'hard model contract' for token limits and pricing, the specific tokenizer vocabulary size and training alignment for GPT-5.4 are not explicitly documented in a public technical report. Users can observe tokenization behavior via the API, but the underlying BPE configuration for this specific version remains opaque.
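Because the tokenizer configuration is opaque, client-side token accounting can only approximate. Below is a minimal sketch of a pre-flight check that a prompt fits the documented windows, assuming the common ~4 characters/token heuristic for English text; the heuristic and the output reserve are assumptions, not the model's actual BPE behavior.

```python
# Rough client-side context-window pre-check. GPT-5.4's tokenizer is not
# publicly documented, so this uses a ~4 characters/token heuristic --
# an assumption, not the real BPE vocabulary.

CONTEXT_LIMITS = {
    "standard": 272_000,        # documented standard window
    "experimental": 1_050_000,  # experimental 1M window (API and Codex)
}

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; exact counts require the provider's tokenizer."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(text: str, window: str = "standard",
                 reserve_for_output: int = 4_096) -> bool:
    """True if the estimated prompt plus an output reserve fits the window."""
    limit = CONTEXT_LIMITS[window]
    return estimate_tokens(text) + reserve_for_output <= limit
```

A check like this only bounds the error; observed API token counts remain the sole ground truth for this model version.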
Parameter Density
The parameter count for GPT-5.4 is officially 'Unknown.' While the model is described as 'dense,' there is no verifiable information regarding the total number of parameters or the architectural breakdown (e.g., attention vs. FFN). This lack of disclosure is a significant transparency gap for a frontier model.
Training Compute
OpenAI has not disclosed the specific GPU/TPU hours, hardware counts, or energy consumption for GPT-5.4. While third-party estimates exist for the broader GPT-5 family (e.g., 50,000 H100s), official documentation for the 5.4 variant provides no carbon footprint calculations or hardware specifications, relying instead on vague claims of being 'most efficient.'
Benchmark Reproducibility
OpenAI provides specific scores for several benchmarks (83.0% GDPval, 75.0% OSWorld-Verified, 81.2% MMMU Pro). However, the evaluation code and exact prompts used to achieve these results are not fully public. While some benchmarks like 'CoT Controllability' are described as open-source, the 'GDPval' and 'internal finance evaluations' lack the necessary documentation for independent third-party reproduction.
Identity Consistency
The model maintains a consistent identity across platforms, correctly identifying itself as GPT-5.4 in the API and 'GPT-5.4 Thinking' in ChatGPT. It provides version-aware responses and is transparent about its 'Thinking' and 'Pro' variants. There is no evidence of the model claiming to be a competitor's product.
License Clarity
The model is under a 'Proprietary' license. While the Terms of Service for the API and ChatGPT are public, they include restrictive clauses regarding 'abusive usage' and 'programmatic extraction' that are not clearly defined. The lack of an open-source license or clear derivative works policy for the weights limits transparency for developers.
Hardware Footprint
As a closed-source API-based model, there is no public documentation regarding the VRAM requirements or hardware footprint for local deployment. While OpenAI provides pricing based on token usage and context length, it offers no guidance on the quantization impact or memory scaling for the 1M token experimental window.
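Since hardware sizing is undisclosed, the only cost model a developer can build is the usage-based billing the documentation describes. The sketch below mirrors that arithmetic; the per-million-token rates are placeholders labeled as such, not GPT-5.4's actual prices.

```python
# Hedged sketch of usage-based API cost estimation. The rates below are
# PLACEHOLDERS for illustration -- GPT-5.4's real pricing is not reproduced
# here -- but the arithmetic matches per-token billing in general.

HYPOTHETICAL_RATES = {
    "input": 2.00,   # USD per 1M input tokens (illustrative only)
    "output": 8.00,  # USD per 1M output tokens (illustrative only)
}

def estimate_cost(input_tokens: int, output_tokens: int,
                  rates: dict = HYPOTHETICAL_RATES) -> float:
    """Return the estimated USD cost of one request under the given rates."""
    return (input_tokens * rates["input"] +
            output_tokens * rates["output"]) / 1_000_000
```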
Versioning Drift
OpenAI uses a clear versioning scheme (5.2, 5.3, 5.4) and provides a 'snapshot' feature in the API to lock in specific versions. However, the 'experimental' nature of the 1M context window and the retirement of previous models (e.g., GPT-5.2 retiring June 2026) suggest potential for silent drift and breaking changes without comprehensive public changelogs for weight updates.
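The defensive pattern implied above — pinning a dated snapshot rather than a floating alias — can be sketched as follows. The snapshot id format (`gpt-5.4-YYYY-MM-DD`) is an assumption modeled on OpenAI's historical naming convention, not a documented identifier for this model.

```python
import re

# Guard against silent drift by refusing floating model aliases. The dated
# snapshot format checked here is a hypothetical pattern, assumed from
# OpenAI's past naming conventions.
SNAPSHOT_RE = re.compile(r"^gpt-\d+(?:\.\d+)?-\d{4}-\d{2}-\d{2}$")

def is_pinned(model_id: str) -> bool:
    """True only for a dated snapshot id, not an alias like 'gpt-5.4'."""
    return bool(SNAPSHOT_RE.match(model_id))

def resolve_model(model_id: str, require_pin: bool = True) -> str:
    """Reject floating aliases when reproducibility matters."""
    if require_pin and not is_pinned(model_id):
        raise ValueError(
            f"'{model_id}' is a floating alias; pin a dated snapshot "
            "to avoid silent weight updates."
        )
    return model_id
```

Pinning only protects against alias redirection; it cannot detect in-place weight updates to a snapshot, which is why the absence of public changelogs remains a transparency gap.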