Total Parameters
21B
Context Length
128K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
5 Aug 2025
Knowledge Cutoff
Jun 2024
Active Parameters
3.6B
Number of Experts
32
Active Experts
4
Attention Structure
Grouped-Query Attention (GQA)
Hidden Dimension Size
2880
Number of Layers
24
Attention Heads
64
Key-Value Heads
8
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
GPT-OSS 20B is a text-based language model developed by OpenAI, specifically engineered to deliver high-performance reasoning on consumer-grade hardware. As part of the GPT-OSS family, this model balances computational efficiency with complex task execution, utilizing a sparse architecture to maintain a low memory footprint. It is designed to function as a flexible component in local and enterprise environments, where data privacy and low-latency response times are critical requirements.
The model utilizes a Mixture-of-Experts (MoE) transformer architecture consisting of 24 layers. While the total parameter count is 21 billion, the system only activates 3.6 billion parameters per token during the forward pass. This sparsity is achieved through a routing mechanism that selects four active experts from a pool of 32 for each token. The architecture incorporates several modern optimizations, including SwiGLU activation functions, Root Mean Square (RMS) normalization, and Grouped-Query Attention (GQA) with eight key-value heads to optimize memory throughput. It also supports a native context window of 128,000 tokens using Rotary Positional Embeddings (RoPE).
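As a rough illustration of the sparse routing described above, the sketch below shows top-4 selection over 32 experts per token. The router and expert modules are simplified placeholders sized from the figures on this page, not the model's actual implementation.

```python
import torch
import torch.nn.functional as F

NUM_EXPERTS = 32   # experts per MoE layer (from the model card)
TOP_K = 4          # experts activated per token
HIDDEN = 2880      # hidden dimension (from the model card)

def moe_forward(x, router, experts):
    """Route each token to its top-4 experts and mix their outputs.

    x:       (tokens, HIDDEN) activations
    router:  nn.Linear(HIDDEN, NUM_EXPERTS) producing routing logits
    experts: list of NUM_EXPERTS feed-forward modules (placeholders here)
    """
    logits = router(x)                                  # (tokens, 32)
    weights, idx = torch.topk(logits, TOP_K, dim=-1)    # keep 4 experts per token
    weights = F.softmax(weights, dim=-1)                # normalize over the selected experts
    out = torch.zeros_like(x)
    for k in range(TOP_K):
        for e in range(NUM_EXPERTS):
            mask = idx[:, k] == e                       # tokens whose k-th choice is expert e
            if mask.any():
                out[mask] += weights[mask, k, None] * experts[e](x[mask])
    return out
```

Only the four selected experts run per token, which is why the per-token compute tracks the 3.6B active parameters rather than the 21B total.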
Functionally, GPT-OSS 20B is optimized for agentic workflows and complex reasoning tasks. It supports features such as native tool use, function calling, and a configurable reasoning effort system that allows developers to adjust the model's processing depth based on the specific latency needs of the application. The model is trained using a specialized response format to facilitate consistent structured outputs and long-form chain-of-thought reasoning, making it suitable for scientific analysis, code generation, and specialized technical assistance on local devices.
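A minimal sketch of exercising the configurable reasoning effort from a local deployment, assuming an OpenAI-compatible server (for example via vLLM or Ollama); the base URL, served model name, and the `reasoning_effort` field are assumptions about the serving stack, not an official API specification.

```python
# Hypothetical local call; parameter names vary by serving stack.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt-oss-20b",                      # name as exposed by the local server (assumed)
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the trade-offs of MoE models."},
    ],
    extra_body={"reasoning_effort": "low"},   # hypothetical knob; lower effort trades depth for latency
)
print(response.choices[0].message.content)
```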
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Summarization | ProLLM Summarization | 0.86 | 6 |
| General Knowledge | MMLU | 0.85 | 11 |
| Web Development | WebDev Arena | 1317 | 38 |
Overall Rank
#70
Coding Rank
#51
Total Score
67 / 100
GPT-OSS 20B exhibits a bifurcated transparency profile, offering industry-leading clarity on architecture, licensing, and hardware requirements while remaining almost entirely opaque regarding its training data and compute resources. The model's technical documentation for inference and quantization is exemplary, yet the lack of dataset provenance and training metrics prevents a full understanding of its developmental lifecycle. Its commitment to a permissive Apache 2.0 license and open weights marks a significant shift in transparency for the provider, though evaluation reproducibility remains hampered by undisclosed prompting strategies.
Architectural Provenance
The model architecture is extensively documented in the official model card and technical reports. It is a Mixture-of-Experts (MoE) Transformer with 24 layers, utilizing 32 experts with 4 active per token. Specific technical optimizations are disclosed, including SwiGLU activation functions (with clamping and residual connections), Grouped-Query Attention (GQA) with 8 key-value heads, and Rotary Positional Embeddings (RoPE). The model supports a native context window of 128k tokens. While the training methodology is described as a mix of reinforcement learning and advanced pre-training, the exact 'from scratch' vs. 'fine-tuned' lineage of the base weights is slightly obscured by references to 'internal frontier systems,' preventing a perfect score.
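To make the memory benefit of the disclosed GQA configuration concrete, the sketch below compares KV-cache size for 8 key-value heads against a fully multi-head layout matching the 64 query heads; the per-head width and 16-bit cache storage are illustrative assumptions, not documented values.

```python
def kv_cache_bytes(n_kv_heads, head_dim, n_layers, seq_len, bytes_per_elem=2):
    """Rough KV-cache size: keys + values for every layer and position."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

LAYERS, SEQ = 24, 128_000   # layer count and context length from the model card
HEAD_DIM = 64               # assumed per-head width, for illustration only

gqa = kv_cache_bytes(n_kv_heads=8,  head_dim=HEAD_DIM, n_layers=LAYERS, seq_len=SEQ)
mha = kv_cache_bytes(n_kv_heads=64, head_dim=HEAD_DIM, n_layers=LAYERS, seq_len=SEQ)

print(f"GQA (8 KV heads):  {gqa / 1e9:.1f} GB")   # ~6 GB at full context under these assumptions
print(f"MHA (64 KV heads): {mha / 1e9:.1f} GB")   # 8x larger cache for the same sequence
```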
Dataset Composition
Transparency regarding the training data is minimal. Official documentation states the model was trained on a 'mostly English, text-only dataset' with a focus on STEM, coding, and general knowledge. However, there is no public breakdown of data sources, no specific proportions (e.g., web vs. books vs. code), and no detailed disclosure of filtering or cleaning methodologies. Third-party reports explicitly list training data collection and labeling as 'undisclosed.'
Tokenizer Integrity
The model uses the 'o200k_harmony' tokenizer, which is a BPE-style tokenizer with a vocabulary size of approximately 200,000 tokens. This tokenizer is publicly available via the 'tiktoken' library and is documented as a superset of the tokenizer used in GPT-4o. It is well-integrated into the 'openai-harmony' package and reference implementations, allowing for full verification of tokenization behavior and language support alignment.
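Tokenization behavior can be verified directly, assuming the installed tiktoken release registers the o200k_harmony encoding (older versions only ship o200k_base, the GPT-4o tokenizer it is described as a superset of):

```python
import tiktoken

# Load the documented BPE encoding; fall back to o200k_base if this
# tiktoken version does not yet include o200k_harmony.
try:
    enc = tiktoken.get_encoding("o200k_harmony")
except ValueError:
    enc = tiktoken.get_encoding("o200k_base")

tokens = enc.encode("GPT-OSS 20B activates 3.6B parameters per token.")
print(len(tokens), tokens[:8])   # token count and a few token IDs
print(enc.n_vocab)               # vocabulary size, roughly 200k entries
```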
Parameter Density
OpenAI provides precise figures for both total and active parameters. The model has 21.1 billion total parameters, with 3.6 billion active parameters per token. The MoE structure is clearly defined (32 experts, top-4 routing). Additionally, the use of MXFP4 quantization for MoE weights is explicitly documented, including how weights are packed and scaled, which provides high transparency into the model's density and memory efficiency.
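As a back-of-the-envelope check on these figures, the sketch below estimates weight memory when the MoE expert weights are stored in 4-bit MXFP4 and the remaining parameters stay in 16-bit precision; the expert/non-expert split and the neglected scale-factor overhead are rough assumptions, not official numbers.

```python
TOTAL_PARAMS = 21.1e9      # total parameters (disclosed)
EXPERT_FRACTION = 0.9      # assumed share of weights living in the MoE experts

expert_params = TOTAL_PARAMS * EXPERT_FRACTION
dense_params  = TOTAL_PARAMS - expert_params

mxfp4_bytes = expert_params * 0.5   # ~4 bits per expert weight (block scales ignored)
bf16_bytes  = dense_params * 2      # 16-bit for attention, embeddings, router

total_gb = (mxfp4_bytes + bf16_bytes) / 1e9
print(f"~{total_gb:.1f} GB of weights")   # ~14 GB, consistent with the 16 GB VRAM target below
```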
Training Compute
There is almost no verifiable information regarding the compute resources used for training. While the hardware requirements for inference are well-documented, the training duration, GPU/TPU hours, hardware specifications used during training, and the resulting carbon footprint are entirely absent from public documentation. Claims of 'advanced pre-training' serve as marketing language rather than technical disclosure.
Benchmark Reproducibility
While OpenAI provides results for several benchmarks (MMLU, HumanEval, Tau-Bench, HealthBench), the evaluation methodology lacks full transparency. Evaluation code is not fully public, and exact prompts or few-shot examples used for official scores are not consistently disclosed. Third-party audits (e.g., arXiv:2508.17525) have noted discrepancies where the 20B model outperforms the 120B variant, suggesting non-monotonic scaling or evaluation inconsistencies that are not addressed in official docs. A -2 penalty was applied due to industry-wide concerns regarding training data overlap with common coding benchmarks like SWE-bench.
Identity Consistency
The model demonstrates strong identity consistency, correctly identifying itself as part of the GPT-OSS family in standard deployments. It is designed to use the 'Harmony' response format, which includes specific roles (System, Developer, User, Assistant, Tool) that help maintain its persona and operational boundaries. There are no documented cases of the model claiming to be a competitor's product or denying its AI nature.
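The role hierarchy can be illustrated with a plain message list; the actual rendering into Harmony's special tokens is handled by the openai-harmony package, and the structure below is a simplified stand-in rather than that package's API.

```python
# Simplified illustration of the Harmony role hierarchy
# (system > developer > user > assistant > tool). Field names and the
# rendering are illustrative, not the real openai-harmony format.
conversation = [
    {"role": "system",    "content": "You are GPT-OSS 20B."},
    {"role": "developer", "content": "Answer only with JSON."},
    {"role": "user",      "content": "What is the capital of France?"},
    {"role": "assistant", "content": '{"answer": "Paris"}'},
]

def render(messages):
    """Naive text rendering; the real format uses dedicated special tokens per role."""
    return "\n".join(f"<{m['role']}> {m['content']}" for m in messages)

print(render(conversation))
```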
License Clarity
The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. This allows for commercial use, modification, and distribution without the 'copyleft' restrictions found in other licenses. The terms are clear, publicly accessible on GitHub and Hugging Face, and do not conflict with the model's stated 'open-weight' status.
Hardware Footprint
Hardware requirements are exceptionally well-documented. OpenAI and third-party partners (NVIDIA, Unsloth) provide specific VRAM targets: 16GB for the 20B model using native MXFP4 quantization. Performance metrics for different hardware (Mac, H100, consumer GPUs) and token-per-second estimates are widely available. The impact of the 'reasoning effort' setting on latency is also clearly explained.
Versioning Drift
The model supports 'snapshots' to lock in specific versions, and a basic versioning system is in place. However, there is no comprehensive public changelog or detailed history of weight updates since the initial release. While the 'openai-harmony' package is versioned, the underlying model weights lack the granular semantic versioning required for a higher score, making it difficult to track silent drift or minor optimizations.