| Specification | Value |
|---|---|
| Total Parameters | 117B |
| Active Parameters | 5.1B |
| Context Length | 128K |
| Modality | Text |
| Architecture | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Release Date | 5 Aug 2025 |
| Knowledge Cutoff | Jun 2024 |
| Number of Experts | 128 |
| Active Experts | 4 |
| Attention Structure | Grouped Query Attention |
| Hidden Dimension Size | 2880 |
| Number of Layers | 36 |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | SwiGLU |
| Normalization | RMS Normalization |
| Position Embedding | Rotary Position Embedding (RoPE) |
GPT-OSS 120B is a large open-weight model from OpenAI, designed to run in data centers and on high-end desktops and laptops. It supports advanced reasoning, agentic tasks, and diverse developer use cases, and it is text-only in both input and output.
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Summarization | ProLLM Summarization | 0.98 | 🥇 1 |
| General Knowledge | MMLU | 0.90 | 🥈 2 |
| Coding | Aider Coding | 0.42 | 6 |
| Professional Knowledge | MMLU Pro | 0.81 | 11 |
| Graduate-Level QA | GPQA | 0.80 | 17 |
| Mathematics | LiveBench Mathematics | 0.69 | 26 |
| Web Development | WebDev Arena | 1354 | 28 |
| Agentic Coding | LiveBench Agentic | 0.17 | 34 |
| Reasoning | LiveBench Reasoning | 0.39 | 37 |
| Coding | LiveBench Coding | 0.60 | 41 |
| Data Analysis | LiveBench Data Analysis | 0.57 | 43 |
- Overall Rank: #78
- Coding Rank: #78
- Total Score: 67/100
GPT-OSS 120B demonstrates a significant shift toward transparency for its provider, offering a permissive license and clear architectural specifications for its Mixture-of-Experts design. While hardware requirements and tokenization are exceptionally well documented, the model discloses almost nothing about its training data composition and compute resources. It is a high-quality open-weight release that remains opaque in its upstream development process.
Architectural Provenance
The model is explicitly identified as a Transformer-based Mixture-of-Experts (MoE) architecture with 36 layers and 128 experts. Documentation specifies the use of SwiGLU activations, Rotary Positional Embeddings (RoPE), and alternating dense and locally banded sparse attention patterns. While the high-level methodology is described in the official model card and technical reports, the specific pretraining procedure and exact architectural modifications are not fully detailed to the level of a peer-reviewed paper.
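The routing scheme described above (128 experts, 4 active per token) can be sketched as a softmax gate that keeps only the top-4 expert scores per token. This is a simplified illustration of top-k MoE routing, not OpenAI's implementation; the gating details are assumptions, and only the layer width (2880) and expert counts come from the spec table.

```python
import numpy as np

def moe_route(x, gate_w, num_active=4):
    """Top-k MoE routing sketch: pick the highest-scoring experts
    and renormalize their softmax weights. Illustrative only."""
    logits = x @ gate_w                      # (num_experts,) router scores
    top = np.argsort(logits)[-num_active:]   # indices of the 4 best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                             # weights over the active experts
    return top, w

rng = np.random.default_rng(0)
hidden, experts = 2880, 128                  # sizes from the spec table
x = rng.standard_normal(hidden)
gate_w = rng.standard_normal((hidden, experts))
idx, w = moe_route(x, gate_w)
print(len(idx), round(float(w.sum()), 6))    # 4 active experts, weights sum to 1
```

Each token's output is then the weighted sum of the 4 selected experts' outputs, which is why only ~5.1B of the 117B parameters are exercised per token.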
Dataset Composition
OpenAI has been notably circumspect regarding the training data. Official documentation only mentions a 'diverse corpus' of publicly available texts, including books, academic articles, and websites, with an emphasis on STEM and coding. There is no public breakdown of dataset proportions (e.g., web vs. code percentages), no disclosure of specific data sources, and no detailed filtering or cleaning methodology provided.
Tokenizer Integrity
The model uses the 'o200k_base' (or 'o200k_harmony') tokenizer, which is publicly available via the tiktoken library. The vocabulary size is precisely stated as 201,088 tokens. It is documented as a fast BPE implementation optimized for multilingual text and code, and its performance has been independently verified by third-party benchmarks showing high token efficiency.
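The o200k_base vocabulary itself ships with the tiktoken library, but the principle behind it, byte-pair encoding, can be shown with a toy merge loop. The corpus and merge count below are made up for illustration; real tokenizers learn on the order of 200k merges over raw bytes.

```python
from collections import Counter

def bpe_train(word, num_merges):
    """Toy byte-pair encoding: repeatedly merge the most frequent
    adjacent symbol pair into a single token. One-word illustration
    of the scheme behind BPE tokenizers such as o200k_base."""
    symbols = list(word)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)   # replace the pair with its merged token
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols, merges

toks, merges = bpe_train("aaabdaaabac", 3)
print(toks)  # fewer, longer tokens covering the same string
```

Frequent sequences collapse into single tokens, which is what makes a 201,088-entry vocabulary token-efficient on multilingual text and code.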
Parameter Density
The model's parameter counts are clearly disclosed: 117 billion total parameters with 5.1 billion active parameters per token. The MoE structure is well-documented, specifying 128 experts with a Top-4 routing mechanism. The impact of MXFP4 quantization on the MoE layers is also explicitly stated, explaining how the model fits into 80GB of VRAM.
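The 80GB claim can be sanity-checked with rough arithmetic. The ~4.25 bits/parameter figure for MXFP4 (4-bit values plus shared block scales) and the assumption that essentially all weights are quantized are approximations for illustration, not published numbers.

```python
def weight_gb(params, bits_per_param):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

TOTAL = 117e9                      # total parameters from the spec table
fp16_gb = weight_gb(TOTAL, 16)     # ~234 GB: needs a multi-GPU setup
mxfp4_gb = weight_gb(TOTAL, 4.25)  # ~62 GB: fits a single 80 GB H100
print(round(fp16_gb), round(mxfp4_gb))  # → 234 62
```

The ~62GB estimate leaves headroom on an 80GB card for the KV cache and activations, consistent with the single-H100 deployment target.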
Training Compute
There is almost no transparency regarding the compute resources used for training. While the model is optimized for inference on H100 GPUs, OpenAI has not disclosed the total GPU/TPU hours, hardware specifications of the training cluster, training duration, or the carbon footprint associated with the model's development.
Benchmark Reproducibility
OpenAI provides results for standard benchmarks (MMLU, GPQA, HumanEval) and specifies the use of 'high' reasoning effort for these tests. However, the exact evaluation code and full prompt sets used for internal scoring are not public. While third-party verification exists on platforms like OpenRouter and Hugging Face, the lack of official reproduction scripts and detailed few-shot examples limits full reproducibility.
Identity Consistency
The model consistently identifies itself as GPT-OSS 120B and maintains a clear versioning identity. It is transparent about its nature as an open-weight reasoning model and its relationship to the 'harmony' prompt format. There are no documented cases of the model claiming to be a competitor's product or misrepresenting its core capabilities.
License Clarity
The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. The terms are clear, allowing for commercial use, modification, and distribution without conflicting proprietary restrictions. This is verified across the official blog, GitHub repository, and Hugging Face model card.
Hardware Footprint
Hardware requirements are exceptionally well-documented. OpenAI and third parties provide specific VRAM targets: 80GB for the MXFP4 quantized version (fitting a single H100) and significantly higher for FP16. Documentation includes guidance for various quantization levels and multi-GPU setups, and these claims have been verified by community deployment reports.
Versioning Drift
While the model has a clear initial version and the weights are hosted on Hugging Face with commit history, there is no established long-term changelog or formal semantic versioning policy for future updates. The model is relatively new, and while the 'harmony' format provides some structure, a robust system for tracking and notifying users of behavioral drift is not yet evident.