OLMo 3 7B Base: Specifications and GPU VRAM Requirements

OLMo 3 7B Base

Open Source

Open Weights

Parameters

7B

Context Length

65,536 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

25 Oct 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Attention Structure

Multi-Head Attention

Hidden Dimension Size

4096

Number of Layers

32

Attention Heads

32

Key-Value Heads

32

Activation Function

SwiGLU

Normalization

Position Embedding

Rotary Position Embedding (RoPE)

OLMo 3 7B Base

OLMo 3 7B Base is the base model of the Allen Institute for AI's (AI2) OLMo 3 family, a line of language models built to advance the scientific study of large language models. It has 7 billion parameters and is trained on 5.93 trillion tokens sourced from the Dolma 3 dataset. The OLMo 3 project commits to full transparency: alongside the model weights, AI2 publishes the training data, code, intermediate checkpoints, logs, and evaluation methodologies. This makes results reproducible and supports detailed research into model behavior and development.

Architecturally, OLMo 3 7B Base is a dense, decoder-only transformer. Training is staged into distinct pretraining, mid-training, and long-context phases, which build broad language capability and extended input handling. The model has 32 layers and a hidden dimension of 4096, and uses multi-head attention with 32 query heads and 32 key-value heads. Rotary Positional Embeddings (RoPE) are used, with scaling applied to support a context length of 65,536 tokens.
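The architecture figures above are enough for a rough estimate of how much memory the key-value cache needs as the context grows. The sketch below is not an official AI2 calculation. It assumes every layer caches keys and values for all tokens, and that the cache is stored at 2 bytes per value (FP16/BF16); both are assumptions, as are the function and constant names.

```python
# Figures taken from the description above.
NUM_LAYERS = 32
HIDDEN_SIZE = 4096
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 32
MAX_CONTEXT = 65_536

# Assumption: the cache is kept in FP16/BF16, i.e. 2 bytes per stored value.
BYTES_PER_VALUE = 2


def kv_cache_bytes(num_tokens, bytes_per_value=BYTES_PER_VALUE):
    """Estimate KV-cache size, assuming every layer caches every token."""
    head_dim = HIDDEN_SIZE // NUM_QUERY_HEADS  # 4096 / 32 = 128
    # Factor 2: one key vector and one value vector per head, per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * bytes_per_value
    return per_token * num_tokens


for tokens in (1_024, 32_768, MAX_CONTEXT):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>6} tokens: {gib:.1f} GiB of KV cache")
```

Because the key-value head count equals the query head count, the cache is not reduced the way it would be with grouped-query attention, so under these assumptions the cache at full context is larger than the FP16 weights themselves.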

As a base model, OLMo 3 7B is intended primarily for pretraining research and as a starting point for fine-tuning on downstream tasks. Its design prioritizes general capabilities; specialized behavior such as reasoning, tool use, and instruction following is added through further post-training. The Apache 2.0 license permits broad use, including commercial applications.

About OLMo 3

OLMo (Open Language Model) is a series of fully open language models designed to enable the science of language models. Released by the Allen Institute for AI (Ai2), OLMo 3 provides complete access to training data (Dolma 3), code, checkpoints, logs, and evaluation methodologies. The family includes Base models for pretraining research, Instruct variants for chat and tool use, and Think variants with chain-of-thought reasoning capabilities. All models are trained with a staged approach that includes pretraining, mid-training, and long-context phases.

Evaluation Benchmarks

No evaluation benchmarks are available for OLMo 3 7B Base.

Rankings

Model Transparency

Total Score

93 / 100

Upstream

28.0 / 30

Model

37.5 / 40

Downstream

27.0 / 30

OLMo 3 7B Base Transparency Report

Total Score

93 / 100

Audit Note

OLMo 3 7B Base sets a benchmark for transparency in the AI industry by providing public access to its full training data, code, and intermediate checkpoints. The model's documentation is exceptionally detailed, covering everything from specific dataset percentages to precise GPU power consumption and training hours. This comprehensive disclosure enables a level of scientific auditability and reproducibility that is virtually unmatched by contemporary models.

Upstream

28.0 / 30

Architectural Provenance

9.5 / 10

OLMo 3 7B Base provides exemplary documentation of its architectural lineage. It is a dense, decoder-only transformer with 32 layers, a hidden dimension of 4096, and 32 attention heads. The training methodology is explicitly detailed as a three-stage process: initial pretraining (5.93T tokens), mid-training (100B tokens), and long-context extension (50B tokens). Unlike most models, the full training code is available in the 'OLMo-core' GitHub repository, and the technical report provides exhaustive details on the staged curriculum and architectural choices like RoPE scaling for its 65,536 context window.

Dataset Composition

9.5 / 10

The model's training data, Dolma 3, is fully disclosed with precise percentage breakdowns for each stage. The 5.93T token pretraining mix is documented as 76.07% Common Crawl, 13.57% scientific PDFs, 6.89% code, and 2.56% math. Mid-training and long-context mixes are similarly detailed (e.g., 20% code, 19% math for mid-training). AI2 provides the 'Dolma' toolkit for data processing and has released the dataset itself on Hugging Face, alongside intermediate model checkpoints, which is a rare level of transparency.
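To make the pretraining percentages above concrete, the short sketch below converts them into approximate token counts and reports the share that the four listed sources leave unaccounted for. The percentages and the 5.93T total come from the paragraph above; the function and variable names are illustrative only.

```python
# Pretraining mix as quoted above (percent of 5.93T tokens).
PRETRAIN_TOKENS = 5.93e12
MIX_PERCENT = {
    "Common Crawl": 76.07,
    "scientific PDFs": 13.57,
    "code": 6.89,
    "math": 2.56,
}


def tokens_by_source(total_tokens, mix_percent):
    """Convert percentage shares into approximate token counts."""
    return {name: total_tokens * pct / 100 for name, pct in mix_percent.items()}


token_counts = tokens_by_source(PRETRAIN_TOKENS, MIX_PERCENT)
# Share of the mix not covered by the four sources listed above.
unlisted_percent = round(100 - sum(MIX_PERCENT.values()), 2)

for name, count in token_counts.items():
    print(f"{name:<16} {count / 1e12:.2f}T tokens")
print(f"not itemized here: {unlisted_percent}%")
```

The four quoted shares do not sum to exactly 100%, so a small remainder belongs to sources not itemized in this summary.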

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly accessible via the Hugging Face repository and the OLMo-core library. It has a stated vocabulary size of 50,304 tokens. Documentation covers the tokenization approach, and the tokenizer is integrated into standard 'transformers' workflows, allowing for immediate verification of token counts and language support alignment.

Model

37.5 / 40

Parameter Density

10.0 / 10

The model is explicitly defined as a dense architecture with 7 billion total parameters. There is no ambiguity regarding active vs. total parameters as seen in MoE models. The architectural breakdown (layers, heads, dimensions) is clearly stated in the technical report and model card, and the parameter count is consistent across all official documentation and third-party implementations.
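As a rough cross-check of the 7 billion figure, the parameter count can be approximated from the stated dimensions. This is a back-of-the-envelope sketch, not AI2's own accounting. The layer count, hidden size, and 50,304-token vocabulary are taken from this report, but the feed-forward width `FFN_SIZE` is a hypothetical value chosen for illustration, and the code also assumes untied input and output embeddings, a three-matrix SwiGLU feed-forward block, and no bias terms. Normalization weights are ignored.

```python
# Values stated in this report.
NUM_LAYERS = 32
HIDDEN_SIZE = 4096
VOCAB_SIZE = 50_304

# Hypothetical: the feed-forward width is NOT given in this report.
FFN_SIZE = 11_008


def estimate_parameters(layers, hidden, ffn, vocab):
    """Approximate weight count for a dense decoder-only transformer."""
    attention = 4 * hidden * hidden   # Q, K, V, and output projections
    feed_forward = 3 * hidden * ffn   # gate, up, and down projections (SwiGLU)
    embeddings = 2 * vocab * hidden   # assumption: untied input/output embeddings
    return layers * (attention + feed_forward) + embeddings


total = estimate_parameters(NUM_LAYERS, HIDDEN_SIZE, FFN_SIZE, VOCAB_SIZE)
print(f"estimated parameters: {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands just under 7 billion, which is consistent with the stated size; a different feed-forward width would shift the result.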

Training Compute

9.0 / 10

AI2 provides high-granularity compute data, disclosing that the 7B model required approximately 234,000 H100 GPU hours for pretraining. They also provide power consumption metrics (~621W average during pretraining) and total energy draw (~146 MWh). This level of detail allows for precise carbon footprint and cost estimation, far exceeding industry standards.
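The three compute figures above can be checked against one another with simple arithmetic. The sketch below assumes the ~621W average is a per-GPU figure, which the text does not state explicitly, and the names are illustrative.

```python
# Figures quoted above.
GPU_HOURS = 234_000       # H100 GPU hours for pretraining
AVG_POWER_WATTS = 621     # assumption: average draw per GPU
REPORTED_MWH = 146        # reported total energy draw


def energy_mwh(gpu_hours, avg_watts):
    """Energy in megawatt-hours: hours x watts gives Wh, divided by 1e6."""
    return gpu_hours * avg_watts / 1e6


estimate = energy_mwh(GPU_HOURS, AVG_POWER_WATTS)
relative_gap = abs(estimate - REPORTED_MWH) / REPORTED_MWH

print(f"estimated energy: {estimate:.1f} MWh")
print(f"gap versus reported figure: {relative_gap:.1%}")
```

Under that assumption, the product of GPU hours and average power agrees with the reported total to within about one percent, so the three numbers are mutually consistent.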

Benchmark Reproducibility

8.5 / 10

Evaluation is highly transparent through the 'OLMo-Eval' repository and 'OLMES' suite. AI2 discloses exact benchmarks, versions, and results across a wide array of tasks (MMLU, GSM8K, HumanEval, etc.). While they provide the code and prompts used, a minor deduction is made because the technical report notes that some scientific PDFs in the released dataset were redacted post-training for legal reasons, which may slightly impact exact bit-for-bit reproduction of the training run.

Identity Consistency

10.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as an AI2 OLMo model in testing and documentation. There are no reports of the model claiming to be a competitor's product (e.g., GPT-4). Versioning is strictly maintained (OLMo 3 1025-7B), and the model's capabilities and limitations as a base model are clearly articulated.

Downstream

27.0 / 30

License Clarity

10.0 / 10

The model, weights, and code are all released under the Apache 2.0 license, which is a standard, permissive open-source license. There are no conflicting commercial restrictions or 'open-weights-but-not-open-source' ambiguities. The terms for derivative works and commercial use are clear and legally standard.

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented, with VRAM estimates provided for various precisions (FP16 requires ~16GB). Third-party documentation and community testing (e.g., via Ollama and LM Studio) provide additional verification for quantization (Q4/Q8) and context scaling impacts. AI2's own documentation covers the baseline requirements, while most of the detailed quantization trade-off data comes from the community.
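The weight-only part of these VRAM figures follows directly from the parameter count and the bits stored per weight. The sketch below is a simplification: it uses nominal bit widths for FP16, Q8, and Q4 and ignores quantization metadata, activations, the KV cache, and runtime overhead, all of which are assumptions on top of the text above.

```python
PARAMS = 7e9  # 7 billion parameters, as stated for this model

# Assumption: nominal bits per weight, ignoring per-block scales and
# other metadata that real quantization formats add.
BITS_PER_WEIGHT = {"FP16": 16, "Q8": 8, "Q4": 4}


def weight_gb(params, bits):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_gb(PARAMS, bits):.1f} GB for weights")
```

At FP16 the weights alone come to about 14 GB, so the ~16GB figure quoted above presumably includes some allowance for activations and a short context, though the exact breakdown is not given here.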

Versioning Drift

9.0 / 10

AI2 uses clear semantic versioning and maintains a detailed changelog in the OLMo-core repository. They provide access to intermediate checkpoints (not just the final weights), allowing researchers to track the model's evolution throughout the training process. This 'model flow' approach is the gold standard for tracking drift and behavioral changes.

