
OLMo 3 7B Think

Parameters

7B

Context Length

65,536 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

25 Oct 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

32

Key-Value Heads

32

Position Embedding

Rotary Position Embedding (RoPE)

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

32

FFN Intermediate Size

11,008

Tokenizer

Vocabulary Size

100,278

Architecture Diagram

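[Architecture diagram: Input Tokens → Token Embedding (rotary position embeddings; hidden size 4,096; context 65,536; vocabulary 100,278) → 32 layers of {RMSNorm (pre-attention) → Multi-Head Attention (32 query / 32 key-value heads, head dimension 128) → residual add → RMSNorm (pre-FFN) → Feed-Forward Network (SwiGLU, intermediate size 11,008) → residual add} → Final RMSNorm → Output Logits]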

OLMo 3 7B Think

The OLMo 3 7B Think model is a specialized variant within the OLMo 3 family, developed by the Allen Institute for AI (Ai2). This model is engineered to address complex problems requiring multi-step logical inference by making its reasoning process transparent. It is designed to surface intermediate thinking steps, providing researchers and developers with explicit thinking tokens to examine the model's internal deliberations before reaching a final answer. This capability supports enhanced interpretability and auditability of AI systems.
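To make the idea of separating explicit thinking tokens from the final answer concrete, here is a minimal sketch of a post-processing helper. The `<think>` and `</think>` delimiters and the function name `split_reasoning` are assumptions chosen for illustration; the page does not name the actual tokens, so check the model's chat template before relying on them.

```python
import re

# Assumed delimiters (not confirmed by this page); adjust to the real chat template.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    If no delimited reasoning span is found, the reasoning is empty
    and the whole text is treated as the answer.
    """
    pattern = re.escape(THINK_OPEN) + r"(.*?)" + re.escape(THINK_CLOSE)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Hypothetical completion used only to exercise the helper.
sample = "<think>2 + 2 is 4, then double it.</think>The result is 8."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two parts separate lets an audit log store the reasoning while only the answer is shown to end users.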

Architecturally, OLMo 3 7B Think is a Transformer-style autoregressive language model with a dense architecture, comprising 7 billion parameters. It utilizes a multi-headed attention mechanism and incorporates Rotary Position Embeddings (RoPE) with scaling to support an extended context length of up to 65,536 tokens. The model's training methodology involves a multi-stage approach. It is initially pre-trained on the comprehensive Dolma 3 dataset and subsequently post-trained through Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning from Verifiable Rewards (RLVR) on custom Dolci-Think datasets. This layered training focuses on imbuing the model with robust reasoning skills, particularly in domains such as mathematics and coding, while ensuring the model's 'thought process' is explicitly generated.
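As a rough illustration of how rotary position embeddings work, the sketch below rotates pairs of vector components by a position-dependent angle and checks that the attention score depends only on the relative offset between query and key. It is a plain-Python toy, not the model's implementation: the base of 10000, the interleaved pairing, and the 8-dimensional vectors are assumptions, and the context-extension scaling mentioned above is not included.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Pair i/2 uses frequency base^(-i/dim), so later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (illustrative values only).
q = [0.1 * (i + 1) for i in range(8)]
k = [0.05 * (8 - i) for i in range(8)]

# The same relative offset (3) at two different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 105), rope_rotate(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

Because the score is a function of the offset alone, position information enters through attention rather than through a learned absolute position table.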

This variant is optimized for reasoning-intensive tasks, providing a capable foundation for academic research and practical Natural Language Processing (NLP) workflows that demand transparent problem-solving. Its design allows for efficient, inspectable reasoning capabilities, making advanced AI accessible on more modest hardware. The full transparency of the OLMo project, which includes the release of all training data, code, checkpoints, and associated training details under an Apache 2.0 license, fosters reproducibility and further scientific inquiry into model development and behavior.

About OLMo 3

OLMo (Open Language Model) is a series of fully open language models designed to enable the science of language models. Released by the Allen Institute for AI (Ai2), OLMo 3 provides complete access to training data (Dolma 3), code, checkpoints, logs, and evaluation methodologies. The family includes Base models for pretraining research, Instruct variants for chat and tool use, and Think variants with chain-of-thought reasoning capabilities. All models are trained with a staged approach that includes pretraining, mid-training, and long-context phases.



Evaluation Benchmarks

No evaluation benchmarks for OLMo 3 7B Think available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B+

84 / 100

OLMo 3 7B Think Model Integrity Report

Audit Note

OLMo 3 7B Think sets a high standard for structural transparency, providing public access to training code, energy consumption data, and the full 'model flow' from base to RLVR stages. Its primary strength lies in its permissive Apache 2.0 licensing and the release of its underlying Dolma 3 dataset. However, the integrity of its benchmark claims is significantly undermined by evidence of semantic data contamination, which suggests the model's reasoning performance may be partially inflated by exposure to test-like data during training.

Upstream

27.0 / 30

Architectural Provenance

9.5 / 10

OLMo 3 7B Think provides exemplary architectural transparency. The model is documented as a dense, decoder-only Transformer with 7 billion parameters, utilizing multi-headed attention and Rotary Position Embeddings (RoPE). The Allen Institute for AI (Ai2) has released the full 'model flow,' including the base model (OLMo 3 7B Base), the SFT and DPO intermediate checkpoints, and the final RLVR-tuned weights. The training methodology is explicitly detailed across three stages: pre-training on Dolma 3, mid-training for long-context (up to 65,536 tokens), and a 'thinking' stack involving SFT, DPO, and Reinforcement Learning from Verifiable Rewards (RLVR). Full training code is available in the OLMo-core GitHub repository.

Dataset Composition

9.0 / 10

The model's training data is among the most transparent in the industry. It is pre-trained on Dolma 3, a ~9.3 trillion token corpus (refined to a 6T token mix) with disclosed sources including web content, scientific PDFs (processed via olmOCR), codebases, and math problems. Post-training utilizes the Dolci-Think datasets (SFT, DPO, and RL variants), which are also publicly available. Ai2 provides tools like OlmoTrace to connect model outputs back to specific training data points, though the exact proportions of the final 'Dolma 3 Mix' are described in general categories (web, science, code, math) rather than a precise percentage table for every sub-component.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly accessible via the Hugging Face repository and integrated into the standard 'transformers' library. Its vocabulary size is 100,278 tokens, as listed in the specifications above, and the vocabulary files are public for verification. The tokenization approach is documented within the OLMo-core repository. The model's support for a 65,536-token context window is tied to RoPE scaling of its position embeddings rather than to the tokenizer itself.

Model

31.5 / 40

Parameter Density

9.0 / 10

The model is clearly identified as a dense architecture with 7.0 billion parameters. Unlike MoE models that often obscure active parameter counts, OLMo 3 7B Think explicitly states its total and active parameters are identical. Detailed architectural configurations (layers, heads, embedding dimensions) are available in the official config files on Hugging Face and GitHub.
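The parameter count can be sanity-checked from the dimensions listed in the specifications. The sketch below is a back-of-the-envelope estimate, assuming no bias terms, untied input and output embeddings, and ignoring the small RMSNorm weights; none of these assumptions is confirmed by this page.

```python
# Dimensions taken from the specification table on this page.
HIDDEN = 4096
LAYERS = 32
FFN = 11008
VOCAB = 100278

# Multi-head attention with 32 query and 32 key-value heads:
# four HIDDEN x HIDDEN projections (Q, K, V, output).
attn_per_layer = 4 * HIDDEN * HIDDEN

# SwiGLU feed-forward: gate, up, and down projections.
ffn_per_layer = 3 * HIDDEN * FFN

layer_params = attn_per_layer + ffn_per_layer
embed = VOCAB * HIDDEN

# Assumption: separate input embedding and output projection matrices.
total_untied = LAYERS * layer_params + 2 * embed

print(f"per layer: {layer_params:,}")
print(f"total:     {total_untied:,} (~{total_untied / 1e9:.2f}B)")
```

Under these assumptions the estimate comes to roughly 7.3 billion parameters, consistent with the model's 7B label.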

Training Compute

8.5 / 10

Ai2 provides significant detail regarding training compute, a rarity in the field. Documentation and community disclosures specify that the 7B model required approximately 234,000 H100 GPU hours for pre-training, with an average power consumption of ~621W per GPU, totaling ~146 MWh of energy. While the carbon footprint calculation depends on the specific data center's energy mix, the raw data required for such a calculation is publicly provided.
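The energy figure follows from the two quoted inputs. The short calculation below reproduces it; the grid carbon intensity is a placeholder value chosen only to show how a footprint estimate would be derived, not a number from Ai2.

```python
gpu_hours = 234_000   # approximate H100 GPU hours quoted above
avg_watts = 621       # approximate average draw per GPU quoted above

# watt-hours -> megawatt-hours
energy_mwh = gpu_hours * avg_watts / 1e6
print(f"energy: {energy_mwh:.1f} MWh")

# Hypothetical grid intensity (kg CO2e per kWh); replace with the data center's real mix.
ASSUMED_KG_CO2_PER_KWH = 0.4
co2_tonnes = energy_mwh * 1000 * ASSUMED_KG_CO2_PER_KWH / 1000
print(f"footprint at assumed intensity: {co2_tonnes:.1f} t CO2e")
```

The product of the rounded inputs is about 145.3 MWh, which agrees with the quoted ~146 MWh to within the rounding of the inputs.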

Benchmark Reproducibility

4.5 / 10

While Ai2 provides the OLMES evaluation framework and lists scores for standard benchmarks (MATH: 96.1%, HumanEvalPlus: 91.4%), independent audits have identified significant issues with 'soft contamination'—where semantic duplicates of benchmark problems (such as CodeForces and ZebraLogic) were found within the training sets. This significantly complicates the ability to reproduce these results as measures of general reasoning rather than data memorization. The score is penalized for these undisclosed contamination issues that affect the validity of the reported benchmarks.

Identity Consistency

9.5 / 10

The model exhibits high identity consistency, correctly identifying itself as 'Olmo' and an AI assistant developed by the Allen Institute for AI. It is transparent about its 'thinking' nature, using explicit tokens to separate reasoning from final answers. There are no reported instances of the model claiming to be a competitor's product (e.g., GPT-4 or Claude).

Downstream

25.5 / 30

License Clarity

10.0 / 10

The model, its weights, training code, and datasets are all released under the Apache 2.0 license. This is a highly permissive, standard open-source license with no hidden commercial restrictions or conflicting terms of service. The licensing is consistent across all repositories and official documentation.

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented by both the provider and the community. The model requires approximately 14GB of VRAM for FP16 inference, and quantization profiles (GGUF, EXL2) are widely available with documented impacts on memory and performance. Ai2 provides guidance on running the model on consumer hardware, and third-party calculators accurately reflect the model's actual footprint.
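The 14 GB figure corresponds to the weights alone. A rough estimate that also covers the key-value cache can be sketched from the listed dimensions, assuming an FP16 cache, a round 7.0 billion parameters, and no allowance for activations or framework overhead.

```python
PARAMS = 7.0e9        # nominal parameter count
BYTES_FP16 = 2

LAYERS = 32
KV_HEADS = 32
HEAD_DIM = 128

# Weights: one FP16 value per parameter.
weights_gb = PARAMS * BYTES_FP16 / 1e9

# KV cache: keys and values for every layer and KV head, per token.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16


def kv_cache_gib(context_tokens: int) -> float:
    """KV cache size in GiB for one sequence of the given length."""
    return kv_bytes_per_token * context_tokens / 2**30


print(f"weights: {weights_gb:.1f} GB")
print(f"KV cache per token: {kv_bytes_per_token / 2**20:.2f} MiB")
for ctx in (1024, 32768, 65536):
    print(f"KV cache at {ctx:>6} tokens: {kv_cache_gib(ctx):.1f} GiB")
```

Because every attention head keeps its own keys and values (32 KV heads), the cache grows by about 0.5 MiB per token under these assumptions, so long contexts can need more memory than the weights themselves.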

Versioning Drift

7.5 / 10

Ai2 maintains a clear versioning history, as seen with the rapid release and documentation of OLMo 3.1 to address specific reasoning improvements. A detailed CHANGELOG is maintained in the OLMo-core repository. However, because the model is part of a 'model flow' with frequent checkpoint releases, tracking specific behavioral drift between minor intermediate versions can be challenging for end-users.
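The category subtotals and the overall score can be checked by summing the individual sub-scores listed in the report. The grouping below simply mirrors the report's headings.

```python
# Sub-scores as listed in the integrity report, grouped by category.
scores = {
    "Upstream": [9.5, 9.0, 8.5],          # provenance, dataset, tokenizer
    "Model": [9.0, 8.5, 4.5, 9.5],        # density, compute, benchmarks, identity
    "Downstream": [10.0, 8.0, 7.5],       # license, hardware, versioning
}

subtotals = {name: sum(values) for name, values in scores.items()}
total = sum(subtotals.values())

for name, value in subtotals.items():
    print(f"{name}: {value}")
print(f"Total: {total} / 100")
```

The sums match the reported 27.0, 31.5, and 25.5, giving the total of 84 out of 100.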
