OLMo 3 7B Think

Open Source

Open Weights

Parameters

7B

Context Length

65,536 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

25 Oct 2025

Knowledge Cutoff

Dec 2024

System Requirements

VRAM requirements for different quantization methods and context sizes

1,024 tokens

16.76 GB VRAM

Consumer

1x RTX 4090

24GB VRAM

Datacenter

1x NVIDIA A100

80GB VRAM

Apple Silicon

1x Apple M3 Max

128GB VRAM

65,536 tokens

52.28 GB VRAM

Consumer

3x RTX 4090

24GB VRAM

Datacenter

1x NVIDIA A100

80GB VRAM

Apple Silicon

1x Apple M3 Max

128GB VRAM
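
The two published data points can be used for a rough estimate at other context lengths. The sketch below assumes memory grows linearly with context between 1,024 and 65,536 tokens (a simplification that ignores framework overhead and multi-GPU sharding costs); the function names are hypothetical and not part of any tool referenced on this page.

```python
import math

# Published data points from the table above: (context tokens, VRAM in GB).
SHORT_CTX, SHORT_VRAM_GB = 1_024, 16.76
LONG_CTX, LONG_VRAM_GB = 65_536, 52.28


def estimate_vram_gb(context_tokens: int) -> float:
    """Linearly interpolate VRAM between the two published points (an assumption)."""
    per_token_gb = (LONG_VRAM_GB - SHORT_VRAM_GB) / (LONG_CTX - SHORT_CTX)
    return SHORT_VRAM_GB + per_token_gb * (context_tokens - SHORT_CTX)


def gpus_needed(vram_gb: float, gpu_memory_gb: float = 24.0) -> int:
    """Smallest number of identical GPUs whose combined memory covers vram_gb."""
    return math.ceil(vram_gb / gpu_memory_gb)


if __name__ == "__main__":
    for ctx in (1_024, 8_192, 32_768, 65_536):
        need = estimate_vram_gb(ctx)
        print(f"{ctx:>6} tokens: ~{need:.2f} GB, {gpus_needed(need)}x 24 GB GPU(s)")
```

At the two published endpoints this reproduces the table's consumer recommendations: one 24 GB card at 1,024 tokens and three at 65,536 tokens. Intermediate values are estimates only.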

Evaluation Benchmarks

No evaluation benchmarks for OLMo 3 7B Think available.

About OLMo 3 7B Think

OLMo 3 7B Think is the reasoning-focused variant of the OLMo 3 family, developed by the Allen Institute for AI (Ai2). It targets problems that require multi-step logical inference and makes its reasoning visible: before giving a final answer, the model emits explicit thinking tokens that expose its intermediate steps. Researchers and developers can inspect this trace, which makes the model's outputs easier to interpret and audit.
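
To inspect the reasoning trace separately from the final answer, the generated text has to be split at the thinking delimiters. The following is a minimal sketch that assumes the trace is wrapped in `<think>` and `</think>` tags; these tag names and the `split_thinking` helper are illustrative assumptions, so check the model's chat template for the actual delimiters.

```python
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    thinking: str  # intermediate reasoning trace (may be empty)
    answer: str    # text the model presents as its final answer


def split_thinking(
    text: str, open_tag: str = "<think>", close_tag: str = "</think>"
) -> ParsedOutput:
    """Split generated text into a reasoning trace and a final answer.

    The default tag names are assumptions for illustration only.
    """
    start = text.find(open_tag)
    end = text.find(close_tag)
    if end == -1:
        # No closing delimiter found: treat the whole text as the answer.
        return ParsedOutput("", text.strip())
    # The opening tag may be absent from the generated text (for example,
    # if it was already part of the prompt), so fall back to offset 0.
    trace_start = start + len(open_tag) if start != -1 and start < end else 0
    thinking = text[trace_start:end].strip()
    answer = text[end + len(close_tag):].strip()
    return ParsedOutput(thinking, answer)


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4.</think>The answer is 4."
    parsed = split_thinking(sample)
    print("trace :", parsed.thinking)
    print("answer:", parsed.answer)
```

Keeping the trace and the answer as separate fields makes it straightforward to log or audit the reasoning without showing it to end users.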

Architecturally, OLMo 3 7B Think is a dense, autoregressive Transformer language model with 7 billion parameters. It uses multi-head attention and Rotary Position Embeddings (RoPE) with scaling to support a context length of up to 65,536 tokens. Training is multi-stage: the model is first pre-trained on the Dolma 3 dataset, then post-trained with Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning with Verifiable Rewards (RLVR) on the Dolci-Think datasets. Post-training targets reasoning skills, particularly in mathematics and coding, and trains the model to generate its thought process explicitly.
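
As an illustration of how rotary embeddings encode position, the sketch below rotates adjacent pairs of a head vector by position-dependent angles, using the RoPE base of 500,000 listed in the specifications on this page. It is a plain-Python sketch and not the model's implementation: the interleaved pairing is an assumed convention, the 8-dimensional vectors are toy values, and the context-extension scaling mentioned above is not modeled.

```python
import math
from typing import List

ROPE_THETA = 500_000.0  # base listed under "RoPE Theta" on this page


def rope_rotate(vec: List[float], position: int, theta: float = ROPE_THETA) -> List[float]:
    """Apply rotary position embedding to one head vector of even length d.

    Pair k, i.e. (vec[2k], vec[2k+1]), is rotated by position * theta**(-2k/d).
    Interleaved pairing is an assumption; some implementations pair k with k + d/2.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out: List[float] = []
    for i in range(0, d, 2):  # i = 2k, the even index of pair k
        angle = position * theta ** (-i / d)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.1, -0.4, 0.3, 0.9, -0.7, 0.2, 0.5, -0.6]  # toy query vector
    k = [0.8, 0.1, -0.2, 0.4, 0.6, -0.3, 0.0, 0.7]   # toy key vector
    # The attention score depends only on the offset between positions (3 in both cases).
    near = dot(rope_rotate(q, 10), rope_rotate(k, 7))
    far = dot(rope_rotate(q, 103), rope_rotate(k, 100))
    print(abs(near - far) < 1e-9)
```

Because each pair is rotated rather than shifted, the dot product between a rotated query and a rotated key depends only on their relative offset, which is the property that makes RoPE suitable for attention.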

This variant is optimized for reasoning-intensive tasks and is intended as a foundation for academic research and practical Natural Language Processing (NLP) workflows that need transparent problem-solving. At 7 billion parameters, it offers inspectable reasoning on relatively modest hardware. The OLMo project is fully open: training data, code, checkpoints, and associated training details are released under an Apache 2.0 license, which supports reproducibility and further scientific study of model development and behavior.

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

500,000

Sliding Window Attention

Yes

Sliding Window Size

4,096

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

FFN Intermediate Size (Dense)

11,008

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

100,278
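
The sliding window entry above means that, in layers using it, each token attends only to a bounded number of recent positions rather than the full context. The sketch below builds such a causal mask with a tiny window for readability. Whether the window counts the current token is an assumed convention, and this page does not state which layers use sliding window attention.

```python
from typing import List


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    A key is visible if it is not in the future (k <= q) and lies within the
    last `window` positions, counting the query position itself (q - k < window).
    Including the current token in the window is an assumed convention.
    """
    return [
        [(k <= q) and (q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Tiny window for readability; the page lists a window size of 4,096.
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a fixed window, the number of keys each query attends to is capped at the window size, so per-layer attention cost grows linearly with sequence length instead of quadratically.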

Model Integrity

Total Score

B+

84 / 100

Upstream

27.0 / 30

Model

31.5 / 40

Downstream

25.5 / 30

About OLMo 3

OLMo (Open Language Model) is a series of fully open language models designed to enable the science of language models. Released by the Allen Institute for AI (Ai2), OLMo 3 provides complete access to training data (Dolma 3), code, checkpoints, logs, and evaluation methodologies. The family includes Base models for pretraining research, Instruct variants for chat and tool use, and Think variants with chain-of-thought reasoning capabilities. All models are trained with a staged approach that includes pretraining, mid-training, and long-context phases.