Parameters
7B
Context Length
65,536 tokens
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
25 Oct 2025
Knowledge Cutoff
Dec 2024
Attention Structure
Multi-Head Attention
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
32
Activation Function
SwiGLU
Normalization
-
Position Embedding
Rotary Position Embedding (RoPE)
VRAM requirements for different quantization methods and context sizes
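A rough estimate can be derived from the specifications listed above. The following is a minimal sketch, not a measured result: it assumes a nominal 7e9 parameters, a per-head dimension of hidden size divided by head count, and a key-value cache stored at 16 bits per value, and it ignores activations and framework overhead. The function names are illustrative and do not come from any library.

```python
# Back-of-the-envelope VRAM estimate: model weights plus KV cache.
# Assumptions (not from the model card): nominal 7e9 parameters,
# head_dim = hidden / heads, 16-bit KV cache, no activation or runtime overhead.

N_PARAMS = 7e9          # nominal "7B"; the exact count is an assumption
N_LAYERS = 32
HIDDEN = 4096
N_HEADS = 32
N_KV_HEADS = 32         # equal to N_HEADS: full multi-head attention
HEAD_DIM = HIDDEN // N_HEADS
GIB = 1024 ** 3


def weight_bytes(bits_per_weight: float, n_params: float = N_PARAMS) -> float:
    """Bytes needed to hold the weights at a given nominal bit width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes for the K and V tensors across all layers for one sequence."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_tokens * bytes_per_value


def estimate_gib(bits_per_weight: float, context_tokens: int) -> float:
    """Weights plus KV cache, in GiB."""
    total = weight_bytes(bits_per_weight) + kv_cache_bytes(context_tokens)
    return total / GIB


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        for ctx in (1024, 65536):
            print(f"{label} weights, {ctx} tokens: ~{estimate_gib(bits, ctx):.1f} GiB")
```

Under these assumptions the cache costs 0.5 MiB per token, so a full 65,536-token context needs 32 GiB for the cache alone, more than the weights at any of the bit widths shown. Real quantization formats store scales alongside the weights, so their effective bits per weight run somewhat above the nominal figure.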
OLMo 3 7B Base is a foundation model in the OLMo 3 family from the Allen Institute for AI (Ai2), built to advance the scientific study and development of large language models. It has 7 billion parameters and was trained on 5.93 trillion tokens from the Dolma 3 dataset. The OLMo 3 project is committed to full transparency: alongside the model weights, it publishes the training data, code, intermediate checkpoints, logs, and evaluation methodologies. This makes results reproducible and supports detailed research into model behavior and the development process.
Architecturally, OLMo 3 7B Base is a dense, decoder-only transformer. Training is staged, with distinct pretraining, mid-training, and long-context phases that build general language capabilities and extended input handling. The model has 32 layers, a hidden dimension of 4096, and multi-head attention with 32 query heads and 32 key-value heads. It uses Rotary Positional Embeddings (RoPE), with scaling applied to support a context length of 65,536 tokens.
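To illustrate the rotary embedding mentioned above, here is a minimal pure-Python sketch of plain, unscaled RoPE applied to a single head vector. It is not OLMo 3's implementation: the base of 10000 is a common default assumed here, the context-extension scaling is not specified in this description and is left out, and the function names are hypothetical.

```python
import math

ROPE_BASE = 10000.0  # assumption: a common default, not confirmed for OLMo 3


def rope_rotate(vec, position, base=ROPE_BASE):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by position * base**(-2i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.1 * (i + 1) for i in range(8)]
    k = [0.05 * (8 - i) for i in range(8)]
    # The score depends only on the relative offset (2 in both cases).
    near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
    far = dot(rope_rotate(q, 102), rope_rotate(k, 100))
    print(near, far)
```

The two printed scores agree up to floating-point error because the rotation encodes position so that a query-key dot product depends only on the distance between the two tokens. This sketch pairs adjacent elements; some implementations instead pair element i with element i + d/2, which is equivalent up to a permutation of dimensions.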
As a base model, OLMo 3 7B is intended primarily for pretraining research and as a starting point for fine-tuning on downstream tasks. It prioritizes general capabilities, which further post-training can specialize for reasoning, tool use, and instruction following. The Apache 2.0 license permits broad use, including commercial applications.
OLMo (Open Language Model) is a series of fully open language models designed to enable the science of language models. Released by the Allen Institute for AI (Ai2), OLMo 3 provides complete access to training data (Dolma 3), code, checkpoints, logs, and evaluation methodologies. The family includes Base models for pretraining research, Instruct variants for chat and tool use, and Think variants with chain-of-thought reasoning capabilities. All models are trained with a staged approach that includes pretraining, mid-training, and long-context phases.
Ranking is for Local LLMs.
No evaluation benchmarks for OLMo 3 7B Base available.
Overall Rank
-
Coding Rank
-