趋近智
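Parameters
7B
Context Length
65,536 tokens
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
25 Oct 2025
Training Data Cutoff
Dec 2024
Attention Structure
Multi-Head Attention
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
32
Activation Function
SwiGLU
Normalization
-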
Position Embedding
Rotary Position Embedding (RoPE)
VRAM Requirements for Different Quantization Methods and Context Sizes
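As a rough guide, the memory needed for the weights alone can be approximated as parameter count times bytes per parameter. The sketch below assumes a nominal 7.0 billion parameters and three illustrative precisions (FP16, INT8, INT4). The bit widths and the function name `weight_memory_gib` are assumptions chosen for illustration, not values taken from this page, and the estimate ignores quantization metadata, activations, and the KV cache.

```python
# Rough weight-memory estimate. Excludes activations, KV cache, and
# quantization metadata (scales, zero points). All inputs are assumptions.
NOMINAL_PARAMS = 7_000_000_000  # "7B" taken as exactly 7e9 for illustration

# Illustrative precisions (hypothetical choices, not from the model card).
BITS_PER_PARAM = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Return the approximate weight footprint in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name}: {weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Real deployments need headroom beyond these figures, since the runtime also holds activations and, for long prompts, a key-value cache that grows with context size.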
OLMo 3 7B Base is a foundation model in the OLMo 3 family from the Allen Institute for AI (Ai2), a family built to advance the scientific study and development of large language models. This variant has 7 billion parameters and was trained on 5.93 trillion tokens from the Dolma 3 dataset. The OLMo 3 project commits to full transparency: it publishes the model weights together with the training data, code, intermediate checkpoints, logs, and evaluation methodology. This openness makes results reproducible and supports detailed research into model behavior and development processes.
Architecturally, OLMo 3 7B Base is a dense, decoder-only transformer. Training is staged, with distinct pretraining, mid-training, and long-context phases that target broad language capability and long-input handling. The model has 32 layers and a hidden dimension of 4096, and it uses multi-head attention with 32 query heads and 32 key-value heads. It uses Rotary Positional Embeddings (RoPE), with scaling applied to support a context length of 65,536 tokens.
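The architecture figures above are enough to estimate how the key-value cache grows with context. The following is a minimal sketch, assuming a 16-bit cache, a single sequence, and that every layer caches keys and values for the full context. Those assumptions, along with the function name `kv_cache_bytes`, are illustrative and not stated on this page; runtimes that quantize the cache or limit attention to a window would need less.

```python
# KV-cache size estimate from the architecture figures listed on this page.
NUM_LAYERS = 32
HIDDEN_SIZE = 4096
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 32  # equal to the query heads, i.e. plain multi-head attention
HEAD_DIM = HIDDEN_SIZE // NUM_QUERY_HEADS  # 4096 / 32 = 128
BYTES_PER_VALUE = 2  # assumption: cache stored in a 16-bit format


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers one key vector and one value vector per head per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * num_tokens


if __name__ == "__main__":
    for tokens in (4_096, 16_384, 65_536):
        print(f"{tokens} tokens: {kv_cache_bytes(tokens) / 2**30:.1f} GiB")
```

Under these assumptions the cache costs 0.5 MiB per token, which comes to 32 GiB for one sequence at the full 65,536-token context, well above the 16-bit weight footprint of a 7B model.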
As a base model, OLMo 3 7B is intended primarily for pretraining research and as a starting point for fine-tuning on downstream tasks. Its design prioritizes general capabilities, which further post-training can specialize for reasoning, tool use, and instruction following. The Apache 2.0 license permits broad use, including commercial applications, which supports community collaboration.
OLMo (Open Language Model) is a series of fully open language models designed to enable the science of language models. Released by the Allen Institute for AI (Ai2), OLMo 3 provides complete access to its training data (Dolma 3), code, checkpoints, logs, and evaluation methodology. The family includes Base models for pretraining research, Instruct variants for chat and tool use, and Think variants with chain-of-thought reasoning. All models are trained with a staged approach that includes pretraining, mid-training, and long-context phases.
Rankings apply to local LLMs.
No evaluation benchmarks are available for OLMo 3 7B Base.