Command A

Parameters: 111B
Context length: 256K
Modality: Text
Architecture: Dense
License: CC-BY-NC
Release date: 13 Mar 2025
Training data cutoff: Jun 2024

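Technical Specifications

Attention structure: Grouped Query Attention
Hidden dimension size: -
Number of layers: -
Attention heads: -
Key-value heads: -
Activation function: SwiGLU
Normalization: -
Positional embedding: RoPE in sliding window layers; none in global attention layers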

Command A

Cohere Command A is a large generative language model built for enterprise workflows, particularly tool use, agents, and retrieval-augmented generation (RAG). It has 111 billion parameters yet is designed for deployment on dual-GPU hardware configurations, offering a high-throughput option for production environments. Its design prioritizes accuracy and speed on business tasks, and its 256,000-token context window supports processing long corporate documents and extended conversation histories.
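
To put the 111 billion parameter figure in perspective, the sketch below estimates the memory needed to hold the weights alone at a few storage precisions. It is a rough back-of-the-envelope calculation, not a published requirement: the function name `weight_memory_gb` and the precision table are illustrative assumptions, the source does not state which precision a deployment uses, and the KV cache, activations, and runtime overhead are excluded.

```python
# Rough weight-only memory estimate for a dense model.
# Assumption: each precision uses its usual storage size per parameter.
# The KV cache, activations, and runtime overhead are NOT included.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the memory needed to hold the weights, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    command_a_params = 111e9  # 111B parameters, as stated in the description
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(command_a_params, precision)
        print(f"{precision}: {size:.1f} GB")
```

Under these assumptions the weights alone come to about 222 GB at fp16, 111 GB at int8, and 55.5 GB at int4. Real memory use is higher and grows with context length.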

The model is a decoder-only transformer with a hybrid attention scheme that balances local and global context: three out of every four layers use sliding window attention for efficient local modeling, and every fourth layer uses full global attention to preserve long-range dependencies. It uses Grouped Query Attention (GQA) to improve inference throughput, SwiGLU activation functions, and no bias terms, which helps stabilize training. Positional information comes from Rotary Positional Embeddings (RoPE) in the sliding window layers, while the global attention layers use no positional embeddings.
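
The interleaving of local and global layers described above can be sketched in a few lines. This is an illustration under stated assumptions, not Cohere's implementation: the helper names `layer_pattern` and `causal_mask` are hypothetical, and the layer count and window size used below are toy values, since the page does not list the real ones.

```python
from typing import List, Optional


def layer_pattern(num_layers: int, global_every: int = 4) -> List[str]:
    """Label each layer; every `global_every`-th layer uses global attention."""
    return [
        "global" if (i + 1) % global_every == 0 else "sliding"
        for i in range(num_layers)
    ]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives full causal (global) attention. Otherwise each query
    sees only the most recent `window` positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q and (window is None or q - k < window)
            row.append(visible)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Toy values: 8 layers and a window of 2 are illustrative only.
    print(layer_pattern(8))
    print(causal_mask(4, window=2)[3])  # last query in a sliding window layer
    print(causal_mask(4)[3])            # last query in a global layer
```

In a sliding window layer the last query sees only its two most recent positions, while in a global layer it sees the whole prefix. This is why the occasional global layer is enough to carry long-range information through the stack.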

Command A is trained on 23 languages, including major business languages such as English, French, Spanish, Chinese, and Arabic. It is aligned for conversational tool use, so it can interact with external APIs, databases, and search engines. This alignment, achieved through supervised fine-tuning and preference optimization, makes it well suited to multi-step agentic reasoning and to financial data tasks that require extracting numerical details from complex contexts.

Evaluation Benchmarks

Overall rank: #106
Coding rank: #90

Benchmark | Score | Rank
- | 0.86 | 6
- | 0.12 | 9

Model Transparency

Total score: B+ (73 / 100)
