| Attribute | Value |
|---|---|
| Parameters | 111B |
| Context length | 256K |
| Modality | Text |
| Architecture | Dense |
| License | CC-BY-NC |
| Release date | 13 Mar 2025 |
| Training data cutoff | Jun 2024 |
| Attention structure | Grouped Query Attention |
| Hidden dimension | - |
| Layers | - |
| Attention heads | - |
| Key-value heads | - |
| Activation function | SwiGLU |
| Normalization | - |
| Positional embedding | RoPE (local layers); none (global layers) |
Cohere Command A is a large-scale generative model engineered for high-performance enterprise workflows, particularly those involving tool use, agents, and retrieval-augmented generation (RAG). Developed to provide a high-throughput alternative for production environments, the model maintains a significant parameter count of 111 billion while optimizing for deployment on common dual-GPU hardware configurations. Its design focuses on business-critical accuracy and speed, supporting a standard context window of 256,000 tokens to facilitate the processing of extensive corporate documentation and long-form conversational histories.
The model's architecture is a decoder-only transformer that utilizes several sophisticated structural innovations to balance local and global context. It features a hybrid attention mechanism where three-quarters of the layers employ sliding window attention for efficient local modeling, while every fourth layer uses a full global attention mechanism to maintain long-range dependencies. Technical specifications include the use of Grouped Query Attention (GQA) for optimized inference throughput, SwiGLU activation functions for improved gradient flow, and the omission of bias terms to stabilize the training process. Positional information is handled via Rotary Positional Embeddings (RoPE) in local attention layers, whereas global layers utilize a position-agnostic approach.
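The interleaving described above (three sliding-window layers followed by one full-attention layer) can be sketched as follows. The layer count and window size used here are illustrative assumptions, since the card lists these hyperparameters as unpublished:

```python
# Sketch of the hybrid attention layout: three sliding-window layers
# followed by one full global-attention layer. NUM_LAYERS and
# SLIDING_WINDOW are illustrative assumptions, not published values.
NUM_LAYERS = 64
SLIDING_WINDOW = 4096

def attention_kind(layer_idx: int) -> str:
    """Every fourth layer (indices 3, 7, 11, ...) uses full global attention."""
    return "global" if (layer_idx + 1) % 4 == 0 else "sliding_window"

def visible_tokens(layer_idx: int, query_pos: int) -> range:
    """Causal positions a query token at query_pos can attend to in this layer."""
    if attention_kind(layer_idx) == "global":
        return range(0, query_pos + 1)            # full causal context
    lo = max(0, query_pos + 1 - SLIDING_WINDOW)   # bounded local window
    return range(lo, query_pos + 1)

layout = [attention_kind(i) for i in range(NUM_LAYERS)]
assert layout.count("global") == NUM_LAYERS // 4  # one quarter of layers are global
```

The global layers keep long-range dependencies reachable at 256K-token contexts, while the sliding-window layers keep attention cost linear in sequence length for the bulk of the stack.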
Optimized for global enterprise deployment, Command A is trained across 23 languages, including major business languages such as English, French, Spanish, Chinese, and Arabic. The model is specifically aligned for conversational tool use, allowing it to interact with external APIs, databases, and search engines with high precision. This alignment, achieved through supervised fine-tuning and preference optimization, makes it particularly effective for multi-step agentic reasoning and financial data manipulation where extracting numerical details from complex contexts is required.
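The conversational tool use described above can be illustrated with a minimal tool definition in the common JSON-schema convention. The tool name, its parameters, and the call shape below are hypothetical examples, not part of Cohere's published specification:

```python
import json

# Hypothetical tool definition in the widely used JSON-schema tool convention;
# the name "lookup_invoice" and its fields are illustrative only.
lookup_invoice = {
    "type": "function",
    "function": {
        "name": "lookup_invoice",
        "description": "Fetch an invoice record by ID from the billing database.",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {
                    "type": "string",
                    "description": "Invoice identifier, e.g. INV-1042.",
                },
            },
            "required": ["invoice_id"],
        },
    },
}

# A model aligned for tool use emits a structured call rather than free text;
# the application executes it and feeds the result back into the conversation.
model_tool_call = {
    "name": "lookup_invoice",
    "arguments": json.dumps({"invoice_id": "INV-1042"}),
}
args = json.loads(model_tool_call["arguments"])
assert args["invoice_id"] == "INV-1042"
```

This round-trip of structured call, application-side execution, and result injection is what the multi-step agentic alignment targets: the model must extract exact identifiers and numbers from context to populate the arguments correctly.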
Ranking: #106
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Summarization | ProLLM Summarization | 0.86 | 6 |
| Coding | Aider Coding | 0.12 | 9 |