| Specification | Value |
|---|---|
| Parameters | 111B |
| Context Length | 256K |
| Modality | Text |
| Architecture | Dense |
| License | CC-BY-NC |
| Release Date | 13 Mar 2025 |
| Knowledge Cutoff | Jun 2024 |
| Attention Structure | Grouped Query Attention (GQA) |
| Hidden Dimension Size | - |
| Number of Layers | - |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | SwiGLU |
| Normalization | - |
| Position Embedding | Rotary Position Embedding (RoPE) |
Cohere Command A is a large-scale generative model engineered for high-performance enterprise workflows, particularly those involving tool use, agents, and retrieval-augmented generation (RAG). Developed to provide a high-throughput alternative for production environments, the model maintains a significant parameter count of 111 billion while optimizing for deployment on common dual-GPU hardware configurations. Its design focuses on business-critical accuracy and speed, supporting a standard context window of 256,000 tokens to facilitate the processing of extensive corporate documentation and long-form conversational histories.
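As a rough illustration of the dual-GPU deployment point, the sketch below estimates weights-only memory for a 111B-parameter dense model at common precisions. It ignores KV-cache and activation memory, so it is a back-of-envelope check rather than an official sizing guide.

```python
# Rough, weights-only memory estimate for a 111B-parameter dense model.
# Real deployments also need KV-cache and activation memory, which this
# back-of-envelope sketch deliberately ignores.
PARAMS = 111e9

for name, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name:>9}: ~{gib:,.0f} GiB of weight memory")

# fp16/bf16: ~207 GiB -> exceeds a 2 x 80 GB GPU pair for weights alone
#      int8: ~103 GiB -> fits across two 80 GB GPUs
#      int4:  ~52 GiB -> fits on a single 80 GB GPU (weights only)
```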
The model's architecture is a decoder-only transformer that utilizes several sophisticated structural innovations to balance local and global context. It features a hybrid attention mechanism where three-quarters of the layers employ sliding window attention for efficient local modeling, while every fourth layer uses a full global attention mechanism to maintain long-range dependencies. Technical specifications include the use of Grouped Query Attention (GQA) for optimized inference throughput, SwiGLU activation functions for improved gradient flow, and the omission of bias terms to stabilize the training process. Positional information is handled via Rotary Positional Embeddings (RoPE) in local attention layers, whereas global layers utilize a position-agnostic approach.
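The interleaved layer pattern described above can be summarized with a minimal Python sketch. The layer count and window size used here are illustrative placeholders, not Command A's published values.

```python
# Illustrative sketch of the interleaved attention pattern described above:
# three of every four layers use sliding-window attention with RoPE, and every
# fourth layer attends globally with no positional embedding. All numbers
# below (layer count, window size) are placeholders, not published values.
from dataclasses import dataclass


@dataclass
class LayerSpec:
    index: int
    attention: str           # "sliding_window" or "full"
    position_encoding: str   # "rope" or "none"
    window: int | None       # local attention span in tokens, if any


def build_layer_plan(num_layers: int, window: int = 4096) -> list[LayerSpec]:
    """Return the per-layer attention plan for the hybrid local/global stack."""
    plan = []
    for i in range(num_layers):
        if (i + 1) % 4 == 0:
            # Every fourth layer: full global attention, position-agnostic.
            plan.append(LayerSpec(i, "full", "none", None))
        else:
            # Local layer: sliding-window attention with rotary embeddings.
            plan.append(LayerSpec(i, "sliding_window", "rope", window))
    return plan


if __name__ == "__main__":
    for spec in build_layer_plan(8):
        print(spec)
```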
Optimized for global enterprise deployment, Command A is trained across 23 languages, including major business languages such as English, French, Spanish, Chinese, and Arabic. The model is specifically aligned for conversational tool use, allowing it to interact with external APIs, databases, and search engines with high precision. This alignment, achieved through supervised fine-tuning and preference optimization, makes it particularly effective for multi-step agentic reasoning and financial data manipulation where extracting numerical details from complex contexts is required.
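For a sense of what conversational tool use looks like in practice, the following hypothetical sketch shows a generic JSON-style tool definition and a structured tool call that an agent loop would parse, execute, and feed back to the model. The schema and field names are illustrative assumptions, not Cohere's actual API surface.

```python
# Hypothetical illustration of the conversational tool-use pattern the model
# is aligned for. The tool name, fields, and call format are made up for
# illustration and do not reflect Cohere's actual API.
import json

# A tool the host application exposes to the model.
tools = [{
    "name": "query_financials",
    "description": "Look up a numeric figure in an indexed financial report.",
    "parameters": {
        "type": "object",
        "properties": {
            "company": {"type": "string"},
            "metric": {"type": "string"},
            "fiscal_year": {"type": "integer"},
        },
        "required": ["company", "metric", "fiscal_year"],
    },
}]

# A structured tool call the model might emit, which the agent loop parses,
# executes against the real data source, and returns as an observation.
model_tool_call = json.dumps({
    "tool": "query_financials",
    "arguments": {"company": "ExampleCorp", "metric": "net_revenue", "fiscal_year": 2024},
})

print(json.loads(model_tool_call)["arguments"])
```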
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Summarization | ProLLM Summarization | 0.86 | 6 |
| Coding | Aider Coding | 0.12 | 9 |
Overall Rank: #106
Coding Rank: #90