
Command A

Parameters: 111B
Context Length: 256K
Modality: Text
Architecture: Dense
License: CC-BY-NC
Release Date: 13 Mar 2025
Knowledge Cutoff: -

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: -
Attention Heads: -
Key-Value Heads: -
Activation Function: SwiGLU
Normalization: -
Position Embedding: Rotary (RoPE) in sliding-window layers; none in global layers

System Requirements

VRAM requirements depend on the weight quantization method and the context size.
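As a rough guide, the dominant VRAM costs are the quantized weights plus the KV cache. The sketch below is a back-of-the-envelope estimate in Python; since this card leaves the layer count and head counts unspecified, the KV-cache parameters are placeholders you would fill in, and only the weight-size arithmetic for the 111B parameter count is concrete.

```python
def estimate_vram_gb(n_params: float, bits_per_weight: int,
                     context_len: int, n_layers: int,
                     n_kv_heads: int, head_dim: int) -> float:
    """Rough VRAM estimate: quantized weights plus an FP16 KV cache.

    n_layers, n_kv_heads, and head_dim are placeholders; this card
    does not publish them for Command A.
    """
    weights_gb = n_params * bits_per_weight / 8 / 1e9
    # K and V tensors per layer, per token, 2 bytes each at FP16.
    kv_cache_gb = 2 * n_layers * n_kv_heads * head_dim * context_len * 2 / 1e9
    return weights_gb + kv_cache_gb

# Weight memory alone for the 111B-parameter model:
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{111e9 * bits / 8 / 1e9:.0f} GB")
# 16-bit: ~222 GB, 8-bit: ~111 GB, 4-bit: ~56 GB
```

Note that the interleaved sliding-window layers cap their KV cache at the window size, so a flat per-token estimate like this one overstates the cache cost of the local layers.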

Command A

Cohere Command A is a large language model engineered for enterprise applications that demand high performance, security, and computational efficiency. It is designed to excel in business-critical tasks such as tool use, retrieval-augmented generation (RAG), agentic workflows, and multilingual use cases. The model is notably efficient, able to run on minimal GPU configurations, which reduces the computational overhead of private deployments. Command A is trained to perform effectively across 23 languages, making it applicable in diverse global business environments.

The architectural foundation of Command A is an optimized decoder-only transformer. It interleaves attention mechanisms in a repeating pattern: three layers of sliding-window attention with Rotary Positional Embeddings (RoPE) for efficient local context modeling, followed by a fourth layer of global attention without positional embeddings, which allows unrestricted token interactions across extended sequences. Further architectural choices include grouped-query attention to raise throughput, input and output embeddings shared to conserve memory, and the omission of bias terms to stabilize training. The model uses SwiGLU activation functions.
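To make the interleaving concrete, here is a minimal PyTorch sketch of the 3:1 attention pattern. All dimensions are illustrative stand-ins, since this card does not publish the hidden size, layer count, or head counts, and the learned projections are mocked with random tensors; only the masking logic and the RoPE-versus-no-PE split follow the description above.

```python
import torch

# Illustrative dimensions only; the card leaves Command A's hidden size,
# layer count, and head counts unspecified.
N_HEADS, N_KV_HEADS, HEAD_DIM = 8, 2, 64
WINDOW = 128  # sliding-window span for the local layers

def rope(x: torch.Tensor) -> torch.Tensor:
    """Rotary positional embedding over (batch, heads, seq, head_dim)."""
    seq, half = x.shape[-2], x.shape[-1] // 2
    freqs = 1.0 / (10000 ** (torch.arange(half) / half))
    angles = torch.arange(seq)[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attention(q, k, v, window=None):
    """Causal attention; a window restricts each token to recent context."""
    seq = q.shape[-2]
    i = torch.arange(seq)[:, None]
    j = torch.arange(seq)[None, :]
    mask = j > i                      # causal: never attend to the future
    if window is not None:
        mask |= (i - j) >= window     # local: only the last `window` tokens
    scores = q @ k.transpose(-2, -1) / HEAD_DIM ** 0.5
    return scores.masked_fill(mask, float("-inf")).softmax(-1) @ v

def attention_sublayer(x: torch.Tensor, layer_idx: int) -> torch.Tensor:
    """One attention sublayer following the repeating 3:1 pattern."""
    batch, seq, _ = x.shape
    # Stand-ins for the learned Q/K/V projections.
    q = torch.randn(batch, N_HEADS, seq, HEAD_DIM)
    k = torch.randn(batch, N_KV_HEADS, seq, HEAD_DIM)
    v = torch.randn(batch, N_KV_HEADS, seq, HEAD_DIM)
    # Grouped-query attention: each K/V head serves several query heads.
    k = k.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
    v = v.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
    if layer_idx % 4 != 3:
        # Layers 0-2 of each block: sliding-window attention with RoPE.
        out = attention(rope(q), rope(k), v, window=WINDOW)
    else:
        # Every fourth layer: global attention, no positional embedding.
        out = attention(q, k, v)
    return out.transpose(1, 2).reshape(batch, seq, N_HEADS * HEAD_DIM)
```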

Command A is optimized for throughput and long-context reasoning. It supports a context length of 256,000 tokens, enabling it to process extensive documents in enterprise applications. The model is also designed for conversational interaction, generating responses in a chatty style and optionally using markdown for clarity. It is particularly adept at extracting and manipulating numerical information in financial settings, and it is trained for conversational tool use, interacting with external systems such as APIs and databases.
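As an illustration of conversational tool use, the sketch below assumes Cohere's Python SDK with the v2 chat API and a hypothetical get_stock_price tool; the tool name and schema are invented for the example, and field names should be checked against the current SDK documentation.

```python
import cohere

co = cohere.ClientV2()  # expects the API key in the CO_API_KEY env variable

# A hypothetical tool the model may decide to call; the schema follows
# the JSON-Schema style function format of the v2 chat API.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # invented for this example
        "description": "Return the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "What is ACME trading at?"}],
    tools=tools,
)

# If the model chose to call the tool, the calls appear on the message.
for call in response.message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

In a full agent loop, the tool's result would be appended to the conversation as a tool-role message and the chat call repeated so the model can compose its final answer from the returned data.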


Evaluation Benchmarks

Rankings are relative to other local LLMs. Overall rank: #25.

Benchmark                            Score   Rank
-                                    0.86    🥈 2
MMLU (General Knowledge)             0.81    🥉 3
-                                    0.23    10
LiveBench Agentic (Agentic Coding)   0.05    14
-                                    0.54    15
-                                    0.36    19
-                                    0.46    20
-                                    0.50    21

Rankings

Overall Rank: #25
Coding Rank: #32
