
Command A

Parameters

111B

Context Length

256K

Modality

Text

Architecture

Dense

License

CC-BY-NC

Release Date

13 Mar 2025

Knowledge Cutoff

Jun 2024

Technical Specifications

Attention Structure

Grouped Query Attention (GQA)

Hidden Dimension Size

-

Number of Layers

-

Attention Heads

-

Key-Value Heads

-

Activation Function

SwiGLU

Normalization

-

Position Embedding

Rotary Position Embedding (RoPE) in sliding-window layers; no positional embedding in global layers


Cohere Command A is a large-scale generative model engineered for high-performance enterprise workflows, particularly those involving tool use, agents, and retrieval-augmented generation (RAG). Developed to provide a high-throughput alternative for production environments, the model maintains a significant parameter count of 111 billion while optimizing for deployment on common dual-GPU hardware configurations. Its design focuses on business-critical accuracy and speed, supporting a standard context window of 256,000 tokens to facilitate the processing of extensive corporate documentation and long-form conversational histories.

The model's architecture is a decoder-only transformer that utilizes several sophisticated structural innovations to balance local and global context. It features a hybrid attention mechanism where three-quarters of the layers employ sliding window attention for efficient local modeling, while every fourth layer uses a full global attention mechanism to maintain long-range dependencies. Technical specifications include the use of Grouped Query Attention (GQA) for optimized inference throughput, SwiGLU activation functions for improved gradient flow, and the omission of bias terms to stabilize the training process. Positional information is handled via Rotary Positional Embeddings (RoPE) in local attention layers, whereas global layers utilize a position-agnostic approach.
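The interleaved attention pattern described above can be sketched in a few lines. This is an illustrative toy, not Command A's implementation: the window size, layer count, and the exact phase of the global layers are assumptions for demonstration.

```python
# Sketch of a hybrid attention layout: every fourth decoder layer uses full
# global attention, the remaining three-quarters use a causal sliding window.
# Window size and layer indices here are illustrative, not Command A's values.

def attention_kind(layer_idx: int, global_every: int = 4) -> str:
    """Return which attention type a given decoder layer would use."""
    return "global" if (layer_idx + 1) % global_every == 0 else "sliding"

def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal attention mask restricted to the last `window` tokens:
    query position q may attend to key positions (q - window, q]."""
    return [
        [(q - window < k <= q) for k in range(seq_len)]
        for q in range(seq_len)
    ]

pattern = [attention_kind(i) for i in range(8)]
# pattern: layers 0-2 sliding, layer 3 global, layers 4-6 sliding, layer 7 global
```

The sliding-window layers keep per-token attention cost constant in sequence length, while the periodic global layers let information propagate across the full 256K context.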

Optimized for global enterprise deployment, Command A is trained across 23 languages, including major business languages such as English, French, Spanish, Chinese, and Arabic. The model is specifically aligned for conversational tool use, allowing it to interact with external APIs, databases, and search engines with high precision. This alignment, achieved through supervised fine-tuning and preference optimization, makes it particularly effective for multi-step agentic reasoning and financial data manipulation where extracting numerical details from complex contexts is required.
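The conversational tool-use flow mentioned above follows a common pattern: the model emits a structured tool call, the application executes it, and the result is fed back for a grounded answer. The sketch below is a hypothetical, self-contained illustration of that loop; the tool registry and the model stub are placeholders, not Cohere's actual API.

```python
# Minimal sketch of one tool-use turn. `fake_model_turn` stands in for a
# real model call; the tool registry is a hypothetical example.
import json

TOOLS = {
    # Example tool: a stubbed price lookup returning fixed data.
    "lookup_price": lambda ticker: {"ticker": ticker, "price": 101.25},
}

def fake_model_turn(user_msg: str) -> str:
    # Stand-in for the model proposing a tool call as structured JSON.
    return json.dumps({"tool": "lookup_price", "args": {"ticker": "ACME"}})

def run_turn(user_msg: str) -> dict:
    call = json.loads(fake_model_turn(user_msg))
    result = TOOLS[call["tool"]](**call["args"])
    # In a full agent loop, `result` would be sent back to the model
    # so its final answer is grounded in the tool output.
    return result
```

In a multi-step agentic setting this loop repeats, with each tool result appended to the conversation before the next model turn.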



Rankings

Overall Rank

#106

Coding Rank

#90

GPU Requirements

VRAM requirements depend on the chosen weight quantization and on the context size, up to the model's full 256K-token window.
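A rough back-of-the-envelope estimate shows how quantization and context length drive VRAM needs. The formulas below are standard approximations (weights: parameters times bytes per parameter; KV cache: two tensors per layer per token); the layer, head, and dimension values you would plug in are not published for Command A, so treat any such inputs as assumptions.

```python
# Back-of-the-envelope VRAM estimates for serving a dense decoder-only model.
# Only the 111B parameter count comes from the spec sheet above; any layer,
# head, or dimension inputs are assumptions, since they are not published.

def weight_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the model weights alone, in GB."""
    return n_params * bytes_per_param / 1e9

def kv_cache_vram_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                     context_len: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache for one sequence: 2 tensors (K and V) per layer per token."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem / 1e9)

weights_fp16 = weight_vram_gb(111e9, 2.0)   # 222.0 GB at 16-bit precision
weights_int4 = weight_vram_gb(111e9, 0.5)   # 55.5 GB at 4-bit quantization
```

At 16-bit precision the weights alone exceed a single GPU, which is why 4-bit or 8-bit quantization is typically required to fit the model on the dual-GPU configurations mentioned above; the KV cache then grows linearly with context length on top of that.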
