
Command R

Parameters

35B

Context Length

128K

Modality

Text

Architecture

Dense

License

CC-BY-NC

Release Date

11 Mar 2024

Knowledge Cutoff

Jun 2024

Technical Specifications

Attention Structure

Grouped Query Attention

Hidden Dimension Size

8192

Number of Layers

40

Attention Heads

64

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

Layer Normalization

Position Embedding

Rotary Position Embedding (RoPE)

Command R

Cohere Command R is a generative language model built for high-performance enterprise workloads, with an emphasis on long-context processing and tool-augmented workflows. Built on a decoder-only Transformer, it uses Grouped Query Attention (GQA) to support a 128,000-token context window while reducing the memory overhead typically associated with large-scale attention mechanisms. The model is designed to ease the transition from experimental prototypes to production-grade deployments by balancing inference efficiency against output quality.
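The spec table above lists 64 attention heads but only 8 key-value heads: in GQA, each group of 8 query heads shares a single KV projection, shrinking the KV cache eightfold. A minimal NumPy sketch of this head sharing, using toy dimensions rather than the model's real 8192-dimensional weights:

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """GQA sketch: each group of query heads attends with one shared KV head."""
    seq, d_model = x.shape
    head_dim = d_model // n_heads        # 8192 / 64 = 128 in Command R
    group = n_heads // n_kv_heads        # 8 query heads per KV head

    q = (x @ wq).reshape(seq, n_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)   # 8x smaller than MHA
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    out = np.empty_like(q)
    for h in range(n_heads):
        kv = h // group                  # map query head -> shared KV head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn = scores / scores.sum(axis=-1, keepdims=True)
        out[:, h] = attn @ v[:, kv]
    return out.reshape(seq, d_model)

# Toy sizes: 8 query heads sharing 1 KV head (the real model uses 64 and 8)
np.random.seed(0)
d = 64
x = np.random.randn(4, d).astype(np.float32)
wq = (np.random.randn(d, d) * 0.02).astype(np.float32)
wk = (np.random.randn(d, d // 8) * 0.02).astype(np.float32)
wv = (np.random.randn(d, d // 8) * 0.02).astype(np.float32)
y = grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=1)
```

The payoff is at inference time: only the 8 KV heads are cached per token, which is what makes the 128K context window tractable in memory.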

The model undergoes a multi-stage training process including extensive pre-training on a diverse multilingual corpus and subsequent alignment via supervised fine-tuning and preference optimization. A defining architectural feature is its native training for grounded generation, which allows the model to produce responses with precise inline citations from external document sources. This makes it particularly effective for retrieval-augmented generation (RAG) pipelines, where maintaining factual consistency and source traceability is a primary requirement. Furthermore, Command R supports sophisticated multi-step tool use, enabling it to act as an agent that can reason through complex tasks by interacting with external APIs, databases, and software tools.
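To illustrate the grounded-generation pattern, the sketch below numbers source documents in a prompt and maps inline `[n]` citation markers in an answer back to the sentences that used them. The plain-text format and helper names here are purely illustrative; Cohere's actual chat API accepts structured documents and returns structured citation spans rather than text markers:

```python
import re

def build_grounded_prompt(question, documents):
    """Illustrative stand-in for a grounded RAG prompt: number each source
    so the model can cite it inline as [n]."""
    sources = "\n".join(f"[{i + 1}] {d['title']}: {d['snippet']}"
                        for i, d in enumerate(documents))
    return (f"Answer using only the sources below and cite them inline "
            f"as [n].\n\nSources:\n{sources}\n\nQuestion: {question}")

def extract_citations(answer):
    """Map each cited source number back to the sentences that cite it."""
    cites = {}
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        for n in re.findall(r"\[(\d+)\]", sentence):
            cites.setdefault(int(n), []).append(sentence)
    return cites

docs = [{"title": "Q3 report", "snippet": "Revenue grew 12% year over year."},
        {"title": "Press release", "snippet": "The product launched in May."}]
prompt = build_grounded_prompt("How did revenue change?", docs)
cites = extract_citations("Revenue grew 12% [1]. The launch was in May [2].")
```

Source traceability of this kind is what the model's native grounded-generation training automates: each claim in the output can be checked against the document that supports it.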

Optimized for global business applications, Command R provides native support for 10 languages and is trained on 23 in total, ensuring versatility across international markets. The architecture incorporates advanced components such as Rotary Positional Embeddings (RoPE) and Layer Normalization to ensure stability and coherence when handling massive input sequences. By focusing on practical utility in tasks like document summarization, complex reasoning, and structured data analysis, Command R serves as a scalable backbone for automated enterprise systems and intelligent agentic workflows.
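RoPE rotates each feature pair of the query and key vectors by a position-dependent angle, so attention scores depend on relative rather than absolute position. A minimal sketch (the half-split pairing below is one common convention; the model's exact implementation may differ):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding sketch: rotate each (x1, x2) feature pair
    of a query/key row by an angle proportional to its position."""
    seq, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)   # geometric frequency decay
    angles = np.outer(np.arange(seq), inv_freq)    # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied pairwise across the feature dimension
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.ones((6, 8))
q_rot = rope(q)
# Position 0 gets angle 0, so its vector is unchanged; dot products between
# rotated rows depend only on the position difference, not absolute position.
```

That relative-position property is why RoPE extrapolates to long sequences more gracefully than learned absolute embeddings, which matters at a 128K context length.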


Evaluation Benchmarks

Benchmark: WebDev Arena (Web Development)
Score: 1227
Rank: #52

Rankings

Overall Rank

#77

Coding Rank

#76

GPU Requirements

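A back-of-envelope VRAM estimate is straightforward: weight memory is parameter count times bytes per parameter, and the GQA KV cache scales with layers × KV heads × head dimension × context length. The sketch below uses the spec-table values (head dimension inferred as 8192 / 64 = 128) and ignores activation memory and framework overhead, which add several more GB:

```python
def estimate_vram_gb(params_b=35, bytes_per_param=2, n_layers=40,
                     n_kv_heads=8, head_dim=128, context=128_000,
                     kv_bytes=2, batch=1):
    """Rough estimate: weights plus KV cache only, in GB (10^9 bytes)."""
    weights = params_b * 1e9 * bytes_per_param
    # GQA shrinks the cache: 8 KV heads instead of 64 attention heads;
    # the factor 2 covers both keys and values.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context * kv_bytes * batch
    return weights / 1e9, kv_cache / 1e9

w, kv = estimate_vram_gb()                     # fp16 weights, full 128K context
# 35B fp16 weights -> 70 GB; the 128K-token KV cache adds roughly 21 GB
w4, _ = estimate_vram_gb(bytes_per_param=0.5)  # 4-bit quantization -> 17.5 GB
```

Quantization only shrinks the weight term; at long contexts the KV cache becomes a comparable share of memory, which is exactly the cost GQA is designed to contain.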
