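| Specification | Value |
|---|---|
| Attention structure | Grouped-Query Attention |
| Hidden dimension size | - |
| Number of layers | - |
| Attention heads | - |
| Key-value heads | - |
| Activation function | SwiGLU |
| Normalization | - |
| Position embedding | RoPE (sliding-window layers); none (global-attention layers) |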
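SwiGLU, the activation function listed for the model, is a gated feed-forward unit: one linear projection of the input is passed through Swish (SiLU) and multiplied element-wise with a second projection before the output projection. The NumPy sketch below follows that generic formulation from the GLU-variants literature; it is not Cohere's implementation, and the function names, weight names, and toy sizes are hypothetical. It omits bias terms, consistent with the model description.

```python
import numpy as np

# Illustrative sketch only: function and weight names and sizes are hypothetical.


def silu(x: np.ndarray) -> np.ndarray:
    """Swish with beta = 1 (also called SiLU): x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Bias-free feed-forward block with a SwiGLU activation.

    x: (tokens, d_model); w_gate, w_up: (d_model, d_ff); w_down: (d_ff, d_model).
    The SiLU of one projection gates the other projection element-wise,
    and the result is projected back to d_model.
    """
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_ff = 4, 8  # hypothetical toy sizes
    x = rng.standard_normal((3, d_model))
    w_gate, w_up = rng.standard_normal((2, d_model, d_ff))
    w_down = rng.standard_normal((d_ff, d_model))
    print(swiglu_ffn(x, w_gate, w_up, w_down).shape)
```

The gate lets the network scale each hidden unit smoothly by a learned, input-dependent factor, which is why GLU-style blocks usually use two input projections instead of one.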
Cohere Command A is a large language model engineered for enterprise applications that demand high performance, security, and computational efficiency. It is designed for business-critical tasks such as tool use, retrieval-augmented generation (RAG), agentic workflows, and multilingual use cases. It is also efficient: it can run on minimal GPU configurations, which reduces the computational overhead of private deployments. Command A is trained to perform well in 23 languages, making it suitable for diverse global business environments.
The architectural foundation of Command A is an optimized decoder-only transformer with interleaved attention. In each repeating group of four layers, three use sliding-window attention with Rotary Positional Embeddings (RoPE) for efficient local context modeling, and the fourth uses global attention without positional embeddings, letting each token attend to the entire preceding sequence, however long. Other architectural choices include grouped-query attention to increase throughput, shared input and output embeddings to save memory, and the omission of bias terms to stabilize training. The model uses SwiGLU activation functions.
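To make the interleaved layout concrete, here is a minimal NumPy sketch of single-head causal attention that switches between the two layer types: sliding-window layers apply RoPE and mask out keys older than the window, while every fourth layer applies no positional embedding and uses the full causal mask. The window length, layer count, RoPE base, and all function names are hypothetical choices for illustration; the source does not describe Command A's actual implementation.

```python
import numpy as np

# Illustrative sketch: names, WINDOW, NUM_LAYERS and the RoPE base are hypothetical.
WINDOW = 4
NUM_LAYERS = 8


def layer_kind(layer_idx: int) -> str:
    """Attention type in a repeating 3:1 pattern (0-indexed): three
    sliding-window layers with RoPE, then one global layer with no
    positional embedding."""
    return "global_nope" if layer_idx % 4 == 3 else "sliding_rope"


def causal_mask(seq_len: int) -> np.ndarray:
    """True where query position i may attend to key position j (j <= i)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal mask limited to the `window` most recent positions, including i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)


def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate each (even, odd) feature pair of x by a position-dependent angle.

    x has shape (seq_len, dim) with dim even; pair k at position p is rotated
    by p * base ** (-2k / dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def attention_layer(layer_idx, q, k, v, window=WINDOW):
    """Single-head causal attention using the mask and position scheme of the layer."""
    seq_len, dim = q.shape
    if layer_kind(layer_idx) == "sliding_rope":
        q, k = apply_rope(q), apply_rope(k)
        mask = sliding_window_mask(seq_len, window)
    else:
        # Global layer: no positional embedding, full causal mask.
        mask = causal_mask(seq_len)
    scores = q @ k.T / np.sqrt(dim)
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((6, 8)) for _ in range(3))
    print([layer_kind(n) for n in range(NUM_LAYERS)])
    print(sliding_window_mask(6, WINDOW).astype(int))
    print(attention_layer(0, q, k, v).shape, attention_layer(3, q, k, v).shape)
```

Because the global layers carry no positional signal, they rely on the RoPE-equipped local layers to encode order, while still being able to reach any earlier token regardless of distance.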
Command A is optimized for throughput and long-context reasoning. It supports a context length of 256,000 tokens, enough to process long documents in enterprise applications. The model is also built for conversational interaction: it can respond in a chatty style, optionally using Markdown for clarity. It is particularly adept at extracting and manipulating numerical information in financial settings, and it is trained for conversational tool use, which lets it interact with external systems such as APIs and databases.
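A rough way to see why the architecture helps at a 256,000-token context is to estimate the key-value (KV) cache, which grows with context length. The sketch below compares three layouts under a hypothetical configuration (layer count, head counts, head size, and window length are all assumed, since the source does not give them): full attention with one K/V head per query head, full attention with grouped-query attention, and the 3:1 sliding-window/global pattern with grouped-query attention. It counts only the KV cache for a single sequence, not model weights or activations.

```python
from typing import Optional

# All sizes are hypothetical: the source does not state Command A's layer
# count, head counts, head size, or sliding-window length.
NUM_LAYERS = 64
NUM_QUERY_HEADS = 64
NUM_KV_HEADS = 8        # grouped-query attention: several query heads share one K/V head
HEAD_DIM = 128
WINDOW = 4096           # assumed sliding-window length
BYTES_PER_VALUE = 2     # bf16/fp16 cache entries
CONTEXT = 256_000       # context length stated in the source


def kv_cache_bytes(context_len: int, kv_heads: int, window: Optional[int] = None) -> int:
    """K + V cache size, summed over all layers, for one sequence.

    With `window` set, layers follow the 3:1 pattern: three sliding-window
    layers cache at most `window` tokens and every fourth (global) layer
    caches the whole context. With `window=None`, every layer is global.
    """
    total = 0
    for layer in range(NUM_LAYERS):
        is_global = window is None or layer % 4 == 3
        cached_tokens = context_len if is_global else min(context_len, window)
        # Factor 2: one key vector and one value vector per cached token per KV head.
        total += 2 * cached_tokens * kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return total


if __name__ == "__main__":
    gib = 2 ** 30
    configs = {
        "full attention, one K/V head per query head": kv_cache_bytes(CONTEXT, NUM_QUERY_HEADS),
        "full attention, grouped-query": kv_cache_bytes(CONTEXT, NUM_KV_HEADS),
        "3:1 sliding/global, grouped-query": kv_cache_bytes(CONTEXT, NUM_KV_HEADS, WINDOW),
    }
    for name, size in configs.items():
        print(f"{name}: {size / gib:.1f} GiB")
```

With these assumed values the three lines report 500.0, 62.5, and 16.4 GiB. Grouped-query attention divides the cache by the ratio of query heads to K/V heads, and in the interleaved layout only the global layers' caches grow with context length, so most of the cache stays bounded by the window.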
Rankings are relative to other local LLMs.
Rank: #25
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Summarization | ProLLM Summarization | 0.86 | 🥈 2 |
| General Knowledge | MMLU | 0.81 | 🥉 3 |
| StackUnseen | ProLLM Stack Unseen | 0.23 | 10 |
| Agentic Coding | LiveBench Agentic | 0.05 | 14 |
| Coding | LiveBench Coding | 0.54 | 15 |
| Reasoning | LiveBench Reasoning | 0.36 | 19 |
| Mathematics | LiveBench Mathematics | 0.46 | 20 |
| Data Analysis | LiveBench Data Analysis | 0.50 | 21 |