Grok 3

Closed Source

Closed Weights

Parameters

Context Length

256K

Modality

Multimodal

Architecture

Dense

License

Proprietary

Release Date

15 Oct 2025

Knowledge Cutoff

Evaluation Benchmarks

Rank

#62

Category	Benchmark	Score	Rank
QA Assistant	ProLLM QA Assistant	0.967	4
Summarization	ProLLM Summarization	0.867	8
Graduate-Level QA	GPQA	0.846	12
Data Analysis	LiveBench Data Analysis	0.63	17
Coding	Aider Coding	0.53	24
StackUnseen	ProLLM Stack Unseen	0.293	31
Professional Knowledge	MMLU Pro	0.80	36

Rankings

Overall Rank

#62

Coding Rank

#120

About Grok 3

Grok 3 is xAI's advanced reasoning model, trained on the Colossus supercomputer. It integrates real-time information from the X platform to provide up-to-date knowledge and context. xAI positions it as strong at reasoning, coding, and creative tasks, with a distinctive direct and witty personality. It also offers information synthesis and analysis, multimodal understanding, and strong performance on technical benchmarks.

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

Position Embedding

Absolute Position Embedding

RoPE Theta

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

Activation Function

Dimensions

Hidden Dimension Size

Number of Layers

FFN Intermediate Size (Dense)

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

Model Integrity

Total Score

D+

44 / 100

Upstream

12.0 / 30

Model

21.0 / 40

Downstream

11.0 / 30

Grok 3 Model Integrity Report

Total Score

44 / 100

D+

Audit Note

Grok 3 exhibits a transparency profile typical of high-end proprietary models, characterized by significant disclosures regarding massive compute infrastructure but opacity in architectural and data specifics. While the model's identity and hardware scale are well-documented, the lack of reproducible evaluation data and the absence of a technical paper represent major barriers to independent verification. The transition from open-source roots to a fully proprietary model has resulted in a significant decrease in overall transparency for the Grok family.

Upstream

12.0 / 30

Architectural Provenance

5.0 / 10

Grok 3 is publicly identified as a Mixture-of-Experts (MoE) model with a reported 1.2 trillion total parameters and 128 expert networks. While xAI has shared high-level architectural details such as the use of 'cross-expert attention gates' and a 'Top-2 gating mechanism,' there is no formal technical paper or comprehensive documentation detailing the specific layer configurations, attention head dimensions, or the exact pretraining methodology beyond general 'staggered curriculum learning' phases.
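
For readers unfamiliar with the term, a Top-2 gating mechanism can be sketched generically as follows. This is a minimal illustration of the standard technique, not xAI's implementation, which is undocumented. The function names `top2_gate` and `moe_output`, the scalar expert outputs, and the renormalization step are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def top2_gate(router_logits: List[float]) -> List[Tuple[int, float]]:
    """Return the two selected expert indices with renormalized weights.

    Hypothetical helper: illustrates generic top-2 routing only.
    """
    if len(router_logits) < 2:
        raise ValueError("top-2 gating needs at least two experts")

    # Numerically stable softmax over all expert logits.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the two most probable experts.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]

    # Renormalize so the two kept weights sum to 1 (an assumed design choice).
    kept = sum(probs[i] for i in top2)
    return [(i, probs[i] / kept) for i in top2]


def moe_output(router_logits: List[float], expert_outputs: List[float]) -> float:
    """Combine scalar expert outputs using the top-2 gate weights."""
    return sum(w * expert_outputs[i] for i, w in top2_gate(router_logits))
```

Only the two selected experts contribute to the output for a given token, which is why the total and active parameter counts of an MoE model can differ widely.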

Dataset Composition

3.0 / 10

Information regarding training data is limited to vague categories and estimated proportions from third-party reports (e.g., 41% web, 32% scientific literature, 27% dialogue). xAI has not released a detailed breakdown of data sources, filtering criteria, or specific datasets used. While real-time integration with the X platform is a core feature, the methodology for incorporating this data into the model's training or inference pipeline remains proprietary and undocumented.

Tokenizer Integrity

4.0 / 10

The tokenizer for Grok 3 is not publicly available for independent inspection or download. While the context window is stated at 1 million tokens and some API documentation provides general tokenization estimates (e.g., ~4 characters per token), the specific vocabulary size, tokenization algorithm (e.g., BPE vs. SentencePiece), and normalization techniques are not officially documented or verifiable.
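
Because the tokenizer cannot be inspected, the ~4 characters per token figure is the only basis for estimating prompt size ahead of a request. The sketch below applies that heuristic. The names `estimate_tokens` and `fits_in_context` are hypothetical, and the result is an approximation rather than a real token count.

```python
import math

# Rough average cited in API documentation; this is not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of `text` from its character length."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, context_tokens: int, reserve_for_output: int = 0) -> bool:
    """Check whether the estimated prompt size plus an output reserve fits."""
    return estimate_tokens(text) + reserve_for_output <= context_tokens
```

Actual counts can deviate substantially from this estimate for code, non-English text, or unusual formatting, so a safety margin is advisable.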

Model

21.0 / 40

Parameter Density

4.0 / 10

While a total parameter count of 1.2 trillion has been widely cited in technical analyses, xAI has not officially confirmed the number of parameters active during inference. Third-party reports suggest an '83% parameter activation efficiency' or the use of 2-of-64 experts (an expert count that itself differs from the 128 experts reported elsewhere), but these claims lack official verification in public documentation. The lack of clarity on total versus active parameter counts for a model of this scale is a significant transparency gap.

Training Compute

6.0 / 10

xAI has been relatively transparent about the hardware used, specifically citing the 'Colossus' supercomputer cluster with 100,000 to 200,000 NVIDIA H100 GPUs. Training duration (approx. 80-122 days) and total compute (200 million GPU hours) have been publicly stated by leadership. However, detailed environmental impact reports, precise carbon footprint calculations, and verified energy consumption metrics are missing.
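
The stated figures can be cross-checked with simple arithmetic. The sketch below assumes every GPU runs continuously for the whole training period, which is an idealization. The function name `gpu_hours` is illustrative.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU hours, assuming all GPUs run 24 hours a day for `days` days."""
    return num_gpus * days * 24.0


# Bounds implied by the publicly stated ranges.
low = gpu_hours(100_000, 80)    # 192,000,000
high = gpu_hours(200_000, 122)  # 585,600,000

print(f"{low:,.0f} to {high:,.0f} GPU hours")
```

Under this assumption, the stated 200 million GPU hours sits near the low end of the implied range, roughly matching 100,000 GPUs running for 80 days.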

Benchmark Reproducibility

3.0 / 10

Benchmark results (MMLU, AIME, GPQA) are provided in marketing materials, but xAI has not released the evaluation code, specific prompts, or few-shot examples required for independent reproduction. Discrepancies in reporting (e.g., omitting consensus metrics when comparing to competitors) have been noted by the research community, and the lack of a standardized evaluation framework makes official claims difficult to verify.

Identity Consistency

8.0 / 10

Grok 3 consistently identifies itself as an AI developed by xAI and maintains a distinct 'witty' personality as advertised. It generally shows awareness of its versioning and capabilities, including its 'Think' and 'DeepSearch' modes. There are no widespread reports of the model claiming to be a competitor's product, though its 'truth-seeking' claims are occasionally at odds with its internal safety guardrails.

Downstream

11.0 / 30

License Clarity

3.0 / 10

Grok 3 is governed by a strictly proprietary license. Unlike Grok-1, which was released under Apache 2.0, Grok 3 offers no public access to weights or source code. The terms of service for the API and web interface are standard for proprietary models but lack the transparency of open-weights alternatives. Commercial use is permitted via API, but derivative works and weight modification are prohibited.

Hardware Footprint

4.0 / 10

xAI publishes no hardware requirements for local deployment because the model is not available for local use. API documentation provides some guidance on context-length limits (128K to 1M tokens) and latency, but there is no public data on VRAM requirements at different precision or quantization levels (FP16/Q8/Q4) or on the accuracy trade-offs associated with them.
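
To give a sense of scale, the memory needed for the weights alone can be estimated from a parameter count and a bytes-per-parameter figure. The sketch below uses the third-party figure of 1.2 trillion total parameters cited in this report, which xAI has not confirmed. The bytes-per-parameter values are nominal assumptions, and the name `weight_memory_gb` is hypothetical.

```python
# Nominal storage cost per parameter. Real quantized formats also store
# scales and other metadata, so actual sizes are somewhat larger.
BYTES_PER_PARAM = {"FP16": 2.0, "Q8": 1.0, "Q4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


REPORTED_PARAMS = 1.2e12  # unconfirmed third-party figure

for precision in ("FP16", "Q8", "Q4"):
    print(precision, round(weight_memory_gb(REPORTED_PARAMS, precision)), "GB")
```

The estimate excludes the KV cache, activations, and runtime overhead, all of which grow with context length and batch size.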

Versioning Drift

4.0 / 10

xAI maintains a basic changelog for its API and platform, but it does not follow semantic versioning and offers little technical detail. Updates are often announced via social media or brief blog posts rather than detailed technical release notes. There is no public mechanism to access or pin specific previous versions of the model weights, so users cannot guard against silent performance drift or behavior changes.
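
The category and total scores in this report are sums of the per-criterion scores, each out of 10. The sketch below reproduces that aggregation from the values listed above. The dictionary layout and function names are illustrative, and the mapping from total score to letter grade is not documented in the report, so it is omitted.

```python
SCORES = {
    "Upstream": {
        "Architectural Provenance": 5.0,
        "Dataset Composition": 3.0,
        "Tokenizer Integrity": 4.0,
    },
    "Model": {
        "Parameter Density": 4.0,
        "Training Compute": 6.0,
        "Benchmark Reproducibility": 3.0,
        "Identity Consistency": 8.0,
    },
    "Downstream": {
        "License Clarity": 3.0,
        "Hardware Footprint": 4.0,
        "Versioning Drift": 4.0,
    },
}


def category_totals(scores: dict) -> dict:
    """Sum the per-criterion scores within each category."""
    return {category: sum(items.values()) for category, items in scores.items()}


def total_score(scores: dict) -> float:
    """Sum all category totals into the overall score out of 100."""
    return sum(category_totals(scores).values())


print(category_totals(SCORES))  # {'Upstream': 12.0, 'Model': 21.0, 'Downstream': 11.0}
print(total_score(SCORES))      # 44.0
```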

Resources

Official Documentation

About Grok 3

The Grok 3 series from xAI is trained on the Colossus supercomputer cluster. The models feature real-time information integration from the X platform, advanced reasoning and information synthesis capabilities, and a distinctive personality.
