| Specification | Value |
|---|---|
| Parameters | - |
| Context Length | 1,000K |
| Modality | Multimodal |
| Architecture | Dense |
| License | Proprietary |
| Release Date | 17 Feb 2026 |
| Knowledge Cutoff | Aug 2025 |
| Attention Structure | Multi-Head Attention |
| Hidden Dimension Size | 4096 |
| Number of Layers | - |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | - |
| Normalization | - |
| Position Embedding | Absolute Position Embedding |
Claude Sonnet 4.6 is a multimodal foundation model engineered for high-performance agentic workflows, complex software engineering, and large-scale document analysis. As a central component of the Claude 4 model family, it utilizes a dense transformer architecture optimized for balancing computational efficiency with high-order reasoning capabilities. The model is specifically designed to function as a versatile workhorse for enterprise automation, supporting advanced tasks such as autonomous navigation of graphical user interfaces and multi-step agentic planning.
Technically, the model introduces several architectural innovations, including a beta 1-million-token context window that enables the processing of extensive codebases and multi-document datasets in a single inference pass. It features a hybrid reasoning framework that supports both adaptive thinking and extended thinking modes, allowing the model to dynamically allocate internal processing tokens for complex problem-solving. Furthermore, the inclusion of context compaction technology facilitates the efficient management of long-running conversations by summarizing historical context as it approaches architectural limits.
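Context compaction is described only functionally, so as a rough mental model, a client-side sketch might look like the following. The 4-characters-per-token estimate, the threshold, and the `summarize` hook are all illustrative assumptions, not Anthropic's implementation.

```python
def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token (assumption only;
    # a real client would use the provider's token-counting endpoint).
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages, summarize, limit=1_000_000, keep_recent=4):
    """If the history nears the context limit, fold older turns into a
    single summary message and keep only the most recent turns."""
    if estimate_tokens(messages) < limit:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # e.g. a call back into the model itself
    return [{"role": "user",
             "content": "[Summary of earlier turns] " + summary}] + recent
```

In practice the summarizer would be another model call; the sketch only shows the bookkeeping that keeps a long-running conversation under the context ceiling.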
Performance is characterized by significant advancements in computer use, where the model demonstrates human-level proficiency in interacting with standard software environments, including web browsers and spreadsheets. It is highly optimized for the software development lifecycle, offering precise instruction following while reducing common failure modes such as overengineering and excessive output latency. The model is deployed via the Anthropic API and major cloud platforms, offering a scalable solution for developers requiring frontier-level intelligence for high-volume production applications.
Anthropic's fourth-generation Claude models offer advanced reasoning, extended context windows of up to 200K tokens (1M in beta for Sonnet 4.6), and configurable thinking-effort levels. They feature improved safety alignment, nuanced understanding, and sophisticated task completion. The family includes Opus (most capable), Sonnet (balanced), and Haiku (fast) variants, with thinking modes that enable transparent chain-of-thought reasoning for complex problems.
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| StackUnseen | ProLLM Stack Unseen | 0.89 | 🥈 2 |
| Web Development | WebDev Arena | 1523 | 🥈 2 |
| Data Analysis | LiveBench Data Analysis | 0.78 | ⭐ 6 |
| Professional Knowledge | MMLU Pro | 0.87 | ⭐ 8 |
| Graduate-Level QA | GPQA | 0.75 | 34 |
Overall Rank: #20
Coding Rank: #2 🥈
Total Score: 41 / 100
Claude Sonnet 4.6 is a highly capable but technically opaque model, characterized by a 'black box' approach to its internal architecture and training data. While it excels in functional transparency—providing clear pricing, versioning, and identity consistency—it fails to disclose critical technical details such as parameter counts, dataset composition, or training compute. This reliance on proprietary secrecy limits the ability of the research community to verify its safety and performance claims independently.
Architectural Provenance
Anthropic identifies Claude Sonnet 4.6 as a 'dense transformer' architecture, a departure from the sparse or MoE (Mixture of Experts) trends in other frontier models. While the model is explicitly named and its hybrid reasoning capabilities (standard vs. extended thinking) are described in marketing materials, there is no public technical paper or detailed documentation regarding the specific layer count, attention mechanisms, or the exact pre-training methodology. The 'architectural innovations' mentioned, such as context compaction and interleaved thinking, are described functionally rather than technically, leaving the underlying implementation opaque.
Dataset Composition
Information regarding the training data for Claude 4.6 is extremely limited. Anthropic uses vague descriptors like 'diverse internet data' and 'carefully curated datasets' without providing a percentage breakdown of sources (e.g., code, web, academic). While a knowledge cutoff of March 2025 is stated, no specific information on data filtering, cleaning processes, or the ratio of synthetic to human-generated data is publicly available. This lack of disclosure makes it impossible to verify the representativeness or quality of the training set.
Tokenizer Integrity
The tokenizer is accessible via the Anthropic API and third-party tools (e.g., claudetokenizer.com), allowing for some verification of token counts and vocabulary behavior. However, Anthropic does not provide a formal technical specification of the tokenizer's training data alignment or a public repository for the tokenizer's source code. While the 1-million-token context window is a documented feature, the underlying tokenization approach and normalization rules remain proprietary and largely undocumented.
Parameter Density
Anthropic maintains a strict policy of non-disclosure regarding parameter counts. While third-party analysts speculate the model is in the 'tens of billions' range, there is no official confirmation of total or active parameters. The claim of being a 'dense' architecture provides a structural hint, but without a specific count or architectural breakdown (e.g., FFN vs. attention parameters), the model fails to meet basic transparency standards for density.
Training Compute
No specific data regarding GPU/TPU hours, hardware clusters, or total compute expenditure (FLOPs) has been released for Claude 4.6. While Anthropic's System Card mentions a 'carbon footprint' section, it typically provides high-level estimates of operational emissions rather than the detailed training compute metrics required for a high score. The environmental impact of the training phase remains largely opaque and unverified by third parties.
Benchmark Reproducibility
Anthropic provides scores for standard benchmarks like SWE-bench Verified (72.7%) and GPQA Diamond, often including brief methodology notes in appendices (e.g., top_p settings, use of bash tools). However, they do not release the full evaluation code, exact prompt sets, or the specific few-shot examples used to achieve these results. The lack of a reproducible evaluation harness means third parties must rely on Anthropic's reported figures, which are often achieved using specialized internal frameworks like 'Claude Code'.
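To illustrate the bookkeeping a reproducible harness would need to publish, here is a minimal sketch that bundles the exact run configuration with per-task outcomes. The model ID follows the naming convention documented below; the `top_p` value and task IDs are hypothetical placeholders.

```python
import json

def record_run(model, config, results):
    """Bundle the eval config with per-task pass/fail outcomes so a
    third party can re-run with identical settings and compare scores."""
    return {
        "model": model,
        "config": config,  # sampling settings, tool access, shot count
        "tasks": results,
        "score": round(sum(results.values()) / len(results), 4),
    }

run = record_run(
    "claude-sonnet-4-6-20260217",
    {"top_p": 0.95, "tools": ["bash"], "shots": 0},  # hypothetical settings
    {"task-001": True, "task-002": False, "task-003": True},
)
print(json.dumps(run, indent=2))
```

Publishing such a record alongside each headline score is what would let outsiders distinguish a methodology difference from a genuine capability gap.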
Identity Consistency
Claude 4.6 demonstrates high identity consistency, correctly identifying its version and provider in most interactions. It is transparent about its status as an AI and generally adheres to its defined capabilities and limitations. There are no widespread reports of the model claiming to be a competitor's product or exhibiting significant identity confusion, though it occasionally relies on system-prompted identity rather than intrinsic architectural awareness.
License Clarity
The model is governed by a proprietary license with distinct terms for consumer and commercial use. While the pricing for API access is clearly stated ($3/$15 per million tokens), the legal terms are complex. Commercial users are granted ownership of outputs, but consumer terms for free users have historically included 'non-commercial' restrictions that create ambiguity for developers. The lack of an open-source or open-weights option limits its transparency compared to permissive licenses like Apache 2.0.
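At the listed $3 input / $15 output per million tokens, per-request cost is simple arithmetic. A sketch, assuming flat rates (any long-context or batch pricing tiers would change the numbers):

```python
def api_cost(input_tokens, output_tokens,
             in_per_mtok=3.00, out_per_mtok=15.00):
    """Estimated USD cost at the published per-million-token rates."""
    return (input_tokens * in_per_mtok
            + output_tokens * out_per_mtok) / 1_000_000

# e.g. summarizing a 200K-token document into a 2K-token answer:
print(f"${api_cost(200_000, 2_000):.2f}")  # $0.63
```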
Hardware Footprint
As a closed-source API-only model, there is virtually no information on the hardware requirements for local deployment or the VRAM footprint of the model at various quantization levels. While Anthropic provides API latency and throughput stats, they do not disclose the infrastructure required to serve the 1M token context window. Users have no visibility into the efficiency-accuracy tradeoffs of the underlying serving architecture.
Versioning Drift
Anthropic uses a clear naming convention (e.g., claude-sonnet-4-6-20260217) and maintains a public changelog for major releases. However, the model is subject to 'silent' updates and optimizations that can lead to behavioral drift without a change in the version string. While they provide deprecation notices for older models, the lack of detailed technical changelogs for sub-version updates makes it difficult for developers to track subtle performance shifts over time.
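The dated naming convention (e.g. claude-sonnet-4-6-20260217) at least makes pinned snapshots machine-comparable, which is one practical defense against silent drift. A small parsing sketch; the field names are my own labels, not Anthropic's terminology:

```python
import re
from datetime import date

VERSION_RE = re.compile(
    r"^(?P<family>[a-z]+)-(?P<tier>[a-z]+)"
    r"-(?P<major>\d+)-(?P<minor>\d+)-(?P<snapshot>\d{8})$"
)

def parse_model_id(model_id):
    """Split an Anthropic-style dated model ID into its components so
    pinned snapshots can be sorted and compared across deployments."""
    m = VERSION_RE.match(model_id)
    if not m:
        raise ValueError(f"unrecognized model id: {model_id}")
    parts = m.groupdict()
    s = parts["snapshot"]
    parts["snapshot_date"] = date(int(s[:4]), int(s[4:6]), int(s[6:]))
    return parts

info = parse_model_id("claude-sonnet-4-6-20260217")
```

Pinning the fully dated ID in production (rather than an alias) is what lets a team detect when behavior changes without the version string changing.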