Claude Sonnet 4.5

Closed Source

Closed Weights

Parameters

Context Length

200K

Modality

Text

Architecture

Dense

License

Proprietary

Release Date

29 Sept 2025

Knowledge Cutoff

Jan 2025

Evaluation Benchmarks

Rank

#83

Benchmark	Score	Rank
Coding LiveBench Coding	0.76	16
StackUnseen ProLLM Stack Unseen	0.694	16
Graduate-Level QA GPQA	0.834	16
Coding Aider Coding	0.56	18
Agentic Coding LiveBench Agentic	0.48	21
General Text Text Arena	1454	22
Web Development WebDev Arena	1386	43
Data Analysis LiveBench Data Analysis	0.47	46
Mathematics LiveBench Mathematics	0.63	49
Reasoning LiveBench Reasoning	0.42	51

Rankings

Overall Rank

#83

Coding Rank

#47

About Claude Sonnet 4.5

Claude 4.5 Sonnet is a mid-tier frontier model engineered by Anthropic to deliver a refined equilibrium between high-order reasoning and operational efficiency. Designed as a production workhorse, it is specifically optimized for complex agentic workflows, large-scale software engineering, and sophisticated computer-use tasks. The model serves as a core component for autonomous systems, supporting long-running operations with a significant emphasis on reliability and instruction-following accuracy across diverse professional domains.

The underlying architecture utilizes a dense transformer-based framework that integrates a hybrid reasoning system. This system allows for two distinct modes of execution: a standard low-latency mode for rapid interaction and an extended thinking mode that exposes the model's internal reasoning process for more difficult problem-solving. It features a substantial 200,000-token context window for general availability, with a specialized 1-million-token beta capacity for handling massive datasets, entire codebases, or extensive research documentation. The implementation of absolute position embeddings and multi-head attention ensures stable performance over these long sequences.

Technically, the model introduces advanced capabilities such as parallel tool execution, which enables agents to perform multiple actions, such as executing several shell commands simultaneously, within a single turn. It is natively integrated with the Model Context Protocol (MCP) and supports specific developer tools like checkpoints for state management and context editing for precise memory control. These features make it particularly suitable for enterprise-grade applications in finance, law, and cybersecurity, where sustained focus and deep domain knowledge are required for multi-step, high-stakes tasks.

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

Position Embedding

Absolute Position Embedding

RoPE Theta

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

Activation Function

Dimensions

Hidden Dimension Size

Number of Layers

FFN Intermediate Size (Dense)

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

Model Integrity

Total Score

38 / 100

Upstream

10.0 / 30

Model

15.0 / 40

Downstream

13.0 / 30

Claude Sonnet 4.5 Model Integrity Report

Total Score

/ 100

Audit Note

Claude 4.5 Sonnet exhibits a high degree of operational transparency regarding its identity and API capabilities, but remains largely opaque concerning its internal architecture and training provenance. While benchmark performance is well-documented, the lack of data on training compute, parameter counts, and dataset composition reflects a 'black box' approach typical of frontier proprietary models.

Upstream

10.0 / 30

Architectural Provenance

3.0 / 10

Anthropic identifies Claude 4.5 Sonnet as a 'dense transformer-based' model utilizing a 'hybrid reasoning system' with standard and extended thinking modes. However, no technical paper or detailed architectural documentation has been released. Specifics regarding layer counts, attention mechanisms (beyond a mention of multi-head attention), or the exact nature of the hybrid reasoning implementation remain proprietary and undisclosed.

Dataset Composition

2.0 / 10

The model's training data is described vaguely as a 'proprietary mix' of public internet data (up to July 2025), non-public third-party data, and user-provided data. While the System Card mentions general cleaning methods like deduplication, it provides no specific breakdown of dataset proportions (e.g., code vs. web), naming of specific sources, or verifiable details on the filtering criteria used.

Tokenizer Integrity

5.0 / 10

While a 'Claude Tokenizer' is publicly accessible via web tools and APIs for token counting, official technical documentation detailing the vocabulary size, specific tokenization algorithm (e.g., BPE), or training data alignment for the 4.5 series is absent. Users can verify token counts through the API, but the underlying technical specifications are not fully transparent.

Model

15.0 / 40

Parameter Density

1.0 / 10

Anthropic does not disclose the parameter count for Claude 4.5 Sonnet. While the model is described as 'dense' to distinguish it from sparse or MoE architectures, there is no verifiable information regarding total or active parameters, nor any architectural breakdown of parameter distribution across model components.

Training Compute

1.0 / 10

No information is provided regarding the compute resources used to train the model. There are no public disclosures of GPU/TPU hours, hardware specifications, training duration, or the total carbon footprint associated with the training phase. Environmental data is limited to third-party inference estimates rather than official training reports.

Benchmark Reproducibility

4.0 / 10

Anthropic provides scores for several public benchmarks (SWE-bench Verified, OSWorld, GPQA) and some details on evaluation settings (e.g., 100 max steps for OSWorld). However, the full evaluation code, exact prompts, and few-shot examples required for independent reproduction are not publicly available, and some results rely on 'internal benchmarks' or specific 'prompt addendums' that are not fully disclosed.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as Claude 4.5 Sonnet and maintaining awareness of its versioning. It provides clear information about its capabilities, such as the extended thinking mode and context window limits, and does not exhibit confusion with competitor models in official documentation or API responses.

Downstream

13.0 / 30

License Clarity

6.0 / 10

The model is governed by a clear but strictly proprietary license. Commercial terms are defined for API and enterprise users, while consumer terms apply to Pro/Max users. While the terms are accessible, the lack of an open-source or open-weights option and the presence of restrictive usage caps on 'flat-rate' plans create some complexity for users regarding derivative works and commercial scaling.

Hardware Footprint

2.0 / 10

As a closed-source API-based model, there is no documentation regarding the hardware required to run the model locally (VRAM, quantization tradeoffs, etc.). Guidance is limited to API-side constraints like context window limits (200k/1M) and output token maximums (64k), which do not provide transparency into the model's actual computational requirements.

Versioning Drift

5.0 / 10

Anthropic uses specific model strings (e.g., claude-sonnet-4-5-20250929) and maintains a changelog for associated tools like Claude Code. However, the model weights themselves are subject to silent updates and 'behavioral improvements' (e.g., alignment tuning) that are not always accompanied by new version numbers, making it difficult for developers to track or mitigate performance drift over time.

Resources

Official Documentation Release Notes

About Claude 4.5

Enhanced Claude models with further improvements in reasoning, coding, and agentic capabilities. Features advanced thinking modes with adjustable effort levels (high, medium, standard) for optimal performance-latency tradeoffs. Excels at complex analysis, software development, web development, and long-context understanding. Includes thinking variants that expose reasoning process for improved transparency.

Claude Sonnet 4.5

Evaluation Benchmarks

Rankings

About Claude Sonnet 4.5

Technical Specifications

Model Integrity

Claude Sonnet 4.5 Model Integrity Report

Audit Note

Upstream

Model

Downstream

Resources

About Claude 4.5

Other Claude 4.5 Models