
Typhoon-2-70B

Parameters

70B

Context Length

128K

Modality

Text

Architecture

Dense

License

Llama 3.1 Community License

Release Date

1 Jun 2024

Knowledge Cutoff

Dec 2023

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

64

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

-

Sliding Window Attention

-

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

8,192

Number of Layers

80

FFN Intermediate Size (Dense)

-

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

128K

Architecture Diagram

Input Tokens → Token Embedding (hidden: 8.2k, context: 128k, position: RoPE) → 80 × [RMSNorm (pre-attention) → Grouped-Query Attention (64 Q / 8 KV heads, head dim 128) → residual add → RMSNorm (pre-FFN) → Feed-Forward Network (SwiGLU) → residual add] → Final RMSNorm → Output Logits

Typhoon-2-70B

Typhoon-2-70B is a Thai-English large language model developed by SCB 10X to address the linguistic complexities of the Thai language. Built on the Llama 3.1 70B backbone, the model undergoes continual pre-training on a curated corpus of roughly 5 billion high-quality Thai tokens. This training is designed to align the model with Thai cultural nuances and linguistic structures while preserving the English reasoning capabilities of the base model. The resulting model serves as a foundation for enterprise applications that require high precision in bilingual contexts.

The model uses a dense, decoder-only transformer with Grouped-Query Attention (GQA): 64 query heads share 8 key-value heads, which shrinks the key-value cache and improves inference efficiency. It supports a 128K-token context window, enabling the processing of lengthy legal documents, technical manuals, and multi-turn conversational histories. Post-training combines supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to improve instruction-following accuracy and function calling. These capabilities allow the model to interact with external tools and APIs in agentic workflows.
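To make the effect of GQA concrete, the following sketch estimates the key-value cache for one full-length sequence, using the figures from the specification table (80 layers, 8 KV heads, head dimension 128). The 2-byte (FP16/BF16) cache precision and the reading of "128K" as 128 × 1024 tokens are assumptions, and the 64-KV-head case is a hypothetical multi-head baseline for comparison only.

```python
def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

GIB = 1024 ** 3
context = 128 * 1024  # assumption: "128K" read as 131,072 tokens

gqa = kv_cache_bytes(context)                 # 8 KV heads, as in the spec table
mha = kv_cache_bytes(context, n_kv_heads=64)  # hypothetical: one KV head per query head

print(f"GQA cache: {gqa / GIB:.0f} GiB")
print(f"MHA cache: {mha / GIB:.0f} GiB")
```

Under these assumptions the cache is 8 times smaller with GQA, because the cache size scales linearly with the number of KV heads.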

Released under the Llama 3.1 Community License, Typhoon-2-70B provides a transparent path for developers to integrate sovereign AI capabilities into production environments. Its design emphasizes performance in specialized Thai domains such as legal reasoning, cultural content generation, and sophisticated data analysis. By bridging the gap between English-centric foundation models and local language requirements, Typhoon-2-70B enables the development of localized AI solutions that maintain parity with global standards of reasoning and accuracy.

About Typhoon

Typhoon is a Thai language model family developed by SCB 10X. It is specifically optimized for the Thai language, addressing complexities such as the lack of word delimiters and tonal nuances. The models are trained on Thai-centric datasets including legal, cultural, and historical documents to ensure localized context and knowledge.



Evaluation Benchmarks

No evaluation benchmarks for Typhoon-2-70B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B

65 / 100

Typhoon-2-70B Model Integrity Report


Audit Note

Typhoon-2-70B exhibits a strong transparency profile regarding its architectural foundation and its specialized focus on the Thai language, supported by a comprehensive technical report. Its primary strengths lie in its clear identity and documented design choices, such as tokenizer selection. However, it lacks transparency in training compute metrics and provides only moderate detail on dataset composition and hardware-specific performance tradeoffs.

Upstream

21.0 / 30

Architectural Provenance

7.5 / 10

The model is explicitly identified as a continual pre-training (CPT) and instruction-tuned version of the Llama 3.1 70B architecture. The technical report (arXiv:2412.13702) provides a detailed description of the training methodology, including the use of Grouped-Query Attention (GQA) and the transition from Typhoon 1.5. It documents the decision not to extend the tokenizer vocabulary based on recent research regarding performance degradation, which is a high level of architectural transparency. However, specific layer-by-layer modifications or exact hyperparameter configurations for the 70B variant's pre-training phase are less granular than the 8B variant.

Dataset Composition

5.5 / 10

SCB 10X provides a moderate breakdown of the training data, disclosing a mixture of 50% English and 50% Thai data to mitigate catastrophic forgetting. The Thai portion includes 5 billion high-quality tokens curated from Common Crawl (40 packs), synthetic textbooks, and culturally relevant documents. While the technical report describes the filtering (MinHash, LSH) and quality classification (human-in-the-loop) processes, the specific proportions of sub-categories (e.g., legal vs. medical vs. web) within the Thai corpus are not fully quantified, and the full dataset is not public.
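As an illustration of the kind of near-duplicate filtering mentioned above, here is a minimal MinHash sketch. It is not the pipeline from the technical report: the character 5-gram shingling, the 128 hash functions, the BLAKE2b-based hashing, and all function names are assumptions chosen for the example, and the LSH banding step that makes lookups scale is omitted.

```python
import hashlib

def shingles(text, k=5):
    """Set of overlapping character k-grams; needs no word segmentation."""
    text = " ".join(text.split())  # normalize whitespace
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def _hash(item, seed):
    """64-bit hash of a shingle; the seed selects one hash function of the family."""
    salt = seed.to_bytes(4, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(items, num_perm=128):
    """Keep the minimum hash value under each of num_perm hash functions."""
    return [min(_hash(s, seed) for s in items) for seed in range(num_perm)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = "Typhoon is a Thai language model family developed by SCB 10X."
doc_b = "Typhoon is a Thai language model family developed by SCB 10X!"
doc_c = "0123456789 0123456789 0123456789"

sig_a, sig_b, sig_c = (minhash_signature(shingles(d)) for d in (doc_a, doc_b, doc_c))
print(estimated_jaccard(sig_a, sig_b))  # high: near-duplicates
print(estimated_jaccard(sig_a, sig_c))  # low: unrelated text
```

A deduplication pass would drop one document from each pair whose estimated similarity exceeds a chosen threshold; the threshold itself is a tuning decision not covered here.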

Tokenizer Integrity

8.0 / 10

The model utilizes the standard Llama 3 tokenizer (128k vocabulary) without expansion. This is a transparent design choice documented in the technical report to maintain architectural integrity and performance. Tokenization efficiency for Thai is discussed relative to previous versions (Typhoon 1.0 used an augmented tokenizer), and the 128K context window support is verified in official documentation and Hugging Face model cards.

Model

25.0 / 40

Parameter Density

7.0 / 10

The model is clearly defined as a 70B dense decoder-only transformer. There is no ambiguity regarding active vs. total parameters as it is not an MoE model. The technical report confirms it inherits the Llama 3.1 70B structure. While it lacks a detailed breakdown of parameter allocation (e.g., exact percentage of weights in attention vs. FFN), the density and architecture are well-documented for a model of this scale.
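A rough parameter allocation can be derived from the dimensions in the specification table. The sketch below does so under three assumptions that this page does not state: an FFN intermediate size of 28,672 and a vocabulary of 128,256 entries (the values published for Llama 3.1 70B), and untied input and output embeddings.

```python
hidden = 8192                              # from the spec table
layers = 80                                # from the spec table
q_heads, kv_heads, head_dim = 64, 8, 128   # from the spec table and diagram
ffn_inner = 28_672   # ASSUMPTION: Llama 3.1 70B value, not stated on this page
vocab = 128_256      # ASSUMPTION: Llama 3 tokenizer size; the page only says "128k"

# Attention: Q and output projections are hidden x hidden,
# K and V projections are hidden x (kv_heads * head_dim).
attn = 2 * hidden * q_heads * head_dim + 2 * hidden * kv_heads * head_dim
# SwiGLU feed-forward network: gate, up, and down projections.
ffn = 3 * hidden * ffn_inner
# Two RMSNorm weight vectors per layer.
norms = 2 * hidden

embeddings = 2 * vocab * hidden  # ASSUMPTION: untied input and output embeddings
total = layers * (attn + ffn + norms) + embeddings + hidden  # + final RMSNorm

print(f"total parameters: {total:,}")
print(f"attention share:  {layers * attn / total:.1%}")
print(f"FFN share:        {layers * ffn / total:.1%}")
print(f"embedding share:  {embeddings / total:.1%}")
```

Under these assumptions the total comes to about 70.55 billion parameters, with roughly 80% in the feed-forward layers and roughly 17% in attention.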

Training Compute

3.0 / 10

Information regarding the specific compute resources used for the 70B model is vague. While the report mentions using Together AI's GPU clusters and bare-metal H100/A100 nodes, it does not disclose the total GPU hours, training duration, or the carbon footprint associated with the 70B variant. Most detailed compute metrics in the literature refer to the smaller 8B models or general infrastructure capabilities rather than the specific 70B training run.

Benchmark Reproducibility

6.0 / 10

The model is evaluated on standard benchmarks (MMLU, IFEval) and specialized Thai benchmarks (ThaiExam, M3Exam). The technical report provides detailed results and some methodology, including the use of a Thai-translated IFEval. However, the exact prompts and few-shot examples used for the 70B evaluation are not fully public in a reproducible code repository, and third-party verification is limited to leaderboard positions rather than independent audit reports.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as 'Typhoon' and acknowledging its Llama 3.1 foundation in system prompts and documentation. It is transparent about its bilingual focus and does not exhibit known issues of claiming to be a competitor model (e.g., GPT-4). Versioning is clearly maintained (Typhoon-2 vs. 1.5X).

Downstream

18.5 / 30

License Clarity

7.0 / 10

The model is released under the Llama 3.1 Community License, which is a custom license with specific commercial restrictions (e.g., the 700M monthly active user clause). While the license itself is clear and publicly accessible, it is not a standard OSI-approved open-source license (like Apache 2.0), which introduces some complexity for commercial users. The documentation clearly states the licensing terms on Hugging Face.

Hardware Footprint

6.5 / 10

Basic hardware requirements are provided, such as the need for at least 2x A100/H100 80GB GPUs for FP16 inference. Some guidance on quantization (FP8) is available via the opentyphoon.ai serving documentation. However, it lacks a comprehensive table of VRAM requirements across various quantization levels (Q4, Q5, Q8) and the associated accuracy tradeoffs, which is critical for downstream deployment on consumer or mid-range hardware.
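The 2x 80GB figure for FP16 can be sanity-checked with simple arithmetic on weight storage alone. The sketch below uses the nominal 70B parameter count and idealized bits-per-weight values; both are simplifications, real quantization formats add per-block overhead, and the result excludes the KV cache and activations, so treat the GPU counts as lower bounds.

```python
import math

PARAMS = 70e9      # nominal parameter count from the spec table
GPU_MEMORY_GB = 80 # A100/H100 80GB, as named in the text

# Idealized bits per weight; actual formats carry extra scaling metadata.
FORMATS = {"FP16": 16, "FP8": 8, "4-bit": 4}

def weight_gb(params, bits):
    """Decimal gigabytes needed to store the weights alone."""
    return params * bits / 8 / 1e9

def min_gpus(params, bits, gpu_gb=GPU_MEMORY_GB):
    """Fewest GPUs whose combined memory holds the weights (no cache, no activations)."""
    return math.ceil(weight_gb(params, bits) / gpu_gb)

for name, bits in FORMATS.items():
    print(f"{name:6s} {weight_gb(PARAMS, bits):6.1f} GB  >= {min_gpus(PARAMS, bits)} x {GPU_MEMORY_GB} GB GPU")
```

At FP16 the weights alone occupy about 140 GB, which is consistent with the stated minimum of two 80GB GPUs and leaves limited headroom for long contexts.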

Versioning Drift

5.0 / 10

SCB 10X uses a versioning system (Typhoon 1.0, 1.5, 1.5X, 2.0), but a detailed, public changelog for weight updates or minor revisions is not consistently maintained. While major releases are documented via blog posts and technical reports, there is limited information on how silent updates or safety alignment changes might affect model behavior over time, making it difficult for developers to track drift.
