Parameters
70B
Context Length
128K
Modality
Text
Architecture
Dense
License
Llama 3.1 Community License
Release Date
1 Jun 2024
Knowledge Cutoff
Dec 2023
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
64
Key-Value Heads
8
Attention Head Dimension
-
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
-
Sliding Window Attention
-
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
8,192
Number of Layers
80
FFN Intermediate Size (Dense)
-
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
-
Typhoon-2-70B is a high-capacity Thai-English large language model developed by SCB 10X, specifically architected to address the linguistic complexities of the Thai language. Built upon the Llama 3.1 70B backbone, this model undergoes extensive continual pre-training on a curated corpus of over 5 billion high-quality Thai tokens. This training process is designed to align the model with Thai cultural nuances and linguistic structures while preserving the original English reasoning capabilities of the underlying architecture. The resulting model serves as a foundation for enterprise-level applications requiring high precision in bilingual contexts.
The model uses a dense, decoder-only transformer architecture with Grouped-Query Attention (GQA), which shares each key-value head across several query heads to shrink the key-value cache and improve inference efficiency. It supports a 128K-token context window, enabling the processing of lengthy legal documents, technical manuals, and multi-turn conversational histories. Post-training combines supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to improve instruction-following accuracy and function-calling capabilities. These optimizations allow the model to interact with external tools and APIs, facilitating complex agentic workflows.
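To make the effect of GQA concrete, the following minimal sketch estimates the key-value cache size for a single sequence at full context length. It uses the head counts, hidden size, and layer count listed in the specification above; the head dimension (hidden size divided by attention heads) and 16-bit cache precision are assumptions, since the page does not state them.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: every layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


HIDDEN_SIZE = 8192
NUM_HEADS = 64
NUM_KV_HEADS = 8
NUM_LAYERS = 80
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # assumption: 128, not stated on this page
CONTEXT = 128 * 1024                 # 128K tokens
GIB = 1024 ** 3

# GQA caches only the 8 shared key-value heads.
gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical multi-head baseline: one key-value head per query head.
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_HEADS, HEAD_DIM, CONTEXT)

print(f"GQA cache: {gqa_bytes / GIB:.0f} GiB")
print(f"MHA cache: {mha_bytes / GIB:.0f} GiB")
```

Under these assumptions the cache is eight times smaller with GQA than with a full multi-head layout, which is what makes a 128K-token window practical to serve.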
Released under the Llama 3.1 Community License, Typhoon-2-70B provides a transparent path for developers to integrate sovereign AI capabilities into production environments. Its design emphasizes performance in specialized Thai domains such as legal reasoning, cultural content generation, and sophisticated data analysis. By bridging the gap between English-centric foundation models and local language requirements, Typhoon-2-70B enables the development of localized AI solutions that maintain parity with global standards of reasoning and accuracy.
Typhoon is a Thai language model family developed by SCB 10X. It is specifically optimized for the Thai language, addressing complexities such as the lack of word delimiters and tonal nuances. The models are trained on Thai-centric datasets including legal, cultural, and historical documents to ensure localized context and knowledge.
No evaluation benchmarks are available for Typhoon-2-70B.
Overall Rank
-
Coding Rank
-
Total Score
65 / 100
Typhoon-2-70B exhibits a strong transparency profile regarding its architectural foundation and its specialized focus on the Thai language, supported by a comprehensive technical report. Its primary strengths lie in its clear identity and documented design choices, such as tokenizer selection. However, it lacks transparency in training compute metrics and provides only moderate detail on dataset composition and hardware-specific performance tradeoffs.
Architectural Provenance
The model is explicitly identified as a continually pre-trained (CPT) and instruction-tuned version of Llama 3.1 70B. The technical report (arXiv:2412.13702) describes the training methodology in detail, including the use of Grouped-Query Attention (GQA) and the transition from Typhoon 1.5. It also documents the decision not to extend the tokenizer vocabulary, citing recent research on performance degradation; this reflects a high level of architectural transparency. However, layer-by-layer modifications and exact hyperparameter configurations for the 70B variant's pre-training phase are documented in less detail than for the 8B variant.
Dataset Composition
SCB 10X provides a moderate breakdown of the training data, disclosing a mixture of 50% English and 50% Thai data to mitigate catastrophic forgetting. The Thai portion includes 5 billion high-quality tokens curated from Common Crawl (40 packs), synthetic textbooks, and culturally relevant documents. While the technical report describes the filtering (MinHash, LSH) and quality classification (human-in-the-loop) processes, the specific proportions of sub-categories (e.g., legal vs. medical vs. web) within the Thai corpus are not fully quantified, and the full dataset is not public.
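As an illustration of the MinHash step named above, here is a minimal sketch of near-duplicate detection using only the Python standard library. It is not SCB 10X's pipeline: the shingle size, the number of hash functions, and the function names are all assumptions chosen for the example, and the LSH bucketing stage is omitted.

```python
import hashlib


def shingles(text, k=5):
    """Set of overlapping character k-grams; needs no word segmentation."""
    text = " ".join(text.split())  # normalize whitespace
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(items, num_perm=128):
    """For each seeded hash function, keep the smallest hash over the set."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


doc_a = "The court ruled that the contract was void under Thai civil law."
doc_b = "The court ruled that the contract was void under Thai civil code."
doc_c = "Typhoon season brings heavy rain to the northern provinces."

sig_a, sig_b, sig_c = (minhash_signature(shingles(d)) for d in (doc_a, doc_b, doc_c))
print("near-duplicate:", estimated_jaccard(sig_a, sig_b))
print("unrelated:     ", estimated_jaccard(sig_a, sig_c))
```

Character-level shingles are used here because they work without a word tokenizer, which is convenient for Thai text that lacks word delimiters. A production pipeline would typically add LSH banding so that only documents sharing a bucket are compared.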
Tokenizer Integrity
The model utilizes the standard Llama 3 tokenizer (128k vocabulary) without expansion. This is a transparent design choice documented in the technical report to maintain architectural integrity and performance. Tokenization efficiency for Thai is discussed relative to previous versions (Typhoon 1.0 used an augmented tokenizer), and the 128K context window support is verified in official documentation and Hugging Face model cards.
Parameter Density
The model is clearly defined as a 70B dense decoder-only transformer. There is no ambiguity regarding active vs. total parameters as it is not an MoE model. The technical report confirms it inherits the Llama 3.1 70B structure. While it lacks a detailed breakdown of parameter allocation (e.g., exact percentage of weights in attention vs. FFN), the density and architecture are well-documented for a model of this scale.
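The missing allocation breakdown can be approximated from the architecture itself. The sketch below is an estimate under stated assumptions: the hidden size, layer count, and head counts come from the specification on this page, while the FFN intermediate size (28,672), the vocabulary size (128,256), and untied input and output embeddings are taken from the public Llama 3.1 70B configuration and are not confirmed here for Typhoon-2-70B.

```python
HIDDEN = 8192
LAYERS = 80
NUM_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS   # 128
FFN_INTERMEDIATE = 28672         # assumption: Llama 3.1 70B config value
VOCAB = 128256                   # assumption: Llama 3 tokenizer size

# Attention: query and output projections are full width; key and value
# projections only cover the shared key-value heads (GQA).
kv_width = NUM_KV_HEADS * HEAD_DIM
attention_per_layer = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_width

# SwiGLU feed-forward: gate, up, and down projections.
ffn_per_layer = 3 * HIDDEN * FFN_INTERMEDIATE

# Two RMSNorm weight vectors per layer, plus one final norm.
norms = 2 * HIDDEN * LAYERS + HIDDEN

# Assumption: separate (untied) input embedding and output head.
embeddings = 2 * VOCAB * HIDDEN

attention_total = attention_per_layer * LAYERS
ffn_total = ffn_per_layer * LAYERS
total = attention_total + ffn_total + norms + embeddings

print(f"total:      {total / 1e9:.2f}B")
print(f"attention:  {attention_total / total:.1%}")
print(f"FFN:        {ffn_total / total:.1%}")
print(f"embeddings: {embeddings / total:.1%}")
```

Under these assumptions roughly four fifths of the weights sit in the feed-forward blocks and a little under one fifth in attention, with the remainder in embeddings and normalization.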
Training Compute
Information regarding the specific compute resources used for the 70B model is vague. While the report mentions using Together AI's GPU clusters and bare-metal H100/A100 nodes, it does not disclose the total GPU hours, training duration, or the carbon footprint associated with the 70B variant. Most detailed compute metrics in the literature refer to the smaller 8B models or general infrastructure capabilities rather than the specific 70B training run.
Benchmark Reproducibility
The model is evaluated on standard benchmarks (MMLU, IFEval) and specialized Thai benchmarks (ThaiExam, M3Exam). The technical report provides detailed results and some methodology, including the use of a Thai-translated IFEval. However, the exact prompts and few-shot examples used for the 70B evaluation are not fully public in a reproducible code repository, and third-party verification is limited to leaderboard positions rather than independent audit reports.
Identity Consistency
The model demonstrates high identity consistency, correctly identifying itself as 'Typhoon' and acknowledging its Llama 3.1 foundation in system prompts and documentation. It is transparent about its bilingual focus and does not exhibit known issues of claiming to be a competitor model (e.g., GPT-4). Versioning is clearly maintained (Typhoon-2 vs. 1.5X).
License Clarity
The model is released under the Llama 3.1 Community License, which is a custom license with specific commercial restrictions (e.g., the 700M monthly active user clause). While the license itself is clear and publicly accessible, it is not a standard OSI-approved open-source license (like Apache 2.0), which introduces some complexity for commercial users. The documentation clearly states the licensing terms on Hugging Face.
Hardware Footprint
Basic hardware requirements are provided, such as the need for at least 2x A100/H100 80GB GPUs for FP16 inference. Some guidance on quantization (FP8) is available via the opentyphoon.ai serving documentation. However, it lacks a comprehensive table of VRAM requirements across various quantization levels (Q4, Q5, Q8) and the associated accuracy tradeoffs, which is critical for downstream deployment on consumer or mid-range hardware.
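A rough lower bound on weight memory can be sketched as follows. This is a back-of-the-envelope estimate, not official guidance: it assumes a nominal 70 billion parameters, counts only the weights (no KV cache, activations, or framework overhead), and treats each precision as an exact number of bits per weight, which real quantization formats slightly exceed because they also store scaling factors.

```python
import math


def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def min_gpus(required_gb, gpu_memory_gb=80):
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(required_gb / gpu_memory_gb)


NUM_PARAMS = 70e9  # assumption: nominal parameter count

for name, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{name:>5}: {gb:6.1f} GB of weights, at least {min_gpus(gb)} x 80 GB GPU")
```

The FP16 figure is consistent with the two-GPU minimum quoted above. In practice additional headroom is needed for the KV cache and activations, so the GPU counts for the lower precisions should be read as floors rather than recommendations.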
Versioning Drift
SCB 10X uses a versioning system (Typhoon 1.0, 1.5, 1.5X, 2.0), but a detailed, public changelog for weight updates or minor revisions is not consistently maintained. While major releases are documented via blog posts and technical reports, there is limited information on how silent updates or safety alignment changes might affect model behavior over time, making it difficult for developers to track drift.