Architecture

Attention Structure: Multi-Head Attention
Hidden Size: -
Number of Layers: -
Attention Heads: -
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Absolute Position Embedding
VRAM Requirements for Different Quantization Methods and Context Sizes
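The card does not include the VRAM table itself, but the general estimate behind such tables is weights plus KV cache. A minimal sketch follows; the layer count, head counts, and head dimension used as defaults are hypothetical values typical of 8B-class models (the actual figures are not given above), and the overhead factor is an assumed fudge factor, not a measured number.

```python
# Rough VRAM estimate for an 8B-parameter model.
# All default values below are illustrative assumptions, NOT the
# published Typhoon-2-8B configuration (which is not listed above).

def estimate_vram_gb(n_params_b=8.0, bytes_per_param=2.0,
                     n_layers=32, n_kv_heads=8, head_dim=128,
                     context_len=128_000, kv_bytes=2.0, overhead=1.2):
    """Estimate GPU memory as weights + KV cache, times an overhead factor."""
    weights = n_params_b * 1e9 * bytes_per_param
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
    return (weights + kv_cache) * overhead / 1e9

# FP16 weights at the full 128k context
print(round(estimate_vram_gb(), 1))                                   # 39.3
# ~4-bit quantized weights (≈0.5 bytes/param) at an 8k context
print(round(estimate_vram_gb(bytes_per_param=0.5, context_len=8_000), 1))  # 6.1
```

The KV-cache term dominates at long contexts, which is why quantizing weights alone helps far less at 128k than at short context lengths.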
Typhoon-2-8B is an 8 billion parameter model optimized for Thai language processing. It features an expanded context length of 128,000 tokens and supports function calling. The model is trained to handle Thai cultural nuances and specific domains such as Thai law and local administration. Released under the Apache 2.0 license.
Typhoon is a Thai language model family developed by SCB 10X. It is specifically optimized for the Thai language, addressing complexities such as the lack of word delimiters and tonal nuances. The models are trained on Thai-centric datasets including legal, cultural, and historical documents to ensure localized context and knowledge.
No evaluation benchmarks are available for Typhoon-2-8B.