趋近智
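Parameters: 1.3B
Context length: 2,048 tokens
Modality: Text
Architecture: Dense
License: Apache-2.0
Release date: 29 Feb 2024
Training data cutoff: Nov 2023
Attention structure: Multi-Head Attention
Hidden dimension size: 2048
Number of layers: 24
Attention heads: 16
Key-value heads: 16
Activation function: SwiGLU
Normalization: RMS Normalization
Position embedding: Rotary Position Embedding (RoPE)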
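As a rough sanity check on the 1.3B figure, the parameter count can be estimated from the values listed above. This is a minimal sketch that assumes a Llama-style layout. The vocabulary size (32,000), the feed-forward width (5,504), and untied input/output embeddings are assumptions that are not stated on this page, and the function name is hypothetical.

```python
def estimate_llama_params(hidden, layers, vocab, ffn, tied_embeddings=False):
    """Approximate parameter count of a Llama-style dense decoder (no biases)."""
    attention = 4 * hidden * hidden   # Q, K, V and output projections (multi-head attention)
    mlp = 3 * hidden * ffn            # gate, up and down projections (SwiGLU)
    norms = 2 * hidden                # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm

# hidden, layers and heads come from the page; vocab and ffn are assumed values.
HIDDEN, LAYERS, HEADS = 2048, 24, 16
head_dim = HIDDEN // HEADS
total = estimate_llama_params(hidden=HIDDEN, layers=LAYERS, vocab=32_000, ffn=5_504)

print(f"per-head dimension: {head_dim}")
print(f"estimated size: {total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands close to the advertised 1.3B, with most of the weights in the 24 transformer layers rather than in the embeddings.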
CroissantLLM Base is a 1.3 billion parameter decoder-only transformer model designed to provide balanced bilingual proficiency in French and English. Unlike many contemporary large language models, which treat non-English languages as secondary and include only small amounts of such data, CroissantLLM was pre-trained on a strictly balanced 1:1 ratio of French and English data. This data-mixing choice aims to mitigate linguistic bias and to represent French cultural and technical knowledge with the same fidelity as English. The model was trained on 3 trillion tokens, more than many larger open-source models were trained on.
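To put the training volume in perspective, the ratio of training tokens to parameters can be worked out directly from the two figures in the paragraph above. The comparison value of roughly 20 tokens per parameter is the commonly cited compute-optimal rule of thumb and is an assumption introduced here for illustration, not a figure from this page.

```python
TRAINING_TOKENS = 3e12            # from the text above
PARAMETERS = 1.3e9                # from the text above
RULE_OF_THUMB_TOKENS_PER_PARAM = 20  # assumed reference point, not from this page

tokens_per_param = TRAINING_TOKENS / PARAMETERS
ratio_to_rule_of_thumb = tokens_per_param / RULE_OF_THUMB_TOKENS_PER_PARAM

print(f"{tokens_per_param:,.0f} training tokens per parameter")
print(f"about {ratio_to_rule_of_thumb:,.0f}x the assumed rule of thumb")
```

A ratio this far above the rule of thumb suggests the model was deliberately trained well past the compute-optimal point, which trades extra training compute for a smaller model that is cheaper to run.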
The model is built on the Llama architecture and incorporates established components such as Rotary Positional Encodings (RoPE) and RMSNorm, which helps stabilize activations in deep networks. To optimize for the bilingual use case, the developers introduced a custom SentencePiece-based tokenizer trained on a high-quality mix of French, English, and code data. This tokenizer achieves significantly lower fertility (fewer tokens per word) on French text than standard multilingual tokenizers, which improves computational efficiency and the model's ability to capture linguistic nuances. The architecture is dense, with no mixture-of-experts layers, and has 24 layers, a hidden dimension of 2048, and 16 attention heads.
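Tokenizer fertility is the average number of tokens produced per word, so a lower value means the same text fits in fewer tokens. The sketch below shows one way to measure it. The two toy tokenizers are hypothetical stand-ins used only to make the example self-contained; a real comparison would pass in the `tokenize` function of each actual tokenizer.

```python
from typing import Callable, List

def fertility(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)

# Toy tokenizers (hypothetical, for illustration only).
def char_tokenizer(text: str) -> List[str]:
    # One token per non-space character: the worst case for fertility.
    return [ch for ch in text if not ch.isspace()]

def chunk_tokenizer(text: str, size: int = 4) -> List[str]:
    # Splits each word into pieces of at most `size` characters.
    return [w[i:i + size] for w in text.split() for i in range(0, len(w), size)]

sample = "Le modèle est entraîné sur des données françaises et anglaises"

char_fertility = fertility(char_tokenizer, sample)
chunk_fertility = fertility(chunk_tokenizer, sample)
print(f"character-level fertility: {char_fertility:.2f}")
print(f"chunk-level fertility:     {chunk_fertility:.2f}")
```

Because sequence length drives both compute and how much text fits in the 2,048-token context, a tokenizer with lower French fertility lets the model process more French text per forward pass.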
CroissantLLM Base is engineered for high performance on consumer-grade hardware, making it suitable for deployment on local devices such as personal computers and mobile systems. Its training history is highly transparent, with the researchers releasing extensive details on the pre-training data and providing access to checkpoints throughout the training process. The model serves as a foundation for various downstream tasks, particularly translation and content generation in French-centric environments, where its specialized vocabulary and balanced training provide a distinct advantage over models trained on predominantly English-centric datasets.
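The suitability for consumer hardware follows largely from the parameter count. As a back-of-the-envelope sketch, the memory needed just to hold the weights can be estimated at several numeric precisions. The precisions listed are common choices and are assumptions here, not formats confirmed by this page; the estimate also ignores the KV cache, activations, and runtime overhead.

```python
PARAMETERS = 1.3e9  # from the specification above

# Bytes per parameter for common storage formats (assumed, not from this page).
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>17}: {weight_memory_gb(PARAMETERS, nbytes):.2f} GB")
```

At 16-bit precision the weights come to roughly 2.6 GB, which is within reach of many laptops and recent phones.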
CroissantLLM is a bilingual French-English language model developed by French research institutions. The model is trained on a curated mix of French and English data to provide language understanding while preserving French linguistic heritage. It is designed for low-resource inference on consumer-grade hardware.
No evaluation benchmarks are available for CroissantLLM Base.