CroissantLLM Base

Parameters

1.3B

Context Length

2,048 tokens

Modality

Text

Architecture

Dense

License

Apache-2.0

Release Date

29 Feb 2024

Training Data Cutoff

Nov 2023

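Technical Specifications

Attention Structure

Multi-Head Attention

Hidden Dimension Size

2048

Number of Layers

24

Attention Heads

16

Key-Value Heads

16

Activation Function

SwiGLU

Normalization

RMS Normalization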

Position Embedding

Rotary Position Embedding (RoPE)
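
As a sanity check, the specifications above can be turned into a rough parameter count for a Llama-style dense decoder. This is a minimal sketch, not the developers' own accounting: the vocabulary size (32,000) and the feed-forward intermediate size (5,504) are assumptions not stated on this page, as is the choice of untied input and output embeddings.

```python
# Rough parameter count for a Llama-style dense decoder.
HIDDEN = 2048          # hidden dimension size, from the specifications
LAYERS = 24            # number of layers, from the specifications
VOCAB = 32_000         # ASSUMPTION: vocabulary size, not stated on this page
INTERMEDIATE = 5_504   # ASSUMPTION: SwiGLU feed-forward width, not stated on this page


def count_parameters(hidden, layers, vocab, intermediate, tied_embeddings=False):
    """Count weights in embeddings, attention, SwiGLU MLP, and RMSNorm layers."""
    attention = 4 * hidden * hidden      # Q, K, V, and output projections (no biases)
    mlp = 3 * hidden * intermediate      # gate, up, and down projections
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden
    return embeddings + layers * per_layer + final_norm + lm_head


total = count_parameters(HIDDEN, LAYERS, VOCAB, INTERMEDIATE)
print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands close to the 1.3B figure listed above. With multi-head attention the key and value projections are full-size, which is why all four attention matrices are counted as hidden × hidden.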

CroissantLLM Base is a 1.3-billion-parameter decoder-only transformer designed for balanced bilingual proficiency in French and English. Unlike many contemporary large language models, which treat non-English languages as secondary and include only a small share of such data, CroissantLLM was pre-trained on a strictly balanced 1:1 ratio of French and English data. This data-mixture choice aims to mitigate linguistic bias and to represent French cultural and technical knowledge with the same fidelity as English. The model was trained on 3 trillion tokens, a corpus that exceeds the training volume of many larger open-source models.

Technically, the model is built on the Llama architecture and incorporates established components such as Rotary Positional Encodings (RoPE) and RMSNorm, the latter to stabilize activations in deep networks. To optimize for the bilingual use case, the developers introduced a custom SentencePiece-based tokenizer trained on a high-quality mix of French, English, and code data. This tokenizer achieves significantly lower fertility (the average number of tokens produced per word) on French text than standard multilingual tokenizers, improving both computational efficiency and the model's ability to capture linguistic nuances. The architecture has 24 layers, a hidden dimension of 2048, and 16 attention heads, and it is dense, with no mixture-of-experts layers.
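
To make the fertility metric concrete, here is a minimal sketch of how it could be measured. The two tokenizers are hypothetical toys chosen only to show the comparison; they are not the CroissantLLM tokenizer or any real multilingual tokenizer, and real measurements would use a large evaluation corpus rather than two sentences.

```python
from typing import Callable, Iterable, List


def fertility(tokenize: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_tokens += len(tokenize(text))
        n_words += len(text.split())
    if n_words == 0:
        raise ValueError("no words to measure")
    return n_tokens / n_words


def char_tokenizer(text: str) -> List[str]:
    """Hypothetical worst case: one token per non-space character."""
    return [c for c in text if not c.isspace()]


def chunk_tokenizer(text: str, size: int = 4) -> List[str]:
    """Hypothetical subword stand-in: split each word into chunks of `size` characters."""
    return [word[i:i + size] for word in text.split() for i in range(0, len(word), size)]


sample = ["Le modèle parle français", "The model speaks English"]
print("char-level fertility: ", fertility(char_tokenizer, sample))
print("chunk-level fertility:", fertility(chunk_tokenizer, sample))
```

Lower fertility means the same sentence occupies fewer positions in the context window and needs fewer forward passes to generate, which is where the efficiency gain comes from.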

CroissantLLM Base is designed to run efficiently on consumer-grade hardware, making it suitable for deployment on local devices such as personal computers and mobile systems. Its training history is highly transparent: the researchers released extensive details on the pre-training data and provide access to checkpoints from throughout the training process. The model serves as a foundation for downstream tasks, particularly translation and content generation in French-centric environments, where its specialized vocabulary and balanced training give it an advantage over models trained on predominantly English data.
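
A back-of-the-envelope memory estimate shows why a model of this size fits on local devices. The sketch below is an approximation under stated assumptions: it counts only the weights and the key-value cache, assumes a 16-bit cache, uses the nominal 1.3B parameter figure, and ignores activations and runtime overhead, so actual usage will be somewhat higher. The bytes-per-weight table is a simplification of real quantization formats.

```python
# ASSUMPTION: idealized storage cost per weight for each precision.
BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(n_params: float, precision: str) -> float:
    """Memory needed to hold the model weights at the given precision."""
    return n_params * BYTES_PER_WEIGHT[precision]


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Key and value tensors for every layer, head, and token (16-bit by default)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


N_PARAMS = 1.3e9                  # nominal parameter count from the model card
LAYERS, KV_HEADS = 24, 16         # from the technical specifications
HEAD_DIM = 2048 // 16             # hidden dimension divided by attention heads
CONTEXT = 2048                    # full context length
GIB = 1024 ** 3

cache = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
for precision in BYTES_PER_WEIGHT:
    total = weight_bytes(N_PARAMS, precision) + cache
    print(f"{precision}: about {total / GIB:.2f} GiB (weights + KV cache)")
```

Because the model uses multi-head attention with 16 key-value heads, the cache is not reduced the way it would be with grouped-query attention, but at a 2,048-token context it remains small next to the weights.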

About CroissantLLM

CroissantLLM is a bilingual French-English language model developed by French research institutions. It is trained on a curated mix of French and English data so that it handles both languages well while preserving French linguistic heritage. It is designed for low-resource inference on consumer-grade hardware.


Other CroissantLLM Models
  • No related models

Evaluation Benchmarks

No evaluation benchmarks are available for CroissantLLM Base.

Rankings

Rank

-

Coding Rank

-

Model Transparency

Overall Score

B+

82 / 100
