---
license: apache-2.0
pipeline_tag: sentence-similarity
language:
- fr
tags:
- embeddings
- french
- feature-extraction
- bfloat16
- sentence-similarity
- text-embeddings
---

| Evaluation task | Embeddings-Francais-BF16-BASE-50M | Test-Train-Avant-Main-Train |
|---|---|---|
| SICKFr | 0.519713 | 0.699325 |
| SyntecReranking | 0.313680 | 0.328360 |
| SummEvalFr | 0.306903 | 0.305028 |
| AlloProfClusteringS2S | 0.213383 | 0.209503 |
| SyntecRetrieval | 0.051370 | 0.123900 |
| HALClusteringS2S | Failed | 0.042094 |


| Hyperparameter | Embeddings-Francais-BF16-BASE-50M | Test-Train-Avant-Main-Train |
|---|---|---|
| Training tokens seen | 2.46B | 61.44M + SFT |
| Parameters | 169,896,960 | 21,240,576 |
| Context length | 4096 | 4096 |
| Embedding dimension | 1536 | 384 |
| Vocabulary size | 32768 | 32768 |
| Layers | 4 | 4 |
| Heads | 12 | 4 |
| Head dimension | 128 | 96 |
| Precision | bfloat16 | bfloat16 |
| Attention backend | SageAttention | SageAttention |
| Pooling | Mean pooling | Mean pooling |
| Normalization | L2 normalize | L2 normalize |
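
Both models list mean pooling followed by L2 normalization as the final step. The sketch below illustrates what those two operations generally do. It is not taken from this model's code: the function names (`mean_pool`, `l2_normalize`, `embed`, `cosine_similarity`) and the use of an attention mask to skip padding are assumptions made for illustration, and it is written in plain Python so it runs without dependencies.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors at positions where attention_mask is 1.

    Hypothetical helper: assumes padding positions are marked with 0.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero norm)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def embed(token_embeddings, attention_mask):
    """Mean pooling, then L2 normalization, as listed in the table."""
    return l2_normalize(mean_pool(token_embeddings, attention_mask))


def cosine_similarity(a, b):
    """For unit-length vectors, the dot product equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy example: three token vectors of dimension 2, the last one is padding.
tokens = [[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = embed(tokens, mask)
print(sentence_vector)
```

Because the output vectors have unit length, comparing two sentences reduces to a dot product, which is why L2 normalization is commonly paired with sentence-similarity use.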