# SmolLM3-3B-SFT-FR

SmolLM3-3B fine-tuned on the French split of ReasonXL via supervised fine-tuning (SFT), targeting in-language reasoning adaptation.


## Model Description

This model is the result of Stage 1 of a two-stage reasoning adaptation pipeline. It is trained to shift the base model's reasoning language from English to French by exposing it to a large corpus of French-language reasoning traces spanning math, science, code, and general domains.

| Property | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM3-3B |
| Training stage | SFT (Stage 1 only) |
| Target language | French (fr) |
| Training data | ReasonXL (FR split) |
| Training tokens | ~8.8B |
| Avg. sequence length | 3,872 tokens |

## Training Data

The model is trained on the French split of ReasonXL, a multilingual cross-domain reasoning corpus of 2M samples per language (9B tokens). English source samples were translated using Qwen3-32B with a dedicated system prompt preserving technical terminology, mathematical notation, and reasoning structure.

Each sample consists of three independently translated components: the user input, the model's reasoning trace (within `<think>` tags), and the final output.
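To make the structure concrete, here is a minimal sketch of splitting an assistant message into its two translated components. It assumes only the `<think>...</think>` convention described above; the function name and the returned field names are illustrative, not the dataset's actual schema.

```python
import re

def split_sample(assistant_text: str) -> dict:
    """Split an assistant message into its reasoning trace and final output.

    Assumes the reasoning trace is wrapped in <think>...</think>, with the
    final answer following it (the convention described in this card).
    """
    match = re.search(r"<think>(.*?)</think>", assistant_text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    # Everything outside the think tags is the final output.
    output = re.sub(r"<think>.*?</think>", "", assistant_text, flags=re.DOTALL).strip()
    return {"reasoning": reasoning, "output": output}
```

Translating each component independently keeps the tag structure intact, so parsers like this one work identically on the French data.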

The English source data was annotated and filtered using Propella-1-4B across 18 properties (safety, quality, information density, educational value, domain), followed by class-aware downsampling for domain balance.
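Class-aware downsampling of the kind mentioned above can be sketched as capping every class at a common size so domains end up balanced. This is a generic illustration, not the actual ReasonXL filtering code; the function name and `cap` parameter are assumptions.

```python
import random
from collections import defaultdict

def downsample_by_class(samples, key="domain", cap=None, seed=0):
    """Cap every class at the size of the smallest one (or an explicit
    `cap`) so the resulting mix is balanced across class labels.

    `samples` is a list of dicts; `key` names the class label field.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for s in samples:
        buckets[s[key]].append(s)
    limit = cap if cap is not None else min(len(b) for b in buckets.values())
    balanced = []
    for bucket in buckets.values():
        rng.shuffle(bucket)          # sample uniformly within each class
        balanced.extend(bucket[:limit])
    return balanced
```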

Source datasets included in ReasonXL:

| Dataset | Config | Samples |
|---|---|---|
| Cascade-SFT-Stage-2 | general / math | 768,615 |
| Dolci-Think-SFT-7B | science | 347,453 |
| Cascade-SFT-Stage-1 | general / code / math / science | 711,812 |
| Llama-Nemotron-PTD | science | 267,147 |
| Nemotron-Science-v1 | | 97,026 |
| Nemotron-IF-Chat-v1 | | 91,151 |
| **Total** | | 2,282,204 |

## Training Details

Training uses completion-only loss — only the assistant's reasoning trace and output contribute to the objective; user and system tokens are masked. Sequences are chat-formatted and packed to 16,384 tokens.
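The label-masking step behind completion-only loss can be sketched in a few lines: labels copy the input ids inside assistant spans and are set to the ignore index everywhere else, so user and system tokens contribute nothing to the cross-entropy. This is a minimal illustration of the idea (trainers such as TRL implement it via data collators); the function and its `assistant_spans` argument are assumptions, not this project's actual code.

```python
def mask_prompt_labels(input_ids, assistant_spans, ignore_index=-100):
    """Build labels for completion-only loss.

    `assistant_spans` is a list of half-open (start, end) token-index
    ranges covering the assistant's reasoning trace and final output.
    Tokens outside those spans get `ignore_index`, which standard
    cross-entropy implementations skip.
    """
    labels = [ignore_index] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels
```

With packing, several such masked sequences are concatenated into one 16,384-token example, and the mask keeps each packed prompt out of the objective.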

| Hyperparameter | Value |
|---|---|
| Epochs | 2 |
| Max sequence length | 16,384 |
| Packing | Enabled |
| Precision | bfloat16 |
| Optimizer | adamw_torch_fused |
| Per-device batch size | 4 |
| Gradient accumulation | 4 steps |
| Weight decay | 0.05 |
| LR scheduler | Cosine with min LR |
| Min LR | 5×10⁻⁶ |
| Warmup ratio | 0.05 |
| Distributed strategy | FSDP (8 GPUs) |
| Gradient checkpointing | Enabled |
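The schedule in the table can be sketched as linear warmup followed by cosine decay to a floor. Note the card lists only the minimum LR (5×10⁻⁶) and warmup ratio (0.05), so `peak_lr` below is a free parameter, not a documented value.

```python
import math

def cosine_with_min_lr(step, total_steps, peak_lr, min_lr=5e-6, warmup_ratio=0.05):
    """Linear warmup to `peak_lr`, then cosine decay down to `min_lr`."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Effective batch from the table: 4 per device x 4 accumulation steps x 8 GPUs
# = 128 packed sequences of 16,384 tokens, roughly 2.1M tokens per optimizer step.
EFFECTIVE_BATCH = 4 * 4 * 8
```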

## Intended Use

This is a research checkpoint demonstrating that reasoning language re-wiring via SFT is feasible at the 3B scale. It is intended as a starting point for Stage 2 RL fine-tuning (see DGurgurov/SmolLM3-3B-SFT-GRPO-FR), and may exhibit reduced reasoning quality compared to the base model on non-French benchmarks.


## Citation

```bibtex
@misc{reasonxl2026,
  title        = {Reason{XL}: A Multilingual Cross-Domain Reasoning Corpus},
  author       = {Daniil Gurgurov and Tom Röhr},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/toroe/Soofi-Think-SFT-10B-multilingual}}
}
```

Paper citation will be added upon publication.
