# SmolLM3-3B-SFT-FR
SmolLM3-3B fine-tuned on the French split of ReasonXL via supervised fine-tuning (SFT), targeting in-language reasoning adaptation.
## Model Description
This model is the result of Stage 1 of a two-stage reasoning adaptation pipeline. It is trained to shift the base model's reasoning language from English to French by exposing it to a large corpus of French-language reasoning traces spanning math, science, code, and general domains.
| Property | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM3-3B |
| Training stage | SFT (Stage 1 only) |
| Target language | French (fr) |
| Training data | ReasonXL — FR split |
| Training tokens | ~8.8B |
| Avg. sequence length | 3,872 tokens |
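The reported token count is consistent with the corpus size and average sequence length, as a quick sanity check shows:

```python
# Sanity check: training tokens ≈ samples × average sequence length.
# Numbers are taken from this card's tables.
samples = 2_282_204      # total ReasonXL-FR samples
avg_seq_len = 3_872      # average tokens per sample
total_tokens = samples * avg_seq_len
print(f"{total_tokens / 1e9:.2f}B tokens")  # 8.84B, matching the ~8.8B reported
```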
## Training Data
The model is trained on the French split of ReasonXL, a multilingual cross-domain reasoning corpus of 2M samples per language (9B tokens). English source samples were translated with Qwen3-32B, using a dedicated system prompt that preserves technical terminology, mathematical notation, and reasoning structure.
Each sample consists of three independently translated components: the user input, the model's reasoning trace (within <think> tags), and the final output.
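Given the `<think>` convention above, an assistant turn can be split back into its reasoning trace and final output. A minimal sketch (the helper name and regex are illustrative, not part of the dataset tooling):

```python
import re

# Split an assistant turn into (reasoning trace, final output),
# assuming the <think>...</think> convention used in ReasonXL samples.
def split_reasoning(assistant_text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", assistant_text, flags=re.DOTALL)
    if match is None:
        # No trace present: the whole turn is the final output.
        return "", assistant_text.strip()
    reasoning = match.group(1).strip()
    output = assistant_text[match.end():].strip()
    return reasoning, output

trace, answer = split_reasoning("<think>17 × 24 = 408</think>La réponse est 408.")
# trace  -> "17 × 24 = 408"
# answer -> "La réponse est 408."
```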
The English source data was annotated and filtered using Propella-1-4B across 18 properties (safety, quality, information density, educational value, domain), followed by class-aware downsampling for domain balance.
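Class-aware downsampling can be sketched as capping each domain at a fixed budget; this is an illustrative implementation, not the authors' exact procedure:

```python
import random
from collections import defaultdict

# Illustrative class-aware downsampling: cap each domain at a fixed
# per-domain budget so no single domain dominates the mixture.
def downsample_by_domain(samples, cap_per_domain, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_domain = defaultdict(list)
    for s in samples:
        by_domain[s["domain"]].append(s)
    balanced = []
    for domain, group in by_domain.items():
        if len(group) > cap_per_domain:
            group = rng.sample(group, cap_per_domain)
        balanced.extend(group)
    return balanced

data = [{"domain": "math"}] * 10 + [{"domain": "code"}] * 3
print(len(downsample_by_domain(data, cap_per_domain=5)))  # 8 (5 math + 3 code)
```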
Source datasets included in ReasonXL:
| Dataset | Config | Samples |
|---|---|---|
| Cascade-SFT-Stage-2 | general / math | 768,615 |
| Dolci-Think-SFT-7B | science | 347,453 |
| Cascade-SFT-Stage-1 | general / code / math / science | 711,812 |
| Llama-Nemotron-PTD | science | 267,147 |
| Nemotron-Science-v1 | — | 97,026 |
| Nemotron-IF-Chat-v1 | — | 91,151 |
| **Total** | | **2,282,204** |
## Training Details
Training uses a completion-only loss: only the assistant's reasoning trace and final output contribute to the objective, while user and system tokens are masked. Sequences are chat-formatted and packed to 16,384 tokens.
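The masking can be sketched with the standard convention of setting masked labels to −100, which PyTorch's cross-entropy loss ignores (the helper below is illustrative):

```python
# Completion-only label masking: user/system tokens get label -100
# (PyTorch's CrossEntropyLoss ignore_index), so only assistant tokens
# contribute to the training objective.
IGNORE_INDEX = -100

def mask_labels(token_ids, is_assistant):
    """is_assistant[i] is True iff token i belongs to an assistant turn."""
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(token_ids, is_assistant)]

labels = mask_labels([101, 7, 8, 9, 102], [False, False, True, True, False])
print(labels)  # [-100, -100, 8, 9, -100]
```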
| Hyperparameter | Value |
|---|---|
| Epochs | 2 |
| Max sequence length | 16,384 |
| Packing | Enabled |
| Precision | bfloat16 |
| Optimizer | adamw_torch_fused |
| Per-device batch size | 4 |
| Gradient accumulation | 4 steps |
| Weight decay | 0.05 |
| LR scheduler | Cosine with min LR |
| Min LR | 5×10⁻⁶ |
| Warmup ratio | 0.05 |
| Distributed strategy | FSDP (8 GPUs) |
| Gradient checkpointing | Enabled |
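The "cosine with min LR" schedule decays the learning rate along a cosine curve to a floor of 5×10⁻⁶ after a linear warmup over the first 5% of steps. A minimal sketch (the peak LR here is illustrative; the card does not state it):

```python
import math

# Cosine decay to a minimum LR, with linear warmup.
# peak_lr is an assumed value for illustration; min_lr and warmup_ratio
# match the hyperparameter table above.
def lr_at(step, total_steps, peak_lr, min_lr=5e-6, warmup_ratio=0.05):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(warmup_steps, 1)  # linear warmup
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # 1 -> 0
    return min_lr + (peak_lr - min_lr) * cosine

# At the final step, the LR has decayed to the floor:
print(lr_at(1000, 1000, peak_lr=2e-5))  # 5e-06
```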
## Intended Use
This is a research checkpoint demonstrating that rewiring a model's reasoning language via SFT is feasible at the 3B scale. It is intended as a starting point for Stage 2 RL fine-tuning (see DGurgurov/SmolLM3-3B-SFT-GRPO-FR), and may exhibit reduced reasoning quality compared to the base model on non-French benchmarks.
## Citation
```bibtex
@misc{reasonxl2026,
  title = {Reason{XL}: A Multilingual Cross-Domain Reasoning Corpus},
  author = {Daniil Gurgurov and Tom Röhr},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/toroe/Soofi-Think-SFT-10B-multilingual}}
}
```
Paper citation will be added upon publication.