SmolLM3-3B — German Reasoning Instruction SFT (Nemotron Multilingual Reasoning)
Model Description
This model is a Supervised Fine-Tuned (SFT) version of:
HuggingFaceTB/SmolLM3-3B
It was fine-tuned on the German (de) split of the dataset:
DGurgurov/Nemotron-Multilingual-Reasoning
The goal of the training was to improve:
- German instruction following
- Step-by-step reasoning
- Long-context conversation behavior
The model was trained using chat-formatted conversations and completion-only loss, meaning only assistant responses contributed to optimization.
Key properties:
- Base model: SmolLM3-3B
- Language specialization: German
- Context length during training: 16,384 tokens
- Chat formatted dataset
- Long-context packing enabled
Intended Uses
Suitable For
- German conversational assistants
- Educational tutoring
- Reasoning and structured explanation tasks
- Long-document Q&A in German
- Research experiments with long-context small LLMs
Not Suitable For
- Medical or legal advice without human review
- Autonomous decision-making
- Safety-critical systems
- High-stakes financial decisions
Training Data
Dataset used:
DGurgurov/Nemotron-Multilingual-Reasoning
Processing configuration:
- Language filtering: German only
- Converted into chat messages (`prepare_messages=True`)
- Assistant-only optimization (`completion_only_loss=True`)
Only the assistant responses were used to compute loss; user and system messages were masked.
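Conceptually, the masking looks like the sketch below (a simplified illustration of the idea, not the trainer's exact implementation): tokens belonging to the system and user turns receive the label `-100`, which the cross-entropy loss ignores.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Was ist 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]

# Token ids for the full conversation and for the prompt alone
# (assumes the rendered prompt is a prefix of the full conversation).
full_ids = tokenizer.apply_chat_template(messages, tokenize=True)
prompt_ids = tokenizer.apply_chat_template(
    messages[:-1], tokenize=True, add_generation_prompt=True
)

# Labels: -100 for prompt tokens, real ids for the assistant response.
labels = [-100] * len(prompt_ids) + full_ids[len(prompt_ids):]
```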
Please review the dataset card for provenance and limitations.
Training Procedure
Training was performed using HuggingFace Accelerate with FSDP (Fully Sharded Data Parallel) across 8 processes.
Core Setup
- Training method: Supervised fine-tuning (SFT)
- Epochs: 3
- Maximum sequence length: 16,384
- Sequence packing: enabled (see the sketch after this list)
- Precision: bfloat16
- Kernel optimization: Liger kernel enabled
- Gradient checkpointing: enabled
- Distributed: FSDP (8 processes)
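As a rough illustration of sequence packing, tokenized examples are concatenated until the 16,384-token budget is filled, so short samples do not waste the context window on padding. The function below is a simplified greedy sketch, not the trainer's exact algorithm:

```python
MAX_LEN = 16384  # matches the training context length

def pack(tokenized_examples):
    """Greedily concatenate token-id lists into sequences of at most MAX_LEN."""
    packed, current = [], []
    for ids in tokenized_examples:
        if current and len(current) + len(ids) > MAX_LEN:
            packed.append(current)
            current = []
        current.extend(ids)
    if current:
        packed.append(current)
    return packed
```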
Optimization
- Optimizer: `adamw_torch_fused`
- Per-device batch size: 4
- Gradient accumulation: 4
- Effective batch size: 16 sequences per GPU per optimizer step (4 × 4 accumulation), i.e. 128 sequences globally across 8 processes
- Weight decay: 0.05
Learning rate schedule:
- Scheduler: `cosine_with_min_lr`
- Warmup ratio: 0.05
- Minimum LR: 5e-6
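In TRL terms, the optimization settings above map roughly onto an `SFTConfig` like the following. Argument names track recent TRL/transformers releases and may differ across versions; the `output_dir` is a placeholder:

```python
from trl import SFTConfig

config = SFTConfig(
    output_dir="smollm3-3b-de-sft",        # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="adamw_torch_fused",
    weight_decay=0.05,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 5e-6},
    warmup_ratio=0.05,
    bf16=True,
    gradient_checkpointing=True,
    use_liger_kernel=True,
    max_length=16384,
    packing=True,
    completion_only_loss=True,
)
```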
Logging & Checkpoints
- Logging every 5 steps
- Checkpoint every 450 steps
- Weights & Biases tracking enabled
- Token accuracy logged during training
Data Processing
- Dataset workers: 16
- Dataset preparation: enabled
- Chat message preparation: enabled
- German split: enabled
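Loading the filtered data could look like the sketch below. It assumes the German data is exposed as a `de` split, as the `--lang_split de` flag suggests; check the dataset card for the actual split/config layout:

```python
from datasets import load_dataset

# "de" split name inferred from --lang_split de; verify against the
# dataset card before relying on this.
dataset = load_dataset(
    "DGurgurov/Nemotron-Multilingual-Reasoning",
    split="de",
    num_proc=16,  # matches --dataset_num_proc 16
)
```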
Usage
Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "toroe/SmolLM-3B-Science-DE"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Warum ist der Himmel blau?"},
]

# Build the prompt with the chat template the model was trained on.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Important:
You should use `apply_chat_template()` when prompting. The model was trained on chat-formatted conversations and performance will degrade without it.
Evaluation
During training, token accuracy was logged as a diagnostic metric.
Token accuracy:
- is useful for monitoring training stability
- is NOT a benchmark score
- does not represent real reasoning performance
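For reference, token accuracy of this kind can be computed along the following lines (a minimal sketch; the trainer's own implementation may differ):

```python
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    # Shift so each position predicts the next token, as in causal LM loss.
    preds = logits[:, :-1].argmax(dim=-1)
    targets = labels[:, 1:]
    mask = targets != -100          # skip masked (non-assistant) tokens
    correct = (preds == targets) & mask
    return (correct.sum() / mask.sum().clamp(min=1)).item()
```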
For proper evaluation, use:
- German instruction-following benchmarks
- reasoning datasets
- long-context evaluation tasks
Limitations
- May hallucinate facts
- Reasoning chains can still contain logical errors
- Performance near 16k context depends heavily on prompt structure
- Improvements mainly apply to German
- Smaller model size means weaker world knowledge than large LLMs
- Not aligned for safety-critical deployment
Bias & Safety
This model inherits biases from:
- the base model
- the training dataset
Recommended mitigations:
- add moderation filters
- use system prompts enforcing safe behavior
- include human review for sensitive deployments
License
This model is a derivative of:
HuggingFaceTB/SmolLM3-3B
Therefore, the original base model license and usage restrictions apply, along with any dataset terms.
Verify compatibility before commercial deployment.
Reproducibility (Training Arguments)
```bash
accelerate launch --use_fsdp --num_processes 8 --config_file sft/my_config.yaml sft/sft_trainer.py \
  --model_name HuggingFaceTB/SmolLM3-3B \
  --tokenizer_name HuggingFaceTB/SmolLM3-3B \
  --dataset_path DGurgurov/Nemotron-Multilingual-Reasoning \
  --skip_prepare_dataset False \
  --lang_split de \
  --prepare_messages True \
  --completion_only_loss True \
  --max_length 16384 \
  --dataset_num_proc 16 \
  --packing True \
  --use_liger_kernel True \
  --bf16 True \
  --log_token_accuracy True \
  --optim adamw_torch_fused \
  --gradient_checkpointing True \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 4 \
  --ddp_find_unused_parameters False \
  --lr_scheduler_type cosine_with_min_lr \
  --lr_scheduler_kwargs '{"min_lr": 5.0e-6}' \
  --warmup_ratio 0.05 \
  --weight_decay 0.05 \
  --report_to wandb \
  --run_name smol_3b_3epochs_lns_de \
  --num_train_epochs 3 \
  --save_strategy steps \
  --logging_steps 5 \
  --save_steps 450
```
Citation
If you use this model, please cite:
- HuggingFaceTB/SmolLM3-3B
- DGurgurov/Nemotron-Multilingual-Reasoning
Acknowledgements
- HuggingFaceTB — SmolLM3 base model
- Nemotron Multilingual Reasoning dataset authors
- HuggingFace Accelerate and Transformers libraries