Qwen3-4B-Instruct-2507 — German Reasoning SFT (Nemotron Multilingual Reasoning)

Model description

This model is a supervised fine-tuned (SFT) version of Qwen/Qwen3-4B-Instruct-2507, trained on the German (de) split of DGurgurov/Nemotron-Multilingual-Reasoning.

The objective of this training run was to improve:

  • German instruction following
  • Step-by-step reasoning
  • Long-context conversational performance

Key characteristics:

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Tokenizer: Qwen/Qwen3-4B-Instruct-2507
  • Training data: DGurgurov/Nemotron-Multilingual-Reasoning (de)
  • Loss: completion-only loss (only assistant tokens are optimized)
  • Context length during training: 16,384 tokens
  • Chat-formatted data: yes (examples converted to chat messages during preprocessing)

Intended uses

Suitable for

  • German assistants and chatbots
  • German reasoning tasks (logic, math, structured explanations)
  • Long-context document QA in German
  • Instruction following

Not suitable for

  • Medical or legal advice without professional oversight
  • Safety-critical decisions
  • Autonomous decision-making systems

Training data

Dataset used:

DGurgurov/Nemotron-Multilingual-Reasoning

Configuration:

  • Language filter: German only (de)
  • Converted to chat messages (prepare_messages=True)
  • Loss masking: completion_only_loss=True

Only assistant responses contributed to training loss.
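
The masking idea can be illustrated with a minimal, framework-agnostic sketch (the actual masking is handled by the training framework; the token IDs and prompt boundary below are illustrative):

```python
# Sketch of completion-only loss masking: prompt tokens get a label of
# -100 (the index PyTorch's cross-entropy ignores), so only the
# assistant's response tokens contribute to the training loss.
IGNORE_INDEX = -100

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, ignoring every position before prompt_len."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Example: 5 prompt tokens followed by 3 assistant-response tokens.
labels = mask_prompt_labels([11, 12, 13, 14, 15, 21, 22, 23], prompt_len=5)
```

In practice the boundary is derived from the chat template, not passed in by hand, but the effect on the labels is the same.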

Please review the dataset card for provenance and potential limitations.


Training procedure

General

  • Method: Supervised fine-tuning (SFT)
  • Epochs: 3
  • Max sequence length: 16,384
  • Packing: enabled
  • Precision: bfloat16
  • Gradient checkpointing: enabled
  • Kernel optimization: Liger kernel enabled
  • Distributed training: DDP
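
Packing concatenates multiple short training examples into each 16,384-token sequence so compute is not wasted on padding. A naive greedy sketch of the idea (real implementations also chunk overlong examples and handle cross-example attention, which is omitted here):

```python
def pack_sequences(tokenized_examples, max_length):
    """Greedy sketch: concatenate tokenized examples into buffers of at
    most max_length tokens, starting a new buffer when the next example
    would overflow. Illustrative only, not the trainer's actual packing."""
    buffers, current = [], []
    for tokens in tokenized_examples:
        if current and len(current) + len(tokens) > max_length:
            buffers.append(current)
            current = []
        current.extend(tokens)
    if current:
        buffers.append(current)
    return buffers

# Three short "documents" packed into buffers of at most 7 tokens.
buffers = pack_sequences([[1, 1, 1], [2, 2, 2], [3, 3, 3]], max_length=7)
```

With a 16k budget, many short chat examples fit into a single packed sequence, which is why packing is paired with the long context length here.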

Optimization

  • Optimizer: adamw_torch_fused
  • Batch size per device: 4
  • Gradient accumulation: 4
  • Effective batch size (per GPU): 4 × 4 = 16 sequences per optimizer step
  • Weight decay: 0.05

Learning rate:

  • Scheduler: cosine_with_min_lr
  • Warmup ratio: 0.05
  • Minimum LR: 5e-6
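
The cosine_with_min_lr scheduler decays the learning rate along a cosine curve from the peak down to the configured floor instead of to zero. A minimal sketch of the decay phase (warmup is omitted, the peak LR is illustrative since the card does not state it, and the exact framework implementation may differ slightly):

```python
import math

def cosine_with_min_lr(step, total_steps, peak_lr, min_lr):
    """Cosine decay from peak_lr at step 0 down to min_lr at total_steps."""
    progress = step / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine

# peak_lr below is an assumed value for illustration only.
start = cosine_with_min_lr(0, 1000, peak_lr=2e-5, min_lr=5e-6)
end = cosine_with_min_lr(1000, 1000, peak_lr=2e-5, min_lr=5e-6)
```

The floor keeps late-training updates non-trivial, which can help on long, packed sequences where the final steps still see substantial data.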

Logging & checkpoints

  • Logging steps: 5
  • Save steps: 900
  • Tracking: Weights & Biases
  • Token accuracy logged during training

Data processing

  • Dataset workers: 16 processes
  • Dataset preparation: enabled
  • Language split: de

Usage

Transformers example

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Erkläre mir kurz den Unterschied zwischen erneuerbaren und fossilen Energien."},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(out[0], skip_special_tokens=True))

Important:
Use the tokenizer's apply_chat_template(); the model was trained on chat-formatted data, and output quality degrades without it.


Evaluation

Training logged token accuracy as a diagnostic metric.

Token accuracy is not a real benchmark score and should not be interpreted as model quality.
For proper evaluation, use German instruction-following and reasoning benchmarks.


Limitations

  • May hallucinate facts
  • Reasoning is not guaranteed correct
  • Performance near 16k context depends on prompt structure
  • Improvements mainly apply to German (other languages may not improve)
  • Not aligned for safety-critical deployments

Bias & Safety

This model inherits:

  • biases from the base model
  • biases from training data

Recommended mitigations:

  • add moderation layer
  • add safety prompts
  • human review for sensitive applications

License

This is a derivative model of:

Qwen/Qwen3-4B-Instruct-2507

Therefore the base model license and usage restrictions apply in addition to any dataset terms.

Please verify compatibility before commercial use.


Reproducibility (Training Arguments)

--model_name Qwen/Qwen3-4B-Instruct-2507
--tokenizer_name Qwen/Qwen3-4B-Instruct-2507
--dataset_path DGurgurov/Nemotron-Multilingual-Reasoning
--skip_prepare_dataset False
--lang_split de
--prepare_messages True
--completion_only_loss True
--max_length 16384
--dataset_num_proc 16
--packing True
--use_liger_kernel True
--bf16 True
--log_token_accuracy True
--optim adamw_torch_fused
--gradient_checkpointing True
--per_device_train_batch_size 4
--gradient_accumulation_steps 4
--ddp_find_unused_parameters False
--lr_scheduler_type cosine_with_min_lr
--lr_scheduler_kwargs '{"min_lr": 5.0e-6}'
--warmup_ratio 0.05
--weight_decay 0.05
--report_to wandb
--run_name qwen3_4b_instruct_lns_de_3_epochs
--num_train_epochs 3
--save_strategy steps
--logging_steps 5
--save_steps 900
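
For readers reproducing this in code rather than via CLI flags, the arguments above map roughly onto a TRL-style configuration object. This is an assumption about the training framework (the card does not name it), and field names follow recent TRL releases (older versions use e.g. max_seq_length instead of max_length):

```python
# Hypothetical in-code equivalent of the CLI arguments above.
# Assumes a recent trl release; not the card author's actual script.
from trl import SFTConfig

config = SFTConfig(
    output_dir="qwen3_4b_instruct_de_sft",   # illustrative path
    max_length=16384,
    packing=True,
    completion_only_loss=True,
    dataset_num_proc=16,
    bf16=True,
    use_liger_kernel=True,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 5.0e-6},
    warmup_ratio=0.05,
    weight_decay=0.05,
    num_train_epochs=3,
    save_strategy="steps",
    save_steps=900,
    logging_steps=5,
    report_to="wandb",
    run_name="qwen3_4b_instruct_lns_de_3_epochs",
)
```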

Citation

If you use this model, please cite:

  • Qwen/Qwen3-4B-Instruct-2507
  • DGurgurov/Nemotron-Multilingual-Reasoning

Acknowledgements

  • Qwen Team — base model
  • Nemotron Multilingual Reasoning dataset authors
  • Hugging Face Transformers ecosystem