Fyodor SmolLM3-3B v2 Instruct

Fine-tuned SmolLM3-3B with enhanced general knowledge, coding, math, tool calling, reasoning, and instruction-following capabilities.

Model Details

  • Base Model: HuggingFaceTB/SmolLM3-3B
  • Model Type: Causal Language Model (3B parameters)
  • Language(s): English (natural language); Python and other programming languages (code)
  • License: Apache 2.0
  • Training Method: LoRA fine-tuning with mixed precision (bfloat16)
  • Model Size: ~3B parameters
  • Dtype: bfloat16

Training Details

Training Strategy

This model was trained using LoRA (Low-Rank Adaptation) fine-tuning with the following configuration; a sketch of how these settings map to training code follows the list:

  • Training Strategy: smollm3_3b_lora_hard_merge
  • Final Training Loss: 0.3240
  • Number of Epochs: 3
  • Learning Rate: 2e-4
  • Batch Size: 8
  • Gradient Accumulation Steps: 8 (effective batch size: 64)
  • Max Sequence Length: 1024 tokens
  • Warmup Steps: 100
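
The training script is not included with this card; as a rough sketch, the hyperparameters above map onto transformers.TrainingArguments roughly as follows (the output directory, logging, and saving settings are illustrative assumptions):

from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters listed above.
# output_dir and logging/save settings are placeholders, not the
# actual values used for this model.
training_args = TrainingArguments(
    output_dir="fyodor-smollm3-lora",    # hypothetical path
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,       # effective batch size 64
    warmup_steps=100,
    bf16=True,                           # mixed precision (bfloat16)
    logging_steps=10,
    save_strategy="epoch",
)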

LoRA Configuration

lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
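
The adapter was folded back into the base weights ("hard merge"). A minimal sketch of this configuration and merge using the peft library, assuming a standard LoRA workflow (paths are hypothetical):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
# ... fine-tune as usual ...

# "Hard merge": fold the adapter weights into the base model so the
# result can be loaded without peft.
merged = model.merge_and_unload()
merged.save_pretrained("fyodor-smollm3-merged")  # hypothetical path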

Training Data Distribution

The model was trained on a carefully balanced mix of high-quality datasets; a sketch of one way to approximate this mix appears after the list:

  • 30% General Knowledge: MuskumPillerum/General-Knowledge, HuggingFaceH4/ultrachat_200k, teknium/OpenHermes-2.5, cognitivecomputations/dolphin
  • 20% Coding: bigcode/starcoderdata (Python), sahil2801/CodeAlpaca-20k, iamtarun/python_code_instructions_18k_alpaca
  • 20% Tool Calling: Salesforce/xlam-function-calling-60k, glaiveai/glaive-function-calling-v2, NousResearch/hermes-function-calling-v1
  • 10% Math: meta-math/MetaMathQA, openai/gsm8k
  • 10% Advanced Reasoning: Open-Orca/OpenOrca
  • 10% Instruction Following: tatsu-lab/alpaca, HuggingFaceH4/ultrachat_200k
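
The exact data pipeline is not published. As a rough illustration, a mix with these proportions could be approximated with the datasets library; the dataset names come from the list above, but representing each bucket by a single dataset and the sampling code itself are assumptions:

from datasets import load_dataset, interleave_datasets

# One representative dataset per bucket, for brevity.
general   = load_dataset("teknium/OpenHermes-2.5", split="train")
coding    = load_dataset("sahil2801/CodeAlpaca-20k", split="train")
tools     = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
math_ds   = load_dataset("meta-math/MetaMathQA", split="train")
reasoning = load_dataset("Open-Orca/OpenOrca", split="train")
instruct  = load_dataset("tatsu-lab/alpaca", split="train")

# Probabilities follow the distribution listed above.
mixed = interleave_datasets(
    [general, coding, tools, math_ds, reasoning, instruct],
    probabilities=[0.30, 0.20, 0.20, 0.10, 0.10, 0.10],
    seed=42,
)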

Usage

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kiy-K/Fyodor-Mini-3B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("Kiy-K/Fyodor-Mini-3B")

# Generate text
prompt = """### Instruction:
Write a Python function to calculate Fibonacci numbers using dynamic programming.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Code Generation Example

prompt = """### Instruction:
Create a Python class for a binary search tree with insert and search methods.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.2, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Tool Calling Example

prompt = """You have access to the following functions:

[
  {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "location": {"type": "string", "description": "City name"}
    }
  }
]

User: What's the weather in Paris?
Assistant:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
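
The exact function-call output format depends on the training data. Assuming the model emits a JSON object such as {"name": "get_weather", "arguments": {"location": "Paris"}}, a minimal parsing sketch looks like this:

import json
import re

completion = tokenizer.decode(outputs[0], skip_special_tokens=True)
generated = completion[len(prompt):]  # keep only the newly generated part (approximate)

# Assumption: the call appears as the first JSON object in the output.
match = re.search(r"\{.*\}", generated, re.DOTALL)
if match:
    call = json.loads(match.group(0))
    print(call["name"], call.get("arguments", {}))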

Math Problem Solving

prompt = """Question: A train travels 120 km in 2 hours. What is its average speed in km/h?
Answer:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Capabilities

This model excels at:

  • General Knowledge: Answering questions across various domains
  • Code Generation: Writing Python, JavaScript, and other programming languages
  • Mathematical Reasoning: Solving arithmetic and word problems
  • Tool/Function Calling: Understanding and generating function calls
  • Chain-of-Thought Reasoning: Step-by-step problem solving
  • Instruction Following: Understanding and executing complex instructions

Recommended Generation Parameters

For best results, use these generation settings based on your use case; a small helper that bundles them as presets follows:

Code Generation

temperature=0.2
top_p=0.95
max_new_tokens=512
do_sample=True

Creative Writing

temperature=0.8
top_p=0.95
max_new_tokens=1024
do_sample=True

Mathematical Reasoning

temperature=0.1
top_p=0.9
max_new_tokens=512
do_sample=True

General Q&A

temperature=0.7
top_p=0.95
max_new_tokens=512
do_sample=True
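
For convenience, these settings can be wrapped in a small helper; the function and preset names below are illustrative, not part of the model, and reuse the model and tokenizer loaded in Basic Usage:

# Illustrative helper: pick a preset name and generate with it.
PRESETS = {
    "code":     dict(temperature=0.2, top_p=0.95, max_new_tokens=512),
    "creative": dict(temperature=0.8, top_p=0.95, max_new_tokens=1024),
    "math":     dict(temperature=0.1, top_p=0.9,  max_new_tokens=512),
    "qa":       dict(temperature=0.7, top_p=0.95, max_new_tokens=512),
}

def generate(prompt, preset="qa"):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        **PRESETS[preset],
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generate("### Instruction:\nExplain recursion briefly.\n\n### Response:\n", preset="qa"))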

Limitations

  • Trained with a maximum sequence length of 1024 tokens, well below the base model's native context window, so quality may degrade on very long prompts
  • May occasionally generate incorrect information or code
  • Not specifically optimized for languages other than English
  • Should not be used for medical, legal, or other professional advice without expert review
  • Generated code should always be reviewed and tested before production use
  • May exhibit biases present in the training data

Ethical Considerations

  • This model can generate code that may contain security vulnerabilities; always review it before deployment
  • The model should not be used to generate malicious code or harmful content
  • Be aware of potential biases inherited from training data
  • Not suitable for making critical decisions without human oversight
  • Users are responsible for ensuring appropriate use of generated content

Performance Benchmarks

Training metrics:

  • Final Validation Loss: 0.3240
  • Training Strategy: Hard LoRA merge
  • Perplexity: ~1.38 (estimated from the final loss; see the check below)
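
The perplexity figure is simply the exponential of the cross-entropy loss, which can be checked in one line:

import math
print(math.exp(0.3240))  # ≈ 1.38, the perplexity implied by the final loss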

Model Card Contact

For questions, feedback, or issues, please open a discussion on the model's Hugging Face repository.

Citation

If you use this model in your research or applications, please cite:

@misc{fyodor-mini-2025,
  author = {Khoi},
  title = {Fyodor SmolLM3-3B v2 Instruct},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kiy-K/Fyodor-Mini-3B}
}

Acknowledgments

  • Base model by HuggingFace
  • Built on SmolLM3-3B
  • Training data from various open-source datasets (see Training Details)
  • Trained using PyTorch and Transformers library
  • GGUF conversions and local-hosting support by Team Mradermacher

This model was trained with care and attention to quality. Always verify outputs for your specific use case.
