The snippets below show how to use prithivMLmods/Draconis-Qwen3_Math-4B-Preview with Python libraries, inference servers, and local apps.

Libraries

How to use prithivMLmods/Draconis-Qwen3_Math-4B-Preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Draconis-Qwen3_Math-4B-Preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
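outputs = pipe(messages, max_new_tokens=512)
# For chat-style input, generated_text holds the conversation; the last message is the model's reply
print(outputs[0]["generated_text"][-1]["content"])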

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Draconis-Qwen3_Math-4B-Preview")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Draconis-Qwen3_Math-4B-Preview", torch_dtype="auto", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

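# Step-by-step math answers need far more than 40 new tokens
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens and drop end-of-turn markers
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))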


vLLM

How to use prithivMLmods/Draconis-Qwen3_Math-4B-Preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Draconis-Qwen3_Math-4B-Preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Draconis-Qwen3_Math-4B-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


SGLang

How to use prithivMLmods/Draconis-Qwen3_Math-4B-Preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Draconis-Qwen3_Math-4B-Preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Draconis-Qwen3_Math-4B-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Draconis-Qwen3_Math-4B-Preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Draconis-Qwen3_Math-4B-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
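Both servers above expose the same OpenAI-compatible `/v1/chat/completions` route, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library. It assumes the default ports used in the commands above (8000 for vLLM, 30000 for SGLang), and the helper names `build_chat_payload`, `extract_reply`, and `chat` are hypothetical, not part of either project.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Draconis-Qwen3_Math-4B-Preview"


def build_chat_payload(prompt: str, model: str = MODEL_ID, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat completion request body (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST a prompt to a running server and return the reply text.

    Assumes a vLLM server on port 8000 by default; pass
    base_url="http://localhost:30000" for the SGLang server.
    """
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Long step-by-step answers can take a while, so allow a generous timeout
    with urllib.request.urlopen(request, timeout=300) as resp:
        return extract_reply(json.load(resp))
```

With a server running, `chat("Solve 3x + 7 = 22.")` queries vLLM and `chat("Solve 3x + 7 = 22.", base_url="http://localhost:30000")` queries SGLang. Keeping payload construction and response parsing in separate functions lets them be tested without a live server.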

Docker Model Runner
How to use prithivMLmods/Draconis-Qwen3_Math-4B-Preview with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Draconis-Qwen3_Math-4B-Preview
```

Draconis-Qwen3_Math-4B-Preview

Draconis-Qwen3_Math-4B-Preview is fine-tuned from Qwen3-4B and optimized for mathematical reasoning, logical problem solving, and structured content generation. This preview model focuses on precision, step-by-step reasoning, and efficient inference, which suits it to educational and technical applications where reliability and a compact footprint are essential.

GGUF [Q4_K_M] : https://huggingface.co/prithivMLmods/Draconis-Qwen3_Math-4B-Preview-Q4_K_M-GGUF

GGUF [Q5_K_M] : https://huggingface.co/prithivMLmods/Draconis-Qwen3_Math-4B-Preview-Q5_K_M-GGUF

Key Features

Mathematical and Logical Reasoning: Fine-tuned to solve symbolic logic, arithmetic, and multi-step mathematical problems, making it well suited to STEM learning, competitions, and educational use.
Compact Code Understanding: Writes and interprets code in Python, JavaScript, and other languages; suitable for lightweight coding tasks and algorithmic explanations.
Factual Precision: Trained on high-quality, curated reasoning data to reduce hallucinations and improve the correctness of technical outputs.
Instruction-Tuned: Follows instructions closely, which suits structured queries, step-by-step problem solving, and formatted outputs (Markdown, JSON, tables).
Multilingual Support: Understands and responds in over 20 languages, useful for multilingual education and technical translation.
Efficient Performance: Based on the 4B-parameter variant of Qwen3 and optimized for resource-constrained environments while retaining its core reasoning capability.
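Because the model is pitched at formatted outputs such as JSON, downstream code usually has to pull the JSON out of a free-form reply. The sketch below assumes the model may either wrap the JSON in a fenced code block or surround it with prose; `extract_json` is a hypothetical helper, not part of any library.

```python
import json
import re

# Built at runtime so this snippet can itself sit inside a Markdown code fence
FENCE = "`" * 3
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str):
    """Parse the first JSON value found in a model response.

    Tries a fenced code block first, then scans for the first '{' or '['
    from which json.JSONDecoder.raw_decode can parse a complete value.
    """
    match = FENCED_JSON.search(text)
    if match:
        return json.loads(match.group(1))
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                # Not valid JSON at this position; keep scanning
                continue
    raise ValueError("no JSON value found in response")
```

Scanning with `raw_decode` rather than a greedy regex means trailing prose after the JSON is ignored, and a stray brace earlier in the text does not stop a later valid object from being found.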

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Draconis-Qwen3_Math-4B-Preview"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve the equation: 3x + 7 = 22. Show all steps."

messages = [
    {"role": "system", "content": "You are a step-by-step math tutor."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
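Qwen3-based models typically emit a reasoning trace between `<think>` and `</think>` before the final answer. Assuming this fine-tune keeps that format and the tags appear in the decoded `response`, a small helper can separate the two. `split_reasoning` is a hypothetical name; if no closing tag is found, the whole response is treated as the answer, and if only the closing tag is present (for example, when a chat template pre-fills `<think>`), everything before it counts as reasoning.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a Qwen3-style response into (reasoning, final_answer).

    Assumes any reasoning trace ends with </think>; an opening <think>
    tag is optional. Without </think>, the reasoning part is empty.
    """
    open_tag, close_tag = "<think>", "</think>"
    end = response.find(close_tag)
    if end == -1:
        return "", response.strip()
    start = response.find(open_tag)
    # Skip past <think> only if it actually precedes </think>
    reasoning_start = start + len(open_tag) if -1 < start < end else 0
    reasoning = response[reasoning_start:end].strip()
    answer = response[end + len(close_tag):].strip()
    return reasoning, answer
```

For the quickstart prompt, `reasoning, answer = split_reasoning(response)` would leave the working (3x = 22 - 7 = 15, so x = 5) in `reasoning` and the final statement in `answer`, which is convenient when a tutoring app shows the answer first and reveals the steps on request.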

Intended Use

Solving math and logic problems
Code assistance and basic debugging
Education-focused applications (STEM tutoring)
Structured content generation (e.g., JSON, Markdown)
Multilingual reasoning and translation
Lightweight deployment for reasoning tasks

Limitations

Limited creativity in open-ended or fictional content
May struggle with ambiguous or multi-intent prompts
Smaller context window compared to 14B+ variants
Still subject to factual errors in edge cases or adversarial queries
