Draconis-Qwen3_Math-4B-Preview
Draconis-Qwen3_Math-4B-Preview is a fine-tune of Qwen3-4B, optimized for mathematical reasoning, logical problem solving, and structured content generation. This preview model focuses on precision, step-by-step reasoning, and efficient inference, which suits educational and technical applications where reliability and a compact footprint matter.
GGUF [Q4_K_M] : https://huggingface.co/prithivMLmods/Draconis-Qwen3_Math-4B-Preview-Q4_K_M-GGUF
GGUF [Q5_K_M] : https://huggingface.co/prithivMLmods/Draconis-Qwen3_Math-4B-Preview-Q5_K_M-GGUF
Key Features
- Mathematical and Logical Reasoning: Fine-tuned to solve symbolic logic, arithmetic, and multi-step mathematical problems, making it well suited to STEM learning, competitions, and educational use.
- Compact Code Understanding: Writes and interprets code in Python, JavaScript, and other languages, suited to lightweight coding tasks and algorithmic explanations.
- Factual Precision: Trained on high-quality, curated data with reasoning benchmarks to reduce hallucinations and improve correctness in technical outputs.
- Instruction-Tuned: Adheres closely to instructions, which suits structured queries, step-by-step problem solving, and formatted outputs (Markdown, JSON, tables).
- Multilingual Support: Understands and responds in over 20 languages, useful for multilingual education and technical translation.
- Efficient Performance: Based on the 4B-parameter variant of Qwen3 and optimized for resource-constrained environments without compromising core reasoning capability.
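To make the resource claim more concrete, the weights-only memory footprint can be estimated from the parameter count and the number of bits stored per weight. The sketch below assumes a round 4 billion parameters and nominal bit widths (16 bits for half precision, 5 and 4 bits for quantized builds); these figures are assumptions for illustration, not measurements. Real quantized files carry extra scale metadata, and inference also needs memory for activations and the KV cache, so treat the results as rough lower bounds.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Weights-only memory estimate in GiB (ignores KV cache and activations)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed values for illustration: ~4e9 parameters and nominal bit widths.
NUM_PARAMS = 4e9
for label, bits in [("16-bit", 16), ("5-bit", 5), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```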
Quickstart with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Draconis-Qwen3_Math-4B-Preview"

# Load the model and tokenizer. device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve the equation: 3x + 7 = 22. Show all steps."
messages = [
    {"role": "system", "content": "You are a step-by-step math tutor."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
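Qwen3 models can emit a reasoning trace wrapped in `<think>...</think>` before the final answer. Assuming this fine-tune keeps that behavior (the card does not say so explicitly), a small helper such as the hypothetical `split_thinking` below could separate the trace from the answer in the decoded `response` string.

```python
def split_thinking(response: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded text into (reasoning, answer).

    Returns an empty reasoning string when no closing tag is present.
    """
    head, sep, tail = response.rpartition(close_tag)
    if not sep:
        return "", response.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Illustrative input, not real model output.
sample = "<think>Subtract 7, then divide by 3.</think>\n\nx = 5"
reasoning, answer = split_thinking(sample)
print(answer)
```

If generation stops at `max_new_tokens` before the trace is closed, no closing tag appears and the helper returns the whole text as the answer, so a larger token budget may be needed for long derivations.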
Intended Use
- Solving math and logic problems
- Code assistance and basic debugging
- Education-focused applications (STEM tutoring)
- Structured content generation (e.g., JSON, Markdown)
- Multilingual reasoning and translation
- Lightweight deployment for reasoning tasks
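For the structured-output use case, it is usually worth validating what the model returns before passing it downstream. A minimal sketch, assuming the model was asked for a JSON object and may wrap it in a Markdown code fence; `parse_json_reply` is a hypothetical helper, not part of any library.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence, built programmatically for readability


def parse_json_reply(reply: str) -> dict:
    """Parse a JSON object from a model reply, tolerating a Markdown fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    payload = match.group(1) if match else reply
    data = json.loads(payload)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


# Illustrative reply, not real model output.
reply = f'{FENCE}json\n{{"x": 5, "steps": ["3x = 15", "x = 5"]}}\n{FENCE}'
print(parse_json_reply(reply)["x"])
```

Failing loudly on malformed output makes it straightforward to retry the request or fall back to a default.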
Limitations
- Limited creativity in open-ended or fictional content
- May struggle with ambiguous or multi-intent prompts
- Smaller context window compared to 14B+ variants
- Still subject to factual errors in edge cases or adversarial queries
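Since factual errors remain possible, a numeric answer can be checked independently by substituting it back into the original equation. The sketch below does this for a linear equation such as the quickstart example 3x + 7 = 22, using exact rational arithmetic; `check_linear_solution` is a hypothetical helper for illustration.

```python
from fractions import Fraction


def check_linear_solution(a, b, c, x) -> bool:
    """Return True if x satisfies a*x + b == c, using exact arithmetic."""
    a, b, c, x = (Fraction(v) for v in (a, b, c, x))
    return a * x + b == c


# 3x + 7 = 22 has the solution x = 5.
print(check_linear_solution(3, 7, 22, 5))  # True
print(check_linear_solution(3, 7, 22, 4))  # False
```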
References
- [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models] : https://arxiv.org/pdf/2504.16891
- [YaRN: Efficient Context Window Extension of Large Language Models] : https://arxiv.org/pdf/2309.00071
