How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="NotHereNorThere/Qwemini-0.5b-alpha",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Qwemini-0.5B-Alpha

Qwen2.5-0.5B fine-tuned on 250 Gemini 3 Pro chain-of-thought responses.

QLoRA fine-tune (~6 minutes on an RTX 4060) to transfer Gemini 3 Pro's structured step-by-step reasoning style into a 0.5B parameter model. This wasn't really a serious attempt to make a micro scale reasoner, just a test for a potential semi-serious finetune series.

What it learned

Qwemini spontaneously produces structured CoT (however, missing the <think'> tags) responses with no system prompt or CoT trigger. It sets up equations, labels steps, and verifies answers in the style of its teacher model.

Downloads last month
55
GGUF
Model size
0.5B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

5-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NotHereNorThere/Qwemini-0.5b-alpha

Quantized
(220)
this model

Collection including NotHereNorThere/Qwemini-0.5b-alpha