How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Liontix/ruby-9b-GGUF",
	filename="Ruby-Q4_K_M.gguf",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Warning

This model is experimental and not meant for production or serious use.

Opus Agent x Gemini 3 Flash Preview

This model was trained using SFT (Supervised Fine Tuning) on Gemini 3 Flash Preview responses, but this time I masked the CoT (Chain of Thought/Thinking Part) when training. In the future I plan to release more models like these, the catch would be using a fine tuned base model that was SFT with raw/synthetic CoT (not reasoning summaries like before), and then training on a lot more entries without touching the CoT.

For generating synthetic CoTs I might use Glint-Research/Glint-Trace, seems like a promising model and a step up from just using reasoning summaries.

Thanks to

Metadata

  • Developed by: Liontix
  • License: apache-2.0
  • Finetuned from model : armand0e/Qwen3.5-9B-Opus-Agent
Downloads last month
121
GGUF
Model size
9B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Liontix/ruby-9b-GGUF

Finetuned
Qwen/Qwen3.5-9B
Finetuned
Liontix/ruby-9b
Quantized
(1)
this model

Dataset used to train Liontix/ruby-9b-GGUF