output = llm(
    "Once upon a time,",
    max_tokens=512,  # upper limit on the number of generated tokens
    echo=True,       # include the prompt in the returned text
)
print(output)
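`print(output)` prints the whole completion response, not just the generated text. In llama-cpp-python, a completion call returns an OpenAI-style dictionary whose generated text sits under `choices[0]["text"]`. The minimal sketch below pulls that text out. The helper name `completion_text` and the `sample` dictionary are hypothetical stand-ins that mimic the response layout, so the example runs without downloading a model.

```python
from typing import Any, Dict


def completion_text(output: Dict[str, Any]) -> str:
    """Return the generated text of the first choice in a completion response."""
    choices = output.get("choices") or []
    if not choices:
        raise ValueError("completion response has no choices")
    return choices[0]["text"]


# Hypothetical stand-in response with the same layout as a real completion,
# used here instead of calling the model.
sample = {
    "object": "text_completion",
    "choices": [
        {
            "text": "Once upon a time, there was a router.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
}

print(completion_text(sample))
```

With a real model, you would pass the `output` returned by `llm(...)` to `completion_text` in place of `sample`.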
Arch-Router-1.5B
Arch-Router-1.5B introduces a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing). This offers a practical mechanism for encoding preferences in routing decisions. At its core is Arch-Router, a compact 1.5B-parameter model that learns to map queries to domain-action preferences, which then drive model routing decisions. Experiments on conversational datasets show that this approach achieves state-of-the-art (SOTA) results in matching queries with human preferences, outperforming top proprietary models.
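Once a router has assigned a query to a domain or action, the application still has to resolve that label to the model the user prefers for it. The minimal sketch below shows only that resolution step, as a lookup table with a fallback. It assumes the router's output has already been reduced to a single label. The `RoutePolicy` type, the policy names, the model identifiers, and the default model are all hypothetical. The sketch does not reproduce Arch-Router's actual prompt format or output schema.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RoutePolicy:
    name: str         # hypothetical route label, e.g. "travel"
    description: str  # natural-language description of the domain or action
    model: str        # hypothetical identifier of the preferred target model


# Hypothetical user-defined preferences: each domain or action maps to a model.
POLICIES = [
    RoutePolicy("travel", "questions about trips, flights, and hotels", "model-a"),
    RoutePolicy("image_editing", "requests to modify or edit an image", "model-b"),
]
DEFAULT_MODEL = "model-default"  # hypothetical fallback when no policy matches


def select_model(route_label: str,
                 policies: Sequence[RoutePolicy] = POLICIES,
                 default: str = DEFAULT_MODEL) -> str:
    """Return the model bound to the predicted route label, or the default."""
    table = {p.name: p.model for p in policies}
    return table.get(route_label, default)


print(select_model("travel"))
print(select_model("unknown"))
```

Keeping the preferences in a plain table means they can be changed without retraining anything. Only the label prediction depends on the router model.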
Model Files
| File Name | Size | Type | Description |
|---|---|---|---|
| Arch-Router-1.5B.Q2_K.gguf | 676 MB | Model | Q2_K quantized model (smallest) |
| Arch-Router-1.5B.Q3_K_S.gguf | 761 MB | Model | Q3_K_S quantized model |
| Arch-Router-1.5B.Q3_K_M.gguf | 824 MB | Model | Q3_K_M quantized model |
| Arch-Router-1.5B.Q3_K_L.gguf | 880 MB | Model | Q3_K_L quantized model |
| Arch-Router-1.5B.Q4_K_S.gguf | 940 MB | Model | Q4_K_S quantized model |
| Arch-Router-1.5B.Q4_K_M.gguf | 986 MB | Model | Q4_K_M quantized model |
| Arch-Router-1.5B.Q5_K_S.gguf | 1.1 GB | Model | Q5_K_S quantized model |
| Arch-Router-1.5B.Q5_K_M.gguf | 1.13 GB | Model | Q5_K_M quantized model |
| Arch-Router-1.5B.Q6_K.gguf | 1.27 GB | Model | Q6_K quantized model |
| Arch-Router-1.5B.Q8_0.gguf | 1.65 GB | Model | Q8_0 quantized model |
| Arch-Router-1.5B.BF16.gguf | 3.09 GB | Model | BF16 precision model |
| Arch-Router-1.5B.F16.gguf | 3.09 GB | Model | F16 precision model |
| Arch-Router-1.5B.F32.gguf | 6.18 GB | Model | F32 full precision model (largest) |
| .gitattributes | 2.49 kB | Config | Git LFS configuration |
| config.json | 31 Bytes | Config | Model configuration |
| README.md | 173 Bytes | Documentation | Repository documentation |
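When choosing a file from the table above, a common first step is to pick the largest quant that fits the available memory. The minimal sketch below does this using the file sizes listed above, converted to megabytes (1 GB treated as 1000 MB). The `largest_fitting` helper and its `headroom_mb` parameter are hypothetical. The headroom is a user-chosen allowance for runtime overhead such as the KV cache, because file size alone is only a rough lower bound on the memory needed to run a model.

```python
from typing import Dict, Optional

# File sizes from the Model Files table, in MB (1 GB treated as 1000 MB).
QUANT_SIZES_MB: Dict[str, int] = {
    "Arch-Router-1.5B.Q2_K.gguf": 676,
    "Arch-Router-1.5B.Q3_K_S.gguf": 761,
    "Arch-Router-1.5B.Q3_K_M.gguf": 824,
    "Arch-Router-1.5B.Q3_K_L.gguf": 880,
    "Arch-Router-1.5B.Q4_K_S.gguf": 940,
    "Arch-Router-1.5B.Q4_K_M.gguf": 986,
    "Arch-Router-1.5B.Q5_K_S.gguf": 1100,
    "Arch-Router-1.5B.Q5_K_M.gguf": 1130,
    "Arch-Router-1.5B.Q6_K.gguf": 1270,
    "Arch-Router-1.5B.Q8_0.gguf": 1650,
    "Arch-Router-1.5B.BF16.gguf": 3090,
    "Arch-Router-1.5B.F16.gguf": 3090,
    "Arch-Router-1.5B.F32.gguf": 6180,
}


def largest_fitting(budget_mb: int, headroom_mb: int = 0,
                    sizes: Dict[str, int] = QUANT_SIZES_MB) -> Optional[str]:
    """Return the largest file whose size plus headroom fits the budget, or None."""
    candidates = [(size, name) for name, size in sizes.items()
                  if size + headroom_mb <= budget_mb]
    if not candidates:
        return None
    # max() compares sizes first; on equal sizes it falls back to the file name.
    return max(candidates)[1]


print(largest_fitting(1000))
print(largest_fitting(1200, headroom_mb=100))
```

As the note below says, a larger file is not necessarily better quality for every use, so treat the result as a starting point rather than a recommendation.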
Quants Usage
(Sorted by size, not necessarily by quality. IQ quants are often preferable to similarly sized non-IQ quants.)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
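# !pip install llama-cpp-python
from llama_cpp import Llama

# Download a GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="prithivMLmods/Arch-Router-1.5B-GGUF",
    filename="",  # set to one of the .gguf files listed under Model Files
)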