# Hermes-3-Llama-3.2-3B Q4_0 GGUF

A quantized GGUF version of [NousResearch/Hermes-3-Llama-3.2-3B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B).
## Model Details
- Base Model: NousResearch/Hermes-3-Llama-3.2-3B
- Quantization: Q4_0 (4-bit)
- Format: GGUF
- Size: 1.79 GB
- Use Case: Efficient inference with llama.cpp
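The file size follows from the Q4_0 layout: weights are packed in blocks of 32, each block holding 32 four-bit values plus one fp16 scale, i.e. 18 bytes per block (4.5 bits per weight). A rough sanity check, assuming a ~3.21 B parameter count (an approximation; the exact count is on the base model card):

```python
# Rough size estimate for a Q4_0 GGUF file.
# Q4_0 block: 32 weights -> 16 bytes of packed 4-bit values + 2-byte fp16 scale = 18 bytes.
params = 3.21e9             # approximate parameter count (assumption)
bytes_per_weight = 18 / 32  # 0.5625 bytes, i.e. 4.5 bits per weight
size_gb = params * bytes_per_weight / 1e9
print(f"{size_gb:.2f} GB")  # ~1.81 GB, close to the 1.79 GB file size above
```

The small gap versus the actual file comes from non-quantized tensors (e.g. norms) and GGUF metadata.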
## Usage

### With llama.cpp

```bash
# Download the model file
huggingface-cli download gameversellc/hermes_3_tests_gv Hermes-3-Llama-3.2-3B_q4_0.gguf

# Run inference: -m model path, -p prompt, -n number of tokens to generate
./llama-cli -m Hermes-3-Llama-3.2-3B_q4_0.gguf -p "Your prompt here" -n 100
```
### With llama-cpp-python

```python
from llama_cpp import Llama

# Load the quantized model
llm = Llama(model_path="Hermes-3-Llama-3.2-3B_q4_0.gguf")

# Generate up to 100 tokens of completion
output = llm("Your prompt here", max_tokens=100)
print(output["choices"][0]["text"])
```
## Performance

- MMLU accuracy: ~40% (quantized; expect some degradation vs. the fp16 base model)
- Inference speed: fast enough for interactive use on CPU
- Memory usage: ~2 GB RAM
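The ~2 GB figure is roughly the model file plus the KV cache. A back-of-the-envelope estimate, assuming the commonly reported Llama-3.2-3B shape (28 layers, 8 KV heads, head dim 128; verify against the base model's config) and a 2048-token context:

```python
# Rough RAM estimate: quantized weights + fp16 KV cache.
model_gb = 1.79                           # Q4_0 file size from above
layers, kv_heads, head_dim = 28, 8, 128   # assumed Llama-3.2-3B config
ctx = 2048                                # context length in tokens
# Per token: 2 (K and V) * layers * kv_heads * head_dim * 2 bytes (fp16)
kv_bytes = 2 * layers * kv_heads * head_dim * 2 * ctx
total_gb = model_gb + kv_bytes / 1e9
print(f"KV cache: {kv_bytes / 1e9:.2f} GB, total: {total_gb:.2f} GB")
# -> KV cache: 0.23 GB, total: 2.02 GB
```

Longer contexts grow the KV cache linearly, so memory use rises with `ctx`.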
## License

Apache 2.0 (same as the base model)