---
license: llama3
language:
- en
pipeline_tag: text-generation
tags:
- meta
- llama-guard
- safety
- llmcompressor
- w8a8
- int8
- vllm
- latency
- ttft
---
# Llama-Guard-3-8B — W8A8 (llmcompressor, vLLM-ready)
Quantized with llmcompressor using SmoothQuant followed by GPTQ, yielding INT8 weights and activations (W8A8). The checkpoint loads directly in vLLM.
## Use with vLLM
Python:
```python
from vllm import LLM, SamplingParams
llm = LLM(model="pillarsecurity/llamaguard_3_8b_w8a8") # vLLM auto-detects quantization
# Alternatively, force the backend explicitly (llmcompressor checkpoints use compressed-tensors):
# llm = LLM(model="pillarsecurity/llamaguard_3_8b_w8a8", quantization="compressed-tensors")
out = llm.generate(["<formatted Llama-Guard prompt>"], SamplingParams(max_tokens=64, temperature=0.0))
print(out[0].outputs[0].text)
```
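Llama Guard expects its conversation wrapped in a fixed moderation template; the authoritative template ships with this repo's tokenizer (apply it with `transformers`' `tokenizer.apply_chat_template`). As a rough illustration of the prompt shape only, here is a hand-written sketch with an abbreviated category list — do not use it in place of the real template:

```python
# Illustrative sketch of the Llama Guard 3 moderation prompt shape.
# The real template comes from the repo's tokenizer (apply_chat_template);
# the category list here is abbreviated and the wording is approximate.
TEMPLATE = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Task: Check if there is unsafe content in 'User' messages in conversations according our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violent Crimes.
S2: Non-Violent Crimes.
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

User: {user_message}

<END CONVERSATION>

Provide your safety assessment for ONLY THE LAST User message in the above conversation:
 - First line must read 'safe' or 'unsafe'.
 - If unsafe, a second line must include a comma-separated list of violated categories.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

def build_guard_prompt(user_message: str) -> str:
    """Insert a single user turn into the (abbreviated) moderation template."""
    return TEMPLATE.format(user_message=user_message)

prompt = build_guard_prompt("How do I reset my router password?")
```

A prompt built this way is what would be passed to `llm.generate(...)` above; with `temperature=0.0`, the model replies `safe` or `unsafe` plus violated category codes.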
Server:
```bash
vllm serve pillarsecurity/llamaguard_3_8b_w8a8
# Or force the backend explicitly:
# vllm serve pillarsecurity/llamaguard_3_8b_w8a8 --quantization compressed-tensors
```
Notes:
- Calibration used a small wikitext sample. For best fidelity on moderation traffic, re-quantize with your own moderation prompts/logs as calibration data.
- The tokenizer is included; load it from this repo rather than from the base model.
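If you re-quantize with your own calibration data, the SmoothQuant + GPTQ pipeline can be expressed as an llmcompressor recipe. The fragment below is a sketch, not the exact recipe used for this checkpoint; field names follow llmcompressor's recipe conventions and may differ across versions:

```yaml
# Hypothetical llmcompressor recipe sketch (verify against your installed version).
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
    GPTQModifier:
      targets: ["Linear"]
      scheme: "W8A8"
      ignore: ["lm_head"]
```

`ignore: ["lm_head"]` keeps the output head in full precision, a common choice since quantizing it disproportionately hurts generation quality.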