---
license: llama3
language:
- en
pipeline_tag: text-generation
tags:
- meta
- llama-guard
- safety
- llmcompressor
- w8a8
- int8
- vllm
- latency
- ttft
---

# Llama-Guard-3-8B — W8A8 (llmcompressor, vLLM-ready)

This repository contains Llama-Guard-3-8B quantized to 8-bit weights and 8-bit activations (W8A8) with [llmcompressor](https://github.com/vllm-project/llm-compressor), using SmoothQuant to smooth activation outliers followed by GPTQ for weight quantization. The checkpoint is saved in compressed-tensors format, which vLLM loads directly.
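The recipe below is a reconstruction of the kind of llmcompressor configuration that produces such a checkpoint, not the exact one used here; the smoothing strength and ignore list in particular are assumptions.

```yaml
# Hypothetical recipe.yaml sketch for SmoothQuant + GPTQ W8A8.
# Exact values for this checkpoint are not published; treat as placeholders.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8   # assumed; shifts quantization difficulty from activations to weights
    GPTQModifier:
      targets: ["Linear"]
      scheme: W8A8              # int8 weights + int8 activations
      ignore: ["lm_head"]       # output head commonly left in higher precision
```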

## Use with vLLM

Python:
```python
from vllm import LLM, SamplingParams
llm = LLM(model="pillarsecurity/llamaguard_3_8b_w8a8")  # vLLM auto-detects quantization
# Alternatively, force the method explicitly:
# llm = LLM(model="pillarsecurity/llamaguard_3_8b_w8a8", quantization="compressed-tensors")
out = llm.generate(["<formatted Llama-Guard prompt>"], SamplingParams(max_tokens=64, temperature=0.0))
print(out[0].outputs[0].text)
```

Server:

```bash
vllm serve pillarsecurity/llamaguard_3_8b_w8a8
# Or explicitly:
# vllm serve pillarsecurity/llamaguard_3_8b_w8a8 --quantization compressed-tensors
```

Notes:
- Calibration used a small wikitext sample by default. For best fidelity, re-quantize with calibration data drawn from your own moderation prompts/logs.
- Tokenizer is included and must be loaded from this repo.
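
Llama Guard 3 completions are either `safe`, or `unsafe` followed by the violated hazard codes (e.g. `S1,S10`) on the next line. A minimal helper to parse that raw text into a structured result (the function name is illustrative, not part of any library):

```python
def parse_guard_output(text: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard 3 completion into (is_safe, violated_categories)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0].lower() == "safe":
        return True, []
    # First line is "unsafe"; the next line lists hazard codes like "S1,S10".
    categories = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in categories]

print(parse_guard_output("safe"))            # (True, [])
print(parse_guard_output("unsafe\nS1,S10"))  # (False, ['S1', 'S10'])
```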