---
license: llama3
language:
- en
pipeline_tag: text-generation
tags:
- meta
- llama-guard
- safety
- llmcompressor
- w8a8
- int8
- vllm
- latency
- ttft
---

# Llama-Guard-3-8B — W8A8 (llmcompressor, vLLM-ready)

This repository contains Llama-Guard-3-8B quantized to 8-bit weights and 8-bit activations (W8A8) with [llmcompressor](https://github.com/vllm-project/llm-compressor), using SmoothQuant to smooth activation outliers followed by GPTQ for weight quantization. The checkpoint is saved in compressed-tensors format, which vLLM loads directly.
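The recipe below is a reconstruction of the kind of llmcompressor configuration that produces such a checkpoint, not the exact one used here; the smoothing strength and ignore list in particular are assumptions.

```yaml
# Hypothetical recipe.yaml sketch for SmoothQuant + GPTQ W8A8.
# Exact values for this checkpoint are not published; treat as placeholders.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8   # assumed; shifts quantization difficulty from activations to weights
    GPTQModifier:
      targets: ["Linear"]
      scheme: W8A8              # int8 weights + int8 activations
      ignore: ["lm_head"]       # output head commonly left in higher precision
```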

## Use with vLLM

Python:
```python
from vllm import LLM, SamplingParams
llm = LLM(model="pillarsecurity/llamaguard_3_8b_w8a8")  # vLLM auto-detects quantization
# Alternatively, force the method explicitly:
# llm = LLM(model="pillarsecurity/llamaguard_3_8b_w8a8", quantization="compressed-tensors")
out = llm.generate(["<formatted Llama-Guard prompt>"], SamplingParams(max_tokens=64, temperature=0.0))
print(out[0].outputs[0].text)
```

Server:

```bash
vllm serve pillarsecurity/llamaguard_3_8b_w8a8
# Or explicitly:
# vllm serve pillarsecurity/llamaguard_3_8b_w8a8 --quantization compressed-tensors
```

Notes:
- Calibration used a small wikitext sample by default. For best fidelity, re-quantize with calibration data drawn from your own moderation prompts/logs.
- Tokenizer is included and must be loaded from this repo.
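
Llama Guard 3 completions are either `safe`, or `unsafe` followed by the violated hazard codes (e.g. `S1,S10`) on the next line. A minimal helper to parse that raw text into a structured result (the function name is illustrative, not part of any library):

```python
def parse_guard_output(text: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard 3 completion into (is_safe, violated_categories)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0].lower() == "safe":
        return True, []
    # First line is "unsafe"; the next line lists hazard codes like "S1,S10".
    categories = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in categories]

print(parse_guard_output("safe"))            # (True, [])
print(parse_guard_output("unsafe\nS1,S10"))  # (False, ['S1', 'S10'])
```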