---
base_model: openai/gpt-oss-safeguard-20b
base_model_relation: merge
library_name: transformers
pipeline_tag: text-generation
tags:
- sft
- transformers
- trl
- safety
- reasoning
license: apache-2.0
language:
- en
- ko
---

# Vayne-V2

**Vayne-V2** is a **compact, efficient, high-performance enterprise LLM** optimized for **AI agent frameworks**, **MCP-based tool orchestration**, **Retrieval-Augmented Generation (RAG) pipelines**, and **secure on-premise deployment**.

- βœ… Lightweight architecture for fast inference and low resource usage
- βš™οΈ Seamless integration with modern AI agent frameworks
- πŸ”— Built-in compatibility with MCP-based multi-tool orchestration
- πŸ” Optimized for enterprise-grade RAG systems
- πŸ›‘οΈ Secure deployment in private or regulated environments

---

## Key Design Principles

| Feature | Description |
|----------|-------------|
| πŸ” Private AI Ready | Deploy fully **on-premise** or in **air-gapped** secure environments |
| ⚑ Lightweight Inference | **Single-GPU optimized** architecture for fast and cost-efficient deployment |
| 🧠 Enterprise Reasoning | Structured output and instruction following for **business automation** |
| πŸ”§ Agent & MCP Native | Built for **AI agent frameworks** and **MCP-based tool orchestration** |
| πŸ” RAG Enhanced | Optimized for **retrieval workflows** with vector DBs (FAISS, Milvus, pgvector, etc.) |

---

## Model Architecture & Training

| Specification | Details |
|---------------|---------|
| 🧬 Base Model | GPT-OSS-Safeguard-20B |
| πŸ”’ Parameters | 21B total (3.6B active) |
| 🎯 Precision | BF16 / FP16 |
| 🧱 Architecture | Decoder-only Transformer |
| πŸ›‘οΈ Safety Architecture | Chain-of-Thought reasoning |
| πŸ“ Context Length | 4K tokens |
| ⚑ Inference | Single GPU (16 GB VRAM) / multi-GPU |

### Training Data

Fine-tuned with supervised fine-tuning (SFT) on:

- Enterprise QA datasets
- Task reasoning and tool-usage instructions
- RAG-style retrieval prompts
- Business reports and structured communication
- Korean–English bilingual QA and translation
- Safety reasoning with Chain-of-Thought (CoT) supervision
- Policy-based content classification datasets

---

## Safety & Reasoning Features

Vayne-V2 inherits advanced safety-reasoning capabilities from gpt-oss-safeguard-20b:

| Feature | Description |
|---------|-------------|
| 🧠 **Chain-of-Thought Safety** | Transparent reasoning process for content-safety decisions |
| πŸ“‹ **Bring Your Own Policy** | Custom policy interpretation and application |
| βš–οΈ **Configurable Reasoning** | Adjustable reasoning effort (Low/Medium/High) |
| πŸ”¬ **Explainable Outputs** | Full CoT traces for auditing safety decisions |

### Reasoning Effort Levels

| Level | Use Case | Trade-off |
|-------|----------|-----------|
| **Low** | Fast filtering, real-time applications | Speed-optimized, lower latency |
| **Medium** | Balanced production use | Balanced accuracy and speed |
| **High** | Critical content review | Maximum accuracy, higher latency |

---

## Secure On-Premise Deployment

Vayne-V2 is built for **enterprise AI inside your firewall**.

βœ… No external API dependency
βœ… Runs in fully **offline environments**
βœ… Designed for secure, regulated deployments

---

## MCP (Model Context Protocol) Integration

Vayne-V2 supports **MCP-based agent tooling**, making it straightforward to build tool-using AI agents. It works with:

* Claude MCP-compatible agent systems
* Local agent runtimes
* JSON-structured tool execution

---

## RAG Compatibility

Designed for **hybrid reasoning + retrieval**; a minimal retrieval-then-generate sketch follows the list below.

βœ… Works with FAISS, Chroma, Elasticsearch
βœ… Handles long-context document QA
βœ… Ideal for enterprise knowledge bases
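As a concrete illustration, here is a minimal retrieval-then-generate sketch. It is not an official pipeline: the two-document corpus, the `sentence-transformers/all-MiniLM-L6-v2` embedding model, and the plain Context/Question/Answer prompt format are all illustrative assumptions; any embedding model or vector store (Milvus, pgvector, etc.) can be substituted.

```python
import faiss
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1) Embed a tiny illustrative corpus and index it with FAISS.
#    (Embedding model choice is an assumption, not part of Vayne-V2.)
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
docs = [
    "Vayne-V2 can be deployed fully on-premise, behind a firewall.",
    "Vayne-V2 integrates with MCP-based tool orchestration.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vecs.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(doc_vecs, dtype=np.float32))

# 2) Retrieve the passage most relevant to the user query.
query = "Can the model run in an offline environment?"
query_vec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(np.asarray(query_vec, dtype=np.float32), 1)
context = docs[ids[0][0]]

# 3) Ground the generation in the retrieved context.
model_name = "PoSTMEDIA/Vayne-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In production, replace the in-memory corpus with your vector DB of choice and retrieve more than one passage (k > 1) so the model has broader grounding context within the 4K-token window.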
---

## Quick Start

```bash
pip install transformers peft accelerate bitsandbytes
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "PoSTMEDIA/Vayne-V2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "Explain the benefits of private AI for enterprise security."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Cap new tokens rather than total length so long prompts are not cut short.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Use Cases

βœ… Internal enterprise AI assistants
βœ… Private AI document analysis
βœ… Business writing (reports, proposals, strategy)
βœ… AI automation agents
βœ… Secure RAG search systems

---

## Safety & Limitations

* Not intended for medical, legal, or financial decision-making
* May occasionally hallucinate
* Use human validation for critical outputs
* Recommended: enable output guardrails in production

---

## Citation

```bibtex
@misc{vayne2025,
  title={Vayne-V2: Safety-Enhanced Enterprise LLM with Chain-of-Thought Reasoning},
  author={PoSTMEDIA AI Lab},
  year={2025},
  publisher={Hugging Face}
}
```

---

## Contact

**PoSTMEDIA AI Lab**
πŸ“§ [dev.postmedia@gmail.com](mailto:dev.postmedia@gmail.com)
🌐 [https://postmedia.ai](https://postmedia.ai)
🌐 [https://postmedia.co.kr](https://postmedia.co.kr)

---