---
base_model: openai/gpt-oss-safeguard-20b
base_model_relation: merge
library_name: transformers
pipeline_tag: text-generation
tags:
- sft
- transformers
- trl
- safety
- reasoning
license: apache-2.0
language:
- en
- ko
---

# Vayne-V2

**Vayne-V2** is a **compact, efficient, and high-performance enterprise LLM** optimized for **AI agent frameworks**, **MCP-based tool orchestration**, **Retrieval-Augmented Generation (RAG) pipelines**, and **secure on-premise deployment**.

- ✅ Lightweight architecture for fast inference and low resource usage  
- ⚙️ Seamless integration with modern AI agent frameworks  
- 🔗 Built-in compatibility for MCP-based multi-tool orchestration  
- 🔍 Optimized for enterprise-grade RAG systems  
- 🛡️ Secure deployment in private or regulated environments  

---

## Key Design Principles

| Feature | Description |
|----------|-------------|
| 🔐 Private AI Ready | Deploy fully **on-premise** or in **air-gapped** secure environments |
| ⚡ Lightweight Inference | **Single-GPU optimized** architecture for fast and cost-efficient deployment |
| 🧠 Enterprise Reasoning | Structured output and instruction-following for **business automation** |
| 🔧 Agent & MCP Native | Built for **AI agent frameworks** and **MCP-based tool orchestration** |
| 🔍 RAG Enhanced | Optimized for **retrieval workflows** with vector DBs (FAISS, Milvus, pgvector, etc.) | 

---

## Model Architecture & Training

| Specification | Details |
|---------------|---------|
| 🧬 Base Model | GPT-OSS-Safeguard-20B |
| 🔢 Parameters | 21B (Active: 3.6B) |
| 🎯 Precision | BF16 / FP16 |
| 🧱 Architecture | Decoder-only Transformer |
| 🛡️ Safety Architecture | Chain-of-Thought Reasoning |
| 📏 Context Length | 4K tokens |
| ⚡ Inference | Single-GPU (16GB VRAM) / Multi-GPU |

### Training Data
Fine-tuned using supervised instruction tuning (SFT) on:
- Enterprise QA datasets
- Task reasoning + tool usage instructions
- RAG-style retrieval prompts
- Business reports & structured communication
- Korean–English bilingual QA and translation
- Safety reasoning with Chain-of-Thought (CoT) supervision
- Policy-based content classification datasets

---

## Safety & Reasoning Features

Vayne-V2 inherits advanced safety reasoning capabilities from gpt-oss-safeguard-20b:

| Feature | Description |
|---------|-------------|
| 🧠 **Chain-of-Thought Safety** | Transparent reasoning process for content safety decisions |
| 📋 **Bring Your Own Policy** | Custom policy interpretation and application |
| ⚖️ **Configurable Reasoning** | Adjustable reasoning effort (Low/Medium/High) |
| 🔬 **Explainable Outputs** | Full CoT traces for safety decision auditing |
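
The "Bring Your Own Policy" pattern above amounts to prepending your own moderation policy to the content under review. A minimal sketch of assembling such a classification prompt (the policy text, function name, and verdict labels here are illustrative assumptions, not part of the model's fixed format):

```python
# Hypothetical example policy; replace with your organization's rules.
POLICY = """\
Policy: Internal-Comms-v1
- ALLOW: routine business communication
- DENY: content that discloses customer PII
"""

def build_policy_prompt(policy: str, content: str) -> str:
    # Ask the model to reason step by step, then emit a verdict label
    # that downstream code can match on.
    return (
        f"{policy}\n"
        f"Content to review:\n{content}\n\n"
        "Think step by step, then answer with ALLOW or DENY."
    )

prompt = build_policy_prompt(POLICY, "Quarterly revenue grew 12%.")
```

The resulting string is what you would pass to the model; the full CoT trace in the response can then be logged for auditing.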

### Reasoning Effort Levels

| Level | Use Case | Trade-off |
|-------|----------|-----------|
| **Low** | Fast filtering, real-time applications | Speed-optimized, lower latency |
| **Medium** | Balanced production use | Balanced accuracy and speed |
| **High** | Critical content review | Maximum accuracy, higher latency |
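
gpt-oss-style models typically read the reasoning effort from the system prompt. A minimal sketch of selecting a level per request (the exact system-message wording is an assumption here; consult the base model's chat format for the canonical key):

```python
def with_reasoning_effort(messages: list, level: str = "medium") -> list:
    """Prepend a system message that sets the reasoning effort level.

    The "Reasoning: <level>" phrasing is an assumption modeled on the
    gpt-oss system-prompt convention; verify against the base model card.
    """
    assert level in ("low", "medium", "high")
    system = {"role": "system", "content": f"Reasoning: {level}"}
    return [system] + messages

msgs = with_reasoning_effort(
    [{"role": "user", "content": "Review this post for policy violations."}],
    level="high",
)
```

Pass the resulting message list through `tokenizer.apply_chat_template(...)` as usual; switching `level` trades latency for accuracy per the table above.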

---

## Secure On-Premise Deployment

Vayne-V2 is built for **enterprise AI inside your firewall**.

✅ No external API dependency  
✅ Compatible with **offline environments**  
✅ Suitable for regulated and compliance-sensitive environments

---

## MCP (Model Context Protocol) Integration

Vayne-V2 supports **MCP-based agent tooling**, making it straightforward to integrate tool-using agents.

Works seamlessly with:

* Claude MCP-compatible agent systems
* Local agent runtimes
* JSON structured execution

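For the JSON structured execution path, the host application needs to parse tool calls the model emits before dispatching them to an MCP server. A minimal, defensive sketch (the `{"tool": ..., "arguments": ...}` shape is an assumed convention, not a fixed output schema of Vayne-V2):

```python
import json

def parse_tool_call(model_output: str):
    """Parse a JSON tool call from model output.

    Returns (tool_name, arguments) on success, or None if the output
    is not valid JSON or lacks the expected "tool" field.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "tool" not in call:
        return None
    return call["tool"], call.get("arguments", {})

# Well-formed call is parsed; malformed output degrades to None.
ok = parse_tool_call('{"tool": "search_docs", "arguments": {"query": "vacation policy"}}')
bad = parse_tool_call("plain text, not a tool call")
```

Rejecting malformed output instead of raising keeps the agent loop alive when the model replies in prose rather than JSON.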
---

## RAG Compatibility

Designed for **hybrid reasoning + retrieval**.

✅ Works with FAISS, Chroma, Elasticsearch  
✅ Handles long-context document QA  
✅ Ideal for enterprise knowledge bases  

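A typical retrieval workflow stuffs the top-ranked chunks from the vector store into the prompt ahead of the question. A minimal sketch of that assembly step, independent of which vector DB produced the chunks (the function name and character budget are illustrative assumptions):

```python
def build_rag_prompt(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Concatenate retrieved chunks above the question, within a size budget.

    Chunks are assumed to arrive ranked by relevance; lower-ranked chunks
    are dropped once the character budget is exhausted.
    """
    context, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        context.append(chunk)
        used += len(chunk)
    joined = "\n---\n".join(context)
    return (
        f"Context:\n{joined}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above."
    )

prompt = build_rag_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 business days."],
)
```

In production you would size the budget in tokens rather than characters, but the structure of the prompt is the same.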
---

## Quick Start

```bash
pip install transformers accelerate torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "PoSTMEDIA/Vayne-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # use torch.bfloat16 on GPUs that support it
    device_map="auto",          # requires accelerate
)

prompt = "Explain the benefits of private AI for enterprise security."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# max_new_tokens bounds the generated continuation, not the total sequence length.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Use Cases

✅ Internal enterprise AI assistant  
✅ Private AI document analysis  
✅ Business writing (reports, proposals, strategy)  
✅ AI automation agents  
✅ Secure RAG search systems  

---

## Safety & Limitations

* Not intended for medical, legal, or financial decision-making
* May occasionally generate hallucinations
* Use human validation for critical outputs
* Recommended: enable output guardrails for production

---

## Citation

```bibtex
@misc{vayne2025,
  title={Vayne-V2: Safety-Enhanced Enterprise LLM with Chain-of-Thought Reasoning},
  author={PoSTMEDIA AI Lab},
  year={2025},
  publisher={Hugging Face}
}
```

---

## Contact

**PoSTMEDIA AI Lab**  
📧 [dev.postmedia@gmail.com](mailto:dev.postmedia@gmail.com)  
🌐 [https://postmedia.ai](https://postmedia.ai)  
🌐 [https://postmedia.co.kr](https://postmedia.co.kr)  

---