File size: 7,066 Bytes
b02a931
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0bf98cd
b02a931
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
---
license: apache-2.0
language:
- en
- ko
base_model:
- openai/gpt-oss-safeguard-120b
base_model_relation: merge
pipeline_tag: text-generation
library_name: transformers
tags:
- sft
- trl
- transformers
- safety
- reasoning
---
# Vayne-V3-Pro

**Vayne-V3-Pro** is a **fully fine-tuned, MXFP4-quantized enterprise LLM** built for **AI agent frameworks**, **MCP-based tool orchestration**, **Retrieval-Augmented Generation (RAG) pipelines**, and **secure on-premise deployment**.

Building on the foundation of Vayne-V3, Vayne-V3-Pro delivers deeper model adaptation through **full-parameter Supervised Fine-Tuning (SFT)** combined with **NVIDIA ModelOpt Quantization-Aware Training (QAT)**, resulting in significantly improved instruction-following, identity consistency, and inference efficiency.

- **Full-parameter fine-tuning** for deeper knowledge integration (vs. LoRA in V2)
- **MXFP4 quantization** via NVIDIA ModelOpt for fast, memory-efficient inference
- **Enhanced multilingual reasoning** with Korean Chain-of-Thought capabilities
- Seamless integration with MCP-based multi-tool orchestration
- Secure deployment in private or regulated environments

---

## What's New in V3

| Feature | V2 | V3 |
|---------|----|----|
| Fine-Tuning Method | LoRA (Adapter) | **Full-Parameter SFT** |
| Quantization | BF16 / FP16 | **MXFP4 (QAT)** |
| Identity Alignment | Basic | **Enhanced (5x oversampled identity training)** |
| Multilingual Reasoning | Bilingual QA | **Korean Chain-of-Thought Thinking** |
| Training Pipeline | Single-step | **3-Step QAT Recipe** |

---

## Key Design Principles

| Feature | Description |
|---------|-------------|
| Private AI Ready | Deploy fully **on-premise** or in **air-gapped** secure environments |
| Efficient Inference | **MXFP4 quantization** enables fast inference on a single GPU |
| Enterprise Reasoning | Structured output and instruction-following for **business automation** |
| Agent & MCP Native | Built for **AI agent frameworks** and **MCP-based tool orchestration** |
| RAG Enhanced | Optimized for **retrieval workflows** with vector DBs (FAISS, Milvus, pgvector, etc.) |

---

## Model Architecture & Training

| Specification | Details |
|---------------|---------|
| Base Model | [openai/gpt-oss-safeguard-120b](https://huggingface.co/openai/gpt-oss-safeguard-120b) |
| Parameters | 117B (Active: 5.1B) |
| Training Precision | BF16 |
| Inference Precision | **MXFP4** (Quantization-Aware Training) |
| Architecture | Decoder-only Transformer (MoE, 128 experts / 4 active) |
| Safety Architecture | Chain-of-Thought Reasoning |
| Context Length | 128K tokens |
| Inference | Single-GPU (80GB VRAM, H100 / MI300X) / Multi-GPU |

### Training Pipeline — 3-Step QAT Recipe

Vayne-V3-Pro is trained using a **3-step Quantization-Aware Training (QAT) recipe** powered by NVIDIA ModelOpt:

```
Step 1: Full-Parameter SFT
   └─ Standard supervised fine-tuning on BF16 weights (no quantization)

Step 2: Quantization-Aware Training (QAT)
   └─ Fine-tune with MXFP4_MLP_WEIGHT_ONLY quantization config
   └─ Lower learning rate (1e-5) for stable convergence

Step 3: MXFP4 Conversion
   └─ Convert trained model to MXFP4 format via nvidia_convert.py
   └─ Optimized for production inference
```

### Training Data

Fine-tuned using full-parameter supervised instruction tuning (SFT) on proprietary and curated datasets covering:

- Model identity and persona alignment
- Domain-specific knowledge for targeted enterprise verticals
- Multilingual Chain-of-Thought reasoning (Korean-English)

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Learning Rate (SFT) | 2.0e-5 |
| Learning Rate (QAT) | 1.0e-5 |
| Batch Size | 2 per device |
| Epochs | 1.0 |
| Max Sequence Length | 131,072 |
| Warmup Ratio | 0.03 |
| LR Scheduler | Cosine with Min LR (10%) |
| Gradient Checkpointing | Enabled |
| Training Infrastructure | NVIDIA H200 x 16 |

---

## Safety & Reasoning Features

Vayne-V3-Pro inherits advanced safety reasoning capabilities from gpt-oss-safeguard-120b:

| Feature | Description |
|---------|-------------|
| **Chain-of-Thought Safety** | Transparent reasoning process for content safety decisions |
| **Bring Your Own Policy** | Custom policy interpretation and application |
| **Configurable Reasoning** | Adjustable reasoning effort (Low/Medium/High) |
| **Explainable Outputs** | Full CoT traces for safety decision auditing |

### Reasoning Effort Levels

| Level | Use Case | Trade-off |
|-------|----------|-----------|
| **Low** | Fast filtering, real-time applications | Speed-optimized, lower latency |
| **Medium** | Balanced production use | Balanced accuracy and speed |
| **High** | Critical content review | Maximum accuracy, higher latency |

---

## Secure On-Premise Deployment

Vayne-V3-Pro is built for **enterprise AI inside your firewall**.

- No external API dependency
- Compatible with **offline environments**
- MXFP4 quantization for **resource-efficient deployment**
- Proven for secure, regulated environments

---

## MCP (Model Context Protocol) Integration

Vayne-V3-Pro supports **MCP-based agent tooling**, making it easy to build tool-use AI agents.

Works seamlessly with:

- Claude MCP-compatible agent systems
- Local agent runtimes
- JSON structured execution

---

## RAG Compatibility

Designed for **hybrid reasoning + retrieval**.

- Works with FAISS, Chroma, Elasticsearch
- Handles long-context document QA
- Ideal for enterprise knowledge bases

---

## Quick Start

```bash
pip install transformers accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "PoSTMEDIA/Vayne-V3-Pro"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the benefits of private AI for enterprise security."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Use Cases

- Internal enterprise AI assistant
- Private AI document analysis
- Business writing (reports, proposals, strategy)
- AI automation agents with MCP tool orchestration
- Secure RAG search systems
- Multilingual (Korean-English) reasoning tasks

---

## Safety & Limitations

- Not intended for medical, legal, or financial decision-making
- May occasionally generate hallucinations
- Use human validation for critical outputs
- Recommended: enable output guardrails for production

---

## Citation

```bibtex
@misc{vayne2026,
  title={Vayne-V3-Pro: Fully Fine-Tuned Enterprise LLM with MXFP4 Quantization-Aware Training},
  author={PoSTMEDIA AI Lab},
  year={2026},
  publisher={Hugging Face}
}
```

---

## Contact

**PoSTMEDIA AI Lab**
- Email: [dev.postmedia@gmail.com](mailto:dev.postmedia@gmail.com)
- Web: [https://postmedia.ai](https://postmedia.ai)
- Web: [https://postmedia.co.kr](https://postmedia.co.kr)