PoSTMEDIA commited on
Commit
0351ef0
·
verified ·
1 Parent(s): fc00a7e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +212 -1
README.md CHANGED
@@ -13,4 +13,215 @@ tags:
13
  - transformers
14
  - safety
15
  - reasoning
16
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  - transformers
14
  - safety
15
  - reasoning
16
+ ---
17
+ # Vayne-V3
18
+
19
+ **Vayne-V3** is a **fully fine-tuned, MXFP4-quantized enterprise LLM** built for **AI agent frameworks**, **MCP-based tool orchestration**, **Retrieval-Augmented Generation (RAG) pipelines**, and **secure on-premise deployment**.
20
+
21
+ Building on the foundation of Vayne-V2, Vayne-V3 delivers deeper model adaptation through **full-parameter Supervised Fine-Tuning (SFT)** combined with **NVIDIA ModelOpt Quantization-Aware Training (QAT)**, resulting in significantly improved instruction-following, identity consistency, and inference efficiency.
22
+
23
+ - **Full-parameter fine-tuning** for deeper knowledge integration (vs. LoRA in V2)
24
+ - **MXFP4 quantization** via NVIDIA ModelOpt for fast, memory-efficient inference
25
+ - **Enhanced multilingual reasoning** with Korean Chain-of-Thought capabilities
26
+ - Seamless integration with MCP-based multi-tool orchestration
27
+ - Secure deployment in private or regulated environments
28
+
29
+ ---
30
+
31
+ ## What's New in V3
32
+
33
+ | Feature | V2 | V3 |
34
+ |---------|----|----|
35
+ | Fine-Tuning Method | LoRA (Adapter) | **Full-Parameter SFT** |
36
+ | Quantization | BF16 / FP16 | **MXFP4 (QAT)** |
37
+ | Identity Alignment | Basic | **Enhanced (5x oversampled identity training)** |
38
+ | Multilingual Reasoning | Bilingual QA | **Korean Chain-of-Thought Thinking** |
39
+ | Training Pipeline | Single-step | **3-Step QAT Recipe** |
40
+
41
+ ---
42
+
43
+ ## Key Design Principles
44
+
45
+ | Feature | Description |
46
+ |---------|-------------|
47
+ | Private AI Ready | Deploy fully **on-premise** or in **air-gapped** secure environments |
48
+ | Efficient Inference | **MXFP4 quantization** enables fast inference on a single GPU |
49
+ | Enterprise Reasoning | Structured output and instruction-following for **business automation** |
50
+ | Agent & MCP Native | Built for **AI agent frameworks** and **MCP-based tool orchestration** |
51
+ | RAG Enhanced | Optimized for **retrieval workflows** with vector DBs (FAISS, Milvus, pgvector, etc.) |
52
+
53
+ ---
54
+
55
+ ## Model Architecture & Training
56
+
57
+ | Specification | Details |
58
+ |---------------|---------|
59
+ | Base Model | [openai/gpt-oss-safeguard-20b](https://huggingface.co/openai/gpt-oss-safeguard-20b) |
60
+ | Parameters | 21B (Active: 3.6B) |
61
+ | Training Precision | BF16 |
62
+ | Inference Precision | **MXFP4** (Quantization-Aware Training) |
63
+ | Architecture | Decoder-only Transformer (MoE) |
64
+ | Safety Architecture | Chain-of-Thought Reasoning |
65
+ | Context Length | 4K tokens |
66
+ | Inference | Single-GPU (16GB VRAM) / Multi-GPU |
67
+
68
+ ### Training Pipeline — 3-Step QAT Recipe
69
+
70
+ Vayne-V3 is trained using a **3-step Quantization-Aware Training (QAT) recipe** powered by NVIDIA ModelOpt:
71
+
72
+ ```
73
+ Step 1: Full-Parameter SFT
74
+ └─ Standard supervised fine-tuning on BF16 weights (no quantization)
75
+
76
+ Step 2: Quantization-Aware Training (QAT)
77
+ └─ Fine-tune with MXFP4_MLP_WEIGHT_ONLY quantization config
78
+ └─ Lower learning rate (1e-5) for stable convergence
79
+
80
+ Step 3: MXFP4 Conversion
81
+ └─ Convert trained model to MXFP4 format via nvidia_convert.py
82
+ └─ Optimized for production inference
83
+ ```
84
+
85
+ ### Training Data
86
+
87
+ Fine-tuned using full-parameter supervised instruction tuning (SFT) on proprietary and curated datasets covering:
88
+
89
+ - Model identity and persona alignment
90
+ - Domain-specific knowledge for targeted enterprise verticals
91
+ - Multilingual Chain-of-Thought reasoning (Korean-English)
92
+
93
+ ### Training Configuration
94
+
95
+ | Parameter | Value |
96
+ |-----------|-------|
97
+ | Learning Rate (SFT) | 2.0e-5 |
98
+ | Learning Rate (QAT) | 1.0e-5 |
99
+ | Batch Size | 2 per device |
100
+ | Epochs | 1.0 |
101
+ | Max Sequence Length | 4,096 |
102
+ | Warmup Ratio | 0.03 |
103
+ | LR Scheduler | Cosine with Min LR (10%) |
104
+ | Gradient Checkpointing | Enabled |
105
+ | Training Infrastructure | NVIDIA H200 x 8 |
106
+
107
+ ---
108
+
109
+ ## Safety & Reasoning Features
110
+
111
+ Vayne-V3 inherits advanced safety reasoning capabilities from gpt-oss-safeguard-20b:
112
+
113
+ | Feature | Description |
114
+ |---------|-------------|
115
+ | **Chain-of-Thought Safety** | Transparent reasoning process for content safety decisions |
116
+ | **Bring Your Own Policy** | Custom policy interpretation and application |
117
+ | **Configurable Reasoning** | Adjustable reasoning effort (Low/Medium/High) |
118
+ | **Explainable Outputs** | Full CoT traces for safety decision auditing |
119
+
120
+ ### Reasoning Effort Levels
121
+
122
+ | Level | Use Case | Trade-off |
123
+ |-------|----------|-----------|
124
+ | **Low** | Fast filtering, real-time applications | Speed-optimized, lower latency |
125
+ | **Medium** | Balanced production use | Balanced accuracy and speed |
126
+ | **High** | Critical content review | Maximum accuracy, higher latency |
127
+
128
+ ---
129
+
130
+ ## Secure On-Premise Deployment
131
+
132
+ Vayne-V3 is built for **enterprise AI inside your firewall**.
133
+
134
+ - No external API dependency
135
+ - Compatible with **offline environments**
136
+ - MXFP4 quantization for **resource-efficient deployment**
137
+ - Proven for secure, regulated environments
138
+
139
+ ---
140
+
141
+ ## MCP (Model Context Protocol) Integration
142
+
143
+ Vayne-V3 supports **MCP-based agent tooling**, making it easy to build tool-use AI agents.
144
+
145
+ Works seamlessly with:
146
+
147
+ - Claude MCP-compatible agent systems
148
+ - Local agent runtimes
149
+ - JSON structured execution
150
+
151
+ ---
152
+
153
+ ## RAG Compatibility
154
+
155
+ Designed for **hybrid reasoning + retrieval**.
156
+
157
+ - Works with FAISS, Chroma, Elasticsearch
158
+ - Handles long-context document QA
159
+ - Ideal for enterprise knowledge bases
160
+
161
+ ---
162
+
163
+ ## Quick Start
164
+
165
+ ```bash
166
+ pip install transformers accelerate
167
+ ```
168
+
169
+ ```python
170
+ from transformers import AutoModelForCausalLM, AutoTokenizer
171
+ import torch
172
+
173
+ model_name = "PoSTMEDIA/Vayne-V3"
174
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
175
+ model = AutoModelForCausalLM.from_pretrained(
176
+ model_name,
177
+ torch_dtype=torch.bfloat16,
178
+ device_map="auto"
179
+ )
180
+
181
+ prompt = "Explain the benefits of private AI for enterprise security."
182
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
183
+ outputs = model.generate(**inputs, max_new_tokens=1024)
184
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
185
+ ```
186
+
187
+ ---
188
+
189
+ ## Use Cases
190
+
191
+ - Internal enterprise AI assistant
192
+ - Private AI document analysis
193
+ - Business writing (reports, proposals, strategy)
194
+ - AI automation agents with MCP tool orchestration
195
+ - Secure RAG search systems
196
+ - Multilingual (Korean-English) reasoning tasks
197
+
198
+ ---
199
+
200
+ ## Safety & Limitations
201
+
202
+ - Not intended for medical, legal, or financial decision-making
203
+ - May occasionally generate hallucinations
204
+ - Use human validation for critical outputs
205
+ - Recommended: enable output guardrails for production
206
+
207
+ ---
208
+
209
+ ## Citation
210
+
211
+ ```bibtex
212
+ @misc{vayne2026,
213
+ title={Vayne-V3: Fully Fine-Tuned Enterprise LLM with MXFP4 Quantization-Aware Training},
214
+ author={PoSTMEDIA AI Lab},
215
+ year={2026},
216
+ publisher={Hugging Face}
217
+ }
218
+ ```
219
+
220
+ ---
221
+
222
+ ## Contact
223
+
224
+ **PoSTMEDIA AI Lab**
225
+ - Email: [dev.postmedia@gmail.com](mailto:dev.postmedia@gmail.com)
226
+ - Web: [https://postmedia.ai](https://postmedia.ai)
227
+ - Web: [https://postmedia.co.kr](https://postmedia.co.kr)