---
license: apache-2.0
language:
- en
- ko
base_model:
- openai/gpt-oss-safeguard-20b
base_model_relation: merge
pipeline_tag: text-generation
library_name: transformers
tags:
- sft
- trl
- transformers
- safety
- reasoning
---

# Vayne-V3

**Vayne-V3** is a **fully fine-tuned, MXFP4-quantized enterprise LLM** built for **AI agent frameworks**, **MCP-based tool orchestration**, **Retrieval-Augmented Generation (RAG) pipelines**, and **secure on-premise deployment**.

Building on Vayne-V2, Vayne-V3 adapts the model more deeply through **full-parameter Supervised Fine-Tuning (SFT)** combined with **NVIDIA ModelOpt Quantization-Aware Training (QAT)**, resulting in significantly improved instruction-following, identity consistency, and inference efficiency.

- **Full-parameter fine-tuning** for deeper knowledge integration (vs. LoRA in V2)
- **MXFP4 quantization** via NVIDIA ModelOpt for fast, memory-efficient inference
- **Enhanced multilingual reasoning** with Korean Chain-of-Thought capabilities
- Seamless integration with MCP-based multi-tool orchestration
- Secure deployment in private or regulated environments

---

## What's New in V3

| Feature | V2 | V3 |
|---------|----|----|
| Fine-Tuning Method | LoRA (Adapter) | **Full-Parameter SFT** |
| Quantization | BF16 / FP16 | **MXFP4 (QAT)** |
| Identity Alignment | Basic | **Enhanced (5x oversampled identity training)** |
| Multilingual Reasoning | Bilingual QA | **Korean Chain-of-Thought Thinking** |
| Training Pipeline | Single-step | **3-Step QAT Recipe** |

---

## Key Design Principles

| Feature | Description |
|---------|-------------|
| Private AI Ready | Deploy fully **on-premise** or in **air-gapped** secure environments |
| Efficient Inference | **MXFP4 quantization** enables fast inference on a single GPU |
| Enterprise Reasoning | Structured output and instruction-following for **business automation** |
| Agent & MCP Native | Built for **AI agent frameworks** and **MCP-based tool orchestration** |
| RAG Enhanced | Optimized for **retrieval workflows** with vector DBs (FAISS, Milvus, pgvector, etc.) |

---

## Model Architecture & Training

| Specification | Details |
|---------------|---------|
| Base Model | [openai/gpt-oss-safeguard-20b](https://huggingface.co/openai/gpt-oss-safeguard-20b) |
| Parameters | 21B total (3.6B active) |
| Training Precision | BF16 |
| Inference Precision | **MXFP4** (Quantization-Aware Training) |
| Architecture | Decoder-only Transformer (MoE) |
| Safety Architecture | Chain-of-Thought Reasoning |
| Context Length | 4K tokens |
| Inference | Single-GPU (16GB VRAM) / Multi-GPU |

### Training Pipeline: 3-Step QAT Recipe

Vayne-V3 is trained using a **3-step Quantization-Aware Training (QAT) recipe** powered by NVIDIA ModelOpt:

```
Step 1: Full-Parameter SFT
  └─ Standard supervised fine-tuning on BF16 weights (no quantization)

Step 2: Quantization-Aware Training (QAT)
  └─ Fine-tune with MXFP4_MLP_WEIGHT_ONLY quantization config
  └─ Lower learning rate (1e-5) for stable convergence

Step 3: MXFP4 Conversion
  └─ Convert trained model to MXFP4 format via nvidia_convert.py
  └─ Optimized for production inference
```

### Training Data

Vayne-V3 was fine-tuned with full-parameter supervised fine-tuning (SFT) on proprietary and curated datasets covering:

- Model identity and persona alignment
- Domain-specific knowledge for targeted enterprise verticals
- Multilingual Chain-of-Thought reasoning (Korean-English)

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Learning Rate (SFT) | 2.0e-5 |
| Learning Rate (QAT) | 1.0e-5 |
| Batch Size | 2 per device |
| Epochs | 1.0 |
| Max Sequence Length | 4,096 tokens |
| Warmup Ratio | 0.03 |
| LR Scheduler | Cosine with Min LR (10%) |
| Gradient Checkpointing | Enabled |
| Training Infrastructure | NVIDIA H200 x 8 |

---

## Safety & Reasoning Features

Vayne-V3 inherits advanced safety reasoning capabilities from gpt-oss-safeguard-20b:

| Feature | Description |
|---------|-------------|
| **Chain-of-Thought Safety** | Transparent reasoning process for content safety decisions |
| **Bring Your Own Policy** | Custom policy interpretation and application |
| **Configurable Reasoning** | Adjustable reasoning effort (Low/Medium/High) |
| **Explainable Outputs** | Full CoT traces for safety decision auditing |

### Reasoning Effort Levels

| Level | Use Case | Trade-off |
|-------|----------|-----------|
| **Low** | Fast filtering, real-time applications | Speed-optimized, lower latency |
| **Medium** | Balanced production use | Balanced accuracy and speed |
| **High** | Critical content review | Maximum accuracy, higher latency |

---

## Secure On-Premise Deployment

Vayne-V3 is built for **enterprise AI inside your firewall**.

- No external API dependency
- Compatible with **offline environments**
- MXFP4 quantization for **resource-efficient deployment**
- Designed for secure, regulated environments

---

## MCP (Model Context Protocol) Integration

Vayne-V3 supports **MCP-based agent tooling**, making it easy to build tool-using AI agents.

Works seamlessly with:

- Claude MCP-compatible agent systems
- Local agent runtimes
- JSON structured execution

---

## RAG Compatibility

Designed for **hybrid reasoning + retrieval**.

- Works with FAISS, Chroma, Elasticsearch
- Handles document QA over retrieved passages within the 4K-token context window
- Ideal for enterprise knowledge bases

---

## Quick Start

```bash
pip install transformers accelerate torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "PoSTMEDIA/Vayne-V3"

# Load the tokenizer and model; device_map="auto" places weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the benefits of private AI for enterprise security."

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 1024 new tokens, then decode the full sequence (prompt + completion).
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Use Cases

- Internal enterprise AI assistant
- Private AI document analysis
- Business writing (reports, proposals, strategy)
- AI automation agents with MCP tool orchestration
- Secure RAG search systems
- Multilingual (Korean-English) reasoning tasks

---

## Safety & Limitations

- Not intended for medical, legal, or financial decision-making
- May occasionally generate hallucinations
- Use human validation for critical outputs
- Recommended: enable output guardrails for production

---

## Citation

```bibtex
@misc{vayne2026,
  title={Vayne-V3: Fully Fine-Tuned Enterprise LLM with MXFP4 Quantization-Aware Training},
  author={PoSTMEDIA AI Lab},
  year={2026},
  publisher={Hugging Face}
}
```

---

## Contact

**PoSTMEDIA AI Lab**

- Email: [dev.postmedia@gmail.com](mailto:dev.postmedia@gmail.com)
- Web: [https://postmedia.ai](https://postmedia.ai)
- Web: [https://postmedia.co.kr](https://postmedia.co.kr)
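
---

## Appendix: Estimating MXFP4 Weight Memory

The single-GPU (16GB VRAM) figure listed under Model Architecture & Training can be sanity-checked with some rough arithmetic. The sketch below is an estimate under stated assumptions, not a measurement. It uses the MXFP4 layout from the OCP Microscaling format (4-bit elements with one 8-bit scale shared by each 32-element block, so about 4.25 bits per parameter), treats all remaining weights as BF16, and ignores the KV cache, activations, and runtime overhead. The function names and the `quantized_fraction` values are hypothetical: this card states that only MLP weights are quantized (`MXFP4_MLP_WEIGHT_ONLY`) but does not say what share of the 21B parameters that covers.

```python
# Rough weight-memory estimate for a partially MXFP4-quantized model.
# Assumptions (not from the model card): the helper names, the example
# quantized_fraction values, and that non-quantized weights are stored in BF16.

MXFP4_ELEMENT_BITS = 4   # one FP4 (E2M1) value per parameter
MXFP4_SCALE_BITS = 8     # one shared E8M0 scale per block
MXFP4_BLOCK_SIZE = 32    # parameters that share a single scale
BF16_BITS = 16


def mxfp4_bits_per_param() -> float:
    """Effective storage cost per parameter, including the shared block scale."""
    return MXFP4_ELEMENT_BITS + MXFP4_SCALE_BITS / MXFP4_BLOCK_SIZE


def weight_memory_gib(total_params: float, quantized_fraction: float) -> float:
    """Estimate weight storage in GiB when a fraction of parameters is MXFP4."""
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must be between 0 and 1")
    avg_bits = (
        quantized_fraction * mxfp4_bits_per_param()
        + (1.0 - quantized_fraction) * BF16_BITS
    )
    return total_params * avg_bits / 8 / 2**30


if __name__ == "__main__":
    total = 21e9  # parameter count from the architecture table
    for fraction in (0.0, 0.9, 1.0):  # hypothetical shares of quantized weights
        gib = weight_memory_gib(total, fraction)
        print(f"quantized_fraction={fraction:.1f}: {gib:.1f} GiB of weights")
```

Comparing the estimate against the available GPU memory shows how much headroom is left for the KV cache and activations, which grow with batch size and sequence length.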