---
base_model: openai/gpt-oss-safeguard-20b
base_model_relation: merge
library_name: transformers
pipeline_tag: text-generation
tags:
- sft
- transformers
- trl
- safety
- reasoning
license: apache-2.0
language:
- en
- ko
---

# Vayne-V2

**Vayne-V2** is a **compact, efficient, high-performance enterprise LLM** optimized for **AI agent frameworks**, **MCP-based tool orchestration**, **Retrieval-Augmented Generation (RAG) pipelines**, and **secure on-premise deployment**.

- βœ… Lightweight architecture for fast inference and low resource usage
- βš™οΈ Seamless integration with modern AI agent frameworks
- πŸ”— Built-in compatibility with MCP-based multi-tool orchestration
- πŸ” Optimized for enterprise-grade RAG systems
- πŸ›‘οΈ Secure deployment in private or regulated environments

---

## Key Design Principles

| Feature | Description |
|----------|-------------|
| πŸ” Private AI Ready | Deploy fully **on-premise** or in **air-gapped** secure environments |
| ⚑ Lightweight Inference | **Single-GPU optimized** architecture for fast and cost-efficient deployment |
| 🧠 Enterprise Reasoning | Structured output and instruction following for **business automation** |
| πŸ”§ Agent & MCP Native | Built for **AI agent frameworks** and **MCP-based tool orchestration** |
| πŸ” RAG Enhanced | Optimized for **retrieval workflows** with vector DBs (FAISS, Milvus, pgvector, etc.) |

---

## Model Architecture & Training

| Specification | Details |
|---------------|---------|
| 🧬 Base Model | GPT-OSS-Safeguard-20B |
| πŸ”’ Parameters | 21B total (3.6B active) |
| 🎯 Precision | BF16 / FP16 |
| 🧱 Architecture | Decoder-only Transformer |
| πŸ›‘οΈ Safety Architecture | Chain-of-Thought reasoning |
| πŸ“ Context Length | 4K tokens |
| ⚑ Inference | Single GPU (16 GB VRAM) / multi-GPU |

### Training Data

Fine-tuned with supervised fine-tuning (SFT) on:

- Enterprise QA datasets
- Task reasoning and tool-usage instructions
- RAG-style retrieval prompts
- Business reports and structured communication
- Korean–English bilingual QA and translation
- Safety reasoning with Chain-of-Thought (CoT) supervision
- Policy-based content classification datasets

---

## Safety & Reasoning Features

Vayne-V2 inherits advanced safety-reasoning capabilities from gpt-oss-safeguard-20b:

| Feature | Description |
|---------|-------------|
| 🧠 **Chain-of-Thought Safety** | Transparent reasoning process for content-safety decisions |
| πŸ“‹ **Bring Your Own Policy** | Custom policy interpretation and application |
| βš–οΈ **Configurable Reasoning** | Adjustable reasoning effort (Low/Medium/High) |
| πŸ”¬ **Explainable Outputs** | Full CoT traces for auditing safety decisions |

### Reasoning Effort Levels

| Level | Use Case | Trade-off |
|-------|----------|-----------|
| **Low** | Fast filtering, real-time applications | Speed-optimized, lower latency |
| **Medium** | Balanced production use | Balanced accuracy and speed |
| **High** | Critical content review | Maximum accuracy, higher latency |

---

## Secure On-Premise Deployment

Vayne-V2 is built for **enterprise AI inside your firewall**.

βœ… No external API dependency
βœ… Runs in fully **offline environments**
βœ… Designed for secure, regulated deployments

---

## MCP (Model Context Protocol) Integration

Vayne-V2 supports **MCP-based agent tooling**, making it straightforward to build tool-using AI agents. It works with:

* Claude MCP-compatible agent systems
* Local agent runtimes
* JSON-structured tool execution

---

## RAG Compatibility

Designed for **hybrid reasoning + retrieval**; a minimal retrieval-then-generate sketch follows the list below.

βœ… Works with FAISS, Chroma, Elasticsearch
βœ… Handles long-context document QA
βœ… Ideal for enterprise knowledge bases
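As a concrete illustration, here is a minimal retrieval-then-generate sketch. It is not an official pipeline: the two-document corpus, the `sentence-transformers/all-MiniLM-L6-v2` embedding model, and the plain Context/Question/Answer prompt format are all illustrative assumptions; any embedding model or vector store (Milvus, pgvector, etc.) can be substituted.

```python
import faiss
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1) Embed a tiny illustrative corpus and index it with FAISS.
#    (Embedding model choice is an assumption, not part of Vayne-V2.)
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
docs = [
    "Vayne-V2 can be deployed fully on-premise, behind a firewall.",
    "Vayne-V2 integrates with MCP-based tool orchestration.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vecs.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(doc_vecs, dtype=np.float32))

# 2) Retrieve the passage most relevant to the user query.
query = "Can the model run in an offline environment?"
query_vec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(np.asarray(query_vec, dtype=np.float32), 1)
context = docs[ids[0][0]]

# 3) Ground the generation in the retrieved context.
model_name = "PoSTMEDIA/Vayne-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In production, replace the in-memory corpus with your vector DB of choice and retrieve more than one passage (k > 1) so the model has broader grounding context within the 4K-token window.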
---

## Quick Start

```bash
pip install transformers peft accelerate bitsandbytes
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "PoSTMEDIA/Vayne-V2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "Explain the benefits of private AI for enterprise security."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Cap new tokens rather than total length so long prompts are not cut short.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Use Cases

βœ… Internal enterprise AI assistants
βœ… Private AI document analysis
βœ… Business writing (reports, proposals, strategy)
βœ… AI automation agents
βœ… Secure RAG search systems

---

## Safety & Limitations

* Not intended for medical, legal, or financial decision-making
* May occasionally hallucinate
* Use human validation for critical outputs
* Recommended: enable output guardrails in production

---

## Citation

```bibtex
@misc{vayne2025,
  title={Vayne-V2: Safety-Enhanced Enterprise LLM with Chain-of-Thought Reasoning},
  author={PoSTMEDIA AI Lab},
  year={2025},
  publisher={Hugging Face}
}
```

---

## Contact

**PoSTMEDIA AI Lab**
πŸ“§ [dev.postmedia@gmail.com](mailto:dev.postmedia@gmail.com)
🌐 [https://postmedia.ai](https://postmedia.ai)
🌐 [https://postmedia.co.kr](https://postmedia.co.kr)

---