---
base_model: openbmb/MiniCPM5-1B
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- minicpm
- guardrails
- safety-classifier
- tool-safety
- sft
- unsloth
- text-generation
---

# RocketGuard-1B

RocketGuard-1B is a merged MiniCPM5-1B fine-tune for text-only guardrail and agent/tool-call safety experiments. It was trained to produce structured safety decisions for prompts and agent actions, including:

- `allow`
- `block`
- `require_confirmation`
- `ask_clarification`
- `rewrite`

This is a research and learning release. Do not use it as a complete production safety system without independent evaluation.

## Model Details

- Base model: `openbmb/MiniCPM5-1B`
- Fine-tuning: LoRA SFT with Unsloth / TRL
- Release format: merged full model
- Modality: text only
- Training examples: about 48.7k message-format examples
- Held-out evaluation examples prepared: about 2.3k clean examples
- Epochs: 3
- Final checkpoint step: 4572
- Maximum sequence length used in training: 2048 tokens

## Intended Use

RocketGuard-1B is intended for experiments around:

- content safety classification
- agent/tool-call risk routing
- confirmation gating
- clarification requests
- policy-aware rewriting

## Loading

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Manitchahar/rocketguard-1b"

# trust_remote_code=True allows custom modeling code from the repo to execute.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    device_map="auto",  # requires the `accelerate` package
)
```

## Limitations

This model is text-only. It does not directly inspect images, audio, files, browser state, private app state, or external tool side effects.

Evaluation results are pending and should be published separately before any quality claims are made.
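The five decision labels above are the model's outputs, but this card does not specify the exact output format. The sketch below shows one way a caller might turn generated text into a single label for routing. It assumes the model emits either a JSON object with a `decision` key (an assumed schema) or a bare label somewhere in the text; the helper name `parse_decision` and the fail-closed fallback to `block` are hypothetical choices, not documented behavior of RocketGuard-1B.

```python
import json
import re

# The five decision labels listed in this model card.
DECISIONS = ("allow", "block", "require_confirmation", "ask_clarification", "rewrite")

# Matches a label only as a whole word. Underscores count as word characters,
# so "allow" does not match inside "allowed" or "allow_list".
_LABEL_RE = re.compile(r"\b(" + "|".join(DECISIONS) + r")\b")


def parse_decision(generated: str, default: str = "block") -> str:
    """Hypothetical helper: extract one decision label from model output.

    Assumes either a JSON object with a "decision" key (assumed schema)
    or a bare label in free text. Returns `default` (fail closed) when
    no known label can be recovered.
    """
    text = generated.strip()

    # 1) Try the assumed JSON schema first.
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        obj = None
    if isinstance(obj, dict):
        label = str(obj.get("decision", "")).strip().lower()
        return label if label in DECISIONS else default

    # 2) Otherwise take the first known label that appears in the text.
    match = _LABEL_RE.search(text.lower())
    return match.group(1) if match else default


print(parse_decision('{"decision": "require_confirmation", "reason": "deletes files"}'))
print(parse_decision("Decision: ask_clarification"))
print(parse_decision("I am not sure."))
```

Here `generated` would be the decoded output of a `model.generate` call; the prompt template used during fine-tuning is an assumption left to the caller. Defaulting to `block` means malformed or unexpected output is treated as unsafe rather than silently allowed, which is usually the safer failure mode for a guardrail.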