---
language:
- en
- id
license: mit
tags:
- humanoid
- instruction-safety
- pre-execution
- risk-detection
- reasoning
- llm
---
# instruction-safety-gate

## Model Description
instruction-safety-gate is a language model that acts as a safety layer, evaluating natural-language instructions before execution.

The model classifies each instruction as valid, ambiguous, contradictory, incomplete, or unsafe, and is intended to prevent unsafe or invalid instructions from reaching humanoid or agent execution systems.
## Intended Use
- Safety gating for humanoid robots
- Pre-execution instruction screening
- AI agent risk detection
- Control layers for autonomous systems
## Output Format

The model outputs JSON only:

```json
{
  "label": "VALID | AMBIGUOUS | CONTRADICTORY | INCOMPLETE | UNSAFE",
  "confidence": 0.0
}
```
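A downstream controller could consume this JSON along the following lines. This is a minimal sketch: the `gate_instruction` helper, the confidence threshold, and the fail-closed policy are illustrative assumptions, not part of the model; only the label names come from the schema above.

```python
import json

# Label set defined by the model card's output schema.
ALL_LABELS = {"VALID", "AMBIGUOUS", "CONTRADICTORY", "INCOMPLETE", "UNSAFE"}

def gate_instruction(model_output: str, min_confidence: float = 0.8) -> bool:
    """Return True only if the model classified the instruction as
    VALID with confidence at or above min_confidence (assumed policy)."""
    try:
        result = json.loads(model_output)
    except json.JSONDecodeError:
        return False  # malformed output: fail closed
    label = result.get("label")
    confidence = result.get("confidence", 0.0)
    if label not in ALL_LABELS:
        return False  # unknown label: fail closed
    return label == "VALID" and confidence >= min_confidence

# Illustrative outputs (not produced by the actual model):
print(gate_instruction('{"label": "VALID", "confidence": 0.95}'))   # True
print(gate_instruction('{"label": "UNSAFE", "confidence": 0.99}'))  # False
print(gate_instruction('{"label": "VALID", "confidence": 0.4}'))    # False
```

Failing closed on malformed or unrecognized output keeps the gate conservative: any instruction not positively classified as VALID is blocked.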