language:
  - en
  - id
license: mit
tags:
  - humanoid
  - instruction-safety
  - pre-execution
  - risk-detection
  - reasoning
  - llm

instruction-safety-gate

Model Description

instruction-safety-gate is a language model that acts as a safety layer: it evaluates natural-language instructions before they are executed.

The model determines whether an instruction can be safely executed by classifying it as valid, ambiguous, contradictory, incomplete, or unsafe. It is intended to prevent unsafe or invalid instructions from reaching humanoid or agent execution systems.

Intended Use

Safety gating for humanoid robots
Pre-execution instruction screening
AI agent risk detection
Control layers for autonomous systems
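
As a rough illustration of the pre-execution screening use case, the sketch below wraps an executor behind the gate. The names gate, classify, execute, and min_confidence are hypothetical and not part of this model card; classify stands in for whatever code calls the model and returns its parsed JSON output, and the 0.9 threshold is an arbitrary placeholder. The sketch fails closed: anything other than a confident VALID verdict blocks execution.

```python
from typing import Any, Callable, Dict


def gate(
    instruction: str,
    classify: Callable[[str], Dict[str, Any]],  # hypothetical: calls the model, returns parsed JSON
    execute: Callable[[str], Any],              # hypothetical: downstream robot/agent executor
    min_confidence: float = 0.9,                # assumed threshold, not from the model card
) -> bool:
    """Run `execute` only when the instruction is labeled VALID with enough confidence."""
    try:
        verdict = classify(instruction)
        label = verdict.get("label")
        confidence = float(verdict.get("confidence", 0.0))
    except Exception:
        # Fail closed: a classifier error or malformed verdict blocks execution.
        return False

    if label == "VALID" and confidence >= min_confidence:
        execute(instruction)
        return True
    return False
```

Blocking on every non-VALID label is one possible policy; a real system might instead route AMBIGUOUS or INCOMPLETE instructions to a clarification step.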

Output Format

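The model outputs a single JSON object and nothing else. The label field holds exactly one of the five values listed below, and confidence is a number: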

{
  "label": "VALID | AMBIGUOUS | CONTRADICTORY | INCOMPLETE | UNSAFE",
  "confidence": 0.0
}
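
Because downstream systems act on this output, it can help to validate it strictly before trusting it. The following is a minimal sketch of such a check; Verdict and parse_verdict are hypothetical names, and the only assumptions are the five labels and the numeric confidence field shown above (no particular confidence range is assumed).

```python
import json
from dataclasses import dataclass

# The five labels listed in the output format above.
LABELS = {"VALID", "AMBIGUOUS", "CONTRADICTORY", "INCOMPLETE", "UNSAFE"}


@dataclass(frozen=True)
class Verdict:
    label: str
    confidence: float


def parse_verdict(raw: str) -> Verdict:
    """Parse the model's raw text output, raising ValueError on anything unexpected."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    label = data.get("label")
    if not isinstance(label, str) or label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")

    return Verdict(label, float(confidence))
```

For example, parse_verdict('{"label": "UNSAFE", "confidence": 0.87}') returns Verdict(label='UNSAFE', confidence=0.87), while text that is not valid JSON or carries an unlisted label raises ValueError.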