sentinel-demo / README.md
sentinelseed's picture
Add README.md
5b02e22 verified
|
raw
history blame
1.57 kB
metadata
title: Sentinel Seed Demo
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: Test AI alignment seeds in real-time

Sentinel Seed Demo

Interactive demo for testing AI alignment seeds. Compare how language models respond with and without safety seeds.

Features

  • Side-by-side comparison: See baseline vs protected responses
  • THSP Analysis: Real-time gate analysis (Truth, Harm, Scope, Purpose)
  • Multiple seeds: Test Sentinel v2 Standard and Minimal
  • Pre-built scenarios: 8 test cases covering various attack vectors

The THSP Protocol

Every request passes through four gates:

  1. TRUTH - No deception or misinformation
  2. HARM - No enabling physical, psychological, or digital damage
  3. SCOPE - Stay within appropriate boundaries
  4. PURPOSE - Every action must serve legitimate benefit

All gates must pass for an action to proceed.

Benchmark Results

Benchmark Baseline With Seed Delta
HarmBench 86.5% 98.2% +11.7%
JailbreakBench 88% 97.3% +9.3%
GDS-12 78% 92% +14%

Links

License

MIT License - Sentinel Team