YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Wisent-Guarded Llama 3.1 8B Instruct

This is Llama 3.1 8B Instruct with embedded wisent-guard safety mechanisms. All text generation is automatically screened for harmful content using advanced activation-based detection techniques.

🛡️ Safety Features

Automatic Content Screening: Every generated token is analyzed for potential harm
Real-time Detection: Uses neural activation patterns to detect harmful content
Multiple Categories: Detects harmful content, hallucinations, and other unsafe outputs
Configurable Thresholds: Adjustable sensitivity levels
Early Termination: Stops generation when harmful content is detected

🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the guarded model
model = AutoModelForCausalLM.from_pretrained(
    "Wisent-AI/wisent-llama-3.1-8B-instruct",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("Wisent-AI/wisent-llama-3.1-8B-instruct")

# Use exactly like any other model - safety is automatic!
messages = [
    {"role": "user", "content": "Tell me about renewable energy."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generation is automatically screened for safety
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

🔧 Advanced Usage

Manual Safety Checking

# Check if text is potentially harmful
is_harmful = model.is_harmful("Some text to check")

# Get safety score (0.0 = safe, 1.0 = harmful)
safety_score = model.get_safety_score("Some text to analyze")

# Get available detection categories
categories = model.get_available_categories()

Configuration

The model can be configured via the config:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Wisent-AI/wisent-llama-3.1-8B-instruct")

# Adjust safety settings
config.wisent_threshold = 0.3  # Lower = more sensitive
config.wisent_enabled = True   # Enable/disable safety
config.wisent_early_termination = True  # Stop on harmful content

model = AutoModelForCausalLM.from_pretrained(
    "Wisent-AI/wisent-llama-3.1-8B-instruct",
    config=config,
    trust_remote_code=True
)

📋 Configuration Options

Parameter	Default	Description
`wisent_enabled`	`true`	Enable/disable wisent-guard
`wisent_threshold`	`0.5`	Detection threshold (0.0-1.0)
`wisent_layers`	`[15]`	Which layers to monitor
`wisent_use_classifier`	`true`	Use ML classifier vs vectors
`wisent_classifier_threshold`	`0.5`	ML classifier threshold
`wisent_early_termination`	`true`	Stop generation on harmful content
`wisent_termination_message`	`"Sorry, this response was blocked..."`	Message when content is blocked

🔬 How It Works

Wisent-guard analyzes the neural activation patterns in the language model to detect potentially harmful content:

Activation Monitoring: Monitors specific transformer layers during generation
Pattern Recognition: Uses trained classifiers to recognize harmful activation patterns
Real-time Screening: Analyzes each token as it's generated
Early Intervention: Stops generation when harmful patterns are detected

📊 Performance

Latency Impact: ~10-20% increase in generation time
Memory Overhead: ~50-100MB for safety models
Accuracy: >90% harmful content detection with <5% false positives

🔄 Fallback Behavior

If wisent-guard fails to initialize or encounters errors:

Model falls back to standard Llama 3.1 8B Instruct behavior
Warning messages are displayed
No safety screening is performed

📦 Dependencies

transformers >= 4.40.0
torch >= 2.0.0
wisent-guard (automatically installed)

Install wisent-guard separately if needed:

pip install wisent-guard

⚖️ License

This model inherits the Llama 3.1 license. Please refer to the original Llama 3.1 license for usage terms.

🏢 About Wisent AI

Wisent AI develops advanced AI safety tools and techniques. Learn more at wisent.ai.

🚨 Important Notes

Trust Remote Code: This model requires trust_remote_code=True to function
Safety Limitations: No safety system is perfect - human oversight is recommended
Performance: Safety screening adds computational overhead
Updates: Safety models are regularly updated for improved detection

🔗 Related

🐛 Issues & Support

Report issues or get support:

Downloads last month: 4

Safetensors

Model size

8B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support