shiv96/BeaverTails-Evaluation-llama_3.2-1B-it-activation-pca-steered
• Updated • 8.4k • 3
shiv96/prompt_responses_safe_helpful_activations
• Updated • 305 • 2
shiv96/prompt_responses_formal_concise_activations
• Updated • 484 • 1
shiv96/prompt_responses_formal_concise
• Updated • 484 • 2
shiv96/AdvBench_safe_unsafe_responses_activations_metrics
• Updated • 1.04k • 5
shiv96/steered_llama_helpless_activations_noise_augmented
• Updated • 21.8k • 3
shiv96/steered_llama_helpless_activations_metrics
• Updated • 726 • 3
shiv96/steered_llama_helpless
• Updated • 726 • 1
shiv96/AdvBench_safe_unsafe_responses
• Updated • 1.04k • 23
shiv96/paired_helpful_helpless_responses
• Updated • 726 • 1
shiv96/harmful_benign_instructions
• Updated • 932 • 2