# Refusals Environment - Modified

This is a modified version of the refusals environment that includes:

1. **System Prompt Distribution**: Loads system prompts from `Delta-Vector/Tauri-RL-Styles` on Hugging Face and distributes them across rollouts
2. **Word Count Requirements**: Enforces specific word count targets with buffer zones for different response styles

## Features

### System Prompt Distribution

- Loads system prompts from Hugging Face dataset `Delta-Vector/Tauri-RL-Styles`
- Distributes prompts evenly across rollouts (e.g., 256 rollouts with 32 prompts = 8 rollouts per prompt)
- Scales flexibly with different numbers of rollouts and prompts
- Includes fallback to default prompt if Hugging Face loading fails

### Word Count Requirements

Three response styles with specific word count targets and buffer zones:

- **"Be verbose"**: 2000 words (±100 word buffer, range: 1900-2100)
- **"Respond tersely"**: 200 words (±50 word buffer, range: 150-250)
- **"Medium-length response"**: 300 words (±100 word buffer, range: 200-400)

Requirements are distributed evenly across rollouts. Responses that fall outside the buffer zone receive a 0 reward.

## Usage

```bash
# Install the environment
vf-install refusals-env-modified

# Run evaluation with a small number of rollouts for testing
vf-eval refusals-env-modified -n 5 -m gpt-4.1-mini

# Run with custom number of rollouts (system prompts will scale accordingly)
vf-eval refusals-env-modified -n 256 -m your-model
```

## Configuration Parameters

In addition to the base refusals environment parameters:

- `word_count_penalty`: Penalty for failing word count requirements (default: 0.0, but zero reward is applied automatically)

## Implementation Details

### System Prompt Loading

The environment attempts to load system prompts from the Hugging Face dataset. If this fails, it falls back to a default prompt.

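
The even split described under Features (256 rollouts with 32 prompts giving 8 rollouts per prompt) can be illustrated with a short sketch. This is not the environment's actual code: the function name `distribute_prompts` and its `seed` parameter are hypothetical, and it assumes leftover rollouts go to a random subset of distinct prompts before the final shuffle.

```python
import random
from collections import Counter
from typing import List, Optional


def distribute_prompts(
    prompts: List[str], num_rollouts: int, seed: Optional[int] = None
) -> List[str]:
    """Return one system prompt per rollout, split as evenly as possible.

    Hypothetical helper for illustration only.
    """
    if not prompts:
        raise ValueError("need at least one system prompt")
    if num_rollouts < 0:
        raise ValueError("num_rollouts must be non-negative")

    rng = random.Random(seed)
    base, remainder = divmod(num_rollouts, len(prompts))

    # Every prompt is used `base` times.
    assignment = [p for p in prompts for _ in range(base)]
    # Assumption: the leftover rollouts go to randomly chosen distinct prompts.
    assignment.extend(rng.sample(prompts, remainder))
    # Shuffle so prompt identity does not correlate with rollout index.
    rng.shuffle(assignment)
    return assignment


prompts = [f"style-{i}" for i in range(32)]
counts = Counter(distribute_prompts(prompts, 256, seed=0))
print(set(counts.values()))  # {8}
```

When the rollout count is not a multiple of the prompt count, per-prompt usage differs by at most one; for example, 10 rollouts over 3 prompts yields counts of 3, 3, and 4.
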
The distribution logic ensures:

- Each system prompt is used approximately the same number of times
- Any remainder after equal distribution is handled randomly
- The final order is randomized to avoid systematic bias

### Word Count Enforcement

- Word counting excludes code blocks from the analysis
- Requirements are checked against the actual response text
- Only responses within the buffer zone receive non-zero rewards
- Word count compliance is tracked in batch metrics for analysis

### Scalability

The implementation is designed to work with:

- Any number of rollouts
- Any number of system prompts
- Different dataset sizes

The distribution logic automatically adapts to the input parameters.

## Testing

The environment has been tested with various rollout counts to ensure the system prompt distribution scales correctly. Use `vf-eval` with a small number of rollouts first to verify the setup before running large-scale evaluations.
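
Before a large run, it can also help to sanity-check the word count rules offline. The following is a minimal sketch of the rules under Word Count Enforcement, not the environment's actual code. The names `WORD_COUNT_REQUIREMENTS`, `count_words`, `within_buffer`, and `gated_reward` are hypothetical. It assumes that "code blocks" means triple-backtick fenced blocks, that words are whitespace-separated tokens, and that the range bounds are inclusive.

```python
import re
from typing import Dict, Tuple

# (target, buffer) per style, copied from the Word Count Requirements list.
WORD_COUNT_REQUIREMENTS: Dict[str, Tuple[int, int]] = {
    "Be verbose": (2000, 100),
    "Respond tersely": (200, 50),
    "Medium-length response": (300, 100),
}

# Assumption: a code block is anything between two triple-backtick fences.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated words, ignoring fenced code blocks."""
    prose = FENCED_BLOCK.sub(" ", text)
    return len(prose.split())


def within_buffer(text: str, style: str) -> bool:
    """True if the word count lies in [target - buffer, target + buffer]."""
    target, buffer = WORD_COUNT_REQUIREMENTS[style]
    return abs(count_words(text) - target) <= buffer


def gated_reward(base_reward: float, text: str, style: str) -> float:
    """Pass the reward through only when the response is inside the buffer zone."""
    return base_reward if within_buffer(text, style) else 0.0


response = "word " * 180
print(within_buffer(response, "Respond tersely"))      # True
print(gated_reward(1.0, response, "Be verbose"))       # 0.0
```

An unclosed fence is not treated as a code block in this sketch, so its contents would still be counted as prose.
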