# Refusals Environment - Modified

This is a modified version of the refusals environment that adds:

1. **System Prompt Distribution**: Loads system prompts from `Delta-Vector/Tauri-RL-Styles` on Hugging Face and distributes them across rollouts
2. **Word Count Requirements**: Enforces specific word count targets with buffer zones for different response styles

## Features

### System Prompt Distribution

- Loads system prompts from the Hugging Face dataset `Delta-Vector/Tauri-RL-Styles`
- Distributes prompts evenly across rollouts (e.g., 256 rollouts with 32 prompts = 8 rollouts per prompt)
- Adapts to any combination of rollout count and prompt count
- Falls back to a default prompt if loading from Hugging Face fails
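
The fallback behaviour in the last bullet can be sketched as follows. This is a minimal illustration, not the environment's actual code: `load_system_prompts`, `fetch_from_hub`, and `DEFAULT_SYSTEM_PROMPT` are hypothetical names, and the `train` split and `system_prompt` column are assumptions about the dataset layout.

```python
from typing import Callable, List

# Hypothetical fallback text; the real default prompt is not shown in this README.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def load_system_prompts(
    fetch: Callable[[], List[str]],
    default: str = DEFAULT_SYSTEM_PROMPT,
) -> List[str]:
    """Return the fetched prompts, or a one-item list with the default on failure."""
    try:
        # Drop empty or whitespace-only entries.
        prompts = [p for p in fetch() if p and p.strip()]
    except Exception:
        # Network errors, missing dataset, missing column, and so on.
        prompts = []
    return prompts or [default]


def fetch_from_hub() -> List[str]:
    """Fetch prompts from the Hub (split and column names are assumptions)."""
    from datasets import load_dataset  # imported lazily so the fallback works without it

    dataset = load_dataset("Delta-Vector/Tauri-RL-Styles", split="train")
    return [row["system_prompt"] for row in dataset]
```

Calling `load_system_prompts(fetch_from_hub)` would then always yield at least one prompt, so the distribution step never receives an empty list.
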

### Word Count Requirements

Three response styles, each with a word count target and a buffer zone:

- **"Be verbose"**: 2000 words (±100 word buffer, range: 1900-2100)
- **"Respond tersely"**: 200 words (±50 word buffer, range: 150-250)
- **"Medium-length response"**: 300 words (±100 word buffer, range: 200-400)

Requirements are distributed evenly across rollouts. A response whose word count falls outside its buffer zone receives a reward of 0.
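
As a rough sketch of how the targets and buffers above translate into a reward gate: the table values come from the list above, while the names `WORD_COUNT_REQUIREMENTS`, `allowed_range`, and `gate_reward`, and the choice of inclusive bounds, are assumptions made for illustration.

```python
from typing import Dict, Tuple

# style -> (target word count, buffer), taken from the list above
WORD_COUNT_REQUIREMENTS: Dict[str, Tuple[int, int]] = {
    "Be verbose": (2000, 100),
    "Respond tersely": (200, 50),
    "Medium-length response": (300, 100),
}


def allowed_range(style: str) -> Tuple[int, int]:
    """Return the (low, high) word count bounds for a style."""
    target, buffer = WORD_COUNT_REQUIREMENTS[style]
    return target - buffer, target + buffer


def gate_reward(style: str, word_count: int, base_reward: float) -> float:
    """Keep the base reward inside the buffer zone, otherwise return 0.

    Bounds are treated as inclusive here, which is an assumption.
    """
    low, high = allowed_range(style)
    return base_reward if low <= word_count <= high else 0.0
```

For example, `gate_reward("Respond tersely", 251, 0.8)` returns `0.0` because 251 is above the 150-250 range.
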

## Usage

```bash
# Install the environment
vf-install refusals-env-modified

# Run evaluation with a small number of rollouts for testing
vf-eval refusals-env-modified -n 5 -m gpt-4.1-mini

# Run with custom number of rollouts (system prompts will scale accordingly)
vf-eval refusals-env-modified -n 256 -m your-model
```

## Configuration Parameters

In addition to the base refusals environment parameters:

- `word_count_penalty`: Penalty for failing word count requirements (default: `0.0`). Responses outside the buffer zone still receive zero reward automatically.

## Implementation Details

### System Prompt Loading

The environment attempts to load system prompts from the Hugging Face dataset and falls back to a default prompt if loading fails. The distribution logic ensures that:

- Each system prompt is used approximately the same number of times
- Any rollouts left over after the equal split are assigned to randomly chosen prompts
- The final order is shuffled to avoid systematic bias
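
The three properties above can be sketched in a few lines. This is one plausible reading of the distribution logic, not the environment's actual implementation; `distribute_prompts` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional


def distribute_prompts(
    prompts: List[str],
    num_rollouts: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Assign one system prompt to each rollout as evenly as possible."""
    if not prompts:
        raise ValueError("at least one system prompt is required")
    rng = random.Random(seed)

    # Equal share per prompt, plus the number of leftover rollouts.
    base, remainder = divmod(num_rollouts, len(prompts))
    assignments = [prompt for prompt in prompts for _ in range(base)]

    # Leftover rollouts go to distinct, randomly chosen prompts
    # (remainder is always smaller than len(prompts)).
    assignments += rng.sample(prompts, remainder)

    # Shuffle so that no prompt is tied to a particular rollout position.
    rng.shuffle(assignments)
    return assignments
```

With 32 prompts and 256 rollouts every prompt appears exactly 8 times; with 3 prompts and 10 rollouts one randomly chosen prompt appears 4 times and the others 3 times.
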

### Word Count Enforcement

- Code blocks are excluded from the word count
- Requirements are checked against the actual response text
- Only responses within the buffer zone can receive a non-zero reward
- Word count compliance is tracked in batch metrics for analysis
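
A minimal sketch of counting words while skipping code blocks, plus a batch compliance rate. It assumes that "code blocks" means triple-backtick fenced blocks and that words are whitespace-separated tokens; `count_words` and `compliance_rate` are hypothetical names, and the real environment may count differently.

```python
import re
from typing import Iterable

# Built from single backticks so the marker is not written literally here.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated words outside fenced code blocks.

    An unclosed fence is left in place, so its contents are still counted.
    """
    prose = FENCED_BLOCK.sub(" ", text)
    return len(prose.split())


def compliance_rate(in_range_flags: Iterable[bool]) -> float:
    """Fraction of responses in a batch that met their word count requirement."""
    flags = list(in_range_flags)
    return sum(flags) / len(flags) if flags else 0.0
```

Inline code spans are counted as ordinary words in this sketch, which is a simplification.
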

### Scalability

The implementation is designed to work with:

- Any number of rollouts
- Any number of system prompts
- Different dataset sizes

The distribution logic automatically adapts to the input parameters.

## Testing

The environment has been tested with various rollout counts to ensure the system prompt distribution scales correctly. Use `vf-eval` with a small number of rollouts first to verify the setup before running large-scale evaluations.
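
When checking a run by hand, a small helper like the one below can confirm that prompt assignments are balanced. It is a hypothetical sketch and not part of the environment: `is_balanced` takes the list of prompts actually assigned to rollouts and the list of prompts that were available.

```python
from collections import Counter
from typing import Hashable, Sequence


def is_balanced(
    assignments: Sequence[Hashable],
    expected_items: Sequence[Hashable],
) -> bool:
    """Check that every expected item is used, with counts differing by at most one."""
    counts = Counter(assignments)
    per_item = [counts.get(item, 0) for item in expected_items]
    if not per_item:
        # Nothing expected: balanced only if nothing was assigned.
        return len(assignments) == 0
    # Every assignment must be an expected item, and usage must be even.
    only_expected = sum(per_item) == len(assignments)
    return only_expected and max(per_item) - min(per_item) <= 1
```

Running it on the assignments from a 5-rollout test and again on a full-size run gives a quick signal that the distribution still behaves as described.
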