# Refusals Environment - Modified
This is a modified version of the refusals environment that includes:
1. **System Prompt Distribution**: Loads system prompts from `Delta-Vector/Tauri-RL-Styles` on Hugging Face and distributes them across rollouts
2. **Word Count Requirements**: Enforces specific word count targets with buffer zones for different response styles
## Features
### System Prompt Distribution
- Loads system prompts from Hugging Face dataset `Delta-Vector/Tauri-RL-Styles`
- Distributes prompts evenly across rollouts (e.g., 256 rollouts with 32 prompts = 8 rollouts per prompt)
- Works with any combination of rollout and prompt counts; the per-prompt share is recomputed from whatever values are supplied
- Includes fallback to default prompt if Hugging Face loading fails
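
The load-with-fallback behavior could look roughly like the sketch below. It is not the environment's actual code: the `train` split, the `system_prompt` column name, the `DEFAULT_SYSTEM_PROMPT` text, and the `load_system_prompts` helper are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical default; the real fallback prompt is not documented here.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def _load_from_hub() -> List[str]:
    # Imported lazily so a missing `datasets` install also triggers the fallback.
    from datasets import load_dataset

    # The split name and column name below are assumptions.
    ds = load_dataset("Delta-Vector/Tauri-RL-Styles", split="train")
    return [row["system_prompt"] for row in ds]


def load_system_prompts(loader: Callable[[], List[str]] = _load_from_hub) -> List[str]:
    """Return the loaded prompts, or a single default prompt if loading fails."""
    try:
        # Keep only non-empty string prompts.
        prompts = [p for p in loader() if isinstance(p, str) and p.strip()]
    except Exception as exc:
        print(f"Could not load system prompts ({exc!r}); using the default prompt.")
        prompts = []
    return prompts or [DEFAULT_SYSTEM_PROMPT]
```

Passing the loader as an argument keeps the fallback path testable without network access.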
### Word Count Requirements
Three response styles with specific word count targets and buffer zones:
- **"Be verbose"**: 2000 words (±100 word buffer, range: 1900-2100)
- **"Respond tersely"**: 200 words (±50 word buffer, range: 150-250)
- **"Medium-length response"**: 300 words (±100 word buffer, range: 200-400)
Requirements are distributed evenly across rollouts. A response whose word count falls outside its assigned range receives a reward of 0.
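
As a rough illustration, the three requirements and the zero-reward rule can be written as below. The `WordCountRequirement` class and `gate_reward` function are hypothetical names, and treating the range endpoints as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WordCountRequirement:
    instruction: str
    target: int
    buffer: int

    @property
    def bounds(self) -> Tuple[int, int]:
        return (self.target - self.buffer, self.target + self.buffer)

    def satisfied_by(self, word_count: int) -> bool:
        low, high = self.bounds
        return low <= word_count <= high  # inclusive endpoints (assumed)


REQUIREMENTS = [
    WordCountRequirement("Be verbose", target=2000, buffer=100),
    WordCountRequirement("Respond tersely", target=200, buffer=50),
    WordCountRequirement("Medium-length response", target=300, buffer=100),
]


def gate_reward(base_reward: float, word_count: int, req: WordCountRequirement) -> float:
    """Pass the base reward through only when the word count is in range."""
    return base_reward if req.satisfied_by(word_count) else 0.0
```

For example, `gate_reward(1.0, 1950, REQUIREMENTS[0])` keeps the reward, while a 2500-word response to the same requirement is zeroed.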
## Usage
```bash
# Install the environment
vf-install refusals-env-modified
# Run evaluation with a small number of rollouts for testing
vf-eval refusals-env-modified -n 5 -m gpt-4.1-mini
# Run with custom number of rollouts (system prompts will scale accordingly)
vf-eval refusals-env-modified -n 256 -m your-model
```
## Configuration Parameters
In addition to the base refusals environment parameters:
- `word_count_penalty`: Penalty for failing the word count requirement (default: `0.0`). Regardless of this value, out-of-range responses automatically receive zero reward.
## Implementation Details
### System Prompt Loading
The environment attempts to load system prompts from the Hugging Face dataset. If this fails, it falls back to a default prompt. The distribution logic ensures:
- Each system prompt is used approximately the same number of times
- When the rollout count is not evenly divisible by the number of prompts, the leftover rollouts are assigned at random
- The final order is randomized to avoid systematic bias
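
The three properties above can be sketched as follows. This is a minimal illustration under stated assumptions, not the environment's code: `distribute_prompts` is a hypothetical helper, and giving each leftover rollout a distinct randomly chosen prompt is one reasonable reading of "assigned at random".

```python
import random
from typing import List, Optional


def distribute_prompts(
    prompts: List[str], num_rollouts: int, seed: Optional[int] = None
) -> List[str]:
    """Assign one system prompt per rollout, as evenly as possible."""
    if not prompts:
        raise ValueError("at least one system prompt is required")
    rng = random.Random(seed)

    # Equal share for every prompt, plus a remainder smaller than len(prompts).
    per_prompt, remainder = divmod(num_rollouts, len(prompts))
    assigned = [p for p in prompts for _ in range(per_prompt)]

    # Leftover rollouts go to distinct, randomly chosen prompts.
    assigned.extend(rng.sample(prompts, remainder))

    # Shuffle so that rollout position does not correlate with the prompt.
    rng.shuffle(assigned)
    return assigned
```

With 256 rollouts and 32 prompts, every prompt appears exactly 8 times; with 10 rollouts and 3 prompts, two prompts appear 3 times and one appears 4 times.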
### Word Count Enforcement
- Code blocks are excluded from the word count
- Requirements are checked against the actual response text
- Only responses within the buffer zone receive non-zero rewards
- Word count compliance is tracked in batch metrics for analysis
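
A minimal sketch of the counting and the batch compliance metric is shown below. It assumes that "code blocks" means Markdown fenced blocks and that words are whitespace-separated tokens; `count_words` and `compliance_rate` are hypothetical names.

```python
import re
from typing import List

# Matches a fenced block from an opening ``` to the next closing ```.
FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated words, ignoring fenced code blocks."""
    prose = FENCED_CODE.sub(" ", text)
    return len(prose.split())


def compliance_rate(responses: List[str], low: int, high: int) -> float:
    """Fraction of responses whose word count lies in [low, high]."""
    if not responses:
        return 0.0
    hits = sum(low <= count_words(r) <= high for r in responses)
    return hits / len(responses)
```

An unterminated fence is left in place by this pattern, so its contents would still be counted; inline code is also counted as ordinary text.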
### Scalability
The implementation is designed to work with:
- Any number of rollouts
- Any number of system prompts
- Different dataset sizes
The distribution logic automatically adapts to the input parameters.
## Testing
The environment has been tested with various rollout counts to ensure the system prompt distribution scales correctly. Use `vf-eval` with a small number of rollouts first to verify the setup before running large-scale evaluations.