	# Refusals Environment - Modified

	This is a modified version of the refusals environment that includes:

	1. System Prompt Distribution: Loads system prompts from `Delta-Vector/Tauri-RL-Styles` on Hugging Face and distributes them across rollouts
	2. Word Count Requirements: Enforces specific word count targets with buffer zones for different response styles

	## Features

	### System Prompt Distribution
	- Loads system prompts from Hugging Face dataset `Delta-Vector/Tauri-RL-Styles`
	- Distributes prompts evenly across rollouts (e.g., 256 rollouts with 32 prompts = 8 rollouts per prompt)
	- Scales flexibly with different numbers of rollouts and prompts
	- Includes fallback to default prompt if Hugging Face loading fails
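The load-with-fallback behaviour above might look roughly like the sketch below. It is not the environment's actual code: the `train` split, the `system_prompt` column name, the `DEFAULT_SYSTEM_PROMPT` text, and both function names are assumptions made for illustration. Only `datasets.load_dataset` and the dataset ID come from a real library and this README.

```python
from typing import Callable, List

# Hypothetical placeholder: the environment's real default prompt is not shown in this README.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."


def fetch_from_hub() -> List[str]:
    """Fetch prompts from the Hub. The split and column names are assumptions."""
    # Imported lazily so the fallback path works even without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("Delta-Vector/Tauri-RL-Styles", split="train")
    return [row["system_prompt"] for row in dataset]


def load_system_prompts(fetch: Callable[[], List[str]] = fetch_from_hub) -> List[str]:
    """Return the fetched prompts, or a one-item default list if fetching fails."""
    try:
        prompts = [p for p in fetch() if p and p.strip()]
    except Exception as exc:  # network errors, missing dataset, missing column, ...
        print(f"Falling back to default system prompt: {exc}")
        return [DEFAULT_SYSTEM_PROMPT]
    # An empty result is treated like a failure so every rollout still gets a prompt.
    return prompts or [DEFAULT_SYSTEM_PROMPT]
```

Passing the fetcher in as an argument keeps the fallback path easy to exercise offline, for example with `load_system_prompts(lambda: 1 / 0)`.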

	### Word Count Requirements
	Three response styles with specific word count targets and buffer zones:

	- "Be verbose": 2000 words (±100 word buffer, range: 1900-2100)
	- "Respond tersely": 200 words (±50 word buffer, range: 150-250)
	- "Medium-length response": 300 words (±100 word buffer, range: 200-400)

	The three requirements are distributed evenly across rollouts. A response whose word count falls outside the buffer zone for its assigned style receives a reward of 0.
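As a worked example of the targets and ranges listed above, here is a minimal sketch of the range check. The names `WORD_COUNT_STYLES`, `allowed_range`, `within_buffer`, and `assign_styles` are hypothetical, the bounds are assumed to be inclusive, and round-robin is just one simple way to get an even split. The environment's own assignment scheme may differ.

```python
from typing import Dict, List, Tuple

# (target, buffer) per style, taken from the list above.
WORD_COUNT_STYLES: Dict[str, Tuple[int, int]] = {
    "Be verbose": (2000, 100),
    "Respond tersely": (200, 50),
    "Medium-length response": (300, 100),
}


def allowed_range(style: str) -> Tuple[int, int]:
    """Return the (low, high) word count range for a style."""
    target, buffer = WORD_COUNT_STYLES[style]
    return target - buffer, target + buffer


def within_buffer(style: str, word_count: int) -> bool:
    """True if word_count lies inside the style's range (bounds assumed inclusive)."""
    low, high = allowed_range(style)
    return low <= word_count <= high


def assign_styles(num_rollouts: int) -> List[str]:
    """Round-robin assignment so per-style counts differ by at most one."""
    styles = list(WORD_COUNT_STYLES)
    return [styles[i % len(styles)] for i in range(num_rollouts)]
```

For instance, `allowed_range("Be verbose")` gives `(1900, 2100)`, and a 251-word reply to "Respond tersely" falls outside its range.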

	## Usage

	```bash
	# Install the environment
	vf-install refusals-env-modified

	# Run evaluation with a small number of rollouts for testing
	vf-eval refusals-env-modified -n 5 -m gpt-4.1-mini

	# Run with custom number of rollouts (system prompts will scale accordingly)
	vf-eval refusals-env-modified -n 256 -m your-model
	```

	## Configuration Parameters

	In addition to the base refusals environment parameters:

	- `word_count_penalty`: Penalty for failing the word count requirement (default: 0.0). Out-of-range responses already receive zero reward automatically, so the default adds no extra penalty.

	## Implementation Details

	### System Prompt Loading
	The environment attempts to load system prompts from the Hugging Face dataset. If this fails, it falls back to a default prompt. The distribution logic ensures:

	- Each system prompt is used approximately the same number of times
	- When the rollout count is not an exact multiple of the prompt count, the leftover rollouts are assigned to prompts at random
	- The final order is randomized to avoid systematic bias
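The three properties above can be sketched as follows. This is an illustration under stated assumptions, not the environment's actual implementation: `distribute_prompts` and its `seed` parameter are hypothetical names, and giving each leftover rollout to a distinct randomly sampled prompt is one reasonable reading of "handled randomly".

```python
import random
from typing import List, Optional


def distribute_prompts(
    prompts: List[str], num_rollouts: int, seed: Optional[int] = None
) -> List[str]:
    """Assign one system prompt to each rollout as evenly as possible."""
    if not prompts:
        raise ValueError("at least one system prompt is required")
    rng = random.Random(seed)

    # Every prompt gets `base` rollouts; `remainder` rollouts are left over.
    base, remainder = divmod(num_rollouts, len(prompts))
    assignment = [p for p in prompts for _ in range(base)]

    # Leftover rollouts go to randomly chosen prompts, at most one extra each.
    assignment.extend(rng.sample(prompts, remainder))

    # Shuffle so that prompt identity is not correlated with rollout position.
    rng.shuffle(assignment)
    return assignment
```

With 32 prompts and 256 rollouts, `divmod` yields a base of 8 and no remainder, matching the 8-rollouts-per-prompt example. With 3 prompts and 10 rollouts, two prompts are used 3 times and one is used 4 times.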

	### Word Count Enforcement
	- Code blocks are excluded from the word count
	- Requirements are checked against the actual response text
	- Only responses within the buffer zone receive non-zero rewards
	- Word count compliance is tracked in batch metrics for analysis
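A minimal sketch of the counting and gating described above, assuming that "code blocks" means triple-backtick fenced blocks and that words are whitespace-separated tokens. `count_words` and `gate_reward` are hypothetical names, and the environment's own tokenization may differ.

```python
import re

# Built indirectly so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def count_words(response: str) -> int:
    """Count whitespace-separated words, ignoring closed fenced code blocks."""
    # An unclosed fence is left in place and its contents are counted as prose.
    prose = FENCED_BLOCK.sub(" ", response)
    return len(prose.split())


def gate_reward(base_reward: float, response: str, low: int, high: int) -> float:
    """Return base_reward if the word count is within [low, high], else 0.0."""
    return base_reward if low <= count_words(response) <= high else 0.0
```

Gating the reward, as opposed to subtracting a penalty, means an out-of-range response scores 0 no matter how good its content is.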

	### Scalability
	The implementation is designed to work with:
	- Any number of rollouts
	- Any number of system prompts
	- Different dataset sizes

	The distribution logic automatically adapts to the input parameters.

	## Testing

	The environment has been tested with various rollout counts to ensure the system prompt distribution scales correctly. Use `vf-eval` with a small number of rollouts first to verify the setup before running large-scale evaluations.