complex-json-output
Overview
- Environment ID:
complex-json-output - Short description: Verifies model ability to generate complex JSON structures matching exact specifications
- Tags: json, instruction-following, verifiable-reward, train, eval
Datasets
- Primary dataset(s): Delta-Vector/Tauri-Complex-JSON-Formatting
- Source links: https://huggingface.co/datasets/Delta-Vector/Tauri-Complex-JSON-Formatting
- Split sizes: 7000 train, 1000 eval (default)
Task
- Type: single-turn
- Parser: Custom parser that extracts JSON from code blocks or raw text
- Rubric overview (multiplicative to prevent local minima):
- Main reward:
key_accuracy * value_accuracykey_accuracy = (correct_keys) / (total_keys_in_response)value_accuracy = (correct_values) / (total_values_in_response)
- Penalizes both missing items AND adding extra incorrect ones
- If JSON fails to parse: reward = 0
- Individual metrics tracked for debugging but don't contribute to training
- Main reward:
- No system prompt - dataset prompts contain all instructions
Quickstart
Run an evaluation with default settings:
uv run vf-eval complex-json-output
Configure model and sampling:
uv run vf-eval complex-json-output -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7
Notes:
- Use
-a/--env-argsto pass environment-specific configuration as a JSON object.
Environment Arguments
| Arg | Type | Default | Description |
|---|---|---|---|
num_train_examples |
int | 7000 |
Number of training examples |
num_eval_examples |
int | 1000 |
Number of evaluation examples |
Metrics
| Metric | Meaning |
|---|---|
reward |
Multiplicative: key_accuracy * value_accuracy (0.0 to 1.0) |
multiplicative_reward |
Main training reward (0.0 to 1.0) |
format_reward |
Metric only: whether JSON is valid dict (0.33 or 0) |
keys_match_reward |
Metric only: whether all keys match (0.33 or 0) |
values_match_reward |
Metric only: whether all values match (0.33 or 0) |