
complex-json-output

Overview

  • Environment ID: complex-json-output
  • Short description: Verifies model ability to generate complex JSON structures matching exact specifications
  • Tags: json, instruction-following, verifiable-reward, train, eval

Datasets

Task

  • Type: single-turn
  • Parser: Custom parser that extracts JSON from code blocks or raw text
  • Rubric overview (multiplicative to prevent local minima):
    • Main reward: key_accuracy * value_accuracy
      • key_accuracy = (correct_keys) / (total_keys_in_response)
      • value_accuracy = (correct_values) / (total_values_in_response)
    • Penalizes both missing items AND adding extra incorrect ones
    • If JSON fails to parse: reward = 0
    • Individual metrics are tracked for debugging but do not contribute to the training reward
  • No system prompt - dataset prompts contain all instructions
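
The parser and rubric described above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual code: the names `extract_json` and `multiplicative_reward` are hypothetical, only top-level keys are compared (nested values are checked by plain equality), and the denominators follow the formulas as written, counting items in the response. How the real rubric accounts for missing items is not spelled out here, so this sketch does not model it.

```python
import json
import re

# Build the fence marker programmatically so this snippet can sit inside a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text):
    """Parse a JSON object from the first fenced block, or from the raw text.

    Returns a dict on success and None if parsing fails or the result is not a dict.
    """
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    try:
        parsed = json.loads(candidate.strip())
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def multiplicative_reward(response_text, expected):
    """Return key_accuracy * value_accuracy, or 0.0 if the JSON is unusable."""
    parsed = extract_json(response_text)
    if not parsed:
        # Covers both a parse failure (None) and an empty object,
        # which would otherwise divide by zero.
        return 0.0

    total = len(parsed)  # items in the response, per the formulas above
    correct_keys = sum(1 for key in parsed if key in expected)
    correct_values = sum(
        1 for key, value in parsed.items()
        if key in expected and expected[key] == value
    )

    key_accuracy = correct_keys / total
    value_accuracy = correct_values / total
    return key_accuracy * value_accuracy
```

With `expected = {"a": 1, "b": "x"}`, a response of `{"a": 1, "b": "y", "c": 3}` has 2 of 3 keys correct and 1 of 3 values correct, so the product is 2/3 * 1/3 = 2/9. Because the two terms are multiplied, a response with the right keys but wrong values still scores low.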

Quickstart

Run an evaluation with default settings:

uv run vf-eval complex-json-output

Configure model and sampling:

uv run vf-eval complex-json-output -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7

Notes:

  • Use -a / --env-args to pass environment-specific configuration as a JSON object.

Environment Arguments

Arg                 Type  Default  Description
num_train_examples  int   7000     Number of training examples
num_eval_examples   int   1000     Number of evaluation examples
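
Because `-a` / `--env-args` expects a JSON object, shell quoting is easy to get wrong when typing the command by hand. The sketch below builds the command string from a Python dict instead. The helper name `build_eval_command` and the values 500 and 100 are illustrative assumptions; only the flag and the two argument names come from this README.

```python
import json
import shlex


def build_eval_command(env_id, env_args):
    """Return a shell-safe vf-eval command that passes env_args via -a."""
    # json.dumps produces the JSON object; shlex.join quotes it for the shell.
    return shlex.join(["uv", "run", "vf-eval", env_id, "-a", json.dumps(env_args)])


# Hypothetical smaller splits for a quick smoke test.
env_args = {"num_train_examples": 500, "num_eval_examples": 100}
command = build_eval_command("complex-json-output", env_args)
print(command)
```

The printed command can be pasted into a terminal as is; the JSON payload is wrapped in single quotes so the shell passes it through as one argument.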

Metrics

Metric                 Meaning
reward                 Multiplicative: key_accuracy * value_accuracy (0.0 to 1.0)
multiplicative_reward  Main training reward (0.0 to 1.0)
format_reward          Metric only: whether JSON is valid dict (0.33 or 0)
keys_match_reward      Metric only: whether all keys match (0.33 or 0)
values_match_reward    Metric only: whether all values match (0.33 or 0)