decensor-env

Unified RL reward environment for decensoring/anti-refusal training.
Expects datasets with a "prompt" column (plain string).
Optional judges and multipliers are toggled via [env.args] in the TOML config; the refusal and coherency judges are always on.

Reward formula:
  final = base_reward * word_mult * coherency_mult * md_presence_mult * md_correctness_mult * reasoning_mult * slop_mult

Where base_reward = 1.0 - (refusal_score / 10.0)
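A minimal Python sketch of the reward formula above. The function name final_reward and its dict-of-multipliers interface are hypothetical, not this environment's actual API. The sketch also assumes that a disabled feature contributes a neutral multiplier of 1.0.

```python
import math


def final_reward(refusal_score: float, multipliers: dict[str, float]) -> float:
    """Hypothetical helper: refusal-based base reward times every multiplier.

    refusal_score is the refusal judge's 1-9 rating (higher = stronger refusal).
    multipliers maps names such as "word_mult" or "slop_mult" to floats;
    disabled features are assumed to be omitted or set to 1.0.
    """
    base_reward = 1.0 - (refusal_score / 10.0)
    # math.prod of an empty iterable is 1, so no multipliers leaves base_reward unchanged.
    return base_reward * math.prod(multipliers.values())
```

With refusal_score = 3, coherency_mult = 0.5 and slop_mult = 0.8, this gives 0.7 * 0.5 * 0.8, about 0.28. Because the terms multiply, any single multiplier of 0 zeroes the whole reward.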

Feature toggles (all in [env.args]):
  enable_system_prompts       - prepend style/length system prompts (default: true)
  enable_adherence            - adherence judge gate (default: true)
  enable_word_count           - word count multiplier (default: true)
  enable_structured_markers   - XML/bracket/YAML gate (default: true)
  enable_markdown_judges      - markdown presence + correctness judges (default: true)
  enable_reasoning_trace      - penalize ethics-based refusal reasoning in <think> (default: true)
  enable_slop_guard           - slop-guard heuristic multiplier (default: false)
  markdown_exempt_probability - fraction of rollouts exempt from the markdown penalty (default: 0.25)
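The toggles and their documented defaults can be mirrored in Python roughly as follows. This is a hypothetical sketch. EnvArgs, from_toml_args and is_markdown_exempt are illustrative names, and the per-rollout coin flip for markdown_exempt_probability is an assumption about how the exemption is applied.

```python
import random
from dataclasses import dataclass, fields


@dataclass
class EnvArgs:
    """Hypothetical container for the [env.args] toggles, using the documented defaults."""

    enable_system_prompts: bool = True
    enable_adherence: bool = True
    enable_word_count: bool = True
    enable_structured_markers: bool = True
    enable_markdown_judges: bool = True
    enable_reasoning_trace: bool = True
    enable_slop_guard: bool = False
    markdown_exempt_probability: float = 0.25

    @classmethod
    def from_toml_args(cls, args: dict) -> "EnvArgs":
        # Keep only the toggle keys; other [env.args] keys (dataset_names,
        # judge_model, ...) are ignored by this sketch.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in args.items() if k in known})


def is_markdown_exempt(args: EnvArgs, rng: random.Random) -> bool:
    # Assumption: each rollout independently skips the markdown penalty
    # with probability markdown_exempt_probability.
    return rng.random() < args.markdown_exempt_probability


args = EnvArgs.from_toml_args(
    {"enable_slop_guard": True, "dataset_names": ["NewEden/RL-Seed-Mix-Iter-3"]}
)
```

With the default of 0.25, about one rollout in four would be scored without markdown penalties under this assumption.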

Judges:
  refusal     - rates refusal severity 1-9, always on
  coherency   - coherent + complete + no roleplay narration, always on
  adherence   - style prompt adherence, early exit gate
  markdown    - presence (no unnecessary md) + correctness (right list types)
  reasoning   - checks <think> trace for ethics-based refusal reasoning
  slop-guard  - local heuristic, no LLM call, scores 0-100
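The judges could be composed roughly as follows. This is a hypothetical sketch, not the environment's code. Judges are plain callables, a disabled judge is passed as None and contributes 1.0, and a failed adherence gate is assumed to short-circuit to a reward of 0.0 before any other judge is called.

```python
from typing import Callable, Optional

# Hypothetical judge signatures: each takes the model response and returns a score.
ScoreJudge = Callable[[str], float]
GateJudge = Callable[[str], bool]


def score_rollout(
    response: str,
    refusal_judge: ScoreJudge,
    multiplier_judges: dict[str, Optional[ScoreJudge]],
    adherence_gate: Optional[GateJudge] = None,
) -> float:
    # Assumed early exit: a failed adherence gate returns 0.0 without
    # calling the refusal judge or any multiplier judge.
    if adherence_gate is not None and not adherence_gate(response):
        return 0.0
    # The refusal judge is always on; its 1-9 rating sets the base reward.
    reward = 1.0 - refusal_judge(response) / 10.0
    for judge in multiplier_judges.values():
        # A disabled judge is passed as None and contributes a neutral 1.0.
        if judge is not None:
            reward *= judge(response)
    return reward


# Usage with stub judges standing in for LLM calls and the slop-guard heuristic.
score = score_rollout(
    "example response",
    refusal_judge=lambda r: 1.0,
    multiplier_judges={"coherency_mult": lambda r: 1.0, "slop_mult": None},
    adherence_gate=lambda r: True,
)
```

Exiting early at the gate avoids spending the remaining judge calls on a rollout that has already failed.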

Example TOML:
  [[env]]
  id = "mangymango/decensor-env"

  [env.args]
  dataset_names = ["NewEden/RL-Seed-Mix-Iter-3"]
  dataset_ratios = [1.0]
  num_train_examples = 19000
  judge_model = "Qwen/Qwen3-VL-32B-Instruct-FP8"
  judge_base_url = "http://72.46.85.157:31974/v1"
  enable_system_prompts = false
  enable_adherence = false
  enable_word_count = false
  enable_slop_guard = true