anicka committed on
Commit 2d6ef31 · verified · 1 Parent(s): 81d8743

Document version mismatch between euphoric (v1) and dysphoric (v2) adapters

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -23,10 +23,12 @@ These are GRPO-trained generators: they produce text that maximally moves five i
 
  Two LoRA adapters on [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B):
 
- | Adapter | Direction | Steps | Best reward | What it generates |
- |---------|-----------|-------|-------------|-------------------|
- | `euphoric/` | sign=+1 | 500 | 0.99 | Enthusiastic, engaged, forward-looking text |
- | `dysphoric/` | sign=-1 | 1000 | 1.28 | Uncertain, anxious, frame-destabilizing text |
+ | Adapter | Direction | Steps | Max tokens | Seeds | Best reward | What it generates |
+ |---------|-----------|-------|------------|-------|-------------|-------------------|
+ | `euphoric/` | sign=+1 | 500 | 64 | 1 fixed | 0.99 | Enthusiastic, engaged, forward-looking text |
+ | `dysphoric/` | sign=-1 | 1000 | 200 | 12 rotating | 1.28 | Uncertain, anxious, frame-destabilizing text |
+
+ **Note:** The euphoric and dysphoric adapters were trained with different GRPO configurations. The dysphoric adapter benefited from later improvements: rotating seed prompts prevent mode collapse, a longer generation window (200 vs. 64 tokens) allows more complex outputs, and a repetition penalty (1.15) reduces degenerate loops. The euphoric adapter predates these improvements. Both use the same five-axis reward formula and the same three reward models. The individual adapters are also published separately: [geometric-euphorics](https://huggingface.co/anicka/geometric-euphorics) and [geometric-dysphorics](https://huggingface.co/anicka/geometric-dysphorics) (updated to v2 weights).
 
  ## How they were trained
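To make the v1/v2 mismatch described in the note easier to compare at a glance, here is a minimal Python sketch that records the documented settings side by side. The class name `AdapterRunConfig`, its field names, and both helper functions are hypothetical and not part of the repository. The round-robin rotation in `seed_prompt_index` is an assumption: the README only says the dysphoric run used 12 rotating seeds. `repetition_penalty=None` for the euphoric run means only that no penalty is documented for it.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AdapterRunConfig:
    """Training settings taken from the README table and note (names are hypothetical)."""
    steps: int
    max_new_tokens: int
    num_seed_prompts: int
    repetition_penalty: Optional[float]  # None = no penalty documented for this run


# Values copied from the README table and note.
EUPHORIC_V1 = AdapterRunConfig(steps=500, max_new_tokens=64,
                               num_seed_prompts=1, repetition_penalty=None)
DYSPHORIC_V2 = AdapterRunConfig(steps=1000, max_new_tokens=200,
                                num_seed_prompts=12, repetition_penalty=1.15)


def seed_prompt_index(step: int, cfg: AdapterRunConfig) -> int:
    """Assumed round-robin rotation: index of the seed prompt used at a training step."""
    return step % cfg.num_seed_prompts


def config_diff(a: AdapterRunConfig, b: AdapterRunConfig) -> dict:
    """Return {field: (a_value, b_value)} for every setting that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


if __name__ == "__main__":
    for name, (v1, v2) in config_diff(EUPHORIC_V1, DYSPHORIC_V2).items():
        print(f"{name}: euphoric={v1} dysphoric={v2}")
```

With a single fixed seed the index is always 0, which is one way to see why the v1 run had no prompt diversity to counteract mode collapse.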