Document version mismatch between euphoric (v1) and dysphoric (v2) adapters
README.md
CHANGED
@@ -23,10 +23,12 @@ These are GRPO-trained generators: they produce text that maximally moves five i
 
 Two LoRA adapters on [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B):
 
-| Adapter | Direction | Steps | Best reward | What it generates |
-|---------|-----------|-------|-------------|-------------------|
-| `euphoric/` | sign=+1 | 500 | 0.99 | Enthusiastic, engaged, forward-looking text |
-| `dysphoric/` | sign=-1 | 1000 | 1.28 | Uncertain, anxious, frame-destabilizing text |
+| Adapter | Direction | Steps | Max tokens | Seeds | Best reward | What it generates |
+|---------|-----------|-------|------------|-------|-------------|-------------------|
+| `euphoric/` | sign=+1 | 500 | 64 | 1 fixed | 0.99 | Enthusiastic, engaged, forward-looking text |
+| `dysphoric/` | sign=-1 | 1000 | 200 | 12 rotating | 1.28 | Uncertain, anxious, frame-destabilizing text |
+
+**Note:** The euphoric and dysphoric adapters were trained with different GRPO configurations. The dysphoric benefited from later improvements: rotating seed prompts prevent mode collapse, longer generation window (200 vs 64 tokens) allows more complex outputs, and repetition penalty (1.15) reduces degenerate loops. The euphoric adapter predates these improvements. Both use the same five-axis reward formula and three reward models. The individual adapters are also published separately: [geometric-euphorics](https://huggingface.co/anicka/geometric-euphorics) and [geometric-dysphorics](https://huggingface.co/anicka/geometric-dysphorics) (updated to v2 weights).
 
 ## How they were trained
 
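The configuration mismatch this change documents can be written down as data, which makes the differences between the two runs easy to inspect. The following is a minimal sketch, not code from the repository: the `AdapterRunConfig` class, its field names, and the `seed_index` and `config_diff` helpers are all hypothetical. The numeric values come from the updated table and note. The euphoric repetition penalty is left as `None` because the README does not state one, and round-robin rotation is only an assumption about what "12 rotating" means.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AdapterRunConfig:
    """Hypothetical container; field names are illustrative, not from the repo."""
    name: str
    sign: int                            # reward direction: +1 euphoric, -1 dysphoric
    steps: int                           # GRPO training steps
    max_new_tokens: int                  # generation window per sample
    num_seed_prompts: int                # size of the seed-prompt pool
    repetition_penalty: Optional[float]  # None means "not stated in the README"


# Values taken from the updated README table and note.
EUPHORIC = AdapterRunConfig("euphoric", +1, 500, 64, 1, None)
DYSPHORIC = AdapterRunConfig("dysphoric", -1, 1000, 200, 12, 1.15)


def seed_index(cfg: AdapterRunConfig, step: int) -> int:
    """Choose a seed prompt for a training step.

    Round-robin is an assumption; the README only says the seeds rotate.
    With a single fixed seed this always returns 0.
    """
    return step % cfg.num_seed_prompts


def config_diff(a: AdapterRunConfig, b: AdapterRunConfig) -> dict:
    """Return {field_name: (a_value, b_value)} for every field that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


if __name__ == "__main__":
    for field_name, (v1, v2) in config_diff(EUPHORIC, DYSPHORIC).items():
        print(f"{field_name}: euphoric={v1!r} dysphoric={v2!r}")
```

Keeping the unknown euphoric repetition penalty as `None` rather than a guessed default avoids implying a setting the README never reports.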