---
license: cc-by-nc-4.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: peft
language:
- en
- zh
tags:
- hypernetwork
- hyper-lora
- lora
- role-play
- character-impersonation
- sft
- phase-tree
datasets:
- IAAR-Shanghai/phase_tree_data
---

# PHASE-Tree Hyper-LoRA SFT (anchor run)

**Variant:** Warm-start, lr=5e-6 (anchor run)

The **anchor** SFT run: hypernet warm-started from the PHASE-Tree pretrained
checkpoint and fine-tuned at a conservative learning rate of 5e-6 with label
smoothing 0.1 and NEFTune noise 5.0. This is the checkpoint reported in the
PHASE-Tree paper.

During development, six hyper-LoRA SFT cells were trained — an ablation grid
over initialisation (warm-start vs cold-start), learning rate (5e-6 vs 1e-5),
and trainable vs frozen hypernet output heads. Only this anchor cell is
bundled here; the other five are kept locally for reproducibility.

## What is a hypermod?

A **hypermod** (hyper-modulator) is a hypernetwork that, conditioned on a
character profile embedding, emits a low-rank LoRA delta `ΔW = AB` for each
target layer of the base model at inference time. The base model weights are
never updated; only the hypernet is trained. A single hypermod therefore
generalises across an open-ended set of personas without needing to store a
separate adapter per character.
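
As a rough illustration of the idea, the toy sketch below maps a profile embedding to the two low-rank factors and adds their product to a frozen weight. The class name `ToyHypermod`, the one-linear-head-per-factor design, and the small dimensions are assumptions made for the example, not the PHASE-Tree architecture. Rank 8 and alpha 16 mirror the adapter settings in this card, and the `alpha / rank` scaling is the usual LoRA convention, assumed here.

```python
import numpy as np


class ToyHypermod:
    """Toy hypernetwork: profile embedding -> LoRA factors (A, B) for one layer.

    Hypothetical illustration only; not the PHASE-Tree architecture.
    """

    def __init__(self, emb_dim, d_out, d_in, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        self.scale = alpha / rank  # usual LoRA scaling (assumed here)
        # One linear head per factor; in this toy these are the only trainable weights.
        self.head_a = rng.normal(0.0, 0.02, size=(emb_dim, d_out * rank))
        self.head_b = rng.normal(0.0, 0.02, size=(emb_dim, rank * d_in))

    def delta(self, profile_emb):
        """Return the low-rank update scale * A @ B for one profile embedding."""
        a = (profile_emb @ self.head_a).reshape(self.d_out, self.rank)
        b = (profile_emb @ self.head_b).reshape(self.rank, self.d_in)
        return self.scale * (a @ b)


def adapted_forward(x, w_frozen, delta_w):
    """Apply the frozen base weight plus the generated delta."""
    return x @ (w_frozen + delta_w).T


rng = np.random.default_rng(1)
hypermod = ToyHypermod(emb_dim=32, d_out=16, d_in=16)
w_frozen = rng.normal(size=(16, 16))  # stands in for a q_proj weight; never updated
profile_a = rng.normal(size=32)       # embedding of one character profile
profile_b = rng.normal(size=32)       # embedding of a different profile

dw_a = hypermod.delta(profile_a)
dw_b = hypermod.delta(profile_b)
x = rng.normal(size=(4, 16))
y_a = adapted_forward(x, w_frozen, dw_a)
y_b = adapted_forward(x, w_frozen, dw_b)
```

The same `hypermod` object yields a different delta for each profile, which is why no per-character adapter needs to be stored.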

## Files

| File | Purpose |
|------|---------|
| `hypermod.pt` | **Recommended checkpoint.** The anchor SFT step selected from per-step LLM-as-judge ratings (`character`, `semantic`) and Qwen3-Embedding-4B response-vs-reference cosine similarity. |
| `args.yaml` | Full training configuration; consumed by the loader to instantiate the hypernet architecture. |
| `adapter_config.json` | LoRA target-module stub (rank 8, alpha 16, `q_proj` + `v_proj`). |
| `timing_stats.json` | Wall-clock breakdown of the training run (training / validation / other overhead, in seconds). |

> Per-step snapshots (`checkpoints/it_5000` … `it_40000`) and the post-hoc
> evaluation artefacts (`eval_ckpt_judge_scores/`, `eval_ckpt_val_loss/`)
> generated during training are **not bundled** with this release. They can
> be regenerated by re-running `src/scripts/train_phase_tree_qwen_7b.sh`
> followed by the evaluation scripts under `src/scripts/`.

## How to load

```python
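from huggingface_hub import snapshot_download
from hyper_llm_modulator.hyper_modulator import load_hypermod_checkpoint

# Download the repository snapshot; replace the placeholder with the actual repo id.
ckpt_dir = snapshot_download("<your-hf-username>/PHASE-Tree-hyper-lora-anchor")

# The loader returns the training args, the hypernet, the base model and its
# tokenizer, and the embedding model/tokenizer with its formatting and pooling helpers.
(
    args, hypermod, base_model, tokenizer,
    emb_model, emb_tokenizer, task_desc_format_fn, pooling_fn,
) = load_hypermod_checkpoint(f"{ckpt_dir}/hypermod.pt", device="cuda")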
```

The loader reads `args.yaml` and `adapter_config.json` from the same directory
as `hypermod.pt` automatically. The full inference pipeline (profile →
embedding → per-layer LoRA → generation) lives in the PHASE-Tree codebase.

## Training configuration

| Hyperparameter | Value |
|----------------|-------|
| Base model | `Qwen/Qwen2.5-7B-Instruct` |
| Task encoder | `Qwen/Qwen3-Embedding-4B` |
| Initialisation | Warm-start from `phase_tree_models/phase_tree_pretrained/hypermod.pt` |
| Target modules | `q_proj`, `v_proj` |
| LoRA rank `r` | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Hypernet latent size | 1024 |
| Hypernet head input size | 2048 |
| Freeze hypernet heads | `false` |
| Optimizer steps | 40000 |
| Effective batch size | 8 (per-device 4 × grad-accum 2) |
| Learning rate | 5e-6 |
| Warmup fraction | 0.05 |
| Weight decay | 0.01 |
| Label smoothing | 0.1 |
| NEFTune noise α | 5.0 |
| Checkpoint cadence | every 5000 steps |
| Random seed | 42 |

The complete configuration (including dataset lists, sampler settings, and
fusion-module placeholders kept for loader compatibility) lives in `args.yaml`.
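
A few quantities follow directly from the table and can be checked with simple arithmetic. The snippet below is only a worked calculation. It assumes a single device (which is what 4 × 2 = 8 implies) and that the warmup length is the warmup fraction times the number of optimizer steps, rounded to the nearest step.

```python
# Values copied from the hyperparameter table.
per_device_batch = 4
grad_accum_steps = 2
optimizer_steps = 40_000
warmup_fraction = 0.05
checkpoint_every = 5_000

# Assumption: one device, consistent with an effective batch size of 8.
num_devices = 1

effective_batch = per_device_batch * grad_accum_steps * num_devices
warmup_steps = round(warmup_fraction * optimizer_steps)
examples_seen = effective_batch * optimizer_steps
checkpoint_steps = list(range(checkpoint_every, optimizer_steps + 1, checkpoint_every))

print(effective_batch)        # 8
print(warmup_steps)           # 2000
print(examples_seen)          # 320000
print(len(checkpoint_steps))  # 8 snapshots, at steps 5000 through 40000
```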

## Training data

The hypermod is jointly fine-tuned on the *train* splits of the eight
PHASE-Tree character-dialogue datasets (RAIDEN, CharacterEval, HPD, SimsConv,
ChatHaruhi, Friends, StarTrek_TNG, TheOffice), `m6_phase_tree` profile variant.
Sampling follows the hierarchical `sqrt_size` strategy with 6 tasks × 2 points
per batch.
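
The sketch below shows one plausible reading of `sqrt_size` sampling: each dataset is drawn with probability proportional to the square root of its size, and a batch takes 2 examples from each of 6 distinct datasets. The dataset sizes are hypothetical placeholders (the real counts are not stated in this card), treating each dataset as one task is a simplification, and the function names are illustrative, not taken from the PHASE-Tree codebase.

```python
import math
import random

# Hypothetical dataset sizes; the real counts are not stated in this card.
sizes = {
    "RAIDEN": 40_000, "CharacterEval": 10_000, "HPD": 2_500, "SimsConv": 90_000,
    "ChatHaruhi": 22_500, "Friends": 6_400, "StarTrek_TNG": 3_600, "TheOffice": 4_900,
}


def sqrt_size_weights(sizes):
    """Sampling probability proportional to sqrt(dataset size)."""
    roots = {name: math.sqrt(n) for name, n in sizes.items()}
    total = sum(roots.values())
    return {name: r / total for name, r in roots.items()}


def sample_batch(sizes, n_tasks=6, n_points=2, rng=random):
    """Pick n_tasks distinct datasets by sqrt-size weight, then n_points indices from each."""
    remaining = sqrt_size_weights(sizes)
    batch = []
    for _ in range(n_tasks):
        names = list(remaining)
        chosen = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        del remaining[chosen]  # sample datasets without replacement
        indices = rng.sample(range(sizes[chosen]), n_points)
        batch.extend((chosen, i) for i in indices)
    return batch


weights = sqrt_size_weights(sizes)
batch = sample_batch(sizes, rng=random.Random(42))
```

With these placeholder sizes, a dataset that is 36 times larger than another is sampled only 6 times as often, which keeps small corpora from being drowned out.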

## Evaluation

The released `hypermod.pt` was selected from per-step snapshots of the
training run by scoring predictions on a held-out evaluation set along
three axes:

- **`character` (1–5)** — profile-consistency rating by an LLM judge (see
  `evaluation/persona_rubric.md` in the PHASE-Tree codebase for the rubric).
- **`semantic` (1–5)** — contextual-coherence rating by the same judge.
- **`embedding`** — cosine similarity of the predicted and reference response
  embeddings computed with Qwen3-Embedding-4B.

The per-step snapshots and the full evaluation artefacts produced during
model selection are not bundled (see the note at the end of the Files
section); they can be regenerated by re-running training and then the
evaluation scripts under `src/scripts/`.
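
The `embedding` axis reduces to a cosine similarity between two vectors. A minimal sketch follows, using toy vectors in place of real Qwen3-Embedding-4B outputs. Averaging the per-example scores into one number is an assumption, since the card does not say how scores are aggregated, and both function names are illustrative.

```python
import numpy as np


def cosine_similarity(pred_emb, ref_emb, eps=1e-12):
    """Cosine of the angle between a predicted and a reference embedding."""
    pred = np.asarray(pred_emb, dtype=np.float64)
    ref = np.asarray(ref_emb, dtype=np.float64)
    # eps guards against division by zero for all-zero vectors.
    denom = max(np.linalg.norm(pred) * np.linalg.norm(ref), eps)
    return float(pred @ ref / denom)


def mean_embedding_score(pred_embs, ref_embs):
    """Average cosine similarity over paired embeddings (assumed aggregation)."""
    scores = [cosine_similarity(p, r) for p, r in zip(pred_embs, ref_embs)]
    return sum(scores) / len(scores)


# Toy 2-d vectors standing in for response embeddings.
identical = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
average = mean_embedding_score([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Because the score depends only on direction, the magnitude of the embeddings does not affect it.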

## Limitations

- Persona conditioning is mediated entirely by the profile embedding that the
  task encoder produces; the model has no other persona-control surface.
- Generations may reproduce stylistic biases of the source corpora; intended
  for research evaluation only.
- The checkpoint depends on the PHASE-Tree codebase for inference and is not a
  drop-in `peft.PeftModel`: `adapter_config.json` describes only which layers
  receive a generated LoRA, not directly loadable weights.