---
license: cc-by-nc-4.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: peft
language:
- en
- zh
tags:
- hypernetwork
- hyper-lora
- lora
- role-play
- character-impersonation
- pretraining
- phase-tree
datasets:
- IAAR-Shanghai/phase_tree_data
---

# PHASE-Tree Pretrained Hypermod

Hypernetwork pretrained on the PHASE-Tree character-dialogue corpus on top of
[`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

This is the **warm-start checkpoint** consumed by the SFT runs released under
`phase_tree_models/sft/hyper_lora/`. It is *not* intended as a stand-alone
inference checkpoint; for character-conditioned generation, use one of the SFT
runs instead.

> The pretraining-stage training schedule (full dataset list, optimizer
> schedule, etc.) is not bundled with this release. Only the fields required
> by `load_hypermod_checkpoint` (path resolution + hypermod architecture) are
> retained in `args.yaml`; the SFT runs in `phase_tree_models/sft/hyper_lora/`
> carry the complete training configurations for their respective fine-tuning
> stages.

## What is a hypermod?

A **hypermod** (hyper-modulator) is a hypernetwork that, conditioned on a
character profile embedding, emits a low-rank LoRA delta `ΔW = AB` for each
target layer of the base model on the fly. The base model weights themselves
are never updated; only the hypernet is trained. At inference time the
hypernet generates a personalised LoRA per character, giving one model that
covers an open-ended set of personas without needing to store per-character
adapters.
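
The idea can be illustrated with a toy NumPy sketch. This is not the PHASE-Tree implementation: the `ToyHypermod` class, its single `tanh` layer, and all sizes are assumptions chosen for readability, and the real hypermod presumably also conditions on which layer and module it is generating for. The sketch only shows the mechanism described above: a profile embedding goes in, factors `A` and `B` come out, and the frozen weight is used as `W + AB`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions); they are NOT the released architecture.
EMB_DIM, LATENT, D_OUT, D_IN, RANK = 16, 32, 12, 10, 4


class ToyHypermod:
    """Hypothetical hypernetwork: profile embedding -> LoRA factors (A, B)."""

    def __init__(self):
        # These are the only trainable parameters in the setup.
        self.w_latent = rng.normal(0.0, 0.1, (EMB_DIM, LATENT))
        self.w_a = rng.normal(0.0, 0.1, (LATENT, D_OUT * RANK))
        self.w_b = rng.normal(0.0, 0.1, (LATENT, RANK * D_IN))

    def __call__(self, profile_emb):
        z = np.tanh(profile_emb @ self.w_latent)      # shared latent code
        a = (z @ self.w_a).reshape(D_OUT, RANK)       # A: (d_out, r)
        b = (z @ self.w_b).reshape(RANK, D_IN)        # B: (r, d_in)
        return a, b


def adapted_forward(x, w_frozen, a, b):
    """Compute (W + A B) x without materialising the full delta."""
    return w_frozen @ x + a @ (b @ x)


hypermod = ToyHypermod()
w_frozen = rng.normal(size=(D_OUT, D_IN))   # stands in for a frozen base weight
x = rng.normal(size=D_IN)

# Two different character profiles yield two different low-rank deltas.
a1, b1 = hypermod(rng.normal(size=EMB_DIM))
a2, b2 = hypermod(rng.normal(size=EMB_DIM))
delta_1, delta_2 = a1 @ b1, a2 @ b2
y1 = adapted_forward(x, w_frozen, a1, b1)
```

Because `A` and `B` are produced per call, nothing character-specific needs to be stored; only the hypernetwork weights persist.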

## Files

| File | Purpose |
|------|---------|
| `hypermod.pt` | The released pretrained hypermod (it_20000 of the original pretraining run). Use this as the entry point. |
| `args.yaml` | Architecture and loader metadata (no training schedule — this checkpoint is meant to be consumed, not resumed). |
| `adapter_config.json` | LoRA target-module stub (rank 8, alpha 16, `q_proj` + `v_proj`). |

## How to load

```python
from huggingface_hub import snapshot_download
from hyper_llm_modulator.hyper_modulator import load_hypermod_checkpoint

ckpt_dir = snapshot_download("<your-hf-username>/PHASE-Tree-pretrained-hypermod")

(
    args, hypermod, base_model, tokenizer,
    emb_model, emb_tokenizer, task_desc_format_fn, pooling_fn,
) = load_hypermod_checkpoint(f"{ckpt_dir}/hypermod.pt", device="cuda")
```

The loader reads `args.yaml` and `adapter_config.json` from the same directory
as `hypermod.pt` automatically; you do not need to pass them explicitly. The
full inference pipeline (profile → embedding → per-layer LoRA → generation)
lives in the PHASE-Tree codebase.
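
Since the loader resolves its sibling files by directory, a pre-flight check on the downloaded snapshot can surface an incomplete download before the heavier model loading starts. The helper below is a minimal sketch using only the standard library; `missing_checkpoint_files` is a hypothetical name and is not part of the PHASE-Tree codebase.

```python
from pathlib import Path

# File names taken from the "Files" table of this model card.
REQUIRED_FILES = ("hypermod.pt", "args.yaml", "adapter_config.json")


def missing_checkpoint_files(ckpt_dir):
    """Return the required file names that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

A caller might run `missing_checkpoint_files(ckpt_dir)` right after `snapshot_download` and raise an error if the returned list is non-empty.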

## Architecture

| Component | Value |
|-----------|-------|
| Base model | `Qwen/Qwen2.5-7B-Instruct` |
| Task encoder | `Qwen/Qwen3-Embedding-4B` |
| Target modules | `q_proj`, `v_proj` |
| LoRA rank `r` | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Hypernet latent size | 1024 |
| Hypernet head input size | 2048 |
| `delta_w` scaling | 100 |
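
To give a sense of scale, the following back-of-the-envelope sketch computes the conventional LoRA scale `alpha / r` and the size of the LoRA factors the hypernet has to emit. The rank, alpha, and target modules come from the table above; the projection dimensions are an assumption taken from the public Qwen2.5-7B configuration (hidden size 3584, 28 layers, 4 key/value heads of dimension 128). How the `delta_w` scaling of 100 enters the computation is not specified here, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraShape:
    d_in: int
    d_out: int

    def param_count(self, rank):
        # A is (d_out x rank) and B is (rank x d_in).
        return rank * (self.d_in + self.d_out)


RANK, ALPHA = 8, 16  # from the architecture table

# ASSUMPTION: dimensions from the public Qwen2.5-7B config, not from this card.
HIDDEN, KV_DIM, NUM_LAYERS = 3584, 4 * 128, 28

targets = {
    "q_proj": LoraShape(d_in=HIDDEN, d_out=HIDDEN),
    "v_proj": LoraShape(d_in=HIDDEN, d_out=KV_DIM),
}

lora_scale = ALPHA / RANK                                   # 2.0
per_layer = sum(s.param_count(RANK) for s in targets.values())
total = per_layer * NUM_LAYERS

print(f"scale={lora_scale}, per layer={per_layer:,}, all layers={total:,}")
# scale=2.0, per layer=90,112, all layers=2,523,136
```

Under these assumptions, each character's generated adapter amounts to roughly 2.5M values, which is what a stored per-character LoRA of the same shape would occupy.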

## Use as warm-start

SFT runs whose `args.yaml` sets

```yaml
init_hypermod_from: phase_tree_models/phase_tree_pretrained/hypermod.pt
```

consume this checkpoint as the initial hypernet weights. This is the
warm-start used by the released anchor SFT run under
`phase_tree_models/sft/hyper_lora/`.

## Limitations

- This is a **pretraining** checkpoint; downstream SFT is required for
  competitive character-fidelity scores.
- Persona conditioning is mediated entirely by the profile embedding that the
  task encoder produces; the model has no other persona-control mechanism.
- Generations may reproduce stylistic biases of the source corpora and are
  intended for research evaluation only.