---
license: cc-by-nc-4.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: peft
language:
- en
- zh
tags:
- hypernetwork
- hyper-lora
- lora
- role-play
- character-impersonation
- pretraining
- phase-tree
datasets:
- IAAR-Shanghai/phase_tree_data
---
# PHASE-Tree Pretrained Hypermod
Hypernetwork pretrained on the PHASE-Tree character-dialogue corpus on top of
[`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
This is the **warm-start checkpoint** consumed by the SFT runs released under
`phase_tree_models/sft/hyper_lora/`. It is *not* intended as a stand-alone
inference checkpoint; for character-conditioned generation, use the SFT runs
instead.
> The pretraining-stage training schedule (full dataset list, optimizer
> schedule, etc.) is not bundled with this release. Only the fields required
> by `load_hypermod_checkpoint` (path resolution + hypermod architecture) are
> retained in `args.yaml`; the SFT runs in `phase_tree_models/sft/hyper_lora/`
> carry the complete training configurations for their respective fine-tuning
> stages.
## What is a hypermod?
A **hypermod** (hyper-modulator) is a hypernetwork that, conditioned on a
character profile embedding, emits a low-rank LoRA delta `ΔW = AB` for each
target layer of the base model on the fly. The base model weights themselves
are never updated; only the hypernet is trained. At inference time the
hypernet generates a personalised LoRA per character, giving one model that
covers an open-ended set of personas without needing to store per-character
adapters.
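
The mechanism can be illustrated with a toy NumPy sketch. It is not the PHASE-Tree implementation: the class `ToyHyperLoRA`, its single `tanh` trunk, the tiny dimensions, and the `scale` argument are all assumptions made for illustration. It only shows the shape of the idea: one shared set of hypernet weights maps each profile embedding to its own pair of factors `A` and `B`, while the base weight is used unchanged.

```python
import numpy as np


class ToyHyperLoRA:
    """Toy hypernetwork: profile embedding -> (A, B) factors for one target layer.

    Hypothetical structure for illustration; the released hypermod may differ.
    """

    def __init__(self, emb_dim, latent_dim, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Shared trunk: embedding -> latent code.
        self.W_trunk = rng.normal(0.0, 0.02, size=(emb_dim, latent_dim))
        # Two heads emit the flattened low-rank factors.
        self.W_A = rng.normal(0.0, 0.02, size=(latent_dim, d_out * rank))
        self.W_B = rng.normal(0.0, 0.02, size=(latent_dim, rank * d_in))

    def generate(self, profile_emb):
        """Return A (d_out x r) and B (r x d_in), so that delta_W = A @ B."""
        z = np.tanh(profile_emb @ self.W_trunk)
        A = (z @ self.W_A).reshape(self.d_out, self.rank)
        B = (z @ self.W_B).reshape(self.rank, self.d_in)
        return A, B


def adapted_forward(x, W_base, A, B, scale):
    """Frozen base projection plus the generated low-rank update.

    W_base is never modified; the delta is applied as two thin matmuls,
    which equals x @ (W_base + scale * A @ B).T.
    """
    return x @ W_base.T + scale * ((x @ B.T) @ A.T)


# Two different "characters" share the same hypernet but get different adapters.
hyper = ToyHyperLoRA(emb_dim=16, latent_dim=8, d_in=12, d_out=10, rank=2)
rng = np.random.default_rng(1)
W_base = rng.normal(size=(10, 12))   # stands in for a frozen q_proj / v_proj weight
x = rng.normal(size=(3, 12))         # three token activations
emb_a, emb_b = rng.normal(size=16), rng.normal(size=16)

A_a, B_a = hyper.generate(emb_a)
A_b, B_b = hyper.generate(emb_b)
y_a = adapted_forward(x, W_base, A_a, B_a, scale=2.0)
y_b = adapted_forward(x, W_base, A_b, B_b, scale=2.0)
```

Because only `A` and `B` depend on the character, switching persona means regenerating two small matrices per target layer rather than loading a stored adapter.
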
## Files
| File | Purpose |
|------|---------|
| `hypermod.pt` | The released pretrained hypermod (checkpoint `it_20000` of the original pretraining run). Use this as the entry point. |
| `args.yaml` | Architecture and loader metadata only. The training schedule is omitted because this checkpoint is meant to be consumed, not resumed. |
| `adapter_config.json` | LoRA target-module stub (rank 8, alpha 16, `q_proj` + `v_proj`). |
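
Since the loader expects all three files side by side, a quick pre-flight check can catch an incomplete download early. The helper below is a minimal sketch and not part of the PHASE-Tree codebase: `check_checkpoint_dir` is a hypothetical name, and the keys mentioned in the usage comment (`r`, `lora_alpha`, `target_modules`) assume the stub follows the usual PEFT `adapter_config.json` layout.

```python
import json
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = ("hypermod.pt", "args.yaml", "adapter_config.json")


def check_checkpoint_dir(ckpt_dir):
    """Verify that all expected files exist and return the parsed adapter config."""
    root = Path(ckpt_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {root}: {', '.join(missing)}")
    with open(root / "adapter_config.json", encoding="utf-8") as fh:
        return json.load(fh)


# Usage, with ckpt_dir pointing at the downloaded snapshot:
# cfg = check_checkpoint_dir(ckpt_dir)
# print(cfg.get("r"), cfg.get("lora_alpha"), cfg.get("target_modules"))
```
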
## How to load
```python
from huggingface_hub import snapshot_download
from hyper_llm_modulator.hyper_modulator import load_hypermod_checkpoint

# Replace the placeholder with the repo id that hosts this checkpoint.
ckpt_dir = snapshot_download("<your-hf-username>/PHASE-Tree-pretrained-hypermod")

# The loader returns the hypermod together with the base and embedding models.
(
    args, hypermod, base_model, tokenizer,
    emb_model, emb_tokenizer, task_desc_format_fn, pooling_fn,
) = load_hypermod_checkpoint(f"{ckpt_dir}/hypermod.pt", device="cuda")
```
The loader reads `args.yaml` and `adapter_config.json` from the same directory
as `hypermod.pt` automatically; you do not need to pass them explicitly. The
full inference pipeline (profile → embedding → per-layer LoRA → generation)
lives in the PHASE-Tree codebase.
## Architecture
| Component | Value |
|-----------|-------|
| Base model | `Qwen/Qwen2.5-7B-Instruct` |
| Task encoder | `Qwen/Qwen3-Embedding-4B` |
| Target modules | `q_proj`, `v_proj` |
| LoRA rank `r` | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Hypernet latent size | 1024 |
| Hypernet head input size | 2048 |
| `delta_w` scaling | 100 |
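
To give a sense of scale, the arithmetic below estimates how many LoRA values the hypernet has to emit per character for the rank and target modules in the table. The projection shapes are an assumption taken from the public `Qwen/Qwen2.5-7B-Instruct` configuration (hidden size 3584, 28 layers, 4 key/value heads of dimension 128) and are not stated in this card. The ratio `alpha / r` is the conventional LoRA scaling; how it combines with the `delta_w` scaling of 100 is not described here, so that factor is left out.

```python
# Values from the architecture table.
RANK = 8
ALPHA = 16

# ASSUMPTION: shapes from the public Qwen2.5-7B-Instruct config, not from this card.
HIDDEN = 3584
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128

# (d_in, d_out) for each targeted projection.
TARGETS = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}


def lora_param_count(d_in, d_out, rank):
    """Number of entries in one rank-r factor pair: (d_out x r) plus (r x d_in)."""
    return rank * (d_in + d_out)


scaling = ALPHA / RANK
per_layer = sum(lora_param_count(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * NUM_LAYERS

print(f"conventional LoRA scaling alpha/r: {scaling}")
print(f"generated LoRA values per layer: {per_layer:,}")
print(f"generated LoRA values per character: {total:,}")
```

Under these assumed shapes, each persona corresponds to roughly 2.5 million generated values, which is what the hypernet produces instead of a stored adapter.
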
## Use as warm-start
SFT runs whose `args.yaml` sets
```yaml
init_hypermod_from: phase_tree_models/phase_tree_pretrained/hypermod.pt
```
consume this checkpoint as the initial hypernet weights. This is the
warm-start used by the released anchor SFT run under
`phase_tree_models/sft/hyper_lora/`.
## Limitations
- This is a **pretraining** checkpoint; downstream SFT is required for
  competitive character-fidelity scores.
- Persona conditioning is mediated entirely by the profile embedding that the
  task encoder produces and passes to the hypernet; the model has no other
  persona-control mechanism.
- Generations may reproduce stylistic biases of the source corpora and are
  intended for research evaluation only.