---
license: cc-by-nc-4.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: peft
language:
- en
- zh
tags:
- hypernetwork
- hyper-lora
- lora
- role-play
- character-impersonation
- sft
- phase-tree
datasets:
- IAAR-Shanghai/phase_tree_data
---

# PHASE-Tree Hyper-LoRA SFT (anchor run)

**Variant:** Warm-start, lr=5e-6 (anchor run)

The **anchor** SFT run: hypernet warm-started from the PHASE-Tree pretrained checkpoint and fine-tuned at a conservative learning rate of 5e-6 with label smoothing 0.1 and NEFTune noise 5.0. This is the checkpoint reported in the PHASE-Tree paper.

During development, six hyper-LoRA SFT cells were trained: an ablation grid over initialisation (warm-start vs cold-start), learning rate (5e-6 vs 1e-5), and trainable vs frozen hypernet output heads. Only this anchor cell is bundled here; the other five are kept locally for reproducibility.

## What is a hypermod?

A **hypermod** (hyper-modulator) is a hypernetwork that, conditioned on a character profile embedding, emits a low-rank LoRA delta `ΔW = AB` for each target layer of the base model at inference time. The base model weights are never updated; only the hypernet is trained. A single hypermod therefore generalises across an open-ended set of personas without needing to store a separate adapter per character.

## Files

| File | Purpose |
|------|---------|
| `hypermod.pt` | **Recommended checkpoint.** The anchor SFT step selected from per-step LLM-as-judge ratings (`character`, `semantic`) and Qwen3-Embedding-4B response-vs-reference cosine similarity. |
| `args.yaml` | Full training configuration; consumed by the loader to instantiate the hypernet architecture. |
| `adapter_config.json` | LoRA target-module stub (rank 8, alpha 16, `q_proj` + `v_proj`). |
| `timing_stats.json` | Wall-clock breakdown of the training run (training / validation / other overhead, in seconds). |

> Per-step snapshots (`checkpoints/it_5000` … `it_40000`) and the post-hoc
> evaluation artefacts (`eval_ckpt_judge_scores/`, `eval_ckpt_val_loss/`)
> generated during training are **not bundled** with this release. They can
> be regenerated by re-running `src/scripts/train_phase_tree_qwen_7b.sh`
> followed by the evaluation scripts under `src/scripts/`.

## How to load

```python
from huggingface_hub import snapshot_download
from hyper_llm_modulator.hyper_modulator import load_hypermod_checkpoint

ckpt_dir = snapshot_download("/PHASE-Tree-hyper-lora-anchor")
(
    args,
    hypermod,
    base_model,
    tokenizer,
    emb_model,
    emb_tokenizer,
    task_desc_format_fn,
    pooling_fn,
) = load_hypermod_checkpoint(f"{ckpt_dir}/hypermod.pt", device="cuda")
```

The loader reads `args.yaml` and `adapter_config.json` from the same directory as `hypermod.pt` automatically. The full inference pipeline (profile → embedding → per-layer LoRA → generation) lives in the PHASE-Tree codebase.

## Training configuration

| Hyperparameter | Value |
|----------------|-------|
| Base model | `Qwen/Qwen2.5-7B-Instruct` |
| Task encoder | `Qwen/Qwen3-Embedding-4B` |
| Initialisation | Warm-start from `phase_tree_models/phase_tree_pretrained/hypermod.pt` |
| Target modules | `q_proj`, `v_proj` |
| LoRA rank `r` | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Hypernet latent size | 1024 |
| Hypernet head input size | 2048 |
| Freeze hypernet heads | `false` |
| Optimizer steps | 40000 |
| Effective batch size | 8 (per-device 4 × grad-accum 2) |
| Learning rate | 5e-6 |
| Warmup fraction | 0.05 |
| Weight decay | 0.01 |
| Label smoothing | 0.1 |
| NEFTune noise α | 5.0 |
| Checkpoint cadence | every 5000 steps |
| Random seed | 42 |

The complete configuration (including dataset lists, sampler settings, and fusion-module placeholders kept for loader compatibility) lives in `args.yaml`.
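A few quantities follow directly from the table above, and the sketch below works them out. It rests on stated assumptions rather than on the training code: a single device, a warmup fraction applied to the total optimizer steps, and the conventional LoRA scaling factor `alpha / r`. The dictionary keys and the `derived_schedule` helper are illustrative names and do not necessarily match the keys in `args.yaml`.

```python
# Values copied from the training-configuration table.
# Key names are hypothetical; they are not guaranteed to match args.yaml.
CONFIG = {
    "per_device_batch_size": 4,
    "grad_accum_steps": 2,
    "optimizer_steps": 40_000,
    "warmup_fraction": 0.05,
    "checkpoint_every": 5_000,
    "lora_rank": 8,
    "lora_alpha": 16,
}


def derived_schedule(cfg, num_devices=1):
    """Compute quantities implied by the table (assumes `num_devices` GPUs)."""
    effective_batch = (
        cfg["per_device_batch_size"] * cfg["grad_accum_steps"] * num_devices
    )
    # Assumption: warmup is a fraction of the total optimizer steps.
    warmup_steps = round(cfg["warmup_fraction"] * cfg["optimizer_steps"])
    # Steps at which a snapshot would be written, given the stated cadence.
    checkpoint_steps = list(
        range(
            cfg["checkpoint_every"],
            cfg["optimizer_steps"] + 1,
            cfg["checkpoint_every"],
        )
    )
    # Assumption: conventional LoRA scaling, alpha divided by rank.
    lora_scale = cfg["lora_alpha"] / cfg["lora_rank"]
    return {
        "effective_batch_size": effective_batch,
        "warmup_steps": warmup_steps,
        "checkpoint_steps": checkpoint_steps,
        "lora_scale": lora_scale,
    }


summary = derived_schedule(CONFIG)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the checkpoint steps line up with the `it_5000` … `it_40000` snapshot names mentioned in the Files section.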
## Training data

The hypermod is jointly fine-tuned on the *train* splits of the eight PHASE-Tree character-dialogue datasets (RAIDEN, CharacterEval, HPD, SimsConv, ChatHaruhi, Friends, StarTrek_TNG, TheOffice), `m6_phase_tree` profile variant. Sampling follows the hierarchical `sqrt_size` strategy with 6 tasks × 2 points per batch.

## Evaluation

The released `hypermod.pt` was selected from per-step snapshots of the training run by scoring predictions on a held-out evaluation set along three axes:

- **`character` (1–5)**: profile-consistency rating by an LLM judge (see `evaluation/persona_rubric.md` in the PHASE-Tree codebase for the rubric).
- **`semantic` (1–5)**: contextual-coherence rating by the same judge.
- **`embedding`**: cosine similarity of the predicted and reference response embeddings computed with Qwen3-Embedding-4B.

The per-step intermediate snapshots and full evaluation artefacts produced during model selection are not bundled (see the note above the loading section); they can be regenerated from a re-training run via the scripts under `src/scripts/`.

## Limitations

- Persona conditioning is mediated entirely by the profile embedding fed into the task encoder; the model has no other persona-control surface.
- Generations may reproduce stylistic biases of the source corpora; intended for research evaluation only.
- The checkpoint depends on the PHASE-Tree codebase for inference and is not a drop-in `peft.PeftModel`: `adapter_config.json` describes only which layers receive a generated LoRA, not directly loadable weights.
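
As a side note on the `embedding` axis from the Evaluation section: it comes down to a cosine similarity between two vectors. The sketch below shows that computation in plain Python. The toy vectors stand in for Qwen3-Embedding-4B outputs, and `cosine_similarity` is an illustrative helper, not a function from the PHASE-Tree codebase.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for the predicted and reference response embeddings.
predicted = [0.2, 0.1, 0.7]
reference = [0.25, 0.05, 0.65]

score = cosine_similarity(predicted, reference)
print(f"embedding score: {score:.4f}")
```

A score near 1 means the predicted response points in nearly the same direction as the reference in embedding space; unlike the two judge ratings, it is not on a 1–5 scale.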