license: cc-by-nc-4.0
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: peft
language:
  - en
  - zh
tags:
  - hypernetwork
  - hyper-lora
  - lora
  - role-play
  - character-impersonation
  - sft
  - phase-tree
datasets:
  - IAAR-Shanghai/phase_tree_data

PHASE-Tree Hyper-LoRA SFT (anchor run)

Variant: Warm-start, lr=5e-6 (anchor run)

The anchor SFT run: the hypernet is warm-started from the PHASE-Tree pretrained checkpoint and fine-tuned at a conservative learning rate of 5e-6, with label smoothing 0.1 and NEFTune noise α = 5.0. This is the checkpoint reported in the PHASE-Tree paper.

During development, six hyper-LoRA SFT cells were trained — an ablation grid over initialisation (warm-start vs cold-start), learning rate (5e-6 vs 1e-5), and trainable vs frozen hypernet output heads. Only this anchor cell is bundled here; the other five are kept locally for reproducibility.

What is a hypermod?

A hypermod (hyper-modulator) is a hypernetwork that, conditioned on a character profile embedding, emits a low-rank LoRA delta ΔW = AB for each target layer of the base model at inference time. The base model weights are never updated; only the hypernet is trained. A single hypermod therefore generalises across an open-ended set of personas without needing to store a separate adapter per character.
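
To make the idea concrete, here is a minimal toy sketch in plain Python. It is not the PHASE-Tree architecture: the class `ToyHypermod`, its two linear maps, and all dimensions are hypothetical, chosen only to show how one set of hypernet weights can turn different profile embeddings into different low-rank deltas ΔW = AB.

```python
import random

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for trained hypernet weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class ToyHypermod:
    """Hypothetical stand-in: two linear maps from a profile embedding to LoRA factors."""

    def __init__(self, emb_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        # The only trainable weights in this toy; the base model is never touched.
        self.to_a = random_matrix(emb_dim, d_out * rank, rng)
        self.to_b = random_matrix(emb_dim, rank * d_in, rng)

    def delta_w(self, profile_emb):
        """Return the d_out x d_in delta AB generated for one profile embedding."""
        flat_a = matmul([profile_emb], self.to_a)[0]
        flat_b = matmul([profile_emb], self.to_b)[0]
        # Reshape the flat outputs into A (d_out x r) and B (r x d_in).
        a = [flat_a[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        b = [flat_b[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        return matmul(a, b)

hypermod = ToyHypermod(emb_dim=4, d_out=6, d_in=5, rank=2)
delta_alice = hypermod.delta_w([1.0, 0.0, 0.5, -0.5])  # one persona
delta_bob = hypermod.delta_w([0.0, 1.0, -0.5, 0.5])    # another persona, same hypernet
```

Both deltas come from the same hypernet weights, which is why no per-character adapter has to be stored.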

Files

File	Purpose
`hypermod.pt`	Recommended checkpoint. The anchor-run snapshot selected using per-step LLM-as-judge ratings (`character`, `semantic`) and the cosine similarity between Qwen3-Embedding-4B embeddings of the response and the reference.
`args.yaml`	Full training configuration; consumed by the loader to instantiate the hypernet architecture.
`adapter_config.json`	LoRA target-module stub (rank 8, alpha 16, `q_proj` + `v_proj`).
`timing_stats.json`	Wall-clock breakdown of the training run (training / validation / other overhead, in seconds).

Per-step snapshots (checkpoints/it_5000 … it_40000) and the post-hoc evaluation artefacts (eval_ckpt_judge_scores/, eval_ckpt_val_loss/) generated during training are not bundled with this release. They can be regenerated by re-running src/scripts/train_phase_tree_qwen_7b.sh followed by the evaluation scripts under src/scripts/.

How to load

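from huggingface_hub import snapshot_download
from hyper_llm_modulator.hyper_modulator import load_hypermod_checkpoint

# Download the repository files and get the local snapshot directory.
ckpt_dir = snapshot_download("<your-hf-username>/PHASE-Tree-hyper-lora-anchor")

# Returns the training args, the hypernet, the base model with its tokenizer,
# the task encoder with its tokenizer, and the profile formatting/pooling helpers.
(
    args, hypermod, base_model, tokenizer,
    emb_model, emb_tokenizer, task_desc_format_fn, pooling_fn,
) = load_hypermod_checkpoint(f"{ckpt_dir}/hypermod.pt", device="cuda")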

The loader reads args.yaml and adapter_config.json from the same directory as hypermod.pt automatically. The full inference pipeline (profile → embedding → per-layer LoRA → generation) lives in the PHASE-Tree codebase.
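
Since the loader expects args.yaml and adapter_config.json to sit next to hypermod.pt, it can help to check the download before loading. The helper below is a small sketch using only the standard library; `missing_checkpoint_files` and `REQUIRED_FILES` are hypothetical names, not part of the PHASE-Tree codebase.

```python
from pathlib import Path

# File names taken from the Files table of this card.
REQUIRED_FILES = ("hypermod.pt", "args.yaml", "adapter_config.json")

def missing_checkpoint_files(ckpt_dir):
    """Return the required file names that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_checkpoint_files(ckpt_dir)` on the snapshot directory should return an empty list when the download is complete.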

Training configuration

Hyperparameter	Value
Base model	`Qwen/Qwen2.5-7B-Instruct`
Task encoder	`Qwen/Qwen3-Embedding-4B`
Initialisation	Warm-start from `phase_tree_models/phase_tree_pretrained/hypermod.pt`
Target modules	`q_proj`, `v_proj`
LoRA rank `r`	8
LoRA alpha	16
LoRA dropout	0.05
Hypernet latent size	1024
Hypernet head input size	2048
Freeze hypernet heads	`false`
Optimizer steps	40000
Effective batch size	8 (per-device 4 × grad-accum 2)
Learning rate	5e-6
Warmup fraction	0.05
Weight decay	0.01
Label smoothing	0.1
NEFTune noise α	5.0
Checkpoint cadence	every 5000 steps
Random seed	42
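
A few quantities follow directly from the table. The short sketch below works them out. It assumes that the warmup fraction is relative to the total number of optimizer steps and that the codebase uses the standard LoRA scaling factor alpha / r; neither is stated explicitly in this card.

```python
# Values copied from the training configuration table.
optimizer_steps = 40_000
per_device_batch = 4
grad_accum = 2
warmup_fraction = 0.05
checkpoint_every = 5_000
lora_rank = 8
lora_alpha = 16

# Effective batch size as reported in the table: 4 x 2 = 8.
effective_batch = per_device_batch * grad_accum

# Assumption: warmup is a fraction of the total optimizer steps.
warmup_steps = round(warmup_fraction * optimizer_steps)

# Steps at which a snapshot is written (it_5000 ... it_40000).
snapshot_steps = list(range(checkpoint_every, optimizer_steps + 1, checkpoint_every))

# Assumption: standard LoRA convention, delta scaled by alpha / r.
lora_scale = lora_alpha / lora_rank

print(effective_batch, warmup_steps, len(snapshot_steps), lora_scale)
```

Under these assumptions the run has 2000 warmup steps, eight snapshots, and a LoRA scaling factor of 2.0.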

The complete configuration (including dataset lists, sampler settings, and fusion-module placeholders kept for loader compatibility) lives in args.yaml.
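
For readers unfamiliar with the two regularisers in the table, the sketch below shows what label smoothing 0.1 and NEFTune noise α = 5.0 do in their usual formulations. It is a plain-Python illustration, not the training code; the function names are hypothetical.

```python
import math
import random

def smoothed_targets(true_index, num_classes, smoothing=0.1):
    """Label-smoothed target distribution: (1 - eps) * one_hot + eps / K."""
    base = smoothing / num_classes
    return [base + (1.0 - smoothing) * (i == true_index) for i in range(num_classes)]

def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Add Uniform(-1, 1) noise scaled by alpha / sqrt(L * d) to an L x d embedding matrix."""
    rng = rng or random.Random(0)
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + rng.uniform(-1.0, 1.0) * scale for x in row] for row in embeddings]

# Toy usage: 5 classes, true class 2; a 4-token sequence of 4-dim zero embeddings.
targets = smoothed_targets(2, 5)
noisy = neftune_noise([[0.0] * 4 for _ in range(4)], alpha=5.0)
```

With five classes the true class keeps probability 0.92 and each other class receives 0.02. In the toy sequence every noise value stays within ±1.25, since α / √(L·d) = 5 / 4.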

Training data

The hypermod is jointly fine-tuned on the train splits of the eight PHASE-Tree character-dialogue datasets (RAIDEN, CharacterEval, HPD, SimsConv, ChatHaruhi, Friends, StarTrek_TNG, TheOffice), using the m6_phase_tree profile variant. Batches are drawn with the hierarchical sqrt_size strategy, with 6 tasks × 2 points per batch.
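
The exact sampler lives in the PHASE-Tree codebase. As a rough illustration of square-root-size sampling, the sketch below draws a dataset with probability proportional to the square root of its size, then picks points inside it. The dataset sizes are made up, each dataset draw is treated as one "task", and the function names are hypothetical.

```python
import math
import random

# Hypothetical dataset sizes, for illustration only (not the real counts).
DATASET_SIZES = {"RAIDEN": 40_000, "CharacterEval": 10_000, "HPD": 2_500, "Friends": 900}

def sqrt_size_probs(sizes):
    """Probability of each dataset, proportional to sqrt(number of examples)."""
    weights = {name: math.sqrt(n) for name, n in sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def sample_batch(sizes, tasks_per_batch=6, points_per_task=2, seed=42):
    """Draw tasks by sqrt-size dataset probability, then points within each task."""
    rng = random.Random(seed)
    probs = sqrt_size_probs(sizes)
    names = list(probs)
    batch = []
    for _ in range(tasks_per_batch):
        dataset = rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
        # Integer indices stand in for real examples; drawn without replacement.
        for idx in rng.sample(range(sizes[dataset]), points_per_task):
            batch.append((dataset, idx))
    return batch

probs = sqrt_size_probs(DATASET_SIZES)
batch = sample_batch(DATASET_SIZES)
```

The point of the square root is to flatten the size imbalance: in this toy setup RAIDEN is 16 times larger than HPD but is sampled only 4 times as often.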

Evaluation

The released hypermod.pt was selected from per-step snapshots of the training run by scoring predictions on a held-out evaluation set along three axes:

character (1–5) — profile-consistency rating by an LLM judge (see evaluation/persona_rubric.md in the PHASE-Tree codebase for the rubric).
semantic (1–5) — contextual-coherence rating by the same judge.
embedding — cosine similarity of the predicted and reference response embeddings computed with Qwen3-Embedding-4B.
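
The embedding axis reduces to a cosine similarity between two vectors. A minimal plain-Python sketch is given below; in practice the vectors would come from Qwen3-Embedding-4B, and averaging over the evaluation set is an assumption made here for illustration, as the card does not state how scores are aggregated.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def mean_embedding_score(pred_embs, ref_embs):
    """Assumed aggregation: average cosine similarity over paired embeddings."""
    pairs = list(zip(pred_embs, ref_embs))
    return sum(cosine_similarity(p, r) for p, r in pairs) / len(pairs)
```

Identical directions score 1, orthogonal vectors score 0, and opposite directions score -1, so a higher value means the prediction is closer to the reference in embedding space.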

The per-step intermediate snapshots and full evaluation artefacts produced during model selection are not bundled (see the note above the loading section); they can be regenerated from a re-training run via the scripts under src/scripts/.

Limitations

Persona conditioning is mediated entirely by the profile embedding that the task encoder produces from the character profile; the model has no other persona-control surface.
Generations may reproduce stylistic biases of the source corpora; the checkpoint is intended for research evaluation only.
The checkpoint depends on the PHASE-Tree codebase for inference and is not a drop-in peft.PeftModel: adapter_config.json describes only which layers receive a generated LoRA and contains no directly loadable weights.