
	---
	license: cc-by-nc-4.0
	base_model: Qwen/Qwen2.5-7B-Instruct
	pipeline_tag: text-generation
	library_name: peft
	language:
	- en
	- zh
	tags:
	- hypernetwork
	- hyper-lora
	- lora
	- role-play
	- character-impersonation
	- pretraining
	- phase-tree
	datasets:
	- IAAR-Shanghai/phase_tree_data
	---

	# PHASE-Tree Pretrained Hypermod

	Hypernetwork pretrained on the PHASE-Tree character-dialogue corpus on top of
	[`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

	This is the warm-start checkpoint consumed by the SFT runs released under
	`phase_tree_models/sft/hyper_lora/`. It is not intended as a stand-alone
	inference checkpoint; for character-conditioned generation, use the SFT
	checkpoints instead.

	> The pretraining-stage training schedule (full dataset list, optimizer
	> schedule, etc.) is not bundled with this release. Only the fields required
	> by `load_hypermod_checkpoint` (path resolution + hypermod architecture) are
	> retained in `args.yaml`; the SFT runs in `phase_tree_models/sft/hyper_lora/`
	> carry the complete training configurations for their respective fine-tuning
	> stages.

	## What is a hypermod?

	A hypermod (hyper-modulator) is a hypernetwork that, conditioned on a
	character profile embedding, emits a low-rank LoRA delta `ΔW = AB` for each
	target layer of the base model on the fly. The base model weights themselves
	are never updated; only the hypernet is trained. At inference time the
	hypernet generates a personalised LoRA per character, giving one model that
	covers an open-ended set of personas without needing to store per-character
	adapters.
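
The mechanism can be illustrated with a dependency-free toy. This is a minimal sketch, not the PHASE-Tree implementation: `ToyHypernet`, its two linear heads, and all dimensions are hypothetical, and plain Python lists stand in for tensors. It shows only the core idea described above: a frozen weight `W`, plus a low-rank delta `AB` generated from a profile embedding at call time.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def reshape(flat, rows, cols):
    """Split a flat list into `rows` rows of `cols` entries each."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


class ToyHypernet:
    """Hypothetical stand-in for a hypermod: profile embedding -> (A, B)."""

    def __init__(self, emb_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        # One bias-free linear head per LoRA factor; these are the only
        # trainable parameters in this sketch.
        self.head_a = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                       for _ in range(d_out * rank)]
        self.head_b = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                       for _ in range(rank * d_in)]

    def generate(self, profile_emb):
        """Emit A (d_out x rank) and B (rank x d_in) for one character."""
        a = reshape(matvec(self.head_a, profile_emb), self.d_out, self.rank)
        b = reshape(matvec(self.head_b, profile_emb), self.rank, self.d_in)
        return a, b


def adapted_forward(x, w_frozen, a, b):
    """Compute (W + AB) x as W x + A (B x), leaving W untouched."""
    base = matvec(w_frozen, x)
    delta = matvec(a, matvec(b, x))
    return [p + q for p, q in zip(base, delta)]


# Toy sizes: 4-dim profile embedding, a 3x5 frozen weight, rank-2 delta.
hypernet = ToyHypernet(emb_dim=4, d_out=3, d_in=5, rank=2)
w_frozen = [[0.1 * (i + j) for j in range(5)] for i in range(3)]
x = [1.0, 0.5, -0.5, 2.0, 0.0]

# Two different profile embeddings yield two different adapters.
a1, b1 = hypernet.generate([1.0, 0.0, 0.0, 0.0])
a2, b2 = hypernet.generate([0.0, 1.0, 0.0, 0.0])
y1 = adapted_forward(x, w_frozen, a1, b1)
y2 = adapted_forward(x, w_frozen, a2, b2)
```

In this toy, `w_frozen` is shared by both calls and never modified; only the generated `(A, B)` pair differs per character. A real hypermod would presumably repeat this for every target module in every layer.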

	## Files

	| File | Purpose |
	|------|---------|
	| `hypermod.pt` | The released pretrained hypermod (it_20000 of the original pretraining run). Use this as the entry point. |
	| `args.yaml` | Architecture and loader metadata (no training schedule; this checkpoint is meant to be consumed, not resumed). |
	| `adapter_config.json` | LoRA target-module stub (rank 8, alpha 16, `q_proj` + `v_proj`). |
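
As a quick sanity check after download, the stub can be compared with the values listed in the table. This is a minimal sketch that assumes `adapter_config.json` uses PEFT's `LoraConfig` key names (`r`, `lora_alpha`, `target_modules`); the helper `check_adapter_stub` is hypothetical and not part of the PHASE-Tree codebase.

```python
import json
from pathlib import Path

# Values stated in the Files table for the released stub.
EXPECTED = {"r": 8, "lora_alpha": 16, "target_modules": {"q_proj", "v_proj"}}


def check_adapter_stub(ckpt_dir):
    """Return a list of mismatches between adapter_config.json and EXPECTED.

    Assumes PEFT-style key names; an empty list means the stub matches.
    """
    config = json.loads((Path(ckpt_dir) / "adapter_config.json").read_text())
    problems = []
    for key in ("r", "lora_alpha"):
        if config.get(key) != EXPECTED[key]:
            problems.append(f"{key}: expected {EXPECTED[key]}, got {config.get(key)}")
    # target_modules may be serialised as a list in any order; compare as sets.
    modules = set(config.get("target_modules") or [])
    if modules != EXPECTED["target_modules"]:
        problems.append(f"target_modules: got {sorted(modules)}")
    return problems
```

If the stub in this release uses different key names, the check will report mismatches even when the values are correct, so treat a non-empty result as a prompt to inspect the file rather than as proof of a problem.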

	## How to load

	```python
	from huggingface_hub import snapshot_download
	from hyper_llm_modulator.hyper_modulator import load_hypermod_checkpoint

	ckpt_dir = snapshot_download("<your-hf-username>/PHASE-Tree-pretrained-hypermod")

	(
	    args, hypermod, base_model, tokenizer,
	    emb_model, emb_tokenizer, task_desc_format_fn, pooling_fn,
	) = load_hypermod_checkpoint(f"{ckpt_dir}/hypermod.pt", device="cuda")
	```

	The loader reads `args.yaml` and `adapter_config.json` from the same directory
	as `hypermod.pt` automatically; you do not need to pass them explicitly. The
	full inference pipeline (profile → embedding → per-layer LoRA → generation)
	lives in the PHASE-Tree codebase.
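
Because the loader resolves its side files relative to `hypermod.pt`, it can be useful to confirm that all three files are present before loading. A minimal pre-flight check using only the standard library might look like the following; `missing_checkpoint_files` is a hypothetical helper, and the file list is taken from the Files table above.

```python
from pathlib import Path

# Files the loader expects side by side, per the Files table.
REQUIRED_FILES = ("hypermod.pt", "args.yaml", "adapter_config.json")


def missing_checkpoint_files(ckpt_dir):
    """Return the required file names that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

One way to use it is to call `missing_checkpoint_files(ckpt_dir)` on the directory returned by `snapshot_download` and raise a `FileNotFoundError` if the result is non-empty, before handing the path to `load_hypermod_checkpoint`.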

	## Architecture

	| Component | Value |
	|-----------|-------|
	| Base model | `Qwen/Qwen2.5-7B-Instruct` |
	| Task encoder | `Qwen/Qwen3-Embedding-4B` |
	| Target modules | `q_proj`, `v_proj` |
	| LoRA rank `r` | 8 |
	| LoRA alpha | 16 |
	| LoRA dropout | 0.05 |
	| Hypernet latent size | 1024 |
	| Hypernet head input size | 2048 |
	| `delta_w` scaling | 100 |
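
To put these numbers in context, the sketch below works out the adapter scaling factor and the size of one generated adapter. It rests on two assumptions not stated in this card: that scaling follows the common LoRA convention `alpha / r`, and that the base model has 28 decoder layers, hidden size 3584, and a 512-dimensional `v_proj` output (4 key/value heads with head dimension 128), as in the published Qwen2.5-7B configuration. Verify both against the model's `config.json` and the PHASE-Tree codebase. How the `delta_w` scaling of 100 enters the computation is not covered here.

```python
# Values from the Architecture table.
LORA_RANK = 8
LORA_ALPHA = 16

# Assumed Qwen2.5-7B dimensions (not stated in this card; check config.json).
NUM_LAYERS = 28
HIDDEN_SIZE = 3584
KV_DIM = 4 * 128  # num_key_value_heads * head_dim

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_DIM),
}


def lora_scaling(alpha, rank):
    """Scaling factor under the common LoRA convention alpha / r."""
    return alpha / rank


def lora_param_count(in_features, out_features, rank):
    """Parameters in one low-rank pair: (in x r) plus (r x out)."""
    return rank * (in_features + out_features)


params_per_layer = sum(
    lora_param_count(d_in, d_out, LORA_RANK) for d_in, d_out in TARGET_SHAPES.values()
)
params_per_adapter = NUM_LAYERS * params_per_layer

print(f"scaling = {lora_scaling(LORA_ALPHA, LORA_RANK)}")
print(f"parameters per layer = {params_per_layer:,}")
print(f"parameters per generated adapter = {params_per_adapter:,}")
```

Under these assumptions each character's adapter comes to roughly 2.5 million parameters, which the hypermod regenerates on demand instead of storing one copy per persona.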

	## Use as warm-start

	SFT runs whose `args.yaml` sets

	```yaml
	init_hypermod_from: phase_tree_models/phase_tree_pretrained/hypermod.pt
	```

	consume this checkpoint as the initial hypernet weights. This is the
	warm-start used by the released anchor SFT run under
	`phase_tree_models/sft/hyper_lora/`.

	## Limitations

	- This is a pretraining checkpoint; downstream SFT is required for
	competitive character-fidelity scores.
	- Persona conditioning is mediated entirely by the profile embedding that the
	task encoder produces; the model has no other persona-control mechanism.
	- Generations may reproduce stylistic biases of the source corpora and are
	intended for research evaluation only.