
	---
	base_model: XD-MU/ScriptAgent
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- base_model:adapter:XD-MU/ScriptAgent
	- lora
	- transformers
	arxiv: 2601.17737
	---

	# ScriptAgent: Dialogue-to-Shooting-Script Generation Model

	This model is a LoRA adapter fine-tuned to generate detailed shooting scripts from dialogue inputs. It implements ScripterAgent, as described in the paper [The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation](https://huggingface.co/papers/2601.17737).

	[Project Page](https://xd-mu.github.io/ScriptIsAllYouNeed/) \| [Code](https://github.com/Tencent/digitalhuman/tree/main/ScriptAgent) \| [Demo](https://huggingface.co/spaces/XD-MU/ScriptAgent)

	## Model Description
	ScriptAgent transforms conversational text (coarse dialogue) into structured, fine-grained, executable cinematic scripts. It bridges the "semantic gap" between a creative idea and its cinematic execution by supplying the context that video generation models need, including character descriptions, scene settings, character positions, and dialogue cues.

	The model is compatible with [ms-swift](https://github.com/modelscope/ms-swift), which supports efficient inference through its vLLM backend.

	> 💡 Note: This repository contains a PEFT (LoRA) adapter rather than full model weights. To use it, either merge it into the original base model or load it through `ms-swift`.
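
If you prefer plain `transformers` over ms-swift, the adapter can be merged with PEFT. Merging adds the scaled low-rank update into each adapted weight matrix (W ← W + (α/r)·BA), so the result loads like an ordinary checkpoint. This is a minimal sketch under assumptions: the function name `merge_lora_adapter` is hypothetical, the base model is assumed to be a causal LM whose ID is recorded in the adapter's `adapter_config.json` (`base_model_name_or_path`), and the base model's tokenizer is assumed to be the right one to save with the merged weights.

```python
def merge_lora_adapter(adapter_path: str, output_dir: str) -> str:
    """Fold LoRA weights into the base model and save a standalone checkpoint (hypothetical helper)."""
    # Imported inside the function so this module loads even without peft/transformers.
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # adapter_config.json records the base model the adapter was trained on.
    base_id = PeftConfig.from_pretrained(adapter_path).base_model_name_or_path
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

    # Attach the adapter, then add its low-rank update into the base weights.
    model = PeftModel.from_pretrained(base, adapter_path)
    merged = model.merge_and_unload()

    merged.save_pretrained(output_dir)
    # Assumption: the base model's tokenizer is the one to ship with the merged model.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

For example, after downloading the adapter you could call `merge_lora_adapter("./models/ScriptAgent", "./models/ScriptAgent-merged")`. If `base_model_name_or_path` points to a path that only existed on the training machine, substitute the correct base checkpoint before loading.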

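	## ▶️ Inference with ms-swift

	To generate shooting scripts from dialogue inputs, use the snippet below. It runs on ms-swift's PyTorch engine (`PtEngine`); ms-swift also provides `VllmEngine` for vLLM-backed inference. Example dialogue inputs are available in the [DialoguePrompts](https://huggingface.co/datasets/XD-MU/DialoguePrompts) dataset.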

	```python
	import os

	# Select the GPU before importing libraries that may initialize CUDA
	os.environ['CUDA_VISIBLE_DEVICES'] = '0'

	from huggingface_hub import snapshot_download
	from swift.llm import PtEngine, RequestConfig, InferRequest

	model_name = "XD-MU/ScriptAgent"
	local_path = "./models/ScriptAgent"

	# Download the adapter files (interrupted downloads resume automatically)
	print("Downloading model...")
	snapshot_download(repo_id=model_name, local_dir=local_path)

	# Load with ms-swift's PyTorch inference engine
	engine = PtEngine(local_path, max_batch_size=1)
	request_config = RequestConfig(max_tokens=8192, temperature=0.7)

	# Replace the placeholder with the dialogue to convert into a shooting script
	infer_request = InferRequest(messages=[
	    {"role": "user", "content": "Your Dialogue Here"}
	])
	response = engine.infer([infer_request], request_config)[0]

	print(response.choices[0].message.content)
	```
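
The snippet passes the whole dialogue as a single user message. For multi-speaker input, a small helper can assemble that message from speaker/line pairs. This is a hypothetical sketch: the `Speaker: line` layout and the names `dialogue_to_text` and `build_messages` are assumptions rather than a documented input format, so check the DialoguePrompts dataset for the layout the model was actually trained on.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical turn format: (speaker, line). The "Speaker: line" layout below is an
# assumption; the card does not document the exact prompt format used in training.
Turn = Tuple[str, str]


def dialogue_to_text(turns: Sequence[Turn]) -> str:
    """Join (speaker, line) pairs into one 'Speaker: line' block, one turn per line."""
    rendered = []
    for speaker, line in turns:
        speaker, line = speaker.strip(), line.strip()
        if not speaker or not line:
            raise ValueError("each turn needs a non-empty speaker and line")
        rendered.append(f"{speaker}: {line}")
    return "\n".join(rendered)


def build_messages(turns: Sequence[Turn]) -> List[Dict[str, str]]:
    """Wrap the dialogue in the single user message used by InferRequest(messages=...)."""
    return [{"role": "user", "content": dialogue_to_text(turns)}]


if __name__ == "__main__":
    demo = [("Anna", "Did you lock the door?"), ("Ben", "I thought you did.")]
    print(build_messages(demo))
```

The result can be passed as `InferRequest(messages=build_messages(turns))`; to process several dialogues, pass a list of such requests to `engine.infer`.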

	## Citation

	If you find this work useful, please cite:

	```bibtex
	@article{directing2026,
	title={The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation},
	author={Mu, Chenyu and He, Xin and Yang, Qu and Chen, Wanshun and Yao, Jiadi and Liu, Huang and Yi, Zihao and Zhao, Bo and Chen, Xingyu and Ma, Ruotian and others},
	journal={arXiv preprint arXiv:2601.17737},
	year={2026}
	}
	```

	## Acknowledgments
	- Thanks to [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for the SFT training framework.
	- Thanks to [ms-swift](https://github.com/modelscope/ms-swift) for the GRPO training framework.