nielsr (HF Staff) committed
Commit bf98a9d · verified · 1 Parent(s): b9012e6

Update model card with paper, project, and code links


This PR improves the documentation of the ScriptAgent model by:
- Linking the research paper: [The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation](https://huggingface.co/papers/2601.17737).
- Adding the project page and GitHub repository links for easier access to resources.
- Updating the metadata with the appropriate `arxiv` ID.
- Fixing the syntax highlighting for the `ms-swift` inference code snippet.
- Adding a citation section.

Files changed (1)
  1. README.md +30 -14
README.md CHANGED
@@ -6,33 +6,38 @@ tags:
 - base_model:adapter:XD-MU/ScriptAgent
 - lora
 - transformers
+arxiv: 2601.17737
 ---
 
-
 # ScriptAgent: Dialogue-to-Shooting-Script Generation Model
 
-This model is a fine-tuned adapter (LoRA) on top of the `XD-MU/ScriptAgent` base model, designed to **generate detailed shooting scripts from dialogue inputs**. It is trained to transform conversational text into structured screenplay formats suitable for film or video production.
+This model is a fine-tuned adapter (LoRA) designed to **generate detailed shooting scripts from dialogue inputs**. It is the implementation of **ScripterAgent** as described in the paper: [The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation](https://huggingface.co/papers/2601.17737).
+
+[**Project Page**](https://xd-mu.github.io/ScriptIsAllYouNeed/) | [**Code**](https://github.com/Tencent/digitalhuman/tree/main/ScriptAgent) | [**Demo**](https://huggingface.co/spaces/XD-MU/ScriptAgent)
+
+## Model Description
+ScriptAgent transforms conversational text (coarse dialogue) into structured, fine-grained, and executable cinematic scripts. It bridges the "semantic gap" between a creative idea and its cinematic execution, providing necessary context for video generation models, including character descriptions, scene settings, positions, and dialogue cues.
 
 The model is compatible with [ms-swift](https://github.com/modelscope/swift) and supports efficient inference via the **vLLM backend**.
 
-> 💡 Note: This repository contains a **PEFT adapter** (e.g., LoRA). To use it, you must merge it with the original base model or load it via `ms-swift`.
+> 💡 Note: This repository contains a **PEFT adapter** (LoRA). To use it, you must merge it with the original base model or load it via `ms-swift`.
 
 ## ▶️ Inference with ms-swift (vLLM Backend)
 
-To generate shooting scripts from dialogue inputs, use the following command with **ms-swift**:
+To generate shooting scripts from dialogue inputs, use the following snippet with **ms-swift**. You can find **DialoguePrompts** [here](https://huggingface.co/datasets/XD-MU/DialoguePrompts).
 
-You can find **DialoguePrompts** here: https://huggingface.co/datasets/XD-MU/DialoguePrompts
-```bash
+```python
 import os
 from huggingface_hub import snapshot_download
+from swift.llm import PtEngine, RequestConfig, InferRequest
 
 os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
 model_name = "XD-MU/ScriptAgent"
 local_path = "./models/ScriptAgent"
 
-# 下载整个仓库的所有文件
-print("下载模型所有文件...")
+# Download the model files
+print("Downloading model...")
 snapshot_download(
 repo_id=model_name,
 local_dir=local_path,
@@ -40,20 +45,31 @@ snapshot_download(
 resume_download=True
 )
 
-print(f"模型已完整下载到: {local_path}")
-
-# 使用 SWIFT 加载
-from swift.llm import PtEngine, RequestConfig, InferRequest
-
+# Load using SWIFT
 engine = PtEngine(local_path, max_batch_size=1)
 request_config = RequestConfig(max_tokens=8192, temperature=0.7)
 
 infer_request = InferRequest(messages=[
-{"role": "user", "content": "你的对话上下文(Your Dialogue)"}
+{"role": "user", "content": "Your Dialogue Here"}
 ])
 response = engine.infer([infer_request], request_config)[0]
 
 print(response.choices[0].message.content)
 ```
 
+## Citation
+
+If you find this work useful, please cite:
+
+```bibtex
+@article{directing2026,
+title={The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation},
+author={Mu, Chenyu and He, Xin and Yang, Qu and Chen, Wanshun and Yao, Jiadi and Liu, Huang and Yi, Zihao and Zhao, Bo and Chen, Xingyu and Ma, Ruotian and others},
+journal={arXiv preprint arXiv:2601.17737},
+year={2026}
+}
+```
 
+## Acknowledgments
+- Thanks to [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for the SFT training framework.
+- Thanks to [ms-swift](https://github.com/modelscope/ms-swift) for the GRPO training framework.
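
The model card's note says the PEFT adapter must be merged with the base model for standalone use. Merging a LoRA adapter just folds the scaled low-rank update into each frozen base weight, W' = W + (alpha/r)·B·A, which is what `peft`'s `merge_and_unload()` computes per layer. A toy NumPy sketch of that arithmetic — the dimensions, rank, and scaling below are illustrative, not taken from this repository:

```python
import numpy as np

# Toy LoRA merge: fold the low-rank update (alpha/r) * B @ A into the
# frozen base weight W. Shapes/rank/alpha here are illustrative only.
rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4
W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # LoRA down-projection
B = rng.standard_normal((d, r)) * 0.01   # LoRA up-projection

W_merged = W + (alpha / r) * (B @ A)     # adapter "baked in" to the weight

# Merged weight reproduces base + adapter applied separately.
x = rng.standard_normal(d)
y_split = W @ x + (alpha / r) * (B @ (A @ x))
print(np.allclose(W_merged @ x, y_split))  # True
```

After a real merge the result is an ordinary checkpoint, so it can be loaded without `peft` or `ms-swift`; the trade-off is that the adapter can no longer be swapped or disabled at runtime.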