Improve model card: Add text-generation pipeline tag, transformers library, links, and sample usage for RAPO++ Prompt Rewriter
#1
by nielsr (HF Staff)

README.md CHANGED
---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# RAPO++ Prompt Rewriter: Llama-3.1-8B-Instruct

This repository hosts the **RAPO++** prompt rewriter, an LLM fine-tuned for prompt optimization as described in the paper [RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling](https://huggingface.co/papers/2510.20206).

**RAPO++** is a three-stage framework that enhances text-to-video generation without modifying model architectures. This model is the LLM prompt-rewriting component (based on Llama-3.1-8B-Instruct): it refines short, unstructured user prompts into descriptive prompts aligned with the T2V model's training distribution, improving compositionality and multi-object fidelity in generation.

- **Paper**: [RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling](https://huggingface.co/papers/2510.20206)
- **Project Page**: [https://whynothaha.github.io/RAPO_plus_github/](https://whynothaha.github.io/RAPO_plus_github/)
- **Code**: [https://github.com/Vchitect/RAPO](https://github.com/Vchitect/RAPO)

<p align="center">
  <img src="https://github.com/Vchitect/RAPO/raw/main/assets/overview.png" alt="RAPO++ Overview" width="700">
</p>

## Quick Start (Prompt Rewriting)

You can run this prompt rewriter with the Hugging Face `transformers` library. Its `config.json` identifies it as a standard Llama architecture, so no custom code (`trust_remote_code`) is required.

```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "bingjie/llama3_1_instruct_lora_rewrite"  # Replace with the actual model ID if different

tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,  # use torch.bfloat16 if your GPU supports it
    device_map="auto",
)

# Example: rewriting a user prompt
user_prompt_to_rewrite = "A cat playing with a ball."
messages = [
    {"role": "system", "content": "You are a prompt rewriter. Rewrite the given user prompt to be more descriptive and suitable for text-to-video generation."},
    {"role": "user", "content": f"User prompt: {user_prompt_to_rewrite}"},
]

# Apply the chat template for instruction-tuned models
input_text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,  # the pipeline expects a string
)

# Generate the rewritten prompt
outputs = pipe(
    input_text,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
)

generated_text = outputs[0]["generated_text"]
# Extract the assistant's response
rewritten_prompt = generated_text.split("<|start_header_id|>assistant<|end_header_id|>")[-1].strip()

print(f"\nOriginal Prompt: {user_prompt_to_rewrite}")
print(f"Rewritten Prompt: {rewritten_prompt}")
```
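
A `split`-based extraction like the one above can leak the `<|eot_id|>` end-of-turn marker into the result when the model emits it. The helper below is a minimal sketch of a more robust extraction; `extract_assistant_reply` and the sample transcript are illustrative, and assume the standard Llama-3.1 chat-template tokens:

```python
def extract_assistant_reply(generated_text: str) -> str:
    """Return the assistant's reply from a full Llama-3.1 chat transcript.

    Splits on the assistant header tokens and strips the end-of-turn token,
    so a trailing '<|eot_id|>' does not leak into the rewritten prompt.
    """
    reply = generated_text.split("<|start_header_id|>assistant<|end_header_id|>")[-1]
    return reply.replace("<|eot_id|>", "").strip()


# Illustrative transcript, mimicking the Llama-3.1 chat format
sample = (
    "<|start_header_id|>user<|end_header_id|>\n\nUser prompt: A cat playing with a ball."
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "A fluffy orange cat bats a red yarn ball across a sunlit wooden floor.<|eot_id|>"
)
print(extract_assistant_reply(sample))
# → A fluffy orange cat bats a red yarn ball across a sunlit wooden floor.
```

Alternatively, passing `return_full_text=False` to the pipeline call returns only the newly generated text, removing the need to split off the prompt at all.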

## Citation

If you find our work helpful for your research, please consider citing it:

```bibtex
@article{gao2025rapopp,
  title   = {RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling},
  author  = {Gao, Bingjie and Ma, Qianli and Wu, Xiaoxue and Yang, Shuai and Lan, Guanzhou and Zhao, Haonan and Chen, Jiaxuan and Liu, Qingyang and Qiao, Yu and Chen, Xinyuan and Wang, Yaohui and Niu, Li},
  journal = {arXiv preprint arXiv:2510.20206},
  year    = {2025}
}
```