| | --- |
| | license: cc-by-nc-4.0 |
| | datasets: |
| | - WenhaoWang/VidProM |
| | language: |
| | - en |
| | pipeline_tag: text-to-image |
| | tags: |
| | - text-to-video generation |
| | - VidProM |
| | - Automatical text-to-video prompt |
| | --- |
| | |
| | # The first model for automatic text-to-video prompt completion: given a few words as input, the model generates several complete text-to-video prompts. |
| |
|
| | # Details |
| |
|
| | The model is fine-tuned from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [VidProM](https://huggingface.co/datasets/WenhaoWang/VidProM) dataset, using 8 A100 GPUs. |
| |
|
| | # Usage |
| |
|
| | ## Download the model |
| | ``` |
| | from transformers import pipeline |
| | import torch |
| | pipe = pipeline("text-generation", model="WenhaoWang/AutoT2VPrompt", model_kwargs={"torch_dtype": torch.bfloat16}, device_map="cuda:0") |
| | ``` |
| |
|
| | ## Set the Parameters |
| | ``` |
| | input = "An underwater world" # The input text to complete into text-to-video prompts. |
| | max_length = 50 # The maximum total length in tokens, counting the input text. |
| | temperature = 1.2 # Controls the randomness of sampling. Higher values give more varied outputs. |
| | top_k = 8 # Restricts sampling at each step to the k most likely tokens. |
| | num_return_sequences = 10 # The number of different text-to-video prompts to generate from the same input. |
| | ``` |
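To make the roles of `temperature` and `top_k` concrete, here is a toy sketch of one sampling step in plain Python. It is only an illustration under simplifying assumptions, not the `transformers` implementation: the helper names `top_k_temperature_probs` and `sample_token` and the logit values are made up for this example.

```python
import math
import random

def top_k_temperature_probs(logits, top_k, temperature):
    """Return {token_index: probability} after top-k filtering and temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Divide by the temperature: values above 1 flatten the distribution,
    # values below 1 sharpen it.
    scaled = [logits[i] / temperature for i in kept]
    # Softmax over the surviving tokens, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return {i: w / total for i, w in zip(kept, weights)}

def sample_token(logits, top_k, temperature, rng=random):
    """Draw one token index from the filtered, rescaled distribution."""
    probs = top_k_temperature_probs(logits, top_k, temperature)
    indices = list(probs)
    return rng.choices(indices, weights=[probs[i] for i in indices], k=1)[0]

# Hypothetical logits over a 5-token vocabulary.
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = top_k_temperature_probs(toy_logits, top_k=3, temperature=1.2)
print(probs)
```

With `top_k=3`, only the three most likely tokens can ever be drawn. Raising the temperature moves probability from the most likely token toward the other two.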
| |
|
| | ## Generation |
| | ``` |
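| | # Sample several completions of the input text. |
| | all_prompts = pipe(input, max_length=max_length, do_sample=True, temperature=temperature, top_k=top_k, num_return_sequences=num_return_sequences) |
| | |
| | def process(text): |
| |     """Tidy a raw completion and cut it at the last period.""" |
| |     text = text.replace('\n', '.')  # join multiple lines into one |
| |     text = text.replace(' .', '.')  # drop the space before a period |
| |     end = text.rfind('.') |
| |     if end != -1:  # without this guard, a text with no period would lose its last character |
| |         text = text[:end] |
| |     return text + '.' |
| | |
| | for i in range(num_return_sequences): |
| |     print(process(all_prompts[i]['generated_text'])) |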
| | ``` |
| |
|
| | You will get 10 text-to-video prompts and can pick the one you like best, for example: |
| |
|
| | ``` |
| | An underwater world, 25 ye boy, with aqua-green eyes, dk sandy blond hair, from the back, and on his back a fish, 23 ye old, weing glasses,ctoon chacte. |
| | An underwater world, the video should capture the essence of tranquility and the beauty of nature.. a woman with short hair weing a green dress sitting at the desk. |
| | An underwater world, the ocean is full of discded items, the water flows, and the light penetrating through the water. |
| | An underwater world.. a woman with red eyes and red lips is looking forwd. |
| | An underwater world.. an old man sitting in a chair, smoking a pipe, a little smoke coming out of the chair, a man is drinking a glass. |
| | An underwater world. The ocean is filled with bioluminess as the water reflects a soft glow from a bioluminescent phosphorescent light source. The camera slowly moves away and zooms in.. |
| | An underwater world. the girl looks at the camera and smiles with happiness.. |
| | An underwater world, 1960s horror film.. |
| | An underwater world.. 4 men in 1940s style clothes walk ound a gothic castle. night, fe. A girl is running, and there e some flowers along the river. |
| | An underwater world, -camera pan up . A girl is playing with her cat on a sunny day in the pk. A man is running and then falling down and dying. |
| | ``` |
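Several of the prompts above contain doubled periods, because `process` turns every newline into a period even when the line already ended with one. If that matters for your use case, a tidier post-processing step could look like the sketch below. `process_clean` is a hypothetical variant, not part of the original model card, and it assumes that collapsing any run of periods (including an intentional ellipsis) into a single one is acceptable.

```python
import re

def process_clean(text):
    """Hypothetical variant of process(): same truncation, tidier punctuation."""
    text = text.replace('\n', '. ')               # turn line breaks into sentence breaks
    text = re.sub(r'\s+([.,])', r'\1', text)      # drop whitespace before periods and commas
    text = re.sub(r'\.{2,}', '.', text)           # collapse runs of periods into one
    text = re.sub(r'\s{2,}', ' ', text).strip()   # squeeze repeated whitespace
    end = text.rfind('.')
    if end != -1:                                 # cut the trailing, unfinished sentence
        text = text[:end]
    return text + '.'

raw = "An underwater world.\nthe girl smiles.\nA man is"
print(process_clean(raw))
```

As in the original function, anything after the last period is treated as an unfinished sentence and dropped.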
| |
|
| | # License |
| |
|
| | The model is licensed under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en). |
| |
|
| | # Citation |
| | ``` |
| | @article{wang2024vidprom, |
| |   title={VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models}, |
| |   author={Wang, Wenhao and Yang, Yi}, |
| |   journal={arXiv preprint arXiv:2403.06098}, |
| |   year={2024} |
| | } |
| | ``` |
| |
|
| | # Acknowledgment |
| |
|
| | [Yaowei Zheng](https://github.com/hiyouga) helped with the fine-tuning process. |
| |
|
| | # Contact |
| |
|
| | If you have any questions, feel free to contact [Wenhao Wang](https://wangwenhao0716.github.io) (wangwenhao0716@gmail.com). |