---
license: mit
tags:
- text-to-video
- prompt-engineering
- video-generation
- llm
- rag
- research
datasets:
- junchenfu/llmpopcorn_prompts
- junchenfu/microlens_rag
pipeline_tag: text-generation
---
# LLMPopcorn Usage Instructions
Welcome to LLMPopcorn! This guide will help you generate video titles and prompts, as well as create AI-generated videos based on those prompts.
## Prerequisites
### Install Required Python Packages
Before running the scripts, install the required Python packages:
```bash
pip install torch transformers diffusers tqdm numpy pandas sentence-transformers faiss-cpu openai huggingface_hub safetensors
```
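To confirm the installation worked, a quick check along these lines may help. It is only a sketch: the helper name `missing_modules` is hypothetical, and the list assumes the usual import names for the packages above (note that `sentence-transformers` is imported as `sentence_transformers` and `faiss-cpu` as `faiss`).

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above. Some differ from
# the pip names (sentence-transformers -> sentence_transformers,
# faiss-cpu -> faiss).
REQUIRED_MODULES = [
    "torch", "transformers", "diffusers", "tqdm", "numpy", "pandas",
    "sentence_transformers", "faiss", "openai", "huggingface_hub",
    "safetensors",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All packages found." if not missing else f"Missing: {missing}")
```

An empty result means every package can be located; it does not verify versions or GPU support.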
### Download the MicroLens Dataset
Download the following files from the [MicroLens dataset](https://github.com/westlake-repl/MicroLens) and place them in the `Microlens/` folder:
| File | Description |
|------|-------------|
| `MicroLens-100k_likes_and_views.txt` | Video engagement stats (tab-separated) |
| `MicroLens-100k_title_en.csv` | Cover image descriptions (comma-separated) |
| `Microlens100K_captions_en.csv` | Video captions in English (tab-separated) |
| `MicroLens-100k_comment_en.txt` | User comments (tab-separated) |
| `tags_to_summary.csv` | Video category tags (comma-separated) |
Your directory structure should look like:
```
LLMPopcorn/
├── Microlens/
│   ├── MicroLens-100k_likes_and_views.txt
│   ├── MicroLens-100k_title_en.csv
│   ├── Microlens100K_captions_en.csv
│   ├── MicroLens-100k_comment_en.txt
│   └── tags_to_summary.csv
├── PE.py
├── pipline.py
└── ...
```
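Before running the scripts, it can be useful to verify that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_microlens_files` is hypothetical, and the file names are copied from the table above.

```python
from pathlib import Path

# File names copied from the table of required MicroLens files.
EXPECTED_FILES = [
    "MicroLens-100k_likes_and_views.txt",
    "MicroLens-100k_title_en.csv",
    "Microlens100K_captions_en.csv",
    "MicroLens-100k_comment_en.txt",
    "tags_to_summary.csv",
]


def missing_microlens_files(folder="Microlens"):
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_microlens_files()
    print("Layout looks complete." if not missing else f"Missing files: {missing}")
```

Run it from the `LLMPopcorn/` directory so that the relative `Microlens/` path resolves.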
## Step 1: Generate Video Titles and Prompts
To generate video titles and prompts, run the `LLMPopcorn.py` script:
```bash
python LLMPopcorn.py
```
To run the RAG-enhanced variant of LLMPopcorn, execute the `PE.py` script:
```bash
python PE.py
```
## Step 2: Generate AI Videos
To create AI-generated videos, execute the `generating_images_videos_three.py` script:
```bash
python generating_images_videos_three.py
```
## Step 3: Clone the Evaluation Code
Clone the MMRA repository and follow its instructions to evaluate the generated videos.
## Tutorial: Using the Prompts Dataset
You can easily download and use the structured prompts directly from Hugging Face:
### 1. Install `datasets`
```bash
pip install datasets
```
### 2. Load the Dataset in Python
```python
from datasets import load_dataset
# Load the LLMPopcorn prompts
dataset = load_dataset("junchenfu/llmpopcorn_prompts")
# Access the data (abstract or concrete)
for item in dataset["train"]:
    print(f"Type: {item['type']}, Prompt: {item['prompt']}")
```
This dataset contains both abstract and concrete prompts, which you can use as input for the video generation scripts in Step 2.
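If you want to handle abstract and concrete prompts separately, one option is to group them by the `type` field. This is a sketch under assumptions: `group_prompts_by_type` is a hypothetical helper, and the exact strings stored in `type` should be checked against the dataset itself.

```python
from collections import defaultdict


def group_prompts_by_type(items):
    """Group prompt strings by their `type` field.

    `items` is any iterable of dicts with `type` and `prompt` keys, such as
    the rows yielded by iterating over dataset["train"].
    """
    grouped = defaultdict(list)
    for item in items:
        grouped[item["type"]].append(item["prompt"])
    return dict(grouped)


if __name__ == "__main__":
    # Made-up rows for illustration only; real rows come from the dataset.
    sample = [
        {"type": "abstract", "prompt": "a video about cooking"},
        {"type": "concrete", "prompt": "a chef flips a pancake in a sunny kitchen"},
    ]
    for kind, prompts in group_prompts_by_type(sample).items():
        print(kind, len(prompts))
```

With the real data, call `group_prompts_by_type(dataset["train"])` after loading the dataset as shown above.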
## RAG Reference Dataset: MicroLens
For the RAG-enhanced pipeline (`PE.py` + `pipline.py`), we provide a pre-processed version of the MicroLens dataset on Hugging Face so you don't need to download and process the raw files manually.
The dataset is available at: [**junchenfu/microlens_rag**](https://huggingface.co/datasets/junchenfu/microlens_rag)
It contains **19,560** video entries across **22 categories** with the following fields:
| Column | Description |
|--------|-------------|
| `video_id` | Unique video identifier |
| `title_en` | Cover image description (used as title) |
| `cover_desc` | Cover image description |
| `caption_en` | Full video caption in English |
| `partition` | Video category (e.g., Anime, Game, Delicacy) |
| `likes` | Number of likes |
| `views` | Number of views |
| `comment_count` | Number of comments (used as popularity signal) |
### Load the RAG Dataset in Python
```python
from datasets import load_dataset
rag_dataset = load_dataset("junchenfu/microlens_rag")
# Access as a pandas DataFrame
df = rag_dataset["train"].to_pandas()
print(df.head())
print(f"Total: {len(df)} videos, {df['partition'].nunique()} categories")
```
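Since `comment_count` serves as the popularity signal, a simple way to pull reference examples is to take the most-commented videos in one category. The sketch below is illustrative only and is not necessarily how `PE.py` performs retrieval; `top_videos` is a hypothetical helper that works on any iterable of row dicts.

```python
import heapq


def top_videos(rows, partition, k=5, key="comment_count"):
    """Return up to `k` rows from `partition`, highest `key` value first."""
    in_category = (row for row in rows if row["partition"] == partition)
    return heapq.nlargest(k, in_category, key=lambda row: row[key])


if __name__ == "__main__":
    # Made-up rows for illustration only; real rows come from the dataset.
    sample = [
        {"video_id": 1, "partition": "Game", "comment_count": 12},
        {"video_id": 2, "partition": "Anime", "comment_count": 40},
        {"video_id": 3, "partition": "Game", "comment_count": 30},
    ]
    for row in top_videos(sample, "Game", k=2):
        print(row["video_id"], row["comment_count"])
```

With the real data, `top_videos(rag_dataset["train"], "Game", k=3)` would return the three most-commented rows in the `Game` category, assuming that category name appears in `partition` as written.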