LLMPopcorn Usage Instructions

Welcome to LLMPopcorn! This guide shows how to generate video titles and prompts, and then create AI-generated videos from those prompts.

Prerequisites

Install Required Python Packages

Before running the scripts, install the required Python packages:

pip install torch transformers diffusers tqdm numpy pandas sentence-transformers faiss-cpu openai huggingface_hub safetensors
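
To confirm the installation worked, a quick check like the following can help. This is only a sketch: it tests whether each module can be located, not whether the versions are compatible with the scripts. Note that two import names differ from their pip names (sentence-transformers is imported as sentence_transformers, and faiss-cpu as faiss). The helper name find_missing is illustrative and not part of LLMPopcorn.

```python
import importlib.util

# Import names for the packages installed above. Two differ from the pip
# names: sentence-transformers -> sentence_transformers, faiss-cpu -> faiss.
REQUIRED_MODULES = [
    "torch", "transformers", "diffusers", "tqdm", "numpy", "pandas",
    "sentence_transformers", "faiss", "openai", "huggingface_hub",
    "safetensors",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing()
print("Missing modules:", ", ".join(missing) if missing else "none")
```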

Download the MicroLens Dataset

Download the following files from the MicroLens dataset and place them in the Microlens/ folder:

File                                  Description
MicroLens-100k_likes_and_views.txt    Video engagement stats (tab-separated)
MicroLens-100k_title_en.csv           Cover image descriptions (comma-separated)
Microlens100K_captions_en.csv         Video captions in English (tab-separated)
MicroLens-100k_comment_en.txt         User comments (tab-separated)
tags_to_summary.csv                   Video category tags (comma-separated)
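
Because the delimiter does not always follow from the file extension, it may be worth peeking at the first rows of each file after downloading. The sketch below uses the delimiters exactly as listed above. It assumes the files are UTF-8 encoded, which the source does not state, and the helper name peek is illustrative.

```python
import csv
from itertools import islice
from pathlib import Path

# Delimiters as listed in the table above (file name -> delimiter).
DELIMITERS = {
    "MicroLens-100k_likes_and_views.txt": "\t",
    "MicroLens-100k_title_en.csv": ",",
    "Microlens100K_captions_en.csv": "\t",
    "MicroLens-100k_comment_en.txt": "\t",
    "tags_to_summary.csv": ",",
}


def peek(path, n=3):
    """Return the first n rows of a MicroLens file, split on its listed delimiter."""
    path = Path(path)
    delimiter = DELIMITERS[path.name]
    # Assumption: UTF-8 encoding (not stated in the dataset description).
    with path.open(newline="", encoding="utf-8") as f:
        return list(islice(csv.reader(f, delimiter=delimiter), n))


# Example:
# for row in peek("Microlens/tags_to_summary.csv"):
#     print(row)
```

If every row comes back as a single long field, the delimiter listed for that file is probably not the one actually used.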

Your directory structure should look like:

LLMPopcorn/
├── Microlens/
│   ├── MicroLens-100k_likes_and_views.txt
│   ├── MicroLens-100k_title_en.csv
│   ├── Microlens100K_captions_en.csv
│   ├── MicroLens-100k_comment_en.txt
│   └── tags_to_summary.csv
├── PE.py
├── pipline.py
└── ...
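
A small check along these lines can confirm the layout before running the scripts. It is a sketch that only tests whether the five files exist under Microlens/ and does not validate their contents. The helper name missing_files is illustrative.

```python
from pathlib import Path

# File names exactly as listed in the directory structure above.
EXPECTED_FILES = [
    "MicroLens-100k_likes_and_views.txt",
    "MicroLens-100k_title_en.csv",
    "Microlens100K_captions_en.csv",
    "MicroLens-100k_comment_en.txt",
    "tags_to_summary.csv",
]


def missing_files(root="."):
    """Return the expected MicroLens files not found under <root>/Microlens/."""
    data_dir = Path(root) / "Microlens"
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


# Example (run from the LLMPopcorn/ directory):
# print(missing_files() or "All MicroLens files found")
```

Note the inconsistent capitalization in the file names (Microlens100K versus MicroLens-100k). On case-sensitive file systems the names must match exactly.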

Step 1: Generate Video Titles and Prompts

To generate video titles and prompts, run the LLMPopcorn.py script:

python LLMPopcorn.py

To run the RAG-enhanced variant of LLMPopcorn, execute the PE.py script:

python PE.py

Step 2: Generate AI Videos

To create AI-generated videos, execute the generating_images_videos_three.py script:

python generating_images_videos_three.py
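
If you prefer to run the steps from a single Python entry point, something like the following would work. It is a sketch that assumes the scripts take no command-line arguments, as in the commands shown above, and that you run it from the LLMPopcorn/ directory. The function names are illustrative and not part of the repository.

```python
import subprocess
import sys


def build_commands(scripts):
    """Build one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_pipeline(scripts):
    """Run each script in order; stop at the first one that exits with an error."""
    for command in build_commands(scripts):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


# Example (run from the LLMPopcorn/ directory):
# run_pipeline(["LLMPopcorn.py", "generating_images_videos_three.py"])
```

Using sys.executable keeps every step in the same environment where the packages were installed.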

Step 3: Clone the Evaluation Code

Clone the MMRA repository, then follow its instructions to evaluate the generated videos.

Tutorial: Using the Prompts Dataset

You can easily download and use the structured prompts directly from Hugging Face:

1. Install datasets

pip install datasets

2. Load the Dataset in Python

from datasets import load_dataset

# Load the LLMPopcorn prompts
dataset = load_dataset("junchenfu/llmpopcorn_prompts")

# Access the data (abstract or concrete)
for item in dataset["train"]:
    print(f"Type: {item['type']}, Prompt: {item['prompt']}")

This dataset contains both abstract and concrete prompts, which you can use as input for the video generation scripts in Step 2.
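
To work with one kind of prompt at a time, the items can be grouped by their type field. The sketch below groups on whatever values the field actually holds, so it makes no assumption about the exact label strings. The helper name group_prompts_by_type is illustrative.

```python
from collections import defaultdict


def group_prompts_by_type(items):
    """Group prompt strings by the value of each item's 'type' field."""
    groups = defaultdict(list)
    for item in items:
        groups[item["type"]].append(item["prompt"])
    return dict(groups)


# Usage with the dataset loaded above:
# groups = group_prompts_by_type(dataset["train"])
# for prompt_type, prompts in groups.items():
#     print(prompt_type, len(prompts))
```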

RAG Reference Dataset: MicroLens

For the RAG-enhanced pipeline (PE.py + pipline.py), we provide a pre-processed version of the MicroLens dataset on Hugging Face so you don't need to download and process the raw files manually.

The dataset is available at: junchenfu/microlens_rag

It contains 19,560 video entries across 22 categories with the following fields:

Column           Description
video_id         Unique video identifier
title_en         Cover image description (used as title)
cover_desc       Cover image description
caption_en       Full video caption in English
partition        Video category (e.g., Anime, Game, Delicacy)
likes            Number of likes
views            Number of views
comment_count    Number of comments (used as popularity signal)

Load the RAG Dataset in Python

from datasets import load_dataset

rag_dataset = load_dataset("junchenfu/microlens_rag")

# Access as a pandas DataFrame
df = rag_dataset["train"].to_pandas()
print(df.head())
print(f"Total: {len(df)} videos, {df['partition'].nunique()} categories")
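
Since comment_count is described as the popularity signal, one simple way to explore the data is to pull the most-commented videos in a category. The sketch below does this with pandas. It is an illustration only and not necessarily how PE.py or pipline.py select their references. The helper name top_videos is hypothetical.

```python
import pandas as pd


def top_videos(df, partition, k=5):
    """Return the k rows of one category with the highest comment_count."""
    in_category = df[df["partition"] == partition]
    # nlargest sorts in descending order of the given column.
    return in_category.nlargest(k, "comment_count")


# Usage with the DataFrame loaded above:
# print(top_videos(df, "Game", k=3)[["video_id", "title_en", "comment_count"]])
```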