---
license: mit
tags:
- text-to-video
- prompt-engineering
- video-generation
- llm
- rag
- research
datasets:
- junchenfu/llmpopcorn_prompts
- junchenfu/microlens_rag
pipeline_tag: text-generation
---
# LLMPopcorn Usage Instructions
Welcome to LLMPopcorn! This guide will help you generate video titles and prompts, as well as create AI-generated videos based on those prompts.
## Prerequisites
### Install Required Python Packages
Before running the scripts, ensure that you have installed the necessary Python packages. You can do this by executing the following command:
```bash
pip install torch transformers diffusers tqdm numpy pandas sentence-transformers faiss-cpu openai huggingface_hub safetensors
```
### Download the MicroLens Dataset
Download the following files from the [MicroLens dataset](https://github.com/westlake-repl/MicroLens) and place them in the `Microlens/` folder:
| File | Description |
|------|-------------|
| `MicroLens-100k_likes_and_views.txt` | Video engagement stats (tab-separated) |
| `MicroLens-100k_title_en.csv` | Cover image descriptions (comma-separated) |
| `Microlens100K_captions_en.csv` | Video captions in English (tab-separated) |
| `MicroLens-100k_comment_en.txt` | User comments (tab-separated) |
| `tags_to_summary.csv` | Video category tags (comma-separated) |
Your directory structure should look like:
```
LLMPopcorn/
├── Microlens/
│   ├── MicroLens-100k_likes_and_views.txt
│   ├── MicroLens-100k_title_en.csv
│   ├── Microlens100K_captions_en.csv
│   ├── MicroLens-100k_comment_en.txt
│   └── tags_to_summary.csv
├── PE.py
├── pipline.py
└── ...
```
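Because the raw files mix tab- and comma-separated formats, it is worth sanity-checking the download before running anything. A minimal sketch (the helper name is ours, and the per-file delimiters follow the table above; re-check them against the actual downloads):

```python
from pathlib import Path
import pandas as pd

# Expected files and their delimiters, as listed in the table above.
MICROLENS_FILES = {
    "MicroLens-100k_likes_and_views.txt": "\t",
    "MicroLens-100k_title_en.csv": ",",
    "Microlens100K_captions_en.csv": "\t",
    "MicroLens-100k_comment_en.txt": "\t",
    "tags_to_summary.csv": ",",
}

def check_microlens_dir(root="Microlens"):
    """Report which expected files are present and how many rows each has."""
    root = Path(root)
    status = {}
    for name, sep in MICROLENS_FILES.items():
        path = root / name
        if not path.exists():
            status[name] = "missing"
            continue
        df = pd.read_csv(path, sep=sep)
        status[name] = f"{len(df)} rows"
    return status
```

Running `check_microlens_dir()` before Step 1 surfaces missing or malformed files early, instead of mid-pipeline.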
## Step 1: Generate Video Titles and Prompts
To generate video titles and prompts, run the `LLMPopcorn.py` script:
```bash
python LLMPopcorn.py
```
To run the RAG-enhanced variant of LLMPopcorn, execute the `PE.py` script:
```bash
python PE.py
```
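`PE.py`'s RAG step is not reproduced here, but the core retrieval idea can be sketched generically: embed the reference texts once, then rank them by cosine similarity against a query. In this sketch `embed` is a placeholder for whatever encoder is used (e.g. a sentence-transformers model from the pip list above):

```python
import numpy as np

def build_retriever(texts, embed):
    """Build a top-k cosine-similarity retriever over `texts`.

    `embed` is any callable mapping list[str] -> np.ndarray of shape (n, d).
    """
    vecs = embed(texts)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize rows

    def retrieve(query, k=3):
        q = embed([query])[0]
        q = q / np.linalg.norm(q)
        scores = vecs @ q                      # cosine similarity via dot product
        idx = np.argsort(scores)[::-1][:k]     # indices of the k best matches
        return [(texts[i], float(scores[i])) for i in idx]

    return retrieve
```

For a FAISS-backed version (as the `faiss-cpu` dependency suggests), the normalized vectors would go into an `IndexFlatIP` instead of the in-memory matrix.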
## Step 2: Generate AI Videos
To create AI-generated videos, execute the `generating_images_videos_three.py` script:
```bash
python generating_images_videos_three.py
```
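The script's internals are not shown here; as a rough sketch of what a diffusers text-to-video call looks like, assuming the ModelScope checkpoint `damo-vilab/text-to-video-ms-1.7b` (an assumption; substitute whatever model the script actually loads):

```python
def generate_video(prompt, out_path="out.mp4"):
    """Hypothetical sketch: render one prompt to a short clip with diffusers.

    The model name and parameters here are assumptions, not necessarily
    what generating_images_videos_three.py uses. Imports are deferred so
    the sketch can be read without diffusers installed.
    """
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    pipe = DiffusionPipeline.from_pretrained(
        "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # text-to-video generation is impractical on CPU
    frames = pipe(prompt, num_inference_steps=25).frames[0]
    return export_to_video(frames, out_path)
```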
## Step 3: Clone the Evaluation Code
Clone the MMRA repository, then follow its instructions to evaluate the generated videos.
## Tutorial: Using the Prompts Dataset
You can easily download and use the structured prompts directly from Hugging Face:
### 1. Install `datasets`
```bash
pip install datasets
```
### 2. Load the Dataset in Python
```python
from datasets import load_dataset
# Load the LLMPopcorn prompts
dataset = load_dataset("junchenfu/llmpopcorn_prompts")
# Access the data (abstract or concrete)
for item in dataset["train"]:
    print(f"Type: {item['type']}, Prompt: {item['prompt']}")
```
This dataset contains both abstract and concrete prompts, which you can use as input for the video generation scripts in Step 2.
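Since each item carries a `type` field, the prompts can be split into their abstract and concrete groups before feeding them to Step 2. A minimal sketch that groups any iterable of such records by `type`:

```python
def split_by_type(items):
    """Group prompt records by their 'type' field.

    Works on any iterable of dicts with 'type' and 'prompt' keys,
    e.g. dataset["train"] from the snippet above.
    """
    groups = {}
    for item in items:
        groups.setdefault(item["type"], []).append(item["prompt"])
    return groups
```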
## RAG Reference Dataset: MicroLens
For the RAG-enhanced pipeline (`PE.py` + `pipline.py`), we provide a pre-processed version of the MicroLens dataset on Hugging Face so you don't need to download and process the raw files manually.
The dataset is available at: [**junchenfu/microlens_rag**](https://huggingface.co/datasets/junchenfu/microlens_rag)
It contains **19,560** video entries across **22 categories** with the following fields:
| Column | Description |
|--------|-------------|
| `video_id` | Unique video identifier |
| `title_en` | Video title in English |
| `cover_desc` | Cover image description |
| `caption_en` | Full video caption in English |
| `partition` | Video category (e.g., Anime, Game, Delicacy) |
| `likes` | Number of likes |
| `views` | Number of views |
| `comment_count` | Number of comments (used as popularity signal) |
### Load the RAG Dataset in Python
```python
from datasets import load_dataset
rag_dataset = load_dataset("junchenfu/microlens_rag")
# Access as a pandas DataFrame
df = rag_dataset["train"].to_pandas()
print(df.head())
print(f"Total: {len(df)} videos, {df['partition'].nunique()} categories")
```
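Because `comment_count` serves as the popularity signal, a natural first query against this DataFrame is "the most popular videos in a category". A small sketch (column names follow the table above):

```python
import pandas as pd

def top_popular(df, partition, k=3):
    """Return the k most-commented videos in one category.

    Uses comment_count as the popularity signal, per the field
    description above.
    """
    sub = df[df["partition"] == partition]
    return sub.nlargest(k, "comment_count")[["video_id", "title_en", "comment_count"]]
```

For example, `top_popular(df, "Game")` ranks the Game-category entries by comment count, which is a reasonable starting point for selecting RAG reference examples.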