---
title: Zeeb
emoji: 🎬
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.31.0
python_version: '3.11'
app_file: app.py
pinned: false
short_description: "Video-LLM - OLMo 2 + LoRA + VQ-VAE text-to-video"
---

# Zeeb - Video-LLM

Text-to-video generation using **OLMo 2 1B Instruct** + **LoRA** + **VQ-VAE**.

## Pipeline
```
Text Prompt → LLM (constrained decoding) → Visual Tokens → VQ-VAE Decoder → Video
```
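
The "constrained decoding" stage can be read as masking the LLM's next-token logits so that only visual-token IDs remain selectable. The following is a minimal sketch of that idea, not the Zeeb implementation: it assumes visual tokens occupy a contiguous ID range appended after the text vocabulary, and the sizes and helper names (`TEXT_VOCAB_SIZE`, `NUM_VISUAL_CODES`, `mask_to_visual`, `greedy_visual_token`, `to_codebook_index`) are hypothetical.

```python
import math
import random

# Hypothetical sizes for illustration only; the real values are not stated here.
TEXT_VOCAB_SIZE = 8      # IDs 0..7 stand in for ordinary text tokens
NUM_VISUAL_CODES = 4     # IDs 8..11 stand in for visual tokens (one per VQ-VAE code)


def mask_to_visual(logits: list[float]) -> list[float]:
    """Set every non-visual logit to -inf so it can never be selected."""
    return [
        logit if token_id >= TEXT_VOCAB_SIZE else -math.inf
        for token_id, logit in enumerate(logits)
    ]


def greedy_visual_token(logits: list[float]) -> int:
    """Pick the highest-scoring token among the allowed (visual) IDs."""
    masked = mask_to_visual(logits)
    return max(range(len(masked)), key=masked.__getitem__)


def to_codebook_index(token_id: int) -> int:
    """Map an LLM token ID back to a VQ-VAE codebook index (assumed offset layout)."""
    return token_id - TEXT_VOCAB_SIZE


# Stand-in for one decoding step: random logits over the full vocabulary.
rng = random.Random(0)
logits = [rng.uniform(-1.0, 1.0) for _ in range(TEXT_VOCAB_SIZE + NUM_VISUAL_CODES)]
logits[3] = 10.0  # a text token scores highest, but the mask rules it out

token_id = greedy_visual_token(logits)
code_index = to_codebook_index(token_id)
```

Because every text logit is replaced by negative infinity, the selected ID always falls in the visual range, so the sequence handed to the VQ-VAE decoder contains only valid codebook indices. With Hugging Face `transformers`, a mask like this is commonly applied through a logits processor passed to `generate`; whether Zeeb does it that way is an assumption.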

## Training Pipeline
1. Train the VQ-VAE on 50K COCO images (real photos)
2. Tokenize 10K OpenVid-1M clips with the trained VQ-VAE
3. Fine-tune OLMo 2 1B Instruct with LoRA on the tokenized data
4. Push the trained model to EeshaAI/zeeb
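
The tokenization in step 2 rests on the standard VQ-VAE quantization step: each encoder output vector is replaced by the index of its nearest codebook entry, and those indices become the visual tokens the LLM is trained on. The sketch below illustrates only that lookup with a toy 2-D codebook. The codebook values, the latent vectors, and the function names (`quantize`, `decode_lookup`) are hypothetical and do not come from the Zeeb code.

```python
# Toy codebook: 4 codes, each a 2-D vector. A real VQ-VAE learns these in training.
CODEBOOK = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents):
    """Return, for each latent vector, the index of the nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda k: squared_distance(z, CODEBOOK[k]))
        for z in latents
    ]


def decode_lookup(indices):
    """Look up the codebook vectors that a decoder would receive as input."""
    return [CODEBOOK[k] for k in indices]


# Stand-in for encoder outputs of one frame (a real encoder yields a grid of these).
latents = [(0.1, -0.2), (0.9, 0.2), (0.2, 0.8), (0.7, 0.9)]
tokens = quantize(latents)
print(tokens)  # [0, 1, 2, 3]
```

At generation time the lookup runs in the opposite direction: the indices produced by the LLM are mapped back to codebook vectors (as in `decode_lookup`) before the VQ-VAE decoder turns them into pixels.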