Spaces: eeshaAI/Zeeb

eeshaAI committed 20 days ago

Commit b0ebb4e (verified) · 1 parent: ff28908

Fix README YAML

Files changed (1)

README.md CHANGED

@@ -8,28 +8,20 @@ sdk_version: 5.31.0
 python_version: '3.11'
 app_file: app.py
 pinned: false
-short_description: LoRA fine-tune OLMo 2 1B for video token generation
+short_description: "Video-LLM - OLMo 2 + LoRA + VQ-VAE text-to-video"
 ---
-# Zeeb — Video-LLM Trainer
-Fine-tune **OLMo 2 1B Instruct** with **LoRA (r=4)** to generate video tokens using visual tokenization.
+# Zeeb — Video-LLM
+Text-to-Video generation using **OLMo 2 1B Instruct** + **LoRA** + **VQ-VAE**.
 ## Pipeline
 ```
-Text Prompt → LLM (OLMo 2 1B + LoRA) → Visual Tokens → VQ-VAE Decoder → Video
+Text Prompt → LLM (constrained decoding) → Visual Tokens → VQ-VAE Decoder → Video
 ```
-## How It Works
-1. Click **"Start Training"** to begin
-2. The model downloads OLMo 2 1B Instruct from HuggingFace
-3. Expands vocabulary with 1,024 visual tokens
-4. Applies LoRA (r=4) for memory-efficient fine-tuning
-5. Trains for 3 epochs on tokenized video data
-6. Merges LoRA weights and pushes to [EeshaAI/zeeb](https://huggingface.co/EeshaAI/zeeb)
-## Files
-- `app.py` — Gradio training interface
-- `train_on_hf_spaces.py` — Training logic (OLMo 2 1B + LoRA)
-- `tokenized_dataset.json` — Tokenized video-text training data
-- `requirements.txt` — Python dependencies
+## Training Pipeline
+1. Train VQ-VAE on 50K COCO images (real photos)
+2. Tokenize 10K OpenVid-1M clips through VQ-VAE
+3. Fine-tune OLMo 2 1B + LoRA on tokenized data
+4. Push trained model to EeshaAI/zeeb
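
Note on "constrained decoding": the new pipeline line names it, but this commit does not show how the Space implements it. One common reading is that, at each generation step, the LLM's logits are masked so that only visual-token IDs can be selected. The sketch below illustrates that idea in plain Python. The vocabulary layout (text IDs first, visual IDs appended at the end), the toy sizes, and all function names are assumptions made for illustration, not code from this repository.

```python
import math

# Assumed layout: text token IDs occupy [0, TEXT_VOCAB_SIZE), and visual
# token IDs are appended after them. Sizes are toy values for illustration;
# the previous README version mentioned 1,024 visual tokens.
TEXT_VOCAB_SIZE = 8
NUM_VISUAL_TOKENS = 4


def mask_to_visual_tokens(logits, text_vocab_size=TEXT_VOCAB_SIZE):
    """Return a copy of `logits` with every non-visual token set to -inf."""
    return [
        logit if token_id >= text_vocab_size else -math.inf
        for token_id, logit in enumerate(logits)
    ]


def greedy_visual_token(logits, text_vocab_size=TEXT_VOCAB_SIZE):
    """Greedy step: pick the highest-scoring token among visual tokens only."""
    masked = mask_to_visual_tokens(logits, text_vocab_size)
    return max(range(len(masked)), key=masked.__getitem__)


def to_codebook_index(token_id, text_vocab_size=TEXT_VOCAB_SIZE):
    """Map a visual token ID back to a (hypothetical) VQ-VAE codebook index."""
    if token_id < text_vocab_size:
        raise ValueError("token_id is a text token, not a visual token")
    return token_id - text_vocab_size


if __name__ == "__main__":
    # One decoding step: the best token overall (ID 0) is a text token,
    # so the constrained choice falls on the best visual token instead.
    step_logits = [9.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 1.0, 2.5, 3.5, 0.0]
    token_id = greedy_visual_token(step_logits)
    print(token_id, to_codebook_index(token_id))
```

With a real model, the same mask would typically be applied to the logits tensor inside the generation loop before sampling, and the resulting codebook indices would then be passed to the VQ-VAE decoder.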