eeshaAI commited on
Commit
b0ebb4e
Β·
verified Β·
1 Parent(s): ff28908

Fix README YAML

Browse files
Files changed (1) hide show
  1. README.md +9 -17
README.md CHANGED
@@ -8,28 +8,20 @@ sdk_version: 5.31.0
8
  python_version: '3.11'
9
  app_file: app.py
10
  pinned: false
11
- short_description: LoRA fine-tune OLMo 2 1B for video token generation
12
  ---
13
 
14
- # Zeeb β€” Video-LLM Trainer
15
 
16
- Fine-tune **OLMo 2 1B Instruct** with **LoRA (r=4)** to generate video tokens using visual tokenization.
17
 
18
  ## Pipeline
19
  ```
20
- Text Prompt β†’ LLM (OLMo 2 1B + LoRA) β†’ Visual Tokens β†’ VQ-VAE Decoder β†’ Video
21
  ```
22
 
23
- ## How It Works
24
- 1. Click **"Start Training"** to begin
25
- 2. The model downloads OLMo 2 1B Instruct from HuggingFace
26
- 3. Expands vocabulary with 1,024 visual tokens
27
- 4. Applies LoRA (r=4) for memory-efficient fine-tuning
28
- 5. Trains for 3 epochs on tokenized video data
29
- 6. Merges LoRA weights and pushes to [EeshaAI/zeeb](https://huggingface.co/EeshaAI/zeeb)
30
-
31
- ## Files
32
- - `app.py` β€” Gradio training interface
33
- - `train_on_hf_spaces.py` β€” Training logic (OLMo 2 1B + LoRA)
34
- - `tokenized_dataset.json` β€” Tokenized video-text training data
35
- - `requirements.txt` β€” Python dependencies
 
8
  python_version: '3.11'
9
  app_file: app.py
10
  pinned: false
11
+ short_description: "Video-LLM - OLMo 2 + LoRA + VQ-VAE text-to-video"
12
  ---
13
 
14
+ # Zeeb β€” Video-LLM
15
 
16
+ Text-to-Video generation using **OLMo 2 1B Instruct** + **LoRA** + **VQ-VAE**.
17
 
18
  ## Pipeline
19
  ```
20
+ Text Prompt β†’ LLM (constrained decoding) β†’ Visual Tokens β†’ VQ-VAE Decoder β†’ Video
21
  ```
22
 
23
+ ## Training Pipeline
24
+ 1. Train VQ-VAE on 50K COCO images (real photos)
25
+ 2. Tokenize 10K OpenVid-1M clips through VQ-VAE
26
+ 3. Fine-tune OLMo 2 1B + LoRA on tokenized data
27
+ 4. Push trained model to EeshaAI/zeeb