short_description: generate embeddings of youtube video text and image data
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Hugging Face Space for multimodal embeddings

Files:
- `app.py`
- `requirements.txt`
- `finetuned_multimodal.pt` (upload your ~3 GB checkpoint here)

How to deploy:
1. Create a new Space on Hugging Face (https://huggingface.co/spaces).
   - SDK: Gradio
   - Hardware: if you need GPU inference, switch the Space to a GPU runtime (GPU hardware may require a paid plan).
2. Upload these files to the Space repository. For the ~3 GB checkpoint, use Git LFS or the web UI file upload (the web UI supports large uploads).
3. Wait for the Space to build. The app then serves:
   - the Gradio web UI at `/`
   - an API endpoint at `/api/get_embedding`
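
Steps 1 and 2 can also be scripted. The sketch below uses the `huggingface_hub` library (`HfApi.create_repo` and `HfApi.upload_file`) as an alternative to Git LFS or the web UI. It assumes you are already authenticated (for example via `huggingface-cli login`) or pass a token, and the repo id you supply is your own placeholder.

```python
# Sketch: create a Gradio Space and upload the README's three files with
# huggingface_hub (an alternative to Git LFS or the web UI).
# Assumptions: you are authenticated or pass `token`, and the files live in `folder`.
from pathlib import Path
from typing import List, Optional

FILES = ["app.py", "requirements.txt", "finetuned_multimodal.pt"]


def missing_files(folder: str = ".") -> List[str]:
    """Return the names from FILES that do not exist in `folder`."""
    return [name for name in FILES if not (Path(folder) / name).is_file()]


def deploy_space(repo_id: str, folder: str = ".", token: Optional[str] = None) -> None:
    """Create the Space (if needed), then upload each file in FILES."""
    missing = missing_files(folder)
    if missing:
        raise FileNotFoundError(f"missing files in {folder!r}: {missing}")

    # Imported here so the file check above works even without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # space_sdk="gradio" matches step 1; exist_ok=True makes re-runs safe.
    api.create_repo(repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    for name in FILES:
        # The Hub stores large files such as the ~3 GB checkpoint via LFS.
        api.upload_file(
            path_or_fileobj=str(Path(folder) / name),
            path_in_repo=name,
            repo_id=repo_id,
            repo_type="space",
            commit_message=f"Upload {name}",
        )
```

For example, `deploy_space("your-username/your-space")` (hypothetical repo id) run from the folder that contains the three files. Checking for missing files first avoids creating an empty Space when the checkpoint has not been copied yet.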

API example:

POST JSON to `/api/get_embedding`:

    {
      "title": "My video",
      "description": "Some description",
      "tags": "tag1,tag2",
      "thumbnail_url": "https://..."
    }

Response (the fused vector used in training):

    {
      "embedding": [0.123, -0.456, ...]
    }
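
A minimal Python client for the request and response shown above, using only the standard library. `BASE_URL` is a hypothetical placeholder for your Space's URL, and the sketch assumes the endpoint accepts and returns exactly these JSON shapes.

```python
# Sketch of a client for POST /api/get_embedding.
# Assumptions: BASE_URL is a placeholder, and the endpoint takes/returns the JSON shown above.
import json
import urllib.request
from typing import List

BASE_URL = "https://your-space-url"  # hypothetical placeholder


def build_payload(title: str, description: str, tags: List[str], thumbnail_url: str) -> bytes:
    """Encode the request body; tags are joined into a comma-separated string."""
    body = {
        "title": title,
        "description": description,
        "tags": ",".join(tags),
        "thumbnail_url": thumbnail_url,
    }
    return json.dumps(body).encode("utf-8")


def parse_embedding(raw: bytes) -> List[float]:
    """Extract the fused embedding vector from the JSON response body."""
    data = json.loads(raw)
    return [float(x) for x in data["embedding"]]


def get_embedding(title: str, description: str, tags: List[str], thumbnail_url: str,
                  base_url: str = BASE_URL, timeout: float = 60.0) -> List[float]:
    """POST the video metadata and return the embedding as a list of floats."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/get_embedding",
        data=build_payload(title, description, tags, thumbnail_url),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embedding(response.read())
```

If `app.py` exposes the function as a Gradio-native endpoint rather than a plain JSON route, the official `gradio_client` package is likely the more reliable way to call it.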

Notes:
- The app replicates the fused embedding pipeline used in your training notebook:
  - `text -> text encoder -> text_proj`
  - `thumbnail -> image encoder -> img_proj`
  - `fused = MultiheadAttention(query=text_proj, key=img_proj, value=img_proj)`
  - `fused` is returned as the embedding vector.
- If your checkpoint uses different key names, the loader prints a warning; update the loader's key mapping accordingly.
- If your model uses different dimensions (`text_dim`, `img_dim`, `proj_dim`), adjust the `MultimodalRegressor` init parameters to match.
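
To make the fusion step concrete, here is a dependency-free, single-head sketch of the scaled dot-product cross-attention that `MultiheadAttention(query=text_proj, key=img_proj, value=img_proj)` performs. It omits the learned input/output projections and the head split of `torch.nn.MultiheadAttention`, so it illustrates the computation rather than reproducing the app's numbers.

```python
# Single-head cross-attention sketch: the text query attends over image tokens.
# Simplifications (assumptions, not the app's code): one head, no learned
# projection matrices, plain Python lists instead of tensors.
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Weight each image value by softmax(query . key / sqrt(d)) and sum."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

One consequence worth checking against your notebook: if `img_proj` holds a single vector per video (sequence length 1), the softmax weight is always 1, so the fused output depends only on the image value projection and not on the text query. The text side only influences `fused` when there are several image tokens to attend over.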
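
For the two checkpoint notes, a sketch of helpers that rename mismatched keys and read layer sizes from the checkpoint itself. It assumes the checkpoint is a flat `state_dict` (parameter name to tensor) and that the projections are `nn.Linear` layers, whose weights have shape `(out_features, in_features)`; the key names `text_proj.weight` and `img_proj.weight` are hypothetical.

```python
# Checkpoint helpers (sketch). Assumes a flat state_dict mapping parameter
# names to tensors; the example key names below are hypothetical.
from typing import Dict, Mapping, Sequence, Tuple, TypeVar

T = TypeVar("T")


def remap_keys(state_dict: Mapping[str, T], prefix_map: Mapping[str, str]) -> Dict[str, T]:
    """Rename keys that start with an old prefix, e.g. {"module.": ""} strips DataParallel's prefix."""
    remapped: Dict[str, T] = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break  # apply at most one prefix rule per key
        remapped[new_key] = value
    return remapped


def linear_dims(shapes: Mapping[str, Sequence[int]], weight_key: str) -> Tuple[int, int]:
    """Return (in_features, out_features) for an nn.Linear weight stored as (out, in)."""
    out_features, in_features = shapes[weight_key]
    return in_features, out_features


# Possible usage with PyTorch (hypothetical key names and model object):
#   import torch
#   state = torch.load("finetuned_multimodal.pt", map_location="cpu")
#   state = remap_keys(state, {"module.": ""})
#   shapes = {k: tuple(v.shape) for k, v in state.items()}
#   text_dim, proj_dim = linear_dims(shapes, "text_proj.weight")
#   img_dim, _ = linear_dims(shapes, "img_proj.weight")
#   result = model.load_state_dict(state, strict=False)
#   print(result.missing_keys, result.unexpected_keys)
```

Loading with `strict=False` and printing the missing and unexpected keys shows exactly which names still need a mapping, instead of failing on the first mismatch.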