Spaces: MeshMax/VidTower


MeshMax committed on Oct 4, 2025

Commit

fa94305

verified ·

1 Parent(s): 1a00779

Update README.md


Files changed (1)

README.md +39 -0


@@ -12,3 +12,42 @@ short_description: generate embeddings of youtube video text and image data
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Hugging Face Space for multimodal embeddings
+Files:
+- app.py
+- requirements.txt
+- finetuned_multimodal.pt  <-- upload your 3GB checkpoint here
+How to deploy:
+1. Create a new Space on Hugging Face (https://huggingface.co/spaces).
+   - SDK: Gradio
+   - Hardware: If you need GPU inference, switch to a GPU runtime (note: GPU access may require a paid plan).
+2. Upload these files to the Space repository. For the large checkpoint (~3GB), use Git LFS or the web UI file upload (the web UI supports large uploads).
+3. Wait for the Space to build. The app serves:
+   - Web UI at `/` (Gradio)
+   - API endpoint at `/api/get_embedding`
+API example:
+POST JSON to `/api/get_embedding`:
+{
+  "title": "My video",
+  "description": "Some description",
+  "tags": "tag1,tag2",
+  "thumbnail_url": "https://..."
+}
+Response:
+{
+  "embedding": [0.123, -0.456, ...]  # fused vector used in training
+}
+Notes:
+- The app replicates the same fused embedding pipeline used in your notebook:
+  text -> text encoder -> text_proj
+  thumbnail -> image encoder -> img_proj
+  fused = MultiheadAttention(query=text_proj, key=img_proj, value=img_proj)
+  fused is returned as the embedding vector.
+- If your checkpoint uses different key names, the loader prints a warning; update the loader accordingly.
+- If your model requires different shapes (text_dim/img_dim/proj_dim), adjust the MultimodalRegressor init params.
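
The request described under "API example" in the README could be issued from Python roughly as follows. This is a minimal sketch using only the standard library, not code from the Space itself. It assumes the endpoint accepts a plain JSON POST body and returns JSON containing an `embedding` key, as the README states. The base URL and the helper names `build_embedding_request` and `get_embedding` are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the address of your own Space.
BASE_URL = "https://example-space.hf.space"


def build_embedding_request(base_url, title, description, tags, thumbnail_url):
    """Build a POST request carrying the four JSON fields from the README."""
    payload = {
        "title": title,
        "description": description,
        "tags": tags,  # comma-separated string, as in the README example
        "thumbnail_url": thumbnail_url,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/get_embedding",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_embedding(request, timeout=60):
    """Send the request and return the "embedding" list from the JSON reply.

    Assumes the response body is JSON shaped like the README's example.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["embedding"]


if __name__ == "__main__":
    req = build_embedding_request(
        BASE_URL,
        title="My video",
        description="Some description",
        tags="tag1,tag2",
        thumbnail_url="https://example.com/thumbnail.jpg",  # placeholder URL
    )
    print(req.get_method(), req.full_url)
    # vector = get_embedding(req)  # uncomment once the Space is running
```

Building the request separately from sending it makes it possible to inspect the URL and payload before the Space has finished building.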