---
title: VidEmbed
emoji: 💻
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: cc
short_description: generate embeddings of youtube video text and image data
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Hugging Face Space for multimodal embeddings

Files:

  • app.py
  • requirements.txt
  • finetuned_multimodal.pt <-- upload your 3GB checkpoint here

How to deploy:

  1. Create a new Space on Hugging Face (https://huggingface.co/spaces).
    • SDK: Gradio
    • Hardware: If you need GPU inference, switch to a GPU runtime (note: GPU access may require a paid plan).
  2. Upload these files to the Space repository. For a large checkpoint (~3GB), use Git LFS or the web UI file upload (the web UI stores large files via LFS automatically).
  3. Wait for the Space to build. The app serves:
    • Web UI at / (Gradio)
    • API endpoint at /api/get_embedding
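
Step 2 can also be done from the command line. A sketch, assuming you have `git` and `git-lfs` installed (replace `USER/SPACE` with your own Space id):

```shell
# Clone your Space repo (replace USER/SPACE with your own Space id)
git clone https://huggingface.co/spaces/USER/SPACE
cd SPACE

# Track the large checkpoint with Git LFS before committing it
git lfs install
git lfs track "*.pt"

# Commit the tracking config together with the app files and checkpoint
git add .gitattributes app.py requirements.txt finetuned_multimodal.pt
git commit -m "Add app, requirements, and checkpoint"
git push
```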

API example: POST JSON to /api/get_embedding:

```json
{
  "title": "My video",
  "description": "Some description",
  "tags": "tag1,tag2",
  "thumbnail_url": "https://..."
}
```

Response (the embedding field is the fused vector used in training):

```json
{
  "embedding": [0.123, -0.456, ...]
}
```
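
A minimal Python client for the endpoint above, using only the standard library. `space_url` is a placeholder for your deployed Space's URL, and `build_payload`/`get_embedding` are hypothetical helper names; the payload keys match the API example:

```python
import json
import urllib.request

def build_payload(title, description, tags, thumbnail_url):
    # Keys match the JSON body expected by /api/get_embedding
    return {
        "title": title,
        "description": description,
        "tags": tags,
        "thumbnail_url": thumbnail_url,
    }

def get_embedding(space_url, **fields):
    req = urllib.request.Request(
        f"{space_url}/api/get_embedding",
        data=json.dumps(build_payload(**fields)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Network call; requires a live, built Space
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]
```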

Notes:

  • The app replicates the same fused embedding pipeline used in your notebook:
    text -> text encoder -> text_proj
    thumbnail -> image encoder -> img_proj
    fused = MultiheadAttention(query=text_proj, key=img_proj, value=img_proj)
    The fused output is returned as the embedding vector.
  • If your checkpoint uses different key names, the loader prints a warning; update the loader's key mapping accordingly.
  • If your model uses different shapes (text_dim/img_dim/proj_dim), adjust the MultimodalRegressor init params to match.
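
The pipeline in the first note can be sketched as a small PyTorch module. The class name `FusionHead` and the dimension defaults are assumptions for illustration; adjust text_dim/img_dim/proj_dim to match your checkpoint:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of the described fusion: text query attends over image key/value."""

    def __init__(self, text_dim=768, img_dim=512, proj_dim=256, num_heads=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, proj_dim)
        self.img_proj = nn.Linear(img_dim, proj_dim)
        self.attn = nn.MultiheadAttention(proj_dim, num_heads, batch_first=True)

    def forward(self, text_feat, img_feat):
        # text_feat: (batch, text_dim), img_feat: (batch, img_dim)
        q = self.text_proj(text_feat).unsqueeze(1)   # (batch, 1, proj_dim)
        kv = self.img_proj(img_feat).unsqueeze(1)    # (batch, 1, proj_dim)
        fused, _ = self.attn(q, kv, kv)              # query=text, key=value=image
        return fused.squeeze(1)                      # (batch, proj_dim)
```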
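
A tolerant loader along the lines of the second note might look like this (a sketch, not the app's actual code; `load_checkpoint` is a hypothetical helper):

```python
import torch

def load_checkpoint(model, path="finetuned_multimodal.pt"):
    # map_location="cpu" lets the Space start on CPU-only hardware
    state = torch.load(path, map_location="cpu")
    # Some checkpoints wrap the weights under a "state_dict" key
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    # strict=False tolerates key-name mismatches and reports them
    missing, unexpected = model.load_state_dict(state, strict=False)
    if missing or unexpected:
        print(f"Warning: missing keys: {missing}")
        print(f"Warning: unexpected keys: {unexpected}")
    return model
```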