---
title: VidEmbed
emoji: 💻
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: cc
short_description: Generate embeddings of YouTube video text and image data
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Hugging Face Space for multimodal embeddings
Files:
- `app.py`
- `requirements.txt`
- `finetuned_multimodal.pt` (upload your ~3GB checkpoint here)
How to deploy:
1. Create a new Space on Hugging Face (https://huggingface.co/spaces).
- SDK: Gradio
- Hardware: If you need GPU inference, switch to a GPU runtime (note: GPU access may require a paid plan).
2. Upload these files to the Space repository. For the large checkpoint (~3GB), use Git LFS or the web UI file upload (the web UI supports large uploads).
3. Wait for the Space to build. The app serves:
- Web UI at `/` (Gradio)
- API endpoint at `/api/get_embedding`
API example:

POST JSON to `/api/get_embedding`:

```json
{
  "title": "My video",
  "description": "Some description",
  "tags": "tag1,tag2",
  "thumbnail_url": "https://..."
}
```

Response (the fused vector used in training):

```json
{
  "embedding": [0.123, -0.456, ...]
}
```
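As a sketch, the endpoint above can be called with Python's standard library. The Space URL below is a placeholder you must replace with your own, and `build_payload`/`get_embedding` are hypothetical helper names for this example, not part of the app:

```python
import json
from urllib.request import Request, urlopen

# Placeholder -- replace with your Space's URL, e.g. https://<user>-<space>.hf.space
SPACE_URL = "https://your-space.hf.space"

def build_payload(title, description, tags, thumbnail_url):
    """Assemble the JSON body expected by /api/get_embedding."""
    return {
        "title": title,
        "description": description,
        "tags": tags,
        "thumbnail_url": thumbnail_url,
    }

def get_embedding(payload):
    """POST the payload and return the fused embedding as a list of floats."""
    req = Request(
        SPACE_URL + "/api/get_embedding",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["embedding"]

payload = build_payload(
    "My video", "Some description", "tag1,tag2", "https://..."
)
# embedding = get_embedding(payload)  # requires the Space to be running
```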
Notes:
- The app replicates the same fused embedding pipeline used in your notebook:
    text -> text encoder -> text_proj
    thumbnail -> image encoder -> img_proj
    fused = MultiheadAttention(query=text_proj, key=img_proj, value=img_proj)
  `fused` is returned as the embedding vector.
- If your checkpoint uses different key names, the loader prints a warning; update the loader's key mapping to match.
- If your model uses different dimensions (`text_dim`/`img_dim`/`proj_dim`), adjust the `MultimodalRegressor` init parameters.
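For intuition, the fusion step in the notes can be sketched as single-head scaled dot-product cross-attention in plain NumPy. This is illustrative only: the app itself uses `MultiheadAttention`, and the `cross_attention` helper and dimensions below are made up for the example, not taken from the checkpoint:

```python
import numpy as np

def cross_attention(query, key, value):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)           # (n_query, n_key)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value                        # (n_query, d_value)

rng = np.random.default_rng(0)
proj_dim = 8  # illustrative; not the checkpoint's actual proj_dim
text_proj = rng.normal(size=(1, proj_dim))  # projected text embedding
img_proj = rng.normal(size=(4, proj_dim))   # e.g. projected image features

# fused = Attention(query=text_proj, key=img_proj, value=img_proj)
fused = cross_attention(text_proj, img_proj, img_proj)
print(fused.shape)  # (1, 8)
```

Because the text projection is the query and the image projection supplies keys and values, the fused vector is an image-feature summary weighted by relevance to the text.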