MeshMax committed on
Commit fa94305 · verified · 1 Parent(s): 1a00779

Update README.md

Files changed (1)
  1. README.md +39 -0
README.md CHANGED
@@ -12,3 +12,42 @@ short_description: generate embeddings of youtube video text and image data
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # Hugging Face Space for multimodal embeddings
+
+ Files:
+ - app.py
+ - requirements.txt
+ - finetuned_multimodal.pt <-- upload your 3GB checkpoint here
+
+ How to deploy:
+ 1. Create a new Space on Hugging Face (https://huggingface.co/spaces).
+    - SDK: Gradio
+    - Hardware: If you need GPU inference, switch to a GPU runtime (note: GPU access may require a paid plan).
+ 2. Upload these files to the Space repository. For a large checkpoint (~3GB), use Git LFS or the web UI file upload (the web UI supports large uploads).
+ 3. Wait for the Space to build. The app serves:
+    - Web UI at `/` (Gradio)
+    - API endpoint at `/api/get_embedding`
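
As an alternative to Git LFS or the web UI in step 2, the three files could also be pushed from a script. The sketch below assumes the `huggingface_hub` package and its `HfApi.upload_file` method; the helper name `upload_space_files` and the repo id `your-username/your-space` are placeholders, not part of this repository.

```python
from pathlib import Path

# File names taken from the "Files" list in this README.
FILES = ["app.py", "requirements.txt", "finetuned_multimodal.pt"]


def upload_space_files(api, repo_id, folder="."):
    """Upload the Space files one by one.

    `api` is expected to behave like `huggingface_hub.HfApi`
    (only `upload_file` is called). Hypothetical helper.
    """
    paths = [Path(folder) / name for name in FILES]
    # Fail before uploading anything if a file is missing locally.
    missing = [p.name for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing local files: {missing}")
    for path in paths:
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=path.name,
            repo_id=repo_id,
            repo_type="space",  # target a Space, not a model repo
        )
    return [p.name for p in paths]


# Usage (needs `pip install huggingface_hub` and a logged-in token):
#   from huggingface_hub import HfApi
#   upload_space_files(HfApi(), "your-username/your-space")
```

Passing the API object in as a parameter keeps the helper easy to dry-run with a stand-in object before touching the real repository.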
+
+ API example:
+ POST JSON to `/api/get_embedding`:
+ {
+   "title": "My video",
+   "description": "Some description",
+   "tags": "tag1,tag2",
+   "thumbnail_url": "https://..."
+ }
+
+ Response:
+ {
+   "embedding": [0.123, -0.456, ...] # fused vector used in training
+ }
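
A minimal client sketch for the request and response above, using only the Python standard library. The base URL is a placeholder for your Space's address, and the helper names `build_request` and `get_embedding` are illustrative; the field names and the `embedding` key come from the example above.

```python
import json
import urllib.request


def build_request(base_url, title, description, tags, thumbnail_url):
    """Build the POST request for /api/get_embedding without sending it."""
    payload = {
        "title": title,
        "description": description,
        "tags": tags,  # comma-separated string, as in the example above
        "thumbnail_url": thumbnail_url,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/get_embedding",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_embedding(base_url, **fields):
    """Send the request and return the list under the "embedding" key."""
    request = build_request(base_url, **fields)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["embedding"]


# Usage (placeholder URL, replace with your Space's address):
#   vec = get_embedding(
#       "https://your-space-url",
#       title="My video",
#       description="Some description",
#       tags="tag1,tag2",
#       thumbnail_url="https://...",
#   )
```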
+
+ Notes:
+ - The app replicates the fused-embedding pipeline used in your notebook:
+   text -> text encoder -> text_proj
+   thumbnail -> image encoder -> img_proj
+   fused = MultiheadAttention(query=text_proj, key=img_proj, value=img_proj)
+   `fused` is returned as the embedding vector.
+ - If your checkpoint uses different key names, the loader prints a warning; update the loader accordingly.
+ - If your model uses different dimensions (`text_dim`, `img_dim`, `proj_dim`), adjust the `MultimodalRegressor` init parameters.
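
The fusion step and the tolerant checkpoint loading described in the notes can be sketched in PyTorch as follows. This is not the code in `app.py`: the class name `FusionSketch`, the default dimensions (768, 512, 256), and the head count are assumptions for illustration, and the text and image encoders are left out, so the inputs are already-encoded feature vectors.

```python
import torch
from torch import nn


class FusionSketch(nn.Module):
    """Cross-attention fusion: text is the query, image is key and value."""

    def __init__(self, text_dim=768, img_dim=512, proj_dim=256, num_heads=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, proj_dim)
        self.img_proj = nn.Linear(img_dim, proj_dim)
        self.attn = nn.MultiheadAttention(proj_dim, num_heads, batch_first=True)

    def forward(self, text_feat, img_feat):
        # Add a sequence axis of length 1: (batch, dim) -> (batch, 1, dim).
        q = self.text_proj(text_feat).unsqueeze(1)
        kv = self.img_proj(img_feat).unsqueeze(1)
        fused, _ = self.attn(query=q, key=kv, value=kv)
        return fused.squeeze(1)  # (batch, proj_dim)


def load_checkpoint_loosely(model, state_dict):
    """Load with strict=False and report key mismatches instead of failing."""
    result = model.load_state_dict(state_dict, strict=False)
    if result.missing_keys or result.unexpected_keys:
        print("Warning: missing keys:", result.missing_keys)
        print("Warning: unexpected keys:", result.unexpected_keys)
    return result
```

Note that `strict=False` only tolerates missing or extra key names. A tensor whose shape differs from the model's still raises an error, which is the case where the dimension parameters need adjusting.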