Update README.md
README.md
CHANGED
|
@@ -5,7 +5,12 @@ datasets:
|
|
| 5 |
metrics:
|
| 6 |
- mse
|
| 7 |
---
|
| 8 |
-
#
| 9 |
|
| 10 |
Expected structure:
|
| 11 |
|
|
@@ -22,7 +27,7 @@ translators/
|
|
| 22 |
best.pt
|
| 23 |
```
|
| 24 |
|
| 25 |
-
Required metadata fields:
|
| 26 |
- `model_name`
|
| 27 |
- `translator_name`
|
| 28 |
- `architecture`
|
|
@@ -32,7 +37,7 @@ Required metadata fields:
|
|
| 32 |
- `hidden_dim` (optional; positive integer; defaults to `in_dim`)
|
| 33 |
- `checkpoint_file` (relative path only; no absolute paths or `..`)
|
| 34 |
|
| 35 |
-
Validation notes:
|
| 36 |
- Metadata schema is strict: unknown keys are rejected.
|
| 37 |
- `model_name` and `translator_name` inside `metadata.yaml` must match their directory names.
|
| 38 |
- `architecture` must be one of: `linear`, `3layer`, `4layer`, `residual`.
|
|
|
|
| 5 |
metrics:
|
| 6 |
- mse
|
| 7 |
---
|
| 8 |
+
# Translator models for Make it SING (https://huggingface.co/papers/2603.14610)
|
| 9 |
+
Their purpose is to translate the penultimate-layer features of a classifier into the image embedding space of CLIP.
|
| 10 |
+
In this repo they currently support the Karlo v1 alpha variation (https://huggingface.co/kakaobrain/karlo-v1-alpha), but they can easily be retrained for any other.
|
| 11 |
+
All of these models were trained with an MSE loss via ridge regression using the scikit-learn library and then migrated to PyTorch.
|
| 12 |
+
|
| 13 |
+
## Translators Layout
|
| 14 |
|
| 15 |
Expected structure:
|
| 16 |
|
|
|
|
| 27 |
best.pt
|
| 28 |
```
|
| 29 |
|
| 30 |
+
## Required metadata fields:
|
| 31 |
- `model_name`
|
| 32 |
- `translator_name`
|
| 33 |
- `architecture`
|
|
|
|
| 37 |
- `hidden_dim` (optional; positive integer; defaults to `in_dim`)
|
| 38 |
- `checkpoint_file` (relative path only; no absolute paths or `..`)
|
| 39 |
|
| 40 |
+
## Validation notes:
|
| 41 |
- Metadata schema is strict: unknown keys are rejected.
|
| 42 |
- `model_name` and `translator_name` inside `metadata.yaml` must match their directory names.
|
| 43 |
- `architecture` must be one of: `linear`, `3layer`, `4layer`, `residual`.
|
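The validation rules listed above could be checked roughly as follows. This is a minimal sketch, not the repository's actual validator: the function name `validate_metadata`, the `KNOWN_KEYS` set, and the treatment of `in_dim` as a metadata key are assumptions. The README lists further metadata fields that are not visible in this diff, so the allowed-key set would need to be extended before real use.

```python
from pathlib import PurePosixPath

# Architectures named in the validation notes.
SUPPORTED_ARCHITECTURES = {"linear", "3layer", "4layer", "residual"}

# Assumption: only the fields visible in this diff. The README lists more
# fields (elided here), so extend this set before relying on it.
KNOWN_KEYS = {
    "model_name",
    "translator_name",
    "architecture",
    "in_dim",
    "hidden_dim",
    "checkpoint_file",
}


def validate_metadata(meta, model_dir, translator_dir, allowed_keys=KNOWN_KEYS):
    """Return a list of problems found in `meta`; an empty list means it passed."""
    problems = []

    # Strict schema: reject any key that is not explicitly allowed.
    unknown = sorted(set(meta) - set(allowed_keys))
    if unknown:
        problems.append(f"unknown keys: {unknown}")

    # Names inside metadata.yaml must match their directory names.
    if meta.get("model_name") != model_dir:
        problems.append("model_name does not match its directory name")
    if meta.get("translator_name") != translator_dir:
        problems.append("translator_name does not match its directory name")

    if meta.get("architecture") not in SUPPORTED_ARCHITECTURES:
        problems.append("architecture is not one of the supported values")

    # hidden_dim is optional, but must be a positive integer when present.
    hidden_dim = meta.get("hidden_dim")
    if hidden_dim is not None:
        is_int = isinstance(hidden_dim, int) and not isinstance(hidden_dim, bool)
        if not is_int or hidden_dim <= 0:
            problems.append("hidden_dim must be a positive integer")

    # checkpoint_file: relative path only, no absolute paths or `..` parts.
    ckpt = meta.get("checkpoint_file")
    if not isinstance(ckpt, str) or not ckpt:
        problems.append("checkpoint_file is missing")
    else:
        path = PurePosixPath(ckpt)
        if path.is_absolute() or ".." in path.parts:
            problems.append("checkpoint_file must be a relative path without '..'")

    return problems
```

Here `meta` is the mapping loaded from `metadata.yaml`, and `model_dir` and `translator_dir` are the names of the enclosing directories. Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.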