---
title: Three-View-Style-Embedder
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.0.0"
app_file: app.py
pinned: false
---
# Three-View-Style-Embedder
A multi-branch metric learning model that embeds and classifies the illustration styles of 17,000+ artists.
## Overview
Three-View-Style-Embedder combines three views of an illustration, the **full image**, the **face**, and the **eyes**, and embeds an artist's distinctive style as a 512-dimensional vector.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                        Input Images                         │
├───────────────────┬───────────────────┬─────────────────────┤
│    Full Image     │     Face Crop     │      Eye Crop       │
│    (required)     │    (optional)     │     (optional)      │
└─────────┬─────────┴─────────┬─────────┴──────────┬──────────┘
          │                   │                    │
          ▼                   ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│         3× EVA02-Large Encoders (304M params each)          │
└─────────────────────────────────────────────────────────────┘
          │                   │                    │
          └─────────┬─────────┴─────────┬──────────┘
                    │                   │
                    ▼                   ▼
            ┌─────────────────────────────────────┐
            │         Gated Fusion Module         │
            │ (learns branch weights dynamically) │
            └─────────────────┬───────────────────┘
                              │
                              ▼
            ┌─────────────────────────────────────┐
            │    Style Embedding Head (512-d)     │
            │        L2 Normalized Output         │
            └─────────────────┬───────────────────┘
                              │
                 ┌────────────┴────────────┐
                 ▼                         ▼
          ┌────────────┐            ┌────────────┐
          │  ArcFace   │            │   Multi-   │
          │    Loss    │            │ Similarity │
          └────────────┘            └────────────┘
```
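The diagram does not spell out how the Gated Fusion Module combines the three branches, and `model.py` is not shown here. As a rough illustration only, the sketch below assumes a softmax gate over whichever branches are present, followed by L2 normalization. The function names, the toy 4-dimensional features, and the way gate scores are passed in are all assumptions, not the project's implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def gated_fusion(features, gate_logits):
    """Fuse branch features with softmax gates over the branches that are present.

    features: dict of branch name -> feature vector, or None when a crop is missing.
    gate_logits: dict of branch name -> scalar score. In a trained model these
    would come from a learned layer; here they are passed in directly.
    """
    present = [name for name, feat in features.items() if feat is not None]
    if not present:
        raise ValueError("at least one view (the full image) is required")
    # Softmax restricted to available branches, so missing crops get no weight.
    peak = max(gate_logits[name] for name in present)
    exps = {name: math.exp(gate_logits[name] - peak) for name in present}
    total = sum(exps.values())
    weights = {name: exps[name] / total for name in present}
    dim = len(features[present[0]])
    fused = [sum(weights[n] * features[n][i] for n in present) for i in range(dim)]
    return l2_normalize(fused), weights


# Toy features standing in for encoder outputs; the eye crop is missing.
features = {
    "full": [1.0, 0.0, 2.0, 0.0],
    "face": [0.0, 1.0, 0.0, 2.0],
    "eyes": None,
}
gate_logits = {"full": 0.0, "face": 0.0, "eyes": 5.0}
embedding, weights = gated_fusion(features, gate_logits)
print(weights)
print(embedding)
```

Because the gate is computed only over available views, a missing face or eye crop simply shifts weight onto the remaining branches instead of injecting a placeholder feature.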
## Project Structure
```
Three-View-Style-Embedder/
├── config.py              # Configuration management
├── model.py               # Model architecture
├── dataset.py             # Dataset and data loader
├── losses.py              # Loss functions (ArcFace, Multi-Similarity, Center)
├── trainer.py             # Training logic
├── train.py               # Training entry-point script
├── evaluate.py            # Evaluation script
├── extract_embeddings.py  # Embedding extraction script
├── app.py                 # Web UI for Hugging Face Spaces (automatic download)
├── local_app.py           # Web UI for local use
├── inference_utils.py     # Inference utilities (shared)
├── requirements.txt       # Dependencies
└── README.md              # Documentation
```
## Installation
```bash
pip install -r requirements.txt
```
On Windows, we recommend using the `.venv` in the working folder (running with a different Python from PATH can cause Gradio/torch version conflicts).
## Dataset Structure
```
./dataset/                 # Full images (required)
├── artist_name_1/
│   ├── image1.jpg
│   └── image2.png
├── artist_name_2/
└── ...
./dataset_face/            # Face crops (optional)
└── (same structure)
./dataset_eyes/            # Eye crops (optional)
└── (same structure)
```
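To make the layout above concrete, here is a minimal sketch of how the three folders could be indexed into one record per image. It assumes that a crop shares the artist folder and file name of its source image, and that the listed extensions are the ones in use; neither is confirmed by this README, and `index_dataset` is a hypothetical helper rather than part of `dataset.py`.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to match the real dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def index_dataset(full_root, face_root=None, eyes_root=None):
    """Return one record per full image, with crop paths when matching files exist."""
    full_root = Path(full_root)
    records = []
    for artist_dir in sorted(p for p in full_root.iterdir() if p.is_dir()):
        for image_path in sorted(artist_dir.iterdir()):
            if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
                continue
            relative = image_path.relative_to(full_root)
            record = {"artist": artist_dir.name, "full": image_path}
            for view, root in (("face", face_root), ("eyes", eyes_root)):
                # Assumption: crops mirror the artist folder and file name.
                candidate = Path(root) / relative if root is not None else None
                found = candidate is not None and candidate.is_file()
                record[view] = candidate if found else None
            records.append(record)
    return records


# Build a tiny throwaway dataset to show the result.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in (
        "dataset/artist_a/1.jpg",
        "dataset/artist_a/2.png",
        "dataset_face/artist_a/1.jpg",
    ):
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    records = index_dataset(root / "dataset", root / "dataset_face", root / "dataset_eyes")
    summary = [
        (r["artist"], r["full"].name, r["face"] is not None, r["eyes"] is not None)
        for r in records
    ]
print(summary)
```

Records with `None` for the face or eye view correspond to the optional crops being absent, which matches the "(optional)" markers in the layout.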
## Usage
### 1. Training
```bash
python train.py \
--dataset_root ./dataset \
--dataset_face_root ./dataset_face \
--dataset_eyes_root ./dataset_eyes \
--epochs 100 \
--batch_size 256
```
### 2. Embedding Extraction
Extract embeddings for every artist using the trained model:
```bash
python extract_embeddings.py \
--checkpoint ./checkpoints/best_model.pt \
--output ./embeddings/artist_embeddings.npz \
--max_combinations 10 \
--batch_size 256
```
### 3. Running the Web UI
#### Running Locally
```bash
python local_app.py \
--checkpoint ./checkpoints/best_model.pt \
--embeddings ./embeddings/artist_embeddings.npz
```
Or, on Windows:
```bat
run.bat
```
Then open `http://localhost:7860` in your browser.
#### Running on Hugging Face Spaces
`app.py` automatically downloads the model when it runs on Hugging Face Spaces, so it works as soon as it is uploaded to a Space.
## Model Specs
| Item | Value |
|------|-----|
| Backbone | EVA02-Large-14-CLIP × 3 |
| Total Parameters | ~920M |
| Embedding Dimension | 512 |
| Input Size | 224 × 224 |
| Loss | ArcFace + Multi-Similarity + Center |
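The loss row names ArcFace as one of the training objectives. For readers unfamiliar with it, the sketch below shows the ArcFace term as it is commonly defined: an angular margin is added to the angle between the embedding and its true class center before a scaled softmax cross-entropy. The `scale` and `margin` values are illustrative only, and this is not taken from `losses.py`, whose contents and loss weighting are not shown in this README.

```python
import math


def arcface_loss(embedding, class_centers, label, scale=10.0, margin=0.5):
    """ArcFace loss for one sample.

    embedding and every row of class_centers are assumed to be L2-normalized,
    so each dot product is the cosine of the angle to that class center.
    """
    cosines = [sum(e * w for e, w in zip(embedding, center)) for center in class_centers]
    logits = []
    for j, cos_theta in enumerate(cosines):
        if j == label:
            # Add the angular margin to the true class only.
            # (Real implementations also handle theta + margin > pi; omitted here.)
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            cos_theta = math.cos(theta + margin)
        logits.append(scale * cos_theta)
    # Numerically stable softmax cross-entropy.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


embedding = [0.8, 0.6]                 # unit-length toy embedding
class_centers = [[1.0, 0.0], [0.0, 1.0]]
without_margin = arcface_loss(embedding, class_centers, label=0, margin=0.0)
with_margin = arcface_loss(embedding, class_centers, label=0, margin=0.5)
print(without_margin, with_margin)
```

With the margin enabled the same sample is penalized more heavily, which is what pushes embeddings of one artist closer to their class center than a plain softmax would.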
## Performance
| Metric | Value |
|--------|-----|
| Top-1 Accuracy | ~77% |
| Top-5 Accuracy | ~92% |
| Number of classes | 17,000+ |
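The README does not state how `evaluate.py` computes these figures. Under the usual definition, Top-k accuracy is the fraction of samples whose true class appears among the k highest-scoring classes, which can be sketched as follows (the score matrix and labels are made-up toy data):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Four samples, three classes.
scores = [
    [0.90, 0.05, 0.05],  # true class 0 is ranked first
    [0.20, 0.50, 0.30],  # true class 2 is ranked second
    [0.60, 0.30, 0.10],  # true class 2 is ranked third
    [0.10, 0.70, 0.20],  # true class 1 is ranked first
]
labels = [0, 2, 2, 1]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```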
## Output Files
### Checkpoints (`./checkpoints/`)
- `best_model.pt`: best-performing model
- `checkpoint_epoch_N.pt`: per-epoch checkpoints
### Embeddings (`./embeddings/`)
- `artist_embeddings.npz`: mean embedding per artist
- `artist_embeddings.json`: metadata
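As an illustration of what a per-artist mean embedding is, the sketch below averages the embeddings of each artist's images and re-normalizes the result to unit length. Whether `extract_embeddings.py` re-normalizes after averaging is an assumption, and the `--max_combinations` option is not modeled here; `mean_artist_embeddings` is a hypothetical helper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_artist_embeddings(image_embeddings):
    """Average embeddings per artist.

    image_embeddings: iterable of (artist_name, vector) pairs, one per image.
    Returns a dict of artist_name -> unit-length mean vector.
    """
    sums = {}
    counts = {}
    for artist, vec in image_embeddings:
        if artist not in sums:
            sums[artist] = [0.0] * len(vec)
            counts[artist] = 0
        sums[artist] = [s + x for s, x in zip(sums[artist], vec)]
        counts[artist] += 1
    return {
        artist: l2_normalize([s / counts[artist] for s in total])
        for artist, total in sums.items()
    }


# Toy 2-d embeddings for two artists.
pairs = [
    ("artist_a", [1.0, 0.0]),
    ("artist_a", [0.0, 1.0]),
    ("artist_b", [0.0, 1.0]),
]
artist_means = mean_artist_embeddings(pairs)
print(artist_means)
```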
## API Usage Example
```python
from extract_embeddings import load_embeddings, find_similar_artists
from app import get_image_embedding

# NOTE: `model`, `image`, and `device` must already be defined (a loaded
# checkpoint, the query image, and a torch device); loading is not shown here.

# Load the artist embeddings
artist_names, embeddings = load_embeddings('./embeddings/artist_embeddings.npz')

# Extract an embedding from the image
query_emb = get_image_embedding(model, image, device)

# Search for similar artists
similar = find_similar_artists(query_emb, artist_names, embeddings, top_k=10)
for name, score in similar:
    print(f"{name}: {score:.4f}")
```
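The internals of `find_similar_artists` are not shown in this README. A plausible reading, given that the embeddings are L2-normalized, is a cosine-similarity ranking; the stand-in below (deliberately named `cosine_top_k` to avoid implying it is the real function) sketches that idea on toy 2-dimensional vectors.

```python
import math


def cosine_top_k(query, names, embeddings, top_k=10):
    """Rank stored embeddings by cosine similarity to the query.

    Assumes all vectors are non-zero. Returns (name, score) pairs, best first.
    """
    query_norm = math.sqrt(sum(x * x for x in query))
    scored = []
    for name, emb in zip(names, embeddings):
        emb_norm = math.sqrt(sum(x * x for x in emb))
        dot = sum(a * b for a, b in zip(query, emb))
        scored.append((name, dot / (query_norm * emb_norm)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: three "artists" with 2-d embeddings.
names = ["artist_a", "artist_b", "artist_c"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = cosine_top_k([1.0, 0.1], names, embeddings, top_k=2)
for name, score in results:
    print(f"{name}: {score:.4f}")
```

For unit-length vectors the cosine reduces to a plain dot product, so at the scale of 17,000+ artists the whole ranking can be done with a single matrix-vector product.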
## License
MIT License