---
title: Three-View-Style-Embedder
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.0.0"
app_file: app.py
pinned: false
---

# Three-View-Style-Embedder

A multi-branch metric learning model that embeds and classifies the illustration styles of 17,000+ artists.

## Overview

Three-View-Style-Embedder combines three views of an illustration (the **full image**, the **face**, and the **eyes**) to embed each artist's distinctive style as a 512-dimensional vector.

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        Input Images                         │
├───────────────────┬───────────────────┬─────────────────────┤
│    Full Image     │     Face Crop     │      Eye Crop       │
│    (required)     │    (optional)     │     (optional)      │
└─────────┬─────────┴─────────┬─────────┴──────────┬──────────┘
          │                   │                    │
          ▼                   ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│         3× EVA02-Large Encoders (304M params each)          │
└─────────────────────────────────────────────────────────────┘
          │                   │                    │
          └─────────┬─────────┴─────────┬──────────┘
                    │                   │
                    ▼                   ▼
          ┌─────────────────────────────────────┐
          │         Gated Fusion Module         │
          │   (learns dynamic branch weights)   │
          └─────────────────┬───────────────────┘
                            │
                            ▼
          ┌─────────────────────────────────────┐
          │    Style Embedding Head (512-d)     │
          │        L2 Normalized Output         │
          └─────────────────┬───────────────────┘
                            │
               ┌────────────┴────────────┐
               ▼                         ▼
         ┌────────────┐            ┌────────────┐
         │  ArcFace   │            │   Multi-   │
         │    Loss    │            │ Similarity │
         └────────────┘            └────────────┘
```
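
The diagram names a gated fusion module but does not describe its internals. As a rough sketch only, assuming a softmax gate over per-branch scores and a mask for missing optional crops (the name `gated_fusion`, the `gate_logits` input, and the masking rule are illustrative and not taken from `model.py`):

```python
import numpy as np

def gated_fusion(features, gate_logits, present):
    """Fuse per-view features with softmax gates, skipping missing views.

    features:    (3, D) array, one row per view (full, face, eyes).
    gate_logits: (3,) unnormalized importance scores (learned in a real model).
    present:     (3,) booleans; False marks an optional crop that is absent.
                 The full image is assumed to be always present.
    """
    features = np.asarray(features, dtype=np.float64)
    present = np.asarray(present, dtype=bool)
    # Absent views get a logit of -inf, so their softmax weight is exactly 0.
    logits = np.where(present, np.asarray(gate_logits, dtype=np.float64), -np.inf)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    fused = weights @ features
    # L2-normalize so dot products between embeddings are cosine similarities.
    return fused / np.linalg.norm(fused), weights
```

With only the full image present, all of the weight falls on the first branch, which matches the "required / optional" labels in the diagram.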

## Project Structure

```
Three-View-Style-Embedder/
├── config.py              # Configuration management
├── model.py               # Model architecture
├── dataset.py             # Dataset and data loaders
├── losses.py              # Loss functions (ArcFace, Multi-Similarity, Center)
├── trainer.py             # Training logic
├── train.py               # Training entry-point script
├── evaluate.py            # Evaluation script
├── extract_embeddings.py  # Embedding extraction script
├── app.py                 # Web UI for Hugging Face Spaces (automatic download)
├── local_app.py           # Web UI for local use
├── inference_utils.py     # Shared inference utilities
├── requirements.txt       # Dependencies
└── README.md              # Documentation
```

## Installation

```bash
pip install -r requirements.txt
```

On Windows, we recommend using the `.venv` in the working folder (running with a different Python found on PATH can cause Gradio/torch version conflicts).

## Dataset Structure

```
./dataset/                    # Full images (required)
├── artist_name_1/
│   ├── image1.jpg
│   └── image2.png
├── artist_name_2/
└── ...

./dataset_face/               # Face crops (optional)
└── (same structure)

./dataset_eyes/               # Eye crops (optional)
└── (same structure)
```
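
One way to pair the three views, assuming that a face or eye crop shares the relative path of its full image. The README only states that the folders have the same structure, so this matching rule, the extension list, and the helper name `collect_samples` are assumptions rather than the behavior of `dataset.py`:

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the actual dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def _optional_view(root, rel):
    """Return the mirrored crop path if it exists, otherwise None."""
    if root is None:
        return None
    candidate = Path(root) / rel
    return candidate if candidate.is_file() else None

def collect_samples(full_root, face_root=None, eyes_root=None):
    """Return (artist, full, face_or_None, eyes_or_None) tuples, one per full image."""
    full_root = Path(full_root)
    samples = []
    for artist_dir in sorted(p for p in full_root.iterdir() if p.is_dir()):
        for img in sorted(artist_dir.iterdir()):
            if img.suffix.lower() not in IMAGE_EXTS:
                continue
            rel = img.relative_to(full_root)
            samples.append((
                artist_dir.name,
                img,
                _optional_view(face_root, rel),
                _optional_view(eyes_root, rel),
            ))
    return samples
```

Each folder name under `./dataset/` is treated as the artist label, and a missing crop is reported as `None` rather than raising an error.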

## Usage

### 1. Training

```bash
python train.py \
    --dataset_root ./dataset \
    --dataset_face_root ./dataset_face \
    --dataset_eyes_root ./dataset_eyes \
    --epochs 100 \
    --batch_size 256
```

### 2. Embedding Extraction

Extract embeddings for every artist with the trained model:

```bash
python extract_embeddings.py \
    --checkpoint ./checkpoints/best_model.pt \
    --output ./embeddings/artist_embeddings.npz \
    --max_combinations 10 \
    --batch_size 256
```

### 3. Running the Web UI

#### Running Locally

```bash
python local_app.py \
    --checkpoint ./checkpoints/best_model.pt \
    --embeddings ./embeddings/artist_embeddings.npz
```

Or, on Windows:

```bat
run.bat
```

Then open `http://localhost:7860` in your browser.

#### Running on Hugging Face Spaces

`app.py` downloads the model automatically when it runs on Hugging Face Spaces. Upload the project to a Space and it works without further setup.

## Model Specifications

| Item | Value |
|------|-------|
| Backbone | EVA02-Large-14-CLIP × 3 |
| Total Parameters | ~920M |
| Embedding Dimension | 512 |
| Input Size | 224 × 224 |
| Loss | ArcFace + Multi-Similarity + Center |

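
The loss row lists three terms without giving their form. For orientation, the ArcFace term is commonly computed as cosine logits with an additive angular margin on the true class. The sketch below follows that standard formulation; the `scale` and `margin` defaults are placeholder values and are not the settings used in `losses.py`, and the handling of angles beyond π that full implementations include is omitted:

```python
import numpy as np

def arcface_logits(embeddings, class_centers, labels, scale=30.0, margin=0.5):
    """Cosine logits with an additive angular margin applied to the true class."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)              # shape (batch, classes)
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])           # angle to the true class
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta + margin)  # make the true class harder
    return scale * logits                          # feed into softmax cross-entropy
```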

## Performance

| Metric | Value |
|--------|-------|
| Top-1 Accuracy | ~77% |
| Top-5 Accuracy | ~92% |
| Number of classes | 17,000+ |

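
The README does not say how these figures were measured. A common definition, assumed here, counts a query as correct when its true artist appears among the k highest-scoring classes (`top_k_accuracy` is an illustrative helper, not part of `evaluate.py`):

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """scores: (N, C) similarity or logit matrix; labels: (N,) true class ids."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k largest scores per row (order inside the top k is irrelevant).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())
```

`k` must not exceed the number of classes; with k=1 this reduces to ordinary accuracy.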

## Output Files

### Checkpoints (`./checkpoints/`)
- `best_model.pt`: Best-performing model
- `checkpoint_epoch_N.pt`: Per-epoch checkpoints

### Embeddings (`./embeddings/`)
- `artist_embeddings.npz`: Mean embedding per artist
- `artist_embeddings.json`: Metadata

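
How the per-artist averages are computed is not documented here. A plausible sketch, assuming image embeddings are averaged per artist and each mean is re-normalized to unit length (`mean_artist_embeddings` is a hypothetical helper, not the code in `extract_embeddings.py`):

```python
import numpy as np

def mean_artist_embeddings(image_embeddings, artist_ids):
    """Average image embeddings per artist and re-normalize each mean."""
    image_embeddings = np.asarray(image_embeddings, dtype=np.float32)
    artist_ids = np.asarray(artist_ids)
    names = np.unique(artist_ids)  # sorted unique artist names
    means = np.stack(
        [image_embeddings[artist_ids == name].mean(axis=0) for name in names]
    )
    # Re-normalize so that dot products remain cosine similarities.
    means /= np.linalg.norm(means, axis=1, keepdims=True)
    return names, means
```

The result could then be stored with `np.savez(path, names=names, embeddings=means)`; those key names are placeholders, not the actual layout of `artist_embeddings.npz`.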

## API Usage Example

```python
from extract_embeddings import load_embeddings, find_similar_artists
from app import get_image_embedding

# Load the precomputed artist embeddings
artist_names, embeddings = load_embeddings('./embeddings/artist_embeddings.npz')

# Extract an embedding from the query image.
# `model`, `image`, and `device` must be prepared beforehand; their setup is not shown here.
query_emb = get_image_embedding(model, image, device)

# Search for the most similar artists
similar = find_similar_artists(query_emb, artist_names, embeddings, top_k=10)
for name, score in similar:
    print(f"{name}: {score:.4f}")
```
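
The internals of `find_similar_artists` are not shown in this README. Assuming it ranks artists by cosine similarity, a standalone equivalent might look like the following (`top_k_similar` is a hypothetical stand-in, not the function from `extract_embeddings.py`):

```python
import numpy as np

def top_k_similar(query, names, embeddings, top_k=10):
    """Return (name, cosine similarity) pairs, best match first."""
    q = np.asarray(query, dtype=np.float64)
    e = np.asarray(embeddings, dtype=np.float64)
    # Normalize both sides so the dot product is the cosine similarity.
    q = q / np.linalg.norm(q)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    scores = e @ q
    order = np.argsort(-scores)[:top_k]
    return [(names[i], float(scores[i])) for i in order]
```

If the stored embeddings are already L2-normalized, as the architecture diagram suggests, the normalization step is redundant but harmless.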

## License

MIT License