title: Three-View-Style-Embedder
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
Three-View-Style-Embedder
A multi-branch metric learning model that embeds and classifies the illustration styles of 17,000+ artists.
Overview
Three-View-Style-Embedder combines three views of an illustration (the full image, the face, and the eyes) to embed an artist's distinctive style as a 512-dimensional vector.
Architecture
┌─────────────────────────────────────────────────────────────┐
│                        Input Images                         │
├───────────────────┬───────────────────┬─────────────────────┤
│    Full Image     │     Face Crop     │      Eye Crop       │
│    (required)     │    (optional)     │     (optional)      │
└─────────┬─────────┴─────────┬─────────┴──────────┬──────────┘
          │                   │                    │
          ▼                   ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│         3× EVA02-Large Encoders (304M params each)          │
└─────────────────────────────────────────────────────────────┘
          │                   │                    │
          └─────────┬─────────┴─────────┬──────────┘
                    │                   │
                    ▼                   ▼
           ┌─────────────────────────────────────┐
           │         Gated Fusion Module         │
           │(dynamically learned branch weights) │
           └──────────────────┬──────────────────┘
                              │
                              ▼
           ┌─────────────────────────────────────┐
           │    Style Embedding Head (512-d)     │
           │        L2-Normalized Output         │
           └──────────────────┬──────────────────┘
                              │
                 ┌────────────┴────────────┐
                 ▼                         ▼
           ┌────────────┐            ┌────────────┐
           │  ArcFace   │            │   Multi-   │
           │    Loss    │            │ Similarity │
           └────────────┘            └────────────┘
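The fusion step in the diagram can be illustrated with a minimal sketch. It assumes the gate is a softmax over one score per branch and that missing optional views are simply left out of the softmax; the repository's actual Gated Fusion Module may work differently, and the 512-d projection head is omitted here. The names `gated_fusion` and `gate_logits` are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def gated_fusion(branches, gate_logits):
    """Fuse branch feature vectors with softmax gates (illustrative only).

    branches:    list of feature vectors, or None for a missing optional view.
    gate_logits: one score per branch. Hypothetical: a trained model would
                 predict these from the features instead of taking them as input.
    """
    present = [i for i, b in enumerate(branches) if b is not None]
    if not present:
        raise ValueError("at least one view (the full image) is required")
    # Softmax restricted to the branches that are actually present.
    peak = max(gate_logits[i] for i in present)
    exps = {i: math.exp(gate_logits[i] - peak) for i in present}
    total = sum(exps.values())
    gates = {i: e / total for i, e in exps.items()}
    # Gate-weighted sum of the present branches, then L2 normalization.
    dim = len(branches[present[0]])
    fused = [sum(gates[i] * branches[i][d] for i in present) for d in range(dim)]
    return l2_normalize(fused), gates

full, face, eyes = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], None  # eye crop missing
embedding, gates = gated_fusion([full, face, eyes], gate_logits=[0.0, 0.0, 0.0])
print(gates)      # equal logits: each present branch gets weight 0.5
print(embedding)  # fused vector with unit L2 norm
```

Because the output is L2-normalized, the dot product of two embeddings equals their cosine similarity, which is what makes the later similarity search cheap.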
Project Structure
Three-View-Style-Embedder/
├── config.py               # Configuration management
├── model.py                # Model architecture
├── dataset.py              # Dataset and data loader
├── losses.py               # Loss functions (ArcFace, Multi-Similarity, Center)
├── trainer.py              # Training logic
├── train.py                # Training entry-point script
├── evaluate.py             # Evaluation script
├── extract_embeddings.py   # Embedding extraction script
├── app.py                  # Web UI for Hugging Face Spaces (automatic download)
├── local_app.py            # Web UI for local use
├── inference_utils.py      # Inference utilities (shared)
├── requirements.txt        # Dependencies
└── README.md               # Documentation
Installation
pip install -r requirements.txt
On Windows, we recommend using the .venv in the working folder. Running with a different Python found on PATH can cause Gradio/torch version conflicts.
Dataset Structure
./dataset/                  # Full images (required)
├── artist_name_1/
│   ├── image1.jpg
│   └── image2.png
└── artist_name_2/
    └── ...
./dataset_face/             # Face crops (optional)
└── (same structure)
./dataset_eyes/             # Eye crops (optional)
└── (same structure)
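As an illustration of how this layout could be traversed, the sketch below pairs each full image with its optional crops. It assumes a crop mirrors the relative path of its full image (same artist folder, same filename), which the README does not state explicitly. `collect_samples` and `IMAGE_EXTS` are hypothetical names, not part of the repository.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed set of extensions

def _optional_crop(root, rel):
    """Return root/rel if that file exists, else None.

    Assumption: a crop mirrors the relative path of its full image.
    """
    if root is None:
        return None
    candidate = Path(root) / rel
    return candidate if candidate.is_file() else None

def collect_samples(full_root, face_root=None, eyes_root=None):
    """Return (artist, full_path, face_path_or_None, eyes_path_or_None) tuples."""
    full_root = Path(full_root)
    samples = []
    # Each sub-directory of the full-image root is treated as one artist (class).
    for artist_dir in sorted(p for p in full_root.iterdir() if p.is_dir()):
        for img in sorted(artist_dir.iterdir()):
            if img.suffix.lower() not in IMAGE_EXTS:
                continue  # skip non-image files
            rel = img.relative_to(full_root)
            samples.append((
                artist_dir.name,
                img,
                _optional_crop(face_root, rel),
                _optional_crop(eyes_root, rel),
            ))
    return samples
```

A call such as `collect_samples("./dataset", "./dataset_face", "./dataset_eyes")` would then yield one tuple per full image, with `None` wherever a crop is absent.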
Usage
1. Training
python train.py \
--dataset_root ./dataset \
--dataset_face_root ./dataset_face \
--dataset_eyes_root ./dataset_eyes \
--epochs 100 \
--batch_size 256
2. Embedding Extraction
Extract embeddings for every artist with the trained model:
python extract_embeddings.py \
--checkpoint ./checkpoints/best_model.pt \
--output ./embeddings/artist_embeddings.npz \
--max_combinations 10 \
--batch_size 256
3. Running the Web UI
Running locally
python local_app.py \
--checkpoint ./checkpoints/best_model.pt \
--embeddings ./embeddings/artist_embeddings.npz
Or, on Windows:
run.bat
Then open http://localhost:7860 in a browser.
Running on Hugging Face Spaces
app.py downloads the model automatically when it runs on Hugging Face Spaces, so uploading the project to a Space is enough for it to work.
Model Specifications
| Item | Value |
|---|---|
| Backbone | EVA02-Large-14-CLIP × 3 |
| Total Parameters | ~920M |
| Embedding Dimension | 512 |
| Input Size | 224 × 224 |
| Loss | ArcFace + Multi-Similarity + Center |
Performance
| Metric | Value |
|---|---|
| Top-1 Accuracy | ~77% |
| Top-5 Accuracy | ~92% |
| Number of classes | 17,000+ |
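For readers unfamiliar with the metrics in the table, the sketch below shows how Top-k accuracy is commonly computed from ranked predictions. It is a generic illustration with made-up labels; the README does not say whether the reported numbers come from the classification head or from embedding retrieval, and `top_k_accuracy` is a hypothetical helper, not the repository's evaluation code.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k predictions.

    ranked_predictions: one list of labels per sample, best guess first.
    true_labels:        the correct label for each sample.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Toy data (not real results): three query images, three candidate artists.
predictions = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"]]
truths = ["a", "a", "b"]
print(f"Top-1: {top_k_accuracy(predictions, truths, 1):.2f}")
print(f"Top-2: {top_k_accuracy(predictions, truths, 2):.2f}")
```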
Output Files
Checkpoints (./checkpoints/)
- best_model.pt: the best-performing model
- checkpoint_epoch_N.pt: per-epoch checkpoints
Embeddings (./embeddings/)
- artist_embeddings.npz: per-artist mean embeddings
- artist_embeddings.json: metadata
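A per-artist mean embedding can be sketched as follows. The sketch averages the embeddings of each artist's images and then re-normalizes the mean to unit length; the re-normalization is an assumption, since the README only says the file holds per-artist means. `mean_embeddings` is a hypothetical helper, not the logic in extract_embeddings.py.

```python
import math

def mean_embeddings(labeled_embeddings):
    """Average embeddings per artist and re-normalize each mean to unit length.

    labeled_embeddings: iterable of (artist_name, vector) pairs.
    Returns a dict mapping artist_name to its mean vector.
    """
    sums, counts = {}, {}
    for artist, vec in labeled_embeddings:
        if artist not in sums:
            sums[artist] = [0.0] * len(vec)
            counts[artist] = 0
        sums[artist] = [s + x for s, x in zip(sums[artist], vec)]
        counts[artist] += 1
    result = {}
    for artist, total in sums.items():
        mean = [s / counts[artist] for s in total]
        # Assumption: the stored mean is L2-normalized, like the model output.
        norm = math.sqrt(sum(x * x for x in mean))
        result[artist] = [x / norm for x in mean] if norm > 0 else mean
    return result

means = mean_embeddings([
    ("artist_a", [1.0, 0.0]),
    ("artist_a", [0.0, 1.0]),
    ("artist_b", [0.0, 2.0]),
])
print(means)
```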
API Usage Example
from extract_embeddings import load_embeddings, find_similar_artists
from app import get_image_embedding
# `model`, `image`, and `device` are assumed to be defined beforehand
# (a loaded checkpoint, the query image, and the target device).
# Load the precomputed artist embeddings
artist_names, embeddings = load_embeddings('./embeddings/artist_embeddings.npz')
# Extract an embedding from the query image
query_emb = get_image_embedding(model, image, device)
# Search for the most similar artists
similar = find_similar_artists(query_emb, artist_names, embeddings, top_k=10)
for name, score in similar:
    print(f"{name}: {score:.4f}")
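To make the similarity search concrete, here is a minimal standalone sketch of ranking artists by cosine similarity. It is not the repository's `find_similar_artists`; the function `rank_by_cosine` is hypothetical and only mirrors the `(name, score)` pairs that the example above iterates over.

```python
import math

def _unit(vec):
    """Return vec scaled to unit L2 norm (a copy of vec if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_by_cosine(query, names, embeddings, top_k=10):
    """Return the top_k (name, cosine_similarity) pairs, most similar first."""
    q = _unit(query)
    scored = [
        (name, sum(a * b for a, b in zip(q, _unit(emb))))
        for name, emb in zip(names, embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 2-d vectors stand in for the 512-d embeddings.
names = ["artist_x", "artist_y", "artist_z"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for name, score in rank_by_cosine([1.0, 0.0], names, vectors, top_k=2):
    print(f"{name}: {score:.4f}")
```

With thousands of artists, the same ranking is usually done as a single matrix-vector product over the stacked embeddings rather than a Python loop.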
License
MIT License