neko-pickscore
neko-pickscore is a custom fine-tuned version of PickScore_v1 (based on CLIP ViT-H/14). It was trained on a curated dataset of 11,000 image preference pairs to align with a specific, personalized aesthetic taste.
Unlike general-purpose aesthetic scorers, this model's Vision Encoder has been fully fine-tuned with a Bradley-Terry pairwise margin loss to capture specific compositional and stylistic preferences, while the Text Encoder remains frozen to preserve zero-shot language understanding.
Usage
1. With transformers (Native PyTorch)
You can use this model directly with the Hugging Face transformers library to score images against text prompts.
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image
model_path = "2dameneko/neko-pickscore"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load model and processor
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.float32).to(device).eval()
processor = AutoProcessor.from_pretrained(model_path)
# Load an image and define a prompt
image = Image.open("your_image.jpg").convert("RGB")
prompt = "a beautiful landscape, highly detailed"
# Process inputs (truncate long prompts to CLIP's 77-token text context)
inputs = processor(text=[prompt], images=[image], return_tensors="pt", padding=True, truncation=True, max_length=77).to(device)
# Get features and calculate score
with torch.no_grad():
    img_feat = model.get_image_features(pixel_values=inputs.pixel_values)
    txt_feat = model.get_text_features(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)
    # Normalize
    img_feat = img_feat / img_feat.norm(p=2, dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(p=2, dim=-1, keepdim=True)
    # Calculate PickScore
    logit_scale = model.logit_scale.exp()
    score = (logit_scale * (img_feat * txt_feat).sum(dim=-1)).item()
print(f"neko-pickscore: {score:.4f}")
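The score above is a scaled cosine similarity, so it is most useful for comparing several images generated for the same prompt rather than as an absolute number. As a minimal sketch, assuming you have already collected one score per candidate image, a softmax over those scores gives relative preference probabilities. The helper name `preference_probs` and the example scores are hypothetical and only serve as illustration.

```python
import math


def preference_probs(scores):
    """Softmax over per-image scores that were computed for ONE prompt."""
    if not scores:
        raise ValueError("scores must contain at least one value")
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate images of the same prompt
scores = [21.3, 19.8, 20.5]
probs = preference_probs(scores)

# Index of the image the model prefers most
best = max(range(len(scores)), key=scores.__getitem__)
print(f"preferred image index: {best}")
print("preference probabilities:", [round(p, 3) for p in probs])
```

The probabilities are only comparable within one prompt; scores obtained for different prompts should not be mixed in a single softmax.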
2. With nitpick-chan Image Scorer
This model is fully compatible with the custom batch-scoring and hierarchical clustering tool nitpick-chan.
python nitpick-chan.py /path/to/your/images \
--models neko_pickscore \
--mode both \
--tiers 5
Training Details
- Base Model: yuvalkirstain/PickScore_v1 (CLIP ViT-H/14)
- Dataset: 11,000 custom pairwise preference pairs (hierarchical tier-based comparisons).
- Loss Function: Bradley-Terry Pairwise Margin Loss (-log(sigmoid(score_a - score_b))).
- Trainable Parameters: Full Vision Encoder (632M params). Text Encoder (354M params) was kept frozen to prevent catastrophic forgetting.
- Optimizer: 8-bit AdamW (via bitsandbytes).
- Optimizations:
  - Scaled Dot-Product Attention (SDPA)
  - Gradient Checkpointing (Vision Encoder only)
  - OpenCV-based fast image decoding
  - GPU-offloaded normalization
- Learning Rate: 1e-5 (cosine schedule with warmup) to gently shift the decision boundary without destroying base CLIP knowledge.
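The loss and learning-rate schedule listed above can be illustrated in plain Python. This is a sketch under stated assumptions, not the actual training code: it assumes score_a belongs to the preferred image of each pair, that the schedule warms up linearly and decays to zero, and the values of `warmup_steps` and `total_steps` are hypothetical. A real training loop would compute the same quantities on batched tensors.

```python
import math


def bradley_terry_loss(score_a, score_b):
    """-log(sigmoid(score_a - score_b)), where score_a is assumed preferred."""
    diff = score_a - score_b
    # -log(sigmoid(d)) equals softplus(-d); this form avoids overflow
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def cosine_lr(step, total_steps, warmup_steps, peak_lr=1e-5):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# The loss shrinks as the preferred image outscores the rejected one
for a, b in [(20.0, 20.0), (21.0, 20.0), (20.0, 21.0)]:
    print(f"score_a={a}, score_b={b}, loss={bradley_terry_loss(a, b):.4f}")

# Hypothetical schedule: 100 warmup steps out of 1000 total
for step in (0, 50, 100, 550, 1000):
    print(f"step {step}: lr={cosine_lr(step, 1000, 100):.2e}")
```

When both images receive the same score the loss is log(2), about 0.693, and it falls toward zero as the preferred image pulls ahead, which is what pushes the Vision Encoder to separate the two images of a pair.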
Resources
- GitHub Repository (Training & Scoring Scripts): https://github.com/2dameneko/nitpick-chan
- Base Model: yuvalkirstain/PickScore_v1