ConvNext-aesthetic-rater
Model Details
- Architecture: animetimm/caformer_b36.dbv4-full
- Classes: ['Good', 'Normal', 'Rest'] (3 classes)
- Input size: ~448px (non-square, both sides divisible by 32)
- Best val accuracy: 0.8684
- Training precision: bf16 mixed precision
Training Config
| Parameter | Value |
|---|---|
| Batch size | 16 |
| Head LR | 0.001 |
| Fine-tune LR | 0.0001 |
| Weight decay | 0.0001 |
| Label smoothing | 0.1 |
| MixUp | True |
| Scheduler | CosineAnnealing |
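To make two of the settings above concrete, here is a minimal pure-Python sketch of what label smoothing of 0.1 does to a 3-class target and how a cosine-annealed learning rate decays. It assumes the common convention `(1 - eps) * one_hot + eps / K` for smoothing (the one used by PyTorch's `CrossEntropyLoss`), a minimum learning rate of 0, and a hypothetical `total_epochs` value; none of these details are stated in this card, and the function names are illustrative only.

```python
import math

NUM_CLASSES = 3          # 'Good', 'Normal', 'Rest'
LABEL_SMOOTHING = 0.1    # from the table above
HEAD_LR = 1e-3           # from the table above
FINETUNE_LR = 1e-4       # from the table above


def smoothed_target(true_index, num_classes=NUM_CLASSES, eps=LABEL_SMOOTHING):
    """Return the smoothed target distribution: (1 - eps) * one_hot + eps / K."""
    uniform = eps / num_classes
    return [
        (1.0 - eps) + uniform if i == true_index else uniform
        for i in range(num_classes)
    ]


def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 down to min_lr at total_epochs.

    total_epochs and min_lr are assumptions; the card does not specify them.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


if __name__ == "__main__":
    print(smoothed_target(0))
    for epoch in (0, 5, 10):
        print(epoch, cosine_lr(HEAD_LR, epoch, 10), cosine_lr(FINETUNE_LR, epoch, 10))
```

Under these assumptions the true class keeps most of the probability mass and the remainder is spread evenly, while both learning rates fall smoothly from their table values toward zero by the final epoch.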
Usage
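import timm
import torch

# Build the architecture with a 3-class head; pretrained=False because the weights come from the checkpoint below.
model = timm.create_model("animetimm/caformer_b36.dbv4-full", pretrained=False, num_classes=3)
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
checkpoint = torch.load("best_model.pth", map_location="cpu", weights_only=False)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # switch to inference mode (disables dropout and other training-only behavior)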
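Once the model is loaded, its three output logits still need to be mapped to a class name. The sketch below shows one way to do that in plain Python. It assumes the output index order matches the class list in Model Details (`Good`, `Normal`, `Rest`), which this card does not state explicitly; `softmax` and `predict_label` are hypothetical helper names.

```python
import math

# Assumption: output indices follow the order listed in Model Details.
CLASSES = ["Good", "Normal", "Rest"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, classes=CLASSES):
    """Return (class name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]
```

With a preprocessed batch of one image `x` (a hypothetical tensor), a call such as `predict_label(model(x)[0].tolist())` inside `torch.no_grad()` would return the predicted label and its probability.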
Notes
- Color is important for classification, so the training augmentations preserve hue
- Images are resized so both dimensions are divisible by 32 (non-square)
- At inference, cap the longest side at 640px and round both sides to the nearest multiple of 32
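The inference sizing rule above can be sketched as a small helper. This is one possible reading of the note, not the author's code: it assumes the cap only ever downscales (smaller images are not enlarged), that aspect ratio is preserved before rounding, and that each side is at least 32px. The name `inference_size` is hypothetical.

```python
def inference_size(width, height, max_side=640, multiple=32):
    """Compute a target (width, height) for inference.

    1. If the longest side exceeds max_side, scale both sides down
       proportionally so the longest side equals max_side.
    2. Round each side to the nearest multiple of `multiple`
       (halves round up), never going below one multiple.
    """
    longest = max(width, height)
    scale = min(1.0, max_side / longest)  # assumption: never upscale

    def round_to_multiple(value):
        rounded = int(value / multiple + 0.5) * multiple
        return max(multiple, rounded)

    return round_to_multiple(width * scale), round_to_multiple(height * scale)


if __name__ == "__main__":
    for size in [(1920, 1080), (500, 700), (300, 300)]:
        print(size, "->", inference_size(*size))
```

Rounding can shift the aspect ratio slightly, so the image is resized (not padded) to the computed size in this reading; whether the original pipeline resizes or pads is not stated in this card.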