---
license: mit
datasets:
- zobeir/GoldNet
tags:
- image-classification
- pytorch
- vision-transformer
- counterfeit-detection
- gold
- fine-grained-recognition
language:
- en
---

# GoldNet Model Weights

Trained checkpoints for **GoldFormer** and baseline models from the paper:

> **GoldFormer: A Texture-Aware Vision Transformer-based Algorithm for Detecting Near-Identical Images**
> Z. Raisi, *Algorithms* (MDPI), under review.
> Code & dataset: [github.com/zobeirraisi/GoldNet](https://github.com/zobeirraisi/GoldNet)

## Task

Binary image classification (**authentic vs. counterfeit gold items**) from ordinary smartphone photographs. The two classes are near-identical to the eye; trained experts reached 89.80% accuracy on a blind subset.

## Available Checkpoints (`weights/`)

| File | Model | Accuracy (5-fold CV) |
|---|---|---|
| `GoldFormer_best.pth` | GoldFormer (CNN + Swin-T + TAAG) | 94.69 ± 0.79% |
| `Swin_T_best.pth` | Swin Transformer-Tiny | 94.31 ± 0.78% |
| `ViT_B16_best.pth` | ViT-B/16 | 94.31 ± 0.94% |
| `ResNet101_best.pth` | ResNet-101 | 92.29 ± 1.01% |
| `ResNet50_best.pth` | ResNet-50 | not listed |
| `ResNet18_best.pth` | ResNet-18 | not listed |
| `DenseNet121_best.pth` | DenseNet-121 | not listed |
| `EfficientNet_B3_best.pth` | EfficientNet-B3 | not listed |
| `EfficientNet_B0_best.pth` | EfficientNet-B0 | not listed |
| `MobileNet_V2_best.pth` | MobileNet-V2 | not listed |

All models trained with 5-fold stratified cross-validation, AdamW, AMP (bfloat16), freeze-then-unfreeze fine-tuning on the GoldNet dataset (2,127 images, 1,044 authentic / 1,083 counterfeit).

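
To make the evaluation protocol concrete, here is a minimal pure-Python sketch of a stratified 5-fold split over the class counts stated above. It is not the training code from the repository: the function name `stratified_kfold`, the seed, and the round-robin assignment are illustrative assumptions, and the actual pipeline may well rely on a library splitter instead.

```python
import random
from collections import Counter


def stratified_kfold(labels, k=5, seed=0):
    """Return k lists of indices, each with roughly the same class mix as `labels`.

    Hypothetical helper for illustration only; not part of the GoldNet repo.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets an equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Class counts taken from the dataset description: 1,044 authentic, 1,083 counterfeit.
labels = ["authentic"] * 1044 + ["counterfeit"] * 1083
folds = stratified_kfold(labels, k=5)

for i, val_idx in enumerate(folds):
    counts = Counter(labels[j] for j in val_idx)
    print(f"fold {i}: {len(val_idx)} validation images, {dict(counts)}")
```

Each fold serves once as the validation set while the other four are used for training, so every image is validated exactly once. With these class counts the folds hold 424 to 426 images each, and the authentic/counterfeit ratio within each fold stays close to that of the full dataset.
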
## Usage

```python
import torch
from torchvision import transforms
from PIL import Image

# Download weights first:
#   bash fetch_weights.sh   (from the GitHub repo)

# Build the architecture before loading weights. With weights_only=True,
# torch.load returns tensors / a state dict rather than a full model object.
# ASSUMPTION: `GoldFormer` is the class name in models.py and it takes no
# required constructor arguments; check models.py in the GitHub repo.
from models import GoldFormer

model = GoldFormer()
state_dict = torch.load(
    "weights/GoldFormer_best.pth", map_location="cpu", weights_only=True
)
model.load_state_dict(state_dict)
model.eval()

# GoldFormer expects 299x299 input; use (224, 224) for the baseline checkpoints.
transform = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("your_image.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, H, W)

with torch.no_grad():
    logits = model(x)

# Class index 0 is read as "authentic" here.
prob_authentic = torch.softmax(logits, dim=1)[0, 0].item()
print(f"P(authentic) = {prob_authentic:.3f}")
```

> **Note:** All baseline models use 224×224 input. GoldFormer uses 299×299.
> The `models.py` class definitions are in the [GitHub repo](https://github.com/zobeirraisi/GoldNet).

## Citation

```bibtex
@article{raisi2026goldformer,
  title   = {GoldFormer: A Texture-Aware Vision Transformer-based Algorithm for Detecting Near-Identical Images},
  author  = {Raisi, Zobeir},
  journal = {Algorithms},
  year    = {2026},
  note    = {Under review}
}
```

## License

Model weights: [MIT License](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE)

Dataset: [CC BY 4.0](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE-DATA)