---
license: mit
datasets:
- zobeir/GoldNet
tags:
- image-classification
- pytorch
- vision-transformer
- counterfeit-detection
- gold
- fine-grained-recognition
language:
- en
---

# GoldNet Model Weights

Trained checkpoints for **GoldFormer** and baseline models from the paper:

> **GoldFormer: A Texture-Aware Vision Transformer-based Algorithm for Detecting Near-Identical Images**
> Z. Raisi, *Algorithms* (MDPI), under review.
> Code & dataset: [github.com/zobeirraisi/GoldNet](https://github.com/zobeirraisi/GoldNet)

## Task

Binary image classification of **authentic vs. counterfeit gold items** from ordinary smartphone photographs. The two classes are nearly identical to the eye: trained experts reached 89.80% accuracy on a blind subset.

## Available Checkpoints (`weights/`)

| File | Model | Accuracy (5-fold CV) |
|---|---|---|
| `GoldFormer_best.pth` | GoldFormer (CNN + Swin-T + TAAG) | 94.69 ± 0.79% |
| `Swin_T_best.pth` | Swin Transformer-Tiny | 94.31 ± 0.78% |
| `ViT_B16_best.pth` | ViT-B/16 | 94.31 ± 0.94% |
| `ResNet101_best.pth` | ResNet-101 | 92.29 ± 1.01% |
| `ResNet50_best.pth` | ResNet-50 | — |
| `ResNet18_best.pth` | ResNet-18 | — |
| `DenseNet121_best.pth` | DenseNet-121 | — |
| `EfficientNet_B3_best.pth` | EfficientNet-B3 | — |
| `EfficientNet_B0_best.pth` | EfficientNet-B0 | — |
| `MobileNet_V2_best.pth` | MobileNet-V2 | — |

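
The accuracy column comes from 5-fold stratified cross-validation. As a rough illustration of what "stratified" means for GoldNet's class counts (1,044 authentic, 1,083 counterfeit), the sketch below deals each class round-robin into five folds so every fold keeps roughly the same class ratio. This is not the repo's split code: `stratified_kfold`, the seed, and the label strings are assumptions made for the example. Libraries such as scikit-learn offer `StratifiedKFold` for the same purpose.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds that each keep the class ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped so the leftover samples are spread across folds.
        for i, idx in enumerate(indices):
            folds[(i + offset) % k].append(idx)
        offset += len(indices)
    return folds


# Class counts taken from the model card; the label strings are illustrative.
labels = ["authentic"] * 1044 + ["counterfeit"] * 1083
folds = stratified_kfold(labels)

for i, val_idx in enumerate(folds):
    n_auth = sum(labels[j] == "authentic" for j in val_idx)
    print(f"fold {i}: {len(val_idx)} validation images, {n_auth} authentic")
```

Each fold serves once as the validation set while the other four are used for training.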
All models were trained on the GoldNet dataset (2,127 images: 1,044 authentic, 1,083 counterfeit) with 5-fold stratified cross-validation, AdamW, AMP (bfloat16), and freeze-then-unfreeze fine-tuning.

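
The freeze-then-unfreeze recipe can be sketched on a toy model: first train only the new classification head with the backbone frozen, then unfreeze everything and continue with a smaller learning rate. Everything here is an assumption for illustration (the tiny linear "backbone", the learning rates, the random data, and CPU autocast); the actual schedule lives in the training code in the GitHub repo.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-ins (hypothetical): a small "pretrained" backbone and a new 2-class head.
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
head = nn.Linear(8, 2)
model = nn.Sequential(backbone, head)


def set_trainable(module, flag):
    """Enable or disable gradients for every parameter in a module."""
    for p in module.parameters():
        p.requires_grad = flag


def train_step(optimizer, x, y):
    """One optimizer step with a bfloat16 autocast forward pass."""
    optimizer.zero_grad()
    # CPU autocast keeps the sketch runnable without a GPU.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        logits = model(x)
    # Compute the loss in float32 for numerical stability.
    loss = nn.functional.cross_entropy(logits.float(), y)
    loss.backward()
    optimizer.step()
    return loss.item()


x = torch.randn(4, 16)           # random features standing in for images
y = torch.tensor([0, 1, 0, 1])   # arbitrary labels for the two classes

# Phase 1: freeze the backbone and train only the head.
set_trainable(backbone, False)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
frozen_before = backbone[0].weight.detach().clone()
train_step(opt, x, y)
backbone_unchanged = torch.equal(frozen_before, backbone[0].weight)

# Phase 2: unfreeze everything and continue with a smaller learning rate.
set_trainable(backbone, True)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
train_step(opt, x, y)
backbone_updated = not torch.equal(frozen_before, backbone[0].weight)

print(backbone_unchanged, backbone_updated)
```

Building a fresh optimizer for the second phase is one simple way to bring the newly unfrozen parameters under AdamW; adding a parameter group to the existing optimizer is an alternative.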
## Usage

```python
import torch
from torchvision import transforms
from PIL import Image

# Class definitions live in models.py in the GitHub repo.
# NOTE: the class name `GoldFormer` and its no-argument constructor are
# assumptions; check models.py for the actual name and signature.
from models import GoldFormer

# Download weights first:
#   bash fetch_weights.sh   (from the GitHub repo)

# Load a checkpoint. With weights_only=True, torch.load returns the saved
# tensors rather than a module, so build the model and load the weights into
# it. This assumes the file stores a plain state dict.
model = GoldFormer()
state_dict = torch.load("weights/GoldFormer_best.pth",
                        map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()

transform = transforms.Compose([
    transforms.Resize((299, 299)),  # 299 for GoldFormer, 224 for baselines
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

img = Image.open("your_image.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(x)
# Class index 0 is read as "authentic".
prob_authentic = torch.softmax(logits, dim=1)[0, 0].item()
print(f"P(authentic) = {prob_authentic:.3f}")
```

> **Note:** All baseline models use 224×224 input. GoldFormer uses 299×299.
> The class definitions are in `models.py` in the [GitHub repo](https://github.com/zobeirraisi/GoldNet).

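
Because the resize step has to match the checkpoint, a small helper can pick the resolution from the file name. This is only a sketch: `input_size_for` is a hypothetical helper, not part of the repo, and it relies solely on the file names listed in the checkpoint table.

```python
from pathlib import Path

# Resolutions stated in the note above.
GOLDFORMER_SIZE = 299
BASELINE_SIZE = 224


def input_size_for(checkpoint_path):
    """Return the square input resolution for a checkpoint in weights/.

    Hypothetical helper: it looks only at the file name, treating
    GoldFormer_best.pth as GoldFormer and every other file as a baseline.
    """
    name = Path(checkpoint_path).name
    return GOLDFORMER_SIZE if name.startswith("GoldFormer") else BASELINE_SIZE


for ckpt in ["weights/GoldFormer_best.pth", "weights/ResNet50_best.pth"]:
    size = input_size_for(ckpt)
    print(f"{ckpt}: resize to {size}x{size}")
```

The returned value can be passed as `(size, size)` to `transforms.Resize` in the usage snippet.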
## Citation

```bibtex
@article{raisi2026goldformer,
  title   = {GoldFormer: A Texture-Aware Vision Transformer-based Algorithm
             for Detecting Near-Identical Images},
  author  = {Raisi, Zobeir},
  journal = {Algorithms},
  year    = {2026},
  note    = {Under review}
}
```

## License

Model weights: [MIT License](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE)
Dataset: [CC BY 4.0](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE-DATA)