---
license: mit
datasets:
- zobeir/GoldNet
tags:
- image-classification
- pytorch
- vision-transformer
- counterfeit-detection
- gold
- fine-grained-recognition
language:
- en
---
# GoldNet Model Weights
Trained checkpoints for **GoldFormer** and baseline models from the paper:
> **GoldFormer: A Texture-Aware Vision Transformer-based Algorithm for Detecting Near-Identical Images**
> Z. Raisi, *Algorithms* (MDPI), under review.
> Code & dataset: [github.com/zobeirraisi/GoldNet](https://github.com/zobeirraisi/GoldNet)
## Task
Binary image classification of **authentic vs. counterfeit gold items** from ordinary smartphone photographs. The two classes are nearly indistinguishable to the naked eye: trained experts reached 89.80% accuracy on a blind subset.
## Available Checkpoints (`weights/`)
| File | Model | Accuracy (5-fold CV) |
|---|---|---|
| `GoldFormer_best.pth` | GoldFormer (CNN + Swin-T + TAAG) | 94.69 ± 0.79% |
| `Swin_T_best.pth` | Swin Transformer-Tiny | 94.31 ± 0.78% |
| `ViT_B16_best.pth` | ViT-B/16 | 94.31 ± 0.94% |
| `ResNet101_best.pth` | ResNet-101 | 92.29 ± 1.01% |
| `ResNet50_best.pth` | ResNet-50 | — |
| `ResNet18_best.pth` | ResNet-18 | — |
| `DenseNet121_best.pth` | DenseNet-121 | — |
| `EfficientNet_B3_best.pth` | EfficientNet-B3 | — |
| `EfficientNet_B0_best.pth` | EfficientNet-B0 | — |
| `MobileNet_V2_best.pth` | MobileNet-V2 | — |

All models were trained on the GoldNet dataset (2,127 images: 1,044 authentic, 1,083 counterfeit) with 5-fold stratified cross-validation, the AdamW optimizer, automatic mixed precision (bfloat16), and freeze-then-unfreeze fine-tuning.
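
To make the stratified split concrete, here is a minimal standard-library sketch of 5-fold stratified assignment using the class counts stated above. It is an illustration only, not the training code from the repo: the helper name `stratified_folds`, the round-robin assignment, and the seed are all assumptions, and the actual pipeline may well use a library splitter instead.

```python
import random
from collections import Counter

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios even.

    Hypothetical helper for illustration; not taken from the GoldNet repo.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Class counts from the dataset description: 1,044 authentic, 1,083 counterfeit.
labels = ["authentic"] * 1044 + ["counterfeit"] * 1083
folds = stratified_folds(labels, k=5)

for i, val_idx in enumerate(folds):
    counts = Counter(labels[j] for j in val_idx)
    print(f"fold {i}: {len(val_idx)} validation images, {dict(counts)}")
```

Each fold serves once as the validation set while the remaining four are used for training, so every image is validated on exactly once.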
## Usage
```python
import torch
from torchvision import transforms
from PIL import Image
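
# Download the weights first (script from the GitHub repo):
#   bash fetch_weights.sh

# Load a checkpoint. weights_only=True restricts unpickling to tensors and
# plain containers, so torch.load yields a state_dict rather than a
# ready-to-call nn.Module. The weights go into a model built from models.py.
# ASSUMPTION: the class is named `GoldFormer`, needs no constructor arguments,
# and the file stores a bare state_dict. Check models.py and adjust if not.
from models import GoldFormer
model = GoldFormer()
state_dict = torch.load("weights/GoldFormer_best.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()
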
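# GoldFormer expects 299x299 input; use (224, 224) for the baseline models.
transform = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    # ImageNet channel means and standard deviations
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

img = Image.open("your_image.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(x)

# Class index 0 is read as "authentic" here.
prob_authentic = torch.softmax(logits, dim=1)[0, 0].item()
print(f"P(authentic) = {prob_authentic:.3f}")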
```
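
The last two lines of the snippet turn the model's two raw scores (logits) into a probability. A dependency-free sketch of that arithmetic follows. The logit values are made up purely for illustration, and class index 0 is taken as authentic, as in the snippet above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one image, ordered [authentic, counterfeit].
example_logits = [2.0, 0.0]
probs = softmax(example_logits)
prob_authentic = probs[0]
print(f"P(authentic) = {prob_authentic:.3f}")  # prints P(authentic) = 0.881
```

With only two classes, `P(counterfeit)` is simply `1 - P(authentic)`.
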
> **Note:** All baseline models use 224×224 input. GoldFormer uses 299×299.
> The `models.py` class definitions are in the [GitHub repo](https://github.com/zobeirraisi/GoldNet).
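
Because the input size depends on the model, it is easy to feed a baseline the wrong resolution. One possible way to keep this straight is a small lookup keyed on the checkpoint filename. The helper name `input_size_for` is hypothetical and not part of the repo; the two sizes come from the note above.

```python
from pathlib import Path

# Input resolution per the note above: GoldFormer uses 299x299,
# every baseline model uses 224x224.
GOLDFORMER_SIZE = 299
BASELINE_SIZE = 224

def input_size_for(checkpoint_path):
    """Return the (height, width) to pass to transforms.Resize.

    Hypothetical helper: it only inspects the filename, e.g.
    "weights/GoldFormer_best.pth" or "weights/ResNet50_best.pth".
    """
    name = Path(checkpoint_path).name
    side = GOLDFORMER_SIZE if name.startswith("GoldFormer") else BASELINE_SIZE
    return (side, side)

print(input_size_for("weights/GoldFormer_best.pth"))  # (299, 299)
print(input_size_for("weights/Swin_T_best.pth"))      # (224, 224)
```

The returned tuple can be passed directly to `transforms.Resize(...)` in place of the hard-coded `(299, 299)`.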
## Citation
```bibtex
@article{raisi2026goldformer,
title = {GoldFormer: A Texture-Aware Vision Transformer-based Algorithm
for Detecting Near-Identical Images},
author = {Raisi, Zobeir},
journal = {Algorithms},
year = {2026},
note = {Under review}
}
```
## License
- Model weights: [MIT License](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE)
- Dataset: [CC BY 4.0](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE-DATA)