
	---
	license: mit
	datasets:
	- zobeir/GoldNet
	tags:
	- image-classification
	- pytorch
	- vision-transformer
	- counterfeit-detection
	- gold
	- fine-grained-recognition
	language:
	- en
	---

	# GoldNet Model Weights

	Trained checkpoints for GoldFormer and baseline models from the paper:

	> GoldFormer: A Texture-Aware Vision Transformer-based Algorithm for Detecting Near-Identical Images
	> Z. Raisi, Algorithms (MDPI), under review.
	> Code & dataset: [github.com/zobeirraisi/GoldNet](https://github.com/zobeirraisi/GoldNet)

	## Task

	Binary image classification of gold items (authentic vs. counterfeit) from ordinary smartphone photographs. The two classes look near-identical to the eye; trained experts reached 89.80% accuracy on a blind subset.

	## Available Checkpoints (`weights/`)

	| File | Model | Accuracy (5-fold CV) |
	|---|---|---|
	| `GoldFormer_best.pth` | GoldFormer (CNN + Swin-T + TAAG) | 94.69 ± 0.79% |
	| `Swin_T_best.pth` | Swin Transformer-Tiny | 94.31 ± 0.78% |
	| `ViT_B16_best.pth` | ViT-B/16 | 94.31 ± 0.94% |
	| `ResNet101_best.pth` | ResNet-101 | 92.29 ± 1.01% |
	| `ResNet50_best.pth` | ResNet-50 | not reported |
	| `ResNet18_best.pth` | ResNet-18 | not reported |
	| `DenseNet121_best.pth` | DenseNet-121 | not reported |
	| `EfficientNet_B3_best.pth` | EfficientNet-B3 | not reported |
	| `EfficientNet_B0_best.pth` | EfficientNet-B0 | not reported |
	| `MobileNet_V2_best.pth` | MobileNet-V2 | not reported |
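
As an alternative to the repository's download script, the files in the table can probably be fetched one at a time with `huggingface_hub`. The sketch below assumes the checkpoints are stored in the `zobeir/GoldNet` model repository under `weights/`; the repository ID and the helper names `checkpoint_filename` and `download_checkpoint` are illustrative assumptions, not part of the published code.

```python
from pathlib import PurePosixPath

REPO_ID = "zobeir/GoldNet"   # assumption: the model repository this card belongs to
WEIGHTS_DIR = "weights"      # folder named in the section heading

def checkpoint_filename(name: str) -> str:
    """Return the in-repo path for a checkpoint listed in the table."""
    return str(PurePosixPath(WEIGHTS_DIR) / name)

def download_checkpoint(name: str) -> str:
    """Download one checkpoint and return its local cache path."""
    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(name))
```

Calling `download_checkpoint("GoldFormer_best.pth")` would return a local path that can be passed to `torch.load`, provided the file exists at that location in the repository.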

	All models were trained with 5-fold stratified cross-validation, AdamW, AMP (bfloat16), and freeze-then-unfreeze fine-tuning on the GoldNet dataset (2,127 images: 1,044 authentic and 1,083 counterfeit).
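
To make the stratified split concrete, here is a minimal standard-library sketch of how 5 folds could be drawn so that each keeps roughly the dataset's class ratio. It is not the repository's implementation (which may well use a library splitter); the function name `stratified_folds`, the seed, and the label encoding (0 = authentic, 1 = counterfeit) are assumptions for illustration.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Class counts from the dataset description: 0 = authentic, 1 = counterfeit.
labels = [0] * 1044 + [1] * 1083
folds = stratified_folds(labels)
for i, fold in enumerate(folds):
    n_auth = sum(1 for idx in fold if labels[idx] == 0)
    print(f"fold {i}: {len(fold)} images, "
          f"{n_auth} authentic, {len(fold) - n_auth} counterfeit")
```

With these class counts, each fold holds between 424 and 426 images. In each round of cross-validation, one fold serves as the validation set and the other four as training data.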

	## Usage

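	```python
	import torch
	from torchvision import transforms
	from PIL import Image

	# Class name assumed for illustration; the actual definitions live in
	# models.py in the GitHub repo.
	from models import GoldFormer

	# Download weights first:
	#   bash fetch_weights.sh  (from the GitHub repo)

	# Build the architecture, then load the checkpoint into it.
	# Assumes the .pth file stores a plain state_dict and that the
	# constructor needs no arguments; adjust to match models.py.
	model = GoldFormer()
	state_dict = torch.load("weights/GoldFormer_best.pth",
	                        map_location="cpu", weights_only=True)
	model.load_state_dict(state_dict)
	model.eval()

	# GoldFormer expects 299x299 inputs with ImageNet normalization.
	transform = transforms.Compose([
	    transforms.Resize((299, 299)),
	    transforms.ToTensor(),
	    transforms.Normalize([0.485, 0.456, 0.406],
	                         [0.229, 0.224, 0.225]),
	])

	img = Image.open("your_image.jpg").convert("RGB")
	x = transform(img).unsqueeze(0)  # add a batch dimension

	with torch.no_grad():
	    logits = model(x)
	prob_authentic = torch.softmax(logits, dim=1)[0, 0].item()  # index 0 = authentic
	print(f"P(authentic) = {prob_authentic:.3f}")
	```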
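
The last step of the snippet turns two raw logits into a probability with a softmax. A small worked example in plain Python shows the arithmetic; the logit values are hypothetical and the class order (index 0 = authentic) follows the usage code above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.0]  # hypothetical model output: [authentic, counterfeit]
probs = softmax(logits)
print(f"P(authentic) = {probs[0]:.3f}")  # prints P(authentic) = 0.881
```

Because there are only two classes, the two probabilities sum to 1, so `P(counterfeit)` is simply `1 - P(authentic)`.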

	> Note: All baseline models use 224×224 input; GoldFormer uses 299×299. When loading a baseline checkpoint, change the `Resize` size in the snippet above to `(224, 224)`.
	> The `models.py` class definitions are in the [GitHub repo](https://github.com/zobeirraisi/GoldNet).
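
One way to avoid mixing up the two input sizes is to derive the preprocessing from the checkpoint file name. The sketch below is an illustration only: the helpers `input_size_for` and `build_transform` are hypothetical names, and the rule "file name starts with `GoldFormer`" is an assumption based on the file names in the checkpoint table.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def input_size_for(checkpoint_name: str) -> int:
    """Return the square input side length for a checkpoint file name."""
    # Per the note: GoldFormer takes 299x299, every baseline takes 224x224.
    return 299 if checkpoint_name.startswith("GoldFormer") else 224

def build_transform(checkpoint_name: str):
    """Build the preprocessing pipeline matching the checkpoint's input size."""
    # Imported lazily so input_size_for works without torchvision installed.
    from torchvision import transforms
    side = input_size_for(checkpoint_name)
    return transforms.Compose([
        transforms.Resize((side, side)),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])
```

For example, `build_transform("ResNet101_best.pth")` resizes to 224×224, while `build_transform("GoldFormer_best.pth")` resizes to 299×299.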

	## Citation

	```bibtex
	@article{raisi2026goldformer,
	title = {GoldFormer: A Texture-Aware Vision Transformer-based Algorithm
	for Detecting Near-Identical Images},
	author = {Raisi, Zobeir},
	journal = {Algorithms},
	year = {2026},
	note = {Under review}
	}
	```

	## License

	Model weights: [MIT License](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE)
	Dataset: [CC BY 4.0](https://github.com/zobeirraisi/GoldNet/blob/main/LICENSE-DATA)