---
license: apache-2.0
tags:
- steganography
- watermarking
- image-watermarking
- pytorch
---

# PicoTrust — Robust Image Steganography

PicoTrust is a neural image steganography model that embeds hidden binary messages in images and recovers them after real-world distortions such as JPEG compression, blur, noise, and colour shifts.

It is based on StegaStamp (Tancik et al., CVPR 2020), with significant architectural improvements.

## Architecture

- **Encoder**: U-Net (512x512) with an E_post refinement head producing a 1-channel grayscale residual
- **Decoder**: StegaStamp-style CNN (256x256) with a compact STN for geometric alignment
- **Discriminator**: WGAN PatchGAN
- **Message**: 100 bits per image
- **Bounding**: softsign residual with strength annealing (1.0 → target over training)

## Checkpoints

| Model | File | Strength | PSNR | Bit Accuracy | JPEG Q10 |
|-------|------|----------|------|--------------|----------|
| **v4** | `v4/picotrust_v4_200k.pt` | 0.02 | **35.56 dB** | 97.8% | 97.6% |
| **v2** | `v2/picotrust_v2_200k.pt` | 0.03 | 32.82 dB | **98.4%** | **98.6%** |

- **v4**: best balance of visual quality and accuracy; +2.7 dB PSNR over v2 for a <1% drop in bit accuracy.
- **v2**: highest raw accuracy and JPEG robustness.

Both models guarantee zero colour shift by construction (the residual is grayscale) and stay above 92% bit accuracy across all tested distortion types.

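The zero-colour-shift guarantee can be verified directly: broadcasting a single-channel residual onto all three channels leaves the inter-channel differences, and hence the hue, untouched. A minimal sketch with dummy tensors (not the picode API):

```python
import torch

cover = torch.rand(1, 3, 64, 64)             # cover image in [0, 1]
residual = 0.02 * torch.randn(1, 1, 64, 64)  # 1-channel (grayscale) residual

# Broadcasting adds the *same* residual to R, G and B
encoded = cover + residual

# Inter-channel differences are unchanged, so there is no colour shift
assert torch.allclose(encoded[:, 0] - encoded[:, 1],
                      cover[:, 0] - cover[:, 1], atol=1e-5)
assert torch.allclose(encoded[:, 1] - encoded[:, 2],
                      cover[:, 1] - cover[:, 2], atol=1e-5)
```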
## Robustness (v4 @ 200k steps)

| Distortion | Bit Accuracy |
|------------|--------------|
| Clean | 97.8% |
| JPEG Q10 | 97.6% |
| Gaussian blur (σ=3) | 98.0% |
| Gaussian noise (σ=0.05) | 96.6% |
| Brightness ±0.3 | 94.6% |
| Contrast ±0.3 | 98.2% |

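The numbers above come from thresholding the decoder's per-bit probabilities at 0.5 and comparing against the ground-truth message. A self-contained sketch of that metric, plus one of the distortions (Gaussian noise), independent of the picode models (helper names are illustrative):

```python
import torch

def bit_accuracy(probs: torch.Tensor, message: torch.Tensor) -> float:
    """Fraction of message bits recovered after thresholding at 0.5."""
    bits = (probs > 0.5).float()
    return (bits == message).float().mean().item()

def gaussian_noise(img: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Additive Gaussian noise, clamped back to the valid [0, 1] range."""
    return (img + sigma * torch.randn_like(img)).clamp(0.0, 1.0)

message = torch.tensor([[1.0, 0.0, 1.0, 1.0]])
probs = torch.tensor([[0.9, 0.2, 0.6, 0.4]])  # last bit decoded wrongly
print(bit_accuracy(probs, message))           # → 0.75
```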
## Usage

```python
import torch
from picode.models.picotrust import Encoder, Decoder

# Load checkpoint
ckpt = torch.load("picotrust_v4_200k.pt", map_location="cpu")

# Reconstruct encoder/decoder
encoder = Encoder(message_length=100, image_size=512)
decoder = Decoder(message_length=100, image_size=256)

encoder.load_state_dict(ckpt["encoder"])
decoder.load_state_dict(ckpt["decoder"])
encoder.eval()
decoder.eval()

# Inputs: image (1, 3, 512, 512) in [0, 1], message (1, 100) binary
image = torch.rand(1, 3, 512, 512)
message = torch.randint(0, 2, (1, 100)).float()

with torch.no_grad():
    # Encode the message into the image
    encoded = encoder(image, message)

    # Decode: returns per-bit probabilities (1, 100) in [0, 1]
    decoded = decoder(encoded)

bits = (decoded > 0.5).float()
accuracy = (bits == message).float().mean()  # bit accuracy
```

## Training

Trained on COCO train2017 (~118K images) for 200K steps on a single NVIDIA T4 GPU.

Config: `configs/picotrust_v4.yaml` (v4) / `configs/picotrust_v2.yaml` (v2)

## Key Design Decisions

1. **Grayscale residual**: the 1-channel E_post output is broadcast to 3 channels, eliminating colour shifts by construction
2. **Softsign bounding**: `strength * x / (1 + |x|)` avoids the vanishing gradients of tanh
3. **Strength annealing**: training starts unbounded (strength 1.0) and anneals to the target, preventing bootstrap collapse
4. **MSE message loss**: unlike BCE, it has no trivial equilibrium for the decoder to collapse into
5. **Zero-init E_post**: the residual starts at exactly zero and grows gradually

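Decisions 2, 3 and 5 are straightforward to sketch: softsign bounds the residual in (-strength, strength) while keeping a nonzero gradient everywhere, the strength anneals from 1.0 down to the deployment value, and zero-initialising the residual head makes the residual start at exactly zero. A minimal sketch (the linear schedule and the layer shapes are assumptions, not the exact picode implementation):

```python
import torch
import torch.nn as nn

def softsign_bound(x: torch.Tensor, strength: float) -> torch.Tensor:
    """strength * x / (1 + |x|): output bounded in (-strength, strength)."""
    return strength * x / (1.0 + x.abs())

def annealed_strength(step: int, total_steps: int,
                      start: float = 1.0, target: float = 0.02) -> float:
    """Linear anneal from `start` to `target` over training (assumed schedule)."""
    t = min(step / total_steps, 1.0)
    return start + t * (target - start)

# Zero-init the final residual layer so the residual starts at exactly zero
head = nn.Conv2d(64, 1, kernel_size=1)
nn.init.zeros_(head.weight)
nn.init.zeros_(head.bias)

x = torch.randn(1, 64, 8, 8)
assert head(x).abs().max().item() == 0.0                        # starts at zero
assert softsign_bound(100 * torch.randn(10), 0.02).abs().max() < 0.02
```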
## License

Apache 2.0