PicoTrust: Robust Image Steganography
PicoTrust is a neural image steganography model that encodes hidden binary messages into images, surviving real-world distortions like JPEG compression, blur, noise, and color shifts.
Based on StegaStamp (Tancik et al., CVPR 2020) with significant architectural improvements.
Architecture
- Encoder: U-Net (512x512) + E_post refinement (1-channel grayscale residual, softsign bounding with strength annealing)
- Decoder: StegaStamp-style CNN (256x256) with compact STN for geometric alignment
- Discriminator: WGAN PatchGAN
- Message: 100 bits per image
- Bounding: Softsign residual with strength annealing (1.0 → target over training)
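The strength annealing in the last bullet can be pictured with a small schedule function. This is a minimal sketch, assuming a linear ramp; the source gives only the endpoints (1.0 and the target), so the ramp shape and the names `annealed_strength`, `total_steps`, and `target` are illustrative assumptions rather than the project's code. The example uses 0.02 as the target, on the assumption that the Strength column of the checkpoint table holds the final value.

```python
def annealed_strength(step: int, total_steps: int, target: float, start: float = 1.0) -> float:
    """Interpolate the residual strength from `start` down to `target`.

    Assumption: the ramp is linear. The README only states that the
    strength moves from 1.0 to the target over training.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return start + progress * (target - start)


# Strength at the start, midpoint, and end of a 200K-step run.
for step in (0, 100_000, 200_000):
    print(step, round(annealed_strength(step, 200_000, 0.02), 4))
```

Early in training the bound is loose, so the residual can grow freely; by the end it is confined to the small target range.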
Checkpoints
| Model | File | Strength | PSNR | Bit Accuracy (clean) | Bit Accuracy (JPEG Q10) |
|---|---|---|---|---|---|
| v4 | v4/picotrust_v4_200k.pt | 0.02 | 35.56 dB | 97.8% | 97.6% |
| v2 | v2/picotrust_v2_200k.pt | 0.03 | 32.82 dB | 98.4% | 98.6% |
- v4: Best balance of visual quality and accuracy. +2.7 dB PSNR over v2, at a cost of 0.6 points of clean bit accuracy and 1.0 point at JPEG Q10.
- v2: Highest raw accuracy and JPEG robustness.
Both models have zero colour shifts (grayscale residual) and >92% accuracy across all distortion types.
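For readers comparing the PSNR figures, the metric can be computed as below. This is a minimal sketch, assuming pixel values in [0, 1] (as in the Usage section) and flat sequences of values; the function name `psnr` is illustrative, and the source does not say how its reported PSNR was averaged across images.

```python
import math


def psnr(original, encoded, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(encoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, encoded)) / len(original)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives MSE = 0.01.
print(psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1]))
```

Higher values mean the encoded image is closer to the original.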
Robustness (v4 @ 200k steps)
| Distortion | Bit Accuracy |
|---|---|
| Clean | 97.8% |
| JPEG Q10 | 97.6% |
| Gaussian blur (σ=3) | 98.0% |
| Gaussian noise (σ=0.05) | 96.6% |
| Brightness ±0.3 | 94.6% |
| Contrast ±0.3 | 98.2% |
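Bit accuracy in the table is the fraction of message bits recovered correctly. With 100 bits per image, 97.8% corresponds to about 2.2 wrong bits per image on average. A minimal sketch of the metric, assuming the decoder output is thresholded at 0.5 as in the Usage section (the helper name `bit_accuracy` is illustrative):

```python
def bit_accuracy(decoded_probs, message_bits, threshold: float = 0.5) -> float:
    """Fraction of bits whose thresholded probability matches the true bit."""
    if len(decoded_probs) != len(message_bits) or len(message_bits) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(
        int(prob > threshold) == int(bit)
        for prob, bit in zip(decoded_probs, message_bits)
    )
    return correct / len(message_bits)


# Three of the four bits are recovered correctly here.
print(bit_accuracy([0.9, 0.1, 0.8, 0.4], [1, 0, 0, 0]))
```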
Usage
import torch
from picode.models.picotrust import Encoder, Decoder
# Load checkpoint
ckpt = torch.load("picotrust_v4_200k.pt", map_location="cpu")
# Reconstruct encoder/decoder and switch to inference mode
encoder = Encoder(message_length=100, image_size=512)
decoder = Decoder(message_length=100, image_size=256)
encoder.load_state_dict(ckpt["encoder"])
decoder.load_state_dict(ckpt["decoder"])
encoder.eval()
decoder.eval()
# Placeholder inputs; replace with a real image and payload
image = torch.rand(1, 3, 512, 512)               # values in [0, 1]
message = torch.randint(0, 2, (1, 100)).float()  # binary message
with torch.no_grad():
    # Encode: image (1,3,512,512) [0,1] + message (1,100) binary
    encoded = encoder(image, message)
    # Decode: returns probabilities (1,100) in [0,1]
    decoded = decoder(encoded)
bits = (decoded > 0.5).float()
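The snippet above expects the message as 100 binary values. One way to get there from a byte payload is sketched below. The helpers `bytes_to_bits` and `bits_to_bytes` are hypothetical and not part of the package; the MSB-first bit order and zero padding are assumptions made for illustration. At most 12 bytes (96 bits) fit in a 100-bit message.

```python
MESSAGE_LENGTH = 100  # bits per image, as stated in the Architecture section


def bytes_to_bits(payload: bytes, length: int = MESSAGE_LENGTH) -> list:
    """Unpack bytes MSB-first into floats (0.0 or 1.0), zero-padded to `length`."""
    bits = [float((byte >> shift) & 1) for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > length:
        raise ValueError(f"payload needs {len(bits)} bits, but only {length} are available")
    return bits + [0.0] * (length - len(bits))


def bits_to_bytes(bits, num_bytes: int) -> bytes:
    """Repack the first `num_bytes * 8` values into bytes, thresholding at 0.5."""
    out = bytearray()
    for i in range(num_bytes):
        byte = 0
        for bit in bits[8 * i : 8 * i + 8]:
            byte = (byte << 1) | int(bit > 0.5)
        out.append(byte)
    return bytes(out)


bits = bytes_to_bits(b"hello")
print(len(bits), bits_to_bytes(bits, 5))
```

The result can be wrapped as `torch.tensor([bits])` to obtain the `(1, 100)` float tensor. Since the reported bit accuracy is below 100%, a real payload would likely need an error-correcting code on top of this packing; that is outside the scope of this sketch.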
Training
Trained on COCO train2017 (~118K images) for 200K steps on a single NVIDIA T4 GPU.
Config: configs/picotrust_v4.yaml (v4) / configs/picotrust_v2.yaml (v2)
Key Design Decisions
- Grayscale residual: 1-channel E_post output broadcast to 3 channels, which eliminates colour shifts by construction
- Softsign bounding: `strength * x / (1 + |x|)`. Its gradient, `strength / (1 + |x|)^2`, decays polynomially rather than exponentially as with tanh, so large pre-activations still receive a usable gradient
- Strength annealing: Start unbounded (1.0) → anneal to target → prevents bootstrap collapse
- MSE message loss: No trivial equilibrium unlike BCE
- Zero-init E_post: Residual starts at exactly zero, grows gradually
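Several of these decisions can be checked with per-pixel arithmetic. The following is a minimal sketch in plain Python, not the model code: it treats a single RGB pixel and a single raw residual value, and the names `softsign_bound`, `softsign_grad`, `tanh_grad`, and `add_gray_residual` are illustrative. Any final clamp to [0, 1] is not modelled.

```python
import math


def softsign_bound(x: float, strength: float) -> float:
    """Map x into the open interval (-strength, strength)."""
    return strength * x / (1.0 + abs(x))


def softsign_grad(x: float) -> float:
    """Derivative of x / (1 + |x|): decays polynomially in |x|."""
    return 1.0 / (1.0 + abs(x)) ** 2


def tanh_grad(x: float) -> float:
    """Derivative of tanh(x): decays exponentially in |x|."""
    return 1.0 - math.tanh(x) ** 2


def add_gray_residual(rgb, raw_residual: float, strength: float):
    """Add one bounded residual value to all three channels of a pixel."""
    delta = softsign_bound(raw_residual, strength)
    return tuple(channel + delta for channel in rgb)


pixel = (0.2, 0.5, 0.7)
shifted = add_gray_residual(pixel, raw_residual=3.0, strength=0.02)
# Channel differences are unchanged, so only brightness moves, not hue.
print(shifted[1] - shifted[0], pixel[1] - pixel[0])
# A zero raw residual (the zero-init starting point) leaves the pixel untouched.
print(add_gray_residual(pixel, 0.0, 0.02) == pixel)
# For a large input, softsign keeps a far larger gradient than tanh.
print(softsign_grad(10.0), tanh_grad(10.0))
```

Because the same value is added to every channel, the differences between channels are preserved, which is the sense in which colour shifts are ruled out by construction.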
License
Apache 2.0