---
license: apache-2.0
tags:
- steganography
- watermarking
- image-watermarking
- pytorch
---

# PicoTrust — Robust Image Steganography

PicoTrust is a neural image steganography model that embeds hidden binary messages in images and recovers them after real-world distortions such as JPEG compression, blur, noise, and colour shifts.

It is based on StegaStamp (Tancik et al., CVPR 2020), with significant architectural improvements.

## Architecture

- **Encoder**: U-Net (512x512) with an E_post refinement head producing a 1-channel grayscale residual
- **Decoder**: StegaStamp-style CNN (256x256) with a compact STN for geometric alignment
- **Discriminator**: WGAN PatchGAN
- **Message**: 100 bits per image
- **Bounding**: softsign residual with strength annealing (1.0 → target over training)

## Checkpoints

| Model | File | Strength | PSNR | Bit Accuracy | JPEG Q10 |
|-------|------|----------|------|--------------|----------|
| **v4** | `v4/picotrust_v4_200k.pt` | 0.02 | **35.56 dB** | 97.8% | 97.6% |
| **v2** | `v2/picotrust_v2_200k.pt` | 0.03 | 32.82 dB | **98.4%** | **98.6%** |

- **v4**: best balance of visual quality and accuracy; +2.7 dB PSNR over v2 for a <1% drop in bit accuracy.
- **v2**: highest raw accuracy and JPEG robustness.

Both models guarantee zero colour shift by construction (the residual is grayscale) and stay above 92% bit accuracy across all tested distortion types.

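The zero-colour-shift guarantee can be verified directly: broadcasting a single-channel residual onto all three channels leaves the inter-channel differences, and hence the hue, untouched. A minimal sketch with dummy tensors (not the picode API):

```python
import torch

cover = torch.rand(1, 3, 64, 64)             # cover image in [0, 1]
residual = 0.02 * torch.randn(1, 1, 64, 64)  # 1-channel (grayscale) residual

# Broadcasting adds the *same* residual to R, G and B
encoded = cover + residual

# Inter-channel differences are unchanged, so there is no colour shift
assert torch.allclose(encoded[:, 0] - encoded[:, 1],
                      cover[:, 0] - cover[:, 1], atol=1e-5)
assert torch.allclose(encoded[:, 1] - encoded[:, 2],
                      cover[:, 1] - cover[:, 2], atol=1e-5)
```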
## Robustness (v4 @ 200k steps)

| Distortion | Bit Accuracy |
|------------|--------------|
| Clean | 97.8% |
| JPEG Q10 | 97.6% |
| Gaussian blur (σ=3) | 98.0% |
| Gaussian noise (σ=0.05) | 96.6% |
| Brightness ±0.3 | 94.6% |
| Contrast ±0.3 | 98.2% |

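The numbers above come from thresholding the decoder's per-bit probabilities at 0.5 and comparing against the ground-truth message. A self-contained sketch of that metric, plus one of the distortions (Gaussian noise), independent of the picode models (helper names are illustrative):

```python
import torch

def bit_accuracy(probs: torch.Tensor, message: torch.Tensor) -> float:
    """Fraction of message bits recovered after thresholding at 0.5."""
    bits = (probs > 0.5).float()
    return (bits == message).float().mean().item()

def gaussian_noise(img: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Additive Gaussian noise, clamped back to the valid [0, 1] range."""
    return (img + sigma * torch.randn_like(img)).clamp(0.0, 1.0)

message = torch.tensor([[1.0, 0.0, 1.0, 1.0]])
probs = torch.tensor([[0.9, 0.2, 0.6, 0.4]])  # last bit decoded wrongly
print(bit_accuracy(probs, message))           # → 0.75
```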
## Usage

```python
import torch
from picode.models.picotrust import Encoder, Decoder

# Load checkpoint
ckpt = torch.load("picotrust_v4_200k.pt", map_location="cpu")

# Reconstruct encoder/decoder
encoder = Encoder(message_length=100, image_size=512)
decoder = Decoder(message_length=100, image_size=256)

encoder.load_state_dict(ckpt["encoder"])
decoder.load_state_dict(ckpt["decoder"])
encoder.eval()
decoder.eval()

# Inputs: image (1, 3, 512, 512) in [0, 1], message (1, 100) binary
image = torch.rand(1, 3, 512, 512)
message = torch.randint(0, 2, (1, 100)).float()

with torch.no_grad():
    # Encode the message into the image
    encoded = encoder(image, message)

    # Decode: returns per-bit probabilities (1, 100) in [0, 1]
    decoded = decoder(encoded)

bits = (decoded > 0.5).float()
accuracy = (bits == message).float().mean()  # bit accuracy
```

## Training

Trained on COCO train2017 (~118K images) for 200K steps on a single NVIDIA T4 GPU.

Config: `configs/picotrust_v4.yaml` (v4) / `configs/picotrust_v2.yaml` (v2)

## Key Design Decisions

1. **Grayscale residual**: the 1-channel E_post output is broadcast to 3 channels, eliminating colour shifts by construction
2. **Softsign bounding**: `strength * x / (1 + |x|)` avoids the vanishing gradients of tanh
3. **Strength annealing**: training starts unbounded (strength 1.0) and anneals to the target, preventing bootstrap collapse
4. **MSE message loss**: unlike BCE, it has no trivial equilibrium for the decoder to collapse into
5. **Zero-init E_post**: the residual starts at exactly zero and grows gradually

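Decisions 2, 3 and 5 are straightforward to sketch: softsign bounds the residual in (-strength, strength) while keeping a nonzero gradient everywhere, the strength anneals from 1.0 down to the deployment value, and zero-initialising the residual head makes the residual start at exactly zero. A minimal sketch (the linear schedule and the layer shapes are assumptions, not the exact picode implementation):

```python
import torch
import torch.nn as nn

def softsign_bound(x: torch.Tensor, strength: float) -> torch.Tensor:
    """strength * x / (1 + |x|): output bounded in (-strength, strength)."""
    return strength * x / (1.0 + x.abs())

def annealed_strength(step: int, total_steps: int,
                      start: float = 1.0, target: float = 0.02) -> float:
    """Linear anneal from `start` to `target` over training (assumed schedule)."""
    t = min(step / total_steps, 1.0)
    return start + t * (target - start)

# Zero-init the final residual layer so the residual starts at exactly zero
head = nn.Conv2d(64, 1, kernel_size=1)
nn.init.zeros_(head.weight)
nn.init.zeros_(head.bias)

x = torch.randn(1, 64, 8, 8)
assert head(x).abs().max().item() == 0.0                        # starts at zero
assert softsign_bound(100 * torch.randn(10), 0.02).abs().max() < 0.02
```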
## License

Apache 2.0