---
license: apache-2.0
tags:
  - steganography
  - watermarking
  - image-watermarking
  - pytorch
---

# PicoTrust — Robust Image Steganography

PicoTrust is a neural image steganography model that encodes hidden binary messages into images, surviving real-world distortions like JPEG compression, blur, noise, and color shifts.

Based on StegaStamp (Tancik et al., CVPR 2020) with significant architectural improvements.

## Architecture

- **Encoder**: U-Net (512x512) + E_post refinement head producing a 1-channel grayscale residual
- **Decoder**: StegaStamp-style CNN (256x256) with a compact STN for geometric alignment
- **Discriminator**: WGAN PatchGAN
- **Message**: 100 bits per image
- **Bounding**: Softsign residual with strength annealing (1.0 → target over training)

## Checkpoints

| Model | File | Strength | PSNR | Bit Accuracy | JPEG Q10 |
|-------|------|----------|------|-------------|----------|
| **v4** | `v4/picotrust_v4_200k.pt` | 0.02 | **35.56 dB** | 97.8% | 97.6% |
| **v2** | `v2/picotrust_v2_200k.pt` | 0.03 | 32.82 dB | **98.4%** | **98.6%** |

- **v4**: Best balance of visual quality and accuracy: +2.7 dB PSNR over v2 for a <1% accuracy trade-off.
- **v2**: Highest raw accuracy and JPEG robustness.

Both models use the grayscale residual, so colour shifts are zero by construction, and both stay above 92% bit accuracy across all tested distortion types.

## Robustness (v4 @ 200k steps)

| Distortion | Bit Accuracy |
|------------|-------------|
| Clean | 97.8% |
| JPEG Q10 | 97.6% |
| Gaussian blur (σ=3) | 98.0% |
| Gaussian noise (σ=0.05) | 96.6% |
| Brightness ±0.3 | 94.6% |
| Contrast ±0.3 | 98.2% |
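The distortions above are standard image-space perturbations applied before decoding. As an illustration, the JPEG Q10 round-trip can be reproduced with Pillow (a minimal sketch; `jpeg_distort` is a hypothetical helper, not part of this repository):

```python
import io

import numpy as np
from PIL import Image


def jpeg_distort(img: np.ndarray, quality: int = 10) -> np.ndarray:
    """Round-trip a float image in [0, 1], shape (H, W, 3), through JPEG.

    Encodes to an in-memory JPEG at the given quality, then decodes back
    to a float array in [0, 1].
    """
    pil = Image.fromarray((img * 255.0).clip(0, 255).astype(np.uint8))
    buf = io.BytesIO()
    pil.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.float32) / 255.0
```

Bit accuracy under each distortion is then measured by decoding the distorted image and comparing the thresholded bits against the embedded message.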

## Usage

```python
import torch
from picode.models.picotrust import Encoder, Decoder

# Load checkpoint
ckpt = torch.load("picotrust_v4_200k.pt", map_location="cpu")

# Reconstruct encoder/decoder
encoder = Encoder(message_length=100, image_size=512)
decoder = Decoder(message_length=100, image_size=256)

encoder.load_state_dict(ckpt["encoder"])
decoder.load_state_dict(ckpt["decoder"])
encoder.eval()
decoder.eval()

# Inputs: image (1, 3, 512, 512) in [0, 1], message (1, 100) binary
image = torch.rand(1, 3, 512, 512)
message = torch.randint(0, 2, (1, 100)).float()

# Encode the message into the image
encoded = encoder(image, message)

# Decode: returns per-bit probabilities (1, 100) in [0, 1]
decoded = decoder(encoded)
bits = (decoded > 0.5).float()
```
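To verify a decode against the original message, bit accuracy is simply the fraction of thresholded bits that match (a small sketch; `bit_accuracy` is not part of the package API):

```python
import torch


def bit_accuracy(decoded_probs: torch.Tensor, message: torch.Tensor) -> float:
    """Fraction of bits recovered correctly.

    decoded_probs: per-bit probabilities in [0, 1], shape (B, L).
    message: the original binary message, shape (B, L).
    """
    bits = (decoded_probs > 0.5).float()  # threshold at 0.5
    return (bits == message).float().mean().item()
```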

## Training

Trained on COCO train2017 (~118K images) for 200K steps on a single NVIDIA T4 GPU.

Config: `configs/picotrust_v4.yaml` (v4) / `configs/picotrust_v2.yaml` (v2)

## Key Design Decisions

1. **Grayscale residual**: 1-channel E_post output broadcast to 3 channels — eliminates colour shifts by construction
2. **Softsign bounding**: `strength * x / (1 + |x|)` — non-vanishing gradients unlike tanh
3. **Strength annealing**: Start unbounded (1.0) → anneal to target — prevents bootstrap collapse
4. **MSE message loss**: No trivial equilibrium unlike BCE
5. **Zero-init E_post**: Residual starts at exactly zero, grows gradually
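Decisions 2 and 3 combine into the residual head's output transform. A minimal sketch of both, assuming a linear annealing schedule (the card only specifies the 1.0 → target endpoints, so the exact schedule is an assumption):

```python
import torch


def softsign_residual(x: torch.Tensor, strength: float) -> torch.Tensor:
    # Softsign bounds the residual to (-strength, strength); its gradient
    # decays only polynomially in |x|, unlike tanh's exponential decay.
    return strength * x / (1.0 + x.abs())


def annealed_strength(step: int, total_steps: int, target: float,
                      start: float = 1.0) -> float:
    # Hypothetical linear schedule from `start` (effectively unbounded)
    # down to `target`, clamped once training finishes.
    t = min(step / total_steps, 1.0)
    return start + (target - start) * t
```

Starting near 1.0 lets the encoder discover a useful residual before the amplitude budget tightens, which is what prevents the bootstrap collapse mentioned above.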

## License

Apache 2.0