---
license: mit
language:
- ru
- en
tags:
- reinforcement-learning
- ppo
- network
- privacy
- censorship-circumvention
- vless
- research
pipeline_tag: reinforcement-learning
library_name: pytorch
---
# AlphaBypass.3 🧠
<a href="https://ibb.co/B5d8JDdf"><img src="https://i.ibb.co/TxpzGXpw/logo.jpg" alt="logo"></a>
> *"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."*
## What is this?
AlphaBypass is a **PPO-based reinforcement learning agent** trained to automatically discover [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems operated by Roskomnadzor, Russia's internet censorship agency.
Instead of manually tuning parameters, a neural network figures it out by trial and error - against a real, live DPI system. It learns what combinations of transport, fingerprint, domain, and other parameters actually work.
**This is a research project** studying automated circumvention of network censorship through adversarial machine learning. Any resemblance to practical use is purely coincidental :)
---
## Model Details
| Property | Value |
|----------|-------|
| Architecture | MLP, 3×512 hidden layers with LayerNorm |
| Parameters | ~787K |
| Algorithm | PPO (Proximal Policy Optimization) |
| Action space | Mixed discrete + continuous |
| Observation space | 75-dimensional vector |
| Training episodes (basically VPNs tried) | ~1,100 |
| Target protocol | VLESS + REALITY (xray-core) |
| Success rate | **93%** |
| Avg reward | +0.81 (scale: −1.0 to +1.0) |
---
## Reward Function
```python
def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)  # ping, loss, connect time
    r += 0.35 * metrics.stability_ratio     # probe success rate
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r
```
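The snippet above calls two helpers, `connection_quality` and `log_speed_score`, that are not shown in this card. The sketch below fills them in with hypothetical implementations so the weighting can be run end to end. The `Metrics` fields other than `connected` and `stability_ratio`, the 1000 ms and 5000 ms cut-offs, and the log scaling are all assumptions made for illustration, not the project's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    # Only `connected` and `stability_ratio` appear in the card;
    # the remaining fields are hypothetical.
    connected: bool
    ping_ms: float = 0.0
    connect_ms: float = 0.0
    loss_ratio: float = 0.0
    stability_ratio: float = 0.0
    speed_mbps: float = 0.0


def connection_quality(m):
    """Hypothetical score in [0, 1] built from ping, connect time, and loss."""
    ping_score = max(0.0, 1.0 - m.ping_ms / 1000.0)
    connect_score = max(0.0, 1.0 - m.connect_ms / 5000.0)
    return 0.5 * (ping_score + connect_score) * (1.0 - m.loss_ratio)


def log_speed_score(m, baseline_mbps):
    """Hypothetical log-scaled throughput, capped at 1.0 at the baseline."""
    return min(1.0, math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))


def compute_reward(metrics, baseline_mbps=32.0):
    # Same weighting as in the card: 50% quality, 35% stability, 15% speed.
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r


blocked = Metrics(connected=False)
print(compute_reward(blocked))  # -1.0
```

With helpers bounded to [0, 1] like these, a connected configuration scores between 0 and 1, and only a failed connection yields −1.0.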
---
## Usage
Requires [xray-core](https://github.com/XTLS/Xray-core).
### Load and query the model
```python
import numpy as np
import torch

from agent import PolicyNetwork
from environment import decode_action

# Load the checkpoint on CPU. weights_only=False unpickles arbitrary
# Python objects, so only load checkpoints you trust.
policy = PolicyNetwork()
ck = torch.load("best.pt", map_location="cpu", weights_only=False)
policy.load_state_dict(ck["policy_state"])
policy.eval()

# All-zero observation: batch of 1, 75 features.
obs = torch.zeros(1, 75)
with torch.no_grad():
    logits, mu, _, _ = policy(obs)

# Greedy action: argmax of each discrete head plus the continuous head's mu.
discrete = np.array([l.argmax().item() for l in logits])
continuous = mu.squeeze().numpy()

config = decode_action(discrete, continuous)
print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
```
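`decode_action` itself is not shown in this card. As a rough illustration of how a mixed discrete and continuous action could be turned into configuration fields, here is a minimal sketch. The choice tables, the port range, the `ProxyConfig` class, and the function name `decode_action_sketch` are all hypothetical; the real mapping lives in `environment.py` and may look quite different.

```python
from dataclasses import dataclass

# Hypothetical choice tables, for illustration only.
TRANSPORTS = ["tcp", "grpc", "ws"]
FINGERPRINTS = ["chrome", "firefox", "safari"]


@dataclass
class ProxyConfig:
    transport_type: str
    fingerprint: str
    proxy_port: int


def squash(x, low, high):
    """Clamp x to [-1, 1], then rescale it linearly to [low, high]."""
    x = max(-1.0, min(1.0, float(x)))
    return low + (x + 1.0) * 0.5 * (high - low)


def decode_action_sketch(discrete, continuous):
    """Map discrete indices to table entries and a continuous value to a port."""
    return ProxyConfig(
        transport_type=TRANSPORTS[int(discrete[0]) % len(TRANSPORTS)],
        fingerprint=FINGERPRINTS[int(discrete[1]) % len(FINGERPRINTS)],
        proxy_port=int(round(squash(continuous[0], 1024, 65535))),
    )


cfg = decode_action_sketch([1, 2], [-1.0])
print(cfg.transport_type, cfg.fingerprint, cfg.proxy_port)  # grpc safari 1024
```

Clamping before rescaling keeps an out-of-range network output from producing an invalid port.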
### Server config example
```json
{
  "inbounds": [{
    "port": 443,
    "protocol": "vless",
    "settings": {
      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
      "decryption": "none"
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "dest": "YOUR-SNI-DOMAIN:443",
        "serverNames": ["YOUR-SNI-DOMAIN"],
        "privateKey": "YOUR-PRIVATE-KEY",
        "shortIds": ["YOUR-SHORT-ID"]
      }
    }
  }],
  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
}
```
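The `YOUR-UUID-HERE` and `YOUR-SHORT-ID` placeholders need real values before the config will work. A minimal sketch using only Python's standard library is shown below; the helper names are made up for this example. It assumes a version 4 UUID for the client ID and a hex short ID of at most 16 characters, which is the format REALITY expects. The private/public key pair is not covered here; it is normally generated with xray-core's own `xray x25519` command.

```python
import secrets
import uuid


def make_client_id():
    """Return a random version 4 UUID string for the VLESS client `id`."""
    return str(uuid.uuid4())


def make_short_id(num_bytes=8):
    """Return a random hex short ID (2 hex characters per byte, max 16)."""
    if not 0 <= num_bytes <= 8:
        raise ValueError("a short ID holds at most 8 bytes (16 hex characters)")
    return secrets.token_hex(num_bytes)


print(make_client_id())
print(make_short_id())
```

The same client ID and short ID must appear in both the server and the client config.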
### Client config example
```json
{
  "inbounds": [{
    "port": 10808,
    "protocol": "socks",
    "settings": {"auth": "noauth", "udp": true}
  }],
  "outbounds": [{
    "protocol": "vless",
    "settings": {
      "vnext": [{
        "address": "YOUR-SERVER-IP",
        "port": 443,
        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
      }]
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "fingerprint": "safari",
        "serverName": "YOUR-SNI-DOMAIN",
        "publicKey": "YOUR-PUBLIC-KEY",
        "shortId": "YOUR-SHORT-ID"
      }
    }
  }]
}
```
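When trying several parameter combinations, it can be easier to generate the client config than to edit it by hand. The sketch below builds the same structure as the example above from a handful of arguments. The function name `build_client_config` and its parameters are hypothetical and not part of this repository; the JSON keys are copied from the client example.

```python
import json


def build_client_config(server_ip, client_id, sni_domain, public_key,
                        short_id, service_name, fingerprint="safari",
                        server_port=443, socks_port=10808):
    """Return a client config dict mirroring the example in this card."""
    return {
        "inbounds": [{
            "port": socks_port,
            "protocol": "socks",
            "settings": {"auth": "noauth", "udp": True},
        }],
        "outbounds": [{
            "protocol": "vless",
            "settings": {
                "vnext": [{
                    "address": server_ip,
                    "port": server_port,
                    "users": [{"id": client_id, "encryption": "none"}],
                }],
            },
            "streamSettings": {
                "network": "grpc",
                "security": "reality",
                "grpcSettings": {"serviceName": service_name},
                "realitySettings": {
                    "fingerprint": fingerprint,
                    "serverName": sni_domain,
                    "publicKey": public_key,
                    "shortId": short_id,
                },
            },
        }],
    }


config = build_client_config(
    "YOUR-SERVER-IP", "YOUR-UUID-HERE", "YOUR-SNI-DOMAIN",
    "YOUR-PUBLIC-KEY", "YOUR-SHORT-ID", "YOUR-SERVICE-NAME",
)
print(json.dumps(config, indent=2))
```

Writing the result to a file with `json.dump` gives a config that can be passed to xray-core in place of the hand-written one.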
---
## Limitations
- DPI behavior varies by provider and region - results may differ.
- REALITY is fundamentally difficult to block without collateral damage. Some success may be protocol strength, not agent cleverness.
- The agent keeps no memory between deployments, so it is unaware of DPI updates that land after training (for example, overnight).
- 787K parameters is intentional. The problem doesn't need GPT-6.
---
## Citation
```bibtex
@misc{alphabypass2026,
title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
year = {2026},
url = {https://huggingface.co/NickupAI/alphabypass3}
}
```
---
## License
MIT. Use responsibly. Especially if you live somewhere VPNs are considered a thought crime.
*"It's not about hiding. It's about the right to reach the open internet."*