---
license: mit
language:
- ru
- en
tags:
- reinforcement-learning
- ppo
- network
- privacy
- censorship-circumvention
- vless
- research
pipeline_tag: reinforcement-learning
library_name: pytorch
---

# AlphaBypass.3 🧠
<a href="https://ibb.co/B5d8JDdf"><img src="https://i.ibb.co/TxpzGXpw/logo.jpg" alt="logo"></a>
> *"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."*

## What is this?

AlphaBypass is a **PPO-based reinforcement learning agent** trained to automatically discover optimal [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems of Roskomnadzor (the Russian internet censorship agency).

Instead of manually tuning parameters, a neural network figures it out by trial and error - against a real, live DPI system. It learns what combinations of transport, fingerprint, domain, and other parameters actually work.

**This is a research project** studying automated censorship circumvention through adversarial machine learning. Any resemblance to practical use is purely coincidental :)

---

## Model Details

| Property | Value |
|----------|-------|
| Architecture | MLP, 3×512 hidden layers with LayerNorm |
| Parameters | ~787K |
| Algorithm | PPO (Proximal Policy Optimization) |
| Action space | Mixed discrete + continuous |
| Observation space | 75-dimensional vector |
| Training episodes (basically VPNs tried) | ~1,100 |
| Target protocol | VLESS + REALITY (xray-core) |
| Success rate | **93%** |
| Avg reward | +0.81 (scale: −1.0 to +1.0) |
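The table translates into a fairly small network. Here is a numpy sketch of a forward pass with the stated shapes, just to make the mixed action space concrete; the head sizes, activations, and initialization are invented for illustration and will differ from the real checkpoint:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension, as torch.nn.LayerNorm does.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

class PolicySketch:
    """75-d observation -> 3 x 512 hidden (LayerNorm) ->
    one logit vector per discrete head + means for the continuous dims."""
    def __init__(self, obs_dim=75, hidden=512,
                 discrete_heads=(4, 8, 16), n_continuous=6):  # head sizes invented
        dims = [obs_dim, hidden, hidden, hidden]
        self.layers = [(rng.normal(0, 0.05, (i, o)), np.zeros(o))
                       for i, o in zip(dims, dims[1:])]
        self.heads = [(rng.normal(0, 0.05, (hidden, n)), np.zeros(n))
                      for n in discrete_heads]
        self.mu_head = (rng.normal(0, 0.05, (hidden, n_continuous)),
                        np.zeros(n_continuous))

    def forward(self, obs):
        h = obs
        for w, b in self.layers:
            h = np.tanh(layer_norm(h @ w + b))
        logits = [h @ w + b for w, b in self.heads]       # one per discrete choice
        mu = np.tanh(h @ self.mu_head[0] + self.mu_head[1])  # continuous means
        return logits, mu

policy = PolicySketch()
logits, mu = policy.forward(rng.normal(size=75))
discrete = [int(l.argmax()) for l in logits]
```

Each discrete head picks one categorical option (transport, fingerprint, ...), while `mu` parameterizes the continuous knobs.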

---

## Reward Function

```python
def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0

    r  = 0.50 * connection_quality(metrics)  # ping, loss, connect time
    r += 0.35 * metrics.stability_ratio      # probe success rate
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r
```
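The helpers `connection_quality` and `log_speed_score` aren't shown in this card. One plausible shape, consistent with the stated weights and the −1.0…+1.0 scale, looks like this; all thresholds (500 ms ping, 10 s connect, log-ratio scaling) are invented for illustration:

```python
import math
from dataclasses import dataclass

@dataclass
class Metrics:
    connected: bool
    ping_ms: float
    loss_pct: float
    connect_s: float
    stability_ratio: float
    mbps: float

def connection_quality(m):
    # Each component mapped to [0, 1]; cutoffs are illustrative.
    ping    = max(0.0, 1.0 - m.ping_ms / 500.0)
    loss    = max(0.0, 1.0 - m.loss_pct / 100.0)
    connect = max(0.0, 1.0 - m.connect_s / 10.0)
    return (ping + loss + connect) / 3.0

def log_speed_score(m, baseline_mbps):
    # Diminishing returns: log-ratio vs. baseline, clipped to [0, 1].
    if m.mbps <= 0:
        return 0.0
    return min(1.0, max(0.0, 0.5 + 0.5 * math.log2(m.mbps / baseline_mbps) / 5.0))

def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0
    r  = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r

good = compute_reward(Metrics(True, 50, 0.0, 1.0, 0.9, 32.0))  # ≈ 0.86
dead = compute_reward(Metrics(False, 0, 0, 0, 0, 0))           # -1.0
```

With all three helpers bounded in [0, 1], the maximum reward is exactly 0.50 + 0.35 + 0.15 = 1.0.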

---

## Usage

Requires [xray-core](https://github.com/XTLS/Xray-core).

### Load and query the model

```python
import torch
import numpy as np
from agent import PolicyNetwork
from environment import decode_action

policy = PolicyNetwork()
# weights_only=False: the checkpoint also stores training state, not just tensors
ck = torch.load("best.pt", map_location="cpu", weights_only=False)
policy.load_state_dict(ck["policy_state"])
policy.eval()

obs = torch.zeros(1, 75)  # replace with a real 75-dim observation
with torch.no_grad():
    logits, mu, _, _ = policy(obs)

# greedy decode: argmax per discrete head, distribution means for continuous
discrete   = np.array([l.argmax().item() for l in logits])
continuous = mu.squeeze().numpy()
config     = decode_action(discrete, continuous)

print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
```
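The snippet above decodes greedily. During PPO training the agent samples stochastically, and sampling is also how you'd explore configurations beyond the single best guess. A minimal numpy sketch (the temperature and log-std values are assumptions, not read from the checkpoint):

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_action(logits, mu, log_std=-1.0, temperature=1.0):
    """Sample a mixed discrete + continuous action.
    logits: list of 1-D arrays, one per categorical head
    mu:     1-D array of continuous means"""
    discrete = np.array([rng.choice(len(l), p=softmax(l / temperature))
                         for l in logits])
    # Gaussian exploration noise around the predicted means.
    continuous = mu + np.exp(log_std) * rng.standard_normal(mu.shape)
    return discrete, continuous

d, c = sample_action([np.zeros(4), np.zeros(8)], np.zeros(6))
```

Raising the temperature flattens the categorical distributions and widens the search; `temperature → 0` recovers the greedy argmax.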

### Server config example

```json
{
  "inbounds": [{
    "port": 443,
    "protocol": "vless",
    "settings": {
      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
      "decryption": "none"
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "dest": "YOUR-SNI-DOMAIN:443",
        "serverNames": ["YOUR-SNI-DOMAIN"],
        "privateKey": "YOUR-PRIVATE-KEY",
        "shortIds": ["YOUR-SHORT-ID"]
      }
    }
  }],
  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
}
```
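The placeholders need real values. The client UUID and the REALITY shortId can come straight from the Python standard library (the REALITY key pair itself is generated with `xray x25519`); the 8-hex-char shortId length here is one common choice, not a requirement:

```python
import secrets
import uuid

client_id = str(uuid.uuid4())     # goes into "clients"[0]["id"] and "users"[0]["id"]
short_id  = secrets.token_hex(4)  # REALITY shortId: hex string, e.g. 8 chars
print(client_id, short_id)
```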

### Client config example

```json
{
  "inbounds": [{
    "port": 10808,
    "protocol": "socks",
    "settings": {"auth": "noauth", "udp": true}
  }],
  "outbounds": [{
    "protocol": "vless",
    "settings": {
      "vnext": [{
        "address": "YOUR-SERVER-IP",
        "port": 443,
        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
      }]
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "fingerprint": "safari",
        "serverName": "YOUR-SNI-DOMAIN",
        "publicKey": "YOUR-PUBLIC-KEY",
        "shortId": "YOUR-SHORT-ID"
      }
    }
  }]
}
```
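To wire the server and client configs together without hand-editing, simple placeholder substitution over the templates above works; the file handling is omitted here and the address is an example (TEST-NET-3):

```python
import json

# Stand-in for the client template above; in practice, read it from a file.
template = """
{
  "outbounds": [{
    "protocol": "vless",
    "settings": {"vnext": [{"address": "YOUR-SERVER-IP", "port": 443}]}
  }]
}
"""

values = {"YOUR-SERVER-IP": "203.0.113.10"}

filled = template
for placeholder, value in values.items():
    filled = filled.replace(placeholder, value)

config = json.loads(filled)  # fails fast if the result is not valid JSON
```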

---

## Limitations

- DPI behavior varies by provider and region - results may differ.
- REALITY is fundamentally difficult to block without collateral damage. Some success may be protocol strength, not agent cleverness.
- No memory between deployments - unaware of overnight DPI updates.
- 787K parameters is intentional. The problem doesn't need GPT-6.

---

## Citation

```bibtex
@misc{alphabypass2026,
  title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
  year  = {2026},
  url   = {https://huggingface.co/NickupAI/alphabypass3}
}
```

---

## License

MIT. Use responsibly. Especially if you live somewhere where VPNs are considered a thought crime.

*"It's not about hiding. It's about the right to reach the open internet."*