---
license: mit
language:
- ru
- en
tags:
- reinforcement-learning
- ppo
- network
- privacy
- censorship-circumvention
- vless
- research
pipeline_tag: reinforcement-learning
library_name: pytorch
---

# AlphaBypass.3 🧠
<a href="https://ibb.co/B5d8JDdf"><img src="https://i.ibb.co/TxpzGXpw/logo.jpg" alt="logo"></a>

> *"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."*
|
|
| ## What is this? |
|
|
AlphaBypass is a **PPO-based reinforcement learning agent** trained to automatically discover [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems operated by Roskomnadzor, Russia's internet censorship agency.
|
|
| Instead of manually tuning parameters, a neural network figures it out by trial and error - against a real, live DPI system. It learns what combinations of transport, fingerprint, domain, and other parameters actually work. |
|
|
| **This is a research project** studying automated network censorship through adversarial machine learning. Any resemblance to practical use is purely coincidental :) |
|
|
| --- |
|
|
| ## Model Details |
|
|
| | Property | Value | |
| |----------|-------| |
| | Architecture | MLP, 3×512 hidden layers with LayerNorm | |
| | Parameters | ~787K | |
| | Algorithm | PPO (Proximal Policy Optimization) | |
| | Action space | Mixed discrete + continuous | |
| | Observation space | 75-dimensional vector | |
| Training episodes (one proxy configuration tried per episode) | ~1,100 |
| | Target protocol | VLESS + REALITY (xray-core) | |
| | Success rate | **93%** | |
| | Avg reward | +0.81 (scale: −1.0 to +1.0) | |
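
As a rough sanity check on the parameter count, the sketch below tallies the weights of a plain MLP trunk with a 75-dimensional input and three 512-unit hidden layers, each followed by LayerNorm. The exact layer layout is an assumption (the table does not spell it out), and the action and value heads are left out because their sizes are not given here.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def layernorm_params(n: int) -> int:
    """Learnable scale and shift of a LayerNorm over n features."""
    return 2 * n


def trunk_params(obs_dim: int = 75, hidden: int = 512, layers: int = 3) -> int:
    """Parameter count of an assumed Linear -> LayerNorm stack (heads excluded)."""
    total = 0
    n_in = obs_dim
    for _ in range(layers):
        total += linear_params(n_in, hidden) + layernorm_params(hidden)
        n_in = hidden
    return total


print(trunk_params())  # 567296
```

Under these assumptions the trunk holds roughly 567K parameters, which would leave on the order of 220K of the reported ~787K for the discrete, continuous, and value heads.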
|
|
| --- |
|
|
| ## Reward Function |
|
|
```python
def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0

    r = 0.50 * connection_quality(metrics)  # ping, loss, connect time
    r += 0.35 * metrics.stability_ratio     # probe success rate
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r
```
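
The snippet above relies on `connection_quality` and `log_speed_score`, which are not shown. A minimal runnable sketch follows, with hypothetical definitions for both helpers: the `Metrics` fields other than `connected` and `stability_ratio`, the 500 ms and 5 s ceilings, and the log scaling are all assumptions made for illustration, not the project's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    # Only `connected` and `stability_ratio` come from the snippet above;
    # the remaining field names are hypothetical.
    connected: bool
    stability_ratio: float = 0.0  # fraction of probes that succeeded, 0..1
    ping_ms: float = 0.0
    loss_ratio: float = 0.0       # packet loss, 0..1
    connect_time_s: float = 0.0
    speed_mbps: float = 0.0


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def connection_quality(m: Metrics) -> float:
    """Hypothetical: mean of three 0..1 scores (lower ping/loss/time is better)."""
    ping = _clamp01(1.0 - m.ping_ms / 500.0)          # assumed 500 ms ceiling
    loss = _clamp01(1.0 - m.loss_ratio)
    connect = _clamp01(1.0 - m.connect_time_s / 5.0)  # assumed 5 s ceiling
    return (ping + loss + connect) / 3.0


def log_speed_score(m: Metrics, baseline_mbps: float) -> float:
    """Hypothetical: log-scaled throughput that saturates at the baseline speed."""
    return _clamp01(math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))


def compute_reward(metrics: Metrics, baseline_mbps: float = 32.0) -> float:
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r


good = Metrics(connected=True, stability_ratio=1.0, speed_mbps=32.0)
print(compute_reward(good))
print(compute_reward(Metrics(connected=False)))  # -1.0
```

In this sketch every component is clamped to [0, 1], so a successful connection scores between 0 and 1 and any failed connection scores -1.0.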
|
|
| --- |
|
|
| ## Usage |
|
|
| Requires [xray-core](https://github.com/XTLS/Xray-core). |
|
|
| ### Load and query the model |
|
|
```python
import torch
import numpy as np
from agent import PolicyNetwork
from environment import decode_action

# Load the trained policy. weights_only=False unpickles arbitrary objects,
# so only load checkpoints from a source you trust.
policy = PolicyNetwork()
ck = torch.load("best.pt", map_location="cpu", weights_only=False)
policy.load_state_dict(ck["policy_state"])
policy.eval()

# Placeholder observation: a batch of one 75-dimensional vector.
obs = torch.zeros(1, 75)
with torch.no_grad():
    logits, mu, _, _ = policy(obs)

# Greedy action: argmax over each discrete head, plus the continuous outputs.
discrete = np.array([l.argmax().item() for l in logits])
continuous = mu.squeeze().numpy()
config = decode_action(discrete, continuous)

print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
```
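
The `decode_action` helper comes from the project's `environment` module and is not shown here. As a rough illustration of how a mixed discrete + continuous action could be turned into a configuration, here is a minimal sketch. The option tables, the port range, the `ProxyConfig` class, and the name `decode_action_sketch` are all hypothetical and do not describe the project's real action space.

```python
import math
from dataclasses import dataclass

# Illustrative option tables; the real action space may differ.
TRANSPORTS = ["tcp", "grpc"]
FINGERPRINTS = ["chrome", "firefox", "safari"]
DOMAINS = ["example.com", "example.org"]


@dataclass
class ProxyConfig:
    transport_type: str
    fingerprint: str
    dest_domain: str
    proxy_port: int


def squash(x: float, low: float, high: float) -> float:
    """Map an unbounded policy output into [low, high] via tanh."""
    return low + (math.tanh(x) + 1.0) / 2.0 * (high - low)


def decode_action_sketch(discrete, continuous) -> ProxyConfig:
    """Pick one option per discrete head and rescale the continuous output."""
    t, f, d = (int(i) for i in discrete)
    port = int(round(squash(float(continuous[0]), 1024, 65535)))
    return ProxyConfig(TRANSPORTS[t], FINGERPRINTS[f], DOMAINS[d], port)


cfg = decode_action_sketch([1, 2, 0], [0.0])
print(f"{cfg.transport_type}:{cfg.proxy_port} -> {cfg.dest_domain}")
print(f"fingerprint={cfg.fingerprint}")
```

Squashing with `tanh` keeps the continuous parameter inside a valid range no matter how large the raw network output is, which is one common way to handle bounded continuous actions.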
|
|
| ### Server config example |
|
|
```json
{
  "inbounds": [{
    "port": 443,
    "protocol": "vless",
    "settings": {
      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
      "decryption": "none"
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "dest": "YOUR-SNI-DOMAIN:443",
        "serverNames": ["YOUR-SNI-DOMAIN"],
        "privateKey": "YOUR-PRIVATE-KEY",
        "shortIds": ["YOUR-SHORT-ID"]
      }
    }
  }],
  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
}
```
|
|
| ### Client config example |
|
|
```json
{
  "inbounds": [{
    "port": 10808,
    "protocol": "socks",
    "settings": {"auth": "noauth", "udp": true}
  }],
  "outbounds": [{
    "protocol": "vless",
    "settings": {
      "vnext": [{
        "address": "YOUR-SERVER-IP",
        "port": 443,
        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
      }]
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "fingerprint": "safari",
        "serverName": "YOUR-SNI-DOMAIN",
        "publicKey": "YOUR-PUBLIC-KEY",
        "shortId": "YOUR-SHORT-ID"
      }
    }
  }]
}
```
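
The server and client configs have to agree on several fields (port, UUID, transport, gRPC service name, SNI, short ID), and a mismatch in any of them is a common reason a connection never comes up. A minimal sketch of a consistency check is below. The function name `check_pair` is hypothetical, the key paths mirror the two examples above, and it does not verify that `publicKey` actually corresponds to `privateKey`.

```python
import copy


def check_pair(server: dict, client: dict) -> list:
    """Return a list of mismatches between a server and a client config."""
    problems = []
    inbound = server["inbounds"][0]
    outbound = client["outbounds"][0]
    s_stream, c_stream = inbound["streamSettings"], outbound["streamSettings"]
    s_real, c_real = s_stream["realitySettings"], c_stream["realitySettings"]
    vnext = outbound["settings"]["vnext"][0]

    if vnext["port"] != inbound["port"]:
        problems.append("port mismatch")
    server_ids = {c["id"] for c in inbound["settings"]["clients"]}
    if vnext["users"][0]["id"] not in server_ids:
        problems.append("client UUID not registered on server")
    if s_stream["network"] != c_stream["network"]:
        problems.append("transport mismatch")
    if s_stream.get("grpcSettings") != c_stream.get("grpcSettings"):
        problems.append("grpc serviceName mismatch")
    if c_real["serverName"] not in s_real["serverNames"]:
        problems.append("serverName not in serverNames")
    if c_real["shortId"] not in s_real["shortIds"]:
        problems.append("shortId not in shortIds")
    return problems


# Trimmed stand-ins for the two configs above (placeholder values only).
server = {"inbounds": [{
    "port": 443,
    "settings": {"clients": [{"id": "YOUR-UUID-HERE"}]},
    "streamSettings": {
        "network": "grpc",
        "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
        "realitySettings": {"serverNames": ["YOUR-SNI-DOMAIN"],
                            "shortIds": ["YOUR-SHORT-ID"]},
    },
}]}
client = {"outbounds": [{
    "settings": {"vnext": [{"port": 443,
                            "users": [{"id": "YOUR-UUID-HERE"}]}]},
    "streamSettings": {
        "network": "grpc",
        "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
        "realitySettings": {"serverName": "YOUR-SNI-DOMAIN",
                            "shortId": "YOUR-SHORT-ID"},
    },
}]}

print(check_pair(server, client))  # []

# Introduce a deliberate mismatch to show what a failure looks like.
broken = copy.deepcopy(client)
broken["outbounds"][0]["streamSettings"]["realitySettings"]["shortId"] = "OTHER-ID"
print(check_pair(server, broken))  # ['shortId not in shortIds']
```

In practice the two dictionaries would come from `json.load` on the real config files rather than being written inline.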
|
|
| --- |
|
|
| ## Limitations |
|
|
| - DPI behavior varies by provider and region - results may differ. |
- REALITY is fundamentally difficult to block without collateral damage. Some of the measured success may reflect protocol strength rather than agent cleverness.
- The agent keeps no memory between deployments, so it is unaware of DPI changes made after training, including overnight updates.
| - 787K parameters is intentional. The problem doesn't need GPT-6. |
|
|
| --- |
|
|
| ## Citation |
|
|
```bibtex
@misc{alphabypass2026,
  title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
  year  = {2026},
  url   = {https://huggingface.co/NickupAI/alphabypass3}
}
```
|
|
| --- |
|
|
| ## License |
|
|
MIT. Use responsibly. Especially if you live somewhere VPNs are considered a thought crime.
|
|
| *"It's not about hiding. It's about the right to reach the open internet."* |