---
license: mit
language:
- ru
- en
tags:
- reinforcement-learning
- ppo
- network
- privacy
- censorship-circumvention
- vless
- research
pipeline_tag: reinforcement-learning
library_name: pytorch
---

# AlphaBypass.3 🧠
<a href="https://ibb.co/B5d8JDdf"><img src="https://i.ibb.co/TxpzGXpw/logo.jpg" alt="logo"></a>
> *"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."*

## What is this?

AlphaBypass is a **PPO-based reinforcement learning agent** trained to automatically discover optimal [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems of Roskomnadzor (the Russian internet censorship agency).

Instead of manually tuning parameters, a neural network figures it out by trial and error - against a real, live DPI system. It learns what combinations of transport, fingerprint, domain, and other parameters actually work.

**This is a research project** studying automated censorship circumvention through adversarial machine learning. Any resemblance to practical use is purely coincidental :)

---

## Model Details

| Property | Value |
|----------|-------|
| Architecture | MLP, 3×512 hidden layers with LayerNorm |
| Parameters | ~787K |
| Algorithm | PPO (Proximal Policy Optimization) |
| Action space | Mixed discrete + continuous |
| Observation space | 75-dimensional vector |
| Training episodes (basically VPNs tried) | ~1,100 |
| Target protocol | VLESS + REALITY (xray-core) |
| Success rate | **93%** |
| Avg reward | +0.81 (scale: −1.0 to +1.0) |
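The table translates into a fairly small network. Here is a numpy sketch of a forward pass with the stated shapes, just to make the mixed action space concrete; the head sizes, activations, and initialization are invented for illustration and will differ from the real checkpoint:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension, as torch.nn.LayerNorm does.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

class PolicySketch:
    """75-d observation -> 3 x 512 hidden (LayerNorm) ->
    one logit vector per discrete head + means for the continuous dims."""
    def __init__(self, obs_dim=75, hidden=512,
                 discrete_heads=(4, 8, 16), n_continuous=6):  # head sizes invented
        dims = [obs_dim, hidden, hidden, hidden]
        self.layers = [(rng.normal(0, 0.05, (i, o)), np.zeros(o))
                       for i, o in zip(dims, dims[1:])]
        self.heads = [(rng.normal(0, 0.05, (hidden, n)), np.zeros(n))
                      for n in discrete_heads]
        self.mu_head = (rng.normal(0, 0.05, (hidden, n_continuous)),
                        np.zeros(n_continuous))

    def forward(self, obs):
        h = obs
        for w, b in self.layers:
            h = np.tanh(layer_norm(h @ w + b))
        logits = [h @ w + b for w, b in self.heads]       # one per discrete choice
        mu = np.tanh(h @ self.mu_head[0] + self.mu_head[1])  # continuous means
        return logits, mu

policy = PolicySketch()
logits, mu = policy.forward(rng.normal(size=75))
discrete = [int(l.argmax()) for l in logits]
```

Each discrete head picks one categorical option (transport, fingerprint, ...), while `mu` parameterizes the continuous knobs.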

---

## Reward Function

```python
def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0

    r  = 0.50 * connection_quality(metrics)  # ping, loss, connect time
    r += 0.35 * metrics.stability_ratio      # probe success rate
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r
```
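The helpers `connection_quality` and `log_speed_score` aren't shown in this card. One plausible shape, consistent with the stated weights and the −1.0…+1.0 scale, looks like this; all thresholds (500 ms ping, 10 s connect, log-ratio scaling) are invented for illustration:

```python
import math
from dataclasses import dataclass

@dataclass
class Metrics:
    connected: bool
    ping_ms: float
    loss_pct: float
    connect_s: float
    stability_ratio: float
    mbps: float

def connection_quality(m):
    # Each component mapped to [0, 1]; cutoffs are illustrative.
    ping    = max(0.0, 1.0 - m.ping_ms / 500.0)
    loss    = max(0.0, 1.0 - m.loss_pct / 100.0)
    connect = max(0.0, 1.0 - m.connect_s / 10.0)
    return (ping + loss + connect) / 3.0

def log_speed_score(m, baseline_mbps):
    # Diminishing returns: log-ratio vs. baseline, clipped to [0, 1].
    if m.mbps <= 0:
        return 0.0
    return min(1.0, max(0.0, 0.5 + 0.5 * math.log2(m.mbps / baseline_mbps) / 5.0))

def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0
    r  = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r

good = compute_reward(Metrics(True, 50, 0.0, 1.0, 0.9, 32.0))  # ≈ 0.86
dead = compute_reward(Metrics(False, 0, 0, 0, 0, 0))           # -1.0
```

With all three helpers bounded in [0, 1], the maximum reward is exactly 0.50 + 0.35 + 0.15 = 1.0.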

---

## Usage

Requires [xray-core](https://github.com/XTLS/Xray-core).

### Load and query the model

```python
import torch
import numpy as np
from agent import PolicyNetwork
from environment import decode_action

policy = PolicyNetwork()
# weights_only=False: the checkpoint also stores training state, not just tensors
ck = torch.load("best.pt", map_location="cpu", weights_only=False)
policy.load_state_dict(ck["policy_state"])
policy.eval()

obs = torch.zeros(1, 75)  # replace with a real 75-dim observation
with torch.no_grad():
    logits, mu, _, _ = policy(obs)

# greedy decode: argmax per discrete head, distribution means for continuous
discrete   = np.array([l.argmax().item() for l in logits])
continuous = mu.squeeze().numpy()
config     = decode_action(discrete, continuous)

print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
```
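The snippet above decodes greedily. During PPO training the agent samples stochastically, and sampling is also how you'd explore configurations beyond the single best guess. A minimal numpy sketch (the temperature and log-std values are assumptions, not read from the checkpoint):

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_action(logits, mu, log_std=-1.0, temperature=1.0):
    """Sample a mixed discrete + continuous action.
    logits: list of 1-D arrays, one per categorical head
    mu:     1-D array of continuous means"""
    discrete = np.array([rng.choice(len(l), p=softmax(l / temperature))
                         for l in logits])
    # Gaussian exploration noise around the predicted means.
    continuous = mu + np.exp(log_std) * rng.standard_normal(mu.shape)
    return discrete, continuous

d, c = sample_action([np.zeros(4), np.zeros(8)], np.zeros(6))
```

Raising the temperature flattens the categorical distributions and widens the search; `temperature → 0` recovers the greedy argmax.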

### Server config example

```json
{
  "inbounds": [{
    "port": 443,
    "protocol": "vless",
    "settings": {
      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
      "decryption": "none"
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "dest": "YOUR-SNI-DOMAIN:443",
        "serverNames": ["YOUR-SNI-DOMAIN"],
        "privateKey": "YOUR-PRIVATE-KEY",
        "shortIds": ["YOUR-SHORT-ID"]
      }
    }
  }],
  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
}
```
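The placeholders need real values. The client UUID and the REALITY shortId can come straight from the Python standard library (the REALITY key pair itself is generated with `xray x25519`); the 8-hex-char shortId length here is one common choice, not a requirement:

```python
import secrets
import uuid

client_id = str(uuid.uuid4())     # goes into "clients"[0]["id"] and "users"[0]["id"]
short_id  = secrets.token_hex(4)  # REALITY shortId: hex string, e.g. 8 chars
print(client_id, short_id)
```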

### Client config example

```json
{
  "inbounds": [{
    "port": 10808,
    "protocol": "socks",
    "settings": {"auth": "noauth", "udp": true}
  }],
  "outbounds": [{
    "protocol": "vless",
    "settings": {
      "vnext": [{
        "address": "YOUR-SERVER-IP",
        "port": 443,
        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
      }]
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "fingerprint": "safari",
        "serverName": "YOUR-SNI-DOMAIN",
        "publicKey": "YOUR-PUBLIC-KEY",
        "shortId": "YOUR-SHORT-ID"
      }
    }
  }]
}
```
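To wire the server and client configs together without hand-editing, simple placeholder substitution over the templates above works; the file handling is omitted here and the address is an example (TEST-NET-3):

```python
import json

# Stand-in for the client template above; in practice, read it from a file.
template = """
{
  "outbounds": [{
    "protocol": "vless",
    "settings": {"vnext": [{"address": "YOUR-SERVER-IP", "port": 443}]}
  }]
}
"""

values = {"YOUR-SERVER-IP": "203.0.113.10"}

filled = template
for placeholder, value in values.items():
    filled = filled.replace(placeholder, value)

config = json.loads(filled)  # fails fast if the result is not valid JSON
```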

---

## Limitations

- DPI behavior varies by provider and region - results may differ.
- REALITY is fundamentally difficult to block without collateral damage. Some success may be protocol strength, not agent cleverness.
- No memory between deployments - unaware of overnight DPI updates.
- 787K parameters is intentional. The problem doesn't need GPT-6.

---

## Citation

```bibtex
@misc{alphabypass2026,
  title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
  year  = {2026},
  url   = {https://huggingface.co/NickupAI/alphabypass3}
}
```

---

## License

MIT. Use responsibly. Especially if you live somewhere where VPNs are considered a thought crime.

*"It's not about hiding. It's about the right to reach the open internet."*