---
license: mit
language:
- ru
- en
tags:
- reinforcement-learning
- ppo
- network
- privacy
- censorship-circumvention
- vless
- research
pipeline_tag: reinforcement-learning
library_name: pytorch
---

# AlphaBypass.3 🧠

> *"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."*

## What is this?

AlphaBypass is a **PPO-based reinforcement learning agent** trained to automatically discover effective [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems of Roskomnadzor, Russia's internet censorship agency.

Instead of parameters being tuned by hand, a neural network finds them by trial and error against a real, live DPI system. It learns which combinations of transport, TLS fingerprint, SNI domain, and other parameters actually work.

**This is a research project** studying automated circumvention of network censorship through adversarial machine learning. Any resemblance to practical use is purely coincidental :)

---

## Model Details

| Property | Value |
|----------|-------|
| Architecture | MLP, 3×512 hidden layers with LayerNorm |
| Parameters | ~787K |
| Algorithm | PPO (Proximal Policy Optimization) |
| Action space | Mixed discrete + continuous |
| Observation space | 75-dimensional vector |
| Training episodes (basically, VPN configs tried) | ~1,100 |
| Target protocol | VLESS + REALITY (xray-core) |
| Success rate | **93%** |
| Average reward | +0.81 (scale: −1.0 to +1.0) |

---

## Reward Function

A failed connection scores −1.0. For a successful one the weights sum to 1.0, so the reward lands between 0 and +1.0 provided each term is normalized to [0, 1].

```python
def compute_reward(metrics, baseline_mbps=32.0):
    # A failed connection gets the minimum reward.
    if not metrics.connected:
        return -1.0
    # connection_quality and log_speed_score are project helpers not shown here.
    r = 0.50 * connection_quality(metrics)               # ping, loss, connect time
    r += 0.35 * metrics.stability_ratio                  # probe success rate
    r += 0.15 * log_speed_score(metrics, baseline_mbps)  # throughput vs. baseline
    return r
```

---

## Usage

Requires [xray-core](https://github.com/XTLS/Xray-core).
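Once a configuration is running, you may want to score your own measurements on the same −1.0 to +1.0 scale as the reward function above. The sketch below is self-contained, but only the 0.50 / 0.35 / 0.15 weights, the −1.0 disconnect penalty, and the 32 Mbps default baseline come from this card. The `Metrics` fields, the bodies of `connection_quality` and `log_speed_score`, and their thresholds (500 ms ping, 5 s connect time, logarithmic speed scaling) are hypothetical stand-ins, not the project's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    # Hypothetical fields; the project's own metrics object may differ.
    connected: bool
    ping_ms: float = 0.0
    loss: float = 0.0             # packet-loss fraction in [0, 1]
    connect_time_s: float = 0.0
    stability_ratio: float = 0.0  # fraction of probes that succeeded, in [0, 1]
    speed_mbps: float = 0.0


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def connection_quality(m: Metrics) -> float:
    # Hypothetical: each term falls from 1.0 toward 0.0 as ping, loss,
    # and connect time grow; the result is their average.
    ping_term = _clamp01(1.0 - m.ping_ms / 500.0)
    loss_term = _clamp01(1.0 - m.loss)
    connect_term = _clamp01(1.0 - m.connect_time_s / 5.0)
    return (ping_term + loss_term + connect_term) / 3.0


def log_speed_score(m: Metrics, baseline_mbps: float) -> float:
    # Hypothetical: log-scaled speed relative to the baseline,
    # saturating at 1.0 once the baseline is reached.
    if m.speed_mbps <= 0:
        return 0.0
    return _clamp01(math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))


def compute_reward(m: Metrics, baseline_mbps: float = 32.0) -> float:
    if not m.connected:
        return -1.0
    r = 0.50 * connection_quality(m)
    r += 0.35 * _clamp01(m.stability_ratio)
    r += 0.15 * log_speed_score(m, baseline_mbps)
    return r
```

Clamping every term to [0, 1] is what keeps a connected run inside [0, +1.0]; without it, a very fast link could push the speed term above its 0.15 share.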
### Load and query the model

```python
import numpy as np
import torch

from agent import PolicyNetwork       # project modules
from environment import decode_action

policy = PolicyNetwork()
# weights_only=False is needed when the checkpoint stores non-tensor objects;
# it unpickles arbitrary Python, so only load checkpoints you trust.
ck = torch.load("best.pt", map_location="cpu", weights_only=False)
policy.load_state_dict(ck["policy_state"])
policy.eval()

# Placeholder all-zeros observation; in practice, pass the 75-dim environment state.
obs = torch.zeros(1, 75)
with torch.no_grad():
    logits, mu, _, _ = policy(obs)

# Greedy action: argmax of each discrete head, mean of the continuous head.
discrete = np.array([l.argmax().item() for l in logits])
continuous = mu.squeeze().numpy()

config = decode_action(discrete, continuous)
print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
```

### Server config example

```json
{
  "inbounds": [{
    "port": 443,
    "protocol": "vless",
    "settings": {
      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
      "decryption": "none"
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "dest": "YOUR-SNI-DOMAIN:443",
        "serverNames": ["YOUR-SNI-DOMAIN"],
        "privateKey": "YOUR-PRIVATE-KEY",
        "shortIds": ["YOUR-SHORT-ID"]
      }
    }
  }],
  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
}
```

### Client config example

The UUID, gRPC service name, SNI domain, and short ID must match the server; the client's `publicKey` is the counterpart of the server's `privateKey`.

```json
{
  "inbounds": [{
    "port": 10808,
    "protocol": "socks",
    "settings": {"auth": "noauth", "udp": true}
  }],
  "outbounds": [{
    "protocol": "vless",
    "settings": {
      "vnext": [{
        "address": "YOUR-SERVER-IP",
        "port": 443,
        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
      }]
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "fingerprint": "safari",
        "serverName": "YOUR-SNI-DOMAIN",
        "publicKey": "YOUR-PUBLIC-KEY",
        "shortId": "YOUR-SHORT-ID"
      }
    }
  }]
}
```

---

## Limitations

- DPI behavior varies by provider and region, so results may differ.
- REALITY is fundamentally difficult to block without collateral damage. Some of the success may come from the protocol's strength rather than the agent's cleverness.
- No memory between deployments: the agent is unaware of overnight DPI updates.
- The small size (~787K parameters) is intentional. The problem doesn't need GPT-6.

---

## Citation

```bibtex
@misc{alphabypass2026,
  title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
  year  = {2026},
  url   = {https://huggingface.co/NickupAI/alphabypass3}
}
```

---

## License

MIT. Use responsibly, especially if you live somewhere VPNs are considered a thought crime.

*"It's not about hiding. It's about the right to reach the open internet."*