	---
	license: mit
	language:
	- ru
	- en
	tags:
	- reinforcement-learning
	- ppo
	- network
	- privacy
	- censorship-circumvention
	- vless
	- research
	pipeline_tag: reinforcement-learning
	library_name: pytorch
	---

	# AlphaBypass.3 🧠
	<a href="https://ibb.co/B5d8JDdf"><img src="https://i.ibb.co/TxpzGXpw/logo.jpg" alt="logo"></a>
	> "The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."

	## What is this?

	AlphaBypass is a PPO-based reinforcement learning agent trained to automatically discover effective [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems operated by Roskomnadzor, Russia's internet censorship agency.

	Instead of manually tuning parameters, a neural network figures it out by trial and error - against a real, live DPI system. It learns what combinations of transport, fingerprint, domain, and other parameters actually work.

	This is a research project studying automated network censorship through adversarial machine learning. Any resemblance to practical use is purely coincidental :)

	---

	## Model Details

	| Property | Value |
	|----------|-------|
	| Architecture | MLP, 3×512 hidden layers with LayerNorm |
	| Parameters | ~787K |
	| Algorithm | PPO (Proximal Policy Optimization) |
	| Action space | Mixed discrete + continuous |
	| Observation space | 75-dimensional vector |
	| Training episodes (basically VPNs tried) | ~1,100 |
	| Target protocol | VLESS + REALITY (xray-core) |
	| Success rate | 93% |
	| Avg reward | +0.81 (scale: −1.0 to +1.0) |
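As a rough sanity check on the parameter count, the sketch below counts only the shared trunk implied by the table: three 512-wide linear layers, each followed by LayerNorm, on a 75-dimensional input. It assumes a bias on every linear layer and an affine LayerNorm. The output heads are not described in this README, so they are left out, and the function names are made up for illustration.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def layernorm_params(width: int) -> int:
    """Learnable scale and shift of one affine LayerNorm."""
    return 2 * width


def trunk_params(obs_dim: int = 75, hidden: int = 512, depth: int = 3) -> int:
    """Parameters in `depth` Linear+LayerNorm blocks (output heads excluded)."""
    total = 0
    n_in = obs_dim
    for _ in range(depth):
        total += linear_params(n_in, hidden) + layernorm_params(hidden)
        n_in = hidden
    return total


print(trunk_params())  # 567296
```

Under those assumptions the trunk accounts for roughly 567K of the ~787K parameters, which would leave about 220K for the policy and value heads.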

	---

	## Reward Function

	```python
	def compute_reward(metrics, baseline_mbps=32.0):
	    # A configuration that fails to connect gets the minimum reward.
	    if not metrics.connected:
	        return -1.0

	    # Weighted sum of three components; the weights add up to 1.0.
	    r = 0.50 * connection_quality(metrics)  # ping, loss, connect time
	    r += 0.35 * metrics.stability_ratio  # probe success rate
	    r += 0.15 * log_speed_score(metrics, baseline_mbps)
	    return r
	```
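The snippet above relies on two helpers that this README does not show. Below is a minimal, self-contained sketch of how they might look, so the weighting can be tried end to end. Everything beyond `connected`, `stability_ratio`, and the three weights is an assumption: the `Metrics` field names, the 500 ms and 5 s cut-offs, and the log-scaling of throughput are illustrative choices, not the project's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical probe result; only `connected` and `stability_ratio`
    are named in the README, the other fields are assumed."""
    connected: bool
    ping_ms: float = 0.0
    loss_ratio: float = 0.0       # fraction of lost packets, 0..1
    connect_time_s: float = 0.0
    stability_ratio: float = 0.0  # fraction of probes that succeeded, 0..1
    speed_mbps: float = 0.0


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def connection_quality(m: Metrics) -> float:
    # Assumed scoring: map each term to [0, 1], then average.
    ping_score = clamp01(1.0 - m.ping_ms / 500.0)
    loss_score = clamp01(1.0 - m.loss_ratio)
    connect_score = clamp01(1.0 - m.connect_time_s / 5.0)
    return (ping_score + loss_score + connect_score) / 3.0


def log_speed_score(m: Metrics, baseline_mbps: float) -> float:
    # Assumed: log-scaled throughput relative to the baseline, capped at 1.
    return clamp01(math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))


def compute_reward(metrics: Metrics, baseline_mbps: float = 32.0) -> float:
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r


print(compute_reward(Metrics(connected=False)))  # -1.0
```

If every component stays in [0, 1], as it does in this sketch, a connected configuration scores between 0 and +1, and −1.0 is reserved for a failed connection.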

	---

	## Usage

	Requires [xray-core](https://github.com/XTLS/Xray-core).

	### Load and query the model

	```python
	import torch
	import numpy as np
	from agent import PolicyNetwork
	from environment import decode_action

	policy = PolicyNetwork()
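	# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
	ck = torch.load("best.pt", map_location="cpu", weights_only=False)
	policy.load_state_dict(ck["policy_state"])
	policy.eval()

	obs = torch.zeros(1, 75)  # placeholder 75-dimensional observation
	with torch.no_grad():
	    logits, mu, _, _ = policy(obs)

	# Greedy action: argmax over each discrete head, plus the continuous output mu.
	discrete = np.array([l.argmax().item() for l in logits])
	continuous = mu.squeeze().numpy()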
	config = decode_action(discrete, continuous)

	print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
	print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
	```

	### Server config example

	```json
	{
	  "inbounds": [{
	    "port": 443,
	    "protocol": "vless",
	    "settings": {
	      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
	      "decryption": "none"
	    },
	    "streamSettings": {
	      "network": "grpc",
	      "security": "reality",
	      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
	      "realitySettings": {
	        "dest": "YOUR-SNI-DOMAIN:443",
	        "serverNames": ["YOUR-SNI-DOMAIN"],
	        "privateKey": "YOUR-PRIVATE-KEY",
	        "shortIds": ["YOUR-SHORT-ID"]
	      }
	    }
	  }],
	  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
	}
	```
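The placeholders need real values before xray-core will accept the config. Here is a minimal sketch for two of them, assuming a random UUIDv4 is acceptable as the client ID and that a REALITY short ID is a hex string of even length up to 16 characters (check the Xray documentation to confirm). The helper name is made up for illustration. The key pair is not generated here; xray-core ships an `xray x25519` subcommand for that.

```python
import secrets
import uuid


def make_placeholders() -> dict:
    """Generate values for YOUR-UUID-HERE and YOUR-SHORT-ID."""
    return {
        "uuid": str(uuid.uuid4()),         # random client ID
        "short_id": secrets.token_hex(8),  # 8 random bytes as 16 hex characters
    }


values = make_placeholders()
print(values["uuid"], values["short_id"])
```

The same UUID and short ID must then be used on both the server and the client.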

	### Client config example

	```json
	{
	  "inbounds": [{
	    "port": 10808,
	    "protocol": "socks",
	    "settings": {"auth": "noauth", "udp": true}
	  }],
	  "outbounds": [{
	    "protocol": "vless",
	    "settings": {
	      "vnext": [{
	        "address": "YOUR-SERVER-IP",
	        "port": 443,
	        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
	      }]
	    },
	    "streamSettings": {
	      "network": "grpc",
	      "security": "reality",
	      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
	      "realitySettings": {
	        "fingerprint": "safari",
	        "serverName": "YOUR-SNI-DOMAIN",
	        "publicKey": "YOUR-PUBLIC-KEY",
	        "shortId": "YOUR-SHORT-ID"
	      }
	    }
	  }]
	}
	```
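Most connection failures with a pair like this come from the two files disagreeing on a shared value. The sketch below is a hypothetical helper (not part of this repository) that compares the fields the server and client examples have in common, assuming both configs are already loaded as Python dicts with the same shape as above. It cannot verify that `publicKey` matches `privateKey`, since that needs the X25519 math.

```python
from typing import List


def check_pair(server: dict, client: dict) -> List[str]:
    """Return a list of mismatches between a server and a client config."""
    problems = []
    inbound = server["inbounds"][0]
    outbound = client["outbounds"][0]
    s_stream = inbound["streamSettings"]
    c_stream = outbound["streamSettings"]
    s_real = s_stream["realitySettings"]
    c_real = c_stream["realitySettings"]
    vnext = outbound["settings"]["vnext"][0]

    server_ids = {c["id"] for c in inbound["settings"]["clients"]}
    if vnext["users"][0]["id"] not in server_ids:
        problems.append("client UUID is not listed on the server")
    if vnext["port"] != inbound["port"]:
        problems.append("port differs")
    if s_stream["network"] != c_stream["network"]:
        problems.append("transport (network) differs")
    if s_stream.get("grpcSettings") != c_stream.get("grpcSettings"):
        problems.append("gRPC serviceName differs")
    if c_real["serverName"] not in s_real["serverNames"]:
        problems.append("serverName is not in the server's serverNames")
    if c_real["shortId"] not in s_real["shortIds"]:
        problems.append("shortId is not in the server's shortIds")
    return problems
```

Load both files with `json.load` and call `check_pair(server, client)`; an empty list means the shared fields agree.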

	---

	## Limitations

	- DPI behavior varies by provider and region - results may differ.
	- REALITY is fundamentally difficult to block without collateral damage. Some success may be protocol strength, not agent cleverness.
	- The agent keeps no memory between deployments, so it cannot know about DPI rule changes made after training (an overnight update, for example).
	- 787K parameters is intentional. The problem doesn't need GPT-6.

	---

	## Citation

	```bibtex
	@misc{alphabypass2026,
	  title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
	  year  = {2026},
	  url   = {https://huggingface.co/NickupAI/alphabypass3}
	}
	```

	---

	## License

	MIT. Use responsibly, especially if you live somewhere VPNs are considered a thought crime.

	"It's not about hiding. It's about the right to reach the open internet."