Spaces:

omm7
/

toxic-royale-env

Sleeping

App Files Files Community

toxic-royale-env / VLOG_SCRIPT_FINAL.md

omm7

Upload folder using huggingface_hub

b0620f3 verified about 1 month ago

preview code

raw

history blame contribute delete

5.04 kB

Toxic Royale Env — Final Vlog Script

Target length: 90-120 seconds.

Main Script

Magnus Carlsen once said Clash Royale is harder than chess. When the world chess champion says that, I had to take it personally.

I am a Clash Royale player myself. I have pushed to 15,000 trophies, the max you can reach on Trophy Road, and when the hackathon judges said, "impress us," I looked around and saw people automating Mario, Pac-Man, and other games with RL and OpenEnv. There were only about 24 hours left, so I decided to build something from my own obsession: a Clash Royale-style RL environment.

This is Toxic Royale.

The idea is not just to make a random bot that plays cards. I wanted an agent that learns the same macro principles real players use: elixir discipline, bridge pressure, spell value, punish windows, and avoiding overcommit. Some of the playstyle comes from my own experience, and some of it is inspired by top players like Mohamed Light.

Technically, Toxic Royale is built on OpenEnv and deployed as a Hugging Face Space. The agent observes elixir, tower HP, hand, board units, opponent pressure, and a compact text state. Each step, it can either wait or play a card into a legal zone. The simulator enforces the important constraints: you cannot place troops illegally, elixir matters, tower damage matters, and the opponent responds with a scripted but reactive policy.

The reward function is where the strategy lives. I shaped it around real Clash Royale fundamentals: reward tower damage and crown advantage, penalize illegal actions and high-elixir stalling, reward efficient damage per elixir, reward spell value, and reward punishing the opponent after they overcommit. I also removed silly shortcuts, like rewarding emotes, because I wanted the agent to win by playing better.

For training evidence, I ran fast RL experiments across versions V2.1 to V2.7, then scaled the final version to 300,000 episodes. The final V2.7 run reached a best reward of +7.6362, with the last 100 episodes averaging +0.6640. The repo includes the reward curves, metrics, and a thrilling match visualization so the judges can inspect what actually happened instead of just trusting a claim.

I also included a Colab notebook and GRPO training pipeline so the environment can be rerun. The Space is live on Hugging Face, the README links the artifacts, and the plots show the agent improving.

And the fun part: I connected this idea back to real Clash Royale through a BlueStacks autoplay pipeline. In controlled tests, it started beating my own baseline decisions. Right now I am around 12k trophies, and on a fresh account this bot is destroying beginners on Trophy Road. If you want to try it, scan my clan QR code, send a request, and in clan chat tell me you want to battle my bot: ToXic RoYaLe.

Screen Recording Cues

Open with a Clash Royale clip or BlueStacks gameplay.
Show the Hugging Face Space: https://huggingface.co/spaces/omm7/toxic-royale-env
Show README.md, especially the Judges section.
Show the environment config: openenv.yaml
Show the OpenEnv server implementation: server/toxic_royale_env_environment.py server/app.py
Show the simulator and reward logic: simulator.py reward_config.py data/reward_config.json
Show training scripts: training/OpenEnv_Hackathon_GRPO_Colab.ipynb training/run_full_pipeline.sh training/train_grpo_one_step.py training/train_fast_rl.py
Show evidence plots: outputs/judge_assets/reward_curve.png outputs/judge_assets/thrilling_match.png outputs/versions/V2_7/V2_7_300k_run/report.png outputs/versions/V2_7/V2_7_300k_run/thrilling_match.png
Show final run summary: outputs/versions/V2_7/V2_7_300k_run/SUMMARY.md
End with the clan QR code and the phrase: "Ask for ToXic RoYaLe in clan chat."

Short Technical Talking Points

OpenEnv environment: openenv.yaml, server/app.py, server/toxic_royale_env_environment.py
Core simulator: simulator.py
Reward configuration: data/reward_config.json, reward_config.py
GRPO/TRL reproducibility: training/OpenEnv_Hackathon_GRPO_Colab.ipynb, training/run_full_pipeline.sh
Fast RL evidence: training/train_fast_rl.py, training/experiment_runner.py
Versioned evidence: outputs/versions/V2_1 through outputs/versions/V2_7
Final large run: outputs/versions/V2_7/V2_7_300k_run/SUMMARY.md
Public Space: https://huggingface.co/spaces/omm7/toxic-royale-env

One-Line Hook Options

"Magnus Carlsen said Clash Royale is harder than chess, so I trained an RL agent to prove him right."
"I had 24 hours left, 15k Clash Royale trophies, and one idea: make my playstyle trainable."
"This is Toxic Royale: an OpenEnv where an RL agent learns elixir, pressure, punish windows, and tower damage."

Ending Line

The repo, Space, Colab, plots, and training evidence are all linked in the README. And if you want to battle the bot, scan the clan QR code and ask for ToXic RoYaLe.