I made a superhuman Generals.io agent with self-play RL [P]
Our take
The recent achievement of creating a superhuman AI agent for Generals.io, detailed in a blog post by /u/shrekofspeed, underscores a compelling trend in reinforcement learning: the power of scaling and a shift away from reliance on human-engineered heuristics. The agent, dubbed "AverageJoe," not only surpassed the prior algorithm-based agent it was originally built to beat but also claimed the top spot on the human 1v1 leaderboard, a remarkable feat in the competitive realm of real-time strategy (RTS) games. The project fits a broader movement toward more efficient and effective AI training methodologies, also visible in work such as Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost. Its emphasis on systematic benchmarking and optimization likewise parallels efforts such as Find the best open-source OCR models in one place at Papers with Code, where careful comparison is critical for achieving state-of-the-art results.
What truly distinguishes this project is the deliberate decision to prioritize scaling over extensive manual tuning and human-derived reward shaping. Two changes carried that strategy: reimplementing the whole pipeline in JAX (it was previously written in NumPy and Torch) and replacing the CNN with a Vision Transformer. This approach, the author argues, is a more sustainable and ultimately more powerful route to superhuman performance. JAX compiles numerical code with XLA and can automatically vectorize pure functions across batches, which is presumably what removed the throughput bottleneck in simulation and training. The Vision Transformer's self-attention lets every part of the board attend to every other part within a single layer, whereas a CNN only builds up board-wide context by stacking local filters. This echoes a broader sentiment within the AI research community: computational resources, strategically deployed, can often outperform intricate hand-crafted solutions. The open-sourcing of both the agent and the JAX-based simulator further democratizes access to this technology, enabling others to build upon this work and explore the potential of self-play RL in RTS games and potentially beyond.
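For readers new to Vision Transformers, the step that replaces a CNN's convolutions is tokenization: the board is cut into patches, each patch is flattened and linearly projected, and the resulting tokens go through self-attention layers (with a learned positional embedding added first). The sketch below covers only that tokenization step, in plain Python. The board size, the two channels, and the patch size are hypothetical; the post does not say whether AverageJoe uses larger patches or one token per tile (`patch=1`).

```python
from typing import List

# grid[row][col][channel]; the channel layout (e.g. army count, ownership) is hypothetical.
Grid = List[List[List[float]]]


def patchify(grid: Grid, patch: int) -> List[List[float]]:
    """Cut an H x W x C grid into non-overlapping patch x patch tokens, flattened row-major."""
    height, width = len(grid), len(grid[0])
    if height % patch or width % patch:
        raise ValueError("board height and width must be divisible by the patch size")
    tokens: List[List[float]] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token: List[float] = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    token.extend(grid[r][c])  # all channels of one tile stay together
            tokens.append(token)
    return tokens


def embed(tokens: List[List[float]], weights: List[List[float]], bias: List[float]) -> List[List[float]]:
    """Shared linear projection: weights[j] maps a flattened patch to output feature j."""
    return [
        [sum(x * w for x, w in zip(token, row)) + b for row, b in zip(weights, bias)]
        for token in tokens
    ]


# Usage: a 4x4 board with 2 channels, cut into 2x2 patches -> 4 tokens of length 8.
board = [[[float(r * 4 + c), 1.0] for c in range(4)] for r in range(4)]
tokens = patchify(board, patch=2)
print(len(tokens), len(tokens[0]))  # 4 8
```

Real boards whose sides are not a multiple of the patch size would need padding first; the sketch simply rejects them.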
The significance of this development extends beyond mere leaderboard dominance. Generals.io, while a compelling game for demonstrating AI capabilities, exemplifies the complexity of imperfect-information, real-time environments, characteristics shared by many real-world decision-making problems. The success of AverageJoe suggests that similar scaling-focused approaches could, in principle, be applied to other complex domains such as financial trading, logistics optimization, and autonomous robotics, although that remains to be demonstrated. It signals a move away from the traditional paradigm of painstakingly crafting reward functions and engineering features, toward a more data-driven approach in which the agent learns through extensive self-play, guided by architectural choices that facilitate scale. The author's detailed guide documents the challenges encountered and the lessons learned, offering a roadmap for others venturing into similar territory.
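One common way to organize self-play is to train against a pool of frozen snapshots of earlier policies rather than only the current one, which is often used to reduce the risk of the agent cycling between strategies that only beat its latest self. The post does not say how AverageJoe chooses its opponents, so the pool below, including its size and sampling probability, is a hypothetical sketch of the general technique.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class OpponentPool:
    """Hypothetical self-play opponent pool holding frozen snapshots of past policy parameters."""
    max_size: int = 10        # how many past snapshots to keep (assumed value)
    latest_prob: float = 0.5  # chance of facing the newest snapshot (assumed value)
    snapshots: List[Any] = field(default_factory=list)

    def add(self, params: Any) -> None:
        """Store a new snapshot, evicting the oldest once the pool is full."""
        self.snapshots.append(params)
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)

    def sample(self, rng: random.Random) -> Any:
        """Pick an opponent: the latest snapshot with probability latest_prob, else a uniform draw from the pool."""
        if not self.snapshots:
            raise ValueError("the pool is empty; add a snapshot first")
        if rng.random() < self.latest_prob:
            return self.snapshots[-1]
        return rng.choice(self.snapshots)


# Usage: strings stand in for real parameter snapshots.
pool = OpponentPool(max_size=3)
for step in range(5):
    pool.add(f"params@{step}")
print(pool.snapshots)  # ['params@2', 'params@3', 'params@4']
```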
Looking ahead, the question becomes: how far can this scaling strategy be pushed? Will we see similar superhuman agents emerge in other complex, imperfect-information games? Continued refinement of the underlying techniques, such as the work on positional embeddings in High Dimensional, Dynamic Rotary Positional Embedding, suggests that the toolkit will keep improving. The combination of increasingly powerful hardware, refined architectures like Transformers, and training methods such as self-play RL points toward a future of AI decision-making in which sophisticated agents learn and adapt in complex environments with minimal human intervention. The performance of AverageJoe is not just a victory in a game; it is a glimpse of how such agents might increasingly tackle real-world challenges with considerable autonomy.
Hi everyone,
I trained a self-play RL agent for Generals.io that reached a superhuman level and ranked #1 on the human 1v1 leaderboard.
It began as my master's thesis, where the goal was to beat a prior algorithm-based agent. We succeeded using behavior cloning, RL fine-tuning and reward shaping, but the agent was still consistently beaten by the top players.
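Reward shaping adds dense intermediate rewards on top of the sparse win/loss signal. One standard variant, potential-based shaping, adds F = γΦ(s') − Φ(s); because these terms telescope over an episode, it leaves the set of optimal policies unchanged. The post does not describe the thesis's actual shaping terms, so the potential Φ below (a weighted land and army advantage), its weights, and the `Summary` type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical per-turn game summary from the agent's point of view."""
    my_land: int
    opp_land: int
    my_army: int
    opp_army: int


def potential(s: Summary, land_w: float = 0.01, army_w: float = 0.001) -> float:
    """Hypothetical potential: weighted land and army advantage over the opponent."""
    return land_w * (s.my_land - s.opp_land) + army_w * (s.my_army - s.opp_army)


def shaped_reward(reward: float, s: Summary, s_next: Summary,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Add the potential-based term F = gamma * phi(s') - phi(s) to the game reward."""
    # Terminal states get zero potential so the shaping terms telescope away.
    next_phi = 0.0 if done else potential(s_next)
    return reward + gamma * next_phi - potential(s)


# Usage: gaining 2 tiles and 10 army in one turn earns a small positive bonus.
before = Summary(my_land=10, opp_land=10, my_army=50, opp_army=50)
after = Summary(my_land=12, opp_land=10, my_army=60, opp_army=50)
print(round(shaped_reward(0.0, before, after), 4))  # 0.0297
```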
So I went for round two and fixed the largest bottlenecks:
- Reimplemented the whole pipeline in JAX (from NumPy/Torch)
- Used a Vision Transformer instead of the CNN
Both are a result of the same idea: to invest in scaling rather than human priors and ad-hoc patches.
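A JAX rewrite pays off most when the environment step is a pure function of (state, action) with no hidden mutation, because that is the shape `jax.jit` can compile and `jax.vmap` can batch across many games at once. The sketch below is plain Python with toy rules (no ownership, combat, or fog of war) and hypothetical names; it only illustrates the pure, batchable form and is not the API of the generals-bots simulator.

```python
from typing import List, NamedTuple, Tuple


class State(NamedTuple):
    """Hypothetical, heavily simplified board state (immutable)."""
    armies: Tuple[int, ...]      # army count per tile, board flattened to 1-D
    producers: Tuple[bool, ...]  # tiles that grow by one unit each turn (e.g. generals, cities)
    turn: int


def step(state: State, move: Tuple[int, int]) -> State:
    """Pure transition: builds and returns a new State and never mutates its input.

    Toy rules only: move all but one unit from src to dst (no ownership or combat),
    then every producer tile gains one unit.
    """
    src, dst = move
    armies = list(state.armies)
    moved = max(armies[src] - 1, 0)
    armies[src] -= moved
    armies[dst] += moved
    armies = [a + 1 if p else a for a, p in zip(armies, state.producers)]
    return State(tuple(armies), state.producers, state.turn + 1)


def batched_step(states: List[State], moves: List[Tuple[int, int]]) -> List[State]:
    """Step many independent games; with array-valued state, jax.vmap(step) would replace this loop."""
    return [step(s, m) for s, m in zip(states, moves)]


# Usage: the original state is left untouched.
game = State(armies=(5, 1, 0), producers=(True, False, False), turn=0)
next_game = step(game, (0, 1))
print(next_game.armies, game.armies)  # (2, 5, 0) (5, 1, 0)
```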
The blog is written as a guide for anyone building something similar — the dead ends, the decisions, and the intuitions and tricks I picked up along the way.
It's all open source, including the fast JAX simulator — handy on its own if you want an imperfect-information RTS env to play with.
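What makes such an environment imperfect-information is that each player observes only part of the board. A minimal sketch of the observation masking, assuming the usual Generals.io rule that a player sees its own tiles plus their eight neighbours; the function names and the `-1` sentinel for hidden tiles are hypothetical, not the generals-bots API.

```python
from typing import List


def visibility_mask(owned: List[List[bool]]) -> List[List[bool]]:
    """Mark every tile within one step (including diagonals) of an owned tile as visible."""
    height, width = len(owned), len(owned[0])
    visible = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not owned[r][c]:
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        visible[rr][cc] = True
    return visible


def observe(armies: List[List[int]], owned: List[List[bool]], hidden: int = -1) -> List[List[int]]:
    """Replace army counts on tiles the player cannot see with a sentinel value."""
    mask = visibility_mask(owned)
    return [
        [a if v else hidden for a, v in zip(army_row, mask_row)]
        for army_row, mask_row in zip(armies, mask)
    ]


# Usage: a player owning only the top-left corner of a 4x4 board.
owned = [[r == 0 and c == 0 for c in range(4)] for r in range(4)]
armies = [[r * 4 + c for c in range(4)] for r in range(4)]
print(observe(armies, owned)[1])  # [4, 5, -1, -1]
```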
Links
- Guide: https://kam.mff.cuni.cz/~straka/blog/generals.html
- Simulator (JAX): https://github.com/strakam/generals-bots
- Agent: https://github.com/strakam/AverageJoe
I hope you find the blog post entertaining!
Feedback and questions welcome 🤗.