2 min read · Machine Learning

I Trained an AI to Beat Final Fight… Here’s What Happened

Our take

In this post, I walk through training an AI agent with Behavior Cloning on the classic arcade game Final Fight. Training relied solely on demonstrations, and I evaluated how far the agent could get in the first stage, working through challenges like action space remapping and trajectory alignment bugs. The agent shows promise, but consistency and survival remain hurdles. I’m eager for community insights on improving BC performance, transitioning to PPO, and handling partial observability.

Hey everyone,

I’ve been experimenting with Behavior Cloning on a classic arcade game (Final Fight), and I wanted to share the results and get some feedback from the community.

The setup is fairly simple: I trained an agent purely from demonstrations (no reward shaping initially), then evaluated how far it could go in the first stage. I also plan to extend this with GAIL + PPO to see how much performance improves beyond imitation.
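For context, the behavior-cloning step is essentially supervised learning on (observation, action) pairs from the demonstrations. Here is a minimal sketch in PyTorch — the network shape, observation dimension, and 12-button MultiBinary head are illustrative assumptions, not the exact code from the repo:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: flattened observations, 12-button MultiBinary actions.
OBS_DIM, N_BUTTONS = 512, 12

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, N_BUTTONS),  # one logit per button
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Each button is an independent yes/no decision, so BC on a MultiBinary
# action space reduces to multi-label classification with binary cross-entropy.
loss_fn = nn.BCEWithLogitsLoss()

def bc_step(obs_batch, act_batch):
    """One behavior-cloning update on a batch of demonstration pairs."""
    logits = policy(obs_batch)
    loss = loss_fn(logits, act_batch.float())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At rollout time, each logit is thresholded (e.g. sigmoid > 0.5) to recover a MultiBinary action for the emulator.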

A couple of interesting challenges came up:

  • Action space remapping (MultiBinary → emulator input)
  • Trajectory alignment issues (obs/action offset bugs 😅)
  • LSTM policy behaving differently under evaluation vs manual rollout
  • Managing rollouts efficiently without loading everything into memory
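On the first point: a MultiBinary action coming out of the policy is just a 0/1 vector, while the emulator expects buttons in its own order, so the remapping boils down to a fixed index table. A sketch with NumPy — the button names and orderings below are made-up examples, not the actual Final Fight layout:

```python
import numpy as np

# Hypothetical button order of the policy's MultiBinary(6) action space.
POLICY_BUTTONS = ["LEFT", "RIGHT", "UP", "DOWN", "ATTACK", "JUMP"]
# Hypothetical order the emulator backend actually reads.
EMULATOR_BUTTONS = ["UP", "DOWN", "LEFT", "RIGHT", "JUMP", "ATTACK"]

# Precompute, for each emulator slot, which policy index feeds it.
REMAP = np.array([POLICY_BUTTONS.index(b) for b in EMULATOR_BUTTONS])

def to_emulator(action):
    """Reorder a MultiBinary policy action into the emulator's button order."""
    return np.asarray(action)[REMAP]
```

The trajectory-alignment bug class is related: obs[t] must be paired with the action taken *in response to* it, so if the logger writes actions one frame early or late, the fix is a one-element shift of one array (which direction depends on how the logger was written).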

The agent can already make some progress, but still struggles with consistency and survival.

I’d love to hear thoughts on:

  • Improving BC performance with limited trajectories
  • Best practices for transitioning BC → PPO
  • Handling partial observability in these environments
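On the BC → PPO question, a common approach is to warm-start PPO's actor from the BC policy so RL fine-tuning begins from imitation rather than from scratch. A minimal sketch of the weight transfer in plain PyTorch (the architecture is an assumption; the BC net and the PPO actor must share a layout for the copy to work):

```python
import torch
import torch.nn as nn

def make_actor(obs_dim=512, n_buttons=12):
    # Same hypothetical architecture for both the BC net and the PPO actor.
    return nn.Sequential(
        nn.Linear(obs_dim, 256), nn.ReLU(),
        nn.Linear(256, n_buttons),
    )

bc_policy = make_actor()  # pretend this was trained with behavior cloning
ppo_actor = make_actor()  # fresh PPO actor

# Warm start: copy the BC weights into the PPO actor before RL fine-tuning.
ppo_actor.load_state_dict(bc_policy.state_dict())
```

The critic still starts from scratch, so early PPO updates can be destructive; common mitigations are a lower initial learning rate or an auxiliary BC loss that keeps the fine-tuned policy close to the demonstrations.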

Here’s the code if you want to see the full process and results:
notebooks-rl/final_fight at main · paulo101977/notebooks-rl

Any feedback is very welcome!

submitted by /u/AgeOfEmpires4AOE4


Tagged with

#Behavior Cloning #Final Fight #agent #demonstrations #reward shaping #GAIL #PPO #action space remapping #MultiBinary #emulator input #trajectory alignment #LSTM policy #evaluation