From Machine Learning

I Trained an AI to Beat Final Fight… Here’s What Happened [P]

Our take

In my latest project, I trained an AI agent using Behavior Cloning to tackle the classic arcade game Final Fight. The initial phase involved learning from demonstrations without reward shaping, and I assessed its performance in the first stage. While the agent shows potential, it faces challenges with action space remapping and trajectory alignment. I plan to enhance its capabilities by integrating GAIL and PPO methods. I welcome community feedback on improving performance, transitioning techniques, and addressing partial observability.

Hey everyone,

I’ve been experimenting with Behavior Cloning on a classic arcade game (Final Fight), and I wanted to share the results and get some feedback from the community.

The setup is fairly simple: I trained an agent purely from demonstrations (no reward shaping initially), then evaluated how far it could go in the first stage. I also plan to extend this with GAIL + PPO to see how much performance improves beyond imitation.
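The core of the BC phase is just supervised learning on (observation, action) pairs. Here is a minimal sketch of that idea using a linear softmax policy in NumPy — the shapes, names, and toy data are illustrative only, not the actual project code, which uses an LSTM policy:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def bc_train(obs, actions, n_actions, lr=0.1, epochs=200):
    """Fit a linear softmax policy to demonstration (obs, action)
    pairs by minimizing cross-entropy -- the essence of BC."""
    n, d = obs.shape
    W = np.zeros((d, n_actions))
    onehot = np.eye(n_actions)[actions]
    for _ in range(epochs):
        probs = softmax(obs @ W)
        grad = obs.T @ (probs - onehot) / n  # d(cross-entropy)/dW
        W -= lr * grad
    return W

# Toy demonstrations: the "expert" presses action 1 when
# feature 0 exceeds feature 1, else action 0.
rng = np.random.default_rng(0)
obs = rng.normal(size=(256, 4))
actions = (obs[:, 0] > obs[:, 1]).astype(int)

W = bc_train(obs, actions, n_actions=2)
pred = softmax(obs @ W).argmax(axis=1)
acc = (pred == actions).mean()
```

In the real setup the linear model is replaced by the recurrent policy network and the toy data by recorded gameplay, but the loss and update are the same shape.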

A couple of interesting challenges came up:

  • Action space remapping (MultiBinary → emulator input)
  • Trajectory alignment issues (obs/action offset bugs 😅)
  • LSTM policy behaving differently under evaluation vs manual rollout
  • Managing rollouts efficiently without loading everything into memory
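On the first bullet: the remapping is essentially packing a MultiBinary vector (one 0/1 flag per button) into the integer bitmask an emulator core expects as controller input. A simplified pure-Python sketch — the button order here is hypothetical, not the actual emulator layout:

```python
# Hypothetical button order for a MultiBinary(8) action space.
BUTTONS = ["UP", "DOWN", "LEFT", "RIGHT", "A", "B", "START", "SELECT"]

def multibinary_to_mask(action):
    """Pack a MultiBinary 0/1 vector into an integer bitmask,
    bit i corresponding to BUTTONS[i]."""
    assert len(action) == len(BUTTONS)
    mask = 0
    for bit, pressed in enumerate(action):
        if pressed:
            mask |= 1 << bit
    return mask

def mask_to_buttons(mask):
    """Inverse mapping -- handy for logging what the agent pressed."""
    return [b for bit, b in enumerate(BUTTONS) if mask & (1 << bit)]

# e.g. RIGHT + A held simultaneously
mask = multibinary_to_mask([0, 0, 0, 1, 1, 0, 0, 0])
```

Keeping the inverse mapping around made it much easier to spot the obs/action offset bugs, since you can print human-readable button presses next to each frame.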

The agent can already make some progress, but still struggles with consistency and survival.

I’d love to hear thoughts on:

  • Improving BC performance with limited trajectories
  • Best practices for transitioning BC → PPO
  • Handling partial observability in these environments
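On the partial-observability question, the simplest non-recurrent baseline I'm aware of is frame stacking: feed the last k observations instead of one, so even a feedforward policy can infer short-term dynamics (movement direction, attack animations). A minimal toy sketch — `FrameStack` here is my own illustration, not a library wrapper:

```python
from collections import deque

class FrameStack:
    """Keep the last k observations and expose them together, so a
    memoryless policy sees short-term history a single frame hides."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        # Pad with copies of the first frame so the stack is full at step 0.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(obs)
        return list(self.frames)

    def step(self, obs):
        self.frames.append(obs)  # deque(maxlen=k) drops the oldest frame
        return list(self.frames)

stack = FrameStack(k=4)
s0 = stack.reset("f0")
s1 = stack.step("f1")
```

An LSTM policy should subsume this in principle, but stacking is a useful sanity check: if the stacked feedforward agent matches the LSTM, the recurrence isn't the bottleneck.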

Here’s the code if you want to see the full process and results:
notebooks-rl/final_fight at main · paulo101977/notebooks-rl

Any feedback is very welcome!

submitted by /u/AgeOfEmpires4AOE4


Tagged with

#Final Fight #Behavior Cloning #GAIL #PPO #LSTM policy #agent #demonstrations #reward shaping #trajectory alignment #action space remapping #obs/action offset #rollouts #evaluation