
WebHarbor - We "dock" real websites into local environments for web agents! [R]

Our take

WebHarbor is a community-driven project that aims to simplify web agent benchmarking by docking real websites into local environments. It packages 15 popular sites, including Amazon and GitHub, as self-contained Flask + SQLite apps within a single Docker image, with a control plane that resets each site in under one second. The mirrors were built by coding agents with a human in the loop. The project aims to expand to more than 100 popular websites and invites contributions.

The WebHarbor project is a notable step forward for web agent environments. By packaging 15 popular websites (such as Amazon, GitHub, and BBC News) as self-contained Flask and SQLite applications within a Docker image, WebHarbor addresses many challenges faced by developers working with live web data. Resetting each site to a byte-identical state in under one second makes training and evaluating web agents repeatable, and running locally sidesteps the reCAPTCHA prompts, geo-blocks, and content drift that have historically complicated web agent development.

The community-driven aspect of WebHarbor is particularly noteworthy. The project's call for contributions not only invites developers to mirror additional websites but also emphasizes collaborative growth in the field. This aligns well with the spirit of open-source development, encouraging a shared commitment to improving web agent capabilities. Contributors can engage in the coding-agent pipeline and even co-author the final research paper, fostering a sense of ownership and community within the project. This collaborative approach resonates with similar themes found in our recent pieces, such as Continual Harness: Online Adaptation for Self-Improving Foundation Agents which explores the evolution of adaptive systems, and Integrating 3D Heat Equation into a PINN for Real-Time Aerospace Simulation (C++ WASM Engine), highlighting the integration of complex systems in innovative applications.

The implications of WebHarbor extend beyond mere technical execution. The project represents a shift towards more controlled, adaptable environments for web agents, which is crucial for robust machine learning applications. Traditional methods of benchmarking against the live web have been fraught with issues that hinder progress. By creating a lightweight, easily resettable environment, WebHarbor not only enhances the training process but also opens the door for extensive experimentation and innovation. This is particularly important as organizations increasingly rely on automated systems that must navigate the complexities of real-time data.

Moreover, the project's future potential is compelling. With a goal to encompass over 100 popular websites, WebHarbor aims to provide a comprehensive suite of environments that can drive forward the capabilities of AI-driven web interactions. As the project evolves, it will be crucial to monitor how these new environments influence the development of web agents, particularly in terms of their ability to learn and adapt in real-world scenarios. This could have substantial ramifications for various industries, particularly those that depend heavily on web data for analytics and customer engagement.

In conclusion, WebHarbor is not just a technical innovation; it represents a new paradigm in how we think about web agents and their training environments. As the project progresses, it invites us to consider the broader implications of such advancements on productivity and efficiency in web-based applications. Will we see a wave of new AI capabilities emerge from these refined environments, leading to smarter, more intuitive systems? Only time will tell, but the groundwork is undoubtedly being laid for a future where web agents can thrive in a controlled yet dynamic landscape.

Hello! Excited to share our latest community-driven research project: WebHarbor: Docking Real Websites for Evolving GUI Agent Environments!

TL;DR: 15 popular websites (Amazon, GitHub, BBC News, arXiv, Booking, Hugging Face, etc.) packaged as self-contained Flask + SQLite apps in a single Docker image, with a control plane that resets each site to a byte-identical state in <1 second, all built by human-in-the-loop coding agents (e.g., Claude Code or Codex). We support all 643 WebVoyager tasks out of the box.
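The post does not say how the control plane performs its reset. One simple way to get a byte-identical state for a SQLite-backed site is to overwrite the live database file with a pristine snapshot and compare hashes. The sketch below illustrates that idea only; the file names, the `cart` table, and the functions `make_snapshot`, `reset_site`, and `demo` are hypothetical and are not taken from the WebHarbor code.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's bytes so two database states can be compared exactly."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_snapshot(path: Path) -> None:
    """Create a small seeded database standing in for one mirrored site (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE cart (item TEXT NOT NULL, qty INTEGER NOT NULL)")
    conn.execute("INSERT INTO cart VALUES ('notebook', 1)")
    conn.commit()
    conn.close()


def reset_site(snapshot: Path, live: Path) -> None:
    """Restore the live database by replacing it with a copy of the snapshot."""
    tmp = live.with_name(live.name + ".tmp")
    shutil.copyfile(snapshot, tmp)
    tmp.replace(live)  # swap the fresh copy into place in a single rename


def demo() -> tuple[str, str, str]:
    """Return hashes of the live database: after first reset, after a write, after second reset."""
    with tempfile.TemporaryDirectory() as workdir:
        snapshot = Path(workdir) / "snapshot.db"
        live = Path(workdir) / "live.db"
        make_snapshot(snapshot)

        reset_site(snapshot, live)
        clean = sha256_of(live)

        # Simulate an agent episode that mutates site state.
        conn = sqlite3.connect(live)
        conn.execute("INSERT INTO cart VALUES ('laptop', 2)")
        conn.commit()
        conn.close()
        dirty = sha256_of(live)

        reset_site(snapshot, live)
        restored = sha256_of(live)
    return clean, dirty, restored


if __name__ == "__main__":
    clean, dirty, restored = demo()
    print("changed by episode:", clean != dirty)
    print("byte-identical after reset:", clean == restored)
```

Copying a file-level snapshot, rather than replaying SQL, is what makes the restored state identical at the byte level and keeps the reset cost proportional to the database size.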

Call for contribution: Our next goal is 100+ popular websites, covering all of Online-Mind2Web (147 sites) and beyond. Two tracks:

  • Contribute a new mirror site (use the coding-agent pipeline → human verify → open PR) → co-author on the final paper
  • Review submitted PRs (5 reviews → co-author)

We also released useful skills for you (or your coding agent) to work with! Typically, you can create a new mirror within one day. See more contribution details in the Contribution Guide.

Why WebHarbor: running web agent benchmarks on the live web is a nightmare (reCAPTCHA, geo-blocks, content drift, network flakiness, and tasks that go stale within months). Plus, you can't reset the live web, which rules out heavy RL training. Web agents need lightweight, easy-to-reset, task-driven, evolving environments for both evaluation and training!

Related Resources:

  • 🏠 WebHarbor Project Page: WebHarbor
  • 🤗 HuggingFace Dataset: ChilleD/WebHarbor
  • 💻 WebHarbor GitHub: Code Repo
  • 📊 Contribution Guide: Guide Details
  • 📝 Contribution Request Form: Google Form

Suggestions and discussion are welcome!

submitted by /u/ArtichokeHelpful7462