
Introducing Papers Without Code [P]

Our take

Papers With Code (PwC) has been relaunched at paperswithcode.co as a resource for finding state-of-the-art AI models across domains ranging from 3D generation to AI agents. It automatically parses research papers from arXiv and Hugging Face to generate leaderboards, and each benchmark comes with a scatter plot and a table for tracking model performance. PwC now also shows evaluations for closed-source models like GPT-5.5, with an option to view open-source results only.

The recent relaunch of Papers With Code (PwC) by Niels Rogge of Hugging Face's open-source team changes how the AI community can track state-of-the-art (SOTA) performance. Previously, following fast-moving areas like 3D generation and AI agents could be a fragmented and time-consuming process. PwC centralizes this information by automatically parsing research papers from arXiv and Hugging Face to generate leaderboards. This is relevant to ongoing discussions about post-doctoral opportunities in ML [Post-docs in ML] and to the wider challenge of keeping abreast of new developments. Including evaluations for closed-source models, like GPT-5.5 and Mythos 5, adapts the site to a reality in which these models increasingly dominate benchmarks. It also echoes concerns raised in [Anthropic's new model Fable will silently handicap work on LLMs], where limitations and proprietary control are evolving factors in the landscape.

PwC's strength lies not just in aggregation but in transparency. The scatter plots and tables give a readily digestible view of model performance, so researchers and practitioners can quickly identify leading approaches and compare them. The option to disable closed-source evaluations serves those who prioritize open-source research by giving a focused view of community-driven advances. Accepting submissions from sources beyond arXiv further broadens the platform's scope. The playful label "papers without code," applied to closed-source entries, acknowledges the current paradigm while keeping a lighthearted tone. This contrasts with the more intense debates over reviewer distributions at academic conferences, as in [ACL ARR May 2026 Reviewer paper distributions], and suggests a move towards more accessible and practical knowledge sharing.

The broader significance of PwC extends beyond tracking SOTA. A centralized, up-to-date resource helps researchers build on existing work, identify gaps in knowledge, and move faster. Showing evaluations alongside papers, regardless of their source, gives a fuller picture of model capabilities and limitations. Furthermore, automatic parsing and leaderboard generation reduce the burden on individual researchers to curate and update this information by hand, freeing them to focus on their core research. This wider access to knowledge supports broader participation across the AI field.

Looking ahead, the success of PwC will depend on its ability to maintain accuracy, comprehensiveness, and user engagement. The community's feedback, which Niels explicitly invites, will be crucial in shaping its future development. A key question is whether PwC can incorporate more nuanced evaluation metrics and handle the complexity of evaluating models across diverse tasks and datasets. Can the platform represent the trade-offs between models (computational cost, data requirements, and ethical implications) well enough to give a comprehensive picture of the AI landscape? The ongoing evolution of PwC will be worth watching, and it could influence the direction of AI research and development.


Hi, Niels here from the open-source team at Hugging Face.

I've recently relaunched paperswithcode.co as a source for finding the state of the art (SOTA) across various AI domains, from 3D generation to AI agents. This is done by automatically parsing research papers published on arXiv/Hugging Face, enabling leaderboards to be created. See BrowseComp below as an example (a scatter plot and a table are available for each benchmark).

- Scatter plot (you can hover over the dots to see the models):

https://preview.redd.it/9rz2r3ffcf6h1.png?width=2880&format=png&auto=webp&s=b3f8e7a870802f6ef8227ecc0619e9e1057554b0

- Table:

https://preview.redd.it/qoqriddw5f6h1.png?width=2862&format=png&auto=webp&s=a0034574f693847537037013672fb61daf27b16e
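The idea of turning parsed evaluation results into per-benchmark leaderboards can be sketched roughly as below. This is only an illustration and not how paperswithcode.co is actually implemented: the `EvalRecord` type, the `build_leaderboards` function, the benchmark and model names, and the scores are all hypothetical placeholders, and the sketch assumes a higher score is better.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRecord:
    """One eval result extracted from a source (hypothetical schema)."""
    benchmark: str
    model: str
    score: float
    source: str  # e.g. a paper or blog post identifier


def build_leaderboards(records):
    """Group records by benchmark, keep each model's best score, rank descending.

    Assumes higher scores are better; a real system would need a
    per-benchmark notion of metric direction.
    """
    best = defaultdict(dict)  # benchmark -> model -> best EvalRecord
    for rec in records:
        current = best[rec.benchmark].get(rec.model)
        if current is None or rec.score > current.score:
            best[rec.benchmark][rec.model] = rec
    return {
        benchmark: sorted(by_model.values(), key=lambda r: r.score, reverse=True)
        for benchmark, by_model in best.items()
    }


# Invented placeholder data, not real benchmark results.
records = [
    EvalRecord("benchmark-x", "model-a", 41.0, "paper-1"),
    EvalRecord("benchmark-x", "model-b", 57.5, "paper-2"),
    EvalRecord("benchmark-x", "model-a", 48.2, "paper-3"),
    EvalRecord("benchmark-y", "model-b", 12.0, "paper-2"),
]

leaderboards = build_leaderboards(records)
for rank, rec in enumerate(leaderboards["benchmark-x"], start=1):
    print(rank, rec.model, rec.score, rec.source)
```

Keeping the full record rather than just the score means each leaderboard row can still point back to the source it was parsed from.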

As you can see, I've added support for viewing evals for closed-source models, too, given that many benchmarks are nowadays dominated by such models, like GPT-5.5 and Mythos 5. You can always disable viewing closed-source evals with a toggle or in your PwC settings:

https://preview.redd.it/p3k6jt6q6f6h1.png?width=1582&format=png&auto=webp&s=40149e51d6b326a77e53e33baf70d9850b3de365

When you turn them off, here's what the open model leaderboard looks like:

https://preview.redd.it/tg42sin36f6h1.png?width=2838&format=png&auto=webp&s=1330a117ae9b4e0ce6d459493ae9e8f64107310a

Closed-source papers are treated as regular "papers", although they can come from any source, like a blog post (given that PwC supports submitting any source beyond arXiv). See the GPT-5.5 or Mythos 5 papers as examples, with their evals at the bottom. Notice the "closed" tag on their evals. Hence, you could jokingly call these "papers without code".
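For readers who want a concrete picture of the "closed" tag and the toggle, here is a minimal sketch. It assumes each eval carries a boolean flag and that the toggle simply filters on it; the `Entry` type, the `visible_entries` function, and all model names and scores are hypothetical and not taken from the site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A leaderboard row (hypothetical schema)."""
    model: str
    score: float
    closed: bool  # True when the eval carries the "closed" tag


def visible_entries(entries, show_closed=True):
    """Return rows ranked by score, hiding closed-source rows when asked.

    Assumes higher scores are better.
    """
    kept = [e for e in entries if show_closed or not e.closed]
    return sorted(kept, key=lambda e: e.score, reverse=True)


# Invented placeholder rows, not real results.
entries = [
    Entry("closed-model-1", 70.0, closed=True),
    Entry("open-model-1", 55.0, closed=False),
    Entry("open-model-2", 61.0, closed=False),
]

full_board = visible_entries(entries)                     # toggle on
open_board = visible_entries(entries, show_closed=False)  # toggle off

print([e.model for e in full_board])
print([e.model for e in open_board])
```

Filtering at view time, rather than storing two separate leaderboards, keeps a single list of evals and makes the open-only view a cheap derived result.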

Let me know what you think of this, and whether anything needs to be changed or added!

Kind regards,
Niels

submitted by /u/NielsRogge