Introducing Papers Without Code [P]
Our take
![Introducing Papers Without Code [P]](https://preview.redd.it/9rz2r3ffcf6h1.png?width=140&height=73&auto=webp&s=5b975131682d6d537cdad294296d3173ba6e334a)
The recent relaunch of Papers With Code (PwC) by Hugging Face's open-source team, spearheaded by Niels Rogge, changes how the AI community can track state-of-the-art (SOTA) performance. Until now, following fast-moving research areas such as 3D generation and AI agents meant piecing together results from scattered sources, a fragmented and time-consuming process. PwC centralizes this information by automatically parsing research papers from arXiv and Hugging Face and generating dynamic leaderboards from the extracted results. It arrives amid ongoing community discussions about post-doctoral opportunities in ML [Post-docs in ML] and the broader challenge of keeping up with new developments. The inclusion of evaluations for closed-source models such as GPT-5.5 and Mythos 5 is a pragmatic adaptation: these models now dominate many benchmarks, and a leaderboard that ignored them would misrepresent the state of the art. It also echoes concerns raised in [Anthropic's new model Fable will silently handicap work on LLMs] that restrictions and proprietary control increasingly shape the landscape.
PwC's value lies not just in aggregation but in transparency. Each benchmark comes with a scatter plot and a table, giving researchers and practitioners a quick visual read on which approaches lead and how they compare. The option to hide closed-source evaluations serves those who prioritize open research, leaving a focused view of community-driven progress. Because the platform accepts submissions beyond arXiv, such as blog posts, its coverage is broader still. Calling the closed-source entries "papers without code" is a clever, lighthearted nod to the current paradigm. That approachable tone contrasts with more heated community debates, such as those over reviewer paper distributions in ACL ARR May 2026 [ACL ARR May 2026 Reviewer paper distributions], and suggests a move toward more accessible, practical knowledge sharing.
Beyond tracking SOTA, PwC can make the research ecosystem more collaborative and efficient. A centralized, up-to-date resource helps researchers build on existing work, spot gaps in knowledge, and move faster. Showing evaluations alongside papers, whatever their source, gives a more complete picture of model capabilities and limitations. Automatic parsing and leaderboard generation also spare individual researchers the work of manually curating and updating results, freeing them to focus on their core research. Open access to results of this kind is essential for broadening participation and accelerating progress across the AI field.
Looking ahead, PwC's success will depend on its accuracy, comprehensiveness, and user engagement. Community feedback, which Niels explicitly invites, will be crucial in shaping its development. A key question is whether PwC can incorporate more nuanced evaluation metrics and handle the complexity of comparing models across diverse tasks and datasets. Can the platform represent the trade-offs between models, such as computational cost, data requirements, and ethical implications, to give a truly comprehensive picture of the AI landscape? How PwC evolves will be worth following, as it could significantly influence the direction of AI research and development.
Hi, Niels here from the open-source team at Hugging Face. I've recently relaunched paperswithcode.co as a source for finding the state of the art (SOTA) across various AI domains, from 3D generation to AI agents. It works by automatically parsing research papers published on arXiv/Hugging Face, which enables leaderboards to be created. See BrowseComp as an example: each benchmark has a scatter plot (you can hover over the dots to see the models) and a table.

I've also added support for viewing evals for closed-source models, given that many benchmarks are nowadays dominated by them, like GPT-5.5 and Mythos 5. You can always disable viewing closed-source evals with a toggle or in your PwC settings. When you turn them off, the leaderboard shows only open models.

Closed-source papers are treated as regular "papers", although they can be any source, like a blog post (PwC supports submitting sources beyond arXiv). See the GPT-5.5 or Mythos 5 papers as examples, with their evals at the bottom. Notice the "closed" tag on their evals. Hence, you could jokingly call these "papers without code".

Let me know what you think of this, and whether anything needs to be changed or added!

Kind regards,
Related Articles
- Reviving PapersWithCode (by Hugging Face) [P]

  Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it. I obviously use AI agents to parse papers at scale and automatically generate leaderboards (for now, I'm the one verifying results). So far, I've only parsed high-impact papers that I know are SOTA, like Qwen 3.5 and 3.6, RF-DETR for object detection, DINOv3, SOTA embedding models from the MTEB leaderboard, the Open ASR Leaderboard for automatic speech recognition models, etc. For now, it includes the following:

  - trending papers by default, based on GitHub star velocity
  - categorization by domain, e.g., OCR
  - methods, which PwC used to have, e.g., RLVR
  - eval results for high-impact papers, see e.g., Qwen 3.5 at the bottom
  - leaderboards for each domain, e.g., MMTEB or COCO val 2017
  - support for citation counts (you can also see the most cited papers by domain!)
  - automatically linked GitHub and project page URLs, plus artifacts (+ multiple repos are supported on a paper page)
  - support for external papers beyond arXiv, see e.g., DeepSeek v4
  - Harness reports for coding agent benchmarks, e.g., Terminal Bench
  - "Sign in with HF"
  - Storage Buckets, used to store thumbnails, paper PDFs, and overall data backups

  I'm curious about your feedback + feature requests! Try it at paperswithcode.co. See, e.g., the SOTA leaderboard for Terminal Bench 2.0. A paper page looks like this: https://paperswithcode.co/paper/2602.15763

  submitted by /u/NielsRogge
- PapersWithCode new features - week 1 [P]

  Hi, Niels here from the open-source team at Hugging Face. It's been one week since I launched paperswithcode.co, a revival of the website we all loved. It allows us to keep track of the state of the art (SOTA) across various domains of AI, from agents to computer vision and time-series forecasting. The reception has been great, and I'm excited to extend this over the next few months. This week, I've added the following features:

  - Support for multiple metrics per benchmark: leaderboards now support multiple metrics. See, e.g., the Open ASR Leaderboard for automatic speech recognition, which supports both Word Error Rate (WER) and Inverse Real-Time Factor (RTFx), or the Object Detection leaderboard, which now also reports frames per second (FPS) alongside mean average precision (mAP) on COCO.
  - Support for external papers: papers beyond arXiv can be submitted, such as a GitHub repo, a blog post, bioRxiv, and more. You can submit a paper at paperswithcode.co/submit. AI will automatically enrich it with task and method tags, the GitHub repo, evals, and more. See, e.g., DeepSeek-v4, which is not on arXiv.
  - Support for paper lineage: whenever a paper has a follow-up or predecessor, this is displayed in a small banner above the abstract. See, e.g., Mamba-3, DINOv2, and GLM-4.5.
  - New methods: support for new methods based on popularity, including Gated DeltaNet, Kimi Delta Attention, Mamba-2, and more. Each method also lists all papers that cite it. Find all supported methods here.
  - Support for screenshotting a leaderboard for easy sharing on social media: each benchmark now includes a "copy image" button on both the scatter plot and the table. Try it on ClawEval, for example.
  - Added many more evals: we are adding evals gradually, starting with all models supported in the Transformers library. So far, we have about 3k evals! Find them at the bottom of each paper page, e.g., Qwen 3.6.

  Happy to hear more feature requests and feedback! I will also launch a channel on the Hugging Face Discord server for easier communication. You can also chime in on the GitHub thread here.

  Cheers, Niels

  submitted by /u/NielsRogge