Why ML conference reviews sometimes feel like a “lottery” [D]
Our take
The perception that machine learning conference reviews resemble a “lottery” stems mostly from how hard borderline submissions are to evaluate. Clearly strong papers are usually accepted and clearly weak ones are usually rejected, but many submissions fall into a gray area where they are good without being undeniable. As submission counts grow, reviewers are stretched thin and evaluations become less consistent. In that gray area, clarity, framing, and the subjective preferences of individual reviewers can heavily influence outcomes.
The notion that “ML conferences are a lottery” has sparked considerable debate in the machine learning community, reflecting the complexities of paper submissions and reviews. The sentiment, articulated by a user in a recent discussion, captures a nuanced truth: while strong papers typically secure acceptance and weak ones are filtered out, a significant number of submissions reside in a gray area where quality assessments can feel arbitrary. As noted in the commentary, this perceived randomness often arises from the sheer volume of submissions and the limitations placed on reviewers, leading to a situation that feels more like chance than a meritocratic process. This has broader implications for how research is conducted and valued within the field.
Looking at the submission process, the challenge lies predominantly in the middle tier of quality. Many papers are good, but not groundbreaking or easy to digest for reviewers who are already stretched thin by the growing number of submissions. This resonates with related posts such as “ICML final decisions rant”, whose author reports that ICML accepted roughly 6.5K of about 24K submissions and argues that rejection does not mean a paper is bad. Many researchers are producing valuable work, but the evaluation process may not capture that value because of subjective interpretations and varying standards among reviewers.
Moreover, the dynamics of this “lottery” depend on who is submitting. The original post argues that researchers from strong groups are better at pushing their papers out of the borderline zone through cleaner writing, stronger positioning, and more predictable execution. This advantage raises questions about equity in the research community: those with access to better resources, mentorship, and collaboration may have a smoother path through review. The related post “Are modern ML PhDs becoming too incremental, or is this just what research looks like now?” raises a connected concern about incentives, namely that the field rewards polished, publishable deltas more reliably than deeper understanding.
Ultimately, the challenge of navigating the review process speaks to a larger issue within the machine learning field: the necessity of a more transparent and equitable evaluation system. As we look to the future, it becomes essential for conferences to refine their review processes, perhaps by fostering greater reviewer consistency and providing clearer guidelines on what constitutes a significant contribution. This will not only enhance the integrity of the review process but also empower a wider array of voices in the academic conversation.
As we ponder these dynamics, one question emerges: how can the community balance the need for rigorous evaluation with the imperative to support diverse and innovative research? The answer may lie in reimagining the structures that govern our conferences, ensuring they truly reflect the breadth of talent and ideas that exist within the field. As the landscape of machine learning continues to evolve, it will be crucial to watch how these discussions unfold and what measures are implemented to address the inherent challenges in the submission and review process.
I’ve been trying to make sense of all the “ML conferences are a lottery” takes, and honestly I think it’s both true and not true depending on what you mean.
If a paper is clearly strong, like genuinely solid contribution, well executed, easy to understand, it usually gets in. And if it’s clearly weak, it usually gets filtered out. The weirdness people complain about mostly lives in the huge middle where papers are good but not undeniable.
That’s also where scale starts to matter. There are just so many submissions now that reviewers are stretched thin, matching isn’t perfect, and everyone has slightly different standards or taste. Add tight timelines and limited back-and-forth, and small things start to matter a lot. Whether a reviewer really “gets” your contribution, how clearly you framed it, or even just how it lands with that particular set of reviewers can swing the outcome.
I think that’s why it feels random. Not because the whole system is broken, but because a big chunk of papers are sitting right near the decision boundary, and decisions there are naturally high-variance.
It’s not that people from strong research groups never experience this. It’s more that they’re better at pushing their papers out of that borderline zone. Cleaner writing, stronger positioning, more predictable execution. So a larger fraction of their work is clearly above the bar.
So my current take is: it’s not a lottery overall, but it absolutely behaves like one near the cutoff, and that’s where most of the frustration comes from.
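To make the “high-variance near the cutoff” point concrete, here is a toy simulation. It is only a sketch under made-up assumptions, not a model of any real conference: each paper has a hidden “true quality” measured relative to the acceptance cutoff, each of three reviewers reports that quality plus independent Gaussian noise, and the paper is accepted when the mean score clears the cutoff. The function names and every parameter value (`n_reviewers`, `noise_sd`, `trials`) are hypothetical choices for illustration.

```python
import random


def review_once(true_quality, n_reviewers, noise_sd, rng):
    """Return the mean of noisy reviewer scores for one paper (toy model)."""
    scores = [true_quality + rng.gauss(0.0, noise_sd) for _ in range(n_reviewers)]
    return sum(scores) / n_reviewers


def acceptance_probability(true_quality, cutoff=0.0, n_reviewers=3,
                           noise_sd=1.0, trials=20000, seed=0):
    """Estimate how often a paper of fixed quality is accepted.

    The same paper is "re-reviewed" many times with fresh reviewer noise.
    All defaults are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    accepted = 0
    for _ in range(trials):
        if review_once(true_quality, n_reviewers, noise_sd, rng) >= cutoff:
            accepted += 1
    return accepted / trials


if __name__ == "__main__":
    # Quality is expressed relative to the cutoff: 0.0 means "exactly borderline".
    for quality in (-2.0, -0.5, 0.0, 0.5, 2.0):
        p = acceptance_probability(quality)
        print(f"quality {quality:+.1f} vs cutoff: accepted in {p:.1%} of simulated rounds")
```

Under these assumptions a paper sitting exactly at the cutoff gets in about half the time, while papers far above or far below it are almost never misjudged. Shrinking `noise_sd` or adding reviewers narrows the coin-flip zone but does not remove it.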
Related Articles
- ICML final decisions rant [D] So, ICML accepted ~6.5K of ~24K; obviously, it doesn't mean that all the rejected papers are "bad," and these rejected papers would cascade to NeurIPS, blowing up NeurIPS' total submission count, and this cycle of massive-influx-small-acceptance would repeat on an endless loop. The reviews themselves can be frustratingly inadequate: - "Only 200 benchmarks included; not included didn't-do-this-benchmark" (exaggerated for dramatic effect, sadly not unrealistic) or - "I don't think this paper, that works, is 'novel'" [out of gut feeling?] or - ACs reiterating the exact same points in the initial reviews without reading the rebuttal discussions. (Or at least, it'd seem that way) On top of all this, (from Reddit threads,) it appears that reviewers raising their score need to perform additional tasks of justifying why they're raising their scores -- which seems like a negative reinforcement signal. Also, it's crazy how people can think of an idea, run all experiments, write a coherent acceptance-ready paper, all over the weekend!!! -- isn't the whole point of research to sit and simmer with the problem? Not sure what the future of conference publishing/reviewing is... it just feels unproductive. Anyway, just wanted to rant before looping into NeurIPS deadline, for yet another possible rejection. Isn't the whole point of publishing to understand long-standing problems? -- rejection nowadays means nothing. [Neither does acceptance?] Have a good weekend, y'all. submitted by /u/CategoryNormal149
- Are modern ML PhDs becoming too incremental, or is this just what research looks like now? [D] I’ve been thinking about the current state of machine learning PhDs, including my own work, and I’d like to hear how others see it. My impression is that a large fraction of modern ML PhD work follows a fairly predictable pattern: take an existing idea, connect it to another existing idea, apply it in a slightly different setting or community, tune the system carefully, add some benchmark results, and present the method as a new state-of-the-art approach. Another common pattern is mostly empirical: run benchmarks, report observations, provide some analysis, and frame that as the main contribution. To be clear, I’m not saying this work is useless. Incremental progress matters, and not every PhD needs to invent a new paradigm. But sometimes it feels like many ML PhDs are closer to extended master’s theses: more experiments, more compute, more polished writing, and more benchmarks, but not necessarily a deeper scientific contribution. What bothers me is that the same pattern appears even in top-tier conference papers. A paper may look strong because it has a clean story, a benchmark win, and good presentation, but after removing the “SOTA” claim, it is not always clear what lasting knowledge remains. Did we learn something general? Did we understand a mechanism better? Did we identify a failure mode? Did we create a reusable method or evaluation protocol? Or did we mostly produce another temporary leaderboard improvement? I’m also reflecting this back onto my own PhD. I see some of the same patterns in my work, so this is not meant as an attack on others. It is more of a concern about the incentives of the field. ML seems to reward publishable deltas: small method variations, new combinations, benchmark improvements, and convincing empirical stories. But I’m less sure whether it consistently rewards deeper understanding.
So my question is: Have ML PhDs become lower-quality compared to PhDs in other fields, or is this simply the normal shape of cumulative research in a fast-moving empirical field? And maybe more importantly: What separates a genuinely strong incremental ML PhD from one that is basically a collection of polished benchmark papers? submitted by /u/Hope999991