2 min read · from Machine Learning

Find the best open-source OCR models in one place at Papers with Code [P]

Our take

Navigating the rapidly expanding landscape of open-source Optical Character Recognition (OCR) models can be challenging. Papers with Code now offers a centralized resource consolidating key OCR benchmarks and top-performing models, including recent releases from Baidu (Unlimited OCR) and Mistral (OCR 4). Discover leading benchmarks like OlmOCRBench and OmniDocBench, alongside recommendations for Chandra OCR 2 and Mistral OCR v4, both critical tools for digitizing documents and enabling agentic use cases like retrieval-augmented generation. Explore the full overview at https://paperswithcode.co/tasks/ocr.

The recent surge in open-source Optical Character Recognition (OCR) models, cataloged by Niels Rogge on Papers with Code, highlights a shift in how organizations approach data ingestion and agentic workflows. Because large language models are billed per token, cost optimization matters, and so does the ability to efficiently and accurately digitize previously inaccessible data such as scanned documents, PDFs, and legacy archives. "Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost" illustrates the ongoing search for efficiency in this space, and OCR plays a crucial role in unlocking that potential. Transforming these formats into standardized, machine-readable Markdown, as the original post notes, gives AI agents the knowledge they need for tasks like retrieval-augmented generation (RAG) and customer support. The revival of Papers with Code itself is a welcome development, providing a centralized resource for navigating this rapidly evolving landscape.
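To make the Markdown-to-RAG step concrete, here is a minimal sketch that assumes an OCR model has already returned a Markdown string. The function name `chunk_markdown`, the `max_chars` parameter, and the dictionary layout are hypothetical choices for illustration; they are not part of any OCR model's API or of the original post.

```python
import re
from typing import Dict, List

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str, max_chars: int = 500) -> List[Dict[str, str]]:
    """Split Markdown into heading-scoped chunks suitable for indexing."""
    chunks: List[Dict[str, str]] = []
    heading = ""              # most recent heading, stored as chunk metadata
    buffer: List[str] = []    # lines collected for the chunk in progress

    def flush() -> None:
        # Emit the buffered lines as one chunk, skipping empty sections.
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({"heading": heading, "text": text})
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()           # a new heading closes the previous section
            heading = match.group(2).strip()
            continue
        # Start a new chunk when adding this line would exceed max_chars.
        if buffer and sum(len(l) + 1 for l in buffer) + len(line) > max_chars:
            flush()
        buffer.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Invoice\nTotal: 42 EUR\n\n## Terms\nNet 30 days"
    for chunk in chunk_markdown(sample):
        print(chunk)
```

Keeping the heading alongside each chunk is one simple way to preserve document structure for retrieval; a real pipeline would typically pass the chunks on to an embedding model and a vector index.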

The proliferation of OCR models on Hugging Face, while offering a wealth of options, also presents a challenge: discerning which model best suits a given application. Rogge’s curated list, showcasing benchmarks like OlmOCRBench and OmniDocBench, alongside top-performing models like Chandra OCR 2 and Mistral OCR v4, addresses this directly. The availability of Chandra OCR 2 for both self-hosting and serverless API access is particularly noteworthy, offering flexibility for organizations with varying infrastructure capabilities. This echoes the spirit of projects like Kuma: compiling PyTorch models into self-contained WebGPU executables [P], which prioritizes accessibility and deployment ease. The inclusion of Baidu’s Unlimited OCR, with its innovative Reference Sliding Window Attention (R-SWA), signals the continued refinement of core OCR techniques and a focus on improving accuracy and efficiency, particularly with complex document layouts.
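The post does not describe how R-SWA works, so the following sketch shows only plain causal sliding-window attention masking, the general idea the name suggests it builds on. The "reference" part is not modeled here, and the function name `sliding_window_mask` and its parameters are hypothetical.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Return mask[i][j], True when query position i may attend to key j.

    Each position sees itself and at most the previous (window - 1) positions,
    so every row has at most `window` True entries instead of up to seq_len.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a fixed window, the number of attended pairs grows roughly as `seq_len * window` rather than `seq_len ** 2`, which is why windowed attention is attractive for long inputs such as multi-page documents.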

The significance of this development extends beyond simply improving chatbot performance. Accurate OCR forms the foundation for a wide range of data-driven applications, from automating invoice processing and contract analysis to extracting insights from historical records. The open-source nature of these models democratizes access to this technology, allowing smaller organizations and researchers to leverage the power of AI without the prohibitive costs associated with proprietary solutions. It also fosters a collaborative environment where developers can build upon existing models, leading to further innovation and improvements. The increasing focus on benchmarks—and the community’s desire to see more—points to a maturing field where rigorous evaluation and comparison are becoming standard practice.
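For readers who want to compare candidate models on their own documents, a generic character error rate (CER) check is a common starting point. The sketch below is an assumption-laden illustration: it is not the scoring method of OlmOCRBench or OmniDocBench, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length; lower is better."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "Total: 100"
    ocr_output = "Total: 1O0"   # the model confused the digit 0 with the letter O
    print(character_error_rate(truth, ocr_output))
```

CER only measures raw text fidelity. It ignores layout, tables, and reading order, which is one reason dedicated document benchmarks exist.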

Looking ahead, it will be fascinating to observe how these open-source OCR models evolve to handle increasingly complex and diverse document types, including those with low-resolution images, handwritten text, and multiple languages. The interplay between model architecture, training data, and benchmark design will continue to shape the future of OCR. Will we see a convergence around a few dominant models, or will the open-source ecosystem continue to thrive with a diverse range of specialized solutions? And, crucially, how will the integration of OCR with other AI technologies, such as natural language processing and computer vision, unlock even more transformative capabilities?

Hi, I've created an overview of the most important OCR benchmarks, along with the top open models and links to their papers and code: https://paperswithcode.co/tasks/ocr.

This week, new OCR models were released by Baidu and Mistral.

Baidu released Unlimited OCR, a 3B-parameter model that introduces a key innovation called Reference Sliding Window Attention (R-SWA) and builds on top of DeepSeek OCR. Mistral released OCR 4, which is available via an API.

OCR, or Optical Character Recognition, is the task of digitizing PDFs or scanned documents. There is, of course, huge interest in this task, as it enables ingestion of all company data for agentic use cases. AI agents love Markdown, so it can be valuable to turn all those messy PDF documents into a standardized, machine-readable format. This enables use cases like agentic RAG (retrieval-augmented generation), which powers chatbots, both internally and for external customer support.

With a large number of OCR releases on Hugging Face over the last few months, it may be hard to know which one to use.

Hence, I've built this page, which lists the major OCR benchmarks, along with the top-performing models and links to their code. This is obviously made available on Papers with Code, the website I'm maintaining (it's a revival of the old website, which was taken down).

The top recommended benchmarks are OlmOCRBench, created by Ai2, and OmniDocBench, created by Shanghai AI Laboratory.

Current top recommendations are Chandra OCR 2 by Datalab and Mistral OCR v4. The former is openly available, hence you can either self-host it or use their serverless API.

Let me know which other tasks you'd like to see major benchmarks for!

Cheers,

Niels

open-source @ HF

submitted by /u/NielsRogge
