NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

Our take

We're excited to announce the release of NuExtract3, an open-weight 4B model designed for efficient information extraction from complex documents. Based on Qwen3.5-4B and licensed under Apache-2.0, this model simplifies tasks like converting document images to Markdown and extracting structured data from tables and forms. With easy self-hosting options and robust documentation, you can start transforming your data workflows today. Try it for free on our Hugging Face space, and explore related insights in our blog post on AI-native engineering.

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

The recent release of NuExtract3 marks a significant step forward in the realm of information extraction. Developed by Numind and based on the Qwen3.5-4B architecture, this open-weight model aims to simplify the process of extracting structured data from complex documents—ranging from PDFs and invoices to screenshots and multi-page forms. As the landscape of data management evolves, tools like NuExtract3 will play a crucial role in helping users navigate the complexities of document processing, particularly in environments where traditional tools may fall short. This innovation resonates with ongoing discussions in our field, such as the challenges posed by ambiguous user intent in workflows, as discussed in [One thing that's been bothering me lately: benchmark performance often tells me almost nothing about whether a workflow will survive production usage.[D]](/post/one-thing-that-s-been-bothering-me-lately-benchmark-performa-cmpgve7tt0b3vs0glwe5ptmcd).

What sets NuExtract3 apart is its accessibility and versatility. The model is designed to handle both text and visual inputs seamlessly, making it a robust alternative for organizations that rely on diverse document types. By enabling the conversion of document images to Markdown and extracting structured data through a target JSON template, NuExtract3 empowers users to streamline their workflows without the need for extensive technical expertise. This aligns with the broader movement towards democratizing AI technologies, allowing businesses of all sizes to leverage advanced capabilities without the prohibitive costs often associated with proprietary solutions. It also reminds us of the importance of user-centered design in AI development, as outlined in Presentation: AI Native Engineering.

The implications of NuExtract3 extend beyond its immediate functionality; they also signal a shift in how we think about data extraction and management. With the model being open-weight and self-hostable, it invites experimentation and customization, catering to the needs of diverse users, from innovators in startups to established enterprises. The extensive documentation and the provision of multiple quantization options ensure that a wide range of hardware can support its deployment. This could lead to a paradigm shift where organizations prioritize solutions that are adaptable and tailored to their specific processes, rather than relying on one-size-fits-all models.

As we look ahead, the introduction of models like NuExtract3 prompts us to consider the future of document processing in a rapidly evolving technological landscape. The ability to extract valuable insights from visually structured data will be pivotal as organizations continue to generate and manage vast quantities of information. Moreover, the call for feedback and community engagement from the Numind team highlights the collaborative spirit necessary for innovation in this space. It raises an important question: how will the community respond to such open models, and what new applications will emerge as a result?

In conclusion, NuExtract3's release is more than just a technical advancement; it represents a shifting tide in how we approach data extraction and management. By making complex document processing more accessible and efficient, it invites users to explore new possibilities for productivity and innovation. As we observe the developments that follow, it will be fascinating to see how the model is adopted and adapted across various industries, shaping the future of data management in meaningful ways.

Disclaimer: I work for Numind, the company behind this open-weight model

We just released a 4B model based on Qwen3.5-4B, under Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open model: PDFs, screenshots, forms, tables, receipts, invoices, multi-page documents, and other visually structured inputs.

Try it, we have a huggingface space that is completely free (you don't even have to sign-up): https://huggingface.co/spaces/numind/NuExtract3

If you ever used NuMarkdown, NuExtract3 is the successor.

There are some examples to guide you. Feel free to re-use this model for any task.

https://preview.redd.it/pm2xbooyxn2h1.png?width=1672&format=png&auto=webp&s=1a8a7b262190c8325159496dae98c3d2dfab493c

https://preview.redd.it/b5z7ylfzxn2h1.png?width=1758&format=png&auto=webp&s=a07b3abd6e5065c2635de047bdf154357f903e4c

A few things it is designed for:

converting document images to Markdown
extracting structured data from documents using a target json template
handling tables, forms, and layout-heavy pages
working with both text and visual document inputs
serving as a local/open-weight alternative for document extraction pipelines

It was trained on a node of 8xH100 for 3 days to train on as much context as we could, so it should perform fairly well even on long document. For Markdown, we'd still recommend going page by page for the best results and inference speed, since you can parallelize better this way.

It's very easy to self-host, since we provide fairly extensive documentation, Safetensors, GGUF and MLX weights. With as little as 4GB of VRAM, you should be good to go. We provide multiple quantizations (GPTQ, W8A8, FP8, Q4, Q6...) so you should be able to run it anywhere.

We mostly tried vLLM, SGLang, llama.cpp.

We have a blog post and a pretty decent model card:

I'm currently writing a paper on this model so I'll post it as soon as it's accepted. It's not yet on Arxiv yet as it has been submitted in a peer-review journal/conference.

I'll try to answer as many questions as possible if you have any. We would really appreciate feedback from the community.

We also have a discord if you're interested
https://discord.com/invite/3tsEtJNCDe

submitted by /u/Gailenstorm
[link] [comments]

Read on the original site

Open the publisher's page for the full experience

View original article →