NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]
Our take
The release of NuExtract3 is a notable addition to open tooling for information extraction. Developed by Numind and based on the Qwen3.5-4B architecture, this open-weight model aims to simplify extracting structured data from complex documents, from PDFs and invoices to screenshots and multi-page forms. Tools like NuExtract3 are most useful where traditional document-processing tools fall short, for example on visually structured inputs. This ties into ongoing discussions in our field, such as the gap between benchmark performance and production reliability, as discussed in [One thing that's been bothering me lately: benchmark performance often tells me almost nothing about whether a workflow will survive production usage.[D]](/post/one-thing-that-s-been-bothering-me-lately-benchmark-performa-cmpgve7tt0b3vs0glwe5ptmcd).
What sets NuExtract3 apart is its accessibility and versatility. The model is designed to handle both text and visual inputs seamlessly, making it a robust alternative for organizations that rely on diverse document types. By enabling the conversion of document images to Markdown and extracting structured data through a target JSON template, NuExtract3 empowers users to streamline their workflows without the need for extensive technical expertise. This aligns with the broader movement towards democratizing AI technologies, allowing businesses of all sizes to leverage advanced capabilities without the prohibitive costs often associated with proprietary solutions. It also reminds us of the importance of user-centered design in AI development, as outlined in Presentation: AI Native Engineering.
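
To make the idea of a target JSON template concrete, here is a minimal sketch of how a template can be used to check the shape of a model's output. The template keys, the type labels (`"string"`, `"number"`), and the helper `matches_template` are all hypothetical and chosen for illustration; the actual template syntax NuExtract3 expects should be taken from its model card.

```python
import json

# Hypothetical target template: keys name the fields to extract.
# The type labels are illustrative placeholders, not NuExtract3's documented syntax.
TEMPLATE = {
    "invoice_number": "string",
    "total": "number",
    "line_items": [{"description": "string", "amount": "number"}],
}


def matches_template(value, template):
    """Return True if `value` has the same nested shape as `template`.

    None is accepted for any leaf, since a field may be absent from the document.
    """
    if isinstance(template, dict):
        return (
            isinstance(value, dict)
            and set(value) == set(template)
            and all(matches_template(value[k], template[k]) for k in template)
        )
    if isinstance(template, list):
        # A one-element list in the template describes every item of the output list.
        return isinstance(value, list) and all(
            matches_template(item, template[0]) for item in value
        )
    if value is None:
        return True
    if template == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, str)


# Hand-written example output for illustration (not real model output).
raw_output = (
    '{"invoice_number": "INV-001", "total": 42.5, '
    '"line_items": [{"description": "Widget", "amount": 42.5}]}'
)
parsed = json.loads(raw_output)
print(matches_template(parsed, TEMPLATE))
```

A check like this is useful after generation because it catches missing or extra keys before the extracted data reaches downstream systems.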
The implications of NuExtract3 extend beyond its immediate functionality; they also signal a shift in how we think about data extraction and management. With the model being open-weight and self-hostable, it invites experimentation and customization, catering to the needs of diverse users, from innovators in startups to established enterprises. The extensive documentation and the provision of multiple quantization options ensure that a wide range of hardware can support its deployment. This could lead to a paradigm shift where organizations prioritize solutions that are adaptable and tailored to their specific processes, rather than relying on one-size-fits-all models.
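
As a rough illustration of why quantization widens the range of usable hardware, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a nominal 4 billion parameters (the exact count is not stated in the post) and ignores the KV cache, activations, and the per-block overhead that real quantization formats add, so actual usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 4e9  # nominal "4B"; the exact parameter count is an assumption

for name, bits in [("16-bit", 16), ("8-bit (e.g. FP8, W8A8)", 8), ("4-bit (e.g. Q4)", 4)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions, 4-bit weights take roughly a quarter of the memory of 16-bit weights, which is consistent with the post's claim that about 4GB of VRAM can be enough.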
As we look ahead, the introduction of models like NuExtract3 prompts us to consider the future of document processing in a rapidly evolving technological landscape. The ability to extract valuable insights from visually structured data will be pivotal as organizations continue to generate and manage vast quantities of information. Moreover, the call for feedback and community engagement from the Numind team highlights the collaborative spirit necessary for innovation in this space. It raises an important question: how will the community respond to such open models, and what new applications will emerge as a result?
In conclusion, NuExtract3's release is more than just a technical advancement; it represents a shifting tide in how we approach data extraction and management. By making complex document processing more accessible and efficient, it invites users to explore new possibilities for productivity and innovation. As we observe the developments that follow, it will be fascinating to see how the model is adopted and adapted across various industries, shaping the future of data management in meaningful ways.
Disclaimer: I work for Numind, the company behind this open-weight model.
We just released a 4B model based on Qwen3.5-4B, under the Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open model: PDFs, screenshots, forms, tables, receipts, invoices, multi-page documents, and other visually structured inputs.
Try it: we have a Hugging Face Space that is completely free (you don't even have to sign up): https://huggingface.co/spaces/numind/NuExtract3
If you ever used NuMarkdown, NuExtract3 is its successor. There are some examples to guide you. Feel free to reuse this model for any task. A few things it is designed for:
It was trained on a node of 8xH100 for 3 days so that we could train on as much context as possible, so it should perform fairly well even on long documents. For Markdown, we'd still recommend going page by page for the best results and inference speed, since you can parallelize better this way.
It's very easy to self-host, since we provide fairly extensive documentation, plus Safetensors, GGUF, and MLX weights. With as little as 4GB of VRAM, you should be good to go. We provide multiple quantizations (GPTQ, W8A8, FP8, Q4, Q6...) so you should be able to run it anywhere. We mostly tested with vLLM, SGLang, and llama.cpp.
We also have a blog post and a pretty decent model card.
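
The page-by-page recommendation for Markdown conversion can be sketched as follows. This is a minimal illustration, not Numind's implementation: `convert_document` and `fake_convert_page` are hypothetical names, and the stand-in converter would in practice be replaced by a request to whichever self-hosted server is in use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def convert_document(
    pages: Sequence[bytes],
    convert_page: Callable[[bytes], str],
    max_workers: int = 8,
) -> str:
    """Convert each page independently and join the results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even when calls finish out of order.
        markdown_pages = list(pool.map(convert_page, pages))
    return "\n\n".join(markdown_pages)


def fake_convert_page(page: bytes) -> str:
    """Stand-in for a real request to a self-hosted model server (hypothetical)."""
    return f"# Page with {len(page)} bytes"


print(convert_document([b"aa", b"bbb"], fake_convert_page))
```

Threads are a reasonable fit here because each call mostly waits on the server. With a serving engine that batches concurrent requests, sending pages in parallel lets the server process them together instead of one after another.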
I'm currently writing a paper on this model, so I'll post it as soon as it's accepted. It's not on arXiv yet, as it has been submitted to a peer-reviewed journal/conference. I'll try to answer as many questions as possible if you have any. We would really appreciate feedback from the community. We also have a Discord if you're interested.