Hiding messages in the least significant mantissa bits of fine-tuned ONNX model weights [P]
Our take
![Hiding messages in the least significant mantissa bits of fine-tuned ONNX model weights [P]](https://external-preview.redd.it/xL20TWLoDXtutsGuMHS1qdEyNEn6zkliHGGNaYV1H4A.png?width=640&crop=smart&auto=webp&s=2b035893689551e28412391b858ab5c0323b052d)
The recent project "ONNXStego," detailed by u/Admin-ABC-XYZ on Reddit, presents a fascinating, albeit niche, exploration of steganography within machine learning models, specifically leveraging the ONNX format. The core concept – embedding data within the least significant bits of fine-tuned model weights – is elegantly practical. The author’s journey, documented with commendable transparency, reveals a process of iterative refinement, moving from initially simplistic approaches to a solution that cleverly exploits the natural modifications inherent in fine-tuning. This contrasts sharply with earlier, more detectable methods, such as directly writing into random weights or relying on deterministic coordinate maps, as described in the post. It's encouraging to see this kind of methodical exploration, particularly given the broader discussions around data security and model integrity happening within the AI community, as exemplified by the ongoing efforts to evaluate long-term memory limits in LLM chatbots [Evaluating long-term memory limits in stateless LLM chatbots — feedback needed]. The project’s focus on ONNX, a widely adopted format for model deployment, further enhances its practical value.
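To make the detectability point concrete, here is a minimal sketch of delta analysis against a reference model. It is not taken from the ONNXStego repository: the "model" is just a Python list of float32-rounded values standing in for a weight tensor, and the helper names (`f32_bits`, `bits_f32`, `delta_positions`) are hypothetical. It assumes the observer has access to the unmodified reference weights.

```python
import random
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(u: int) -> float:
    """Return the float whose binary32 bit pattern is u."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def delta_positions(reference, suspect):
    """Indices where the 32-bit patterns of the two weight lists differ."""
    return [i for i, (r, s) in enumerate(zip(reference, suspect))
            if f32_bits(r) != f32_bits(s)]

rng = random.Random(0)
# Stand-in for a published reference model: 1000 weights rounded to float32.
reference = [bits_f32(f32_bits(rng.uniform(-1, 1))) for _ in range(1000)]

# Naive approach: write directly into 16 randomly chosen weights
# (here, by flipping the lowest mantissa bit of each).
suspect = list(reference)
targets = sorted(rng.sample(range(len(reference)), 16))
for i in targets:
    suspect[i] = bits_f32(f32_bits(suspect[i]) ^ 1)

# Delta analysis: every difference from the reference is a payload position.
found = delta_positions(reference, suspect)
print(found == targets)
```

Because nothing else in the model changed, the set of differing positions is exactly the set of written positions, which is why a direct write into an otherwise untouched model gives itself away.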
What makes "ONNXStego" particularly noteworthy is its awareness of existing research and the gaps it aims to fill. The author acknowledges that similar concepts have been explored academically, but highlights a lack of readily available, well-documented implementations, especially ones that target ONNX models. This aligns with a broader trend in the field, where academic research doesn’t always translate into practical, accessible tools. The project’s genesis, a need that arose within a larger, undisclosed project, is a relatable scenario for many researchers and developers. It's reminiscent of the challenges faced in building specialized pipelines for tasks like translation and voice processing for low-resource languages [NagaTranslate: Building a translation and voice pipeline for low-resource Nagaland creoles (Whisper, VITS, LLMs)]. The author’s candor about their own evolving understanding of cryptography and steganography is also valuable, fostering a sense of collaborative learning within the machine learning community. This self-reflective approach is crucial for driving innovation, especially in areas where expertise is still developing.
The technical ingenuity of hiding data within modifications made during fine-tuning—essentially using the training process itself as a camouflage—is compelling. The author rightly points out that this approach avoids the suspicion that might arise from simply injecting foreign data into a model. While the project is currently considered “closed,” the documentation and security considerations provided within the repository are substantial contributions. The author’s willingness to share the project and solicit feedback underscores a commitment to open science and collaborative improvement. The very nature of this endeavor – subtly concealing information within the complex mathematical structures of neural networks – speaks to a deeper need for secure and resilient AI systems, a concern that is increasingly relevant given the growing reliance on AI in sensitive applications. It’s important to note that while the author’s initial attempts at a coordinate-based system were ultimately abandoned due to their detectability, the rigorous analysis of those failed approaches provides valuable insights for future investigations.
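The idea of hiding bits only where fine-tuning already changed the weights can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: weights are plain Python floats treated as float32, sender and receiver both hold the base model, and all function names are hypothetical. One design choice here is my own: a weight counts as "changed" only if its upper 31 bits differ from the base, so that rewriting the lowest mantissa bit can never alter which positions are selected. A real tool would presumably also encrypt the payload and handle ONNX tensors.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(u: int) -> float:
    """Return the float whose binary32 bit pattern is u."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def carrier_indices(base, weights):
    """Positions whose upper 31 bits differ from the base model.

    Ignoring the lowest bit keeps the selection identical before and
    after embedding, so the receiver derives the same positions.
    """
    return [i for i, (b, w) in enumerate(zip(base, weights))
            if (f32_bits(b) >> 1) != (f32_bits(w) >> 1)]

def embed(base, tuned, bits):
    """Write one payload bit into the lowest mantissa bit of each carrier."""
    idx = carrier_indices(base, tuned)
    if len(bits) > len(idx):
        raise ValueError("payload exceeds the number of changed weights")
    stego = list(tuned)
    for i, bit in zip(idx, bits):
        stego[i] = bits_f32((f32_bits(tuned[i]) & 0xFFFFFFFE) | bit)
    return stego

def extract(base, stego, n_bits):
    """Read the lowest mantissa bit from the first n_bits carriers."""
    idx = carrier_indices(base, stego)
    return [f32_bits(stego[i]) & 1 for i in idx[:n_bits]]

# Toy data: positions 0 and 3 are untouched by "fine-tuning".
base = [0.5, -0.25, 0.125, 1.0, -2.0, 0.75]
tuned = [0.5, -0.2501, 0.1251, 1.0, -2.0003, 0.7502]
payload = [1, 0, 1, 1]

stego = embed(base, tuned, payload)
print(carrier_indices(base, tuned))
print(extract(base, stego, len(payload)) == payload)
```

Changing the lowest mantissa bit of a normal float32 moves the value by one unit in the last place, a relative change of at most about 2^-23, and a comparison against the base model shows differences only where fine-tuning already produced them.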
Looking ahead, the potential implications of this work extend beyond simple steganography. Could this type of technique be adapted for model watermarking, allowing for the verification of model provenance and combating malicious modifications? Furthermore, the techniques employed here could inspire new methods for adversarial attacks, where attackers subtly alter model weights to induce specific behaviors. The exploration of symbolic math and reasoning within LLMs [MathFormer: Testing whether symbolic math is pattern matching or reasoning] demonstrates a similar drive to understand the underlying mechanics of AI models, and "ONNXStego" contributes to this broader effort. The key question now is whether we'll see further development of this approach, potentially incorporating more sophisticated cryptographic techniques to enhance security and expand the capacity for hidden data.
| Hey everyone, I'd like to share my project along with a short explanation of the process and why it came about in the first place. To start off, I'm not exactly the best at cryptography/steganography; in my case it's always been something that sat in the background, as one of the sub-fields needed for another (main) field I'm actually interested in. For this project I tried to look up as much information as possible about what's currently considered best practice (I mainly relied on NIST for this), what implications exist, and what potential "attacks" exist against this way of hiding information. I honestly can't say whether I covered everything, which is why I wanted to share this project here, mainly for the sake of learning. I'd be grateful for any feedback on what I could have done better / what I might have missed, etc. Right now, I consider this project closed and will most likely not update it further, although I'd like to apply all the feedback to my own knowledge going forward. For over a month I did a lot of research into using ML models as a carrier for hiding data. I needed this as one of the stages for my main project. That's how I ended up on the topic of hiding information in model weights. Initially I assumed a simple method of directly writing data into randomly selected weights. I quickly concluded, though, that this would be absurdly trivial to detect, and potentially also to read. Next came the idea of using something like a deterministic coordinate map describing where to read the data from (location-id + position-id). The program wouldn't modify all the bits needed to write the message; instead, it would write separate bits that point to specific locations in the model, from which the already-existing 0s and 1s would be read. In practice, only parties A and B would know how to derive these positions.
This way, someone unaware of the algorithm would only see what looks like noise of varying values. However, after a theoretical analysis of a practical implementation, this idea turned out to have serious flaws. Even setting aside the fact that the main goal was steganography and not encryption, the mere presence of additional data could be relatively easily detected, for instance through delta analysis against a reference model, or through analysis of the statistical properties of the weights. On top of that, this method would really only allow transmitting a very small amount of data, because just indicating, say, the word "example" would look like this: "01100101011110000110000101101101011100000110110001100101", so it would be extremely impractical. In other words, even if the hidden message itself couldn't be read, one could still suspect that the model contains hidden information, which would defeat the whole point of steganography. While I found the previous option conceptually pretty interesting, I moved on, which led me to the question: "How do I hide data in the weights in a way that won't be visible?" That led me to the next idea: since every fine-tuning process naturally changes some of a model's weights anyway, why not hide information only in the weights that get modified during training regardless? In that case, the fine-tuning itself would provide a natural and logical explanation for the presence of those changes, including when compared against a reference model. It was only later that I found out that similar/identical concepts had already been described in the scientific literature, although they remain a fairly niche research direction. Skipping over the implementation details (since everything is described in the README and SECURITY files, and I don't want to dump an even bigger wall of text here), this is how the first implementation of the solution (part of my main project) came about.
After further research I noticed that most existing publications focus on the academic side, while the available GitHub repositories were often poorly documented, limited in functionality, good steganographically but weak cryptographically, or were just a small piece of larger projects. Personally, I couldn't find any project implementing a similar idea specifically using models saved in the ONNX format. So I decided to split this part off and refine it as a separate proof of concept, and that's how ONNXStego came about. If anyone's interested in the security, limitations, or implementation details, feel free to check out the repository. I personally learned a great deal from this project and tried to describe the final conclusions/information I gathered while learning as precisely as possible, so I'm hoping the project can also be useful to others for their own purposes or projects. (If this counts as self-promotion, I apologize in advance, and I can remove this post for that reason if needed. I tried to describe the whole process behind it as accurately as I could, to make the post as educationally useful as possible.) |
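The bit string the post quotes for the word "example" can be reproduced with a few lines of Python. This is only a sketch of the text-to-bits step, assuming plain 8-bit UTF-8 encoding with no compression, framing, or encryption; the function names are hypothetical and not taken from ONNXStego.

```python
def text_to_bits(text: str) -> str:
    """Encode text as UTF-8 and return its bits as a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits: group bits into bytes and decode as UTF-8."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

bits = text_to_bits("example")
print(bits)       # 01100101011110000110000101101101011100000110110001100101
print(len(bits))  # 56
print(bits_to_text(bits))  # example
```

Seven ASCII characters already cost 56 bits, which gives a sense of why a scheme that has to point at one existing bit per payload bit scales poorly.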