DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

Our take

A DCGAN inference engine running on the CH32H417, a dual-core RISC-V microcontroller. With 12.6M parameters and only 512KB of SRAM, the project generates 64x64 cat faces in 26 seconds, a figure limited by SD card access speed rather than computation. The model uses per-channel int8 quantization and streams layer weights from an SD card while keeping intermediate activations in DTCM.

Running a Deep Convolutional Generative Adversarial Network (DCGAN) on the CH32H417, a dual-core RISC-V microcontroller, is an intriguing intersection of machine learning and low-cost hardware. The model has 12.6 million parameters and the inference engine is written in pure C, which stands out in a landscape where Arm's CMSIS-NN ecosystem has been the dominant option for low-power embedded inference. The approach shows that generative models can run in unconventional environments, and it highlights how little tooling exists for RISC-V parts in this class. It also connects to the ongoing discussion about AI inference platforms, as in [Is AI inference platform really that saturated now? [D]](/post/is-ai-inference-platform-really-that-saturated-now-d-cmplilja20ixhs0glxfkpgl80).

Seeding the latent vector from quantum random data is a creative choice rather than a technical requirement. The generated cat faces are also classified by a single quantum bit, so the project pairs machine learning with a quantum source of randomness (not quantum computing) in a playful piece of generative art. The author notes that this constraint is not strictly necessary for image quality; it serves the art installation side of the project.

Moreover, the constraints of the hardware—such as the 512KB SRAM and reliance on SD cards for layer weights—reflect a growing movement toward efficient computing. As organizations face increasing demands for real-time data processing, understanding how to optimize inference on constrained devices becomes crucial. This aligns with the broader conversation about the future of AI in everyday applications, where efficiency and accessibility are paramount. The project serves as a reminder that while high-end hardware can facilitate advanced AI capabilities, significant breakthroughs can also occur in environments with limited resources. This perspective is echoed in the analysis of potential pitfalls in AI predictions, as seen in [The famous METR AI time horizons graph contains numerous severe errors [D]](/post/the-famous-metr-ai-time-horizons-graph-contains-numerous-sev-cmplvdna30jfrs0glxtzxwmva).

Looking ahead, the implications of running GANs on microcontrollers are significant. As RISC-V continues to gain traction, we may witness a wave of innovation that democratizes access to advanced AI technologies. This development could enable smaller companies and individual developers to leverage generative models in a wide array of applications—from art and entertainment to practical solutions in industries that require cost-effective data processing. The success of this project raises important questions: Could such advancements lead to a new era of AI-driven creativity, or will hardware limitations continue to pose challenges? As we explore these possibilities, the ongoing evolution of AI tools in resource-constrained environments is certainly a space worth watching.

Just thought I'd share: I ran a DCGAN on a dual-core RISC-V microcontroller, the CH32H417, generating 64x64 cat faces. This is a new RISC-V MCU, so no TFLite, no CMSIS-NN, and no external memory. It's a pure C inference engine whose outputs are bit-identical to the PyTorch reference.

The model is 12.6M parameters with per-channel int8 quantization. Intermediate activations are stored in DTCM, and layer weights stream from the SD card using double buffering, so the next layer loads while the current one computes. The total available SRAM is 512KB, shared between both cores and the inference engine. Generating one image takes 26 seconds; it could be faster, but SD card access speed is the bottleneck rather than computation.
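To make the streaming scheme concrete, here is a minimal sketch of double-buffered weight loading with per-channel int8 scales. It is not the project's code. The toy dense layer, the ReLU and requantization step, and every name in it (`sd_card`, `start_weight_read`, `wait_weight_read`, `run_layer`, `run_network`) are assumptions made for illustration. A real DCGAN generator uses transposed convolutions, and on hardware the read would be an asynchronous DMA transfer rather than a `memcpy`.

```c
#include <stdint.h>
#include <string.h>

#define N_LAYERS 3
#define CH 4                    /* toy layer width: CH inputs -> CH outputs */
#define W_BYTES (CH * CH)

/* Stand-in for the SD card: int8 weights per layer, row-major (assumption). */
static int8_t sd_card[N_LAYERS][W_BYTES];
/* One dequantization scale per output channel (per-channel quantization). */
static float scales[N_LAYERS][CH];

/* Hypothetical "start read". On hardware this would start a DMA transfer
 * and return immediately; here it simply copies synchronously. */
static void start_weight_read(int layer, int8_t *dst)
{
    memcpy(dst, sd_card[layer], W_BYTES);
}

/* Hypothetical "wait for read". Would block until the DMA completes. */
static void wait_weight_read(void) { }

/* int8 x int8 dot products, int32 accumulate, per-channel scale,
 * ReLU, then clamp and round back to int8 (illustrative choice). */
static void run_layer(const int8_t *w, const float *scale,
                      const int8_t *in, int8_t *out)
{
    for (int o = 0; o < CH; o++) {
        int32_t acc = 0;
        for (int i = 0; i < CH; i++)
            acc += (int32_t)w[o * CH + i] * (int32_t)in[i];
        float y = (float)acc * scale[o];
        if (y < 0.0f)   y = 0.0f;
        if (y > 127.0f) y = 127.0f;
        out[o] = (int8_t)(y + 0.5f);
    }
}

void run_network(const int8_t *input, int8_t *output)
{
    static int8_t wbuf[2][W_BYTES];   /* ping-pong weight buffers */
    int8_t act[2][CH];                /* ping-pong activation buffers */

    memcpy(act[0], input, CH);
    start_weight_read(0, wbuf[0]);    /* prime the first buffer */
    wait_weight_read();

    for (int l = 0; l < N_LAYERS; l++) {
        /* Kick off the next layer's weights into the idle buffer... */
        if (l + 1 < N_LAYERS)
            start_weight_read(l + 1, wbuf[(l + 1) & 1]);
        /* ...while computing the current layer from the other one. */
        run_layer(wbuf[l & 1], scales[l], act[l & 1], act[(l + 1) & 1]);
        if (l + 1 < N_LAYERS)
            wait_weight_read();
    }
    memcpy(output, act[N_LAYERS & 1], CH);
}
```

The point of the ping-pong layout is that only two layers' worth of weights ever sit in SRAM at once. As a rough sanity check, 12.6M int8 weights are about 12.6 MB, so if each weight is read once and reading dominates the 26 seconds, the effective SD throughput would be on the order of 0.5 MB/s.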

The z vector is seeded from 200 bytes of quantum random data (ANU QRNG vacuum fluctuation source), transformed via Box-Muller into the latent vector. This is not strictly necessary for image quality, but it was a fun constraint for the art installation side of the project.

The generated cat is classified as "motivated" or "demotivated" based on a single quantum bit, which selects from a phrase bank with four fragment slots combining into one of 131,072 possible spoken verdicts output through the onboard DAC...
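The count of 131,072 is 2^17. The post does not give the size of each fragment slot, so the following is only one factorization that happens to match: a mood bit times four slots of 16 fragments each (2 x 16^4 = 131,072). The sketch below decodes a verdict index under that assumption; `FRAGS_PER_SLOT`, `verdict_t`, `decode_verdict`, and `verdict_count` are hypothetical names.

```c
#include <stdint.h>

#define SLOTS 4
#define FRAGS_PER_SLOT 16   /* assumption: slot sizes are not stated in the post */

typedef struct {
    uint8_t mood;           /* 0 or 1, e.g. taken from the single quantum bit */
    uint8_t frag[SLOTS];    /* chosen fragment index for each slot */
} verdict_t;

/* Total number of distinct verdicts under the assumed slot sizes. */
uint32_t verdict_count(void)
{
    uint32_t n = 2;                       /* two moods */
    for (int s = 0; s < SLOTS; s++)
        n *= FRAGS_PER_SLOT;
    return n;
}

/* Split an index in [0, verdict_count()) into a mood bit and one
 * fragment choice per slot (mixed-radix decoding). */
verdict_t decode_verdict(uint32_t index)
{
    verdict_t v;
    v.mood = (uint8_t)(index & 1u);
    index >>= 1;
    for (int s = 0; s < SLOTS; s++) {
        v.frag[s] = (uint8_t)(index % FRAGS_PER_SLOT);
        index /= FRAGS_PER_SLOT;
    }
    return v;
}
```

Other splits, such as unequal slot sizes, would give the same total, so this should be read as arithmetic illustration rather than a description of the actual phrase bank.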

As far as I can tell, nobody else is running GAN inference on these low-cost RISC-V microcontrollers. ARM has the CMSIS-NN ecosystem for this kind of thing, but RISC-V MCUs, especially in the CH32 space, have nothing, so the entire inference engine is written from scratch.

Paper:

TinyGAN: Generative Image Synthesis on a RISC-V Microcontroller with Quantum Entropy Sampling

submitted by /u/Separate-Choice