Lucebox is a plug-and-play computer built for running local AI models and agents at full speed. Inside the custom chassis, a Ryzen AI MAX+ 395 with 128GB of unified LPDDR5X memory is paired with an RTX 3090, and the two work together through an open-source inference engine hand-tuned for exactly this hardware.
The architecture is what makes it fast. Large models live in the 128GB of unified memory, which serves as the capacity tier, while the 3090’s high-bandwidth VRAM acts as the fast tier. Speculative decoding (DFlash) and speculative prefill (PFlash) bridge the two, delivering inference speeds up to 10x higher than llama.cpp on the same silicon and beating machines like the Mac Studio and DGX Spark at a fraction of their effective cost.
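To make the idea behind speculative decoding concrete, here is a minimal Python sketch of the generic greedy variant. It does not describe DFlash's internals, which this page does not spell out. The two model functions are hypothetical toy stand-ins: a cheap draft model proposes several tokens, and the large target model keeps the longest prefix it agrees with. Treating the draft as the fast-tier model and the target as the capacity-tier model is an assumption made for illustration only.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: a deterministic toy rule, not a real LLM."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[Token]) -> Token:
    """Hypothetical 'small' model: matches the target except when len(ctx) % 4 == 0."""
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one call to the model per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns the tokens and the number of verify rounds."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target model checks each proposed position. A real engine would
        #    score all k positions in one batched pass; this toy loops instead.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
        else:
            # Every draft token was accepted, so the target adds one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], rounds


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
speculative, rounds = speculative_decode(target_model, draft_model, prompt, 20)
print(speculative == baseline, rounds)
```

In this sketch the speculative output is identical to plain greedy decoding with the target model. The only change is that the target is consulted in fewer rounds, which is where a speedup would come from when each round is a single batched pass.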
Getting started takes minutes, not weeks. The whole stack comes pre-installed, and a single CLI command deploys any open model. There is no driver configuration, no quantization trial and error, no environment debugging. The software is fully open source on GitHub (Luce-Org/lucebox-hub), with thousands of stars and dozens of contributors improving the kernels in the open.
For developers and teams, the payoff is threefold: top-of-class tokens per second at $4,900, complete data privacy since nothing touches the cloud, and a fixed hardware cost that replaces ever-growing API bills. If you want to run agents around the clock on hardware you own, Lucebox is the computer for it.
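The fixed-cost argument comes down to simple break-even arithmetic, sketched below in Python. Only the $4,900 price comes from this page. The monthly API spend and running cost are hypothetical placeholders, not measurements, and should be replaced with your own figures.

```python
import math
from typing import Optional


def months_to_break_even(
    hardware_cost: float,
    monthly_api_spend: float,
    monthly_running_cost: float = 0.0,
) -> Optional[int]:
    """Whole months until cumulative API savings cover the hardware cost.

    Returns None when running locally does not save money each month.
    """
    monthly_savings = monthly_api_spend - monthly_running_cost
    if monthly_savings <= 0:
        return None
    return math.ceil(hardware_cost / monthly_savings)


HARDWARE_COST = 4900.0       # price quoted on this page
EXAMPLE_API_SPEND = 600.0    # hypothetical monthly API bill (assumption)
EXAMPLE_RUNNING_COST = 40.0  # hypothetical monthly electricity cost (assumption)

months = months_to_break_even(HARDWARE_COST, EXAMPLE_API_SPEND, EXAMPLE_RUNNING_COST)
print(months)  # 9 with the placeholder inputs above
```

With these placeholder inputs, the monthly saving is $560, so the hardware pays for itself in the ninth month. A lower API bill lengthens that period, and a bill below the running cost never breaks even.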