Run large language models entirely in the browser via WebAssembly — the high-profile 2025-2026 path to private, client-side LLM inference.
llama.cpp's WebAssembly build (and the wllama wrapper) brings the popular ggml inference engine into the browser as a Wasm module, enabling on-device LLM inference without a server round-trip. Model files are streamed into the page, parsed in Wasm, and run with the same ggml quantized kernels as the native CLI, compiled to Wasm. As of 2026 the project supports modern quantized formats (Q4_K_M, Q5_K_M, Q8_0, IQ-series) and runs in both browser and Node hosts, with wllama providing a high-level JavaScript API for loading models and generating completions.
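To make the "streamed into the page, parsed" step more concrete, here is a minimal sketch of reading the fixed-size header of a GGUF file, the model container format llama.cpp loads (the format name is not stated in this entry). It assumes the GGUF v2/v3 layout: a 4-byte "GGUF" magic, then a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count, all little-endian. The names `GgufHeader` and `parseGgufHeader` are hypothetical and are not part of llama.cpp or wllama, where the real parsing happens inside the Wasm module.

```typescript
// Illustrative sketch only: the real parser lives inside the llama.cpp Wasm module.
// Assumed layout (GGUF v2/v3, little-endian):
//   bytes 0..3   magic "GGUF"
//   bytes 4..7   version (u32)
//   bytes 8..15  tensor count (u64)
//   bytes 16..23 metadata key/value count (u64)

interface GgufHeader {
  version: number;
  tensorCount: number;
  metadataKvCount: number;
}

const GGUF_HEADER_BYTES = 24;

// Read a little-endian u64 as a JS number. Exact only below 2^53,
// which is far more than any realistic tensor or metadata count.
function readU64AsNumber(view: DataView, offset: number): number {
  const lo = view.getUint32(offset, true);
  const hi = view.getUint32(offset + 4, true);
  return hi * 4294967296 + lo;
}

function parseGgufHeader(bytes: Uint8Array): GgufHeader {
  if (bytes.byteLength < GGUF_HEADER_BYTES) {
    throw new Error("need at least " + GGUF_HEADER_BYTES + " bytes, got " + bytes.byteLength);
  }
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "GGUF") {
    throw new Error("not a GGUF file: bad magic");
  }
  // Respect byteOffset so the function also works on subarray views.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    version: view.getUint32(4, true),
    tensorCount: readU64AsNumber(view, 8),
    metadataKvCount: readU64AsNumber(view, 16),
  };
}
```

A page could call `parseGgufHeader` on the first 24 bytes of a download to reject a non-GGUF file before fetching the rest. Whether those bytes can be requested separately (for example with an HTTP range request) depends on the server and is an assumption here.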
The JavaScript host and transpiler for WebAssembly components — run any component in Node.js or the browser.
Deno's globally distributed edge platform — runs JavaScript, TypeScript, and WebAssembly modules at the network edge.
The WebAssembly Binary Toolkit — essential tools for inspecting, converting, and validating Wasm binaries.