Tool · 80k · Featured

llama.cpp (Wasm)

Run large language models entirely in the browser via WebAssembly, a prominent 2025-2026 path to private, client-side LLM inference.


About

llama.cpp's WebAssembly build (and the wllama wrapper) brings the popular ggml inference engine into the browser as a Wasm module, enabling on-device LLM inference without a server round-trip. Models are streamed into the page, parsed in Wasm, and run with the same quantized kernels used in the native CLI. As of 2026 the project supports modern quantized formats (Q4_K_M, Q5_K_M, Q8_0, IQ-series) and runs in both browser and Node hosts, with wllama providing a clean JavaScript API.
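To make the "quantized kernels" mentioned above more concrete, here is a minimal TypeScript sketch of Q8_0-style block quantization, one of the formats listed. It assumes the commonly documented Q8_0 layout (blocks of 32 weights sharing one scale equal to the block's largest absolute value divided by 127). The names `quantizeQ8_0`, `dequantizeQ8_0`, and `Q8Block` are hypothetical and exist only for this illustration. This is not the project's Wasm kernel or the wllama API, and it keeps the scale as a float32 where ggml stores it as fp16.

```typescript
// Simplified sketch of Q8_0-style block quantization (illustrative only).
// Assumption: the per-block scale is kept as a float32; ggml stores it as fp16.

const BLOCK_SIZE = 32; // weights per Q8_0 block

interface Q8Block {
  scale: number;     // per-block scale factor
  quants: Int8Array; // BLOCK_SIZE signed 8-bit values
}

// Round half away from zero, like C's roundf (Math.round rounds halves toward +Infinity).
function roundHalfAway(v: number): number {
  return Math.sign(v) * Math.round(Math.abs(v));
}

function quantizeQ8_0(weights: Float32Array): Q8Block[] {
  if (weights.length % BLOCK_SIZE !== 0) {
    throw new Error(`length must be a multiple of ${BLOCK_SIZE}`);
  }
  const blocks: Q8Block[] = [];
  for (let start = 0; start < weights.length; start += BLOCK_SIZE) {
    // Find the largest magnitude in this block.
    let absMax = 0;
    for (let i = 0; i < BLOCK_SIZE; i++) {
      absMax = Math.max(absMax, Math.abs(weights[start + i]));
    }
    // Map [-absMax, absMax] onto [-127, 127]; an all-zero block gets scale 0.
    const scale = absMax / 127;
    const inv = scale !== 0 ? 1 / scale : 0;
    const quants = new Int8Array(BLOCK_SIZE);
    for (let i = 0; i < BLOCK_SIZE; i++) {
      quants[i] = roundHalfAway(weights[start + i] * inv);
    }
    blocks.push({ scale, quants });
  }
  return blocks;
}

function dequantizeQ8_0(blocks: Q8Block[]): Float32Array {
  const out = new Float32Array(blocks.length * BLOCK_SIZE);
  blocks.forEach((block, b) => {
    for (let i = 0; i < BLOCK_SIZE; i++) {
      out[b * BLOCK_SIZE + i] = block.quants[i] * block.scale;
    }
  });
  return out;
}

// Storage cost with an fp16 scale: 32 one-byte values + 2 bytes = 34 bytes per block.
const BITS_PER_WEIGHT_Q8_0 = ((BLOCK_SIZE + 2) * 8) / BLOCK_SIZE;
```

Each reconstructed weight differs from the original by at most about half the block's scale. At 8.5 bits per weight, a Q8_0 file is a little over a quarter of the float32 size, which matters when the whole model has to be downloaded into a browser tab.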

Related Projects

Jco

Deno Deploy

WABT