peg ::= mill

PEG grammars as LLM output constraints — grammar expressiveness that regex and CFG solutions can't match.

TypeScript-native parser generator. Parametric PEG rules today, per-token constrained decoding for open-weight LLMs on the roadmap.

License: Apache 2.0

Install in one command

Existing PEG.js 0.10.0 grammars compile without changes.

$ npm install -g pegmill
$ pegmill grammar.pegjs
pegmill v0.1.2
⟦ parse ⟧ grammar.pegjs
⟦ rules ⟧ 42 rules loaded
⟦ compile ⟧ generating parser
⟦ emit ⟧ parser.js
build completed in 48ms

Four things PEG.js never had

Parametric rules

Write SepList<Elem, Sep> once. Podlite uses it to compress nine formatting-code variants — A<>, B<>, C<>, through Z<> — into a single rule.

Drop-in for PEG.js 0.10.0

Run pegmill grammar.pegjs against existing grammars with zero edits. Podlite runs 1600+ parser tests through it in production today.

WASM target (Phase 2)

One grammar, deploy to Node, Deno, Bun, or the browser. No platform binaries. No Python bridge.

Constrained decoding (Phase 4)

PEG grammar becomes a per-token mask for open-weight LLMs — Gemma, GLM, Qwen. Hallucinated structure becomes impossible, not just unlikely.
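To make "per-token mask" concrete, here is a minimal TypeScript sketch of the general technique. It is not Pegmill's API (Phase 4 is not released). The `PrefixCheck` type, the `maskLogits` helper, the toy vocabulary, and the regex standing in for a grammar are all hypothetical names introduced for illustration; a real implementation would presumably derive prefix viability from the compiled PEG grammar.

```typescript
// Hypothetical sketch: names below are illustrative, not part of Pegmill.

/** Returns true if `text` can still be extended into a string the grammar accepts. */
type PrefixCheck = (text: string) => boolean;

/**
 * Sets the logit of every token that would break the grammar to -Infinity,
 * so sampling can never pick it. Tokens that keep the output viable are untouched.
 */
function maskLogits(
  logits: number[],
  vocab: string[],
  generated: string,
  isViablePrefix: PrefixCheck,
): number[] {
  return logits.map((logit, tokenId) =>
    isViablePrefix(generated + vocab[tokenId]) ? logit : -Infinity,
  );
}

// Toy stand-in for a grammar: digit runs separated by single commas, e.g. "1,22,3".
// A trailing comma is allowed because the text is only a prefix so far.
const isDigitListPrefix: PrefixCheck = (text) => /^(\d+(,\d+)*,?)?$/.test(text);

const vocab = ["1", ",", "a", ",,"];
const logits = [0.1, 0.9, 2.5, 0.3];

// After "1", the letter "a" and the double comma are ruled out,
// even though "a" had the highest raw logit.
const masked = maskLogits(logits, vocab, "1", isDigitListPrefix);
console.log(masked); // values: 0.1, 0.9, -Infinity, -Infinity
```

The mask is recomputed after every generated token, which is why the cost of each grammar check matters for decoding speed.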

Write the list rule once, reuse everywhere

Every PEG generator makes you copy-paste a list rule for each combination of element type and separator. Pegmill lets the pattern live in one place:

SepList<Elem, Sep>
  = head:Elem tail:(_ Sep _ Elem)*
    { return [head, ...tail.map(t => t[3])]; }

CsvRow   = SepList<QuotedField, ",">
ProtoVer = SepList<Version, ".">
PathSegs = SepList<Ident, "/">

Inline expressions and string literals work as arguments too: List<[a-z]+, ",">, Tag<"b">.
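The `t[3]` in the SepList action picks the element out of each tail match: a PEG.js sequence returns an array with one slot per sub-expression, so `(_ Sep _ Elem)` yields whitespace, separator, whitespace, element. The TypeScript sketch below replays that action on hand-written match results. The `TailMatch` type, the `sepListAction` name, and the sample values are assumptions made for illustration, not generated parser code, and it assumes the `_` rule returns a string.

```typescript
// Illustrative only: mimics the SepList action on hand-written match data.

// One iteration of (_ Sep _ Elem): whitespace, separator, whitespace, element.
// Assumes the whitespace rule `_` yields a string.
type TailMatch<E> = [string, string, string, E];

function sepListAction<E>(head: E, tail: TailMatch<E>[]): E[] {
  // Index 3 is the Elem slot; indexes 0-2 hold whitespace and the separator.
  return [head, ...tail.map((t) => t[3])];
}

// What the sequence would capture for the input "1 . 2.3" with Sep = ".".
const version = sepListAction<string>("1", [
  [" ", ".", " ", "2"],
  ["", ".", "", "3"],
]);
console.log(version); // ["1", "2", "3"]
```

Because the action only touches slot 3, the same rule body works for any element and separator passed as arguments.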

Where Pegmill fits among existing tools

Constrained decoding already works in two ecosystems. Pegmill covers the third.

Outlines · Python

Regex and Lark CFG constraints. Mature, production-ready. Requires a Python runtime in your stack.

XGrammar · C++ (vLLM default)

Pushdown CFG with 14–80× speedup. Excellent for server-side batch. Python bindings; C++ core.

llama.cpp GBNF · C++

BNF extension baked into llama.cpp. De facto for local inference. Grammar syntax is narrower than PEG.

Pegmill · TypeScript

PEG grammars, predicates, lookahead. WASM target for Node, Deno, Bun, browser. No Python bridge, no C++ build step.

The roadmap bets on constrained decoding

Python and C++ solutions ship today (see the comparison above). Nobody ships a TypeScript stack with PEG grammars and a WASM target. That is the gap Pegmill is walking into — and the 12–18 month window is still open.

Phase 1

Parametric rules — done

Released as pegmill@0.1.2. 1115 spec tests passing, verified against the Podlite grammar in production.

Phase 2

WASM backend

Compile grammars to WebAssembly for universal deployment. Q3 2026, dependent on Axona traction.

Phase 3a

@dispatch directive

Table-driven choice routing, 4.7× fewer rule attempts for Podlite-class grammars. Prerequisite for per-token constraint checks in Phase 4.

Phase 4

LLM constrained decoding

PEG grammar as hard per-token constraint. Target runtime: Gemma, GLM, Qwen via node-llama-cpp. 12–18 month window before the TypeScript niche fills in.

Full plan with competitive landscape and watch items: ROADMAP.md.
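As a rough illustration of the table-driven choice routing described under Phase 3a: an ordered choice normally tries each alternative in turn, while a dispatch table keyed on the next input character jumps straight to the one alternative that can match. The TypeScript sketch below shows the general idea only. The `@dispatch` syntax and Pegmill's generated code are not shown in this document, so every name here (`Rule`, `orderedChoice`, `dispatchChoice`) is hypothetical, and the attempt counts describe this toy example, not the 4.7× figure.

```typescript
// Hypothetical sketch of first-character dispatch; not Pegmill's generated code.

type Rule = { name: string; first: string; match: (input: string) => boolean };
type ChoiceResult = { rule?: string; attempts: number };

// Toy alternatives: each one starts with a distinct letter followed by "<".
const rules: Rule[] = ["A", "B", "C", "I", "L"].map((letter) => ({
  name: `Code${letter}`,
  first: letter,
  match: (input: string) => input.startsWith(`${letter}<`),
}));

// Ordered choice: try alternatives one by one, counting attempts.
function orderedChoice(input: string): ChoiceResult {
  let attempts = 0;
  for (const rule of rules) {
    attempts++;
    if (rule.match(input)) return { rule: rule.name, attempts };
  }
  return { attempts };
}

// Dispatch: look up the single candidate by the next input character.
const table = new Map(rules.map((r) => [r.first, r] as const));

function dispatchChoice(input: string): ChoiceResult {
  const candidate = table.get(input.charAt(0));
  if (!candidate) return { attempts: 0 };
  return candidate.match(input)
    ? { rule: candidate.name, attempts: 1 }
    : { attempts: 1 };
}

console.log(orderedChoice("L<link>"));  // rule "CodeL" after 5 attempts
console.log(dispatchChoice("L<link>")); // rule "CodeL" after 1 attempt
```

The saving grows with the number of alternatives, which matters when the same choice has to be re-evaluated for every candidate token during constrained decoding.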

Independent project, Apache 2.0

No VC, no corporate sponsor, no equity stake in anything. Maintained by Aliaksandr Zahatski, hosted under the pegmill GitHub organization.

The fork from PEG.js 0.10.0 happened after PR #337, the WASM backend, was closed unmerged upstream. Peggy's plugin surface could not cover the codegen changes that path required, so forking was the only route. Pegmill keeps PEG.js compatibility as a baseline and treats everything beyond that as new territory.

A star is the fastest way to help

Stars on GitHub and links from your project move the dial faster than anything else at this stage.

If you run Pegmill in production and want to talk about security audits, stability guarantees, or contributing a WASM backend — the issue tracker is the channel. Same for sponsorship and grant-program enquiries.