Make your local LLM finish what your hardware can't.
Modern open-weight models advertise massive context windows. Consumer hardware fits only a fraction. Cumulus closes the gap with verifiable multi-pass execution: the full breadth of the task, covered cumulatively, with every pass inside the context your machine actually holds.
Get early access
Cumulus is in private alpha.
Built for operators of open-weight models on Apple Silicon, RTX, and Linux workstations. Drop your email and we'll send the v0 binary the day it ships.
- Local-first — every primary operation completes without network.
- Open-weight model agnostic — works with any OpenAI-compatible local endpoint (smoke-test sketch below).
- Markdown vault on disk — uninstall and your data stays.
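For the second bullet, "OpenAI-compatible" means the standard chat-completions shape that local servers such as llama.cpp, Ollama, and vLLM expose. A minimal smoke test of such an endpoint; the URL, key, and model name are placeholders for whatever your server registers:

```python
from openai import OpenAI

# Any OpenAI-compatible local server works; only the base_url changes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.chat.completions.create(
    model="local-model",  # placeholder: use the name your server exposes
    messages=[{"role": "user", "content": "Say ready."}],
)
print(resp.choices[0].message.content)
```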
The local-LLM context gap
Your model was built for context your hardware can't hold.
Every release of an open-weight model pushes the advertised context window higher. Consumer hardware does not keep pace. The KV cache that holds per-token attention state grows linearly with context length — and on a typical workstation, the cache fills long before the model's ceiling does.
The specifications promise capability the hardware cannot deliver.
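The arithmetic makes the gap concrete. A back-of-envelope sketch for a Llama-70B-class configuration; the layer and head counts are typical published values, and a 16-bit cache is assumed (quantized caches shrink this, but not by orders of magnitude):

```python
# Per-token KV-cache cost: K and V each store
# n_layers * n_kv_heads * head_dim values.
n_layers, n_kv_heads, head_dim = 80, 8, 128  # typical 70B-class with GQA
bytes_per_value = 2                          # fp16/bf16 cache

kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token")  # ~320 KiB

for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_bytes_per_token * ctx / 2**30:5.1f} GiB")
# ~2.5 GiB at 8K, ~10 GiB at 32K, ~40 GiB at 128K -- before the weights.
```

Weights for a 70B-class model already claim tens of gigabytes, so the cache hits the hardware ceiling long before the advertised window does.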
[Chart: Advertised context (open-weight models, 2025–2026). What the model card promises, on a log scale to show the full range.]
[Chart: What your hardware actually delivers. A typical 70B-class model on a high-end consumer machine.]
Both charts use a logarithmic scale so even the smallest model registers; whichever model you load, the gap between advertised context and the usable hardware budget is one to three orders of magnitude.
What fails today
Long-document analysis
Truncated silently; later sections never reach the model
A real contract or paper runs 50–200K tokens. KV cache caps the slice.
Codebase audit
Repo doesn't fit; the model invents structure it never saw
Real repos run 200K–800K tokens. The bare model only sees a slice.
Multi-doc cross-reference
Picks the first few docs, ignores the rest — silently
No coverage record means omissions are invisible to you.
…and even when the answer comes back, you can't audit which pages your model actually read.
Three-ledger correctness
Cumulative context, not simultaneous.
The task gets the full breadth. The model never exceeds its real attention width. Each pass operates within the simultaneous limit, produces validated evidence, and contributes to a coverage record — all three ledgers durable, append-only, on disk.
- 01
Evidence
Every cited quote, byte-matched to its source span.
Each observation that survives validation is recorded in an append-only Evidence Ledger linked to a specific source span — byte and character offsets, line numbers, content hashes, extraction-pipeline version. Validation method is named, not implied. The original record is never modified; lifecycle transitions live in a separate event stream. (A sketch of the byte-match check follows the three ledgers.)
evidence.jsonl (append-only):

```json
{
  "evidence_id": "ev_01HX7K9P2N3M…",
  "doc_span": { "start_byte": 18220, "end_byte": 18940 },
  "quote_sha256": "9a8b7c…",
  "validation_status": "verified",
  "validation_method": "byte_match_unicode_nfc"
}
```

- 02
Coverage
A declared scope. Processed, failed, and skipped — recorded.
Before work begins, the user approves a hashed scope manifest. As the engine iterates, every chunk is logged: processed, failed (with reason), or skipped (with reason). At synthesis, completeness is computed against the manifest — not asserted. Bad scope can still produce gaps; the gaps are visible. (A sketch of that computation follows the three ledgers.)
coverage.jsonl (append-only):

```json
{
  "coverage_id": "cov_01HX7K9…",
  "chunk_id": "chunk_004",
  "status": "processed",
  "evidence_count": 3,
  "retries": 1,
  "duration_ms": 4280
}
```

- 03
Verifier
Per-claim entailment check before anything reaches the answer body.
A separate verifier pass labels each claim entailed, contradicted, unsupported, or uncertain. Unsupported and uncertain claims are excluded from the main answer and surfaced as items requiring review. v0 ships a same-model verifier; v1 adds different-local-model and cloud-opt-in modes — labelled honestly, never sold as independent verification.
verifier.jsonl (append-only):

```json
{
  "verifier_id": "vr_01HX…",
  "target_id": "ev_01HX7K9P2N3M…",
  "verifier_status": "entailed",
  "verifier_profile": "same_model",
  "self_verified": true
}
```
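To make "byte-matched" concrete: a minimal re-check of an Evidence Ledger record, assuming the recipe the fields above suggest, a SHA-256 over the NFC-normalized UTF-8 text of the cited span. The function name and file handling are illustrative, not the v0 implementation:

```python
import hashlib
import unicodedata

def byte_match_unicode_nfc(source_path: str, start_byte: int,
                           end_byte: int, expected_sha256: str) -> bool:
    """Read the cited span by byte offset, NFC-normalize, compare hashes.
    A record that fails this check is not 'verified', whatever the model said."""
    with open(source_path, "rb") as f:
        f.seek(start_byte)
        raw = f.read(end_byte - start_byte)
    quote = unicodedata.normalize("NFC", raw.decode("utf-8"))
    return hashlib.sha256(quote.encode("utf-8")).hexdigest() == expected_sha256

# Against the sample record above:
# byte_match_unicode_nfc(doc_path, 18220, 18940, record["quote_sha256"])
```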
Auditable correctness records — checked, not proved.
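Because the ledgers are plain JSONL on disk, auditing them needs nothing beyond a file reader. A minimal sketch, assuming the field names from the samples above and a manifest that lists every in-scope chunk_id:

```python
import json
from collections import Counter

def read_ledger(path: str) -> list[dict]:
    # Append-only JSONL: one record per line, never rewritten.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

coverage = read_ledger("coverage.jsonl")
verifier = read_ledger("verifier.jsonl")

manifest = {"chunk_001", "chunk_002", "chunk_004"}  # stand-in for the approved scope
chunks = Counter(rec["status"] for rec in coverage)
never_logged = manifest - {rec["chunk_id"] for rec in coverage}

print(f"{chunks['processed']} processed, {chunks['failed']} failed, "
      f"{chunks['skipped']} skipped, {len(never_logged)} never logged")

labels = Counter(rec["verifier_status"] for rec in verifier)
print(f"claims: {labels['entailed']} entailed, {labels['uncertain']} uncertain")
```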
Show the receipts
v0 synthesis cannot hallucinate by construction.
Benchmarks back it.
v0 synthesis is deterministic rendering — there is no LLM in the synthesis step. The renderer emits a fixed-format report grounded in active evidence and entailed claims. v1 adds LLM prose with mandatory post-synthesis validation against the entailed-claim set.
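As a sketch of what "deterministic rendering" means, no LLM anywhere in the loop, just a fixed template over verifier-labeled claims; the claim fields here are simplified assumptions, not the v0 schema. The sample report below shows the kind of output this step emits:

```python
def render_report(claims: list[dict]) -> str:
    """Fixed-format synthesis: entailed claims go to Findings; everything
    else is diverted to review, never silently dropped."""
    findings, review = [], []
    for c in claims:
        line = f"── {c['text']}  [{c['kind']}, {c['label']}, {c['evidence_id']}]"
        (findings if c["label"] == "entailed" else review).append(line)
    return "\n".join(["Findings", *findings, "",
                      "Items Requiring Review", *(review or ["── (none)"])])

print(render_report([{"text": "Mutual indemnification in section 3.2",
                      "kind": "extractive", "label": "entailed",
                      "evidence_id": "ev_01"}]))
```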
Coverage Summary
── 28 of 28 chunks processed against declared scope
── 0 failed, 0 skipped → scope complete
Findings
── 7 indemnification clauses identified
[aggregate, entailed, derived from ev_01..ev_07]
── Mutual indemnification in section 3.2
[extractive, entailed, ev_01]
Evidence
[ev_01] chunk_004, line 44–51, page n/a
"Each Party shall indemnify, defend and hold
harmless the other Party from and against any
and all losses, damages, claims and expenses…"
Verifier Status
── 7 of 8 final-body claims labeled entailed
(same_model, self_verified)
── 1 claim labeled uncertain
→ moved to Items Requiring Review
Items Requiring Review
── "The contract is unfavorable to Party A"
[inferred, uncertain]
Supporting evidence: ev_03, ev_05
Reason: verifier could not determine entailment
Failures / Skipped Chunks
── (none)

v0 release gate
All four gates are required; v0 does not ship if any one of them fails.
- tasks pass: 10/12
- hallucinated facts in body: 0
- citation validity: 100%
- injection resisted: 1/1
Comparison baselines
Published per task. No cherry-picked subsets.
- Bare model
- Naive chunking
- Vector RAG
- Cumulus
Tasks your local LLM can't finish — finished, with an audit trail you can read line by line.
v0 is in private alpha. Drop your email and we'll send the binary the day it ships.
No, it doesn't expand your context window — physics doesn't budge. It does cumulative work, with receipts.