Make your local LLM finish what your hardware can't.
Modern open-weight models advertise massive context windows. Consumer hardware fits only a fraction. Cumulus closes the gap with verifiable multi-pass execution: the full breadth of the task, covered cumulatively, with every pass inside the context your machine actually holds.
Get early access
Cumulus is in private alpha.
Built for operators of open-weight models on Apple Silicon, RTX, and Linux workstations. Drop your email and we'll send the v0 binary the day it ships.
- Local-first — every primary operation completes without network.
- Open-weight model agnostic — works with any OpenAI-compatible local endpoint (smoke-test sketch below).
- Markdown vault on disk — uninstall and your data stays.
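For the second bullet, "OpenAI-compatible" means the standard chat-completions shape that local servers such as llama.cpp, Ollama, and vLLM expose. A minimal smoke test of such an endpoint; the URL, key, and model name are placeholders for whatever your server registers:

```python
from openai import OpenAI

# Any OpenAI-compatible local server works; only the base_url changes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.chat.completions.create(
    model="local-model",  # placeholder: use the name your server exposes
    messages=[{"role": "user", "content": "Say ready."}],
)
print(resp.choices[0].message.content)
```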
The local-LLM context gap
Your model was built for context your hardware can't hold.
Every release of an open-weight model pushes the advertised context window higher. Consumer hardware does not keep pace. The KV cache that holds per-token attention state grows linearly with context length — and on a typical workstation, the cache fills long before the model's ceiling does.
The specifications promise capability the hardware cannot deliver.
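The arithmetic makes the gap concrete. A back-of-envelope sketch for a Llama-70B-class configuration; the layer and head counts are typical published values, and a 16-bit cache is assumed (quantized caches shrink this, but not by orders of magnitude):

```python
# Per-token KV-cache cost: K and V each store
# n_layers * n_kv_heads * head_dim values.
n_layers, n_kv_heads, head_dim = 80, 8, 128  # typical 70B-class with GQA
bytes_per_value = 2                          # fp16/bf16 cache

kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token")  # ~320 KiB

for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_bytes_per_token * ctx / 2**30:5.1f} GiB")
# ~2.5 GiB at 8K, ~10 GiB at 32K, ~40 GiB at 128K -- before the weights.
```

Weights for a 70B-class model already claim tens of gigabytes, so the cache hits the hardware ceiling long before the advertised window does.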
[Chart: Advertised context (open-weight models, 2025–2026). What the model card promises, on a log scale to show the full range.]
[Chart: What your hardware actually delivers. A typical 70B-class model on a high-end consumer machine.]
Both charts use a logarithmic scale so even the smallest model registers; whichever model you load, the gap between advertised context and the usable hardware budget is one to three orders of magnitude.
What fails today
Long-document analysis
Truncated silently; later sections never reach the model
A real contract or paper runs 50–200K tokens. KV cache caps the slice.
Codebase audit
Repo doesn't fit; the model invents structure it never saw
Real repos run 200K–800K tokens. The bare model only sees a slice.
Multi-doc cross-reference
Picks the first few docs, ignores the rest — silently
No coverage record means omissions are invisible to you.
…and even when the answer comes back, you can't audit which pages your model actually read.
Three-ledger correctness
Cumulative context, not simultaneous.
The task gets the full breadth. The model never exceeds its real attention width. Each pass operates within the simultaneous limit, produces validated evidence, and contributes to a coverage record — all three ledgers durable, append-only, on disk.
- 01
Evidence
Every cited quote, byte-matched to its source span.
Each observation that survives validation is recorded in an append-only Evidence Ledger linked to a specific source span — byte and character offsets, line numbers, content hashes, extraction-pipeline version. Validation method is named, not implied. The original record is never modified; lifecycle transitions live in a separate event stream. (A sketch of the byte-match check follows the three ledgers.)
evidence.jsonl (append-only):

```json
{
  "evidence_id": "ev_01HX7K9P2N3M…",
  "doc_span": { "start_byte": 18220, "end_byte": 18940 },
  "quote_sha256": "9a8b7c…",
  "validation_status": "verified",
  "validation_method": "byte_match_unicode_nfc"
}
```

- 02
Coverage
A declared scope. Processed, failed, and skipped — recorded.
Before work begins, the user approves a hashed scope manifest. As the engine iterates, every chunk is logged: processed, failed (with reason), or skipped (with reason). At synthesis, completeness is computed against the manifest — not asserted. Bad scope can still produce gaps; the gaps are visible. (A sketch of that computation follows the three ledgers.)
coverage.jsonl (append-only):

```json
{
  "coverage_id": "cov_01HX7K9…",
  "chunk_id": "chunk_004",
  "status": "processed",
  "evidence_count": 3,
  "retries": 1,
  "duration_ms": 4280
}
```

- 03
Verifier
Per-claim entailment check before anything reaches the answer body.
A separate verifier pass labels each claim entailed, contradicted, unsupported, or uncertain. Unsupported and uncertain claims are excluded from the main answer and surfaced as items requiring review. v0 ships a same-model verifier; v1 adds different-local-model and cloud-opt-in modes — labelled honestly, never sold as independent verification.
verifier.jsonl (append-only):

```json
{
  "verifier_id": "vr_01HX…",
  "target_id": "ev_01HX7K9P2N3M…",
  "verifier_status": "entailed",
  "verifier_profile": "same_model",
  "self_verified": true
}
```
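To make "byte-matched" concrete: a minimal re-check of an Evidence Ledger record, assuming the recipe the fields above suggest, a SHA-256 over the NFC-normalized UTF-8 text of the cited span. The function name and file handling are illustrative, not the v0 implementation:

```python
import hashlib
import unicodedata

def byte_match_unicode_nfc(source_path: str, start_byte: int,
                           end_byte: int, expected_sha256: str) -> bool:
    """Read the cited span by byte offset, NFC-normalize, compare hashes.
    A record that fails this check is not 'verified', whatever the model said."""
    with open(source_path, "rb") as f:
        f.seek(start_byte)
        raw = f.read(end_byte - start_byte)
    quote = unicodedata.normalize("NFC", raw.decode("utf-8"))
    return hashlib.sha256(quote.encode("utf-8")).hexdigest() == expected_sha256

# Against the sample record above:
# byte_match_unicode_nfc(doc_path, 18220, 18940, record["quote_sha256"])
```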
Auditable correctness records — checked, not proved.
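Because the ledgers are plain JSONL on disk, auditing them needs nothing beyond a file reader. A minimal sketch, assuming the field names from the samples above and a manifest that lists every in-scope chunk_id:

```python
import json
from collections import Counter

def read_ledger(path: str) -> list[dict]:
    # Append-only JSONL: one record per line, never rewritten.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

coverage = read_ledger("coverage.jsonl")
verifier = read_ledger("verifier.jsonl")

manifest = {"chunk_001", "chunk_002", "chunk_004"}  # stand-in for the approved scope
chunks = Counter(rec["status"] for rec in coverage)
never_logged = manifest - {rec["chunk_id"] for rec in coverage}

print(f"{chunks['processed']} processed, {chunks['failed']} failed, "
      f"{chunks['skipped']} skipped, {len(never_logged)} never logged")

labels = Counter(rec["verifier_status"] for rec in verifier)
print(f"claims: {labels['entailed']} entailed, {labels['uncertain']} uncertain")
```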
Show the receipts
v0 synthesis cannot hallucinate by construction.
Benchmarks back it.
v0 synthesis is deterministic rendering — there is no LLM in the synthesis step. The renderer emits a fixed-format report grounded in active evidence and entailed claims. v1 adds LLM prose with mandatory post-synthesis validation against the entailed-claim set.
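As a sketch of what "deterministic rendering" means, no LLM anywhere in the loop, just a fixed template over verifier-labeled claims; the claim fields here are simplified assumptions, not the v0 schema. The sample report below shows the kind of output this step emits:

```python
def render_report(claims: list[dict]) -> str:
    """Fixed-format synthesis: entailed claims go to Findings; everything
    else is diverted to review, never silently dropped."""
    findings, review = [], []
    for c in claims:
        line = f"── {c['text']}  [{c['kind']}, {c['label']}, {c['evidence_id']}]"
        (findings if c["label"] == "entailed" else review).append(line)
    return "\n".join(["Findings", *findings, "",
                      "Items Requiring Review", *(review or ["── (none)"])])

print(render_report([{"text": "Mutual indemnification in section 3.2",
                      "kind": "extractive", "label": "entailed",
                      "evidence_id": "ev_01"}]))
```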
Coverage Summary
── 28 of 28 chunks processed against declared scope
── 0 failed, 0 skipped → scope complete
Findings
── 7 indemnification clauses identified
[aggregate, entailed, derived from ev_01..ev_07]
── Mutual indemnification in section 3.2
[extractive, entailed, ev_01]
Evidence
[ev_01] chunk_004, line 44–51, page n/a
"Each Party shall indemnify, defend and hold
harmless the other Party from and against any
and all losses, damages, claims and expenses…"
Verifier Status
── 7 of 8 final-body claims labeled entailed
(same_model, self_verified)
── 1 claim labeled uncertain
→ moved to Items Requiring Review
Items Requiring Review
── "The contract is unfavorable to Party A"
[inferred, uncertain]
Supporting evidence: ev_03, ev_05
Reason: verifier could not determine entailment
Failures / Skipped Chunks
── (none)

v0 release gate
All four gates are required; v0 does not ship if any one of them fails.
- tasks pass: 10/12
- hallucinated facts in body: 0
- citation validity: 100%
- injection resisted: 1/1
Comparison baselines
Published per task. No cherry-picked subsets.
- Bare model
- Naive chunking
- Vector RAG
- Cumulus
Tasks your local LLM can't finish — finished, with an audit trail you can read line by line.
v0 is in private alpha. Drop your email and we'll send the binary the day it ships.
No, it doesn't expand your context window — physics doesn't budge. It does cumulative work, with receipts.