v0 · cumulative-context engine · pre-launch

Make your local LLM finish what your hardware can't.

Modern open-weight models advertise massive context windows. Consumer hardware fits only a fraction. Cumulus runs verifiable multi-pass execution that uses your model's full reasoning width.

Open source · Local-first · Open-weight model agnostic · No data leaves your machine

Get early access

Cumulus is in private alpha.

Built for operators of open-weight models on Apple Silicon, RTX, and Linux workstations. Drop your email and we'll send the v0 binary the day it ships.

  • Local-first — every primary operation completes without network.
  • Open-weight model agnostic — works with any OpenAI-compatible local endpoint.
  • Markdown vault on disk — uninstall and your data stays.

Early access

Join the waitlist.

Tell us what you'd use Cumulus for. We'll send the v0 release when it ships.

Primary use case

Pick the one closest to your work. We'll prioritize the v0 task suite around the highest-demand options.

No payment required to join the waitlist

The local-LLM context gap

Your model was built for context your hardware can't hold.

Every release of an open-weight model pushes the advertised context window higher. Consumer hardware does not keep pace. The KV cache that holds per-token attention state grows linearly with context length — and on a typical workstation, the cache fills long before the model's ceiling does.

The specifications promise capability the hardware cannot deliver.
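The arithmetic behind that gap is easy to check. A minimal sketch, assuming a Llama-3-70B-like geometry (80 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16 cache entries) — the page itself names no specific model, so these numbers are illustrative, not Cumulus's:

```python
def kv_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # One K and one V vector per layer per KV head, each head_dim wide.
    # Geometry defaults are assumed (Llama-3-70B-like), not from the page.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

per_token = kv_bytes_per_token()  # ~320 KiB of cache per token at fp16
print(f"{per_token} B/token; 40K tokens -> {per_token * 40_000 / 2**30:.1f} GiB")
```

With ~40 GB of 4-bit weights already resident on a 64 GB machine, a cache budget in the low tens of gigabytes caps out near the 30–40K-token figure above — two to three orders of magnitude short of a 10M-token model card.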

Advertised context (open-weight models, 2025–2026)

What the model card promises — log-scale to show the full range

Llama 4 / Scout class: 10M tokens
Qwen 3 / DeepSeek class: 1M tokens
Hermes 4 / Mistral class: 256K tokens
Llama 3 / older 70B: 128K tokens

What your hardware actually delivers

A typical 70B-class model on a high-end consumer machine

Usable KV cache: ~30–40K tokens

Bars use a log scale so even the smallest window still registers — the gap to the usable hardware budget is one to three orders of magnitude, regardless of which model you load.

What fails today

  • Long-document analysis

    Truncated silently; later sections never reach the model

    A real contract or paper runs 50–200K tokens. KV cache caps the slice.

  • Codebase audit

    Repo doesn't fit; the model invents structure it never saw

    Real repos run 200K–800K tokens. The bare model only sees a slice.

  • Multi-doc cross-reference

    Picks the first few docs, ignores the rest — silently

    No coverage record means omissions are invisible to you.

…and even when the answer comes back, you can't audit which pages your model actually read.

Three-ledger correctness

Cumulative context, not simultaneous.

The task gets the full breadth. The model never exceeds its real attention width. Each pass operates within the simultaneous limit, produces validated evidence, and contributes to a coverage record — all three ledgers durable, append-only, on disk.

  • 01

    Evidence

    Every cited quote, byte-matched to its source span.

    Each observation that survives validation is recorded in an append-only Evidence Ledger linked to a specific source span — byte and character offsets, line numbers, content hashes, extraction-pipeline version. Validation method is named, not implied. The original record is never modified; lifecycle transitions live in a separate event stream.

evidence.jsonl · append-only
    {
      "evidence_id": "ev_01HX7K9P2N3M…",
      "doc_span": { "start_byte": 18220, "end_byte": 18940 },
      "quote_sha256": "9a8b7c…",
      "validation_status": "verified",
      "validation_method": "byte_match_unicode_nfc"
    }
  • 02

    Coverage

    A declared scope. Processed, failed, and skipped — recorded.

    Before work begins, the user approves a hashed scope manifest. As the engine iterates, every chunk is logged: processed, failed (with reason), or skipped (with reason). At synthesis, completeness is computed against the manifest — not asserted. Bad scope can still produce gaps; the gaps are visible.

coverage.jsonl · append-only
    {
      "coverage_id": "cov_01HX7K9…",
      "chunk_id": "chunk_004",
      "status": "processed",
      "evidence_count": 3,
      "retries": 1,
      "duration_ms": 4280
    }
  • 03

    Verifier

    Per-claim entailment check before anything reaches the answer body.

    A separate verifier pass labels each claim entailed, contradicted, unsupported, or uncertain. Unsupported and uncertain claims are excluded from the main answer and surfaced as items requiring review. v0 ships a same-model verifier; v1 adds different-local-model and cloud-opt-in modes — labeled honestly, never sold as independent verification.

    verifier.jsonl · append-only
    {
      "verifier_id": "vr_01HX…",
      "target_id": "ev_01HX7K9P2N3M…",
      "verifier_status": "entailed",
      "verifier_profile": "same_model",
      "self_verified": true
    }
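The byte-match validation the evidence record names (`byte_match_unicode_nfc`) reduces to: decode the recorded span, NFC-normalize both sides, compare, and check the hash. A hypothetical sketch — it assumes the record also carries the quoted text alongside `quote_sha256`, which the snippet above doesn't show:

```python
import hashlib
import unicodedata

def validate_evidence(record: dict, source_bytes: bytes) -> bool:
    """Sketch of a byte_match_unicode_nfc check: the quoted text must
    match the recorded source span after NFC normalization, and its
    SHA-256 must match quote_sha256. Hypothetical helper, not the
    Cumulus API."""
    span = record["doc_span"]
    span_text = source_bytes[span["start_byte"]:span["end_byte"]].decode("utf-8")
    quote = unicodedata.normalize("NFC", record["quote"])
    if unicodedata.normalize("NFC", span_text) != quote:
        return False
    return hashlib.sha256(quote.encode("utf-8")).hexdigest() == record["quote_sha256"]

# Toy source; "brown fox" occupies bytes 10..19.
source = b"The quick brown fox jumps over the lazy dog."
record = {
    "doc_span": {"start_byte": 10, "end_byte": 19},
    "quote": "brown fox",
    "quote_sha256": hashlib.sha256(b"brown fox").hexdigest(),
}
assert validate_evidence(record, source)
```

The point of hashing the normalized quote, not the raw bytes, is that two visually identical Unicode strings can differ at the byte level; NFC pins one canonical form before comparison.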

Auditable correctness records — checked, not proved.
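"Computed, not asserted" completeness is a set comparison between the approved manifest and the coverage log. A minimal sketch — record fields follow the `coverage.jsonl` snippet; the function itself and the manifest shape are assumptions:

```python
def coverage_report(manifest_chunks: set, records: list) -> dict:
    """Compute completeness against the approved scope manifest rather
    than asserting it. Hypothetical sketch; record fields mirror the
    coverage.jsonl snippet."""
    by_status = {"processed": set(), "failed": set(), "skipped": set()}
    for rec in records:
        by_status[rec["status"]].add(rec["chunk_id"])
    logged = set().union(*by_status.values())
    return {
        "processed": len(by_status["processed"]),
        "failed": len(by_status["failed"]),
        "skipped": len(by_status["skipped"]),
        "missing": sorted(manifest_chunks - logged),  # never logged: a visible gap
        "complete": by_status["processed"] == manifest_chunks,
    }
```

Note the fourth bucket: a chunk in the manifest with no log entry at all is reported as `missing`, so even a crashed pass leaves an auditable gap rather than a silent one.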

Show the receipts

v0 cannot hallucinate by construction. Benchmarks back it.

v0 synthesis is deterministic rendering — there is no LLM in the synthesis step. The renderer emits a fixed-format report grounded in active evidence and entailed claims. v1 adds LLM prose with mandatory post-synthesis validation against the entailed-claim set.

cumulus task — phase F · synthesis · deterministic
Coverage Summary
  ── 28 of 28 chunks processed against declared scope
  ── 0 failed, 0 skipped → scope complete

Findings
  ── 7 indemnification clauses identified
       [aggregate, entailed, derived from ev_01..ev_07]
  ── Mutual indemnification in section 3.2
       [extractive, entailed, ev_01]

Evidence
  [ev_01] chunk_004, line 44–51, page n/a
    "Each Party shall indemnify, defend and hold
     harmless the other Party from and against any
     and all losses, damages, claims and expenses…"

Verifier Status
  ── 7 of 8 final-body claims labeled entailed
       (same_model, self_verified)
  ── 1 claim labeled uncertain
       → moved to Items Requiring Review

Items Requiring Review
  ── "The contract is unfavorable to Party A"
       [inferred, uncertain]
       Supporting evidence: ev_03, ev_05
       Reason: verifier could not determine entailment

Failures / Skipped Chunks
  ── (none)
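The split visible in that report — entailed claims in the body, everything else under Items Requiring Review — is a pure filter over verifier labels. A rough sketch (the field name mirrors the `verifier.jsonl` snippet; the function is hypothetical):

```python
def partition_claims(claims):
    """Route claims by verifier label: entailed goes to the answer body;
    uncertain, unsupported, and contradicted go to Items Requiring
    Review. Pure record filtering -- no model call in synthesis."""
    body = [c for c in claims if c["verifier_status"] == "entailed"]
    review = [c for c in claims if c["verifier_status"] != "entailed"]
    return body, review
```

Because the renderer only rearranges records that already survived validation and entailment checks, it has no way to introduce a fact of its own — which is the narrow sense in which v0 synthesis cannot hallucinate.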

v0 release gate

All four required. No subset is shipped without the rest.

tasks pass: 10/12
hallucinated facts in body: 0
citation validity: 100%
injection resisted: 1/1

Comparison baselines

Published per task. No cherry-picked subsets.

  • Bare model
  • Naive chunking
  • Vector RAG
  • Cumulus

Tasks your local LLM can't finish — finished, with an audit trail you can read line by line.

v0 is in private alpha. Drop your email and we'll send the binary the day it ships.

No, it doesn't expand your context window — physics doesn't budge. It does cumulative work, with receipts.