CortexClaw v4.5 -- Changelog

v4.5 Current 2026-06-10

The knowledge graph goes live in retrieval, the maintenance loop runs itself, and the ranking layer is proven on an external benchmark.

Graph fusion enabled. Entity-graph results now fuse into live retrieval via weighted RRF with the vector head pinned (top-3). Multi-hop full-set hit@k 0.477 → 0.500 and set-recall 0.682 → 0.693, with zero regression on single-hop (R@1 0.993).
Self-maintaining graph. Nightly 05:15 refresh chain: incremental entity re-extraction → graph harvest → community summaries → procedural auto-distill. Kickstart run re-extracted all 758 chunks into a clean 482-entity graph.
Cluster summary tier. Louvain communities over the entity graph with LLM-written summaries - broad "what do I know about X" questions route here instead of chunk retrieval.
Reranker verdict: keep disabled. A domain fine-tuned cross-encoder (925 training pairs, hard negatives) scored R@1 0.701 vs the existing multi-signal blend's 0.989. The blend stays; the experiment is documented.
LongMemEval benchmark. 500-question external benchmark (LongMemEval-S cleaned): R@10 0.882; the engine's blended scoring lifts NDCG@10 0.522 → 0.738 (+41%) over vector-only, with the biggest gains on multi-session and temporal-reasoning questions. Run fully sandboxed; the live store untouched.
Supersession scan clean. Full-store contradiction scan: 0 proposals over 756 chunks after the graph rebuild.

v4.4 2026-06-08

Multi-workspace foundations and a pluggable, compressed index backend.

Scope routing. Memory can now be isolated per workspace (private / shared / global) with origin-stamped chunks. Off by default - today's behavior is byte-identical - and eval-gated so isolation is proven before it's ever enabled.
Pluggable index backends. Retrieval now runs on one of three backends: pure Python, NumPy (vectorized, exact), or TurboVec.
TurboVec live. A 4-bit quantized index, ~26× smaller than the raw vectors (8.70MB → 0.33MB) at recall@5/@10 of 0.975. Auto-rebuilds as memory grows and degrades safely to NumPy then Python if anything is missing.
Backfill. All 764 existing chunks stamped with scope and provenance; router backed up first.

v4.3 2026-05-12

Mesh repair and retrieval hygiene.

v4.1 - v4.1.1 2026-05

Forensic fixes to the feedback and daemon paths.

Repaired the missed-retrieval signal that had been silently zero for hundreds of daemon cycles.
Reworked grading into an 11-axis vector so stability is one input, not the whole grade.
Daemon ingest cap and wallclock guards to keep maintenance cycles bounded.

v4.0 2026-05-17

Nineteen improvements across all twelve layers, generated via 3-agent review and weighted vote.

Atomic WAL writes, dopamine-gated promotion, spreading-activation tiebreakers, cosine reconsolidation.
Prediction-error replay, adaptive reward weights, schema hierarchy, hybrid episodic search.
Topic-aware hot tier, consolidation dedup routing, observer reconciliation, and an adversarial self-test.

v1.0 - v3.x 2026-03 → 2026-05

From a flat-file replacement to a twelve-layer neural architecture.

v1.0 - the foundation: chunking, a router index, local embedding search, fact/narrative split, a feedback loop, rollups, and a maintenance daemon.
v2 - six neural layers added on top: variable decay, association, and the first consolidation behaviors.
v3.x - extended to twelve layers: episodic memory, persistent working memory, consolidation triggers, reward-driven learning, the glial observer network, and decoupled reward normalization, plus scoring-precision fixes.

Version History