CortexClaw v4.5 - Architecture


How It Works

What Is CortexClaw?

What It Is

CortexClaw is a memory system for AI agents. It replaces flat text files with structured, searchable memory chunks that are retrieved on demand. Instead of loading everything every session, the agent queries CortexClaw and gets back only what's relevant.

Think of it as a personal search engine for an AI's memory.

How It Works With the Model

CortexClaw sits alongside the AI model (Claude, in our case) as external memory infrastructure. When the model needs to recall something, it calls CortexClaw's retrieve function. CortexClaw scores all memory chunks by semantic similarity, recency, feedback history, and associative connections, then returns the top results.

The model never loads the full memory -- only what the query pulls up. After the conversation, feedback flows back: which chunks were actually useful, which were noise. This feedback loop teaches the system to retrieve better over time.

What It's Built On

Pure Python. No frameworks and no dependencies beyond the standard library (urllib handles the HTTP calls). Runs on any machine with Python 3.10+. Embeddings are generated locally via Ollama (nomic-embed-text, 384-dim vectors) -- zero API cost.

The router index is a JSONL file (one line per chunk). Chunks are individual markdown files. The daemon runs as a launchd service on macOS, executing maintenance every 2 hours. Total codebase: ~3,700 lines across 6 Python modules.

What Makes It Different

Every layer is modeled after a specific brain mechanism with a cited neuroscience paper. Memories decay at multiple timescales (Benna & Fusi 2016). Replay during idle periods consolidates important memories (Klinzing et al. 2019). Associative connections form between co-accessed chunks (Uytiepo et al. 2025). A dopaminergic reward signal learns from retrieval feedback (Schultz 1997).

As of v4.0, the system includes nineteen improvements across all twelve layers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, prediction-error replay, adaptive GDPO weights, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation, adversarial self-test, and more. The improvements were generated through a 3-agent review and a weighted vote.

Full System Architecture

CortexClaw v4.5

The deep version. A biologically inspired memory system for AI agents: twelve neural layers that turn flat chunk storage into a living memory that consolidates, associates, and evolves over time. For the short version, see the overview.

Deep Test Pass Rate: 97%
v4.0: 184 chunks / 2,398 synapses
19 improvements across 3 tiers

Neural Layers: 12
Memory Chunks
Synapses: 1,317
Token Efficiency: 92%

The Problem

AI agents lose everything between sessions. Loading full memory files burns tokens and lacks prioritization. Flat storage treats all memories equally, missing the patterns that make memory useful.

The Solution

Model memory like the brain does. Fast-decaying attention for recent events, slow-building permanence for important patterns. Memories that fire together wire together. Sleep consolidates. Schemas generalize.

The Base System

What CortexClaw Is

CortexClaw replaces the flat-file memory that most AI agents use. Instead of loading entire memory documents every session (burning thousands of tokens on irrelevant context), it breaks knowledge into small, searchable chunks and retrieves only what's needed.

Traditional Agent Memory

A single MEMORY.md file that grows forever. Every session loads the whole thing. 50KB of text, 90% irrelevant to the current conversation. No prioritization, no forgetting, no association between ideas.

MEMORY.md (47KB) ............ load every session
second-brain.md (12KB) ..... load every session
daily-logs/ (200+ files) .. never loaded
~60,000 tokens burned per session startup

CortexClaw Memory

Knowledge split into focused chunks (~200 words each), indexed by topic and tags, embedded for semantic search. Only relevant chunks are retrieved per query. Typical session loads 3-5 chunks instead of everything.

router.jsonl (index) ....... lightweight scan
chunks/ (37 files) ......... load on demand
embeddings (384-dim) ....... cosine similarity
~5,000 tokens per retrieval (92% savings)

Core Concepts

v1.0 -- The Foundation Layer

Before the neural layers, CortexClaw v1.0 established the core infrastructure that everything builds on. These are the primitives.

Chunking + Router Index

Every piece of knowledge is stored as a chunk -- a small file with a topic, summary, tags, and content. The router is a lightweight index (one JSON line per chunk) that maps IDs to summaries and tags. The agent scans the router to decide which chunks to load, without reading every file.
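The router scan can be sketched in a few lines of standard-library Python. This is a minimal illustration and not CortexClaw's actual code. The `id`, `summary`, and `tags` field names and the `scan_router` helper are assumptions, because the exact router schema isn't given here. The point it shows is that tag filtering happens on the index alone, without opening any chunk file.

```python
import json
from typing import Iterable


def scan_router(lines: Iterable[str], query_tags: set[str]) -> list[str]:
    """Return IDs of router entries that share at least one tag with query_tags.

    Assumes one JSON object per line with hypothetical "id", "summary",
    and "tags" keys. Blank lines are skipped.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if query_tags & set(entry.get("tags", [])):
            hits.append(entry["id"])
    return hits


# Inline stand-in for router.jsonl.
router = [
    '{"id": "c001", "summary": "Model config", "tags": ["model", "config"]}',
    '{"id": "c002", "summary": "Hardware notes", "tags": ["hardware"]}',
    "",
]
print(scan_router(router, {"config"}))
```

In practice `lines` would be an open file handle, e.g. `with open("router.jsonl") as f: scan_router(f, {"config"})`. Only the matching IDs' chunk files would then be read.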

Embedding Search

Each chunk gets a 384-dimensional vector embedding via nomic-embed-text running locally on Ollama (zero API cost). Retrieval computes cosine similarity between the query embedding and all chunk embeddings, returning the top-K most relevant. This is how the system answers "what do I know about X?" without keyword matching.
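The scoring step can be sketched with the standard library alone. This sketch assumes the embeddings have already been fetched from Ollama and cached as plain lists of floats; the call to Ollama itself is not shown. `cosine` and `top_k` are illustrative names, not CortexClaw's API.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Score every chunk against the query and return the k best (id, score) pairs."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-dim vectors standing in for cached 384-dim embeddings.
vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(top_k([1.0, 0.0], vecs, k=2))
```

A linear scan like this is adequate at a few hundred chunks. The cascade, feedback, and synapse terms described below would be blended into these raw similarities.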

Fact / Narrative Split

Chunks have two levels: fact (compressed key points, fast to scan) and narrative (full context with reasoning and background). Quick lookups pull facts only. Deep dives pull both. This alone cuts token usage by ~40% for routine queries.

Feedback Loop

After every retrieval, the system logs which chunks were used (actually referenced in conversation), wasted (retrieved but ignored), and missed (needed but not retrieved). This feedback adjusts future retrieval scoring -- chunks that consistently get used rank higher, wasted chunks sink.

Rollups

When a tag accumulates too many chunks (threshold: 40), the system triggers a rollup -- merging older, lower-stability chunks into a single consolidated chunk. This keeps the total chunk count manageable while preserving the important information. Think of it as compressing old memories into summaries.

Daemon Mode

A background process runs every 2 hours (configurable), executing maintenance: decay calculations, replay cycles, synapse building, and archival. This is the system's "sleep" -- the offline consolidation that makes memories sharper over time. Heartbeat checks trigger it, or it runs via cron.

v1 to v4.0

v1 gave CortexClaw the ability to store, search, and maintain memories efficiently. But it still treated every memory as independent -- no associations, no variable decay rates, no pattern recognition. v2 added six neural layers on top of this foundation. v3.x extended to twelve layers with scoring precision fixes, episodic memory, persistent working memory, consolidation triggers, reward-driven learning, the Glial Network, and L17 GDPO Feedback. v4.0 adds nineteen improvements across three tiers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, synthesis validation, prediction-error replay, semantic relevance scoring, adaptive GDPO weights, sigmoid Eq8 gate, active demotion, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation with quality gate, TTL cache, and adversarial self-test.

v4.0 Architecture

The Neural Layers

Twelve layers built on the v1 foundation. Each addresses a specific biological memory mechanism. Together, they create a system where memories compete, consolidate, and evolve -- just like neurons do.

L11 Cascade Decay

Multi-Timescale Stability

Inspired by Benna & Fusi 2016 (synaptic complexity theory). Instead of one decay number, each memory has three stability tiers that decay at different rates -- like how the brain has fast synaptic changes and slow structural ones.

How It Works

Every memory has three stability tiers: fast (volatile, working memory), medium (session-stable), and slow (permanent knowledge). Each decays at its own rate per day.

The effective stability is a weighted blend: 20% fast + 30% medium + 50% slow. This means the slow tier dominates long-term survival -- a memory must prove its worth over time to persist.

Spaced Repetition

When a memory is accessed, its fast tier resets to 1.0. But the real magic is promotion: each access transfers 5% from fast to medium, and 1% of medium to slow.

A 1-hour cooldown prevents rapid-fire access from gaming the system. Only spaced, genuine retrievals build long-term stability -- just like spaced repetition in human learning.

Fast rate: 0.90/day -- halves in ~6.5 days. Captures recent attention.
Medium rate: 0.97/day -- halves in ~23 days. Session-stable working knowledge.
Slow rate: 0.995/day -- halves in ~138 days. Near-permanent core knowledge.
Archive at: effective stability below 0.30, with a 3-day grace period for new memories.
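The cascade rules above fit in one small class. This is a sketch under stated assumptions, not the shipped module. It takes the rates, blend weights, cooldown, and archive rule from the text, and the medium/slow baselines (0.20 / 0.05) from the Schema Durability section. Three choices are assumptions: a new chunk starts with fast = 1.0; "5% from fast to medium" is read as an additive +0.05 × fast; and promotions are capped at 1.0.

```python
from dataclasses import dataclass
from typing import Optional

RATES = {"fast": 0.90, "medium": 0.97, "slow": 0.995}    # per-day retention
BLEND = {"fast": 0.20, "medium": 0.30, "slow": 0.50}     # effective-stability weights
COOLDOWN_S = 3600      # 1-hour promotion cooldown
ARCHIVE_BELOW = 0.30
GRACE_DAYS = 3


@dataclass
class Stability:
    fast: float = 1.0      # assumed starting value for a new chunk
    medium: float = 0.20   # regular-chunk baseline
    slow: float = 0.05     # regular-chunk baseline
    last_promoted: Optional[float] = None  # unix seconds of the last promotion

    def decay(self, days: float) -> None:
        """Each tier decays at its own per-day rate."""
        self.fast *= RATES["fast"] ** days
        self.medium *= RATES["medium"] ** days
        self.slow *= RATES["slow"] ** days

    def effective(self) -> float:
        """Weighted blend: 20% fast + 30% medium + 50% slow."""
        return (BLEND["fast"] * self.fast + BLEND["medium"] * self.medium
                + BLEND["slow"] * self.slow)

    def access(self, now: float) -> None:
        """Reset fast to 1.0; promote into medium/slow only outside the cooldown."""
        self.fast = 1.0
        if self.last_promoted is not None and now - self.last_promoted < COOLDOWN_S:
            return
        self.medium = min(1.0, self.medium + 0.05 * self.fast)   # 5% of fast -> medium
        self.slow = min(1.0, self.slow + 0.01 * self.medium)     # 1% of medium -> slow
        self.last_promoted = now


def should_archive(s: Stability, age_days: float) -> bool:
    """Archive below 0.30 effective stability, never inside the 3-day grace period."""
    return age_days > GRACE_DAYS and s.effective() < ARCHIVE_BELOW
```

With these assumed starting values, a brand-new chunk sits at 0.285 effective stability, just under the archive line. It therefore depends on the grace period until its first spaced access lifts it to about 0.301.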

L8 Associative Mesh

Memories That Wire Together

Inspired by multi-synaptic boutons research (Uytiepo et al. 2025). Memories form weighted connections through three mechanisms: co-access, semantic similarity, and temporal proximity. Retrieving one memory primes related memories through spreading activation.

Three Synapse Types

Co-access: When two memories are retrieved together, a synapse forms between them (+0.15 weight each time). The more they co-occur, the stronger the link.

Semantic: During maintenance, chunks sharing tags are compared by cosine similarity. If above 0.65, a semantic synapse is created.

Temporal: Chunks ingested within 1 hour of each other get weak temporal links (0.20 weight), capturing contextual proximity.
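The three formation rules can be sketched as methods on a small graph object. This is illustrative storage, not CortexClaw's. The +0.15 step, the 0.65 threshold, the 0.20 temporal weight, and the 1-hour window come from the text, and the temporal rule also applies the shared-tag filter from the v3.1 section. The 1.0 ceiling matches the weight range reported in the Synapse Atlas. Two choices are assumptions: using the cosine value as the semantic weight, and keeping the stronger weight when rules overlap.

```python
CO_ACCESS_STEP = 0.15      # added each time two chunks are retrieved together
SEMANTIC_MIN_COS = 0.65    # cosine threshold for a semantic synapse
TEMPORAL_WEIGHT = 0.20     # fixed weight for temporal links
TEMPORAL_WINDOW_S = 3600   # ingested within 1 hour of each other


class SynapseGraph:
    """Undirected weighted synapses keyed by chunk-ID pairs (illustrative only)."""

    def __init__(self) -> None:
        self.w: dict[frozenset, float] = {}

    def weight(self, a: str, b: str) -> float:
        return self.w.get(frozenset((a, b)), 0.0)

    def _raise_to(self, a: str, b: str, value: float) -> None:
        # Keep the stronger of the existing and proposed weight, capped at 1.0.
        if a == b:
            return
        key = frozenset((a, b))
        self.w[key] = max(self.w.get(key, 0.0), min(1.0, value))

    def co_access(self, retrieved_ids: list[str]) -> None:
        """+0.15 for every pair of chunks returned by the same retrieval."""
        for i, a in enumerate(retrieved_ids):
            for b in retrieved_ids[i + 1:]:
                self._raise_to(a, b, self.weight(a, b) + CO_ACCESS_STEP)

    def semantic(self, a: str, b: str, cos_sim: float,
                 tags_a: set[str], tags_b: set[str]) -> None:
        """Only tag-sharing chunks are compared; above 0.65 cosine they link."""
        if tags_a & tags_b and cos_sim > SEMANTIC_MIN_COS:
            self._raise_to(a, b, cos_sim)   # assumption: weight = similarity

    def temporal(self, a: str, b: str, t_a: float, t_b: float,
                 tags_a: set[str], tags_b: set[str]) -> None:
        """Weak 0.20 link for close ingest times, requiring >= 1 shared tag."""
        if abs(t_a - t_b) <= TEMPORAL_WINDOW_S and tags_a & tags_b:
            self._raise_to(a, b, TEMPORAL_WEIGHT)
```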

Spreading Activation

During retrieval, after the initial top-K scoring, spreading activation kicks in: each top result sends a signal through its synapses, boosting connected chunks that might not have scored high on their own.

This means asking about "hardware" can pull in "model config" if they've been frequently co-accessed -- just like how thinking about one topic naturally reminds you of related ones.

Synapse weights decay at 0.95/day and are pruned below 0.05. Each chunk keeps at most 15 connections to limit noise.
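Spreading activation and the synapse upkeep rules can be sketched together. The spread factor of 0.10 and the `top_n` cutoff are hypothetical, because the text gives no values for them. The 0.95/day decay, the 0.05 prune line, the 15-connection cap, and the 1.0 score ceiling (from v3.1) are taken from the document. The adjacency is a plain dict of dicts purely for illustration.

```python
SPREAD_FACTOR = 0.10   # hypothetical; the document does not state this value
DECAY_PER_DAY = 0.95
PRUNE_BELOW = 0.05
MAX_DEGREE = 15

Adjacency = dict[str, dict[str, float]]   # chunk -> {neighbour: weight}


def spread(scores: dict[str, float], adjacency: Adjacency,
           top_n: int = 5) -> dict[str, float]:
    """Each top result boosts its neighbours by score x synapse weight x factor."""
    boosted = {cid: min(1.0, s) for cid, s in scores.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:top_n]
    for src in top:
        for dst, weight in adjacency.get(src, {}).items():
            bump = SPREAD_FACTOR * scores[src] * weight
            boosted[dst] = min(1.0, boosted.get(dst, 0.0) + bump)   # v3.1 ceiling
    return boosted


def maintain(adjacency: Adjacency, days: float) -> None:
    """Decay weights, prune links below 0.05, keep each chunk's 15 strongest."""
    factor = DECAY_PER_DAY ** days
    for src in list(adjacency):
        kept = {dst: w * factor for dst, w in adjacency[src].items()
                if w * factor >= PRUNE_BELOW}
        strongest = sorted(kept.items(), key=lambda item: item[1], reverse=True)
        adjacency[src] = dict(strongest[:MAX_DEGREE])


# "hardware" pulls in "model-config" through a strong co-access synapse.
print(spread({"hardware": 0.80, "gpu": 0.70}, {"hardware": {"model-config": 0.9}}))
```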

L9 Reconsolidation Window

Update, Don't Duplicate

Inspired by Nader et al. 2000 (memory reconsolidation). When a memory is retrieved, it enters a labile state for 2 hours. If new content arrives with overlapping tags, it merges into the existing chunk rather than creating a new one.

Why This Matters

Without reconsolidation, asking "remember that the model is Opus" twice creates two nearly-identical chunks. With reconsolidation, the second fact merges into the existing chunk if it was recently accessed.

The Jaccard overlap of tags must be at least 40% to trigger a merge -- this prevents unrelated facts from contaminating each other. After reconsolidation, the chunk's embedding is invalidated and will be re-computed on next retrieval.

The chunk also gets a cascade promotion on merge, since the brain treats reconsolidation as reinforcement.
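The merge decision reduces to two checks: the chunk must still be inside its 2-hour labile window, and the tag Jaccard overlap must be at least 40%. The sketch below shows only that decision; the function names are illustrative. The merge itself (appending content, invalidating the embedding, applying the cascade promotion) is not shown.

```python
LABILE_WINDOW_S = 2 * 3600   # labile for 2 hours after retrieval
MIN_TAG_JACCARD = 0.40


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_merge(last_retrieved: float, now: float,
                 chunk_tags: set[str], new_tags: set[str]) -> bool:
    """Merge into the existing chunk only inside the window with >= 40% tag overlap."""
    in_window = 0.0 <= now - last_retrieved <= LABILE_WINDOW_S
    return in_window and jaccard(chunk_tags, new_tags) >= MIN_TAG_JACCARD
```

For example, a chunk tagged `{model, config}` and a new fact tagged `{model, opus}` share one of three distinct tags (33%). They stay separate even inside the window.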

L7 Schema Priming

Hot Topics Get Priority

Inspired by Zaki & Cai 2025 (excitability priming / pre-allocation). The system tracks which topic clusters are currently "hot" -- being actively retrieved or ingested. New memories matching hot schemas get higher initial stability, while truly novel memories get flagged for priority replay.

The Two Paths

Schema match: If new memory tags overlap with hot schemas (temperature above 0.30), the memory starts with boosted medium-tier stability (0.35 vs 0.20) and higher feedback score (1.10 vs 1.00). This reflects the brain's tendency to integrate new info faster when it fits existing mental models.

Novel memory: If no schema matches, the memory is flagged as "novel" and gets priority in the replay engine (L6). Novel memories need more consolidation passes because they don't have an existing framework to attach to -- the brain processes truly new information differently from familiar-pattern information.

Schema temperature decays roughly 50% every 12 hours. Dead schemas (below 0.01) are pruned automatically during maintenance.
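A sketch of the two paths, under two assumptions. First, schemas are identified by tag name. Second, "roughly 50% every 12 hours" is modelled as exponential decay with a 12-hour half-life. The 0.30 hot threshold, the 0.35/0.20 medium baselines, the 1.10/1.00 feedback scores, and the 0.01 prune line come from the text.

```python
HOT_THRESHOLD = 0.30   # schemas above this temperature count as "hot"
HALF_LIFE_H = 12.0     # assumed exponential decay with a 12-hour half-life
DEAD_BELOW = 0.01


def cool(temps: dict[str, float], hours: float) -> dict[str, float]:
    """Decay every schema temperature and drop dead schemas."""
    factor = 0.5 ** (hours / HALF_LIFE_H)
    cooled = {name: t * factor for name, t in temps.items()}
    return {name: t for name, t in cooled.items() if t >= DEAD_BELOW}


def initial_state(tags: set[str], temps: dict[str, float]) -> dict[str, object]:
    """Starting medium stability and feedback for a new memory."""
    hot = {name for name, t in temps.items() if t > HOT_THRESHOLD}
    if tags & hot:   # assumption: a schema is named by a tag
        return {"medium": 0.35, "feedback": 1.10, "novel": False}
    # No hot schema matched: flag as novel so the replay engine (L6) prioritises it.
    return {"medium": 0.20, "feedback": 1.00, "novel": True}
```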

L6 Replay Engine

Sleep Consolidation

Inspired by Klinzing et al. 2019 (memory replay during sleep). During daemon maintenance cycles (every 2 hours), the system "replays" the 10 most important recent memories -- re-embedding them, strengthening their synapses, and gently promoting them through the cascade.

Why Replay Matters

In the brain, sleep replay is when the hippocampus "replays" recent experiences to the neocortex, transferring short-term memories into long-term storage. CortexClaw simulates this during its daemon maintenance cycles.

Replay promotion is deliberately gentler than direct access (+0.02 medium vs +0.05 for real retrieval). This prevents the daemon from artificially inflating memories that were never actually useful. The system also uses a lower similarity threshold during replay (0.50 vs 0.65 for semantic synapses) because the brain is more associative during sleep, forming connections it wouldn't make while "awake."

L10 Schema Synthesis

From Facts to Principles

Inspired by Spens & Burgess 2024 (generative model of memory). During rollup merges, a local LLM distills merged chunks into generalized behavioral patterns and principles -- not fact lists, but schemas. These schema chunks decay slower and start with higher baseline stability.

Schema vs Fact

Regular chunks store facts: specific details, configurations, names, dates. Schema chunks store patterns: generalized behavioral rules, workflow principles, recurring preferences.

The LLM prompt explicitly asks for 1-3 concise sentences capturing behavioral patterns, not bullet lists. Temperature is kept low (0.3) for reliability.

Schema Durability

Schema chunks start with higher baseline stability (medium=0.60 vs 0.20 for regular chunks, slow=0.30 vs 0.05). They also use slower decay rates across all tiers.

This reflects how the brain treats generalized knowledge: specific episode details fade, but the patterns extracted from them persist far longer.

v3.1 -- Precision Tuning

Scoring Quality Fixes

Four targeted fixes that eliminated scoring artifacts and noise, taking retrieval quality from good to surgical.

Relative Gap Off-Topic Threshold

Replaced the absolute MIN_SCORE_THRESHOLD=0.55 with a relative pre-spreading check. Now requires top score ≥ 0.72 and gap ≥ 0.03 between top results. Catches off-topic queries that the old absolute threshold missed because spreading activation could inflate scores past 0.55.

Score Ceiling Cap at 1.0

Enforces a hard ceiling at 1.0 on all scores -- both pre and post spreading activation. Previously, synapse boosting could push scores above 1.0, creating misleading confidence signals and breaking relative ranking between results.
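The off-topic gate and the ceiling are both small pure functions. The thresholds (0.72 and 0.03) come from the text. "Gap between top results" is interpreted here as the first pre-spreading score minus the second. That reading is an assumption, and a gap against the mean of the remaining results would be an equally plausible implementation.

```python
MIN_TOP_SCORE = 0.72
MIN_GAP = 0.03


def is_on_topic(pre_spread_scores: list[float]) -> bool:
    """Relative check on raw scores, run before spreading activation."""
    if not pre_spread_scores:
        return False
    ranked = sorted(pre_spread_scores, reverse=True)
    top = ranked[0]
    # Assumption: the gap is top-1 minus top-2 (a single result counts as its own gap).
    gap = top - ranked[1] if len(ranked) > 1 else top
    return top >= MIN_TOP_SCORE and gap >= MIN_GAP


def apply_ceiling(scores: dict[str, float]) -> dict[str, float]:
    """Hard 1.0 ceiling, applied both before and after spreading activation."""
    return {cid: min(score, 1.0) for cid, score in scores.items()}
```

Running the gate on pre-spreading scores is what prevents the failure mode described above: synapse boosts can no longer lift an off-topic query over the line.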

Schema Heat Quality Gate

Only the top-2 retrieval results update schema heat. Previously, all returned results warmed their schemas, which meant low-relevance tail results were polluting the heat map and causing schema drift on unrelated topics.

Temporal Synapse Tag-Shared Filter

Temporal synapses now require ≥ 1 shared tag between chunks. Killed approximately 36% of noise synapses that were forming between temporally proximate but semantically unrelated chunks.

Top-1 Accuracy: 91%
MRR: 1.000
Off-Topic Rejection: 100%
Noise Synapses: -36%

L12 Episodic Buffer

Raw Memory Traces

Inspired by Tulving 1972 (episodic memory) and Baddeley 2000 (episodic buffer). The episodic buffer maintains vivid, temporally-tagged traces that semantic memory (chunks) distills from. SQLite + FTS5 full-text search provides instant recall of raw conversation history.

Brain region: Medial Temporal Lobe / Hippocampal formation
Papers: Tulving 1972 -- "Episodic and semantic memory"; Baddeley 2000 -- "The episodic buffer"
Engine: SQLite + FTS5 with porter stemming + unicode tokenizer
Entries: 74+ ingested from daily logs + chunk files
Database: 120KB total, zero API cost

How It Works

Integrated directly into retrieve() -- every CortexClaw query gets supplemental episodic hits appended alongside semantic chunk results. The FTS5 engine searches raw conversation text using porter stemming, catching exact phrases and context that embedding search misses.

The maintain cycle auto-syncs new daily logs, keeping the episodic buffer current without manual intervention.
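A minimal episodic buffer needs only Python's built-in `sqlite3` module, provided the bundled SQLite was compiled with FTS5 (true for most Python distributions). The table name, columns, and helper functions below are hypothetical. The tokenizer string `porter unicode61` is SQLite's standard way to wrap the unicode61 tokenizer with Porter stemming, and `ORDER BY rank` sorts by FTS5's default BM25 ranking.

```python
import sqlite3


def open_buffer(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Porter stemming wrapped around the unicode61 tokenizer.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS episodes USING fts5("
        "ts UNINDEXED, source UNINDEXED, body, tokenize = 'porter unicode61')"
    )
    return conn


def add_episodes(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    conn.executemany("INSERT INTO episodes (ts, source, body) VALUES (?, ?, ?)", rows)
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    """BM25-ranked hits with a short highlighted snippet of the body column."""
    return conn.execute(
        "SELECT ts, source, snippet(episodes, 2, '[', ']', '...', 8) "
        "FROM episodes WHERE episodes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = open_buffer()
add_episodes(conn, [
    ("2026-03-10T14:02", "daily-log", "Leon deployed the gateway to the staging server"),
    ("2026-03-11T09:30", "daily-log", "Compared embedding models for retrieval quality"),
])
print(search(conn, "deploying"))         # stemming: "deploying" and "deployed" share a stem
print(search(conn, '"staging server"'))  # double quotes make an exact phrase query
```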

Why It Matters

Semantic chunks are distilled and compressed -- great for patterns, but they lose the raw texture of conversations. The episodic buffer preserves the vivid, temporally-tagged traces that chunks were derived from.

When you need "what did Leon say about X last Tuesday?" rather than "what does the system know about X?", the episodic buffer delivers. Full-text search complements embedding similarity with exact phrase matching.

L13 Prefrontal Index

Persistent Working Memory

Inspired by Goldman-Rakic 1995 (persistent activity in working memory). The prefrontal index is a frozen hot tier: a small block that is always injected and bypasses retrieval entirely, much as persistent firing in the dlPFC maintains task-critical information without requiring reactivation.

Brain region: Dorsolateral Prefrontal Cortex (dlPFC)
Paper: Goldman-Rakic 1995 -- "Persistent activity in working memory"
Size: ~3000 chars, 10 entries
Contents: identity, key people, active projects, critical config
Bypass: injected on every session start -- no retrieval needed

Auto-Refresh

The prefrontal index refreshes during maintain cycles. Entries are scored by a composite of schema heat, access count, cascade stability, and feedback signals. The top 10 entries are frozen into the index.

Supports manual pinning -- critical entries can be locked in place regardless of scoring. Identity and key relationship information is always pinned.
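One way to freeze the index is sketched below. The 10-entry and ~3000-character limits and the pinned-first rule come from the text. The equal 0.25 weighting of the four signals and the assumption that heat, stability, and feedback are already scaled to 0..1 are hypothetical, and so are the `Candidate` and `build_index` names.

```python
from dataclasses import dataclass

MAX_ENTRIES = 10
MAX_CHARS = 3000


@dataclass
class Candidate:
    text: str
    heat: float        # schema heat, assumed 0..1
    accesses: int
    stability: float   # effective cascade stability, 0..1
    feedback: float    # feedback signal, assumed already scaled to 0..1
    pinned: bool = False


def build_index(cands: list[Candidate]) -> list[str]:
    """Pinned entries first, then the best-scoring rest, within 10 entries / 3000 chars."""
    max_acc = max((c.accesses for c in cands), default=0) or 1

    def score(c: Candidate) -> float:
        # Hypothetical equal weighting of the four signals named in the text.
        return 0.25 * (c.heat + c.accesses / max_acc + c.stability + c.feedback)

    ordered = [c for c in cands if c.pinned] + sorted(
        (c for c in cands if not c.pinned), key=score, reverse=True)
    chosen: list[str] = []
    used_chars = 0
    for c in ordered:
        if len(chosen) == MAX_ENTRIES:
            break
        if not c.pinned and used_chars + len(c.text) > MAX_CHARS:
            continue   # skip unpinned entries that would exceed the budget
        chosen.append(c.text)
        used_chars += len(c.text)
    return chosen
```

Placing pinned entries first means identity and key relationships claim the budget before any scored entry competes for it.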

The dlPFC Analogy

In the brain, the dorsolateral prefrontal cortex maintains persistent neural firing patterns for task-critical information -- your name, what you're working on, who you're talking to. This information doesn't need to be "remembered" each time; it's always active.

The prefrontal index does the same thing: the most essential context is pre-loaded into every session, ensuring the agent never needs to search for its own identity or current priorities.

L14 Sharp-Wave Ripple

State-Transition Consolidation

Inspired by Buzsáki 2015 (hippocampal sharp-wave ripples). A pre-compression flush trigger that fires a consolidation pass before context compaction, extracting decisions, facts, action items, corrections, and insights before they're lost.

Brain region: Hippocampus CA3 → CA1 → Neocortex
Paper: Buzsáki 2015 -- "Hippocampal sharp-wave ripples"
Trigger: fires before context compaction (state transition)
Architecture: two-tier, with a rule-based fast path and an agent-assisted deep path
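The rule-based fast path can be pictured as keyword rules run over each line of the transcript before compaction. The patterns below are hypothetical stand-ins, since the real rules aren't documented here, and the agent-assisted deep path is not shown. The five categories are the ones the text lists.

```python
import re

# Hypothetical keyword rules standing in for the real fast-path rules.
FAST_RULES = {
    "decision": re.compile(r"\b(we decided|decided to|going with|let's use)\b", re.I),
    "fact": re.compile(r"\b(is set to|runs on|lives at|version \d)", re.I),
    "action_item": re.compile(r"\b(todo|action item|need to|remind me)\b", re.I),
    "correction": re.compile(r"\b(actually|correction|that was wrong)\b", re.I),
    "insight": re.compile(r"\b(turns out|lesson learned|realized)\b", re.I),
}


def fast_path(transcript: str) -> dict[str, list[str]]:
    """Tag each non-empty line with every category whose rule matches."""
    found: dict[str, list[str]] = {kind: [] for kind in FAST_RULES}
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        for kind, rule in FAST_RULES.items():
            if rule.search(line):
                found[kind].append(line)
    return found
```

A line can land in several categories. Deduplication and anything that needs real judgment would fall to the deep path.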

L15 Dopaminergic Signal

Reward-Driven Learning

Inspired by Schultz 1997 (reward prediction error) and Lisman & Grace 2005 (hippocampal-VTA loop). The reward signal that makes the whole system learn from experience -- auto-generating feedback from retrieval patterns to drive decay tuning, tag expansion, and synapse strengthening.

Brain region: Ventral Tegmental Area (VTA)
Papers: Schultz 1997 -- "Reward prediction error"; Lisman & Grace 2005 -- "Hippocampal-VTA loop"
Mechanism: retrieval feedback loop that auto-generates "used" and "wasted" signals
Drives: decay tuning, tag expansion, synapse strengthening

Feedback Loop

Every retrieval generates implicit feedback: chunks that appear in the agent's response are marked "used" (positive reward). Chunks retrieved but never referenced are marked "wasted" (negative signal).

These signals feed back into the cascade decay system -- used chunks get stability boosts, wasted chunks get accelerated decay. Over time, the system naturally surfaces useful memories and buries noise.

Enhanced Analytics

v4.0 adds granular tracking: per-query chunk access patterns, conversation clustering, feedback-to-decay integration, prediction-error replay (L6), and proactive synthesis (L10). Together these give a richer signal than a simple used/wasted binary.

The VTA loop analogy is precise: dopamine neurons fire when outcomes exceed expectations (chunk was useful) and suppress when outcomes disappoint (chunk was irrelevant). The prediction error drives learning.

L16 Glial Network

The Silent Processors

Inspired by Allen & Lyons 2018 (glia as architects of CNS formation). Three specialized observer agents decompose every memory at ingest time, extracting structured facts, contextual patterns, and emotional valence. Like glial cells in the brain -- long dismissed as passive scaffolding, now known to actively modulate synapses, regulate neurotransmitter uptake, and coordinate neural activity across regions.

Brain region: throughout the CNS -- astrocytes in gray matter, oligodendrocytes in white matter, microglia everywhere
Paper: Allen & Lyons, 2018 -- "Glia as architects of central nervous system formation and function"
Model: custom LM via Ollama (local, zero API cost, ~10s per chunk)

The Three Glial Agents

Astrocytes (Fact Hunter): Like astrocytes providing structural and metabolic support to neurons, this agent extracts the hard facts -- entities, configuration values, names, technical details, and relationships between them.

Oligodendrocytes (Context Weaver): Like oligodendrocytes wrapping axons in myelin to speed signal propagation, this agent wraps raw facts in context -- identifying patterns, implications, and connections to existing knowledge.

Microglia (Emotion Tagger): Like microglia surveilling the CNS for threats and damage, this agent monitors for emotional valence, urgency signals, and motivational context -- marking memories that carry threat, reward, or importance signals.

Why It Matters

Raw chunks are flat text. The Glial Network transforms them into structured, multi-dimensional representations before they enter the memory system. This means retrieval can match not just on content, but on extracted entities, identified patterns, and emotional context.

The biological parallel is precise: glial cells are roughly as numerous as neurons in the human brain (about 1:1). They don't fire action potentials, but nothing works without them. They modulate synaptic transmission, clear neurotransmitters, maintain the blood-brain barrier, and guide neural development. CortexClaw's Glial Network does the same kind of preprocessing work, making downstream neural operations (retrieval, replay, reconsolidation) more effective.

L17 GDPO Feedback

Decoupled Reward Normalization

Inspired by Liu et al. 2026 (GDPO) and Padoa-Schioppa & Assad 2006 (multi-attribute value coding in OFC). Previously, the feedback system collapsed three independent signals into a single scalar score -- losing information and rewarding the wrong things. L17 separates, normalizes, and gates each dimension independently.

Brain region: Orbitofrontal Cortex (OFC)
Papers: Liu et al. 2026 -- "GDPO: Group Decoupled Policy Optimization"; Padoa-Schioppa & Assad 2006 -- "Neurons in the orbitofrontal cortex encode economic value"
Signals: used (50%), wasted (35%), missed (15%), normalized per dimension before combination
Gating: reward conditioning gate; if the wasted score exceeds the threshold, the used reward is zeroed entirely. 2 chunks are currently gated.

Decoupled Normalization + Reward Conditioning Gate

Previously, the feedback system collapsed three independent signals (used, wasted, missed) into a single scalar score. This caused information loss -- a chunk that was heavily used and frequently wasted would get the same score as a chunk that was moderately used and never wasted.

L17 tracks each dimension independently, normalizes per-dimension before combining with explicit weights (50 / 35 / 15), and applies a reward conditioning gate: if a chunk's wasted score exceeds the threshold, its used reward is zeroed entirely. This is the "don't reward efficiency unless correctness is met first" pattern from the GDPO paper.
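The decoupled scoring can be sketched as follows. The 50/35/15 weights and the gate behaviour come from the text. Three things are assumptions: min-max normalization across chunks (a z-score per dimension would be an equally valid choice), the gate threshold of 0.70, and treating "missed" as a positive signal, on the reasoning that a missed chunk was under-retrieved and should rank higher.

```python
WEIGHTS = {"used": 0.50, "wasted": 0.35, "missed": 0.15}
GATE_THRESHOLD = 0.70   # hypothetical value for the reward conditioning gate


def _minmax(values: list[float]) -> list[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def gdpo_scores(stats: dict[str, dict[str, float]]) -> dict[str, float]:
    """Normalize each signal separately across chunks, gate, then combine."""
    if not stats:
        return {}
    ids = list(stats)
    norm = {dim: dict(zip(ids, _minmax([stats[c][dim] for c in ids])))
            for dim in WEIGHTS}
    scores = {}
    for c in ids:
        used, wasted, missed = norm["used"][c], norm["wasted"][c], norm["missed"][c]
        if wasted > GATE_THRESHOLD:
            used = 0.0   # gate: no usage reward for a heavily wasted chunk
        scores[c] = (WEIGHTS["used"] * used
                     - WEIGHTS["wasted"] * wasted
                     + WEIGHTS["missed"] * missed)   # assumption: missed boosts
    return scores
```

In this sketch, two chunks used equally often no longer tie when one of them is also frequently wasted. The gate drops the wasted one below a chunk that is barely used but rarely wasted.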


RCL-Inspired Additions

Batch consensus: signals are scaled by agreement ratio. If 1/5 queries wasted a chunk, the penalty attenuates; if 5/5 did, the full penalty applies.
Optimizer state: a rolling ledger across maintain cycles. It detects oscillation (boost then penalize within 2 cycles), dampens it by 50%, and tracks the optimization phase.
Contrastive analysis: finds chunks with high score variance across similar queries and flags them for tag narrowing instead of deletion, preserving knowledge while improving precision.
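The consensus and oscillation rules can each be sketched as a short function. Linear scaling by the agreement ratio is an assumption; the text says only that the penalty attenuates. An oscillation is read here as a sign flip against either of the last two cycle updates. Both function names are hypothetical.

```python
def consensus_penalty(base_penalty: float, wasted_votes: int, batch_size: int) -> float:
    """Scale a 'wasted' penalty by the share of batch queries that agreed (linear, assumed)."""
    if batch_size <= 0:
        return 0.0
    return base_penalty * (wasted_votes / batch_size)


def damp_if_oscillating(history: list[float], update: float, window: int = 2) -> float:
    """Halve an update whose sign flips against any of the last `window` cycles."""
    if any(prev * update < 0 for prev in history[-window:]):
        return update * 0.5
    return update
```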

Associative Network

Synapse Atlas

Real-time view of CortexClaw's associative mesh -- 1,317 synaptic connections linking 99 memory chunks into a living network.

Total Synapses: 1,317
Connected Chunks: 99
Avg Weight: 0.393

Synapse Types
Temporal: 724
Co-access: 249
Semantic: 201
Replay: 143

Weight Distribution
Range: 0.050 - 1.0
Density: 0.744
Mean: 0.393
Orphans: 0


Live System Stats

CortexClaw v4.0 -- Status

Active Chunks: 184
Synapses: 2,398
Archived Chunks: 347
Cached Embeddings: 768
Router Entries: 111
Embed Coverage: 100%
Avg Effective Stability: 0.320
Avg Tokens/Query: 1,221

Full Layer Reference

Layer	Name	Brain Region	Mechanism	Status
L6	Replay Engine	Hippocampus	Sleep consolidation	active
L7	Schema Priming	mPFC	Excitability pre-allocation	active
L8	Associative Mesh	Neocortex	Synapse graph + spreading activation	active
L9	Reconsolidation	Amygdala-Hippocampus	2hr lability window	active
L10	Schema Synthesis	vmPFC	LLM generative consolidation	active
L11	Cascade Decay	Synaptic complex	Multi-timescale stability	active
L12	Episodic Buffer	Medial Temporal Lobe	FTS5 full-text search	active
L13	Prefrontal Index	dlPFC	Persistent working memory	active
L14	Sharp-Wave Ripple	Hippocampus CA3→CA1	State-transition consolidation	active
L15	Dopaminergic Signal	VTA	Reward-driven learning	active
L16	Glial Network	Throughout CNS	Observer agent decomposition	active
L17	GDPO Feedback	Orbitofrontal Cortex	Decoupled reward normalization	active

Research References

L6 Replay: Klinzing, Niethard & Born, 2019 -- "Mechanisms of systems memory consolidation during sleep"
L7 Priming: Zaki & Cai, 2025 -- "Pre-allocation and excitability priming in memory encoding"
L8 Mesh: Uytiepo et al., 2025 -- "Multi-synaptic boutons and associative plasticity"
L9 Reconsolidation: Nader, Schafe & LeDoux, 2000 -- "Fear memories require protein synthesis for reconsolidation"
L10 Synthesis: Spens & Burgess, 2024 -- "A generative model of memory construction and consolidation"
L11 Cascade: Benna & Fusi, 2016 -- "Computational principles of synaptic memory consolidation"
L12 Episodic: Tulving, 1972 -- "Episodic and semantic memory"; Baddeley, 2000 -- "The episodic buffer"
L13 Prefrontal: Goldman-Rakic, 1995 -- "Persistent activity in working memory"
L14 Ripple: Buzsáki, 2015 -- "Hippocampal sharp-wave ripples"
L15 Dopamine: Schultz, 1997 -- "Reward prediction error"; Lisman & Grace, 2005 -- "Hippocampal-VTA loop"
L16 Glial: Allen & Lyons, 2018 -- "Glia as architects of central nervous system formation and function"
L17 GDPO: Liu et al., 2026 -- "GDPO: Group Decoupled Policy Optimization"; Padoa-Schioppa & Assad, 2006 -- "Neurons in the orbitofrontal cortex encode economic value"

Token Efficiency Breakdown

Legacy approach: load full MEMORY.md + second-brain.md + daily logs on every session start. Estimated 50,000 tokens per startup.

CortexClaw v4.0: embed-based retrieval with cascade weighting, episodic buffer supplementation, prefrontal index injection, Glial Network decomposition, hybrid episodic search, and a topic-aware hot tier. The system holds 184 active chunks, 2,398 synapses, and 768 cached embeddings. The Glial Network adds zero retrieval overhead because decomposition happens at ingest time, not query time. v4.0's SWR dedup routing and TTL cache further reduce redundant retrievals.

The associative mesh further improves relevance by surfacing connected memories that pure cosine similarity would miss, reducing the need for follow-up queries.

The Peripheral System

Nervous System

CortexClaw is the brain. The Nervous System is everything between the outside world and that brain -- classifying, filtering, compressing, and caching inputs before they ever reach the context window.

Tests Passing: 70/70
4 phases, all components operational
2026-03-22

Every message passes through this pipeline before Claude sees it. Most never make it through. The system's default posture is block -- information must earn its way into the context window.

Phase 1 -- Spinal Cord

Classification + Reflexes

Afferent Classifier

Multi-pass weighted scoring across the full message. "hey can you fix the server" scores greeting at 0.20 AND command at 0.60 -- command wins. No first-match-wins bugs. Pure rules, zero LLM calls, sub-millisecond.

Type: greeting, farewell, ack, status, command, question, error, discussion
Priority: critical, urgent, normal, low
Domain: code, system, web, memory, social, conversation
Fiber: Aa (critical) / Ab (important) / B (normal) / C (background)
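The multi-pass idea is that every rule is evaluated and weights accumulate per type, so no single early match decides the type. In the sketch below the rule table is hypothetical, chosen only to reproduce the worked example (greeting 0.20, command 0.60). Falling back to "discussion" when nothing matches is also an assumption. Priority, domain, and fiber assignment are not shown.

```python
import re

# Hypothetical rule table: (type, pattern, weight). Every rule is checked,
# and weights accumulate per type instead of first-match-wins.
RULES = [
    ("greeting", re.compile(r"^\s*(hey|hi|hello)\b", re.I), 0.20),
    ("command", re.compile(r"\b(can you|please)\b", re.I), 0.30),
    ("command", re.compile(r"\b(fix|deploy|restart|run)\b", re.I), 0.30),
    ("question", re.compile(r"\?\s*$"), 0.40),
    ("error", re.compile(r"\b(error|traceback|exception|failed)\b", re.I), 0.70),
]


def classify(message: str) -> tuple[str, dict[str, float]]:
    """Return the highest-scoring type plus the per-type score table."""
    scores: dict[str, float] = {}
    for kind, pattern, weight in RULES:
        if pattern.search(message):
            scores[kind] = scores.get(kind, 0.0) + weight
    if not scores:
        return "discussion", scores   # assumed fallback type
    return max(scores, key=scores.get), scores


print(classify("hey can you fix the server"))
```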

Reflex Engine

Local handling for simple patterns. If a reflex can handle the input, Claude never sees it. Persona-aware responses match Rurik's voice.

Monosynaptic: greetings, acknowledgments, farewells -- template responses or silent log
Polysynaptic: time queries -- system call + formatted response
LLM-assisted: memory recall reflexes -- custom LM lookup + response (planned)

Phase 2 -- Sensory Filters

Habituation + Compression

Habituation Engine

Sensory gating -- suppresses duplicate inputs using hash-based dedup with type-aware thresholds. A greeting repeated 3 times gets suppressed. A command repeated 3 times does not -- you might legitimately deploy 5 times in a row.

Greeting: suppress after 2 repeats
Status: suppress after 3 repeats
Question: suppress after 5 repeats
Command: never suppress (threshold 999)

Time-based dishabituation: 1+ hour gap resets the counter. The input becomes novel again.
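A sketch of the habituation counter. The thresholds and the 1-hour dishabituation gap come from the text. Suppression triggers once the number of repeats (occurrences after the first) exceeds the type's threshold, so a greeting's third repeat is the first one suppressed. Two details are assumptions: keying the hash on type plus normalized text, and never suppressing unlisted types such as errors, which follows the "errors never get suppressed" principle.

```python
import hashlib
import time
from typing import Optional

# Repeats beyond these counts are suppressed; unlisted types (e.g. error) never are.
THRESHOLDS = {"greeting": 2, "status": 3, "question": 5, "command": 999}
DISHABITUATE_AFTER_S = 3600   # a 1+ hour gap makes the input novel again


class Habituation:
    def __init__(self) -> None:
        self.seen: dict[str, tuple[int, float]] = {}   # hash -> (count, last_seen)

    def should_suppress(self, text: str, kind: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = hashlib.sha256(f"{kind}\x00{text.strip().lower()}".encode()).hexdigest()
        count, last_seen = self.seen.get(key, (0, now))
        if now - last_seen >= DISHABITUATE_AFTER_S:
            count = 0   # dishabituation: the counter resets
        count += 1
        self.seen[key] = (count, now)
        threshold = THRESHOLDS.get(kind)
        if threshold is None:
            return False
        return count - 1 > threshold   # repeats = occurrences after the first
```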

Compressor

Domain-specific compression via a custom LM model running locally. Three modes with tailored prompts and hard character budgets that force concise output.

Tool output: strip permissions, timestamps, formatting. Keep filenames, values, errors. 60-79% reduction.
Conversation: preserve proper nouns, decisions. Compress filler. 78-81% reduction.
File content: query-aware compression. Keep what's relevant to the current question. 60-86% reduction.

Graceful fallback: if the LM fails, raw input passes through unchanged. Zero data loss.

Phase 3 -- Blood-Brain Barrier

Admission Control + Budget Allocation

Default posture: BLOCK. Only actively transported information enters Claude's context window. Biological analog: the blood-brain barrier, which keeps most blood-borne molecules out of the brain (it blocks roughly 98% of small-molecule drugs).

Phase 4 -- Autonomic Systems

Myelin Cache + Mode Controller + Enteric Agents

Myelin Cache

Biological analog: myelin sheaths insulate frequently-used axons, making them faster. Four-tier progressive caching that promotes patterns based on hit frequency.

Mode Controller

Biological analog: autonomic nervous system. Sympathetic = fight-or-flight, parasympathetic = rest-and-digest. Five operating modes that adjust the entire pipeline's behavior in one shot.

Mode	Level	Compression	Habituation	Reflexes	Budget	Auto-Clear
CRITICAL	0	OFF (0%)	OFF	OFF	2.0x	15 min
ALERT	1	Light (30%)	Higher thresholds	ON	1.5x	30 min
NORMAL	2	Standard (80%)	ON	ON	1.0x	--
ROUTINE	3	Aggressive (90%)	Lower thresholds	ON	0.7x	--
IDLE	4	Maximum (95%)	ON	ON	0.5x	--

When CRITICAL fires, the entire pipeline reconfigures: compression disabled (every token matters), habituation disabled (never suppress in crisis), reflexes disabled (escalate everything to Claude), context budget doubled. Auto-clears after 15 minutes unless active Aa inputs persist.
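The mode table maps naturally onto an enum plus a frozen config record, so a single lookup reconfigures everything. The values are transcribed from the table above. Two behaviours are assumptions: timed modes fall back to NORMAL when they clear, and the "unless active Aa inputs persist" rule also applies to ALERT. The names are illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Mode(IntEnum):
    CRITICAL = 0
    ALERT = 1
    NORMAL = 2
    ROUTINE = 3
    IDLE = 4


@dataclass(frozen=True)
class ModeConfig:
    compression: float              # target reduction; 0.0 = off
    habituation: str                # "off", "higher", "on", or "lower" thresholds
    reflexes: bool
    budget_multiplier: float        # context-budget scale factor
    auto_clear_min: Optional[int]   # None = no auto-clear


MODES = {
    Mode.CRITICAL: ModeConfig(0.00, "off", False, 2.0, 15),
    Mode.ALERT: ModeConfig(0.30, "higher", True, 1.5, 30),
    Mode.NORMAL: ModeConfig(0.80, "on", True, 1.0, None),
    Mode.ROUTINE: ModeConfig(0.90, "lower", True, 0.7, None),
    Mode.IDLE: ModeConfig(0.95, "on", True, 0.5, None),
}


def next_mode(mode: Mode, entered_at: float, now: float, active_aa_inputs: bool) -> Mode:
    """Clear a timed mode back to NORMAL (assumed target) unless Aa inputs persist."""
    limit = MODES[mode].auto_clear_min
    if limit is None or active_aa_inputs:
        return mode
    if now - entered_at >= limit * 60:
        return Mode.NORMAL
    return mode
```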

Enteric System

Biological analog: the enteric nervous system -- 500 million neurons in the gut that operate independently of the brain. Four autonomous agents that monitor the workspace without involving Claude.

Security Layer

Sidecar Guard

Prompt Injection Defense

All text processed by the custom LM passes through a two-stage sanitization layer. The sidecar model runs with a hardened system prompt baked into its Modelfile and treats all input as raw data, never as instructions.

Stage 1: Input sanitizer strips known injection patterns before the text is sent to the model.
Stage 2: Output validator enforces JSON schema compliance and rejects malformed responses.
Defense: Strips markdown blocks, thinking tags, and null responses. Extracts JSON by brace matching.
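Both stages can be sketched in a few lines. The injection patterns and the required-key check are hypothetical stand-ins (the real pattern list and schema are not published); the brace matcher tracks string literals and escapes so that a `}` inside a JSON string does not end the object early.

```python
from __future__ import annotations

import json
import re

# Hypothetical pattern list; the real sanitizer's patterns are not published here.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"</?system>", re.IGNORECASE),
]


def sanitize_input(text: str) -> str:
    """Stage 1: neutralize known injection phrases before text reaches the model."""
    for pat in INJECTION_PATTERNS:
        text = pat.sub("[removed]", text)
    return text


def extract_json(raw: str, required_keys: set[str]) -> dict | None:
    """Stage 2: pull the first balanced {...} object out of model output and validate it."""
    raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)  # drop thinking blocks
    raw = re.sub(r"`{3}(?:json)?", "", raw)                         # drop markdown fences
    start = raw.find("{")
    if start == -1:
        return None  # covers "null" and other non-object replies
    depth, in_str, escaped = 0, False, False
    for i in range(start, len(raw)):
        ch = raw[i]
        if in_str:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_str = False
        elif ch == '"':
            in_str = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    obj = json.loads(raw[start:i + 1])
                except json.JSONDecodeError:
                    return None
                if isinstance(obj, dict) and required_keys.issubset(obj):
                    return obj
                return None  # schema check failed
    return None  # unbalanced braces: reject
```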

Integration

CortexClaw + Nervous System

CortexClaw is the memory. The Nervous System is the gatekeeper. They share a custom LM for local processing and coordinate through the Mode Controller, but serve fundamentally different roles.

What the Nervous System Does

Pre-processing. Filters, compresses, and routes every input before Claude sees it. Handles simple requests locally via reflexes. Suppresses duplicates. Manages system mode. Monitors workspace health. Goal: Claude only sees what it needs to see.

What CortexClaw Does

Memory. Stores, retrieves, and evolves knowledge across sessions. Semantic search, associative mesh, sleep consolidation, schema priming, replay, decay. Goal: the right memory surfaces at the right time, at minimum token cost.

Design Principles

Default Block: Information must earn its way into Claude's context. Not default pass.
Local First: Everything that can run on the custom LM or on rules does. Claude is expensive.
Graceful Degrade: Every LM call has a fallback. If Ollama dies, raw input passes through.
Type-Aware: Commands are treated differently from greetings. Errors never get suppressed.
Observable: Every decision is logged. Stats are tracked. Full audit trail.
Reversible: One config flag disables it. All original data preserved. Zero data loss.

What v4.1 is

A targeted bug-fix sweep on the v4.0 architecture, plus a four-agent design powwow on a richer chunk-decay grading system. v4.1 lands the eight surgical fixes; the decay redesign ships in v4.2 once Leon greenlights the pin vocabulary and grading-vector proposal.

What v4.0 was hiding

Mass prune: Active chunks dropped from 91 to 29 in 10 hours; 88 project-canonical entries (DM v2, SABLE, WiredCity, Atmo, Memento, Luce) reaped to archive/2026-04/
Pin half-built: PRIORITY_TAGS defined in prefrontal_index.py but never wired into the archive predicate
Dopamine stuck: L15 last_analysis frozen since Apr 24; the insufficient_data early return skipped the state save
Missed channel dead: L17 GDPO missed dimension count=29 mean=0.0; no code path emitted missed signals
4-chunk loop: feedback_distributor re-emitted identical signals every 2h since Apr 23 (no dedup)
Parent-tag bug: retrieve.py:1797 used min(by length); 3-letter typos like "mit" / "hit" / "dgs" parented unrelated batches
Observer perma-miss: L10-synthesized SCHEMA chunks confused glia agents; with no retry, they surfaced as "missing" forever
Daemon log filter: cortexclaw-daemon.sh grepped only MAINTAIN_COMPLETE / ERROR; DOPAMINE / FEEDBUS / MISSED prints were invisible

v4.1 fix log

FIX-1: Parent-tag heuristic now stop-lists short tokens and picks parent = max(count), tie-breaking on oldest last_active. retrieve.py:1796-1812
FIX-2: Daemon log filter surfaces DOPAMINE / FEEDBUS / MISSED lines. cortexclaw-daemon.sh:80-87
FIX-3: Dopamine insufficient_data now persists last_check_attempted / last_status. dopaminergic_signal.py:782-793
FIX-4: L17 missed channel auto-detects from records where best_score < 0.55. dopaminergic_signal.py:564-600
FIX-5: Observer skips # SCHEMA: chunks with a sentinel finding; perma-missing chunk cleared. observers.py:122-139
FIX-6: feedback_distributor per-cycle dedup keyed by (kind, chunk_id_or_query). feedback_distributor.py
FIX-7: 30-min dopamine cadence via new launchd plist com.cortexclaw.dopamine. scripts/dopamine-analyze.sh
FIX-8: feedback_daemon.sh marked DEPRECATED; canonical L15 path is dopaminergic_signal.py
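FIX-1's selection rule can be sketched as below. `TagStats` and `pick_parent` are hypothetical names, and treating every token shorter than four characters as stop-listed is our assumption about what "stop-list short tokens" covers; the real list lives in retrieve.py.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: any tag shorter than this is stop-listed (catches "mit", "hit", "dgs").
MIN_TAG_LEN = 4


@dataclass
class TagStats:
    tag: str
    count: int          # how many chunks in the batch carry this tag
    last_active: float  # epoch seconds of most recent activity


def pick_parent(tags: list[TagStats]) -> str | None:
    """Choose a parent tag: drop short tokens, prefer the highest count,
    and break ties by the oldest (smallest) last_active."""
    candidates = [t for t in tags if len(t.tag) >= MIN_TAG_LEN]
    if not candidates:
        return None
    # Negating last_active makes the older timestamp win among equal counts.
    best = max(candidates, key=lambda t: (t.count, -t.last_active))
    return best.tag
```

Compared with the old min(by length) rule, a short typo can no longer win simply by being short.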

Validation -- before / after

Schemas: 70 tracked, 0 hot → 72 tracked, 25 hot
Cascade fast avg: 0.095 → 0.571
Observer health: 28/29 (stuck schema chunk) → 29/30 (only today's fresh chunk pending)
Feedback queue: 0 entries (canonical writer dead) → 4 entries flowing
Dopamine analyze: insufficient_data for 2+ days → analyzed: 2 used + 2 wasted signals

v4.1.1 hotfix · 2026-04-30 · chunk_missed channel actually open

Halo R2 cortex-observer filed a forensic report on the v4.1 missed channel. Verdict: FIX-4 wired the producer correctly, but no caller ever emitted a missed signal, so missed=0 for 302 daemon cycles. Two coordinated patches land in v4.1.1.

Root cause A: Recent retrieval traffic had mean best_score=0.971; the 0.55 floor sits below the noise floor of CortexClaw's self-traffic. v4.1's auto-detector fired 0 times in 302 cycles.
Root cause B: CLAUDE.md instructs every Claude session to feed back missed=[descriptive-string] via retrieve.py feedback. add_feedback's missed loop was gated on if cid in entry_map, silently dropping every descriptive entry.
Patch 1 (load-bearing): Descriptive-string misses now branch into FeedbackDistributor.chunk_missed (gated on feedback_distributor_enabled). One bus instance per add_feedback call; per-call dedup. retrieve.py:2716-2748
Patch 2 (one-shot): New analyze --backfill-missed [--dry-run] on dopaminergic_signal.py. Scans the full analytics_enhanced.jsonl and tail-scans feedback_propagation.jsonl for idempotency. dopaminergic_signal.py:1086-1163
Verification: Smoke test: 1 descriptive miss landed with signal=chunk_missed + L7:novel_topic + L17:missed_logged. Backfill: 41 historical low-score retrievals replayed (39 emitted + 2 dedup), idempotent on re-run. Halo cortex corpus missed: 0 -> 40.
Unblocks: The v3 LLM-driven mutator's V_rich variant (it had no missed fuel). The optimizer's dominant fault may rotate from context_weaver_thin_themes (mock artifact) to _undershoot (real signal) over the next few cycles.
Out of scope: No changes to feedback_distributor.py, observers.py, grading.py, or config.json. Tightening MISSED_SCORE_FLOOR below 0.55 and per-cluster floor calibration are deferred to v4.2.
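The two root causes and Patch 1 can be illustrated together. Everything here is a hypothetical stand-in: `FeedbackBus`, the `missed_chunk` kind, the record fields and the lower-casing of descriptive strings are ours. It shows the shape of the fix (descriptive misses routed to `chunk_missed` instead of being dropped, deduplicated per call) and why a 0.55 score floor stays silent when retrieval scores average around 0.97.

```python
from dataclasses import dataclass, field


def auto_detect_missed(records: list[dict], floor: float = 0.55) -> list[str]:
    """FIX-4-style detector: queries whose best_score fell below `floor`.
    If nearly every score sits near 0.97, this returns nothing (root cause A)."""
    return [r["query"] for r in records if r["best_score"] < floor]


@dataclass
class FeedbackBus:
    """Stand-in for a per-call bus; dedups on (kind, chunk_id_or_query)."""
    emitted: list[tuple[str, str]] = field(default_factory=list)
    _seen: set[tuple[str, str]] = field(default_factory=set, init=False, repr=False)

    def emit(self, kind: str, key: str) -> bool:
        if (kind, key) in self._seen:
            return False  # duplicate within this call: drop it
        self._seen.add((kind, key))
        self.emitted.append((kind, key))
        return True


def add_feedback(missed: list[str], entry_map: dict[str, dict],
                 distributor_enabled: bool = True) -> FeedbackBus:
    """Route each `missed` item: known chunk IDs keep their existing path, while
    descriptive strings (previously dropped, root cause B) become chunk_missed."""
    bus = FeedbackBus()  # one bus instance per add_feedback call
    for item in missed:
        if item in entry_map:
            bus.emit("missed_chunk", item)
        elif distributor_enabled:
            bus.emit("chunk_missed", item.strip().lower())
    return bus
```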

Powwow #2 -- decay redesign in flight

Four agents drafted complementary pieces for a richer chunk-decay system. Awaiting Leon greenlight before landing.

Grading vector: 11 axes (recency, frequency, centrality, project anchor, narrative criticality, observer confidence, user signal, distinctiveness, wasted counter, staleness class, missed deficit). Effective stability becomes ONE input, not the whole grade.
State machine: HOT → WARM → COOL → DEEP-STORAGE → ARCHIVED with hysteresis, soft-archive (cheap recall on explicit name), and four rehydration triggers
Adaptive layer (L18): Per-cluster base + per-chunk delta. Cluster = (primary_tag, source_kind). Loss reads from existing GDPO + analytics telemetry. Daily updates with L2 regularization + dimension watchdog.
Pin vocabulary: Six levels (HARD_PIN, PROJECT_ANCHOR, REFERENCE, SOFT_BIAS, NEUTRAL, INVERSE_PIN) sourced from AGENTS.md ## Active Projects, MEMORY.md, tags, and feedback ratios. Resolved IDs cached; pin lifecycle tied to AGENTS.md mtime.
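Hysteresis in the proposed state machine can be sketched with a single grade in [0, 1]. The entry thresholds and the 0.05 margin are hypothetical (the proposal has not fixed numbers), and rehydration triggers and soft-archive recall are not modeled: ARCHIVED chunks only come back through those explicit triggers.

```python
# Hypothetical entry bars on a 0..1 grade; the powwow proposal has no fixed numbers yet.
STATES = ["HOT", "WARM", "COOL", "DEEP-STORAGE", "ARCHIVED"]
ENTER = {"HOT": 0.75, "WARM": 0.50, "COOL": 0.25, "DEEP-STORAGE": 0.10, "ARCHIVED": 0.0}
HYSTERESIS = 0.05  # demote only this far below the current tier's entry bar


def next_state(current: str, grade: float) -> str:
    """Move at most one tier per maintenance call. Promote when the grade clears
    the next tier's bar; demote only when it drops HYSTERESIS below the current
    bar, so chunks hovering near a boundary do not flap between tiers."""
    if current == "ARCHIVED":
        return current  # only explicit rehydration triggers (not modeled) restore it
    i = STATES.index(current)
    if i > 0 and grade >= ENTER[STATES[i - 1]]:
        return STATES[i - 1]  # promote one tier
    if grade < ENTER[current] - HYSTERESIS:
        return STATES[i + 1]  # demote one tier
    return current
```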

Held back from v4.1 (deferred to v4.2)

Restoring the 88 archived canonical chunks under the broken decay system would just see them re-archived in a week. Same for raising slow-tier bootstrap or lowering archive_threshold; those are band-aids that mask the deeper grading conversation. Once Leon picks a grading vector + pin vocabulary, the chunk restoration runs as a one-shot script and the new decay system protects them going forward.

What v4.3 is

A deep audit triggered by Leon (full system health + speed analysis), followed by a four-agent ctask swarm that found a previously invisible cursor bug in cascade decay and three more structural issues. v4.3 ships the smoking-gun fix and two bundles of patches (Path A: critical, Path B: mesh repair + hygiene). All changes reversible, backups stamped *.bak-2026-05-12-1130.

The smoking gun -- cascade cursor bug

cascade_decay_step mutated stability, but maintain() re-read days_since from last_accessed/created every cycle. With the daemon firing every 2h, each cycle re-applied the FULL elapsed decay (12x/day), compounding. A 6-day-old chunk measured fast=1e-6 where the configured 0.9/day rate predicts 0.531; that value maps to ~55 cycles of compounded decay over 6 days.

Population impact: 75% of chunks had fast=0 AND medium<0.001; 0 chunks above eff stability 0.5; MAX eff stability across 413 chunks = 0.3468
Hidden by: No prior maintenance ever surfaced "stability stuck at zero" as an alert; vitals only flagged "fading", so the bug looked like normal decay until the rate math didn't match
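The compounding is easy to reproduce numerically. This toy model uses only the fast tier's 0.9/day rate and the 2h daemon cadence from the text; `buggy` and `fixed` are illustrative names, not CortexClaw functions. In the buggy loop, cycle n re-applies n × 2h of decay on top of an already-decayed value, so after n cycles the total exponent is the triangular sum n(n+1)/24 days instead of n/12.

```python
RATE = 0.9               # fast-tier stability multiplier per elapsed day
CYCLE_DAYS = 2 / 24      # daemon fires every 2 hours


def buggy(cycles: int) -> float:
    """Pre-v4.3: days_since is measured from created every cycle, so the FULL
    elapsed decay is applied again on top of the already-decayed value."""
    stability = 1.0
    for n in range(1, cycles + 1):
        days_since = n * CYCLE_DAYS           # never reset between cycles
        stability *= RATE ** days_since
    return stability


def fixed(cycles: int) -> float:
    """v4.3: decay only the time since the last_decayed cursor, then bump it."""
    stability, last_decayed = 1.0, 0.0
    for n in range(1, cycles + 1):
        now = n * CYCLE_DAYS
        stability *= RATE ** (now - last_decayed)
        last_decayed = now                    # cursor moves forward
    return stability
```

`fixed(72)` (six days of 2h cycles) lands on 0.9^6 ≈ 0.531, matching the configured rate, while `buggy(55)` is already on the order of 1e-6, consistent with the ~55-cycle estimate above.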

Other audit findings (swarm)

94 KB query bomb: One chunk file was 93,167 bytes; PROGRESSIVE_FULL_TEXT_COUNT=3 reads it whole. A single hit blew the session-token BEHIND-legacy badge by 21k tokens
Embed cache 8% full: 33/413 chunks cached. Retrieve called Ollama inline and skipped the cache write on timeout (permanent miss). ~30s Ollama RTT per cold query
261 dangling synapses: Vitals reported 20 orphan edges; disk truth was 12x higher. Archive/rollup never cleaned them. Spreading activation boosted non-existent chunks
38% orphan chunks: 239 of 632 chunks had zero edges in either direction. Spreading activation literally could not reach them
746s daemon outlier: 9-chunk ingest cycle hitting Qwen3.6-35B-A3B-MLX-4bit twice per chunk (_ingest_compress + _ingest_pair_question). The daemon claimed "cap=10, worst case 30s", a stale comment from before the wire-in
Dopamine dead 9 days: L15 cron called analyze --hours 2; with our query cadence the window never had ≥3 records. Optimizer state frozen 2026-05-03 -> 2026-05-12
NO_RELEVANT_MEMORY invisible: Empty-result retrieves logged to retrieve_log.jsonl but NOT to analytics_enhanced.jsonl, so dopamine never saw them as missed signals
rollup() observer bug: rollup() force-flushed the observer queue even when CORTEXCLAW_SKIP_OBSERVER_FLUSH=1 was set, adding another 90s to the worst-case maintain
859 orphan findings: 1272 finding JSON files for 413 router entries; archive/rollup left them behind. 5.5 MB of cruft

Path A -- critical patches (shipped 11:30-11:38 EDT)

A-1 dopamine alive: Cron window 2h -> 168h. NO_RELEVANT_MEMORY now logs to analytics_enhanced. Backfilled 481 records. scripts/dopamine-analyze.sh, retrieve.py:2230-2280
A-2 cascade cursor: Added a last_decayed field. maintain() reads from the cursor and bumps it to now() after decay. cascade_promote() also bumps the cursor on access. retrieve.py:1039, 3050-3068
A-3 cascade reset: New scripts/cortexclaw_cascade_reset.py migration. Sets the cursor on every entry; --wide rescues all crushed chunks with access_count==0 back to baseline {fast:1.0, medium:0.2, slow:0.05}
A-4 retrieve byte cap: format_fact reads at most 12 KB, format_narrative at most 32 KB. Kills the 94 KB single-chunk blowup. retrieve.py:2434-2470
A-5 warm-embeddings: New retrieve.py warm-embeddings CLI. A one-shot run fills the cache. First run: 33 -> 449 entries. The retrieve hot path is now Ollama-free for cached chunks. retrieve.py:706-752
A-6 daemon ingest cap: max_ingest 10 -> 3; CORTEXCLAW_SKIP_INGEST_ENRICH=1 skips Qwen3.6-35B calls under the daemon; rollup() honors the observer-skip flag; 180s hard wallclock guard. scripts/cortexclaw-daemon.sh, retrieve.py:2682, 3507-3520
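A byte-capped chunk reader in the spirit of A-4 might look like this. `read_capped`, the `[truncated]` marker and reading 1 KB as 1024 bytes are our assumptions; only the 12 KB / 32 KB caps come from the patch note.

```python
from pathlib import Path

# Caps from A-4 (assuming 1 KB = 1024 bytes).
FACT_CAP = 12 * 1024
NARRATIVE_CAP = 32 * 1024


def read_capped(path: Path, cap: int) -> str:
    """Read at most `cap` bytes so one oversized chunk cannot flood the context."""
    with path.open("rb") as f:
        data = f.read(cap + 1)  # one extra byte tells us whether we truncated
    # errors="ignore" drops a multibyte character split at the cap (and any other invalid bytes).
    text = data[:cap].decode("utf-8", errors="ignore")
    if len(data) > cap:
        text += "\n[truncated]"  # hypothetical marker so the reader knows text is missing
    return text
```

Reading `cap + 1` bytes avoids a separate `stat()` call while still distinguishing "exactly at the cap" from "over the cap".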

Path B -- mesh repair + hygiene (shipped 11:43-12:00 EDT)

B-1a threshold: Semantic synapse threshold 0.65 -> 0.55 (config). Catches the 0.55-0.65 band most pairs land in
B-1b co-access seed: First-create co-access edges seed at 0.30 (was 0.15). They survive one decay cycle instead of dying in 42h
B-1c synapse decay: 0.95 -> 0.97 to match chunk medium-tier decay
B-1d dangling prune: decay_synapses(synapses, valid_ids=...) drops edges with an archived source or target. 123 pruned on the first maintain
B-1e forced-neighbor: build_semantic_synapses rescue pass: orphan chunks (zero edges) linked to top-K nearest peers via cosine > 0.30. Connected 383/383 chunks (100%)
B-1f type-weighted spread: Spreading activation weighs by edge type: temporal x0.5, co-access x1.5, semantic/replay x1.0. Stops temporal edges (77% of the mesh) from drowning out real associative signal
B-2 schema prune: decay_schema_heat drops singletons (count≤1 AND age≥14d AND temp<0.05). 41 noise schemas killed on the first pass (143 -> 102)
B-3 findings sweep: maintain() moves findings/<id>.json to archive/findings/ when the chunk has been archived. 859 orphans cleared in one pass
B-4 backup cleanup: Deleted chunks.bak-2026-05-02-build-b/ (407 files, 2.4 MB), router.jsonl.bak-... (184 KB), embeddings.json.bak-... (6.9 MB). 9.5 MB freed
B-5 log rotation: New scripts/cortexclaw_log_rotate.sh + launchd com.rurik.cortexclaw-log-rotate daily at 03:00. Rotates over-threshold logs, gzips files >7d, deletes files >30d. First run: replay.log 8.2 MB -> 0 live (1.2 MB gz archive)
Side-fix cache guard: save_embedding_cache merges with disk if incoming < 20% of file size (observed: 469 -> 10 entry wipe during a maintain pass). Logs EMBED_CACHE_MERGED when triggered. retrieve.py:699-728
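B-1d and B-1f can be combined into a small spreading-activation sketch. Only the edge-type multipliers come from B-1f; the `(src, dst, kind, strength)` edge tuple, the per-hop decay of 0.5, the hop count and treating synapses as undirected are assumptions for illustration.

```python
from collections import defaultdict

# Edge-type multipliers from B-1f.
TYPE_WEIGHT = {"temporal": 0.5, "co-access": 1.5, "semantic": 1.0, "replay": 1.0}


def prune_dangling(edges, valid_ids):
    """B-1d idea: drop edges whose source or target is no longer an active chunk."""
    return [e for e in edges if e[0] in valid_ids and e[1] in valid_ids]


def spread(seeds: dict[str, float], edges, valid_ids: set[str],
           decay: float = 0.5, hops: int = 2) -> dict[str, float]:
    """Spread activation from seed chunks. Each hop multiplies by edge strength,
    the edge-type weight, and a per-hop decay factor (decay/hops are assumptions)."""
    adj = defaultdict(list)
    for src, dst, kind, strength in prune_dangling(edges, valid_ids):
        w = strength * TYPE_WEIGHT.get(kind, 1.0)
        adj[src].append((dst, w))
        adj[dst].append((src, w))  # assumption: synapses treated as undirected
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        nxt: dict[str, float] = defaultdict(float)
        for node, act in frontier.items():
            for nbr, w in adj[node]:
                nxt[nbr] += act * w * decay
        for node, act in nxt.items():
            activation[node] = activation.get(node, 0.0) + act
        frontier = nxt
    return activation
```

Pruning before building the adjacency list is what keeps archived chunks from receiving activation, and the type weights let one co-access edge outweigh three temporal ones of equal strength.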

Validation -- before / after

Active chunks: 413 → 383 (28 retrieved+crushed archived; not pin-protected)
Effective stability avg: 0.027 → 0.227 (8.4x)
Mid-tier (eff ≥ 0.1): 31 → 307 (10x)
Dead chunks (< 0.01): 227 → 36 (-84%)
Total synapses: 4,179 → 6,495
Edge mix (semantic / temporal): 22% / 77% → 49% / 50%
Connected chunks: 297 (72%) → 385 (100%)
Dangling edges: 261 → 0
Schemas tracked: 143 → 102
Embed cache fill: 33 / 413 (8%) → 385 / 383 (100%)
findings/ size: 5.5 MB → ~1.8 MB
memory/msa total: 74 MB → 60 MB
Daemon cycle worst: 746s → 125s (180s wallclock guard hard cap)
Dopamine: last_analysis 2026-05-03 (9d stale) → 2026-05-12T11:21, +16 missed signals minted

Held back from v4.3 (future)

NumPy BLAS cosine_similarity vectorization (mooted by warm-embeddings), aggressive skip-embed-on-retrieve (the current path is already cache-fast), and the deeper "if 95% of chunks are crushed, is the rate fundamentally too aggressive?" question all wait on 7-14 days of post-fix telemetry. v4.4 will revisit cascade rates after the cursor fix lets us see real usage decay curves.

CortexClaw v4.3 -- Built by Rurik for Leon

2026-05-12