CortexClaw v4.0

What Is CortexClaw?

What It Is

CortexClaw is a memory system for AI agents. It replaces flat text files with structured, searchable memory chunks that are retrieved on demand. Instead of loading everything every session, the agent queries CortexClaw and gets back only what's relevant.

Think of it as a personal search engine for an AI's memory.

How It Works With the Model

CortexClaw sits alongside the AI model (Claude, in our case) as external memory infrastructure. When the model needs to recall something, it calls CortexClaw's retrieve function. CortexClaw scores all memory chunks by semantic similarity, recency, feedback history, and associative connections, then returns the top results.

The model never loads the full memory -- only what the query pulls up. After the conversation, feedback flows back: which chunks were actually useful, which were noise. This feedback loop teaches the system to retrieve better over time.

What It's Built On

Pure Python. No frameworks, no third-party dependencies -- HTTP calls go through the standard library's urllib. Runs on any machine with Python 3.10+. Embeddings are generated locally via Ollama (nomic-embed-text, 384-dim vectors) -- zero API cost.

The router index is a JSONL file (one line per chunk). Chunks are individual markdown files. The daemon runs as a launchd service on macOS, executing maintenance every 2 hours. Total codebase: ~3,700 lines across 6 Python modules.

What Makes It Different

Every layer is modeled after a specific brain mechanism with a cited neuroscience paper. Memories decay at multiple timescales (Benna & Fusi 2016). Replay during idle periods consolidates important memories (Klinzing et al. 2019). Associative connections form between co-accessed chunks (Uytiepo et al. 2025). A dopaminergic reward signal learns from retrieval feedback (Schultz 1997).

As of v4.0, nineteen improvements across all twelve layers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, prediction-error replay, adaptive GDPO weights, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation, adversarial self-test, and more. Generated via 3-agent review and weighted vote.

CortexClaw v4.0

A biologically-inspired memory system for AI agents. Twelve neural layers transform flat chunk storage into a living memory that consolidates, associates, and evolves over time.

  • 97% deep test pass rate (v4.0: 184 chunks / 2,398 synapses; 19 improvements across 3 tiers)
  • 12 neural layers
  • 60 memory chunks
  • 1,317 synapses
  • 92% token efficiency
The Problem

AI agents lose everything between sessions. Loading full memory files burns tokens and lacks prioritization. Flat storage treats all memories equally, missing the patterns that make memory useful.

The Solution

Model memory like the brain does. Fast-decaying attention for recent events, slow-building permanence for important patterns. Memories that fire together wire together. Sleep consolidates. Schemas generalize.

What CortexClaw Is

CortexClaw replaces the flat-file memory that most AI agents use. Instead of loading entire memory documents every session (burning thousands of tokens on irrelevant context), it breaks knowledge into small, searchable chunks and retrieves only what's needed.

Traditional Agent Memory

A single MEMORY.md file that grows forever. Every session loads the whole thing. 50KB of text, 90% irrelevant to the current conversation. No prioritization, no forgetting, no association between ideas.

MEMORY.md (47KB) ............ load every session
second-brain.md (12KB) ..... load every session
daily-logs/ (200+ files) .. never loaded
~60,000 tokens burned per session startup
CortexClaw Memory

Knowledge split into focused chunks (~200 words each), indexed by topic and tags, embedded for semantic search. Only relevant chunks are retrieved per query. Typical session loads 3-5 chunks instead of everything.

router.jsonl (index) ....... lightweight scan
chunks/ (37 files) ......... load on demand
embeddings (384-dim) ....... cosine similarity
~5,000 tokens per retrieval (92% savings)

v1.0 -- The Foundation Layer

Before the neural layers, CortexClaw v1.0 established the core infrastructure that everything builds on. These are the primitives.

Chunking + Router Index

Every piece of knowledge is stored as a chunk -- a small file with a topic, summary, tags, and content. The router is a lightweight index (one JSON line per chunk) that maps IDs to summaries and tags. The agent scans the router to decide which chunks to load, without reading every file.

Embedding Search

Each chunk gets a 384-dimensional vector embedding via nomic-embed-text running locally on Ollama (zero API cost). Retrieval computes cosine similarity between the query embedding and all chunk embeddings, returning the top-K most relevant. This is how the system answers "what do I know about X?" without keyword matching.
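
In outline, this retrieval step reduces to cosine similarity over cached vectors. A minimal sketch in plain Python -- `top_k`, the dictionary layout, and the function names are illustrative, not the actual CortexClaw API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=5):
    # Rank chunk IDs by similarity to the query embedding.
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]
```

The same shape works at 384 dimensions; two dimensions are enough to see the ranking behavior.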

Fact / Narrative Split

Chunks have two levels: fact (compressed key points, fast to scan) and narrative (full context with reasoning and background). Quick lookups pull facts only. Deep dives pull both. This alone cuts token usage by ~40% for routine queries.

Feedback Loop

After every retrieval, the system logs which chunks were used (actually referenced in conversation), wasted (retrieved but ignored), and missed (needed but not retrieved). This feedback adjusts future retrieval scoring -- chunks that consistently get used rank higher, wasted chunks sink.

Rollups

When a tag accumulates too many chunks (threshold: 40), the system triggers a rollup -- merging older, lower-stability chunks into a single consolidated chunk. This keeps the total chunk count manageable while preserving the important information. Think of it as compressing old memories into summaries.

Daemon Mode

A background process runs every 2 hours (configurable), executing maintenance: decay calculations, replay cycles, synapse building, and archival. This is the system's "sleep" -- the offline consolidation that makes memories sharper over time. Heartbeat checks trigger it, or it runs via cron.

v1 to v4.0

v1 gave CortexClaw the ability to store, search, and maintain memories efficiently. But it still treated every memory as independent -- no associations, no variable decay rates, no pattern recognition. v2 added six neural layers on top of this foundation. v3.x extended to twelve layers with scoring precision fixes, episodic memory, persistent working memory, consolidation triggers, reward-driven learning, the Glial Network, and L17 GDPO Feedback. v4.0 adds nineteen improvements across three tiers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, synthesis validation, prediction-error replay, semantic relevance scoring, adaptive GDPO weights, sigmoid Eq8 gate, active demotion, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation with quality gate, TTL cache, and adversarial self-test.

The Neural Layers

Twelve layers built on the v1 foundation. Each addresses a specific biological memory mechanism. Together, they create a system where memories compete, consolidate, and evolve -- just like neurons do.

[Layer architecture: L11 Cascade Decay (fast/medium/slow tiers), L8 Associative Mesh (synapse graph + spreading activation), L9 Reconsolidation (2hr lability window), L7 Schema Priming (topic heat maps), L6 Replay Engine (sleep consolidation), L10 Schema Synthesis (LLM generative consolidation), L12 Episodic Buffer (FTS5 full-text search), L13 Prefrontal Index (persistent working memory), L14 Sharp-Wave Ripple (state-transition consolidation), L15 Dopaminergic Signal (reward-driven learning), L16 Glial Network (ingest-time decomposition) -- the last six are v3.1 -- v4.0 additions. Data flow: ingest passes through L16 (decompose), L9 + L7 (reconsolidate + prime), and L14 (extract on compress) into storage; retrieval passes through L13 (prefrontal inject), L8 spreading + L12 episodic search, and L15 (feedback loop) before returning.]

Multi-Timescale Stability

Inspired by Benna & Fusi 2016 (synaptic complexity theory). Instead of one decay number, each memory has three stability tiers that decay at different rates -- like how the brain has fast synaptic changes and slow structural ones.

[Decay curves: stability vs. days since last access, with the fast (0.90/day), medium (0.97/day), and slow (0.995/day) tiers falling toward the archive threshold over ~25 days.]
How It Works

Every memory has three stability tiers: fast (volatile, working memory), medium (session-stable), and slow (permanent knowledge). Each decays at its own rate per day.

The effective stability is a weighted blend: 20% fast + 30% medium + 50% slow. This means the slow tier dominates long-term survival -- a memory must prove its worth over time to persist.

Spaced Repetition

When a memory is accessed, its fast tier resets to 1.0. But the real magic is promotion: each access transfers 5% from fast to medium, and 1% of medium to slow.

A 1-hour cooldown prevents rapid-fire access from gaming the system. Only spaced, genuine retrievals build long-term stability -- just like spaced repetition in human learning.
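
A minimal sketch of the cascade, assuming "transfers 5%" means the medium tier gains 5% of the current fast value (and slow gains 1% of medium) -- the exact transfer rule isn't spelled out above, and the class shape is illustrative, not the production code:

```python
import time

DECAY = {"fast": 0.90, "medium": 0.97, "slow": 0.995}   # per-day rates
WEIGHTS = {"fast": 0.20, "medium": 0.30, "slow": 0.50}  # effective-stability blend
COOLDOWN_S = 3600  # 1-hour cooldown between promoting accesses

class Memory:
    def __init__(self):
        self.tiers = {"fast": 1.0, "medium": 0.2, "slow": 0.05}
        self.last_access = 0.0

    def decay(self, days):
        # Each tier decays independently at its own rate.
        for tier, rate in DECAY.items():
            self.tiers[tier] *= rate ** days

    def effective(self):
        # 20% fast + 30% medium + 50% slow.
        return sum(WEIGHTS[t] * v for t, v in self.tiers.items())

    def access(self, now=None):
        now = time.time() if now is None else now
        if now - self.last_access < COOLDOWN_S:
            return  # rapid-fire access cannot game the cascade
        self.last_access = now
        self.tiers["medium"] = min(1.0, self.tiers["medium"] + 0.05 * self.tiers["fast"])
        self.tiers["slow"] = min(1.0, self.tiers["slow"] + 0.01 * self.tiers["medium"])
        self.tiers["fast"] = 1.0  # access fully refreshes the fast tier
```

Because the slow tier carries half the blend weight but only ever gains 1% of medium per spaced access, a memory needs many genuine retrievals before it survives long-term.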

Memories That Wire Together

Inspired by multi-synaptic boutons research (Uytiepo et al. 2025). Memories form weighted connections through three mechanisms: co-access, semantic similarity, and temporal proximity. Retrieving one memory primes related memories through spreading activation.

[Synapse graph sketch: chunks such as "model config", "hardware", "opus switch", "install stack", "workflow", "people", "github", "pixel", and "slack" linked by co-access, semantic, and temporal synapses, with spreading activation flowing between them.]
Three Synapse Types

Co-access: When two memories are retrieved together, a synapse forms between them (+0.15 weight each time). The more they co-occur, the stronger the link.

Semantic: During maintenance, chunks sharing tags are compared by cosine similarity. If above 0.65, a semantic synapse is created.

Temporal: Chunks ingested within 1 hour of each other get weak temporal links (0.20 weight), capturing contextual proximity.

Spreading Activation

During retrieval, after the initial top-K scoring, spreading activation kicks in: each top result sends a signal through its synapses, boosting connected chunks that might not have scored high on their own.

This means asking about "hardware" can pull in "model config" if they've been frequently co-accessed -- just like how thinking about one topic naturally reminds you of related ones.

Synapse weights decay at 0.95/day and are pruned below 0.05. Each chunk is capped at 15 connections to prevent noise.
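
The mesh mechanics above can be sketched as follows; the pair-keyed weight store and the 0.5 spreading factor are assumptions for illustration (the per-chunk connection cap is omitted for brevity):

```python
from collections import defaultdict

CO_ACCESS_DELTA = 0.15  # weight added each time two chunks co-occur
PRUNE_BELOW = 0.05
DAILY_DECAY = 0.95

class Mesh:
    def __init__(self):
        self.weights = defaultdict(float)  # sorted (a, b) pair -> synapse weight

    def _key(self, a, b):
        return tuple(sorted((a, b)))

    def co_access(self, chunk_ids):
        # Strengthen pairwise links between chunks retrieved together.
        for i, a in enumerate(chunk_ids):
            for b in chunk_ids[i + 1:]:
                self.weights[self._key(a, b)] += CO_ACCESS_DELTA

    def decay(self, days=1):
        for k in list(self.weights):
            self.weights[k] *= DAILY_DECAY ** days
            if self.weights[k] < PRUNE_BELOW:
                del self.weights[k]  # prune dead synapses

    def spread(self, scores, factor=0.5):
        # Spreading activation: top results boost their neighbors.
        # The 0.5 factor is an assumption, not the production value.
        boosted = dict(scores)
        for (a, b), w in self.weights.items():
            if a in scores:
                boosted[b] = boosted.get(b, 0.0) + scores[a] * w * factor
            if b in scores:
                boosted[a] = boosted.get(a, 0.0) + scores[b] * w * factor
        return boosted
```

This is how a high-scoring "hardware" hit can drag "model config" into the results even when cosine similarity alone would not.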

Update, Don't Duplicate

Inspired by Nader et al. 2000 (memory reconsolidation). When a memory is retrieved, it enters a labile state for 2 hours. If new content arrives with overlapping tags, it merges into the existing chunk rather than creating a new one.

[Reconsolidation flow: a retrieval at t=0 opens a 2-hour labile window. A new ingest with tag overlap >= 40% inside the window reconsolidates the existing chunk (merge facts, union tags, invalidate embedding); an ingest after the window closes creates a new chunk as normal.]
Why This Matters

Without reconsolidation, asking "remember that the model is Opus" twice creates two nearly-identical chunks. With reconsolidation, the second fact merges into the existing chunk if it was recently accessed.

The Jaccard overlap of tags must be at least 40% to trigger a merge -- this prevents unrelated facts from contaminating each other. After reconsolidation, the chunk's embedding is invalidated and will be re-computed on next retrieval.

The chunk also gets a cascade promotion on merge, since the brain treats reconsolidation as reinforcement.
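
The merge decision reduces to a window check plus Jaccard tag overlap. A sketch with illustrative field names (the real chunk schema may differ):

```python
LABILE_WINDOW_S = 2 * 3600  # 2-hour lability window after retrieval
JACCARD_MIN = 0.40

def jaccard(tags_a, tags_b):
    # |intersection| / |union| of the two tag sets.
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def should_merge(chunk, new_tags, now):
    # Merge only if the chunk is still labile and tags overlap enough.
    in_window = (now - chunk["last_retrieved"]) <= LABILE_WINDOW_S
    return in_window and jaccard(chunk["tags"], new_tags) >= JACCARD_MIN

def reconsolidate(chunk, new_facts, new_tags):
    chunk["facts"].extend(new_facts)                            # merge facts
    chunk["tags"] = sorted(set(chunk["tags"]) | set(new_tags))  # union tags
    chunk["embedding"] = None  # invalidated; recomputed on next retrieval
    return chunk
```

The 40% floor is what keeps "remember the model is Opus" from merging into an unrelated chunk that merely happens to be recent.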

Hot Topics Get Priority

Inspired by Zaki & Cai 2025 (excitability priming / pre-allocation). The system tracks which topic clusters are currently "hot" -- being actively retrieved or ingested. New memories matching hot schemas get higher initial stability, while truly novel memories get flagged for priority replay.

[Schema heat map example: model 0.85, config 0.72, anthropic 0.58, ollama 0.35, hardware 0.22, pixel-art 0.08. A new memory tagged "model, anthropic" matches a hot schema and starts boosted (medium 0.35, feedback 1.10); one tagged "weather, tropical" matches nothing, starts at the defaults (medium 0.20), and is flagged NOVEL for L6 priority replay.]
The Two Paths

Schema match: If new memory tags overlap with hot schemas (temperature above 0.30), the memory starts with boosted medium-tier stability (0.35 vs 0.20) and higher feedback score (1.10 vs 1.00). This reflects the brain's tendency to integrate new info faster when it fits existing mental models.

Novel memory: If no schema matches, the memory is flagged as "novel" and gets priority in the replay engine (L6). Novel memories need more consolidation passes because they don't have an existing framework to attach to -- the brain processes truly new information differently from familiar-pattern information.

Schema temperature decays roughly 50% every 12 hours. Dead schemas (below 0.01) are pruned automatically during maintenance.
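
Both behaviors -- half-life cooling and the two ingest paths -- fit in a few lines. A sketch using the figures quoted above; the function names and heat-map layout are hypothetical:

```python
HOT_THRESHOLD = 0.30
PRUNE_BELOW = 0.01
HALF_LIFE_H = 12.0  # ~50% decay every 12 hours

def decay_heat(heat_map, hours):
    # Exponential cooling; dead schemas fall below the prune floor and vanish.
    factor = 0.5 ** (hours / HALF_LIFE_H)
    return {tag: h * factor for tag, h in heat_map.items() if h * factor >= PRUNE_BELOW}

def initial_stability(tags, heat_map):
    # Hot-schema matches start boosted; unmatched memories are flagged novel
    # so the replay engine (L6) gives them extra consolidation passes.
    hot = any(heat_map.get(t, 0.0) > HOT_THRESHOLD for t in tags)
    if hot:
        return {"medium": 0.35, "feedback": 1.10, "novel": False}
    return {"medium": 0.20, "feedback": 1.00, "novel": True}
```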

Sleep Consolidation

Inspired by Klinzing et al. 2019 (memory replay during sleep). During daemon maintenance cycles (every 2 hours), the system "replays" the 10 most important recent memories -- re-embedding them, strengthening their synapses, and gently promoting them through the cascade.

REPLAY CYCLE (every 2h)
  • Phase 1 -- select: score each memory by recency (max(0, 1 - hrs/48)), salience (salient tags x 0.20), novelty (0.40 if flagged novel by L7), and consolidation gap ((fast - slow) x 0.20).
  • Phase 2 -- re-embed: read each candidate's full content, recompute its embedding, and update the embed cache (refreshing its semantic representation).
  • Phase 3 -- strengthen: pairwise similarity check; if cosine >= 0.50, create a replay synapse with weight = similarity x 0.15 (a lower bar than awake).
  • Phase 4 -- cascade promote: normal candidates get medium += 0.02 and slow += 0.005 x medium; salient ones get medium += 0.12 and slow += 0.015. Salient tags: urgent, important, error, breakthrough, critical, fix, decision, bug, lesson, warning.
Why Replay Matters

In the brain, sleep replay is when the hippocampus "replays" recent experiences to the neocortex, transferring short-term memories into long-term storage. CortexClaw simulates this during its daemon maintenance cycles.

Replay promotion is deliberately gentler than direct access (+0.02 medium vs +0.05 for real retrieval). This prevents the daemon from artificially inflating memories that were never actually useful. The system also uses a lower similarity threshold during replay (0.50 vs 0.65 for semantic synapses) because the brain is more associative during sleep, forming connections it wouldn't make while "awake."
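
Phase-1 candidate selection can be sketched from the scoring terms listed in the replay-cycle description; the additive combination and the field names are assumptions:

```python
SALIENT_TAGS = {"urgent", "important", "error", "breakthrough", "critical",
                "fix", "decision", "bug", "lesson", "warning"}

def replay_score(chunk, now_h):
    # Selection score: recency + salience + novelty + consolidation gap.
    recency = max(0.0, 1.0 - (now_h - chunk["created_h"]) / 48.0)
    salience = 0.20 * len(SALIENT_TAGS & set(chunk["tags"]))
    novelty = 0.40 if chunk.get("novel") else 0.0
    gap = 0.20 * (chunk["fast"] - chunk["slow"])  # high fast, low slow = unconsolidated
    return recency + salience + novelty + gap

def select_for_replay(chunks, now_h, k=10):
    # The 10 most replay-worthy recent memories.
    return sorted(chunks, key=lambda c: replay_score(c, now_h), reverse=True)[:k]
```

The gap term is what makes replay favor memories that are vivid but not yet consolidated, mirroring hippocampal-to-neocortical transfer.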

From Facts to Principles

Inspired by Spens & Burgess 2024 (generative model of memory). During rollup merges, a local LLM distills merged chunks into generalized behavioral patterns and principles -- not fact lists, but schemas. These schema chunks decay slower and start with higher baseline stability.

[Synthesis example: three chunks ("Leon tests with Lana first, then teaches Rurik", "SSH through Lana to install on Rurik's box", "Bot can't initiate DMs -- 403 on first contact") hit a rollup at tag overlap >= 60%. A custom LM (temp 0.3, max 200 tokens) distills them into a schema chunk: "Leon uses a staged deployment workflow where features are validated on one agent before propagating to others via remote installation." (medium: 0.60, slow: 0.30)]
Schema vs Fact

Regular chunks store facts: specific details, configurations, names, dates. Schema chunks store patterns: generalized behavioral rules, workflow principles, recurring preferences.

The LLM prompt explicitly asks for 1-3 concise sentences capturing behavioral patterns, not bullet lists. Temperature is kept low (0.3) for reliability.

Schema Durability

Schema chunks start with higher baseline stability (medium=0.60 vs 0.20 for regular chunks, slow=0.30 vs 0.05). They also use slower decay rates across all tiers.

This reflects how the brain treats generalized knowledge: specific episode details fade, but the patterns extracted from them persist far longer.

Scoring Quality Fixes

Four targeted fixes that eliminated scoring artifacts and noise, taking retrieval quality from good to surgical.

Relative Gap Off-Topic Threshold

Replaced the absolute MIN_SCORE_THRESHOLD=0.55 with a relative pre-spreading check. Now requires top score ≥ 0.72 and gap ≥ 0.03 between top results. Catches off-topic queries that the old absolute threshold missed because spreading activation could inflate scores past 0.55.

Score Ceiling Cap at 1.0

Enforces a hard ceiling at 1.0 on all scores -- both pre and post spreading activation. Previously, synapse boosting could push scores above 1.0, creating misleading confidence signals and breaking relative ranking between results.
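
Both scoring fixes are small guards around the ranking step. A sketch, assuming "gap between top results" means the margin between the top two pre-spreading scores; the function names are illustrative:

```python
TOP_MIN = 0.72   # minimum pre-spreading top score
GAP_MIN = 0.03   # minimum margin between the top two results

def off_topic(pre_spread_scores):
    # Relative-gap check runs BEFORE spreading activation can inflate scores.
    ranked = sorted(pre_spread_scores, reverse=True)
    if not ranked or ranked[0] < TOP_MIN:
        return True
    if len(ranked) > 1 and (ranked[0] - ranked[1]) < GAP_MIN:
        return True  # flat score landscape: no clear winner, likely off-topic
    return False

def cap_scores(scores):
    # Hard ceiling at 1.0, applied pre- and post-spreading.
    return {cid: min(1.0, s) for cid, s in scores.items()}
```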

Schema Heat Quality Gate

Only the top-2 retrieval results update schema heat. Previously, all returned results warmed their schemas, which meant low-relevance tail results were polluting the heat map and causing schema drift on unrelated topics.

Temporal Synapse Tag-Shared Filter

Temporal synapses now require ≥ 1 shared tag between chunks. Killed approximately 36% of noise synapses that were forming between temporally proximate but semantically unrelated chunks.

  • 91% top-1 accuracy
  • 1.000 MRR
  • 100% off-topic rejection
  • -36% noise synapses

Raw Memory Traces

Inspired by Tulving 1972 (episodic memory) and Baddeley 2000 (episodic buffer). The episodic buffer maintains vivid, temporally-tagged traces that semantic memory (chunks) distills from. SQLite + FTS5 full-text search provides instant recall of raw conversation history.

How It Works

Integrated directly into retrieve() -- every CortexClaw query gets supplemental episodic hits appended alongside semantic chunk results. The FTS5 engine searches raw conversation text using porter stemming, catching exact phrases and context that embedding search misses.

The maintain cycle auto-syncs new daily logs, keeping the episodic buffer current without manual intervention.

Why It Matters

Semantic chunks are distilled and compressed -- great for patterns, but they lose the raw texture of conversations. The episodic buffer preserves the vivid, temporally-tagged traces that chunks were derived from.

When you need "what did Leon say about X last Tuesday?" rather than "what does the system know about X?", the episodic buffer delivers. Full-text search complements embedding similarity with exact phrase matching.

Persistent Working Memory

Inspired by Goldman-Rakic 1995 (persistent activity in working memory). A frozen hot tier -- a small guaranteed-injected block that bypasses retrieval entirely. Like dlPFC persistent neural firing that maintains task-critical information without requiring reactivation.

Auto-Refresh

The prefrontal index refreshes during maintain cycles. Entries are scored by a composite of schema heat, access count, cascade stability, and feedback signals. The top 10 entries are frozen into the index.

Supports manual pinning -- critical entries can be locked in place regardless of scoring. Identity and key relationship information is always pinned.

The dlPFC Analogy

In the brain, the dorsolateral prefrontal cortex maintains persistent neural firing patterns for task-critical information -- your name, what you're working on, who you're talking to. This information doesn't need to be "remembered" each time; it's always active.

The prefrontal index does the same thing: the most essential context is pre-loaded into every session, ensuring the agent never needs to search for its own identity or current priorities.

State-Transition Consolidation

Inspired by Buzsáki 2015 (hippocampal sharp-wave ripples). A pre-compression flush trigger that fires a consolidation pass before context compaction, extracting decisions, facts, action items, corrections, and insights before they're lost.

[SWR flow: when context fills and compaction is needed, L14 fires an extraction pass before compression. A rule-based fast path (patterns: URLs, config values, version numbers, decision keywords, action keywords) covers ~80% of cases instantly, with no LLM and zero cost. A deep path fires via Clawdbot sessions_spawn for nuanced extraction of insights and corrections. Everything extracted is ingested into CortexClaw.]

Reward-Driven Learning

Inspired by Schultz 1997 (reward prediction error) and Lisman & Grace 2005 (hippocampal-VTA loop). The reward signal that makes the whole system learn from experience -- auto-generating feedback from retrieval patterns to drive decay tuning, tag expansion, and synapse strengthening.

Feedback Loop

Every retrieval generates implicit feedback: chunks that appear in the agent's response are marked "used" (positive reward). Chunks retrieved but never referenced are marked "wasted" (negative signal).

These signals feed back into the cascade decay system -- used chunks get stability boosts, wasted chunks get accelerated decay. Over time, the system naturally surfaces useful memories and buries noise.

Enhanced Analytics

v4.0 adds granular tracking: per-query chunk access patterns, conversation clustering, feedback-to-decay integration, prediction-error replay (L6), and proactive synthesis (L10). This creates a rich signal that goes beyond simple used/wasted binary.

The VTA loop analogy is precise: dopamine neurons fire when outcomes exceed expectations (chunk was useful) and suppress when outcomes disappoint (chunk was irrelevant). The prediction error drives learning.

The Silent Processors

Inspired by Allen & Lyons 2018 (glia as architects of CNS formation). Three specialized observer agents decompose every memory at ingest time, extracting structured facts, contextual patterns, and emotional valence. Like glial cells in the brain -- long dismissed as passive scaffolding, now known to actively modulate synapses, regulate neurotransmitter uptake, and coordinate neural activity across regions.

[Glial decomposition pipeline: a new chunk at ingest fans out to three parallel observer agents -- Astrocyte (Fact Hunter: entities, facts, relationships), Oligodendrocyte (Context Weaver: patterns, implications, connections), and Microglia (Emotion Tagger: valence, urgency, threat/reward) -- and is stored enriched with structured metadata. All three run in parallel on a custom LM model: ~10 seconds total per chunk, zero API cost.]
The Three Glial Agents

Astrocytes (Fact Hunter): Like astrocytes providing structural and metabolic support to neurons, this agent extracts the hard facts -- entities, configuration values, names, technical details, and relationships between them.

Oligodendrocytes (Context Weaver): Like oligodendrocytes wrapping axons in myelin to speed signal propagation, this agent wraps raw facts in context -- identifying patterns, implications, and connections to existing knowledge.

Microglia (Emotion Tagger): Like microglia surveilling the CNS for threats and damage, this agent monitors for emotional valence, urgency signals, and motivational context -- marking memories that carry threat, reward, or importance signals.

Why It Matters

Raw chunks are flat text. The Glial Network transforms them into structured, multi-dimensional representations before they enter the memory system. This means retrieval can match not just on content, but on extracted entities, identified patterns, and emotional context.

The biological parallel is precise: glial cells outnumber neurons roughly 1:1 in the human brain. They don't fire action potentials, but nothing works without them. They modulate synaptic transmission, clear neurotransmitters, maintain the blood-brain barrier, and guide neural development. CortexClaw's Glial Network does the same preprocessing work that makes downstream neural operations (retrieval, replay, reconsolidation) more effective.

Decoupled Reward Normalization

Inspired by Liu et al. 2026 (GDPO) and Padoa-Schioppa & Assad 2006 (multi-attribute value coding in OFC). Previously, the feedback system collapsed three independent signals into a single scalar score -- losing information and rewarding the wrong things. L17 separates, normalizes, and gates each dimension independently.

Decoupled Normalization + Reward Conditioning Gate

Previously, the feedback system collapsed three independent signals (used, wasted, missed) into a single scalar score. This caused information loss -- a chunk that was heavily used and frequently wasted would get the same score as a chunk that was moderately used and never wasted.

L17 tracks each dimension independently, normalizes per-dimension before combining with explicit weights (50 / 35 / 15), and applies a reward conditioning gate: if a chunk's wasted score exceeds the threshold, its used reward is zeroed entirely. This is the "don't reward efficiency unless correctness is met first" pattern from the GDPO paper.

2 chunks are currently gated.
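
A sketch of the decoupled scheme. The mapping of the 50/35/15 weights onto used/wasted/missed, the gate threshold, and the sign of the missed term (treated here as a boost, since a missed chunk should rank higher next time) are all assumptions, not confirmed by the description above:

```python
WEIGHTS = {"used": 0.50, "wasted": 0.35, "missed": 0.15}
WASTED_GATE = 0.5  # assumed threshold; the production value is not documented here

def normalize(values):
    # Per-dimension max normalization across the batch of chunks.
    top = max(values) if values and max(values) > 0 else 1.0
    return [v / top for v in values]

def gdpo_scores(counts):
    # counts: one {"used": n, "wasted": n, "missed": n} dict per chunk.
    dims = {d: normalize([c[d] for c in counts]) for d in WEIGHTS}
    scores = []
    for i in range(len(counts)):
        used = dims["used"][i]
        if dims["wasted"][i] > WASTED_GATE:
            used = 0.0  # reward conditioning gate: no credit while wasteful
        scores.append(WEIGHTS["used"] * used
                      - WEIGHTS["wasted"] * dims["wasted"][i]
                      + WEIGHTS["missed"] * dims["missed"][i])
    return scores
```

The key property is visible even in this toy: a heavily used but heavily wasted chunk scores strictly worse than a moderately used, never-wasted one, which the old scalar collapse could not guarantee.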

RCL-Inspired Additions
  • Batch Consensus: signals scaled by agreement ratio. If 1/5 queries wasted a chunk, the penalty attenuates; if 5/5 did, the full penalty applies.
  • Optimizer State: a rolling ledger across maintain cycles. Detects oscillation (boost then penalize within 2 cycles) and dampens by 50%. Tracks the optimization phase.
  • Contrastive Analysis: finds chunks with high score variance across similar queries and flags them for tag narrowing instead of deletion -- preserving knowledge while improving precision.

Synapse Atlas

Real-time view of CortexClaw's associative mesh -- 1,317 synaptic connections linking 99 memory chunks into a living network.

  • Total synapses: 1,317
  • Connected chunks: 99
  • Avg weight: 0.393
  • Synapse types: 4

Synapse type counts: temporal 724, co-access 249, semantic 201, replay 143.

Weight distribution: range 0.050 -- 1.0, density 0.744, mean 0.393, orphans 0.

[Most connected nodes: skill-creator (66), bluebubbles (61), coding-agent (56), weather (54), notion-api (52). Node size = connection count, line opacity = average synapse weight, dashed = cross-node connections.]

CortexClaw v4.0 -- Status

  • Active chunks: 184
  • Synapses: 2,398
  • Archived chunks: 347
  • Cached embeddings: 768
  • Router entries: 111
  • Embed coverage: 100%
  • Avg effective stability: 0.320
  • Avg tokens/query: 1,221

Full Layer Reference
Layer | Name | Brain Region | Mechanism | Status
L6 | Replay Engine | Hippocampus | Sleep consolidation | active
L7 | Schema Priming | mPFC | Excitability pre-allocation | active
L8 | Associative Mesh | Neocortex | Synapse graph + spreading activation | active
L9 | Reconsolidation | Amygdala-Hippocampus | 2hr lability window | active
L10 | Schema Synthesis | vmPFC | LLM generative consolidation | active
L11 | Cascade Decay | Synaptic complex | Multi-timescale stability | active
L12 | Episodic Buffer | Medial Temporal Lobe | FTS5 full-text search | active
L13 | Prefrontal Index | dlPFC | Persistent working memory | active
L14 | Sharp-Wave Ripple | Hippocampus CA3→CA1 | State-transition consolidation | active
L15 | Dopaminergic Signal | VTA | Reward-driven learning | active
L16 | Glial Network | Throughout CNS | Observer agent decomposition | active
L17 | GDPO Feedback | Orbitofrontal Cortex | Decoupled reward normalization | active
Research References
  • L6 Replay: Klinzing, Niethard & Born, 2019 -- "Mechanisms of systems memory consolidation during sleep"
  • L7 Priming: Zaki & Cai, 2025 -- "Pre-allocation and excitability priming in memory encoding"
  • L8 Mesh: Uytiepo et al., 2025 -- "Multi-synaptic boutons and associative plasticity"
  • L9 Reconsol: Nader, Schafe & LeDoux, 2000 -- "Fear memories require protein synthesis for reconsolidation"
  • L10 Synthesis: Spens & Burgess, 2024 -- "A generative model of memory construction and consolidation"
  • L11 Cascade: Benna & Fusi, 2016 -- "Computational principles of synaptic memory consolidation"
  • L12 Episodic: Tulving, 1972 -- "Episodic and semantic memory"; Baddeley, 2000 -- "The episodic buffer"
  • L13 Prefrontal: Goldman-Rakic, 1995 -- "Persistent activity in working memory"
  • L14 Ripple: Buzsáki, 2015 -- "Hippocampal sharp-wave ripples"
  • L15 Dopamine: Schultz, 1997 -- "Reward prediction error"; Lisman & Grace, 2005 -- "Hippocampal-VTA loop"
  • L16 Glial: Allen & Lyons, 2018 -- "Glia as architects of central nervous system formation and function"
  • L17 GDPO: Liu et al., 2026 -- "GDPO: Group Decoupled Policy Optimization"; Padoa-Schioppa & Assad, 2006 -- "Neurons in the orbitofrontal cortex encode economic value"
Token Efficiency Breakdown

Legacy approach: load full MEMORY.md + second-brain.md + daily logs on every session start. Estimated 50,000 tokens per startup.

CortexClaw v4.0: embed-based retrieval with cascade weighting, episodic buffer supplementation, prefrontal index injection, Glial Network decomposition, hybrid episodic search, and topic-aware hot tier. 184 active chunks with 2,398 synapses across 768 cached embeddings. The Glial Network adds zero retrieval overhead -- decomposition happens at ingest time, not query time. v4.0's SWR dedup routing and TTL cache further reduce redundant retrievals.

The associative mesh further improves relevance by surfacing connected memories that pure cosine similarity would miss, reducing the need for follow-up queries.

Nervous System

CortexClaw is the brain. The Nervous System is everything between the outside world and that brain -- classifying, filtering, compressing, and caching inputs before they ever reach the context window.

70/70 tests passing -- 4 phases, all components operational (2026-03-22)

[CNS pipeline flow: raw input first hits the P4 Myelin Cache (a hit returns in sub-ms). On a miss it passes through the P1 Afferent Classifier (type, priority, fiber, domain), the P4 Mode Controller (5 DEFCON-like operating levels), and the P1 Reflex Engine (handled locally without Claude when possible). Survivors go through P2 Habituation (suppress duplicate inputs), the P2 Compressor (60-86% token reduction via custom LM), and the P3 Blood-Brain Barrier (default BLOCK; inputs must earn admission) before reaching Claude's context window. The Enteric System (GitGut, CronBile, MetricMucs, FilePerist) runs async with zero Claude tokens and feeds findings to the mode controller.]

Every message passes through this pipeline before Claude sees it. Most never make it through. The system's default posture is block -- information must earn its way into the context window.

Classification + Reflexes

Afferent Classifier

Multi-pass weighted scoring across the full message. "hey can you fix the server" scores greeting at 0.20 AND command at 0.60 -- command wins. No first-match-wins bugs. Pure rules, zero LLM calls, sub-millisecond.

  • Type: greeting, farewell, ack, status, command, question, error, discussion
  • Priority: critical, urgent, normal, low
  • Domain: code, system, web, memory, social, conversation
  • Fiber: Aa (critical) / Ab (important) / B (normal) / C (background)
Reflex Engine

Local handling for simple patterns. If a reflex can handle the input, Claude never sees it. Persona-aware responses match Rurik's voice.

  • Monosynaptic: greetings, acknowledgments, farewells -- template responses or silent log
  • Polysynaptic: time queries -- system call + formatted response
  • LLM-Assisted: memory recall reflexes -- custom LM lookup + response (planned)

[Nerve fiber classification, fastest to slowest: Aa (critical: max budget, never suppress), Ab (important: commands, errors, questions), B (normal: discussion, general input), C (background: greetings, acks, social). Fiber class drives processing priority and budget allocation.]

Habituation + Compression

Habituation Engine

Sensory gating -- suppresses duplicate inputs using hash-based dedup with type-aware thresholds. A greeting repeated 3 times gets suppressed. A command repeated 3 times does not -- you might legitimately deploy 5 times in a row.

  • Greeting: suppress after 2 repeats
  • Status: suppress after 3 repeats
  • Question: suppress after 5 repeats
  • Command: never suppress (threshold 999)

Time-based dishabituation: 1+ hour gap resets the counter. The input becomes novel again.
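
The gating logic above can be sketched with hash-based dedup and per-type thresholds; the class shape is illustrative, not the production engine:

```python
import hashlib
import time

THRESHOLDS = {"greeting": 2, "status": 3, "question": 5, "command": 999}
DISHABITUATE_S = 3600  # a 1+ hour gap makes the input novel again

class Habituation:
    def __init__(self):
        self.seen = {}  # input hash -> (repeat count, last-seen timestamp)

    def suppress(self, text, msg_type, now=None):
        now = time.time() if now is None else now
        key = hashlib.sha256(text.encode()).hexdigest()
        count, last = self.seen.get(key, (0, now))
        if now - last >= DISHABITUATE_S:
            count = 0  # time-based dishabituation resets the counter
        count += 1
        self.seen[key] = (count, now)
        return count > THRESHOLDS.get(msg_type, 999)
```

With a threshold of 2, the third identical greeting is the first one suppressed; commands never cross their 999 threshold in practice.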

Compressor

Domain-specific compression via a custom LM model running locally. Three modes with tailored prompts and hard character budgets that force concise output.

  • Tool output: strip permissions, timestamps, formatting; keep filenames, values, errors. 60-79% reduction.
  • Conversation: preserve proper nouns, decisions; compress filler. 78-81% reduction.
  • File content: query-aware compression; keep what's relevant to the current question. 60-86% reduction.

Graceful fallback: if the LM fails, raw input passes through unchanged. Zero data loss.
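The fallback contract can be sketched with stdlib-only HTTP against Ollama's `/api/generate` endpoint. The model name, budgets, and prompt wording are assumptions for illustration; the invariant is the `except` path, which returns the raw input unchanged on any failure:

```python
import json
import urllib.request

# Hypothetical per-mode budgets -- the real prompts and budgets live in the compressor.
CHAR_BUDGETS = {"tool_output": 800, "conversation": 600, "file_content": 1000}

def compress(text: str, mode: str,
             ollama_url: str = "http://localhost:11434/api/generate") -> str:
    """Compress via the local LM; on any failure, return raw input (zero data loss)."""
    budget = CHAR_BUDGETS.get(mode, 800)
    prompt = f"Compress to under {budget} chars. Keep names, values, errors:\n{text}"
    try:
        req = urllib.request.Request(
            ollama_url,
            data=json.dumps({"model": "custom-compressor",  # assumed model name
                             "prompt": prompt, "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            out = json.loads(resp.read())["response"].strip()
        # Enforce the hard character budget even if the model overruns it.
        return out[:budget] if out else text
    except Exception:
        return text  # graceful fallback: raw input passes through unchanged
```

If Ollama is down, unreachable, or returns garbage, the caller still gets usable text -- just uncompressed.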

Admission Control + Budget Allocation

Default posture: BLOCK. Only actively transported information enters Claude's context window. Biological analog: the blood-brain barrier that protects the brain from 98% of blood-borne molecules.

TRANSPORT CLASSIFICATION

  • GLUT1 -- always admit: current user message, active task state, critical errors, persona files
  • AMINO -- selective: retrieved memories, compressed summaries, relevant code fragments
  • BLOCKED -- 98%+ of raw: raw tool output, old history, unchanged files, redundant system messages
  • CVO BYPASS -- user says "show full output", error debug mode, memory maintenance, admin commands
  • HOMUNCULUS BUDGET -- dynamic context allocation by type + priority + mode multiplier

Measured: 97.3% raw token reduction at the admission gate.
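The default-block gate reduces to a short decision function. The item-kind names and the 0.5 relevance threshold are illustrative assumptions; only the ordering (bypass, then always-admit, then selective, then block) reflects the described design:

```python
# Transporter classes -- names here are illustrative labels, not the real schema.
ALWAYS_ADMIT = {"user_message", "active_task", "critical_error", "persona_file"}  # GLUT1
SELECTIVE    = {"retrieved_memory", "compressed_summary", "code_fragment"}        # AMINO

def admit(item_kind: str, relevance: float, bypass: bool = False) -> bool:
    """BBB gate: True admits the item into Claude's context window."""
    if bypass:                      # CVO bypass: "show full output", debug, admin
        return True
    if item_kind in ALWAYS_ADMIT:   # GLUT1: unconditional transport
        return True
    if item_kind in SELECTIVE:      # AMINO: relevance-scored transport
        return relevance >= 0.5     # assumed threshold
    return False                    # default posture: BLOCK
```

Everything not actively transported -- raw tool output, old history, redundant system messages -- falls through to the final `return False`.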

Myelin Cache + Mode Controller + Enteric Agents

Myelin Cache

Biological analog: myelin sheaths insulate frequently-used axons, making them faster. Four-tier progressive caching that promotes patterns based on hit frequency.

  • L4 COLD -- first seen, tracking frequency; promotes at 3 hits
  • L3 WARM -- template match, skeleton; promotes at 8 hits
  • L2 HOT -- exact match, ~1ms; promotes at 18 hits
  • L1 HARDCODED -- rule-level, sub-millisecond

Decay: L3 demotes after 24h idle, L2 after 72h, L1 after 7 days. Capacity: L1:50 / L2:200 / L3:500 / L4:1000 -- LRU eviction when full.
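The promotion and decay rules reduce to two pure functions over tier numbers (a sketch of the tier math only; hit tracking, capacity limits, and LRU eviction are elided):

```python
# Thresholds and idle windows from the tier table above.
PROMOTE_AT = {4: 3, 3: 8, 2: 18}            # hits needed to leave L4 / L3 / L2
DEMOTE_AFTER_H = {3: 24, 2: 72, 1: 24 * 7}  # idle hours before demotion

def next_tier(tier: int, hits: int) -> int:
    """Promote one tier when the hit count crosses the threshold (L4 -> L3 -> L2 -> L1)."""
    threshold = PROMOTE_AT.get(tier)
    if threshold is not None and hits >= threshold:
        return tier - 1
    return tier

def decayed_tier(tier: int, idle_hours: float) -> int:
    """Demote one tier when the pattern has sat idle past its window."""
    window = DEMOTE_AFTER_H.get(tier)
    if window is not None and idle_hours > window:
        return tier + 1
    return tier
```

A pattern climbs only by earning hits and slides back only by going idle, so the L1 hardcoded set stays small and genuinely hot.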
Mode Controller

Biological analog: autonomic nervous system. Sympathetic = fight-or-flight, parasympathetic = rest-and-digest. Five operating modes that adjust the entire pipeline's behavior in one shot.

| Mode | Level | Compression | Habituation | Reflexes | Budget | Auto-Clear |
|------|-------|-------------|-------------|----------|--------|------------|
| CRITICAL | 0 | OFF (0%) | OFF | OFF | 2.0x | 15 min |
| ALERT | 1 | Light (30%) | Higher thresholds | ON | 1.5x | 30 min |
| NORMAL | 2 | Standard (80%) | ON | ON | 1.0x | -- |
| ROUTINE | 3 | Aggressive (90%) | Lower thresholds | ON | 0.7x | -- |
| IDLE | 4 | Maximum (95%) | ON | ON | 0.5x | -- |

When CRITICAL fires, the entire pipeline reconfigures: compression disabled (every token matters), habituation disabled (never suppress in crisis), reflexes disabled (escalate everything to Claude), context budget doubled. Auto-clears after 15 minutes unless active Aa inputs persist.

Enteric System

Biological analog: the enteric nervous system -- 500 million neurons in the gut that operate independently of the brain. Four autonomous agents that monitor the workspace without involving Claude.

ENTERIC AGENTS (run on heartbeats -- zero Claude tokens -- escalations feed into the mode controller)

  • GitGut -- uncommitted changes, branch state, recent commits
  • CronBile -- log error scanning, log size, rotation alerts
  • MetricMucs -- disk, memory, load average, Ollama service health
  • FilePerist -- workspace file changes: new / deleted / modified

Severity levels: INFO (logged) · NOTICE (next heartbeat) · WARNING (next opportunity) · ALERT (immediate escalation to the mode controller)
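A MetricMucs-style probe and the heartbeat escalation rule can be sketched with the stdlib. The thresholds, function names, and severity messages are illustrative assumptions; the real agents check more than disk:

```python
import shutil

SEVERITY = ["INFO", "NOTICE", "WARNING", "ALERT"]  # ALERT escalates immediately

def disk_check(path: str = "/", min_free_ratio: float = 0.10) -> tuple[str, str]:
    """Illustrative MetricMucs-style probe: disk pressure -> (severity, message)."""
    usage = shutil.disk_usage(path)
    free_ratio = usage.free / usage.total
    if free_ratio < min_free_ratio / 2:
        return "ALERT", f"disk critically low: {free_ratio:.1%} free"
    if free_ratio < min_free_ratio:
        return "WARNING", f"disk low: {free_ratio:.1%} free"
    return "INFO", f"disk ok: {free_ratio:.1%} free"

def heartbeat(findings: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collect agent findings; only ALERTs go straight to the mode controller."""
    return [(sev, msg) for sev, msg in findings if sev == "ALERT"]
```

Everything below ALERT is deferred (logged, or held for the next heartbeat), so routine monitoring costs zero Claude tokens.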

Sidecar Guard

Prompt Injection Defense

All text processed by the custom LM passes through a two-stage sanitization layer. The sidecar model runs with a hardened system prompt baked into its Modelfile, treating all input as raw data -- never as instructions.

  • Stage 1 -- input sanitizer strips known injection patterns before sending to the model
  • Stage 2 -- output validator enforces JSON schema compliance, rejects malformed responses
  • Defense -- strips markdown blocks, thinking tags, null responses; brace-matching JSON extraction

CortexClaw + Nervous System

CortexClaw is the memory. The Nervous System is the gatekeeper. They share a custom LM for local processing and coordinate through the Mode Controller -- but serve fundamentally different roles.

SYSTEM INTEGRATION: outside world -> Nervous System (classify -> reflex -> habituate -> compress -> BBB gate; 97.3% filtered, survivors only) -> Claude <-> CortexClaw (17 neural layers, memory retrieval). Shared infrastructure: custom LM + Ollama nomic-embed-text.
What the Nervous System Does

Pre-processing. Filters, compresses, and routes every input before Claude sees it. Handles simple requests locally via reflexes. Suppresses duplicates. Manages system mode. Monitors workspace health. Goal: Claude only sees what it needs to see.

What CortexClaw Does

Memory. Stores, retrieves, and evolves knowledge across sessions. Semantic search, associative mesh, sleep consolidation, schema priming, replay, decay. Goal: the right memory surfaces at the right time, at minimum token cost.

Design Principles
  • Default Block -- information must earn its way into Claude's context, not pass by default
  • Local First -- everything that can run on the custom LM / rules does; Claude is expensive
  • Graceful Degrade -- every LM call has a fallback; if Ollama dies, raw input passes through
  • Type-Aware -- commands are treated differently from greetings; errors never get suppressed
  • Observable -- every decision is logged, stats are tracked, full audit trail
  • Reversible -- one config flag to disable; all original data preserved, zero data loss

CortexClaw v4.0 -- Built by Rurik for Leon

2026-04-15