CortexClaw is a memory system for AI agents. It replaces flat text files with structured, searchable memory chunks that are retrieved on demand. Instead of loading everything every session, the agent queries CortexClaw and gets back only what's relevant.
Think of it as a personal search engine for an AI's memory.
CortexClaw sits alongside the AI model (Claude, in our case) as external memory infrastructure. When the model needs to recall something, it calls CortexClaw's retrieve function. CortexClaw scores all memory chunks by semantic similarity, recency, feedback history, and associative connections, then returns the top results.
The model never loads the full memory -- only what the query pulls up. After the conversation, feedback flows back: which chunks were actually useful, which were noise. This feedback loop teaches the system to retrieve better over time.
Pure Python. No frameworks and no third-party dependencies -- HTTP calls go through the standard library's urllib. Runs on any machine with Python 3.10+. Embeddings are generated locally via Ollama (nomic-embed-text, 384-dim vectors) -- zero API cost.
The router index is a JSONL file (one line per chunk). Chunks are individual markdown files. The daemon runs as a launchd service on macOS, executing maintenance every 2 hours. Total codebase: ~3,700 lines across 6 Python modules.
Every layer is modeled after a specific brain mechanism with a cited neuroscience paper. Memories decay at multiple timescales (Benna & Fusi 2016). Replay during idle periods consolidates important memories (Klinzing et al. 2019). Associative connections form between co-accessed chunks (Uytiepo et al. 2025). A dopaminergic reward signal learns from retrieval feedback (Schultz 1997).
As of v4.0, nineteen improvements across all twelve layers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, prediction-error replay, adaptive GDPO weights, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation, adversarial self-test, and more. Generated via 3-agent review and weighted vote.
A biologically-inspired memory system for AI agents. Twelve neural layers transform flat chunk storage into a living memory that consolidates, associates, and evolves over time.
AI agents lose everything between sessions. Loading full memory files burns tokens and lacks prioritization. Flat storage treats all memories equally, missing the patterns that make memory useful.
Model memory like the brain does. Fast-decaying attention for recent events, slow-building permanence for important patterns. Memories that fire together wire together. Sleep consolidates. Schemas generalize.
CortexClaw replaces the flat-file memory that most AI agents use. Instead of loading entire memory documents every session (burning thousands of tokens on irrelevant context), it breaks knowledge into small, searchable chunks and retrieves only what's needed.
A single MEMORY.md file that grows forever. Every session loads the whole thing. 50KB of text, 90% irrelevant to the current conversation. No prioritization, no forgetting, no association between ideas.
Knowledge split into focused chunks (~200 words each), indexed by topic and tags, embedded for semantic search. Only relevant chunks are retrieved per query. Typical session loads 3-5 chunks instead of everything.
Before the neural layers, CortexClaw v1.0 established the core infrastructure that everything builds on. These are the primitives.
Every piece of knowledge is stored as a chunk -- a small file with a topic, summary, tags, and content. The router is a lightweight index (one JSON line per chunk) that maps IDs to summaries and tags. The agent scans the router to decide which chunks to load, without reading every file.
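As a sketch, a router scan might look like this -- the field names and the tag-intersection heuristic are illustrative, not the actual schema:

```python
import json

# Hypothetical router entries -- field names are illustrative.
ROUTER_LINES = [
    '{"id": "chunk-001", "topic": "hardware", "summary": "GPU config", "tags": ["gpu", "config"]}',
    '{"id": "chunk-002", "topic": "people", "summary": "Leon preferences", "tags": ["leon", "prefs"]}',
]

def scan_router(lines, query_tags):
    """Return IDs of chunks whose tags intersect the query tags."""
    hits = []
    for line in lines:
        entry = json.loads(line)  # one JSON object per line (JSONL)
        if set(entry["tags"]) & set(query_tags):
            hits.append(entry["id"])
    return hits

print(scan_router(ROUTER_LINES, {"gpu"}))  # ['chunk-001']
```

The point of the JSONL layout is that this scan touches only the one-line index, never the chunk files themselves.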
Each chunk gets a 384-dimensional vector embedding via nomic-embed-text running locally on Ollama (zero API cost). Retrieval computes cosine similarity between the query embedding and all chunk embeddings, returning the top-K most relevant. This is how the system answers "what do I know about X?" without keyword matching.
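The scoring core fits in a few lines of standard-library Python. This sketch uses toy low-dimensional vectors in place of the 384-dim embeddings, and `top_k` is a hypothetical name:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=3):
    """chunk_vecs: {chunk_id: embedding}. Returns the k best (id, score) pairs."""
    scored = [(cid, cosine(query_vec, v)) for cid, v in chunk_vecs.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]
```

In the real system these raw similarities are only the starting point; cascade stability, feedback, and spreading activation adjust the ranking afterwards.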
Chunks have two levels: fact (compressed key points, fast to scan) and narrative (full context with reasoning and background). Quick lookups pull facts only. Deep dives pull both. This alone cuts token usage by ~40% for routine queries.
After every retrieval, the system logs which chunks were used (actually referenced in conversation), wasted (retrieved but ignored), and missed (needed but not retrieved). This feedback adjusts future retrieval scoring -- chunks that consistently get used rank higher, wasted chunks sink.
When a tag accumulates too many chunks (threshold: 40), the system triggers a rollup -- merging older, lower-stability chunks into a single consolidated chunk. This keeps the total chunk count manageable while preserving the important information. Think of it as compressing old memories into summaries.
A background process runs every 2 hours (configurable), executing maintenance: decay calculations, replay cycles, synapse building, and archival. This is the system's "sleep" -- the offline consolidation that sharpens memories over time. Heartbeat checks trigger it, or it runs on a schedule (launchd on macOS, cron elsewhere).
v1 gave CortexClaw the ability to store, search, and maintain memories efficiently. But it still treated every memory as independent -- no associations, no variable decay rates, no pattern recognition. v2 added six neural layers on top of this foundation. v3.x extended to twelve layers with scoring precision fixes, episodic memory, persistent working memory, consolidation triggers, reward-driven learning, the Glial Network, and L17 GDPO Feedback. v4.0 adds nineteen improvements across three tiers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, synthesis validation, prediction-error replay, semantic relevance scoring, adaptive GDPO weights, sigmoid Eq8 gate, active demotion, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation with quality gate, TTL cache, and adversarial self-test.
Twelve layers built on the v1 foundation. Each addresses a specific biological memory mechanism. Together, they create a system where memories compete, consolidate, and evolve -- just like neurons do.
Inspired by Benna & Fusi 2016 (synaptic complexity theory). Instead of one decay number, each memory has three stability tiers that decay at different rates -- like how the brain has fast synaptic changes and slow structural ones.
Every memory has three stability tiers: fast (volatile, working memory), medium (session-stable), and slow (permanent knowledge). Each decays at its own rate per day.
The effective stability is a weighted blend: 20% fast + 30% medium + 50% slow. This means the slow tier dominates long-term survival -- a memory must prove its worth over time to persist.
When a memory is accessed, its fast tier resets to 1.0. But the real magic is promotion: each access transfers 5% from fast to medium and 1% from medium to slow.
A 1-hour cooldown prevents rapid-fire access from gaming the system. Only spaced, genuine retrievals build long-term stability -- just like spaced repetition in human learning.
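A minimal sketch of the cascade, using the blend weights, transfer rates, and cooldown stated above. The per-day decay multipliers and the starting tier values are illustrative guesses, not the shipped constants:

```python
# Blend weights and transfer rates come from the text; decay multipliers
# and initial tier values are illustrative assumptions.
DECAY = {"fast": 0.50, "medium": 0.95, "slow": 0.995}   # per-day multipliers (assumed)
WEIGHTS = {"fast": 0.20, "medium": 0.30, "slow": 0.50}
COOLDOWN_S = 3600  # 1-hour access cooldown

class Memory:
    def __init__(self):
        self.tiers = {"fast": 1.0, "medium": 0.20, "slow": 0.05}
        self.last_access = 0.0

    def effective_stability(self):
        """Weighted blend: 20% fast + 30% medium + 50% slow."""
        return sum(WEIGHTS[t] * v for t, v in self.tiers.items())

    def decay(self, days):
        for t in self.tiers:
            self.tiers[t] *= DECAY[t] ** days

    def access(self, now):
        if now - self.last_access < COOLDOWN_S:
            return  # cooldown: rapid re-access does not build stability
        self.last_access = now
        self.tiers["medium"] = min(1.0, self.tiers["medium"] + 0.05 * self.tiers["fast"])
        self.tiers["slow"] = min(1.0, self.tiers["slow"] + 0.01 * self.tiers["medium"])
        self.tiers["fast"] = 1.0  # reset fast tier on genuine access
```

Because the slow tier both dominates the blend and grows only through spaced access, a memory needs repeated, spread-out retrievals to survive long term.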
Inspired by multi-synaptic boutons research (Uytiepo et al. 2025). Memories form weighted connections through three mechanisms: co-access, semantic similarity, and temporal proximity. Retrieving one memory primes related memories through spreading activation.
Co-access: When two memories are retrieved together, a synapse forms between them (+0.15 weight each time). The more they co-occur, the stronger the link.
Semantic: During maintenance, chunks sharing tags are compared by cosine similarity. If above 0.65, a semantic synapse is created.
Temporal: Chunks ingested within 1 hour of each other get weak temporal links (0.20 weight), capturing contextual proximity.
During retrieval, after the initial top-K scoring, spreading activation kicks in: each top result sends a signal through its synapses, boosting connected chunks that might not have scored high on their own.
This means asking about "hardware" can pull in "model config" if they've been frequently co-accessed -- just like how thinking about one topic naturally reminds you of related ones.
Synapse weights decay by a factor of 0.95 per day and are pruned below 0.05. Each chunk is capped at 15 connections to prevent noise.
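A one-hop sketch of spreading activation over a simple adjacency map; the damping factor is an illustrative assumption:

```python
def spread_activation(scores, synapses, factor=0.3):
    """One hop of spreading activation.

    scores:   {chunk_id: retrieval_score} for the initial top-K results.
    synapses: {chunk_id: [(neighbor_id, weight), ...]} adjacency map.
    factor:   damping coefficient (an assumed value, not the shipped one).
    """
    boosted = dict(scores)
    for cid, score in scores.items():
        for nbr, weight in synapses.get(cid, []):
            boost = score * weight * factor
            # Ceiling at 1.0 so boosts never create inflated confidence.
            boosted[nbr] = min(1.0, boosted.get(nbr, 0.0) + boost)
    return boosted
```

This is how a strong "hardware" hit can lift "model config" into the results even when its own cosine score was too low.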
Inspired by Nader et al. 2000 (memory reconsolidation). When a memory is retrieved, it enters a labile state for 2 hours. If new content arrives with overlapping tags, it merges into the existing chunk rather than creating a new one.
Without reconsolidation, asking "remember that the model is Opus" twice creates two nearly-identical chunks. With reconsolidation, the second fact merges into the existing chunk if it was recently accessed.
The Jaccard overlap of tags must be at least 40% to trigger a merge -- this prevents unrelated facts from contaminating each other. After reconsolidation, the chunk's embedding is invalidated and will be re-computed on next retrieval.
The chunk also gets a cascade promotion on merge, since the brain treats reconsolidation as reinforcement.
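The merge gate follows directly from the rules above; function names here are hypothetical:

```python
def jaccard(a, b):
    """Jaccard overlap of two tag collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def should_merge(existing_tags, new_tags, last_access_age_s,
                 window_s=2 * 3600, threshold=0.40):
    """Merge only if the chunk is still labile (accessed within the 2h
    window) and tag overlap clears the 40% Jaccard threshold."""
    return last_access_age_s <= window_s and jaccard(existing_tags, new_tags) >= threshold
```

A near-duplicate fact arriving ten minutes after retrieval merges; the same fact arriving three hours later becomes a new chunk instead.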
Inspired by Zaki & Cai 2025 (excitability priming / pre-allocation). The system tracks which topic clusters are currently "hot" -- being actively retrieved or ingested. New memories matching hot schemas get higher initial stability, while truly novel memories get flagged for priority replay.
Schema match: If new memory tags overlap with hot schemas (temperature above 0.30), the memory starts with boosted medium-tier stability (0.35 vs 0.20) and higher feedback score (1.10 vs 1.00). This reflects the brain's tendency to integrate new info faster when it fits existing mental models.
Novel memory: If no schema matches, the memory is flagged as "novel" and gets priority in the replay engine (L6). Novel memories need more consolidation passes because they don't have an existing framework to attach to -- the brain processes truly new information differently from familiar-pattern information.
Schema temperature decays roughly 50% every 12 hours. Dead schemas (below 0.01) are pruned automatically during maintenance.
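A sketch of the priming rules, assuming a flat tag-to-temperature map and treating "schema match" as overlap with a hot schema (an interpretation of the text, not a confirmed detail):

```python
HALF_LIFE_H = 12.0  # "roughly 50% every 12 hours"

def decayed_temperature(temp, hours):
    """Exponential decay with a 12-hour half-life."""
    return temp * 0.5 ** (hours / HALF_LIFE_H)

def initial_stability(new_tags, hot_schemas):
    """hot_schemas: {tag: temperature}. Returns (medium_tier, feedback, novel).

    Hot-matching memories start boosted; unmatched ones are flagged novel
    for priority replay.
    """
    hot = any(hot_schemas.get(t, 0.0) > 0.30 for t in new_tags)
    medium = 0.35 if hot else 0.20
    feedback = 1.10 if hot else 1.00
    return medium, feedback, not hot
```

Schemas whose temperature decays below 0.01 would be dropped during maintenance.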
Inspired by Klinzing et al. 2019 (memory replay during sleep). During daemon maintenance cycles (every 2 hours), the system "replays" the 10 most important recent memories -- re-embedding them, strengthening their synapses, and gently promoting them through the cascade.
In the brain, sleep replay is when the hippocampus "replays" recent experiences to the neocortex, transferring short-term memories into long-term storage. CortexClaw simulates this during its daemon maintenance cycles.
Replay promotion is deliberately gentler than direct access (+0.02 medium vs +0.05 for real retrieval). This prevents the daemon from artificially inflating memories that were never actually useful. The system also uses a lower similarity threshold during replay (0.50 vs 0.65 for semantic synapses) because the brain is more associative during sleep, forming connections it wouldn't make while "awake."
Inspired by Spens & Burgess 2024 (generative model of memory). During rollup merges, a local LLM distills merged chunks into generalized behavioral patterns and principles -- not fact lists, but schemas. These schema chunks decay slower and start with higher baseline stability.
Regular chunks store facts: specific details, configurations, names, dates. Schema chunks store patterns: generalized behavioral rules, workflow principles, recurring preferences.
The LLM prompt explicitly asks for 1-3 concise sentences capturing behavioral patterns, not bullet lists. Temperature is kept low (0.3) for reliability.
Schema chunks start with higher baseline stability (medium=0.60 vs 0.20 for regular chunks, slow=0.30 vs 0.05). They also use slower decay rates across all tiers.
This reflects how the brain treats generalized knowledge: specific episode details fade, but the patterns extracted from them persist far longer.
Four targeted fixes that eliminated scoring artifacts and noise, taking retrieval quality from good to surgical.
Replaced the absolute MIN_SCORE_THRESHOLD=0.55 with a relative pre-spreading check. Now requires top score ≥ 0.72 and gap ≥ 0.03 between top results. Catches off-topic queries that the old absolute threshold missed because spreading activation could inflate scores past 0.55.
Enforces a hard ceiling of 1.0 on all scores, both before and after spreading activation. Previously, synapse boosting could push scores above 1.0, creating misleading confidence signals and breaking relative ranking between results.
Only the top-2 retrieval results update schema heat. Previously, all returned results warmed their schemas, which meant low-relevance tail results were polluting the heat map and causing schema drift on unrelated topics.
Temporal synapses now require ≥ 1 shared tag between chunks. Killed approximately 36% of noise synapses that were forming between temporally proximate but semantically unrelated chunks.
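Taken together, the ceiling and the relative threshold amount to a gate like this -- a sketch of the pre-spreading check (in the real system the 1.0 clamp also reapplies after spreading activation):

```python
def passes_relevance_gate(scores, min_top=0.72, min_gap=0.03):
    """Pre-spreading relevance gate.

    Clamp every score to 1.0, then require both a strong top score and
    a minimum gap between the top two results.
    """
    clamped = sorted((min(s, 1.0) for s in scores), reverse=True)
    if not clamped or clamped[0] < min_top:
        return False  # off-topic: nothing scored confidently enough
    if len(clamped) >= 2 and clamped[0] - clamped[1] < min_gap:
        return False  # no clear winner among the top results
    return True
```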
Inspired by Tulving 1972 (episodic memory) and Baddeley 2000 (episodic buffer). The episodic buffer maintains vivid, temporally-tagged traces that semantic memory (chunks) distills from. SQLite + FTS5 full-text search provides instant recall of raw conversation history.
Integrated directly into retrieve() -- every CortexClaw query gets supplemental episodic hits appended alongside semantic chunk results. The FTS5 engine searches raw conversation text using porter stemming, catching exact phrases and context that embedding search misses.
The maintain cycle auto-syncs new daily logs, keeping the episodic buffer current without manual intervention.
Semantic chunks are distilled and compressed -- great for patterns, but they lose the raw texture of conversations. The episodic buffer preserves the vivid, temporally-tagged traces that chunks were derived from.
When you need "what did Leon say about X last Tuesday?" rather than "what does the system know about X?", the episodic buffer delivers. Full-text search complements embedding similarity with exact phrase matching.
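The episodic store needs nothing beyond the standard library's sqlite3 module (assuming the bundled SQLite was built with FTS5, as stock CPython builds are). Table and column names here are illustrative, not the actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema -- the real table layout may differ.
conn.execute(
    "CREATE VIRTUAL TABLE episodes USING fts5(ts, speaker, text, tokenize='porter')"
)
conn.execute(
    "INSERT INTO episodes VALUES ('2026-04-14', 'Leon', 'deploying the new model config tonight')"
)
conn.commit()

def episodic_search(query, limit=5):
    """Full-text search over raw conversation traces.

    Porter stemming lets a query for 'deploy' match 'deploying'.
    """
    return conn.execute(
        "SELECT ts, speaker, text FROM episodes WHERE episodes MATCH ? LIMIT ?",
        (query, limit),
    ).fetchall()
```

`episodic_search("deploy")` finds the "deploying" row even though the surface forms differ -- exactly the kind of exact-phrase-adjacent recall that embedding similarity tends to blur.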
Inspired by Goldman-Rakic 1995 (persistent activity in working memory). A frozen hot tier -- a small guaranteed-injected block that bypasses retrieval entirely. Like dlPFC persistent neural firing that maintains task-critical information without requiring reactivation.
The prefrontal index refreshes during maintain cycles. Entries are scored by a composite of schema heat, access count, cascade stability, and feedback signals. The top 10 entries are frozen into the index.
Supports manual pinning -- critical entries can be locked in place regardless of scoring. Identity and key relationship information is always pinned.
In the brain, the dorsolateral prefrontal cortex maintains persistent neural firing patterns for task-critical information -- your name, what you're working on, who you're talking to. This information doesn't need to be "remembered" each time; it's always active.
The prefrontal index does the same thing: the most essential context is pre-loaded into every session, ensuring the agent never needs to search for its own identity or current priorities.
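A sketch of index construction; the composite weights are illustrative assumptions, since the exact formula is not specified here:

```python
def prefrontal_score(entry):
    """Composite of schema heat, access count, cascade stability, and
    feedback. The weights are illustrative, not the shipped values."""
    return (0.35 * entry["schema_heat"]
            + 0.25 * entry["access_count_norm"]
            + 0.25 * entry["cascade_stability"]
            + 0.15 * entry["feedback"])

def build_index(entries, pinned_ids, top_n=10):
    """Pinned entries are always kept; the rest compete for the remaining slots."""
    pinned = [e for e in entries if e["id"] in pinned_ids]
    rest = sorted((e for e in entries if e["id"] not in pinned_ids),
                  key=prefrontal_score, reverse=True)
    return pinned + rest[: max(0, top_n - len(pinned))]
```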
Inspired by Buzsáki 2015 (hippocampal sharp-wave ripples). A pre-compression flush trigger that fires a consolidation pass before context compaction, extracting decisions, facts, action items, corrections, and insights before they're lost.
Inspired by Schultz 1997 (reward prediction error) and Lisman & Grace 2005 (hippocampal-VTA loop). The reward signal that makes the whole system learn from experience -- auto-generating feedback from retrieval patterns to drive decay tuning, tag expansion, and synapse strengthening.
Every retrieval generates implicit feedback: chunks that appear in the agent's response are marked "used" (positive reward). Chunks retrieved but never referenced are marked "wasted" (negative signal).
These signals feed back into the cascade decay system -- used chunks get stability boosts, wasted chunks get accelerated decay. Over time, the system naturally surfaces useful memories and buries noise.
v4.0 adds granular tracking: per-query chunk access patterns, conversation clustering, feedback-to-decay integration, prediction-error replay (L6), and proactive synthesis (L10). This creates a rich signal that goes beyond simple used/wasted binary.
The VTA loop analogy is precise: dopamine neurons fire when outcomes exceed expectations (chunk was useful) and suppress when outcomes disappoint (chunk was irrelevant). The prediction error drives learning.
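The used/wasted loop reduces to a few lines; the boost and penalty magnitudes are illustrative assumptions:

```python
def apply_feedback(chunk_scores, retrieved, referenced, boost=0.10, penalty=0.05):
    """Implicit feedback from one retrieval cycle.

    retrieved:  chunk IDs returned by retrieve().
    referenced: IDs actually cited in the agent's response.
    Referenced chunks earn a reward ("used"); the rest pay a small
    penalty ("wasted"). Magnitudes here are assumed, not the real tuning.
    """
    for cid in retrieved:
        if cid in referenced:
            chunk_scores[cid] = chunk_scores.get(cid, 1.0) + boost
        else:
            chunk_scores[cid] = chunk_scores.get(cid, 1.0) - penalty
    return chunk_scores
```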
Inspired by Allen & Lyons 2018 (glia as architects of CNS formation). Three specialized observer agents decompose every memory at ingest time, extracting structured facts, contextual patterns, and emotional valence. Like glial cells in the brain -- long dismissed as passive scaffolding, now known to actively modulate synapses, regulate neurotransmitter uptake, and coordinate neural activity across regions.
Astrocytes (Fact Hunter): Like astrocytes providing structural and metabolic support to neurons, this agent extracts the hard facts -- entities, configuration values, names, technical details, and relationships between them.
Oligodendrocytes (Context Weaver): Like oligodendrocytes wrapping axons in myelin to speed signal propagation, this agent wraps raw facts in context -- identifying patterns, implications, and connections to existing knowledge.
Microglia (Emotion Tagger): Like microglia surveilling the CNS for threats and damage, this agent monitors for emotional valence, urgency signals, and motivational context -- marking memories that carry threat, reward, or importance signals.
Raw chunks are flat text. The Glial Network transforms them into structured, multi-dimensional representations before they enter the memory system. This means retrieval can match not just on content, but on extracted entities, identified patterns, and emotional context.
The biological parallel is precise: glial cells roughly match neurons in number (about a 1:1 ratio in the human brain). They don't fire action potentials, but nothing works without them. They modulate synaptic transmission, clear neurotransmitters, maintain the blood-brain barrier, and guide neural development. CortexClaw's Glial Network does the same preprocessing work that makes downstream neural operations (retrieval, replay, reconsolidation) more effective.
Inspired by Liu et al. 2026 (GDPO) and Padoa-Schioppa & Assad 2006 (multi-attribute value coding in OFC). Previously, the feedback system collapsed three independent signals into a single scalar score -- losing information and rewarding the wrong things. L17 separates, normalizes, and gates each dimension independently.
Previously, the feedback system collapsed three independent signals (used, wasted, missed) into a single scalar score. This caused information loss -- a chunk that was heavily used and frequently wasted would get the same score as a chunk that was moderately used and never wasted.
L17 tracks each dimension independently, normalizes per-dimension before combining with explicit weights (50 / 35 / 15), and applies a reward conditioning gate: if a chunk's wasted score exceeds the threshold, its used reward is zeroed entirely. This is the "don't reward efficiency unless correctness is met first" pattern from the GDPO paper.
2 chunks are currently gated.
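A sketch of the decoupled scoring using the stated 50/35/15 weights. The count-based normalization, the sign conventions, and the gate threshold are illustrative assumptions:

```python
def gdpo_score(used, wasted, missed, gate_threshold=0.5):
    """Decoupled per-dimension feedback score.

    used/wasted/missed are raw event counts for one chunk. Each dimension
    is normalized independently, then combined with explicit weights.
    If the wasted fraction exceeds the gate threshold, the 'used' reward
    is zeroed -- no efficiency reward unless correctness is met first.
    (Normalization scheme and threshold are assumed, not the shipped ones.)
    """
    total = used + wasted + missed
    if total == 0:
        return 0.0
    u, w, m = used / total, wasted / total, missed / total
    if w > gate_threshold:
        u = 0.0  # reward conditioning gate
    return 0.50 * u - 0.35 * w - 0.15 * m
```

Under this scheme a heavily-used-but-heavily-wasted chunk no longer scores like a moderately-used, never-wasted one -- the failure mode the single-scalar approach had.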
Real-time view of CortexClaw's associative mesh -- 1,317 synaptic connections linking 99 memory chunks into a living network.
| Layer | Name | Brain Region | Mechanism | Status |
|---|---|---|---|---|
| L6 | Replay Engine | Hippocampus | Sleep consolidation | active |
| L7 | Schema Priming | mPFC | Excitability pre-allocation | active |
| L8 | Associative Mesh | Neocortex | Synapse graph + spreading activation | active |
| L9 | Reconsolidation | Amygdala-Hippocampus | 2hr lability window | active |
| L10 | Schema Synthesis | vmPFC | LLM generative consolidation | active |
| L11 | Cascade Decay | Synaptic complex | Multi-timescale stability | active |
| L12 | Episodic Buffer | Medial Temporal Lobe | FTS5 full-text search | active |
| L13 | Prefrontal Index | dlPFC | Persistent working memory | active |
| L14 | Sharp-Wave Ripple | Hippocampus CA3→CA1 | State-transition consolidation | active |
| L15 | Dopaminergic Signal | VTA | Reward-driven learning | active |
| L16 | Glial Network | Throughout CNS | Observer agent decomposition | active |
| L17 | GDPO Feedback | Orbitofrontal Cortex | Decoupled reward normalization | active |
Legacy approach: load full MEMORY.md + second-brain.md + daily logs on every session start. Estimated 50,000 tokens per startup.
CortexClaw v4.0: embed-based retrieval with cascade weighting, episodic buffer supplementation, prefrontal index injection, Glial Network decomposition, hybrid episodic search, and topic-aware hot tier. 184 active chunks with 2,398 synapses across 768 cached embeddings. The Glial Network adds zero retrieval overhead -- decomposition happens at ingest time, not query time. v4.0's SWR dedup routing and TTL cache further reduce redundant retrievals.
The associative mesh further improves relevance by surfacing connected memories that pure cosine similarity would miss, reducing the need for follow-up queries.
CortexClaw is the brain. The Nervous System is everything between the outside world and that brain -- classifying, filtering, compressing, and caching inputs before they ever reach the context window.
Every message passes through this pipeline before Claude sees it. Most never make it through. The system's default posture is block -- information must earn its way into the context window.
Multi-pass weighted scoring across the full message. "hey can you fix the server" scores greeting at 0.20 AND command at 0.60 -- command wins. No first-match-wins bugs. Pure rules, zero LLM calls, sub-millisecond.
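A toy version of the multi-pass scorer; the pattern table is illustrative and the real rule set is far larger:

```python
# Illustrative keyword weights -- the real rule set is much richer.
PATTERNS = {
    "greeting": [("hey", 0.20), ("hello", 0.25)],
    "command": [("fix", 0.60), ("deploy", 0.60), ("restart", 0.55)],
}

def classify(message):
    """Score every category across the full message; the highest total wins.

    Because all categories are scored before picking a winner, a message
    that matches both 'greeting' and 'command' resolves to the stronger
    signal instead of whichever rule fired first.
    """
    text = message.lower()
    scores = {cat: sum(w for kw, w in pats if kw in text)
              for cat, pats in PATTERNS.items()}
    return max(scores, key=scores.get), scores

label, scores = classify("hey can you fix the server")
# greeting scores 0.20, command scores 0.60 -- command wins
```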
Local handling for simple patterns. If a reflex can handle the input, Claude never sees it. Persona-aware responses match Rurik's voice.
Sensory gating -- suppresses duplicate inputs using hash-based dedup with type-aware thresholds. A greeting repeated 3 times gets suppressed. A command repeated 3 times does not -- you might legitimately deploy 5 times in a row.
Time-based dishabituation: 1+ hour gap resets the counter. The input becomes novel again.
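A sketch of type-aware habituation with dishabituation; the per-type thresholds and the SHA-256 dedup key are illustrative choices:

```python
import hashlib

# Type-aware suppression thresholds (counts are illustrative).
THRESHOLDS = {"greeting": 3, "command": 10}
DISHABITUATION_S = 3600  # 1+ hour gap resets the counter

class Habituator:
    def __init__(self):
        self.seen = {}  # content digest -> (count, last_seen_ts)

    def suppress(self, msg_type, text, now):
        """Return True if this input should be suppressed as a duplicate."""
        digest = hashlib.sha256(text.encode()).hexdigest()
        count, last = self.seen.get(digest, (0, now))
        if now - last >= DISHABITUATION_S:
            count = 0  # long gap: the input is novel again
        count += 1
        self.seen[digest] = (count, now)
        return count > THRESHOLDS.get(msg_type, 5)
```

With these thresholds a repeated greeting is suppressed quickly, while a repeated command survives many occurrences -- and any input becomes novel again after an hour of silence.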
Domain-specific compression via a custom language model running locally. Three modes with tailored prompts and hard character budgets that force concise output.
Graceful fallback: if the LM fails, raw input passes through unchanged. Zero data loss.
Default posture: BLOCK. Only actively transported information enters Claude's context window. Biological analog: the blood-brain barrier that protects the brain from 98% of blood-borne molecules.
Biological analog: myelin sheaths insulate frequently-used axons, making them faster. Four-tier progressive caching that promotes patterns based on hit frequency.
Biological analog: autonomic nervous system. Sympathetic = fight-or-flight, parasympathetic = rest-and-digest. Five operating modes that adjust the entire pipeline's behavior in one shot.
| Mode | Level | Compression | Habituation | Reflexes | Budget | Auto-Clear |
|---|---|---|---|---|---|---|
| CRITICAL | 0 | OFF (0%) | OFF | OFF | 2.0x | 15 min |
| ALERT | 1 | Light (30%) | Higher thresholds | ON | 1.5x | 30 min |
| NORMAL | 2 | Standard (80%) | ON | ON | 1.0x | -- |
| ROUTINE | 3 | Aggressive (90%) | Lower thresholds | ON | 0.7x | -- |
| IDLE | 4 | Maximum (95%) | ON | ON | 0.5x | -- |
When CRITICAL fires, the entire pipeline reconfigures: compression disabled (every token matters), habituation disabled (never suppress in crisis), reflexes disabled (escalate everything to Claude), context budget doubled. Auto-clears after 15 minutes unless active Aa inputs persist.
Biological analog: the enteric nervous system -- 500 million neurons in the gut that operate independently of the brain. Four autonomous agents that monitor the workspace without involving Claude.
All text processed by the custom language model passes through a two-stage sanitization layer. The sidecar model runs with a hardened system prompt baked into its Modelfile, treating all input as raw data -- never as instructions.
CortexClaw is the memory. The Nervous System is the gatekeeper. They share a custom language model for local processing and coordinate through the Mode Controller -- but serve fundamentally different roles.
Pre-processing. Filters, compresses, and routes every input before Claude sees it. Handles simple requests locally via reflexes. Suppresses duplicates. Manages system mode. Monitors workspace health. Goal: Claude only sees what it needs to see.
Memory. Stores, retrieves, and evolves knowledge across sessions. Semantic search, associative mesh, sleep consolidation, schema priming, replay, decay. Goal: the right memory surfaces at the right time, at minimum token cost.
CortexClaw v4.0 -- Built by Rurik for Leon
2026-04-15