CortexClaw is a memory system for AI agents. It replaces flat text files with structured, searchable memory chunks that are retrieved on demand. Instead of loading everything every session, the agent queries CortexClaw and gets back only what's relevant.
Think of it as a personal search engine for an AI's memory.
CortexClaw sits alongside the AI model (Claude, in our case) as external memory infrastructure. When the model needs to recall something, it calls CortexClaw's retrieve function. CortexClaw scores all memory chunks by semantic similarity, recency, feedback history, and associative connections, then returns the top results.
The model never loads the full memory -- only what the query pulls up. After the conversation, feedback flows back: which chunks were actually useful, which were noise. This feedback loop teaches the system to retrieve better over time.
Pure Python. No frameworks and no third-party dependencies: everything runs on the standard library, with urllib handling HTTP calls. Runs on any machine with Python 3.10+. Embeddings are generated locally via Ollama (nomic-embed-text, 384-dim vectors) -- zero API cost.
The router index is a JSONL file (one line per chunk). Chunks are individual markdown files. The daemon runs as a launchd service on macOS, executing maintenance every 2 hours. Total codebase: ~3,700 lines across 6 Python modules.
Every layer is modeled after a specific brain mechanism with a cited neuroscience paper. Memories decay at multiple timescales (Benna & Fusi 2016). Replay during idle periods consolidates important memories (Klinzing et al. 2019). Associative connections form between co-accessed chunks (Uytiepo et al. 2025). A dopaminergic reward signal learns from retrieval feedback (Schultz 1997).
As of v4.0, nineteen improvements span all twelve layers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, prediction-error replay, adaptive GDPO weights, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation, adversarial self-test, and more. They were generated via a 3-agent review and weighted vote.
The deep version: a biologically-inspired memory system for AI agents -- twelve neural layers that turn flat chunk storage into a living memory that consolidates, associates, and evolves over time. For the short version, see the overview.
AI agents lose everything between sessions. Loading full memory files burns tokens and lacks prioritization. Flat storage treats all memories equally, missing the patterns that make memory useful.
Model memory like the brain does. Fast-decaying attention for recent events, slow-building permanence for important patterns. Memories that fire together wire together. Sleep consolidates. Schemas generalize.
CortexClaw replaces the flat-file memory that most AI agents use. Instead of loading entire memory documents every session (burning thousands of tokens on irrelevant context), it breaks knowledge into small, searchable chunks and retrieves only what's needed.
A single MEMORY.md file that grows forever. Every session loads the whole thing. 50KB of text, 90% irrelevant to the current conversation. No prioritization, no forgetting, no association between ideas.
Knowledge split into focused chunks (~200 words each), indexed by topic and tags, embedded for semantic search. Only relevant chunks are retrieved per query. Typical session loads 3-5 chunks instead of everything.
Before the neural layers, CortexClaw v1.0 established the core infrastructure that everything builds on. These are the primitives.
Every piece of knowledge is stored as a chunk -- a small file with a topic, summary, tags, and content. The router is a lightweight index (one JSON line per chunk) that maps IDs to summaries and tags. The agent scans the router to decide which chunks to load, without reading every file.
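A minimal sketch of a router scan, assuming each JSONL line carries `id`, `topic`, `summary`, and `tags` fields. Those field names and the helper `scan_router` are hypothetical; the point is that tag matching happens on the index alone, without opening any chunk file.

```python
import json
from typing import Iterable


def scan_router(lines: Iterable[str], wanted_tags: set[str]) -> list[str]:
    """Return IDs of chunks whose tags overlap wanted_tags, reading only the index."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        entry = json.loads(line)
        if wanted_tags & set(entry.get("tags", [])):
            hits.append(entry["id"])
    return hits


router = [
    '{"id": "c001", "topic": "hardware", "summary": "GPU box specs", "tags": ["hardware", "gpu"]}',
    '{"id": "c002", "topic": "model", "summary": "Default model config", "tags": ["model", "config"]}',
]
print(scan_router(router, {"gpu"}))  # ['c001']
```

In practice the same function can take an open file handle (for example `open("router.jsonl")`), since iterating a text file yields one line at a time.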
Each chunk gets a 384-dimensional vector embedding via nomic-embed-text running locally on Ollama (zero API cost). Retrieval computes cosine similarity between the query embedding and all chunk embeddings, returning the top-K most relevant. This is how the system answers "what do I know about X?" without keyword matching.
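The retrieval step reduces to "score every chunk by cosine similarity, keep the best K". A pure-standard-library sketch follows; the toy 3-dimensional vectors and the names `cosine` and `top_k` are illustrative stand-ins, not CortexClaw's actual code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: list[float], embeddings: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Score every chunk against the query embedding and keep the k best."""
    scored = [(cid, cosine(query, vec)) for cid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for the real 384-dimensional embeddings.
embeddings = {"hardware": [1.0, 0.0, 0.0], "cooking": [0.0, 1.0, 0.0], "gpu-setup": [1.0, 1.0, 0.0]}
print([cid for cid, _ in top_k([1.0, 0.1, 0.0], embeddings, k=2)])  # ['hardware', 'gpu-setup']
```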
Chunks have two levels: fact (compressed key points, fast to scan) and narrative (full context with reasoning and background). Quick lookups pull facts only. Deep dives pull both. This alone cuts token usage by ~40% for routine queries.
After every retrieval, the system logs which chunks were used (actually referenced in conversation), wasted (retrieved but ignored), and missed (needed but not retrieved). This feedback adjusts future retrieval scoring -- chunks that consistently get used rank higher, wasted chunks sink.
When a tag accumulates too many chunks (threshold: 40), the system triggers a rollup -- merging older, lower-stability chunks into a single consolidated chunk. This keeps the total chunk count manageable while preserving the important information. Think of it as compressing old memories into summaries.
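A rough sketch of the rollup trigger, assuming chunks are dicts with hypothetical `id`, `tags`, `created`, and `stability` keys. The threshold of 40 comes from the text; treating it as inclusive and keeping the 20 strongest chunks untouched (`keep`) are assumptions made for illustration.

```python
from collections import defaultdict

ROLLUP_THRESHOLD = 40  # from the text; treated here as "40 or more chunks on one tag"


def rollup_plan(chunks: list[dict], threshold: int = ROLLUP_THRESHOLD, keep: int = 20) -> dict[str, list[str]]:
    """Map each over-full tag to the chunk IDs that would be merged into one summary chunk.

    keep: hypothetical number of chunks left untouched per tag.
    """
    by_tag: dict[str, list[dict]] = defaultdict(list)
    for chunk in chunks:
        for tag in chunk["tags"]:
            by_tag[tag].append(chunk)

    plan = {}
    for tag, members in by_tag.items():
        if len(members) < threshold:
            continue
        # Least stable first, then oldest first: these become the merge candidates.
        members.sort(key=lambda c: (c["stability"], c["created"]))
        plan[tag] = [c["id"] for c in members[: len(members) - keep]]
    return plan
```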
A background process runs every 2 hours (configurable), executing maintenance: decay calculations, replay cycles, synapse building, and archival. This is the system's "sleep" -- the offline consolidation that makes memories sharper over time. Heartbeat checks trigger it, or it runs via cron.
v1 gave CortexClaw the ability to store, search, and maintain memories efficiently. But it still treated every memory as independent -- no associations, no variable decay rates, no pattern recognition. v2 added six neural layers on top of this foundation. v3.x extended to twelve layers with scoring precision fixes, episodic memory, persistent working memory, consolidation triggers, reward-driven learning, the Glial Network, and L17 GDPO Feedback. v4.0 adds nineteen improvements across three tiers: atomic WAL writes, dopamine-gated promotion, spreading activation tiebreakers, cosine reconsolidation, synthesis validation, prediction-error replay, semantic relevance scoring, adaptive GDPO weights, sigmoid Eq8 gate, active demotion, schema hierarchy, hybrid episodic search, topic-aware hot tier, SWR dedup routing, observer reconciliation with quality gate, TTL cache, and adversarial self-test.
Twelve layers built on the v1 foundation. Each addresses a specific biological memory mechanism. Together, they create a system where memories compete, consolidate, and evolve -- just like neurons do.
Inspired by Benna & Fusi 2016 (synaptic complexity theory). Instead of one decay number, each memory has three stability tiers that decay at different rates -- like how the brain has fast synaptic changes and slow structural ones.
Every memory has three stability tiers: fast (volatile, working memory), medium (session-stable), and slow (permanent knowledge). Each decays at its own rate per day.
The effective stability is a weighted blend: 20% fast + 30% medium + 50% slow. This means the slow tier dominates long-term survival -- a memory must prove its worth over time to persist.
When a memory is accessed, its fast tier resets to 1.0. But the real magic is promotion: each access transfers 5% from fast to medium, and 1% of medium to slow.
A 1-hour cooldown prevents rapid-fire access from gaming the system. Only spaced, genuine retrievals build long-term stability -- just like spaced repetition in human learning.
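A minimal sketch of the cascade, under stated assumptions. The 20/30/50 blend, the reset of the fast tier to 1.0, the 5% and 1% promotions, and the 1-hour cooldown come from the text; the 0.9/day fast rate is the configured value cited in the v4.3 notes. The medium and slow daily rates are hypothetical, promotion is modeled as additive boosts capped at 1.0, and the fast reset still happens during cooldown (only promotion is skipped).

```python
from dataclasses import dataclass

WEIGHTS = (0.2, 0.3, 0.5)  # fast, medium, slow share of effective stability
DAILY_RATE = {
    "fast": 0.9,     # configured fast-tier rate per day
    "medium": 0.98,  # hypothetical
    "slow": 0.999,   # hypothetical
}
COOLDOWN_S = 3600  # promotion is skipped within 1 hour of the last promotion


@dataclass
class Stability:
    fast: float = 1.0
    medium: float = 0.2
    slow: float = 0.05
    last_promoted: float = float("-inf")

    def effective(self) -> float:
        wf, wm, ws = WEIGHTS
        return wf * self.fast + wm * self.medium + ws * self.slow

    def decay(self, days: float) -> None:
        self.fast *= DAILY_RATE["fast"] ** days
        self.medium *= DAILY_RATE["medium"] ** days
        self.slow *= DAILY_RATE["slow"] ** days

    def on_access(self, now: float) -> None:
        self.fast = 1.0  # attention resets on every access
        if now - self.last_promoted < COOLDOWN_S:
            return  # rapid-fire access: no long-term promotion
        self.medium = min(1.0, self.medium + 0.05 * self.fast)  # 5% of fast into medium
        self.slow = min(1.0, self.slow + 0.01 * self.medium)    # 1% of medium into slow
        self.last_promoted = now
```

With the default baseline, effective stability starts at 0.2 x 1.0 + 0.3 x 0.2 + 0.5 x 0.05 = 0.285, and a memory has to earn slow-tier weight through spaced accesses to climb much higher.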
Inspired by multi-synaptic boutons research (Uytiepo et al. 2025). Memories form weighted connections through three mechanisms: co-access, semantic similarity, and temporal proximity. Retrieving one memory primes related memories through spreading activation.
Co-access: When two memories are retrieved together, a synapse forms between them (+0.15 weight each time). The more they co-occur, the stronger the link.
Semantic: During maintenance, chunks sharing tags are compared by cosine similarity. If above 0.65, a semantic synapse is created.
Temporal: Chunks ingested within 1 hour of each other get weak temporal links (0.20 weight), capturing contextual proximity.
During retrieval, after the initial top-K scoring, spreading activation kicks in: each top result sends a signal through its synapses, boosting connected chunks that might not have scored high on their own.
This means asking about "hardware" can pull in "model config" if they've been frequently co-accessed -- just like how thinking about one topic naturally reminds you of related ones.
Synapse weights decay by a factor of 0.95 per day and are pruned once they fall below 0.05. Each chunk keeps at most 15 connections to prevent noise.
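The co-access, decay, pruning, and spreading-activation rules above can be sketched as a small undirected weighted graph. The constants come from the text; `SPREAD_FACTOR`, the `top_n` cutoff, the 1.0 weight cap, and the `Mesh` class itself are hypothetical. Semantic and temporal synapse creation are left out to keep the sketch short.

```python
from collections import defaultdict
from itertools import combinations

CO_ACCESS_BOOST = 0.15  # added each time two chunks are retrieved together
DAILY_DECAY = 0.95      # synapse weight multiplier per day
PRUNE_BELOW = 0.05      # weaker synapses are dropped
MAX_EDGES = 15          # per-chunk connection cap
SPREAD_FACTOR = 0.1     # hypothetical: share of a top result's score passed to a neighbor


class Mesh:
    def __init__(self) -> None:
        # Undirected graph stored in both directions: w[a][b] == w[b][a].
        self.w: dict[str, dict[str, float]] = defaultdict(dict)

    def co_access(self, ids: list[str]) -> None:
        for a, b in combinations(sorted(set(ids)), 2):
            weight = min(1.0, self.w[a].get(b, 0.0) + CO_ACCESS_BOOST)
            self.w[a][b] = self.w[b][a] = weight
        self._cap_degree()

    def decay(self, days: float = 1.0) -> None:
        factor = DAILY_DECAY ** days
        for a in list(self.w):
            for b in list(self.w[a]):
                self.w[a][b] *= factor
                if self.w[a][b] < PRUNE_BELOW:
                    del self.w[a][b]

    def _cap_degree(self) -> None:
        # Keep only each chunk's strongest MAX_EDGES links (dropping both directions).
        for a in list(self.w):
            ranked = sorted(self.w[a].items(), key=lambda kv: kv[1], reverse=True)
            for b, _ in ranked[MAX_EDGES:]:
                self.w[a].pop(b, None)
                self.w[b].pop(a, None)

    def spread(self, scores: dict[str, float], top_n: int = 3) -> dict[str, float]:
        """Each top result boosts its neighbors in proportion to score x synapse weight."""
        boosted = dict(scores)
        for src in sorted(scores, key=scores.get, reverse=True)[:top_n]:
            for nbr, weight in self.w.get(src, {}).items():
                boosted[nbr] = min(1.0, boosted.get(nbr, 0.0) + SPREAD_FACTOR * scores[src] * weight)
        return boosted
```

Three co-accesses of "hardware" and "model-config" leave a 0.45 synapse, so a strong "hardware" hit lifts "model-config" even when its own cosine score is low.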
Inspired by Nader et al. 2000 (memory reconsolidation). When a memory is retrieved, it enters a labile state for 2 hours. If new content arrives with overlapping tags, it merges into the existing chunk rather than creating a new one.
Without reconsolidation, asking "remember that the model is Opus" twice creates two nearly-identical chunks. With reconsolidation, the second fact merges into the existing chunk if it was recently accessed.
The Jaccard overlap of tags must be at least 40% to trigger a merge -- this prevents unrelated facts from contaminating each other. After reconsolidation, the chunk's embedding is invalidated and will be re-computed on next retrieval.
The chunk also gets a cascade promotion on merge, since the brain treats reconsolidation as reinforcement.
Inspired by Zaki & Cai 2025 (excitability priming / pre-allocation). The system tracks which topic clusters are currently "hot" -- being actively retrieved or ingested. New memories matching hot schemas get higher initial stability, while truly novel memories get flagged for priority replay.
Schema match: If new memory tags overlap with hot schemas (temperature above 0.30), the memory starts with boosted medium-tier stability (0.35 vs 0.20) and higher feedback score (1.10 vs 1.00). This reflects the brain's tendency to integrate new info faster when it fits existing mental models.
Novel memory: If no schema matches, the memory is flagged as "novel" and gets priority in the replay engine (L6). Novel memories need more consolidation passes because they don't have an existing framework to attach to -- the brain processes truly new information differently from familiar-pattern information.
Schema temperature decays roughly 50% every 12 hours. Dead schemas (below 0.01) are pruned automatically during maintenance.
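A half-life of 12 hours gives the stated decay of roughly 50% every 12 hours. The sketch below applies it, prunes dead schemas, and sets the initial state of a new memory. Keying schemas by tag name is an assumption; the 0.30, 0.35/0.20, 1.10/1.00, and 0.01 values come from the text.

```python
HALF_LIFE_H = 12.0  # temperature halves every 12 hours
HOT = 0.30          # schemas above this temperature count as "hot"
DEAD = 0.01         # schemas below this are pruned


def cool(temps: dict[str, float], hours: float) -> dict[str, float]:
    """Apply exponential cooling and drop schemas that fall below DEAD."""
    factor = 0.5 ** (hours / HALF_LIFE_H)
    return {s: t * factor for s, t in temps.items() if t * factor >= DEAD}


def initial_state(tags: list[str], temps: dict[str, float]) -> dict:
    """Boost memories that fit a hot schema; flag the rest as novel for replay."""
    hot = {s for s, t in temps.items() if t > HOT}
    if hot & set(tags):
        return {"medium": 0.35, "feedback": 1.10, "novel": False}
    return {"medium": 0.20, "feedback": 1.00, "novel": True}
```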
Inspired by Klinzing et al. 2019 (memory replay during sleep). During daemon maintenance cycles (every 2 hours), the system "replays" the 10 most important recent memories -- re-embedding them, strengthening their synapses, and gently promoting them through the cascade.
In the brain, sleep replay is when the hippocampus "replays" recent experiences to the neocortex, transferring short-term memories into long-term storage. CortexClaw simulates this during its daemon maintenance cycles.
Replay promotion is deliberately gentler than direct access (+0.02 medium vs +0.05 for real retrieval). This prevents the daemon from artificially inflating memories that were never actually useful. The system also uses a lower similarity threshold during replay (0.50 vs 0.65 for semantic synapses) because the brain is more associative during sleep, forming connections it wouldn't make while "awake."
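A sketch of how a replay batch might be picked and gently promoted. The batch size of 10 and the +0.02 boost come from the text, and novel memories go first as described for schema priming. The 48-hour "recent" window and the precomputed `importance` field are hypothetical, and re-embedding and synapse strengthening are omitted.

```python
REPLAY_COUNT = 10
REPLAY_MEDIUM_BOOST = 0.02  # gentler than the +0.05 from a real retrieval
RECENT_S = 48 * 3600        # hypothetical definition of "recent"


def replay_batch(chunks: list[dict], now: float) -> list[str]:
    """Pick up to 10 recent chunks (novel first, then by importance) and promote them."""
    recent = [c for c in chunks if now - c["created"] <= RECENT_S]
    recent.sort(key=lambda c: (c["novel"], c["importance"]), reverse=True)
    batch = recent[:REPLAY_COUNT]
    for c in batch:
        c["medium"] = min(1.0, c["medium"] + REPLAY_MEDIUM_BOOST)
    return [c["id"] for c in batch]
```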
Inspired by Spens & Burgess 2024 (generative model of memory). During rollup merges, a local LLM distills merged chunks into generalized behavioral patterns and principles -- not fact lists, but schemas. These schema chunks decay slower and start with higher baseline stability.
Regular chunks store facts: specific details, configurations, names, dates. Schema chunks store patterns: generalized behavioral rules, workflow principles, recurring preferences.
The LLM prompt explicitly asks for 1-3 concise sentences capturing behavioral patterns, not bullet lists. Temperature is kept low (0.3) for reliability.
Schema chunks start with higher baseline stability (medium=0.60 vs 0.20 for regular chunks, slow=0.30 vs 0.05). They also use slower decay rates across all tiers.
This reflects how the brain treats generalized knowledge: specific episode details fade, but the patterns extracted from them persist far longer.
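For a concrete picture of the synthesis call, here is a sketch that targets Ollama's local `/api/generate` endpoint using urllib, in line with the no-dependency design. The endpoint and its `model`, `prompt`, `stream`, and `options.temperature` fields are Ollama's documented API. The prompt wording is a paraphrase of the one described above, and the model name you pass in is up to you; neither is CortexClaw's actual value.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

PROMPT = (
    "Below are related memory notes. In 1-3 concise sentences, state the general "
    "behavioral patterns or principles they share. Do not write a bullet list.\n\n{notes}"
)

SCHEMA_BASELINE = {"medium": 0.60, "slow": 0.30}  # vs 0.20 / 0.05 for regular chunks


def build_synthesis_request(texts: list[str], model: str) -> urllib.request.Request:
    payload = {
        "model": model,
        "prompt": PROMPT.format(notes="\n---\n".join(texts)),
        "stream": False,                  # one JSON object instead of a token stream
        "options": {"temperature": 0.3},  # low temperature for reliable output
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(texts: list[str], model: str) -> dict:
    """Send the request and wrap the reply as a schema chunk with a higher baseline."""
    with urllib.request.urlopen(build_synthesis_request(texts, model), timeout=120) as resp:
        summary = json.loads(resp.read())["response"].strip()
    return {"content": summary, "type": "schema", **SCHEMA_BASELINE}
```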
Four targeted fixes that eliminated scoring artifacts and noise, taking retrieval quality from good to surgical.
Replaced the absolute MIN_SCORE_THRESHOLD=0.55 with a relative pre-spreading check. Now requires top score ≥ 0.72 and gap ≥ 0.03 between top results. Catches off-topic queries that the old absolute threshold missed because spreading activation could inflate scores past 0.55.
Enforces a hard ceiling at 1.0 on all scores -- both pre and post spreading activation. Previously, synapse boosting could push scores above 1.0, creating misleading confidence signals and breaking relative ranking between results.
Only the top-2 retrieval results update schema heat. Previously, all returned results warmed their schemas, which meant low-relevance tail results were polluting the heat map and causing schema drift on unrelated topics.
Temporal synapses now require ≥ 1 shared tag between chunks. Killed approximately 36% of noise synapses that were forming between temporally proximate but semantically unrelated chunks.
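The four fixes are small enough to show together. This sketch reads "gap" as the difference between the top-1 and top-2 pre-spreading scores and lets a lone result pass on the absolute check alone; both readings are assumptions. The thresholds come from the text.

```python
TOP_MIN = 0.72            # best pre-spreading score must reach this
GAP_MIN = 0.03            # and lead the runner-up by at least this much
HEAT_TOP_N = 2            # only the top-2 results warm their schemas
TEMPORAL_WINDOW_S = 3600  # temporal synapses need ingestion within 1 hour


def clamp(score: float) -> float:
    """Hard ceiling at 1.0, applied both before and after spreading activation."""
    return min(1.0, score)


def confident(pre_spread_scores: list[float]) -> bool:
    """Relative off-topic check, run before spreading activation can inflate scores."""
    ranked = sorted(pre_spread_scores, reverse=True)
    if not ranked or ranked[0] < TOP_MIN:
        return False
    return len(ranked) == 1 or ranked[0] - ranked[1] >= GAP_MIN


def heat_targets(results_best_first: list[str]) -> list[str]:
    return results_best_first[:HEAT_TOP_N]


def temporal_link_allowed(tags_a: set[str], tags_b: set[str], gap_s: float) -> bool:
    """Link only if ingested within an hour AND sharing at least one tag."""
    return gap_s <= TEMPORAL_WINDOW_S and bool(tags_a & tags_b)
```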
Inspired by Tulving 1972 (episodic memory) and Baddeley 2000 (episodic buffer). The episodic buffer maintains vivid, temporally-tagged traces that semantic memory (chunks) distills from. SQLite + FTS5 full-text search provides instant recall of raw conversation history.
Integrated directly into retrieve() -- every CortexClaw query gets supplemental episodic hits appended alongside semantic chunk results. The FTS5 engine searches raw conversation text using porter stemming, catching exact phrases and context that embedding search misses.
The maintain cycle auto-syncs new daily logs, keeping the episodic buffer current without manual intervention.
Semantic chunks are distilled and compressed -- great for patterns, but they lose the raw texture of conversations. The episodic buffer preserves the vivid, temporally-tagged traces that chunks were derived from.
When you need "what did Leon say about X last Tuesday?" rather than "what does the system know about X?", the episodic buffer delivers. Full-text search complements embedding similarity with exact phrase matching.
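A minimal episodic-buffer sketch using Python's built-in sqlite3 module with an FTS5 table and the porter tokenizer. The table name, columns, and sample rows are hypothetical, and it assumes the local SQLite build has FTS5 enabled (most current builds do).

```python
import sqlite3

db = sqlite3.connect(":memory:")
# 'day' is stored but not indexed; 'speaker' and 'text' are searchable.
# tokenize='porter' stems words, so "running" and "run" match each other.
db.execute(
    "CREATE VIRTUAL TABLE episodes USING fts5(day UNINDEXED, speaker, text, tokenize='porter')"
)
db.executemany(
    "INSERT INTO episodes (day, speaker, text) VALUES (?, ?, ?)",
    [
        ("2026-05-05", "Leon", "Let's keep the daemon running every two hours."),
        ("2026-05-06", "Leon", "The router index should stay small."),
    ],
)


def search(query: str, limit: int = 5) -> list[tuple[str, str, str]]:
    """Full-text search ordered by FTS5's built-in relevance rank (bm25)."""
    return db.execute(
        "SELECT day, speaker, text FROM episodes WHERE episodes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


print(search("run daemon"))      # stemmed match on the 2026-05-05 line
print(search('"router index"'))  # exact phrase match on the 2026-05-06 line
```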
Inspired by Goldman-Rakic 1995 (persistent activity in working memory). A frozen hot tier -- a small guaranteed-injected block that bypasses retrieval entirely. Like dlPFC persistent neural firing that maintains task-critical information without requiring reactivation.
The prefrontal index refreshes during maintain cycles. Entries are scored by a composite of schema heat, access count, cascade stability, and feedback signals. The top 10 entries are frozen into the index.
Supports manual pinning -- critical entries can be locked in place regardless of scoring. Identity and key relationship information is always pinned.
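A sketch of the refresh step. The text names the four signals and the 10-entry size but not how they are mixed, so the weights, the log squashing of access counts, and the choice to count pinned entries toward the 10 slots are all hypothetical.

```python
import math

INDEX_SIZE = 10
# Hypothetical weights; only the four signals themselves come from the text.
W_HEAT, W_ACCESS, W_STABILITY, W_FEEDBACK = 0.3, 0.2, 0.3, 0.2


def composite(e: dict) -> float:
    access = min(1.0, math.log1p(e["access_count"]) / math.log1p(100))  # squash counts into [0, 1]
    return W_HEAT * e["heat"] + W_ACCESS * access + W_STABILITY * e["stability"] + W_FEEDBACK * e["feedback"]


def build_prefrontal_index(entries: list[dict], pinned: set[str]) -> list[str]:
    """Pinned IDs always go in first; remaining slots go to the best composite scores."""
    chosen = [e["id"] for e in entries if e["id"] in pinned]
    rest = sorted((e for e in entries if e["id"] not in pinned), key=composite, reverse=True)
    slots = max(0, INDEX_SIZE - len(chosen))
    return chosen + [e["id"] for e in rest[:slots]]
```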
In the brain, the dorsolateral prefrontal cortex maintains persistent neural firing patterns for task-critical information -- your name, what you're working on, who you're talking to. This information doesn't need to be "remembered" each time; it's always active.
The prefrontal index does the same thing: the most essential context is pre-loaded into every session, ensuring the agent never needs to search for its own identity or current priorities.
Inspired by Buzsáki 2015 (hippocampal sharp-wave ripples). A pre-compression flush trigger that fires a consolidation pass before context compaction, extracting decisions, facts, action items, corrections, and insights before they're lost.
Inspired by Schultz 1997 (reward prediction error) and Lisman & Grace 2005 (hippocampal-VTA loop). The reward signal that makes the whole system learn from experience -- auto-generating feedback from retrieval patterns to drive decay tuning, tag expansion, and synapse strengthening.
Every retrieval generates implicit feedback: chunks that appear in the agent's response are marked "used" (positive reward). Chunks retrieved but never referenced are marked "wasted" (negative signal).
These signals feed back into the cascade decay system -- used chunks get stability boosts, wasted chunks get accelerated decay. Over time, the system naturally surfaces useful memories and buries noise.
v4.0 adds granular tracking: per-query chunk access patterns, conversation clustering, feedback-to-decay integration, prediction-error replay (L6), and proactive synthesis (L10). This creates a rich signal that goes beyond simple used/wasted binary.
The VTA loop analogy is precise: dopamine neurons fire when outcomes exceed expectations (chunk was useful) and suppress when outcomes disappoint (chunk was irrelevant). The prediction error drives learning.
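To make the prediction-error idea concrete, the sketch below uses a simple delta-rule (Rescorla-Wagner style) update as a stand-in. Reward is 1.0 for used and 0.0 for wasted, the error is reward minus expectation, and the expectation moves a fraction `ALPHA` toward the outcome. Detecting "used" by a key phrase appearing in the response, the 0.5 prior, and `ALPHA` are all hypothetical.

```python
ALPHA = 0.1  # hypothetical learning rate


def classify_usage(retrieved: dict[str, str], response: str) -> dict[str, float]:
    """retrieved: {chunk_id: key phrase}. Reward 1.0 if the phrase appears in the response."""
    text = response.lower()
    return {cid: 1.0 if phrase.lower() in text else 0.0 for cid, phrase in retrieved.items()}


def update_expectations(expected: dict[str, float], rewards: dict[str, float]) -> dict[str, float]:
    """Update expected reward in place; return each chunk's prediction error."""
    errors = {}
    for cid, reward in rewards.items():
        v = expected.get(cid, 0.5)  # neutral prior for chunks seen for the first time
        delta = reward - v          # > 0: better than expected; < 0: disappointment
        expected[cid] = v + ALPHA * delta
        errors[cid] = delta
    return errors
```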
Inspired by Allen & Lyons 2018 (glia as architects of CNS formation). Three specialized observer agents decompose every memory at ingest time, extracting structured facts, contextual patterns, and emotional valence. Like glial cells in the brain -- long dismissed as passive scaffolding, now known to actively modulate synapses, regulate neurotransmitter uptake, and coordinate neural activity across regions.
Astrocytes (Fact Hunter): Like astrocytes providing structural and metabolic support to neurons, this agent extracts the hard facts -- entities, configuration values, names, technical details, and relationships between them.
Oligodendrocytes (Context Weaver): Like oligodendrocytes wrapping axons in myelin to speed signal propagation, this agent wraps raw facts in context -- identifying patterns, implications, and connections to existing knowledge.
Microglia (Emotion Tagger): Like microglia surveilling the CNS for threats and damage, this agent monitors for emotional valence, urgency signals, and motivational context -- marking memories that carry threat, reward, or importance signals.
Raw chunks are flat text. The Glial Network transforms them into structured, multi-dimensional representations before they enter the memory system. This means retrieval can match not just on content, but on extracted entities, identified patterns, and emotional context.
The biological parallel is precise: glial cells are roughly as numerous as neurons in the human brain (about 1:1). They don't fire action potentials, but nothing works without them. They modulate synaptic transmission, clear neurotransmitters, maintain the blood-brain barrier, and guide neural development. CortexClaw's Glial Network does the same preprocessing work that makes downstream neural operations (retrieval, replay, reconsolidation) more effective.
Inspired by Liu et al. 2026 (GDPO) and Padoa-Schioppa & Assad 2006 (multi-attribute value coding in OFC). Previously, the feedback system collapsed three independent signals into a single scalar score -- losing information and rewarding the wrong things. L17 separates, normalizes, and gates each dimension independently.
Previously, the feedback system collapsed three independent signals (used, wasted, missed) into a single scalar score. This caused information loss -- a chunk that was heavily used and frequently wasted would get the same score as a chunk that was moderately used and never wasted.
L17 tracks each dimension independently, normalizes per-dimension before combining with explicit weights (50 / 35 / 15), and applies a reward conditioning gate: if a chunk's wasted score exceeds the threshold, its used reward is zeroed entirely. This is the "don't reward efficiency unless correctness is met first" pattern from the GDPO paper.
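A sketch of decoupled normalization with a reward gate. Each dimension is z-scored across chunks on its own before the 50 / 35 / 15 weights apply. Mapping those weights to used / wasted / missed in that order, treating wasted as a penalty and missed as a positive "should rank higher" signal, gating on the normalized wasted value, and the gate threshold itself are all assumptions.

```python
from statistics import mean, pstdev

GDPO_WEIGHTS = {"used": 0.50, "wasted": 0.35, "missed": 0.15}
WASTED_GATE = 1.0  # hypothetical threshold, in standard deviations


def normalize(values: list[float]) -> list[float]:
    """Z-score one dimension across chunks (all zeros if there is no spread)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def gdpo_scores(stats: dict[str, dict[str, float]]) -> dict[str, float]:
    ids = list(stats)
    z = {dim: dict(zip(ids, normalize([stats[c][dim] for c in ids]))) for dim in GDPO_WEIGHTS}
    scores = {}
    for c in ids:
        used = z["used"][c]
        if z["wasted"][c] > WASTED_GATE:
            used = 0.0  # reward gate: no credit for use while waste is high
        scores[c] = (GDPO_WEIGHTS["used"] * used
                     - GDPO_WEIGHTS["wasted"] * z["wasted"][c]
                     + GDPO_WEIGHTS["missed"] * z["missed"][c])
    return scores
```

Here a chunk that is heavily used but frequently wasted ends up below a moderately used chunk that is never wasted, which is exactly the distinction the single scalar score lost.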
2 chunks are currently gated.
Real-time view of CortexClaw's associative mesh -- 1,317 synaptic connections linking 99 memory chunks into a living network.
| Layer | Name | Brain Region | Mechanism | Status |
|---|---|---|---|---|
| L6 | Replay Engine | Hippocampus | Sleep consolidation | active |
| L7 | Schema Priming | mPFC | Excitability pre-allocation | active |
| L8 | Associative Mesh | Neocortex | Synapse graph + spreading activation | active |
| L9 | Reconsolidation | Amygdala-Hippocampus | 2hr lability window | active |
| L10 | Schema Synthesis | vmPFC | LLM generative consolidation | active |
| L11 | Cascade Decay | Synaptic complex | Multi-timescale stability | active |
| L12 | Episodic Buffer | Medial Temporal Lobe | FTS5 full-text search | active |
| L13 | Prefrontal Index | dlPFC | Persistent working memory | active |
| L14 | Sharp-Wave Ripple | Hippocampus CA3→CA1 | State-transition consolidation | active |
| L15 | Dopaminergic Signal | VTA | Reward-driven learning | active |
| L16 | Glial Network | Throughout CNS | Observer agent decomposition | active |
| L17 | GDPO Feedback | Orbitofrontal Cortex | Decoupled reward normalization | active |
Legacy approach: load full MEMORY.md + second-brain.md + daily logs on every session start. Estimated 50,000 tokens per startup.
CortexClaw v4.0: embed-based retrieval with cascade weighting, episodic buffer supplementation, prefrontal index injection, Glial Network decomposition, hybrid episodic search, and topic-aware hot tier. 184 active chunks with 2,398 synapses across 768 cached embeddings. The Glial Network adds zero retrieval overhead -- decomposition happens at ingest time, not query time. v4.0's SWR dedup routing and TTL cache further reduce redundant retrievals.
The associative mesh further improves relevance by surfacing connected memories that pure cosine similarity would miss, reducing the need for follow-up queries.
CortexClaw is the brain. The Nervous System is everything between the outside world and that brain -- classifying, filtering, compressing, and caching inputs before they ever reach the context window.
Every message passes through this pipeline before Claude sees it. Most never make it through. The system's default posture is block -- information must earn its way into the context window.
Multi-pass weighted scoring across the full message. "hey can you fix the server" scores greeting at 0.20 AND command at 0.60 -- command wins. No first-match-wins bugs. Pure rules, zero LLM calls, sub-millisecond.
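A sketch of multi-pass weighted scoring in which every rule is checked and weights accumulate per intent, so no single early match can win by position. The rule table is hypothetical, tuned only so the example from the text produces the stated greeting 0.20 / command 0.60 split.

```python
import re

# Hypothetical rule table: (intent, regex, weight). Every rule runs on every message.
RULES = [
    ("greeting", r"\b(hey|hi|hello)\b", 0.20),
    ("command", r"\bcan you\b", 0.20),
    ("command", r"\b(fix|deploy|restart|run)\b", 0.40),
    ("question", r"\?\s*$", 0.30),
]


def classify(message: str) -> tuple[str, dict[str, float]]:
    text = message.lower()
    scores: dict[str, float] = {}
    for intent, pattern, weight in RULES:
        if re.search(pattern, text):
            scores[intent] = round(scores.get(intent, 0.0) + weight, 4)
    if not scores:
        return "unknown", scores
    return max(scores, key=scores.get), scores


print(classify("hey can you fix the server"))  # ('command', {'greeting': 0.2, 'command': 0.6})
```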
Local handling for simple patterns. If a reflex can handle the input, Claude never sees it. Persona-aware responses match Rurik's voice.
Sensory gating -- suppresses duplicate inputs using hash-based dedup with type-aware thresholds. A greeting repeated 3 times gets suppressed. A command repeated 3 times does not -- you might legitimately deploy 5 times in a row.
Time-based dishabituation: 1+ hour gap resets the counter. The input becomes novel again.
Domain-specific compression via a custom LM model running locally. Three modes with tailored prompts and hard character budgets that force concise output.
Graceful fallback: if the LM fails, raw input passes through unchanged. Zero data loss.
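The budget-plus-fallback contract can be sketched independently of the model. The three mode names and their character budgets are hypothetical, and the language model is passed in as any callable, so the sketch makes no claim about how the real sidecar is invoked.

```python
from typing import Callable

# Hypothetical modes and budgets; the text only says there are three with hard limits.
BUDGETS = {"log": 400, "code": 800, "chat": 300}


def compress(text: str, mode: str, lm: Callable[[str, int], str]) -> str:
    """Summarize within the mode's budget; on any failure, pass the raw text through."""
    budget = BUDGETS[mode]
    if len(text) <= budget:
        return text  # already small enough: skip the model call
    try:
        out = lm(text, budget)
    except Exception:
        return text  # graceful fallback: zero data loss
    if not out or not out.strip():
        return text  # an empty reply is treated as a failure too
    return out[:budget]  # enforce the hard budget even if the model overshoots
```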
Default posture: BLOCK. Only actively transported information enters Claude's context window. Biological analog: the blood-brain barrier, which keeps out an estimated 98% of small-molecule drugs.
Biological analog: myelin sheaths insulate frequently-used axons, making them faster. Four-tier progressive caching that promotes patterns based on hit frequency.
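A sketch of frequency-driven promotion through four tiers (tier 0 plus three promoted tiers). The hit thresholds are hypothetical, and caching simply starts at tier 1 here; how the real higher tiers differ (for example, longer-lived storage) is not shown.

```python
from collections import Counter

# Hypothetical hit counts needed to reach tiers 1, 2, and 3 (tier 0 = unmyelinated).
TIER_THRESHOLDS = (3, 10, 30)


class MyelinCache:
    def __init__(self) -> None:
        self.hits: Counter = Counter()
        self.values: dict = {}

    def tier(self, key) -> int:
        return sum(self.hits[key] >= t for t in TIER_THRESHOLDS)

    def get(self, key, compute):
        self.hits[key] += 1
        if key in self.values:
            return self.values[key]  # fast path for patterns that earned caching
        value = compute(key)
        if self.tier(key) >= 1:  # only cache patterns that have proven frequent
            self.values[key] = value
        return value
```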
Biological analog: autonomic nervous system. Sympathetic = fight-or-flight, parasympathetic = rest-and-digest. Five operating modes that adjust the entire pipeline's behavior in one shot.
| Mode | Level | Compression | Habituation | Reflexes | Budget | Auto-Clear |
|---|---|---|---|---|---|---|
| CRITICAL | 0 | OFF (0%) | OFF | OFF | 2.0x | 15 min |
| ALERT | 1 | Light (30%) | Higher thresholds | ON | 1.5x | 30 min |
| NORMAL | 2 | Standard (80%) | ON | ON | 1.0x | -- |
| ROUTINE | 3 | Aggressive (90%) | Lower thresholds | ON | 0.7x | -- |
| IDLE | 4 | Maximum (95%) | ON | ON | 0.5x | -- |
When CRITICAL fires, the entire pipeline reconfigures: compression disabled (every token matters), habituation disabled (never suppress in crisis), reflexes disabled (escalate everything to Claude), context budget doubled. Auto-clears after 15 minutes unless active Aa inputs persist.
Biological analog: the enteric nervous system -- 500 million neurons in the gut that operate independently of the brain. Four autonomous agents that monitor the workspace without involving Claude.
All text processed by the custom LM model passes through a two-stage sanitization layer. The sidecar model runs with a hardened system prompt baked into its Modelfile, treating all input as raw data -- never as instructions.
CortexClaw is the memory. The Nervous System is the gatekeeper. They share a custom LM model for local processing and coordinate through the Mode Controller -- but serve fundamentally different roles.
Pre-processing. Filters, compresses, and routes every input before Claude sees it. Handles simple requests locally via reflexes. Suppresses duplicates. Manages system mode. Monitors workspace health. Goal: Claude only sees what it needs to see.
Memory. Stores, retrieves, and evolves knowledge across sessions. Semantic search, associative mesh, sleep consolidation, schema priming, replay, decay. Goal: the right memory surfaces at the right time, at minimum token cost.
A targeted bug-fix sweep on the v4.0 architecture, plus a four-agent design powwow on a richer chunk-decay grading system. v4.1 lands the eight surgical fixes; the decay redesign ships in v4.2 once Leon greenlights the pin vocabulary and grading-vector proposal.
- retrieve.py:1796-1812
- cortexclaw-daemon.sh:80-87
- dopaminergic_signal.py:782-793
- dopaminergic_signal.py:564-600
- observers.py:122-139
- feedback_distributor.py
- scripts/dopamine-analyze.sh

Halo R2 cortex-observer dropped a forensic on the v4.1 missed channel. Verdict: FIX-4 wired the producer correctly, but no caller emitted, so missed=0 for 302 daemon cycles. Two coordinated patches land in v4.1.1.
- best_score=0.971; the 0.55 floor sits below the noise floor of CortexClaw's self-traffic. v4.1's auto-detector fired 0 times in 302 cycles.
- missed=[descriptive-string] via retrieve.py feedback. add_feedback's missed loop was gated on if cid in entry_map, silently dropping every descriptive entry.
- FeedbackDistributor.chunk_missed (gated on feedback_distributor_enabled). One bus instance per add_feedback call; per-call dedup. retrieve.py:2716-2748
- analyze --backfill-missed [--dry-run] on dopaminergic_signal.py. Scans the full analytics_enhanced.jsonl, tail-scans feedback_propagation.jsonl for idempotency. dopaminergic_signal.py:1086-1163
- signal=chunk_missed + L7:novel_topic + L17:missed_logged. Backfill: 41 historical low-score retrievals replayed (39 emitted + 2 dedup), idempotent on re-run. Halo cortex corpus missed: 0 -> 40.
- V_rich variant (had no missed fuel). Optimizer dominant-fault may rotate from context_weaver_thin_themes (mock artifact) to _undershoot (real signal) over the next few cycles.
- feedback_distributor.py, observers.py, grading.py, or config.json. Tightening MISSED_SCORE_FLOOR below 0.55 and per-cluster floor calibration deferred to v4.2.

Four agents drafted complementary pieces for a richer chunk-decay system. Awaiting Leon greenlight before landing.
Restoring the 88 archived canonical chunks under the broken decay system would just see them re-archived in a week. Same for raising slow-tier bootstrap or lowering archive_threshold; those are band-aids that mask the deeper grading conversation. Once Leon picks a grading vector + pin vocabulary, the chunk restoration runs as a one-shot script and the new decay system protects them going forward.
A deep audit triggered by Leon (full system health + speed analysis), followed by a four-agent ctask swarm that found a previously-invisible cursor bug in cascade decay and three more structural issues. v4.3 ships the smoking-gun fix and two bundles of patches (Path A: critical, Path B: mesh repair + hygiene). All changes reversible, backups stamped *.bak-2026-05-12-1130.
cascade_decay_step mutated stability but maintain() re-read days_since from last_accessed/created every cycle. The daemon firing every 2h re-applied the FULL elapsed decay 12x/day, compounding. A 6-day-old chunk measured fast=1e-6 when the configured 0.9/day rate predicts 0.531. Maps to ~55 cycles of compounded decay over 6 days.
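A toy reproduction of the bug and of the cursor fix, using only the fast tier and the configured 0.9/day rate. `last_decayed` is the cursor field named in the fix notes; the function names and the 6-day, 2-hour-cycle simulation are illustrative. The toy shows the compounding rather than the exact measured 1e-6, because real chunks also have access-driven resets.

```python
DAY_S = 86400
FAST_RATE = 0.9  # configured fast-tier decay per day


def maintain_buggy(entry: dict, now: float) -> None:
    # Bug: elapsed time is measured from last access on every cycle, so each
    # 2-hour pass re-applies ALL decay since access on top of earlier passes.
    days = (now - entry["last_accessed"]) / DAY_S
    entry["fast"] *= FAST_RATE ** days


def maintain_fixed(entry: dict, now: float) -> None:
    # Fix: decay only the interval since the last decay, then advance the cursor.
    since = entry.get("last_decayed", entry["last_accessed"])
    entry["fast"] *= FAST_RATE ** ((now - since) / DAY_S)
    entry["last_decayed"] = now


def simulate(step, days: int = 6, cycle_h: int = 2) -> float:
    entry = {"fast": 1.0, "last_accessed": 0.0}
    for i in range(1, days * 24 // cycle_h + 1):
        step(entry, i * cycle_h * 3600.0)
    return entry["fast"]


print(simulate(maintain_fixed))  # about 0.531, matching 0.9 ** 6
print(simulate(maintain_buggy))  # about 1e-10: 72 passes compound to 0.9 ** 219
```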
- fast=0 AND medium<0.001; 0 chunks above eff stability 0.5; MAX eff stability across 413 chunks = 0.3468
- PROGRESSIVE_FULL_TEXT_COUNT=3 reads it whole. A single hit blew the session-token BEHIND-legacy badge by 21k tokens
- _ingest_compress + _ingest_pair_question). Daemon claimed "cap=10, worst case 30s" -- stale comment from before the wire-in
- analyze --hours 2; with our query cadence the window never had ≥3 records. Optimizer-state frozen 2026-05-03 -> 2026-05-12
- retrieve_log.jsonl but NOT to analytics_enhanced.jsonl, so dopamine never saw them as missed signals
- rollup() force-flushed the observer queue even when CORTEXCLAW_SKIP_OBSERVER_FLUSH=1 was set; another 90s in the worst-case maintain
- scripts/dopamine-analyze.sh, retrieve.py:2230-2280
- last_decayed field. maintain() reads from cursor, bumps to now() after decay. cascade_promote() also bumps cursor on access. retrieve.py:1039, 3050-3068
- scripts/cortexclaw_cascade_reset.py migration. Sets cursor on every entry; --wide rescues all crushed chunks with access_count==0 back to baseline {fast:1.0, medium:0.2, slow:0.05}
- format_fact reads at most 12 KB, format_narrative 32 KB. Kills the 94 KB single-chunk blowup. retrieve.py:2434-2470
- retrieve.py warm-embeddings CLI. One-shot fills cache. First run: 33 -> 449 entries. Retrieve hot path now Ollama-free for cached chunks. retrieve.py:706-752
- CORTEXCLAW_SKIP_INGEST_ENRICH=1 skips Qwen3.6-35B calls under daemon, rollup() honors observer-skip flag, 180s hard wallclock guard. scripts/cortexclaw-daemon.sh, retrieve.py:2682, 3507-3520
- decay_synapses(synapses, valid_ids=...) drops edges with archived source or target. 123 pruned first maintain
- build_semantic_synapses rescue pass: orphan chunks (zero edges) linked to top-K nearest peers via cosine > 0.30. Connected 383/383 chunks (100%)
- decay_schema_heat drops singletons (count≤1 AND age≥14d AND temp<0.05). 41 noise schemas killed first pass (143 -> 102)
- maintain() moves findings/<id>.json to archive/findings/ when the chunk has been archived. 859 orphans cleared in one pass
- chunks.bak-2026-05-02-build-b/ (407 files, 2.4 MB), router.jsonl.bak-... (184 KB), embeddings.json.bak-... (6.9 MB). 9.5 MB freed
- scripts/cortexclaw_log_rotate.sh + launchd com.rurik.cortexclaw-log-rotate daily at 03:00. Rotates over-threshold logs, gzips files >7d, deletes files >30d. First run: replay.log 8.2 MB -> 0 live (1.2 MB gz archive)
- save_embedding_cache merges with disk if incoming < 20% of file size (observed: 469 -> 10 entry wipe during a maintain pass). Logs EMBED_CACHE_MERGED when triggered. retrieve.py:699-728

NumPy BLAS cosine_similarity vectorization (mooted by warm-embeddings), aggressive skip-embed-on-retrieve (current path already cache-fast), and the deeper "if 95% of chunks crushed, do we have a fundamentally too-aggressive rate" question wait on 7-14 days of post-fix telemetry. v4.4 will revisit cascade rates after the cursor fix lets us see real usage decay curves.
CortexClaw v4.3 -- Built by Rurik for Leon
2026-05-12