Claude Evolution System

production AI Development

Self-improving AI development environment. Discovers new capabilities, evaluates them against a scoring framework, and integrates approved tools on cron.

Autonomous capability discovery pipeline
5-criterion evaluation scoring framework
Auto-integration of approved capabilities
Multi-model orchestration (Claude, Codex, Gemini)

Claude CodeBashTypeScriptMCP

View on GitHub

Activity Timeline

2026-03-31
6 capabilities approved; GPT-5.4 Mini/Nano discovered; playbook catalog grew to 93 entries.
Discovery heartbeat evaluated 14 candidates and flagged 7 as novel. Approved integrations include PermissionDenied Hook, two new OpenAI model variants, Cloudflare Workers sandboxing, and session ID tracking. Four new automation patterns extracted from recurring agent workflows.

featureautomationversion-update
2026-03-30
Capability discovery: 2 of 4 candidates approved; helpers inventory at 55% utilization; model catalog updated.
ANTHROPIC env vars and Piebald-AI reference tracker approved for integration. 89 helpers cataloged across 5 categories, indices regenerated. GPT-5.4 mini/nano variants entered evaluation pipeline after March 17 release discovery.

health-checkautomationversion-update
2026-03-28
Heartbeat run: 2 novel capabilities from 13 finds; PreToolUse Hook integration started.
PreToolUse Hook approved (81.75/100), integration 2/5 steps complete pending manual config writes. Computer Use deferred (macOS-only), Plain-Text Architecture rejected. 89 helpers verified across 5 categories; model table updated.

automationversion-updatehealth-check
2026-03-28
Heartbeat run: 13 candidates → 2 discoveries; PreToolUse approved but blocked.
PreToolUse AskUserQuestion satisfaction approved for integration; config permissions blocker documented for manual resolution. Computer Use deferred (Linux support pending). Helpers inventory at 89/200.

automationhealth-checkversion-update
2026-03-24
3 capabilities integrated; 6 approved; DSPy optimization pipeline planned.
Background Agent Partial Results, MCP Plugin Deduplication, and Stale Tool Output Cleanup integrated with registry updates. A 14-item evaluation pass approved 6 more for integration. Publication-review DSPy pipeline designed using 33 audit cases.

featureautomationarchitecture
2026-03-24
3 capabilities integrated; 14 pending evaluations triaged across 6 sessions.
Approved items include Background Agent Partial Results Preservation, MCP Plugin Deduplication, Stale Tool Output Cleanup, GPT-5.4 mini/nano support, Context Mode, and PAL MCP integration. Heartbeat discovery ran full cycle; 9-step DSPy pipeline designed for publication-review prompt optimization across 33 blog posts.

architectureautomationversion-update
2026-03-22
Personality pipeline corrected: 3 methodological flaws fixed, posts 030 and 038 updated.
Sampling had zero SMS messages; GPT Conscientiousness bias measured at +23.9; Opus showed pipeline-dependent variance. Both affected posts reanalyzed with corrected data. MCP Response Injection approved for integration pipeline at 72.5/100.

bugfixrefactorhealth-check
2026-03-19
Capability pipeline: 2 approved (btw skill, GitHub dynamic-toolsets), v2.1.79 fixes documented.
Evaluated 8 candidates; approved /btw (91.5) and GitHub MCP dynamic-toolsets (73.75). Version investigation identified 2 high-impact fixes for SessionEnd hooks on /resume and cron subprocess stability. AI model catalog updated with GPT-5.4 mini/nano variants.

automationversion-updatefeature
2026-03-18
13 capabilities evaluated, 4 approved; StopFailure hook and GPT-5.4 mini/nano integrated.
Claude Code v2.1.78 features classified and registered, including the new StopFailure hook event with redundancy triggers. GPT-5.4 mini/nano variants discovered day-of-release (2026-03-17) and approved. Daily discovery heartbeat surfaced three novel capabilities.

version-updateautomationhealth-check
2026-03-16
claude-agent-sdk integrated (77/100); 8 capabilities triaged; helper system graduated to weekly monitoring.
New claude-api skill created documenting programmatic agent-building. Discovery found v2.1.77 breaking changes and mTarsier MCP Config Manager. Six of eight pending capabilities rejected; one approved, one queued for deeper research.

featureautomationversion-update
2026-03-15
Auto Mode integration complete; capability pipeline processed 2 items; 3 new helpers added.
permissions.defaultMode: auto now live in settings.json and documented globally. Discovery heartbeat filtered 15 candidates down to 2 novel finds — Auto Mode approved, Memory Compression deferred with 7-day research window to 2026-03-22. Workspace orchestrator initialized with cross-project monitoring configuration.

featureautomationhealth-check
2026-03-14
8 capabilities evaluated; MCP Elicitation Support integrated; 2 new candidates filed from discovery run.
Hook-lifecycle skill expanded from 18 to 20 hook types after MCP Elicitation Support approval. Release notes for v2.1.74–76 investigated; deferred-tools schema bug fix confirmed resolved. Claudia and SkillNet flagged as priority evaluation targets from Discord triage.

featureversion-updateautomation
2026-03-13
Discovery flagged 2 novel MCPs; DSPy optimization campaign planned for 13 agents and skills.
codebase-memory-mcp and CogniLayer surfaced with high token-savings claims. Three candidates rejected as duplicates. Four new playbook helpers created to support evaluation workflows; evaluation backlog at 14 items.

automationhealth-check
2026-03-12
v2.1.74 tracked, 61st helper added, 3 evaluations closed, discovery pipeline at 12 pending.
Version registry updated through v2.1.74 changelog. Discord Inbox Extraction playbook is the 61st helper, all verified functional. HuggingFace Pro rejected (40.75); SkillNet and Gemini Embedding 2 returned for further research. Two new discoveries staged.

version-updateautomationhealth-check
2026-03-11
DSPy campaign at 13/60; Context Hub approved; 2 new models discovered; personal video produced.
pr-preparer metric fixed: header regex aliases, content fallback, weight rebalancing. Capability pipeline: Cantrip rejected, Claudia flagged, Context Hub (85.5/100) approved for 30-day pilot. Heartbeat surfaced GPT-5.4 Pro and Gemini Embedding 2.

automationfeaturehealth-check
2026-03-09
~80 model reference updates (GPT 5→5.4, Gemini 3-pro→3.1-pro); hallucination caught mid-evaluation; new detection playbook added.
Thirteen sessions consolidating model naming across agents, skills, and config files. A hallucination ('Nano Banana 2' — local tool misidentified as a real model release) was caught and spawned a new ai-hallucination-detection-in-discovery playbook. BIOS upgrade path identified to address hardware sleep/wake instability.

refactorautomationhealth-check
2026-03-08
Integration plan for 4 approved items; model references updated across 8 files.
Rules Directory, /loop, /reload-plugins, and Willison anti-patterns queued for integration. GPT-5 → GPT-5.4 and Gemini references corrected sitewide. 47 items remain in evaluation queue.

automationversion-update
2026-03-07
Evaluated Claude Code v2.1.71: /loop command, 3 new cron tools, InstructionsLoaded Hook integrated.
Six sessions across evaluation and integration. /loop command scored 82.5/100 for in-session recurring scheduling. InstructionsLoaded Hook integrated at 78/100. Registry update blocked by permission denied error on INTEGRATE-APPROVED.md.

featureautomationblocked
2026-03-06
Month 1 helper checkpoint passed (100/100); library grew 52→58; 2 capabilities integrated, 3 discoveries approved.
55 helpers graduated from monthly review. New playbooks for Brave rate-limiting, Bayesian error recovery, and checkpoint workflows. Skills 2.0 and .claude/rules/ both approved for integration pipeline.

milestonefeatureautomation
2026-03-04
Voice Mode evaluated and registered; version-jump playbooks added.
Three new playbooks added after investigating the 2.1.63→2.1.68 version gap. Claude Code Voice Mode (early preview, ~5% rollout) scored 75/100 and was approved for registry-only integration with a GA re-evaluation trigger. Helper registry grew from 52 to 55 entries.

experimentautomationversion-update
2026-03-01
Daily heartbeat clean; RSS fallback to Brave search validated.
Registry shows zero pending items across evaluation and integration queues. Discovery pipeline switched seamlessly to Brave search after RSS MCP unavailability. Hung session diagnosed as expected user-input wait.

health-checkautomation
2026-02-28
Daily heartbeat ran; Claude Code 2.1.63 assessed; helper library grows to 51 entries.
ConfigChange hook integration finalized and moved from pipeline queue. Three novel Claude Code 2.1.63 features classified for capability registry. Two new playbook helpers added — Partial Investigation Pattern and Built-in Command Evaluation Pattern — bringing totals to 51 helpers, 25 playbooks.

health-checkfeatureautomation
2026-02-27
ConfigChange hook (#16) integrated; helper library grew 44→49; Claude Code bumped to 2.1.62.
Full integration of ConfigChange hook with pattern docs and registry updates. GENERATE-HELPERS.md workflow produced two new reusable patterns. Version bump from 2.1.59 to 2.1.62 with three improvements queued for documentation.

featureautomationversion-update
2026-02-26
Claude Code v2.1.59: /copy command evaluated and registered.
Version jumped from 2.1.56 to 2.1.59. /copy capability approved at 78.75/100 and added to registry. Daily heartbeat processed 4 discoveries (3 rejected, 1 queued). Helper pipeline ran and pruned stale items.

version-updateautomationhealth-check
2026-02-25
CLAUDE_CODE_SIMPLE integrated; helpers grew 37→41; v2.1.56 flagged maintenance-only.
Heartbeat surfaced two capabilities; CLAUDE_CODE_SIMPLE approved 80/100 and integrated same-day. Four new helper templates added for evaluation patterns. Version investigation found only bug fixes across v2.1.52–v2.1.56.

featureautomationversion-update
2026-02-24
Full pipeline run: 2 capabilities approved and integrated, helper count grew 34→37.
Discovery filtered 5 candidates to 2 approvals. Both integrated into the technique library with registry redundancy triggers. Three new helper templates generated. Claude Code 2.1.50→2.1.52 version update investigated with hook lifecycle validation.

automationhealth-checkfeature
2026-02-22
Daily discovery heartbeat clean; 4 candidates evaluated, 0 approved.
All four evaluated candidates rejected for overlap, provider orientation, or prior scoring. Pipeline at 57 agents, 34 skills, zero pending. Workspace orchestrator initialized with read-only monitoring and P0–P5 priority matrix.

health-checkautomation
2026-02-19
Workspace orchestrator agent initialized for multi-project monitoring.
Monitoring scope defined across six active projects. Health signal taxonomy and escalation criteria established. Architecture planning spanned four sessions.

featureautomation
2026-02-16
Routine heartbeat: 28 helpers reviewed, discovery blocked, 1 item pending manual input.
RSS MCP and search MCPs inaccessible during discovery phase. Helper index stable with no new candidates identified. One evaluation item (manual-review flag set) awaits human input before integration can proceed.

health-checkblocked
2026-02-18
ToolHive MCP evaluated and rejected; v2.1.45 analyzed; helper queue cleared.
ToolHive scored 47.25/100 — rejected as redundant with Tool Search Tool. v2.1.45 brings Sonnet 4.6 support and plugin improvements. 29 stale helper queue entries from the revenue pipeline era cleared.

health-checkautomationversion-update
2026-02-18
Privacy cleanup: private names moved to gitignored patterns file.
Claude Code 2.1.45 confirmed. 57 agents, 34 skills. 3 pending capability evaluations in discovery pipeline. Model check current.

refactordaily
2026-02-17
Reference config released with 21 agents, 12 skills. 3 commits on git integration.
Published reference-config for external use. Documented deferred improvements from ongoing git integration research. Maintains healthy operational status.

version-updatemilestone