Ashita Orbis Blog

in-development · Web Development

This blog: a three-tier exploration of web development complexity across raw HTML, Astro, and Next.js. Features an agent-accessible API, a comment system, and embedded AI chat.

HTML/CSS/JS · Astro 5 · Next.js 15 · Cloudflare Workers · D1

Activity Timeline

2026-07-26
Understanding-AI compendium build phases complete; Polaris UI overhauled; 15 commits.
Compendium build ran through schema, entries, companions, pages, wiki surface, measurement, and verification. Polaris got batch approvals, localStorage autosave, and a drafts tab. Blog agent revival on Kimi K2.5 authorized.

feature · architecture · milestone
2026-07-24
IM3 engine mapped; R-series Sol review rounds in flight; 5-finding publication review.
Live and frozen engine instances separated. Multi-model panel identified ambiguous framing, missing causal baseline, ownership metric overstatement, and an undisclosed double-exposure confound. Polaris authority stack established on Account B.

architecture · milestone · experiment
2026-07-23
Polaris Gen 3 live on Account B; R2 wired; multi-model review panels deployed.
Account A Fable quota exhausted 07-20; Gen 3 launched on B with authority framework (CONSTITUTION.md + GOALS.md) intact across 11 sessions. R2 hero mode finalized, fleet routing updated. Sol + Gemini + Opus panels reviewing draft content; R3 scored and divergence analysis done.

milestone · feature · automation
2026-07-21
Polaris R3 closed; Memory M3 architecture finalized; IM3 at G4-passing awaiting approval.
Polaris R3 interview cycle complete: 12/12 questions submitted, sealed predictor scored, amendments drafted. Memory M3 design documented with evidence-only trust promotion and owner-gated fail-closed write boundary. IM3 orchestration at G4-cycle-2, pending deployment approval.

milestone · architecture · automation
2026-07-20
Polaris re-established on Account B; IM3 engine integration commenced; hardware recovery underway.
Authority hierarchy ratified (Constitution → Goals → Rulings → Autonomy). IM3 cost-chain functions mapped in blast-radius survey. Post-failure revival manifest produced for 51 tmux sessions.

architecture · feature
2026-07-19
Polaris rounds 2-3 complete; IM3 launched on Account B; TPU7 Ironwood data corrected.
Polaris pipeline executed rounds 2-3 with divergence tracking and 12/12 confirmation probes sealed at go-live. Inference-margins v2.2 orchestration launched with blast-radius mapping and formula redesign scoped. TPU7 Ironwood throughput at max concurrency confirmed at 518.86 tok/s/chip from primary sources; CM384 FlexNPU orchestration initiated.

milestone · automation · feature
2026-07-18
Polaris night-shift automation ran successfully for the first time; R2-R3 cycles sealed.
7 heartbeat tasks exited cleanly (exit code 0). 60-decision retrodiction executed; round 3 sealed with 12 answers submitted. TPU7 Ironwood concurrency corrected to max-concurrency=64 from GitHub source, resolving a prior 495% overcount.

milestone · automation · feature
2026-07-17
Post-064 (Vibe Researching) tier-1 artifacts committed; post published and deployed.
HTML and the MP3 audio reading committed together in a single commit. Post published 2026-07-14, last deployed 2026-07-15.

deploy
2026-07-16
Post 064 published; gate review returned P1 editorial feedback on structure.
Post 064 ('Vibe Researching') cleared draft status 2026-07-14. Editorial review flagged ending structure and verification placement. Inference-margins canonical domain routing deployed in the same push.

deploy · health-check
2026-07-08
Backend refactor, monitoring metrics infrastructure, and model upgrade across 4 sessions.
Better-playwright fork deployed, fixing the stdio→HTTP proxy. Workers AI model upgraded; /api/ask restored to HTTP 200 with privacy filtering. Metrics column added to the agent-activity table; the API handler and monitoring headline card were updated. Phase C in progress with 35 uncommitted changes pending.

bugfix · feature · refactor · architecture
2026-07-07
Post 061 published; local TTS migration complete; 27 publication review fixes.
"Seven Ghostwriters, One Contract" shipped after a 2-round review that applied 27 fixes. The post documents a 7-model blind listening test for AI voice confidence calibration. Audio readings migrated to local Kokoro TTS, eliminating the external TTS dependency.

deploy · feature · milestone
2026-07-06
14 commits across 5 sessions (Playwright fix, style guide, agent metrics); two blockers at end of day.
Playwright fork vendored and pinned at 1.57, fixing the null getOutline() issue. Style guide kill-list, ear rules, and deterministic checker committed. Agent metrics column added and deployed via migration. ElevenLabs TTS returned 401 and SSH to the remote host was refused, blocking audio generation and the push.

bugfix · feature · deploy · blocked
2026-07-05
Playwright 1.61.1 proxy fix; backend migrations ledger created; Phase-4 read endpoints started.
Diagnosed _snapshotForAI() drift, built a stdio-to-HTTP proxy on port 3102, and verified the Chromium 1200 cache. Backend migrations ledger created with a dependency scan and first-batch ordering; unused gameMove() removed from the DO source. Phase-4 read endpoint work began and hit a Vectorize cold-start 503 on the first schema probe.

bugfix · feature · architecture
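Cold-start 503s like the Vectorize one above are often absorbed with a bounded retry and exponential backoff. The TypeScript sketch below shows that pattern only; the log does not say how the probe failure was handled, and `retryOn503`, the injected `probe` and `sleep` functions, and the attempt and delay defaults are all hypothetical.

```typescript
// Minimal response shape so the sketch does not depend on DOM or Workers typings.
interface ProbeResult {
  status: number;
}

// Hypothetical retry helper: re-runs `probe` while it returns 503,
// doubling the delay each time, up to `maxAttempts` total calls.
async function retryOn503(
  probe: () => Promise<ProbeResult>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 4,
  baseDelayMs = 200,
): Promise<{ result: ProbeResult; attempts: number }> {
  let attempt = 0;
  while (true) {
    attempt++;
    const result = await probe();
    // Stop on any non-503 status, or once the attempt budget is spent.
    if (result.status !== 503 || attempt >= maxAttempts) {
      return { result, attempts: attempt };
    }
    // Exponential backoff: 200 ms, 400 ms, 800 ms, ...
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}
```

Injecting `sleep` keeps the helper testable without real timers; in a Worker, `probe` would wrap the actual Vectorize or fetch call.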
2026-07-04
12 commits across 6 sessions: MCP 500-error fix, backend migrations ledger, dead code removal, read endpoints.
Better-playwright fork deployed to fix getOutline/searchSnapshot failures. Backend migration ledger established with ordered dependencies, using git mv to preserve file history. Phases 1–4 of the backend refactor complete; phases 5–9 staged for the next session.

bugfix · architecture · feature
2026-07-03
Multi-model review experiment concluded; codex-council selected as pipeline winner, 13 P0 fixes applied.
Evaluated GPT-5.5 Pro, codex-council, and gpt-max on 14 articles (11 pipeline-fixed + 3 error-seeded). Council won with 0.65 precision, zero false positives, and 3/3 seeded-error recall. Integrated into the publication-review skill; all 11 drafts reached the ship vibes-check phase.

experiment · feature · milestone
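The 3/3 seeded-error recall is the share of deliberately planted errors that a reviewer caught. A minimal TypeScript sketch of that calculation, assuming hypothetical `SeededError` and `Finding` records in which a grader has already matched findings to planted errors; the experiment's real data format and matching rules are not described here.

```typescript
// Hypothetical shapes; the experiment's actual data format is not documented.
interface SeededError {
  articleId: string;
  errorId: string;
}
interface Finding {
  articleId: string;
  matchedErrorId?: string; // set when a grader matched this finding to a seeded error
}

// Recall over seeded errors: planted errors caught / planted errors total.
// Returns 0 when nothing was seeded.
function seededRecall(seeded: SeededError[], findings: Finding[]): number {
  if (seeded.length === 0) return 0;
  const caught = new Set(
    findings
      .filter((f) => f.matchedErrorId !== undefined)
      .map((f) => `${f.articleId}:${f.matchedErrorId}`),
  );
  const hits = seeded.filter((s) => caught.has(`${s.articleId}:${s.errorId}`)).length;
  return hits / seeded.length;
}
```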
2026-07-02
Fable Guard shipped; cache warmer INCLUDE_ONLY_SIDS root cause found; Herald agent built; two catalog entries deployed.
Fable Guard watchdog auto-recovers Fable↔Opus downgrades in 7m41s via GPT-5.5 Pro delegation. Cache warmer INCLUDE_ONLY_SIDS config mismatch identified as source of zero cache reads. Freeze-at-90%-usage protocol designed across five subsystems. Herald daily backlog scanner built for Discord DM delivery.

feature · bugfix · automation · deploy
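The freeze-at-90%-usage protocol reduces to a threshold check that each subsystem consults before starting new work. A hedged TypeScript sketch: the `UsageSnapshot` shape, the fail-closed handling of a missing limit, and letting in-flight work finish are assumptions, not details of the designed protocol.

```typescript
// Hypothetical usage snapshot reported by a quota tracker.
interface UsageSnapshot {
  used: number;  // units consumed in the current window
  limit: number; // units allowed in the window
}

const FREEZE_THRESHOLD = 0.9;

// True once usage reaches 90% of the limit; fails closed if the limit is missing or zero.
function shouldFreeze(u: UsageSnapshot, threshold = FREEZE_THRESHOLD): boolean {
  if (u.limit <= 0) return true;
  return u.used / u.limit >= threshold;
}

// Each subsystem asks before starting new work; work already running is not interrupted.
function gateNewWork(subsystem: string, u: UsageSnapshot): { allowed: boolean; reason: string } {
  if (shouldFreeze(u)) {
    return { allowed: false, reason: `${subsystem}: frozen, usage at or above threshold` };
  }
  return { allowed: true, reason: `${subsystem}: ok` };
}
```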
2026-06-27
Herald automation framework specced; site health degraded with 172 uncommitted changes.
Herald design documented for daily backlog surfacing. Implementation not started. Last published post June 11; 172 uncommitted changes sitting in WIP.

automation · blocked
2026-06-24
Security vulnerability identified: agent Write access creates prompt-injection escape path.
Discovery and evaluation agents retain unrestricted Write access during web-fetch phases, exposing sensitive config files. Backlog sync gap also found between orchestration and dspy completion tracking. Remediation options defined, decision pending.

security · blocked
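One of the remediation options for the Write-access issue would be to withhold write-capable tools whenever fetched web content is in an agent's context. A TypeScript sketch of such a per-phase allowlist; the phase names, tool names, and protected-path patterns are hypothetical, and the log records the decision as still pending.

```typescript
type Phase = "plan" | "web-fetch" | "evaluate" | "write-report";
type Tool = "Read" | "Write" | "Edit" | "WebFetch";

// Hypothetical per-phase allowlist: no write-capable tools while untrusted web content is in context.
const PHASE_TOOLS: Record<Phase, ReadonlySet<Tool>> = {
  plan: new Set<Tool>(["Read"]),
  "web-fetch": new Set<Tool>(["Read", "WebFetch"]),
  evaluate: new Set<Tool>(["Read"]),
  "write-report": new Set<Tool>(["Read", "Write"]),
};

// Hypothetical sensitive paths that stay off-limits to writes in every phase.
const PROTECTED: RegExp[] = [/(^|\/)\.env$/, /(^|\/)config\//];

function isAllowed(phase: Phase, tool: Tool, path?: string): boolean {
  if (!PHASE_TOOLS[phase].has(tool)) return false;
  const writes = tool === "Write" || tool === "Edit";
  // Fail closed: a write with no path, or to a protected path, is refused.
  if (writes && (path === undefined || PROTECTED.some((re) => re.test(path)))) return false;
  return true;
}
```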
2026-06-22
Herald system architecture defined: daily backlog selector with Discord output.
Spec work only. 45 published posts as of June 11. No new content published today.

architecture
2026-06-20
Herald system specified to automate daily backlog surfacing.
Design complete: an automated daily mechanism surfaces one post-backlog item to reduce selection friction. Implementation pending. Blog at 47 total posts (45 published, 2 drafts), last deployed 2026-06-11.

architecture · automation
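Herald's core job is choosing exactly one backlog item per day. A minimal TypeScript sketch, assuming a hypothetical `BacklogItem` shape and a simple rotation keyed on the UTC date so that reruns on the same day give the same pick; Herald's actual selection criteria are not described in the log.

```typescript
interface BacklogItem {
  id: string;
  title: string;
  status: "open" | "done";
}

// Days since the Unix epoch for a UTC date string like "2026-06-20".
// Assumes dates on or after 1970-01-01 so the result is non-negative.
function utcDayNumber(isoDate: string): number {
  return Math.floor(Date.parse(`${isoDate}T00:00:00Z`) / 86400000);
}

// Deterministic pick: same date and same open items give the same result; null if nothing is open.
function pickDailyItem(items: BacklogItem[], isoDate: string): BacklogItem | null {
  const open = items
    .filter((i) => i.status === "open")
    .sort((a, b) => a.id.localeCompare(b.id)); // stable order regardless of input order
  if (open.length === 0) return null;
  return open[utcDayNumber(isoDate) % open.length];
}
```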
2026-06-12
Posts 047–048 published; Discord webhook activated; ethics protocol v1.3 deployed.
Published 'Auditing the Vibes' (047) and 'Falsifiers for a Portfolio' (048). Daily pulse alerts now route to Discord workspace webhook. Cache Warmer project card added to the site.

milestone · deploy · feature
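Routing alerts to a Discord webhook means a JSON POST to the webhook URL with a `content` field, which Discord caps at 2,000 characters. A TypeScript sketch; the `Pulse` fields, the message wording, and the truncation policy are hypothetical, and the webhook URL should come from a secret rather than source code.

```typescript
// Hypothetical pulse summary; the real alert payload is not documented.
interface Pulse {
  date: string;
  published: number;
  uncommitted: number;
}

const DISCORD_CONTENT_LIMIT = 2000;

// Build the JSON body a Discord webhook accepts, truncated to the content limit.
function formatPulse(p: Pulse): { content: string } {
  const text = `Daily pulse ${p.date}: ${p.published} published, ${p.uncommitted} uncommitted changes`;
  return {
    content: text.length > DISCORD_CONTENT_LIMIT ? text.slice(0, DISCORD_CONTENT_LIMIT - 1) + "…" : text,
  };
}

// POST the message; returns true on any 2xx response.
async function sendPulse(webhookUrl: string, p: Pulse): Promise<boolean> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(formatPulse(p)),
  });
  return res.ok;
}
```

Without the `wait=true` query parameter, Discord replies 204 No Content on success, so `res.ok` is the only signal the sender gets back.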
2026-06-11
Editorial audit complete: 169 findings addressed; corpus invariants suite deployed.
49 agents reviewed the full corpus across 48 sessions, producing 169 findings (7 at P0 through 85 at P3). All findings applied and committed. The corpus invariants suite is now live: 8 checks, a runner, a deploy gate, and a weekly cron.

feature · deploy · automation · milestone
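The corpus invariants suite pairs named checks with a runner whose result can gate a deploy. A TypeScript sketch of that shape; the `Post` fields and the two example invariants are hypothetical stand-ins for the suite's 8 actual checks, which the log does not list.

```typescript
interface Post {
  slug: string;
  title: string;
  date: string; // ISO date from frontmatter
}

interface Invariant {
  name: string;
  // Returns violation messages; an empty array means the check passed.
  check: (posts: Post[]) => string[];
}

// Two illustrative invariants (hypothetical).
const invariants: Invariant[] = [
  {
    name: "unique-slugs",
    check: (posts) => {
      const seen = new Set<string>();
      const dupes: string[] = [];
      for (const p of posts) {
        if (seen.has(p.slug)) dupes.push(`duplicate slug: ${p.slug}`);
        seen.add(p.slug);
      }
      return dupes;
    },
  },
  {
    name: "valid-dates",
    check: (posts) =>
      posts.filter((p) => Number.isNaN(Date.parse(p.date))).map((p) => `bad date in ${p.slug}: ${p.date}`),
  },
];

// Runner: collects every violation, tagged with the invariant that raised it.
function runInvariants(posts: Post[]): { ok: boolean; violations: string[] } {
  const violations: string[] = [];
  for (const inv of invariants) {
    for (const v of inv.check(posts)) violations.push(`[${inv.name}] ${v}`);
  }
  return { ok: violations.length === 0, violations };
}
```

A deploy script would call `runInvariants`, print the violations, and exit non-zero when `ok` is false; the weekly cron could run the same entry point.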
2026-06-10
169-issue editorial audit completed; draft leak closed, rate limiting deployed.
Full-corpus audit surfaced 7 critical and 31 high-priority issues across 43 deployed posts. Draft content leak closed and rate limiting added to the agent proxy. Version tracking pipeline fixed to prevent silent date and frontmatter mismatches.

bugfix · security · deploy · feature
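Rate limiting an agent proxy can start with a per-client fixed window. A hedged TypeScript sketch; the class name, limit, window length, and client key are assumptions. Module-level state in a Cloudflare Worker is per isolate, so an in-memory map like this one limits each isolate separately; a global limit needs shared state such as a Durable Object.

```typescript
// Fixed-window limiter: at most `limit` requests per `windowMs` per client key.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= this.windowMs) {
      // No window yet, or the previous one expired: start fresh with this request.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}
```

A handler would derive the key from the client (for example the `CF-Connecting-IP` header) and return HTTP 429 when `allow` is false.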
2026-05-11
gpt-max smoke test monitoring loop started, but the session was cut off before completion.
Attempted to set up loop-based monitoring for gpt-max smoke test status. Session terminated before execution completed. No changes landed.

automation
2026-05-09
gpt-max smoke test loop setup attempted; setup phase incomplete.
Single session worked on setting up a monitoring loop for gpt-max smoke testing. The setup phase did not finish and produced no concrete output.

health-check
2026-05-05
Blog plan testing resumed; light session, no recorded state changes.
Monitoring loop invoked for smoke tests. Transcript incomplete.

health-check
2026-05-03
Drafted Twitter reply variants on LLM-as-judge techniques.
Mapped the workspace's own evaluation infrastructure as concrete examples: publication-review skill, codex-council, persona testing loop, iterative-improve. Requested 2-3 variant replies with rhetorical intent analysis.

experiment
2026-05-01
153 uncommitted changes pending; no deploy since 2026-04-15.
No active sessions today. The uncommitted changes (against a corpus of 39 published posts and 1 draft) are flagged as an open escalation requiring resolution before the next deploy cycle.

blocked
2026-04-19
109 uncommitted changes pending; 39 posts published, 1 draft queued.
No active sessions. Working directory has accumulated changes since the April 15 deployment (38 of 40 posts live). One draft post remains in queue.

health-check
2026-04-16
Posts 030 and 038 published; live total reaches 39.
Both posts cleared the publication review pipeline and went live. Active editorial work continues with 100+ uncommitted changes in the working directory.

deploy · milestone
2026-04-15
Post 038 published; 104 uncommitted changes in the draft queue.
"when-the-pulse-went-quiet" deployed April 14. Draft queue activity suggests another publication batch is forming.

deploy · milestone
2026-04-14
Two 3-model review cycles: 49 findings surfaced, 39 resolved. 4-phase plan drafted for deferred work.
GPT-5.4 pre-review added 4 critical design considerations before the plan was finalized. Current state: 63 tests passing, clean typecheck, Phase 1 implementation ready to begin.

refactor · phase-change
2026-04-13
Psyche Iteration 3: 8 deferred findings resolved; CAT algorithm reworked from O(N) to O(1).
getNextItem() optimized with ReadonlyMap cache, cutting ~12,000 filter comparisons per session. Plan reviewed by GPT-5.4; CRT-7 numeric answers verified before implementation.

refactor · feature
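The getNextItem() change swaps a per-call scan of the item bank for a lookup in an index built once per session. A TypeScript sketch of that idea; `Item`, `candidatesLinear`, `buildIndex`, and `candidatesIndexed` are hypothetical names, grouping by subscale is an assumed index key, and real CAT selection would also rank candidates by information.

```typescript
interface Item {
  id: string;
  subscale: string;
}

// Before (O(N) per call): scan the whole bank every time candidates are needed.
function candidatesLinear(bank: readonly Item[], subscale: string): Item[] {
  return bank.filter((i) => i.subscale === subscale);
}

// After: group the bank once at session start.
function buildIndex(bank: readonly Item[]): ReadonlyMap<string, readonly Item[]> {
  const index = new Map<string, Item[]>();
  for (const item of bank) {
    const group = index.get(item.subscale);
    if (group) group.push(item);
    else index.set(item.subscale, [item]);
  }
  return index;
}

// Each lookup is now a single O(1) Map.get instead of a pass over the bank.
function candidatesIndexed(index: ReadonlyMap<string, readonly Item[]>, subscale: string): readonly Item[] {
  return index.get(subscale) ?? [];
}
```

Because the index is built once, the per-item cost no longer grows with bank size, which is the kind of saving behind the reported ~12,000 fewer filter comparisons per session.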
2026-04-03
Psyche iteration 3 planned (8 MEDIUM/LOW items); Batch B publication review staged and ready.
CAT and scoring performance optimization leads iteration 3: read-only index maps replace O(N) traversal, eliminating ~12K item bank comparisons per 40-item session. GPT-5.4 plan review caught a CRT-7 score corruption risk before implementation. Opus adversarial review of 4 posts complete.

feature · refactor
2026-04-01
Published post 039: fact-checking methodology retrospective across 38-post corpus.
448 claims checked, ~4% required substantive correction. 3-model review loop completed before publish. Psyche iteration 3 targets 8 deferred fixes and CATSession/Likert performance optimizations.

deploy · milestone · feature
2026-03-31
Publication review round 2 closed; draft 039 entered pipeline; 35/39 posts published.
Issue resolution commits landed for posts 021 and 036. Thematic corpus mapping extended to 6 new posts. Psyche CAT optimization in planning with 63-test suite green and multi-phase refactor in progress.

feature · refactor · milestone
2026-03-30
Publication review Round 1 committed: 6 MUST FIX + 12 SHOULD FIX resolved across 4 posts.
Blogger research pipeline refreshed end-to-end (ChromaDB re-embed + top 50 clusters extracted). Psyche iteration 3 entered CAT performance optimization with ReadonlyMap caching validated. Adversarial review of post 009 confirmed core AI-as-judge framing.

bugfix · feature · refactor
2026-03-28
Retroactive factcheck complete: all 38 posts covered, 9 errors fixed.
First full-archive verification pass. Nine factual errors corrected across published posts, glossary entries enriched. All posts now carry factcheck.json metadata. Sixteen uncommitted changes pending review.

bugfix · health-check
2026-03-24
Codebase review iteration 3 complete; XSS vulnerability patched; 35 posts published.
9 commits across the development cycle: 15 findings fixed from a 3-model review pass, plus 21 additional fixes including forum GUI and sidebar corrections. JSON-LD XSS vulnerability in PostClient resolved. Text input support added to the Psyche instrument runner.

bugfix · refactor · security
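JSON-LD XSS usually comes from placing `JSON.stringify` output inside a `<script type="application/ld+json">` tag: a string value containing `</script>` ends the block early and lets the markup after it run. A common mitigation is escaping `<`, `>`, and `&` as JSON Unicode escapes, which parsers decode back to the same characters. A TypeScript sketch; `serializeJsonLd` is a hypothetical helper, not necessarily how PostClient was fixed.

```typescript
// Serialize structured data for embedding in an HTML <script> block.
// <, >, and & only occur inside JSON strings, where \u003c etc. are valid escapes,
// so JSON.parse restores the original text while the HTML parser never sees "</script>".
function serializeJsonLd(data: Record<string, unknown>): string {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}
```

In React the result would be passed through `dangerouslySetInnerHTML={{ __html: serializeJsonLd(data) }}` on the script element.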
2026-03-22
Codebase audit: 33 deficiencies found (6 critical), fix plan scoped but not committed.
Critical issues: missing page_views schema table, undefined --color-accent CSS variable, and React hooks misused inside .map() callbacks. Psyche Iteration 3 CAT optimization also designed. Five sessions, zero commits: a planning-only day.

architecture · refactor · blocked
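An undefined custom property such as `--color-accent` fails quietly: `var(--color-accent)` with no fallback makes the declaration invalid at computed-value time, so the property behaves as if unset. A TypeScript sketch of a lint-style check that reports `var()` references with no matching declaration; it uses regular expressions rather than a CSS parser, so it is an approximation that ignores fallbacks, scoping, and properties set from JavaScript.

```typescript
// List custom properties referenced via var() but never declared anywhere in `css`.
function undefinedCustomProperties(css: string): string[] {
  // Declarations look like "--name:" (regex approximation, not a real CSS parser).
  const defined = new Set<string>();
  const declRe = /(--[\w-]+)\s*:/g;
  let m: RegExpExecArray | null;
  while ((m = declRe.exec(css)) !== null) defined.add(m[1]);

  // References look like "var(--name" with optional whitespace after the parenthesis.
  const missing = new Set<string>();
  const useRe = /var\(\s*(--[\w-]+)/g;
  while ((m = useRe.exec(css)) !== null) {
    if (!defined.has(m[1])) missing.add(m[1]);
  }
  return Array.from(missing).sort();
}
```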
2026-03-19
Post 037 (The Model-Generation Audit) published after round 3 review.
Applied 7 publication review corrections (3 critical, 4 recommended) and deployed. 29 files with uncommitted changes held for the next cycle.

deploy · milestone
2026-03-18
Post 037 updated with Gemini Pro audit findings; site tagline revised.
Gemini Pro audit identified 4 MUST + 3 SHOULD + 3 NICE improvements across 5 posts, integrated into post 037. Category guidance and review audit data added. Stale project count removed from metadata.

refactor · feature
2026-03-16
33-post model-generation audit plan drafted; multi-model review architecture defined.
Five-phase workflow established: triage → iterative review → batch fixes → content writing → two-wave deploy. GPT-5.4, Gemini 3.1 Pro, and Opus 4.6 form the review panel. Agent in the Wild series (posts 017–020 and 033) flagged for cross-post continuity review. Execution pending.

architecture · automation
2026-03-14
Published posts 034 and 035; Psyche framework redesigned with 3-tier battery and Kimi K2.5 switch.
Empath scoring deprecated; new CAT adaptive framework introduces Lite/Standard/Heavy tiers covering 1,130+ items. Kimi K2.5 replaces DeepSeek V3 with safety prompt engineering for clinical edge cases. Bulk audit of 33 posts entering triage.

feature · milestone · refactor
2026-03-13
Research: LLM judge bias hypothesis developed for upcoming post.
Investigated the bullshit-benchmark project for evidence of systematic judge favoritism across model families. Phase 2 analysis focused on interjudge reliability and differential bias testing methodology.

experiment
2026-03-12
BullshitBench v2: 8,000 responses reviewed, benchmark bias hypothesis forming.
Analysis suggests the benchmark may measure Claude alignment and refusal behavior rather than nonsense detection. Interjudge differential analysis is ongoing before conclusions are published. 20 uncommitted changes accumulated since the last deploy.

experiment · feature
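The interjudge differential analysis asks whether judges score their own model family above what other judges give that same family. A TypeScript sketch of one such measure, assuming hypothetical `Judgment` records with a judge family, a response family, and a numeric score; it illustrates the idea and is not the benchmark's methodology.

```typescript
interface Judgment {
  judgeFamily: string;
  responseFamily: string;
  score: number;
}

function mean(xs: number[]): number {
  return xs.length === 0 ? NaN : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Differential self-preference for `family`: mean score same-family judges give that
// family's responses, minus the mean other judges give the same responses' family.
// Positive values suggest favoritism; NaN if either group of judgments is empty.
function selfPreference(judgments: Judgment[], family: string): number {
  const target = judgments.filter((j) => j.responseFamily === family);
  const own = target.filter((j) => j.judgeFamily === family).map((j) => j.score);
  const others = target.filter((j) => j.judgeFamily !== family).map((j) => j.score);
  return mean(own) - mean(others);
}
```

Comparing judges on the same family's responses controls for response quality, which a raw per-judge average would not.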
2026-03-11
Bullshit-benchmark bias research; etymology-tax post live; site redeployed.
Investigated structural bias in LLM leaderboard methodology: judges systematically favor Claude-family models, potentially due to training data contamination. Partial detection scoring inconsistencies also examined. Site audited, canvas refactored, and redeployed 2026-03-09.

feature · deploy · refactor
2026-03-09
Psyche Phase 4 archival finalized; DeepSeek V3.2 agent integration designed.
Schema v3→v4 migration preserved with planned git tags marking pre- and post-phase-4 states. Blog agent regrounding underway: DeepSeek V3.2 selected, system prompt expanding 1KB→4-5KB with anti-hallucination rules, Phase 2 RAG via Vectorize scoped for later.

architecture · feature · milestone
2026-03-08
Post 033 shipped (OpenClaw autopsy); 11 API security issues remediated.
The post documents a 37-day autonomous OpenClaw runtime with 842+ heartbeats and 553 deliverables. API fixes covered input validation, batch atomicity, and a redirect vulnerability across 6 routes. Home layout refactor and instrument battery expansion planned.

milestone · security · feature
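A common shape for a redirect fix is accepting only same-origin relative paths as targets. A TypeScript sketch using the WHATWG `URL` parser; `safeRedirect` and its `/` fallback are hypothetical, and the log does not say how the 6 routes were patched.

```typescript
// Accept only same-origin paths such as "/posts/033"; anything else falls back.
function safeRedirect(target: string | null, origin: string, fallback = "/"): string {
  // Reject empty, absolute ("https://..."), protocol-relative ("//host"),
  // and backslash tricks ("/\host"), which URL parsing treats like "//host".
  if (!target || !target.startsWith("/") || target.startsWith("//") || target.startsWith("/\\")) {
    return fallback;
  }
  // Resolve against our origin and confirm the result did not escape it.
  const resolved = new URL(target, origin);
  if (resolved.origin !== new URL(origin).origin) return fallback;
  return resolved.pathname + resolved.search + resolved.hash;
}
```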
2026-03-07
Empath reframed as ordinal corpus characterizer; 7 commits, research paper v3 deployed.
Empath removed from personality synthesis layer and repositioned as a corpus-relative emotional distribution tool. Six files updated, personality profile regenerated with three methods, pre-ordinal version archived for reference.

refactor · deploy · architecture
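Treating Empath as an ordinal, corpus-relative tool means reporting where a post's category score falls within the corpus rather than the raw score. A TypeScript sketch using percentile rank with ties counted as half; the function name and tie rule are assumptions, not the project's method.

```typescript
// Percentile rank (0-100) of `value` within `corpus`, counting equal values as half below.
function percentileRank(corpus: readonly number[], value: number): number {
  if (corpus.length === 0) throw new Error("empty corpus");
  let below = 0;
  let equal = 0;
  for (const x of corpus) {
    if (x < value) below++;
    else if (x === value) equal++;
  }
  return ((below + 0.5 * equal) / corpus.length) * 100;
}
```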