Mar 17, 2026

Deploying a Personal AI agent: A Journal from the Trenches

aka How I trained PAI to understand and enable me

Late February: The Starting Line

I didn’t set out to build an AI Chief of Staff. I set out to stop drowning in email and also to find a personal AI agent that didn’t sound near as scary from a cost and security perspective as OpenClaw.

I had 30 newsletters a day cluttering my Protonmail inbox. I had Todoist tasks that were really bookmarks pretending to be work. I had calendar invitations I was processing by hand like it was 2005. And I had a subscription to Claude Code Pro Max that I was underutilizing.

The idea was simple: Instead of building all the tools with PrescientFlow (my Claude workflow), describe the problems to my personal AI agent and build solutions together. Not wrestle with Claude Code in separate projects to create tools without a cohesive interface and Actually partner with it. I’d bring the 20 years of security architecture experience and the domain knowledge. The AI would bring the ability to write, test, and iterate code at a speed I never could alone.

I didn’t know it yet, but I was about to accelerate an emerging skill: AI agent management. Not prompt engineering — that’s typing. Management. Setting objectives, defining boundaries, reviewing output, giving feedback, and gradually extending trust as competence is demonstrated. The same skills I’ve used to manage humans for two decades, applied to a fundamentally different kind of collaborator.

Early March: The Plumbing Phase

The first tools were unglamorous. A bookmarker that synced URLs to a SQLite database. A Todoist integration that could tell the difference between an actual task and a saved link. An email fetcher that pulled my scheduler’s inbox programmatically.

Here’s the thing nobody tells you about working with an AI assistance: the first week is mostly about learning how to ask for what you want. I’d describe a problem — “I need to process calendar invitations from my Inbox” — and PAI would build a solution. Sometimes it was exactly right. Sometimes it over-engineered things. Sometimes it missed an edge case that was obvious to me but invisible to an AI that’s never managed an email inbox. It’s worth noting that Dan Miessler has built a rival to my PrescientFlow in expanding on Claude’s native excellent but sometimes dodgy planner to assure higher quality outcomes with minimal input from me.

My job wasn’t writing spec docs to get Claude to produce better outcomes. It was directing an AI that writes the spec docs. Describing the architecture. Reviewing the output. Catching the assumptions that don’t survive contact with production. Saying “no, simpler” when it added three layers of abstraction where two lines would do.

The newsletter digest came first. I told PAI: “30 newsletters a week, I need one summary email a day to bring my reading time down below 5 minutes per.” It designed the pipeline — scan inbox, summarize each with AI, compile, archive originals. 3-5 emails a day became one brief. But the first version crashed on HTML-heavy newsletters. The second version timed out on large batches. The third version worked. That debugging cycle — me describing the failure, PAI diagnosing and fixing — became the rhythm of how we work.

Then the calendar invitation processor. Parse ICS attachments, create Google Calendar events, send RFC 5546 acceptance replies, label and archive. I described the workflow; PAI built it. Any time someone sent me a meeting invite, acceptance, or change, it just appeared on my calendar and the inbox was clean. No clicks required.

These tools were reactive and narrow. Exactly what I needed to trust the system before handing it anything that mattered. But I was already learning something: the quality of what PAI produced was directly proportional to how well I managed the collaboration. Too vague asks got vague results. Even with PAI’s excellent planner workflow the more precise asks with clear constraints got production-ready code sooner and PAI was remembering what we did and learned from our failures.

The First Week of March: Things Start Breaking (This Is the Good Part)

The interesting part of any engineering story isn’t what worked. It’s what broke.

The Double-Logout Bug. My invitation processor failed 87 consecutive times. Eighty-seven. The error: “Connection not available.” I described the symptom to PAI. It read the code, traced the logic, and found it: a double-logout in an early return path. The code had explicit client.logout() calls in a “no invitations found” branch, but those same calls lived in finally blocks. First logout succeeded. Second hit a dead connection. 87 times.

The fix was two lines deleted. Not added. Deleted. I could have stared at that code for an hour and missed it. PAI found it in seconds because it could hold the entire control flow in its head simultaneously. That’s not intelligence — it’s a different kind of attention. And learning when to leverage that difference is the core skill of AI agent management.

The Unicode Ghost. Someone sent a CLOSE command for a remote session. Instead of closing it, PAI created a brand new one. Forensic investigation revealed that ProtonMail’s rich text editor was silently substituting Unicode lookalike characters for ASCII. Non-breaking hyphens (U+2011) instead of regular ones. The regex matched ASCII. The email contained Unicode. Everything looked identical to human eyes.

Three regex replacements fixed it. Root cause was invisible without the investigation.

These bugs taught me something I should have already known from 20+ years in security: the failure modes that matter are the ones you can’t see. But they also taught me something new: an AI partner that can read every line of code in a codebase simultaneously finds different bugs than a human who reads sequentially. We’re not redundant. We’re complementary.

March 3: The Bookmarks Reckoning

I ran a report on my Todoist “Today” view and found 28 bookmarked links staring back at me. Not tasks. Links. Saved from LinkedIn, sitting in my task list like they were actionable.

The full audit was worse: 640 tasks across 27 projects, but 416 of them — 65% — were bookmarks, not work. My task management system had become a link dumping ground.

This was the first moment I realized PAI wasn’t just automating my workflow. It was revealing the dysfunction in it. A human assistant wouldn’t have run that report. I wouldn’t have asked them to. But the data was right there, and once I saw it, I couldn’t unsee it.

Here’s where the AI agent management skill evolved. I didn’t say “write me a migration script.” I said: “These bookmarks need to move out of Todoist into something searchable. The new home needs to be accessible from my phone. Each bookmark needs AI-categorized topics. And after migration, label the Todoist tasks so I can review and close them.” PAI designed the architecture — SQLite locally, Google Sheets for mobile access, AI categorization in batches of 25, Todoist API labeling. It processed 446 bookmarks, categorized every one, pushed them to the Sheet, and labeled 415 Todoist tasks “migrated.” Then it rewired the daily cron from syncing Todoist to syncing the Google Sheet.

I described the outcome. PAI designed and executed the path. My job was to verify it worked and course-correct when it didn’t. That’s management.

March 8-10: The Research Machine

By this point I had a triage gate running — a local LLM (Gemma3 4b, on my own hardware) screening every incoming query in two passes. One for prompt injection detection. One for complexity routing. Simple queries went to Haiku. Complex ones went to Opus. Zero API cost for the screening.

This was me pushing PAI beyond its native capabilities. Claude Code doesn’t ship with a prompt injection pre-screening layer. It doesn’t have a local LLM triage system. I described the problem — “I want to screen inputs before they hit the API, using my own hardware” — and PAI and I designed a two-pass architecture together. Security gate first (clean/suspicious), then complexity router (simple/moderate/complex). The triage gate became one of our most important tools, even though the prompt injection research later revealed its blind spots.

I started using PAI for parallel multi-source research. Not “ask a question and get an answer.” I’d describe what I needed to understand, and PAI would spawn 3 to 9 research agents simultaneously — Claude, Gemini, Grok researchers each pursuing different angles on the same topic. The results would come back from independent sources, and PAI would synthesize the agreements and conflicts. FYI, Dan is awesome and sorted out prompts to emulate Gemini and Grok using Claude as a base. Didn’t want you to think I was running all 3 models with the added cost…

Thailand LTR visa requirements. Remote cybersecurity jobs in Southeast Asia. IAPP AIGP certification study material. Freelance platform analysis. Each one came back as a structured report cross-validated across multiple AI researchers.

The AIGP research was particularly useful. I was studying for the AI Governance Professional certification and using PAI itself as my case study for the risk assessment portions. There’s a certain poetry in using an AI system to study AI governance while simultaneously applying the governance framework to that same system.

March 10: Remote Command Execution Goes Live

This was the tool that changed the relationship. And the one that taught me the most about managing AI autonomy.

I described the concept to PAI: “I want to email a command from any device and have you execute it in a Claude session on my server.” PAI built the execute-remote system — 5 trait, zero-trust sender verification, email polling, session management, reply with results. The architecture was sound on day one. The operational edge cases took a week.

The security guy in me immediately started stress-testing it. What happens if someone spoofs the sender? What if the command contains injection? What if a long-running task holds the lock and nothing else can process?

That last one actually happened. PAI and I redesigned the lock architecture together. I described the problem; PAI proposed the solution: command-level locks for short operations, session-level locks for long-running ones, a fast-path for CLOSE commands. I reviewed it, poked holes, PAI iterated.

Then I discovered that permission denials from Claude — those moments where it says “I need to ask before doing this” — were getting silently swallowed in the remote context. No human at the keyboard means no one to approve. I told PAI: “Surface these as questions in the reply email.” It redesigned the permission flow so denied actions became readable questions I could respond to from my phone.

This is the part that’s hard to explain to people who haven’t done it. I’m not writing code. I’m not reviewing PRs. I’m managing an AI that writes code, and my value-add is the production context it can’t have — knowing that a permission denial on a remote session is a showstopper, not an edge case. Knowing that session subjects need to be human-readable because I’ll have six running simultaneously. Knowing that a lock held for 30 minutes blocks everything else.

The AI has the technical execution speed. I have the operational judgment. Neither is sufficient alone.

March 12: The Discovery Interview

This was the day the “Chief of Staff” idea became real. And the day I started managing PAI like I’d manage a new hire.

I’d been calling PAI my Chief of Staff half-jokingly. But I decided to take it seriously. So I did what any good manager does when onboarding someone to a complex role: I had them interview me. PAI designed 16 structured questions across five domains — daily operations, communication, business vision, job search strategy, and personal priorities. I didn’t write those questions. PAI did, based on its understanding of what a Chief of Staff needs to know.

The format was unusual. PAI delivered the questions through MeetBot’s chat in a Google Meet call. I spoke my answers. MeetBot captured the transcript, PAI processed it through a three-pass AI pipeline, and produced structured findings with direct quotes and thematic analysis. FYI, Dan built PAI to work on Linux but it’s really designed around Macs because he prefers to use Whisperflow to capture his thoughts and feed them directly into the terminal.

Some of what I said was uncomfortable to hear back:

“My to-do list… they’re not really well organized. They’re not really well aligned, and there’s more in the today’s view than I can actually probably accomplish.”

“I’ve got my other Claude sessions with PrescientFlow dinging me a lot, notifying me whenever they get done and they’re waiting for me. So, it’s frequently interrupting me, which is shortening the amount of time I have to spend on focused reading and making decisions.”

“There’s all kinds of news out there about prompt injections causing the agents that people have given some authority over their lives to tricking them into doing things that are bad.”

That last one is my security brain talking. I’m building an AI system that manages parts of my life, and I’m simultaneously studying the research on how those systems get compromised. The tension is productive. It keeps me honest.

The interview produced a structured findings document and a phased roadmap with trust gates – evidence-based, per-tool progression from “observe only” through “operate,” “suggest,” “act,” and eventually “delegate.” Not a timeline. Not a global switch. Per-tool, per-capability, measured from audit logs.

My security career taught me that trust is built in increments and revoked in instants. The roadmap reflects that.

March 12: MeetBot’s Zero-Trust Day

The same day as the discovery interview, I told PAI: “The display-name authentication on MeetBot is unacceptable. Anyone can change their display name.”

PAI designed and implemented a two-layer, zero trust auth system. Display name match as a quick filter, then Google Meet REST API verification of the sender’s immutable Google account ID. Fails closed — if the API verification fails for any reason, the command is denied. I reviewed the design, approved the approach, PAI wrote the code, I tested it.

Four distinct MeetBot fixes in one day, plus the discovery interview, plus a code review where PAI ran /simplify and found 46 issues across the codebase. I approved 12 high-value fixes — deduplication, parallel operations, streaming downloads. PAI executed all of them.

I remember this day clearly because it was the first day PAI felt like a colleague rather than a tool. I was giving direction, reviewing proposals, approving changes, and catching edge cases. PAI was designing solutions, writing code, running analyses, and surfacing problems I didn’t know I had. That’s not prompt engineering. That’s collaboration.

March 13: Extending the Native Capabilities

This is when I started deliberately pushing PAI beyond what Claude Code and PAI can do out of the box.

Claude Code and PAI can write code and run commands. They don’t natively have a Status Board with persistent notifications. They don’t have a web-based quiz system for certification study. They don’t have a permissions manifest framework for self-governance. They don’t have a triage gate that pre-screens its own inputs.

All of these are capabilities I asked PAI to build for itself. The Status Board — “I need a place to see what you’re working on without asking.” The AIGP quiz system — “I want my daily quiz on a web page I can open from a notification link, not in email.” The permissions manifests — “Every tool you operate needs documented rules for what you CAN do, MUST NOT do, and must ASK about.”

Each one started as me describing a gap, and PAI designing and building the solution. The quiz system went from “I want this” to live web app with SQLite, FTS5 search, AI grading, and history browsing in a single session. The bookmark migration went from “move these to Google Sheets” to a categorization pipeline processing 446 items with AI topic analysis in under 10 minutes.

I’m not a TypeScript developer. I’m a security architect who knows what systems should do. PAI is the developer. My job is knowing what to build and why. PAI’s job is knowing how to build it and doing it fast.

March 14: Pen-Testing My Own Product (With My PAI’s native capabilities)

I told PAI: “Run a full web application security assessment on RiskJuggler.ai.” That’s my own product — the one I’m building to help small businesses with cybersecurity. The irony was not lost on me.

PAI spawned a Pentester agent — a specialized sub-agent with security testing methodology, OWASP Top 10 knowledge, and the ability to run curl commands against the target. I didn’t write the test cases. PAI did, based on it’s native WebAssessment skill’s methodology. I reviewed the findings.

Results:

Severity

Count

Critical

2

High

4

Medium

8

Low

3

Total

17

The most dangerous finding: X-Forwarded-For rate limit bypass enabling unlimited password brute-forcing. PAI didn’t just find it — it wrote the proof-of-concept exploit and the recommended fix with code samples.

Over the next two days, the development team fixed the findings in waves and I tweaked PrescientFlow to add threat modeling and default ADRs to improve the developer inputs. Each time they said “done,” I told PAI: “Retest.” Four rounds total. PAI tracked what was fixed, what was partially fixed, what regressed, and what was new. By the end: 20 of 20 findings resolved, overall risk dropped from CRITICAL to LOW.

This is the kind of assessment that would cost $5,000-15,000 from a consultancy. PAI did four rounds of it in two days. And because I’m both the product owner and the security assessor, the feedback loop was measured in hours, not weeks.

March 15: The AIGP Quiz System

I told PAI: “The email-based quiz is clunky. I want a web page I can open from a notification link, answer questions with radio buttons and checkboxes, get AI-graded results inline, and browse my history by date or topic.”

PAI designed the whole thing in one session: SQLite database with FTS5 full-text search, Hono routes mounted on the existing Status Board server, dark-themed UI matching the notification system, inline AI grading with a spinner, and a history page with search and competency filtering. It migrated the 6 existing email-based quizzes into the new database. It rewired the daily cron to generate quizzes into the database and send a notification link instead of an email.

The next morning I found a bug — multi-line scenario options were getting truncated to the first line. I told PAI; it traced the issue to a regex that didn’t cross newlines, fixed it, and restarted the server. Total time from “this is broken” to “this is fixed”: about 3 minutes.

No SaaS subscription. No “upgrade for more features.” Just exactly what I needed to study for the AIGP exam, built to my specifications, running on my own hardware. That’s what happens when your AI partner can go from description to deployed web application in a single conversation.

The same day, I asked PAI to prepare a briefing document for our Chief of Staff working session. It pulled live data from the Todoist API (640 tasks, 27 projects), analyzed the bookmark topic distribution from the database, cross-referenced the discovery interview findings, and produced a 6-section briefing with concrete proposals for project restructuring, label taxonomy, and daily standup design. It converted the briefing to DOCX, uploaded it to Google Drive, and attached it to the calendar event.

Meeting prep for a working session with my AI. Which sounds absurd until you realize that “meeting prep” is exactly what a Chief of Staff does.

What We’ve Actually Built

Let me be concrete. In roughly three weeks, working as a human-AI partnership:

  • 9 tool directories under active development (meetbot, protonmail, aigp, messages, calendar, bookmarker, todoist, triage, health)

  • 20 security findings identified and remediated on my own product across 4 rounds of testing

  • 446 bookmarks AI-categorized and migrated out of my task system into Google Sheets

  • 640 tasks audited down to 224 actionable items with a restructuring plan

  • 10+ research reports produced by parallel multi-agent research teams

  • Zero-trust authentication designed for MeetBot chat commands

  • Remote command execution from any device via email with permission flow

  • A triage gate that pre-screens PAI’s own inputs for prompt injection

  • A web-based quiz system with AI grading and searchable history

  • A Chief of Staff roadmap with evidence-based trust gates (G0 through G4)

  • A formal discovery interview where the AI interviewed me about my own workflow

None of this was me writing TypeScript. All of it was me managing an AI that writes TypeScript. Describing problems, reviewing solutions, catching edge cases, giving feedback, extending trust incrementally.

I’m evolving from “person who uses AI tools” to “person who manages AI agents.” The skill set is different. It’s less about knowing how to code and more about knowing what to build, why to build it, the right questions to ask along the way, and when the output isn’t good enough. It’s about knowing when to let the AI run and when to intervene. When to say “do it” and when to say “explain first.”

The velocity is extraordinary. Things that would take me weeks or months to code myself get built in hours. But the quality is entirely dependent on how well I manage the process. Vague direction produces vague results. Precise direction with clear constraints and honest feedback produces production-ready systems.

What “AI Chief of Staff” Actually Means Day-to-Day

It doesn’t mean an autonomous agent running my life. Not yet. Maybe not ever, depending on what the prompt injection research produces over the next year.

It means I have a system that:

  • Processes my email intelligently instead of me scanning 30 newsletters and cannot delete anything

  • Manages my calendar invitations without a click and cannot delete anything

  • Runs security assessments on my own products

  • Conducts structured research across multiple sources in parallel

  • Monitors the health of its own infrastructure and tells me when something breaks

  • Takes meeting notes and sends minutes to the right people

  • Maintains its own governance documentation

  • Screens its own inputs for prompt injection using two models to detect malicious content

And most importantly, it means I have a system that I’ve deliberately designed to earn more autonomy over time, with explicit boundaries, audit trails, and a violation ratchet that resets trust to zero if a boundary is crossed.

I told PAI during the discovery interview: “I’m not sure I’m ready for a lot of proactive. But brainstorming ideas, that’s always a good thing.”

That’s still where we are. Reactive, bounded, building trust. The roadmap says Phase 2 (proactive operations) starts when Phase 0 and Phase 1 tools are stable. Phase 3 (autonomous back office) is a six-month horizon with a 90-day track record requirement. I’m not risk averse. I’m just accutely aware of the damage that can be done that most people don’t think about when driven by FOMO.

I’m applying my AIGP risk analysis training to the system I built with AI. I’m pen-testing my security product with an AI security assessor. I’m studying prompt injection while building an agent that processes my email.

The contradictions aren’t bugs. They’re the feature. The tension between “this is incredibly useful” and “this could go very wrong” is exactly the tension that produces good engineering and strong businesses.

Conclusions: The Part I Didn’t Expect

I’ve been in security for over 20 years. I’ve built SOCs, architected zero-trust networks, translated risk for boards that couldn’t spell NIST. None of that prepared me for the specific weirdness of collaborating with an AI on building a system that the AI then operates.

The thing I didn’t expect: the management skills transfer almost perfectly. Setting clear objectives. Defining boundaries before granting authority. Reviewing work product. Giving specific, actionable feedback. Extending autonomy incrementally based on demonstrated competence. These are the same things I did as a VP managing security teams. The “employee” is different. The management principles are identical.

What’s new is the speed and the feedback loop. When I tell PAI “this approach is wrong, here’s why,” the correction is immediate and permanent. When I say “I don’t want CISO-level positioning in that recruiter research, I want Architect and Director,” it rewrites the entire deliverable in minutes. When I say “the stop-record button failed twice and the bot got stuck,” PAI investigates its own code, identifies a race condition, adds a duplicate command guard, fixes the recovery path, enhances the logging, and commits the change. All while I watch.

The same principles that make a good security architecture make a good AI collaboration. Least privilege. Defense in depth. Trust but verify. Assume breach.

I’m not done. The Todoist restructuring needs execution. The focus-time system needs integration. The recruiter pipeline needs follow-through. The prompt injection vulnerabilities need patching. The marketing strategy for RiskJuggler.ai needs a first draft. We need a CRM and P/L tracking to run the business.

But three weeks in, I have something I didn’t have before: an operational AI assistant that I understand, that I’ve tested, that I’ve documented, and that I trust — within explicit, evidence-based boundaries. And I have a new skill: managing an AI agent as a genuine collaborator, not a fancy autocomplete.

That’s not a tool. That’s the beginning of a working relationship. And I’m getting better at my side of it every day.

Steve Genders writes about cybersecurity, AI governance, and the messy intersection of the two at riskjuggler.info. He is currently studying for the IAPP AI Governance Professional (AIGP) certification while simultaneously using AI to build the study tools, and pen-testing his own AI security product with his own AI security assessor. The irony sustains him. PAI contributed to this post — and then assessed its own contribution for prompt injection risk, because that’s apparently what we do now.

Mar 10, 2026

I Accidentally Built an AI Governance Framework — While Just Trying to Get Things Done

aka Security at the table is always better than by gate check later

 

Act 1: The Premise

Three weeks ago I had a collection of scripts on top of my Personal AI Infrastructure (see https://danielmiessler.com/blog/personal-ai-infrastructure). A Todoist integration here, a Protonmail fetcher there, a bookmarker that synced URLs to a SQLite database. Useful tools, loosely connected, no shared awareness of each other.

 

Today I have something different. I have a governed ecosystem with health monitoring, security triage, telemetry logging, architectural decision records, and a remote command system that lets me control my AI infrastructure from any device on earth by sending an email.

 

I didn't set out to build a governance framework. I wanted to simplify my life, but as I went I applied the many lessons learned over the last year building AI agent workflows while managing operational risks and AI weaknesses while learning more about AI, ML, and AI Risk Management on my path to AIGP certification.

 

Act 2: What We Actually Built

Let me be specific about "we." This is a collaboration between me and PAI, which is built on Claude, running on a Linux server in my office. When I say collaboration, I mean it in the most literal sense. I write in natural language what I need. PAI writes the code. I review, test, redirect, catch mistakes, and occasionally get caught making my own.

 

Here's what the last few weeks produced:

 

Remote Command Executor — I can email a command to my Protonmail inbox from anywhere, and PAI (my Personal AI Infrastructure) picks it up, verifies it's actually from me through five layers of checks, and executes it in a Claude session. I can resume sessions, close them, run multi-turn conversations. All by email.  This is far more than OpenCLaw does with it's Slack and Telegram integrations.

 

Health Check System — Six cron jobs run daily. Before the health system, a failing job was invisible until I noticed something was wrong three days later. Now every job reports success or failure to a central health-report.json, and a daily check at 7 AM alerts me through my Status Board if anything died overnight.

 

Triage Gate — A local LLM (Gemma3 4b, running on my own hardware) screens every incoming AI query from untrusted sources in two passes: one for prompt injection detection, one for complexity routing. Simple queries go to Haiku. Complex ones go to Opus. Suspicious ones get flagged and escalated. Zero cost per query. Latency dropped from 15 seconds to 7-9 seconds.

 

Newsletter Digest — Scans my inbox for newsletters, summarizes each one with AI, compiles a single daily digest, archives the originals. I went from 30 newsletters cluttering my inbox to one email at 6 AM.

 

Calendar Invitation Processor — Parses ICS attachments, creates Google Calendar events, sends RFC 5546 acceptance replies, labels and archives. Fully automated calendar management.

 

Daily Priority Report — Pulls my tasks from my todo service, separates real work from bookmarked URLs, groups by priority (P1 Critical through P4 Normal), clears dates from bookmark tasks that shouldn't be on my calendar, and sends a morning notification.

 

That's six EA type duties. All built in conversational English with PAI's excellent workflow which doesn't just bang on a problem over and over until it figures it out but thinks deep to plan beyond Claude's native strategizing to find the shortest path to the most effective outcome.

 

Act 3: When Things Broke (This Is Where It Gets Interesting)

The interesting part of any engineering story isn't what worked. It's what broke and what the failure revealed.

 

The Double-Logout Bug. My invitation processor failed 87 consecutive times. The error: "Connection not available." I asked PAI to investigate. We traced it through the ProtonMail Bridge status, ran a direct IMAP connection test, and finally found it: a double-logout in an early return path.

 

The code had explicit lock.release() and client.logout() calls in a "no invitations found" branch — but those same calls also lived in finally blocks. First logout succeeded. Second logout hit a dead connection. 87 times.

 

The fix was two lines deleted. Not added. Deleted.

 

The Unicode Ghost. I sent a CLOSE command for a previously requested remote action session. Instead of closing the session, PAI created a brand new one. I asked Claude to forensically investigate the session logs. What it found: ProtonMail's rich text editor was silently substituting Unicode lookalike characters for ASCII.

 

Non-breaking hyphens (U+2011) instead of regular hyphens. En dashes (U+2013). Fullwidth colons (U+FF1A). The session matching regex matched ASCII. The email contained Unicode. Everything looked identical to human eyes.

 

The fix: a Unicode normalization pass before parsing. Three regex replacements. Root cause was invisible.

 

The Lock Contention Architecture Flaw. This one was the most instructive. The remote command executor held a global process lock during the entire Claude session — which could run for 30 minutes on a long task. During that time, nothing else could process. Not even a simple end job command that takes 2 seconds.

 

I sent the close request and it just... sat there. Waiting for a completely unrelated 30-minute session to finish.

 

This wasn't a bug. This was an architecture problem. Claude presented three options. I chose decouple the IMAP fetch from the Claude execution using a disk queue. Phase 1 grabs emails and writes them to disk (seconds). Phase 2 processes them independently (minutes to hours). The end session operations execute instantly. Long sessions don't block anything.

 

We rebuilt the entire executor in one session. Queue directory with JSON files, per-item lock files, stale lock detection, atomic file creation with O_EXCL to prevent race conditions. Then ran a /simplify review that caught six more issues including a TOCTOU race we'd introduced in the lock acquisition.

 

Act 4: The Governance Question

Here's the part I didn't expect.

 

I'm studying for the AIGP — the AI Governance Professional certification from IAPP. The Body of Knowledge has four domains: Foundations of AI Governance, Laws and Standards, Governing AI Development, and Governing AI Deployment.

 

As I was studying Domain IV (deployment governance — monitoring, incident response, human oversight, deactivation controls), I realized something comforting. I wasn't learning these concepts fresh. I was already implementing them based on years of InfoSec instinct. Without calling them "controls I should be using".

 

Health monitoring isn't just good engineering hygiene. It's AIGP Domain IV: post-deployment monitoring with defined KPIs and alerting thresholds.

 

The triage gate isn't just a cost optimization. It's Domain III: risk assessment as a gate in the processing pipeline, with security screening before any AI execution. Plus Domain I: third-party risk management, since it controls which model tier handles which query.

 

Telemetry logging (every AI call logged with timestamp, input, output, latency, model, status) isn't just debugging infrastructure. It's Domain II: transparency and documentation requirements. Those JSONL files are governance artifacts.

 

ADR-001 (every tool must have a MISSION.md documenting origin, objectives, and key decisions) isn't just institutional memory. It's Domain III: documentation requirements for AI systems — model cards and data sheets by another name.

 

The remote command executor's five-layer email verification isn't just paranoia. It's Domain IV: secure integration with human oversight mechanisms and authentication controls aka Zero Trust.

 

I built all of this because it was the pragmatic engineering thing to do. Each piece solved a real problem I was experiencing. But when you step back and look at the whole system, it maps almost perfectly to a responsible AI governance framework.

 

That's the uncomfortable truth about AI governance: if you're doing the engineering well, you're already doing most of it. The gap isn't technical. It's documentation, intentionality, and connecting the dots.

 

Act 5: From Tools to Ecosystem

The real evolution wasn't any single tool. It was the pattern that emerged across all of them.

 

Week 1 was tools. Independent scripts that each did one thing. No shared state. No shared monitoring. No shared patterns.

 

Week 2 was infrastructure. Health checks connected the tools. Telemetry gave them observability. ADRs gave them institutional memory. The triage gate gave them a security perimeter.

 

Week 3 was governance. Not because I sat down and said "let's implement governance." Because each new problem revealed a missing control, and each control turned out to map to a governance domain.

 

Three architectural decision records now guide all new development:

 

  • ADR-001: Every tool must have a MISSION.md. Origin, objectives, decisions. No orphaned code.
  • ADR-002: Every cron job must register with the health check system. If it doesn't call recordResult(), it's invisible. Invisible is unacceptable.
  • ADR-003: Every AI call must log telemetry. Timestamp, input, output, latency, model, status. No unobserved AI decisions.

 

These aren't bureaucratic overhead. They're the minimum viable governance for a system that makes decisions autonomously.

 

Act 6: What I Actually Learned

I've been in security for over 30 years. I've written governance frameworks, compliance programs, risk assessments. They usually start with policy and work down toward implementation. Top-down. Intentional. Often disconnected from the engineering reality.

 

This went the other direction. The governance emerged bottom-up from engineering necessity. And it's stickier because of it.

 

When I implemented health monitoring, it wasn't because a policy said I should. It was because a cron job failed silently for three days and I was tired of finding out by accident. When I added telemetry, it wasn't for an audit trail. It was because a newsletter summary came back wrong and I had no way to see what the AI actually received as input.

 

Every governance control in PAI exists because its absence caused a concrete, felt problem. That's different from controls that exist because a framework said they should.

 

The AIGP v2.1 (2026) made a telling update: it shifted language from "models" to "systems." That resonates. I'm not governing a model. I'm governing a system — email verification, session management, triage routing, health monitoring, telemetry, documentation. The model is one component.

 

The other update that hit home: the 2026 version added agentic architectures as an explicit governance concern. PAI is exactly that — an autonomous agent that reads my email, makes decisions about what to do with it, and takes actions. Traditional governance wasn't designed for that. Neither was I, frankly.

 

But here's what three weeks of building taught me: governance for agentic AI isn't a separate discipline from engineering it. The same instincts that make you add error handling, monitoring, and logging — those are governance instincts. The gap is in naming them, connecting them, and making them systematic.

 

I didn't set out to build a governed AI ecosystem. I set out to fix a cron job that was failing 87 times in a row.

 

The governance was already in the engineering. It just needed someone to notice.

 

 

Steve Genders is a security architect, AIGP candidate, and the human half of a human-AI collaboration building PAI — Personal AI Infrastructure. He writes at [riskjuggler.info](https://www.riskjuggler.info) about the uncomfortable intersections of security, AI, and getting things done.