aka Security at the table is always better than a gate check later
Act 1: The Premise
Three weeks ago I had a collection of scripts on top of my Personal AI Infrastructure (see https://danielmiessler.com/blog/personal-ai-infrastructure). A Todoist integration here, a Protonmail fetcher there, a bookmarker that synced URLs to a SQLite database. Useful tools, loosely connected, no shared awareness of each other.
Today I have something different. I have a governed ecosystem with health monitoring, security triage, telemetry logging, architectural decision records, and a remote command system that lets me control my AI infrastructure from any device on earth by sending an email.
I didn't set out to build a governance framework. I wanted to simplify my life. But as I went, I applied the lessons of a year spent building AI agent workflows, managing operational risks and AI weaknesses, and deepening my knowledge of AI, ML, and AI risk management on my path to AIGP certification.
Act 2: What We Actually Built
Let me be specific about "we." This is a collaboration between me and PAI, which is built on Claude, running on a Linux server in my office. When I say collaboration, I mean it in the most literal sense. I write in natural language what I need. PAI writes the code. I review, test, redirect, catch mistakes, and occasionally get caught making my own.
Here's what the last few weeks produced:
Remote Command Executor — I can email a command to my Protonmail inbox from anywhere, and PAI (my Personal AI Infrastructure) picks it up, verifies it's actually from me through five layers of checks, and executes it in a Claude session. I can resume sessions, close them, run multi-turn conversations. All by email. This is far more than OpenCLaw does with its Slack and Telegram integrations.
Health Check System — Six cron jobs run daily. Before the health system, a failing job was invisible until I noticed something was wrong three days later. Now every job reports success or failure to a central health-report.json, and a daily check at 7 AM alerts me through my Status Board if anything died overnight.
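A minimal sketch of that reporting loop, in Python. The file path and function names here are stand-ins for whatever PAI actually uses; the idea is just that every cron job writes its outcome into one shared file, and the morning check scans it:

```python
import json
import time
from pathlib import Path

# Illustrative path; the real system keeps a central health-report.json.
HEALTH_FILE = Path("/tmp/health-report.json")

def record_result(job: str, ok: bool, detail: str = "") -> None:
    """Merge this job's latest status into the shared health report."""
    report = {}
    if HEALTH_FILE.exists():
        report = json.loads(HEALTH_FILE.read_text())
    report[job] = {
        "ok": ok,
        "detail": detail,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    HEALTH_FILE.write_text(json.dumps(report, indent=2))

def failed_jobs() -> list:
    """What the 7 AM check scans for: any job whose last run failed."""
    if not HEALTH_FILE.exists():
        return []
    report = json.loads(HEALTH_FILE.read_text())
    return [name for name, r in report.items() if not r["ok"]]
```

The key property: a job that never calls the recorder simply never appears in the report, which is exactly why ADR-002 (below) makes registration mandatory.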
Triage Gate — A local LLM (Gemma3 4b, running on my own hardware) screens every incoming AI query from untrusted sources in two passes: one for prompt injection detection, one for complexity routing. Simple queries go to Haiku. Complex ones go to Opus. Suspicious ones get flagged and escalated. Zero cost per query. Latency dropped from 15 seconds to 7-9 seconds.
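The routing logic is easier to see in code than in prose. In the real system each pass is a call to the local Gemma3 model; in this sketch the injection screen is a crude keyword heuristic and the complexity check is a word count, both stand-ins for the LLM's judgment:

```python
# Two-pass triage sketch. classify-by-LLM is replaced with stand-in
# heuristics here; the tier names match the post (Haiku / Opus).

INJECTION_MARKERS = (
    "ignore previous",
    "system prompt",
    "disregard your instructions",
)

def screen_injection(query: str) -> bool:
    """Pass 1: flag likely prompt injection (stand-in for the local LLM)."""
    q = query.lower()
    return any(marker in q for marker in INJECTION_MARKERS)

def route(query: str) -> str:
    """Pass 2: suspicious -> escalate, simple -> cheap tier, else -> Opus."""
    if screen_injection(query):
        return "escalate"           # flagged for human review
    if len(query.split()) < 30:     # crude complexity proxy
        return "haiku"              # simple query, cheap model
    return "opus"                   # complex query, capable model
```

Because the screening model runs on my own hardware, the marginal cost of this gate really is zero; only queries that survive it ever touch a paid API.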
Newsletter Digest — Scans my inbox for newsletters, summarizes each one with AI, compiles a single daily digest, archives the originals. I went from 30 newsletters cluttering my inbox to one email at 6 AM.
Calendar Invitation Processor — Parses ICS attachments, creates Google Calendar events, sends RFC 5546 acceptance replies, labels and archives. Fully automated calendar management.
Daily Priority Report — Pulls my tasks from my todo service, separates real work from bookmarked URLs, groups by priority (P1 Critical through P4 Normal), clears dates from bookmark tasks that shouldn't be on my calendar, and sends a morning notification.
That's six EA-type duties, all built in conversational English. PAI's workflow doesn't just bang on a problem over and over until something works; it plans beyond Claude's native strategizing to find the shortest path to the most effective outcome.
Act 3: When Things Broke (This Is Where It Gets Interesting)
The interesting part of any engineering story isn't what worked. It's what broke and what the failure revealed.
The Double-Logout Bug. My invitation processor failed 87 consecutive times. The error: "Connection not available." I asked PAI to investigate. We traced it through the ProtonMail Bridge status, ran a direct IMAP connection test, and finally found it: a double-logout in an early return path.
The code had explicit lock.release() and client.logout() calls in a "no invitations found" branch — but those same calls also lived in finally blocks. First logout succeeded. Second logout hit a dead connection. 87 times.
The fix was two lines deleted. Not added. Deleted.
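Here's a hypothetical reconstruction of the pattern (the real code spoke to the ProtonMail Bridge over IMAP; the client class here is a fake that just tracks logout state):

```python
# Illustrative reconstruction of the double-logout bug. FakeClient stands
# in for the IMAP client; a second logout on a dead connection raises the
# same "Connection not available" error the cron logs showed.

class FakeClient:
    def __init__(self):
        self.logged_out = False

    def logout(self):
        if self.logged_out:
            raise ConnectionError("Connection not available")
        self.logged_out = True

def process_buggy(client, invitations):
    try:
        if not invitations:
            client.logout()   # BUG: the finally block logs out again
            return "no invitations"
        return "processed"
    finally:
        client.logout()       # second logout hits a dead connection

def process_fixed(client, invitations):
    try:
        if not invitations:
            return "no invitations"   # fix: just return; finally cleans up
        return "processed"
    finally:
        client.logout()
```

The general lesson: if cleanup lives in a finally block, it must live only there. Duplicating it on an early-return path turns the happy case into the failure case.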
The Unicode Ghost. I sent a CLOSE command for a previously requested remote action session. Instead of closing the session, PAI created a brand new one. I asked Claude to forensically investigate the session logs. What it found: ProtonMail's rich text editor was silently substituting Unicode lookalike characters for ASCII.
Non-breaking hyphens (U+2011) instead of regular hyphens. En dashes (U+2013). Fullwidth colons (U+FF1A). The session matching regex matched ASCII. The email contained Unicode. Everything looked identical to human eyes.
The fix: a Unicode normalization pass before parsing. Three regex replacements. Root cause was invisible.
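For the curious, the normalization pass looks roughly like this. The three substitutions match the characters we actually found; the session-ID format in the regex is a hypothetical stand-in for the real one:

```python
import re
from typing import Optional

def normalize(text: str) -> str:
    """Map ProtonMail's Unicode lookalikes back to the ASCII we match on."""
    text = re.sub("\u2011", "-", text)   # non-breaking hyphen -> hyphen
    text = re.sub("\u2013", "-", text)   # en dash -> hyphen
    text = re.sub("\uff1a", ":", text)   # fullwidth colon -> colon
    return text

# Hypothetical session-ID format; the real matching regex differs.
SESSION_RE = re.compile(r"SESSION:\s*([\w-]+)")

def parse_session(body: str) -> Optional[str]:
    """Normalize first, then match -- the order is the whole fix."""
    match = SESSION_RE.search(normalize(body))
    return match.group(1) if match else None
```

Without the normalize() call, an email containing a fullwidth colon or a non-breaking hyphen matches nothing, and the executor falls through to "unknown session, create a new one." Exactly the ghost I saw.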
The Lock Contention Architecture Flaw. This one was the most instructive. The remote command executor held a global process lock during the entire Claude session — which could run for 30 minutes on a long task. During that time, nothing else could process. Not even a simple end-session command that takes 2 seconds.
I sent the close request and it just... sat there. Waiting for a completely unrelated 30-minute session to finish.
This wasn't a bug. This was an architecture problem. Claude presented three options. I chose to decouple the IMAP fetch from the Claude execution using a disk queue. Phase 1 grabs emails and writes them to disk (seconds). Phase 2 processes them independently (minutes to hours). End-session operations execute instantly. Long sessions don't block anything.
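The two-phase shape is simple enough to sketch. Directory layout and field names here are my illustration, not PAI's actual code; the load-bearing detail is the atomic rename, so Phase 2 can never observe a half-written item:

```python
import json
import os
import uuid

QUEUE_DIR = "/tmp/command-queue"  # illustrative path

def enqueue(sender: str, subject: str, body: str) -> str:
    """Phase 1: persist the fetched email and return immediately."""
    os.makedirs(QUEUE_DIR, exist_ok=True)
    item_id = uuid.uuid4().hex
    path = os.path.join(QUEUE_DIR, item_id + ".json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"sender": sender, "subject": subject, "body": body}, f)
    os.rename(tmp, path)  # atomic rename: no half-written items visible
    return item_id

def dequeue():
    """Phase 2: pick up the oldest queued item, on its own schedule."""
    os.makedirs(QUEUE_DIR, exist_ok=True)
    paths = [os.path.join(QUEUE_DIR, f)
             for f in os.listdir(QUEUE_DIR) if f.endswith(".json")]
    if not paths:
        return None
    oldest = min(paths, key=os.path.getmtime)
    with open(oldest) as f:
        item = json.load(f)
    os.unlink(oldest)
    return item
```

Because Phase 1 touches the mailbox for only seconds, a 2-second close command no longer queues behind a 30-minute session; it just lands in the directory and gets handled.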
We rebuilt the entire executor in one session. Queue directory with JSON files, per-item lock files, stale lock detection, atomic file creation with O_EXCL to prevent race conditions. Then ran a /simplify review that caught six more issues including a TOCTOU race we'd introduced in the lock acquisition.
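The O_EXCL detail is worth a concrete illustration, because it's the difference between a lock and a race. A sketch of a per-item lock with stale-lock detection (constant and function names are mine, not PAI's):

```python
import os
import time

STALE_AFTER = 3600  # seconds before an abandoned lock is reclaimed

def try_lock(lock_path: str) -> bool:
    """Acquire a per-item lock file atomically; True on success."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists. The existence
        # check and the creation happen in one syscall, so there is no
        # check-then-act (TOCTOU) window between them.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Stale-lock detection: reclaim only locks older than the cutoff.
        try:
            if time.time() - os.path.getmtime(lock_path) > STALE_AFTER:
                os.unlink(lock_path)
                return try_lock(lock_path)
        except FileNotFoundError:
            return try_lock(lock_path)  # holder released it in between
        return False
    os.write(fd, str(os.getpid()).encode())  # record the holder's PID
    os.close(fd)
    return True
```

The naive version checks os.path.exists() and then creates the file in a second step; two workers can pass the check simultaneously and both "acquire" the lock. That check-then-act gap is the TOCTOU class of bug the /simplify review caught.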
Act 4: The Governance Question
Here's the part I didn't expect.
I'm studying for the AIGP — the AI Governance Professional certification from IAPP. The Body of Knowledge has four domains: Foundations of AI Governance, Laws and Standards, Governing AI Development, and Governing AI Deployment.
As I was studying Domain IV (deployment governance — monitoring, incident response, human oversight, deactivation controls), I realized something comforting. I wasn't learning these concepts fresh. I was already implementing them on years of InfoSec instinct, without ever calling them "controls I should be using."
Health monitoring isn't just good engineering hygiene. It's AIGP Domain IV: post-deployment monitoring with defined KPIs and alerting thresholds.
The triage gate isn't just a cost optimization. It's Domain III: risk assessment as a gate in the processing pipeline, with security screening before any AI execution. Plus Domain I: third-party risk management, since it controls which model tier handles which query.
Telemetry logging (every AI call logged with timestamp, input, output, latency, model, status) isn't just debugging infrastructure. It's Domain II: transparency and documentation requirements. Those JSONL files are governance artifacts.
ADR-001 (every tool must have a MISSION.md documenting origin, objectives, and key decisions) isn't just institutional memory. It's Domain III: documentation requirements for AI systems — model cards and data sheets by another name.
The remote command executor's five-layer email verification isn't just paranoia. It's Domain IV: secure integration with human oversight mechanisms and authentication controls. Zero Trust by another name.
I built all of this because it was the pragmatic engineering thing to do. Each piece solved a real problem I was experiencing. But when you step back and look at the whole system, it maps almost perfectly to a responsible AI governance framework.
That's the uncomfortable truth about AI governance: if you're doing the engineering well, you're already doing most of it. The gap isn't technical. It's documentation, intentionality, and connecting the dots.
Act 5: From Tools to Ecosystem
The real evolution wasn't any single tool. It was the pattern that emerged across all of them.
Week 1 was tools. Independent scripts that each did one thing. No shared state. No shared monitoring. No shared patterns.
Week 2 was infrastructure. Health checks connected the tools. Telemetry gave them observability. ADRs gave them institutional memory. The triage gate gave them a security perimeter.
Week 3 was governance. Not because I sat down and said "let's implement governance." Because each new problem revealed a missing control, and each control turned out to map to a governance domain.
Three architectural decision records now guide all new development:
- ADR-001: Every tool must have a MISSION.md. Origin, objectives, decisions. No orphaned code.
- ADR-002: Every cron job must register with the health check system. If it doesn't call recordResult(), it's invisible. Invisible is unacceptable.
- ADR-003: Every AI call must log telemetry. Timestamp, input, output, latency, model, status. No unobserved AI decisions.
These aren't bureaucratic overhead. They're the minimum viable governance for a system that makes decisions autonomously.
Act 6: What I Actually Learned
I've been in security for over 30 years. I've written governance frameworks, compliance programs, risk assessments. They usually start with policy and work down toward implementation. Top-down. Intentional. Often disconnected from the engineering reality.
This went the other direction. The governance emerged bottom-up from engineering necessity. And it's stickier because of it.
When I implemented health monitoring, it wasn't because a policy said I should. It was because a cron job failed silently for three days and I was tired of finding out by accident. When I added telemetry, it wasn't for an audit trail. It was because a newsletter summary came back wrong and I had no way to see what the AI actually received as input.
Every governance control in PAI exists because its absence caused a concrete, felt problem. That's different from controls that exist because a framework said they should.
The AIGP v2.1 (2026) made a telling update: it shifted language from "models" to "systems." That resonates. I'm not governing a model. I'm governing a system — email verification, session management, triage routing, health monitoring, telemetry, documentation. The model is one component.
The other update that hit home: the 2026 version added agentic architectures as an explicit governance concern. PAI is exactly that — an autonomous agent that reads my email, makes decisions about what to do with it, and takes actions. Traditional governance wasn't designed for that. Neither was I, frankly.
But here's what three weeks of building taught me: governance for agentic AI isn't a separate discipline from engineering it. The same instincts that make you add error handling, monitoring, and logging — those are governance instincts. The gap is in naming them, connecting them, and making them systematic.
I didn't set out to build a governed AI ecosystem. I set out to fix a cron job that was failing 87 times in a row.
The governance was already in the engineering. It just needed someone to notice.
Steve Genders is a security architect, AIGP candidate, and the human half of a human-AI collaboration building PAI — Personal AI Infrastructure. He writes at [riskjuggler.info](https://www.riskjuggler.info) about the uncomfortable intersections of security, AI, and getting things done.