aka How I trained PAI to understand and enable me
Late February: The Starting Line
I didn’t set out to build an AI Chief of Staff. I set out to stop drowning in email, and to find a personal AI agent that didn’t sound nearly as scary, from a cost and security perspective, as OpenClaw.
I had 30 newsletters a week cluttering my Protonmail inbox. I had Todoist tasks that were really bookmarks pretending to be work. I had calendar invitations I was processing by hand like it was 2005. And I had a subscription to Claude Code Pro Max that I was underutilizing.
The idea was simple: instead of building all the tools with PrescientFlow (my Claude workflow), I would describe the problems to my personal AI agent and we would build solutions together. Not wrestle with Claude Code in separate projects, creating tools without a cohesive interface, but actually partner with it. I’d bring the 20 years of security architecture experience and the domain knowledge. The AI would bring the ability to write, test, and iterate code at a speed I never could alone.
I didn’t know it yet, but I was about to accelerate an emerging skill: AI agent management. Not prompt engineering — that’s typing. Management. Setting objectives, defining boundaries, reviewing output, giving feedback, and gradually extending trust as competence is demonstrated. The same skills I’ve used to manage humans for two decades, applied to a fundamentally different kind of collaborator.
Early March: The Plumbing Phase
The first tools were unglamorous. A bookmarker that synced URLs to a SQLite database. A Todoist integration that could tell the difference between an actual task and a saved link. An email fetcher that pulled my scheduler’s inbox programmatically.
Here’s the thing nobody tells you about working with an AI assistant: the first week is mostly about learning how to ask for what you want. I’d describe a problem — “I need to process calendar invitations from my Inbox” — and PAI would build a solution. Sometimes it was exactly right. Sometimes it over-engineered things. Sometimes it missed an edge case that was obvious to me but invisible to an AI that’s never managed an email inbox. It’s worth noting that Dan Miessler has built a rival to my PrescientFlow: it expands on Claude’s native planner (excellent, but sometimes dodgy) to ensure higher-quality outcomes with minimal input from me.
My job wasn’t writing spec docs to get Claude to produce better outcomes. It was directing an AI that writes the spec docs. Describing the architecture. Reviewing the output. Catching the assumptions that don’t survive contact with production. Saying “no, simpler” when it added three layers of abstraction where two lines would do.
The newsletter digest came first. I told PAI: “30 newsletters a week, I need one summary email a day to bring my reading time down below 5 minutes per.” It designed the pipeline — scan inbox, summarize each with AI, compile, archive originals. 3-5 emails a day became one brief. But the first version crashed on HTML-heavy newsletters. The second version timed out on large batches. The third version worked. That debugging cycle — me describing the failure, PAI diagnosing and fixing — became the rhythm of how we work.
Then the calendar invitation processor. Parse ICS attachments, create Google Calendar events, send RFC 5546 acceptance replies, label and archive. I described the workflow; PAI built it. Any time someone sent me a meeting invite, acceptance, or change, it just appeared on my calendar and the inbox was clean. No clicks required.
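That pipeline’s parsing-and-reply step can be sketched minimally. This is a hypothetical version assuming a single-event ICS with no line folding or time-zone handling; real code would use a proper iCalendar library, and the field extraction here is deliberately naive.

```typescript
// Hypothetical sketch: pull the fields an RFC 5546 REPLY needs out of a
// raw ICS attachment, then build the acceptance. Assumes one VEVENT and
// no folded lines — a real parser must handle far more.

function parseIcsField(ics: string, field: string): string | null {
  // Match "FIELD" or "FIELD;params" at the start of a line.
  const match = ics.match(new RegExp(`^${field}[^:]*:(.*)$`, "m"));
  return match ? match[1].trim() : null;
}

function buildAcceptReply(ics: string, attendeeEmail: string): string {
  const uid = parseIcsField(ics, "UID");
  const dtstart = parseIcsField(ics, "DTSTART");
  if (!uid || !dtstart) throw new Error("not a parseable invitation");
  // METHOD:REPLY with PARTSTAT=ACCEPTED is the RFC 5546 acceptance shape.
  return [
    "BEGIN:VCALENDAR",
    "METHOD:REPLY",
    "BEGIN:VEVENT",
    `UID:${uid}`,
    `DTSTART:${dtstart}`,
    `ATTENDEE;PARTSTAT=ACCEPTED:mailto:${attendeeEmail}`,
    "END:VEVENT",
    "END:VCALENDAR",
  ].join("\r\n");
}
```

The UID must be echoed back unchanged; that is how the organizer’s calendar matches the reply to the original invitation.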
These tools were reactive and narrow. Exactly what I needed to trust the system before handing it anything that mattered. But I was already learning something: the quality of what PAI produced was directly proportional to how well I managed the collaboration. Vague asks got vague results. Even with PAI’s excellent planner workflow, precise asks with clear constraints got production-ready code sooner, and PAI remembered what we did and learned from our failures.
The First Week of March: Things Start Breaking (This Is the Good Part)
The interesting part of any engineering story isn’t what worked. It’s what broke.
The Double-Logout Bug. My invitation processor failed 87 consecutive times. Eighty-seven. The error: “Connection not available.” I described the symptom to PAI. It read the code, traced the logic, and found it: a double-logout in an early return path. The code had explicit client.logout() calls in a “no invitations found” branch, but those same calls lived in finally blocks. First logout succeeded. Second hit a dead connection. 87 times.
The fix was two lines deleted. Not added. Deleted. I could have stared at that code for an hour and missed it. PAI found it in seconds because it could hold the entire control flow in its head simultaneously. That’s not intelligence — it’s a different kind of attention. And learning when to leverage that difference is the core skill of AI agent management.
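The pattern is easier to see in miniature. This is a hypothetical reconstruction, not the actual processor code: a fake `ImapClient` whose second `logout()` raises the same “Connection not available” error.

```typescript
// Hypothetical reconstruction of the double-logout bug. The early-return
// branch logs out explicitly, then the finally block logs out again on a
// connection that is already dead.

class ImapClient {
  loggedOut = false;
  logout(): void {
    if (this.loggedOut) throw new Error("Connection not available");
    this.loggedOut = true;
  }
}

// Buggy shape: explicit logout in the "nothing found" branch AND in finally.
function processInvitationsBuggy(client: ImapClient, invitations: string[]): string {
  try {
    if (invitations.length === 0) {
      client.logout(); // first logout
      return "nothing to do";
    }
    return `processed ${invitations.length}`;
  } finally {
    client.logout(); // second logout hits a dead connection and throws
  }
}

// Fixed shape: the finally block is the single owner of cleanup.
function processInvitationsFixed(client: ImapClient, invitations: string[]): string {
  try {
    if (invitations.length === 0) return "nothing to do";
    return `processed ${invitations.length}`;
  } finally {
    client.logout();
  }
}
```

The fix really is a deletion: remove the explicit `logout()` from the early-return branch and let `finally` do its one job.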
The Unicode Ghost. Someone sent a CLOSE command for a remote session. Instead of closing it, PAI created a brand new one. Forensic investigation revealed that ProtonMail’s rich text editor was silently substituting Unicode lookalike characters for ASCII. Non-breaking hyphens (U+2011) instead of regular ones. The regex matched ASCII. The email contained Unicode. Everything looked identical to human eyes.
Three regex replacements fixed it. Root cause was invisible without the investigation.
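A sketch of that fix, assuming a normalization table applied before matching. The three substitutions mirror the kind of lookalikes involved, but the exact command regex below is invented.

```typescript
// Hypothetical sketch: normalize Unicode lookalikes to ASCII before
// matching commands. U+2011 (non-breaking hyphen) and curly quotes look
// identical to their ASCII counterparts but break ASCII-only regexes.

const LOOKALIKES: ReadonlyArray<[RegExp, string]> = [
  [/\u2011/g, "-"],           // non-breaking hyphen -> ASCII hyphen
  [/[\u2018\u2019]/g, "'"],   // curly single quotes -> apostrophe
  [/[\u201C\u201D]/g, '"'],   // curly double quotes -> straight quote
];

function normalizeCommand(raw: string): string {
  let text = raw;
  for (const [pattern, replacement] of LOOKALIKES) {
    text = text.replace(pattern, replacement);
  }
  return text;
}

// Invented stand-in for the real command pattern.
const CLOSE_COMMAND = /^CLOSE\s+session-\d+$/;

function isCloseCommand(raw: string): boolean {
  return CLOSE_COMMAND.test(normalizeCommand(raw));
}
```

The key design point: normalize once, at the boundary, so every downstream regex can safely assume ASCII.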
These bugs taught me something I should have already known from 20+ years in security: the failure modes that matter are the ones you can’t see. But they also taught me something new: an AI partner that can read every line of code in a codebase simultaneously finds different bugs than a human who reads sequentially. We’re not redundant. We’re complementary.
March 3: The Bookmarks Reckoning
I ran a report on my Todoist “Today” view and found 28 bookmarked links staring back at me. Not tasks. Links. Saved from LinkedIn, sitting in my task list like they were actionable.
The full audit was worse: 640 tasks across 27 projects, but 416 of them — 65% — were bookmarks, not work. My task management system had become a link dumping ground.
This was the first moment I realized PAI wasn’t just automating my workflow. It was revealing the dysfunction in it. A human assistant wouldn’t have run that report. I wouldn’t have asked them to. But the data was right there, and once I saw it, I couldn’t unsee it.
Here’s where the AI agent management skill evolved. I didn’t say “write me a migration script.” I said: “These bookmarks need to move out of Todoist into something searchable. The new home needs to be accessible from my phone. Each bookmark needs AI-categorized topics. And after migration, label the Todoist tasks so I can review and close them.” PAI designed the architecture — SQLite locally, Google Sheets for mobile access, AI categorization in batches of 25, Todoist API labeling. It processed 446 bookmarks, categorized every one, pushed them to the Sheet, and labeled 415 Todoist tasks “migrated.” Then it rewired the daily cron from syncing Todoist to syncing the Google Sheet.
I described the outcome. PAI designed and executed the path. My job was to verify it worked and course-correct when it didn’t. That’s management.
March 8-10: The Research Machine
By this point I had a triage gate running — a local LLM (Gemma3 4b, on my own hardware) screening every incoming query in two passes. One for prompt injection detection. One for complexity routing. Simple queries went to Haiku. Complex ones went to Opus. Zero API cost for the screening.
This was me pushing PAI beyond its native capabilities. Claude Code doesn’t ship with a prompt injection pre-screening layer. It doesn’t have a local LLM triage system. I described the problem — “I want to screen inputs before they hit the API, using my own hardware” — and PAI and I designed a two-pass architecture together. Security gate first (clean/suspicious), then complexity router (simple/moderate/complex). The triage gate became one of our most important tools, even though the prompt injection research later revealed its blind spots.
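The gate’s control flow can be sketched as two sequential passes. The `TriageModel` calls are stubbed here; in the real system they would hit the local Gemma3 4b instance, and the routing labels are simplified.

```typescript
// Hypothetical sketch of the two-pass triage gate: security screen first,
// complexity routing second. Both passes run on a local model, so the
// screening itself costs no API credits.

type SecurityVerdict = "clean" | "suspicious";
type Complexity = "simple" | "moderate" | "complex";
type Route = "blocked" | "haiku" | "opus";

interface TriageModel {
  screenSecurity(query: string): SecurityVerdict;
  rateComplexity(query: string): Complexity;
}

function triage(query: string, model: TriageModel): Route {
  // Pass 1: prompt-injection screening. Fail closed on anything suspicious.
  if (model.screenSecurity(query) === "suspicious") return "blocked";
  // Pass 2: complexity routing. Simple queries go to the cheap model;
  // moderate and complex both go to Opus in this simplified version.
  return model.rateComplexity(query) === "simple" ? "haiku" : "opus";
}
```

Running the security pass first means a malicious query never even reaches the routing logic, let alone a paid API.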
I started using PAI for parallel multi-source research. Not “ask a question and get an answer.” I’d describe what I needed to understand, and PAI would spawn 3 to 9 research agents simultaneously — Claude, Gemini, Grok researchers each pursuing different angles on the same topic. The results would come back from independent sources, and PAI would synthesize the agreements and conflicts. FYI, Dan is awesome and sorted out prompts to emulate Gemini and Grok using Claude as a base. Didn’t want you to think I was running all 3 models with the added cost…
Thailand LTR visa requirements. Remote cybersecurity jobs in Southeast Asia. IAPP AIGP certification study material. Freelance platform analysis. Each one came back as a structured report cross-validated across multiple AI researchers.
The AIGP research was particularly useful. I was studying for the AI Governance Professional certification and using PAI itself as my case study for the risk assessment portions. There’s a certain poetry in using an AI system to study AI governance while simultaneously applying the governance framework to that same system.
March 10: Remote Command Execution Goes Live
This was the tool that changed the relationship. And the one that taught me the most about managing AI autonomy.
I described the concept to PAI: “I want to email a command from any device and have you execute it in a Claude session on my server.” PAI built the execute-remote system — 5-trait, zero-trust sender verification, email polling, session management, reply with results. The architecture was sound on day one. The operational edge cases took a week.
The security guy in me immediately started stress-testing it. What happens if someone spoofs the sender? What if the command contains injection? What if a long-running task holds the lock and nothing else can process?
That last one actually happened. PAI and I redesigned the lock architecture together. I described the problem; PAI proposed the solution: command-level locks for short operations, session-level locks for long-running ones, a fast-path for CLOSE commands. I reviewed it, poked holes, PAI iterated.
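The redesigned locking can be sketched as a small manager. The command names and the long-running heuristic here are invented, and the real system presumably persists lock state rather than holding it in memory.

```typescript
// Hypothetical sketch of the split lock architecture: long-running work
// takes a per-session lock, short operations share a command-level lock,
// and CLOSE commands bypass locking entirely so a stuck session can
// always be torn down.

class LockManager {
  private sessionLocks = new Set<string>();
  private commandLock = false;

  tryAcquire(sessionId: string, command: string): boolean {
    // Fast path: CLOSE must always get through, even if the session is busy.
    if (command === "CLOSE") return true;
    if (this.isLongRunning(command)) {
      if (this.sessionLocks.has(sessionId)) return false;
      this.sessionLocks.add(sessionId);
      return true;
    }
    // Short operations share a single command-level lock.
    if (this.commandLock) return false;
    this.commandLock = true;
    return true;
  }

  release(sessionId: string, command: string): void {
    if (command === "CLOSE") return;
    if (this.isLongRunning(command)) this.sessionLocks.delete(sessionId);
    else this.commandLock = false;
  }

  // Invented heuristic for which commands count as long-running.
  private isLongRunning(command: string): boolean {
    return command.startsWith("RESEARCH") || command.startsWith("BUILD");
  }
}
```

The point of the split is that a 30-minute research job only blocks its own session, not the whole queue, and CLOSE can never be starved.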
Then I discovered that permission denials from Claude — those moments where it says “I need to ask before doing this” — were getting silently swallowed in the remote context. No human at the keyboard means no one to approve. I told PAI: “Surface these as questions in the reply email.” It redesigned the permission flow so denied actions became readable questions I could respond to from my phone.
This is the part that’s hard to explain to people who haven’t done it. I’m not writing code. I’m not reviewing PRs. I’m managing an AI that writes code, and my value-add is the production context it can’t have — knowing that a permission denial on a remote session is a showstopper, not an edge case. Knowing that session subjects need to be human-readable because I’ll have six running simultaneously. Knowing that a lock held for 30 minutes blocks everything else.
The AI has the technical execution speed. I have the operational judgment. Neither is sufficient alone.
March 12: The Discovery Interview
This was the day the “Chief of Staff” idea became real. And the day I started managing PAI like I’d manage a new hire.
I’d been calling PAI my Chief of Staff half-jokingly. But I decided to take it seriously. So I did what any good manager does when onboarding someone to a complex role: I had them interview me. PAI designed 16 structured questions across five domains — daily operations, communication, business vision, job search strategy, and personal priorities. I didn’t write those questions. PAI did, based on its understanding of what a Chief of Staff needs to know.
The format was unusual. PAI delivered the questions through MeetBot’s chat in a Google Meet call. I spoke my answers. MeetBot captured the transcript, PAI processed it through a three-pass AI pipeline, and produced structured findings with direct quotes and thematic analysis. FYI, Dan built PAI to work on Linux but it’s really designed around Macs because he prefers to use Whisperflow to capture his thoughts and feed them directly into the terminal.
Some of what I said was uncomfortable to hear back:
“My to-do list… they’re not really well organized. They’re not really well aligned, and there’s more in the today’s view than I can actually probably accomplish.”
“I’ve got my other Claude sessions with PrescientFlow dinging me a lot, notifying me whenever they get done and they’re waiting for me. So, it’s frequently interrupting me, which is shortening the amount of time I have to spend on focused reading and making decisions.”
“There’s all kinds of news out there about prompt injections causing the agents that people have given some authority over their lives to tricking them into doing things that are bad.”
That last one is my security brain talking. I’m building an AI system that manages parts of my life, and I’m simultaneously studying the research on how those systems get compromised. The tension is productive. It keeps me honest.
The interview produced a structured findings document and a phased roadmap with trust gates – evidence-based, per-tool progression from “observe only” through “operate,” “suggest,” “act,” and eventually “delegate.” Not a timeline. Not a global switch. Per-tool, per-capability, measured from audit logs.
My security career taught me that trust is built in increments and revoked in instants. The roadmap reflects that.
March 12: MeetBot’s Zero-Trust Day
The same day as the discovery interview, I told PAI: “The display-name authentication on MeetBot is unacceptable. Anyone can change their display name.”
PAI designed and implemented a two-layer, zero-trust auth system. Display name match as a quick filter, then Google Meet REST API verification of the sender’s immutable Google account ID. Fails closed — if the API verification fails for any reason, the command is denied. I reviewed the design, approved the approach, PAI wrote the code, I tested it.
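A sketch of that two-layer check. The `MeetApi` interface, participant IDs, and account IDs below are invented stand-ins, not the real Google Meet REST API surface.

```typescript
// Hypothetical two-layer, zero-trust check. Layer 1 is the spoofable
// display-name filter; layer 2 verifies the immutable Google account ID
// and fails closed on any lookup error.

interface Participant {
  displayName: string;   // user-editable, trivially spoofable
  participantId: string; // per-session participant handle
}

interface MeetApi {
  // Returns the immutable Google account ID for a session participant.
  // May throw on API failure.
  getAccountId(participantId: string): string | null;
}

function isAuthorized(
  p: Participant,
  allowedName: string,
  allowedAccountId: string,
  api: MeetApi,
): boolean {
  // Layer 1: quick filter only. Never the decision by itself.
  if (p.displayName !== allowedName) return false;
  // Layer 2: immutable ID verification. Fail closed.
  try {
    return api.getAccountId(p.participantId) === allowedAccountId;
  } catch {
    return false;
  }
}
```

Fail-closed matters here: an outage in the verification API should deny commands, never wave them through on the name match alone.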
Four distinct MeetBot fixes in one day, plus the discovery interview, plus a code review where PAI ran /simplify and found 46 issues across the codebase. I approved 12 high-value fixes — deduplication, parallel operations, streaming downloads. PAI executed all of them.
I remember this day clearly because it was the first day PAI felt like a colleague rather than a tool. I was giving direction, reviewing proposals, approving changes, and catching edge cases. PAI was designing solutions, writing code, running analyses, and surfacing problems I didn’t know I had. That’s not prompt engineering. That’s collaboration.
March 13: Extending the Native Capabilities
This is when I started deliberately pushing PAI beyond what Claude Code and PAI can do out of the box.
Claude Code and PAI can write code and run commands. They don’t natively have a Status Board with persistent notifications. They don’t have a web-based quiz system for certification study. They don’t have a permissions manifest framework for self-governance. They don’t have a triage gate that pre-screens its own inputs.
All of these are capabilities I asked PAI to build for itself. The Status Board — “I need a place to see what you’re working on without asking.” The AIGP quiz system — “I want my daily quiz on a web page I can open from a notification link, not in email.” The permissions manifests — “Every tool you operate needs documented rules for what you CAN do, MUST NOT do, and must ASK about.”
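A minimal sketch of how such a manifest might be enforced. The `can`/`mustNot`/`ask` shape follows the rules quoted above, while the action names and the default-deny fallback are my assumptions about the design.

```typescript
// Hypothetical permissions manifest check: prohibitions are evaluated
// first, unknown actions fall through to "ask" (default deny).

type Decision = "allow" | "deny" | "ask";

interface PermissionsManifest {
  can: string[];     // actions the tool may perform unprompted
  mustNot: string[]; // hard prohibitions, checked before anything else
  ask: string[];     // actions that require human approval
}

function checkAction(manifest: PermissionsManifest, action: string): Decision {
  if (manifest.mustNot.includes(action)) return "deny"; // prohibitions win
  if (manifest.ask.includes(action)) return "ask";
  if (manifest.can.includes(action)) return "allow";
  return "ask"; // unlisted actions need a human, not silence
}
```

Routing unknown actions to “ask” rather than “allow” is what makes the manifest a governance tool instead of a suggestion.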
Each one started as me describing a gap, and PAI designing and building the solution. The quiz system went from “I want this” to live web app with SQLite, FTS5 search, AI grading, and history browsing in a single session. The bookmark migration went from “move these to Google Sheets” to a categorization pipeline processing 446 items with AI topic analysis in under 10 minutes.
I’m not a TypeScript developer. I’m a security architect who knows what systems should do. PAI is the developer. My job is knowing what to build and why. PAI’s job is knowing how to build it and doing it fast.
March 14: Pen-Testing My Own Product (With My PAI’s native capabilities)
I told PAI: “Run a full web application security assessment on RiskJuggler.ai.” That’s my own product — the one I’m building to help small businesses with cybersecurity. The irony was not lost on me.
PAI spawned a Pentester agent — a specialized sub-agent with security testing methodology, OWASP Top 10 knowledge, and the ability to run curl commands against the target. I didn’t write the test cases. PAI did, based on its native WebAssessment skill’s methodology. I reviewed the findings.
Results:
| Severity | Count |
| --- | --- |
| Critical | 2 |
| High | 4 |
| Medium | 8 |
| Low | 3 |
| Total | 17 |
The most dangerous finding: X-Forwarded-For rate limit bypass enabling unlimited password brute-forcing. PAI didn’t just find it — it wrote the proof-of-concept exploit and the recommended fix with code samples.
Over the next two days, the development team fixed the findings in waves and I tweaked PrescientFlow to add threat modeling and default ADRs to improve the developer inputs. Each time they said “done,” I told PAI: “Retest.” Four rounds total. PAI tracked what was fixed, what was partially fixed, what regressed, and what was new. By the end: 20 of 20 findings resolved, overall risk dropped from CRITICAL to LOW.
This is the kind of assessment that would cost $5,000-15,000 from a consultancy. PAI did four rounds of it in two days. And because I’m both the product owner and the security assessor, the feedback loop was measured in hours, not weeks.
March 15: The AIGP Quiz System
I told PAI: “The email-based quiz is clunky. I want a web page I can open from a notification link, answer questions with radio buttons and checkboxes, get AI-graded results inline, and browse my history by date or topic.”
PAI designed the whole thing in one session: SQLite database with FTS5 full-text search, Hono routes mounted on the existing Status Board server, dark-themed UI matching the notification system, inline AI grading with a spinner, and a history page with search and competency filtering. It migrated the 6 existing email-based quizzes into the new database. It rewired the daily cron to generate quizzes into the database and send a notification link instead of an email.
The next morning I found a bug — multi-line scenario options were getting truncated to the first line. I told PAI; it traced the issue to a regex that didn’t cross newlines, fixed it, and restarted the server. Total time from “this is broken” to “this is fixed”: about 3 minutes.
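The underlying pitfall is general JavaScript regex behavior, sketched here with invented patterns rather than PAI’s actual one: `.` stops at a newline, while `[\s\S]` crosses it.

```typescript
// Invented illustration of the truncation bug: a pattern written for
// one-line options silently drops everything after the first newline.

const truncating = /Option A: (.+)/;                        // "." stops at "\n"
const multiline = /Option A: ([\s\S]+?)(?=\nOption B:|$)/;  // spans lines

function extractOptionA(text: string, pattern: RegExp): string | null {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}
```

The lazy `[\s\S]+?` plus a lookahead for the next option keeps the capture bounded without losing the wrapped lines.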
No SaaS subscription. No “upgrade for more features.” Just exactly what I needed to study for the AIGP exam, built to my specifications, running on my own hardware. That’s what happens when your AI partner can go from description to deployed web application in a single conversation.
The same day, I asked PAI to prepare a briefing document for our Chief of Staff working session. It pulled live data from the Todoist API (640 tasks, 27 projects), analyzed the bookmark topic distribution from the database, cross-referenced the discovery interview findings, and produced a 6-section briefing with concrete proposals for project restructuring, label taxonomy, and daily standup design. It converted the briefing to DOCX, uploaded it to Google Drive, and attached it to the calendar event.
Meeting prep for a working session with my AI. Which sounds absurd until you realize that “meeting prep” is exactly what a Chief of Staff does.
What We’ve Actually Built
Let me be concrete. In roughly three weeks, working as a human-AI partnership:
9 tool directories under active development (meetbot, protonmail, aigp, messages, calendar, bookmarker, todoist, triage, health)
20 security findings identified and remediated on my own product across 4 rounds of testing
446 bookmarks AI-categorized and migrated out of my task system into Google Sheets
640 tasks audited down to 224 actionable items with a restructuring plan
10+ research reports produced by parallel multi-agent research teams
Zero-trust authentication designed for MeetBot chat commands
Remote command execution from any device via email with permission flow
A triage gate that pre-screens PAI’s own inputs for prompt injection
A web-based quiz system with AI grading and searchable history
A Chief of Staff roadmap with evidence-based trust gates (G0 through G4)
A formal discovery interview where the AI interviewed me about my own workflow
None of this was me writing TypeScript. All of it was me managing an AI that writes TypeScript. Describing problems, reviewing solutions, catching edge cases, giving feedback, extending trust incrementally.
I’m evolving from “person who uses AI tools” to “person who manages AI agents.” The skill set is different. It’s less about knowing how to code and more about knowing what to build, why to build it, the right questions to ask along the way, and when the output isn’t good enough. It’s about knowing when to let the AI run and when to intervene. When to say “do it” and when to say “explain first.”
The velocity is extraordinary. Things that would take me weeks or months to code myself get built in hours. But the quality is entirely dependent on how well I manage the process. Vague direction produces vague results. Precise direction with clear constraints and honest feedback produces production-ready systems.
What “AI Chief of Staff” Actually Means Day-to-Day
It doesn’t mean an autonomous agent running my life. Not yet. Maybe not ever, depending on what the prompt injection research produces over the next year.
It means I have a system that:
Processes my email intelligently so I’m not scanning 30 newsletters myself, and cannot delete anything
Manages my calendar invitations without a click, and cannot delete anything
Runs security assessments on my own products
Conducts structured research across multiple sources in parallel
Monitors the health of its own infrastructure and tells me when something breaks
Takes meeting notes and sends minutes to the right people
Maintains its own governance documentation
Screens its own inputs for prompt injection using two models to detect malicious content
And most importantly, it means I have a system that I’ve deliberately designed to earn more autonomy over time, with explicit boundaries, audit trails, and a violation ratchet that resets trust to zero if a boundary is crossed.
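The gate-plus-ratchet mechanics can be sketched as a tiny state machine. The G0 through G4 labels come from the roadmap above, but the promotion threshold and in-memory design are invented for illustration; the real system measures evidence from audit logs, per tool.

```typescript
// Hypothetical per-tool trust gate with a violation ratchet: promotion is
// earned through clean audited runs, and any boundary violation resets
// trust to zero.

const GATES = ["observe", "operate", "suggest", "act", "delegate"] as const;
type Gate = (typeof GATES)[number];

class TrustGate {
  private level = 0; // G0: observe only

  current(): Gate {
    return GATES[this.level];
  }

  // Evidence-based promotion: enough clean runs at the current level
  // earns the next gate. The threshold here is an invented placeholder.
  recordCleanRuns(count: number, threshold = 50): void {
    if (count >= threshold && this.level < GATES.length - 1) this.level++;
  }

  // The ratchet: trust is built in increments and revoked in an instant.
  recordViolation(): void {
    this.level = 0;
  }
}
```

The asymmetry is deliberate: climbing back from a violation takes the full evidence trail again, which is exactly how trust works with human reports too.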
I told PAI during the discovery interview: “I’m not sure I’m ready for a lot of proactive. But brainstorming ideas, that’s always a good thing.”
That’s still where we are. Reactive, bounded, building trust. The roadmap says Phase 2 (proactive operations) starts when Phase 0 and Phase 1 tools are stable. Phase 3 (autonomous back office) is a six-month horizon with a 90-day track record requirement. I’m not risk averse. I’m just acutely aware of the damage that can be done, the kind most people don’t think about when driven by FOMO.
I’m applying my AIGP risk analysis training to the system I built with AI. I’m pen-testing my security product with an AI security assessor. I’m studying prompt injection while building an agent that processes my email.
The contradictions aren’t bugs. They’re the feature. The tension between “this is incredibly useful” and “this could go very wrong” is exactly the tension that produces good engineering and strong businesses.
Conclusions: The Part I Didn’t Expect
I’ve been in security for over 20 years. I’ve built SOCs, architected zero-trust networks, translated risk for boards that couldn’t spell NIST. None of that prepared me for the specific weirdness of collaborating with an AI on building a system that the AI then operates.
The thing I didn’t expect: the management skills transfer almost perfectly. Setting clear objectives. Defining boundaries before granting authority. Reviewing work product. Giving specific, actionable feedback. Extending autonomy incrementally based on demonstrated competence. These are the same things I did as a VP managing security teams. The “employee” is different. The management principles are identical.
What’s new is the speed and the feedback loop. When I tell PAI “this approach is wrong, here’s why,” the correction is immediate and permanent. When I say “I don’t want CISO-level positioning in that recruiter research, I want Architect and Director,” it rewrites the entire deliverable in minutes. When I say “the stop-record button failed twice and the bot got stuck,” PAI investigates its own code, identifies a race condition, adds a duplicate command guard, fixes the recovery path, enhances the logging, and commits the change. All while I watch.
The same principles that make a good security architecture make a good AI collaboration. Least privilege. Defense in depth. Trust but verify. Assume breach.
I’m not done. The Todoist restructuring needs execution. The focus-time system needs integration. The recruiter pipeline needs follow-through. The prompt injection vulnerabilities need patching. The marketing strategy for RiskJuggler.ai needs a first draft. We need a CRM and P/L tracking to run the business.
But three weeks in, I have something I didn’t have before: an operational AI assistant that I understand, that I’ve tested, that I’ve documented, and that I trust — within explicit, evidence-based boundaries. And I have a new skill: managing an AI agent as a genuine collaborator, not a fancy autocomplete.
That’s not a tool. That’s the beginning of a working relationship. And I’m getting better at my side of it every day.
Steve Genders writes about cybersecurity, AI governance, and the messy intersection of the two at riskjuggler.info. He is currently studying for the IAPP AI Governance Professional (AIGP) certification while simultaneously using AI to build the study tools, and pen-testing his own AI security product with his own AI security assessor. The irony sustains him. PAI contributed to this post — and then assessed its own contribution for prompt injection risk, because that’s apparently what we do now.