Selected work

Public edges of AI systems that shipped.

I work where frontier AI ideas become product surfaces: agentic workflows, eval gates, review paths, and recovery systems that have to hold up outside a demo. Some diagrams are simplified because the real data and implementation details are private.

Production: Multi-agent development system used by PM and Design

Hands-on: I build the agent loops, evals, and repair paths myself

Research taste: Atlas turns agent failures into testable product questions


Abstract mobile AI development workflow board with phones, agent steps, and review paths.

Concrete signal: PM and Design teammates used the system directly to create and merge a meaningful volume of production PRs with engineering review, enough that review capacity became the next bottleneck.

Microsoft Copilot Mobile

Multi-agent development system

I created and shipped a production multi-agent development system for Microsoft Copilot Mobile. It gives PMs, designers, and engineers a safer path from product intent to mobile changes: agents do the mechanical work, evals catch regressions, and review stays close to the diff.

Role: Original builder, active developer, and product owner for the workflow
Signal: PM and Design teammates used it directly to create and merge a meaningful volume of production PRs with engineering review
Focus: Agent workflows, eval gates, review capacity, source-of-truth checks, and recovery paths


Abstract mobile AI development workflow board with phone screens, review paths, and quality gates.

Atlas

Private AI workflow and eval lab

Atlas is my private system for studying how agents behave when they have real tools, local context, scheduled jobs, memory, and sensitive data boundaries. The public page shows the system shape without exposing private records or implementation details.

Role: Creator, primary user, and evaluator
Shape: CLI spine, agent layer, memory boundary, always-on jobs, eval harness, and replayable failures
Learning: Reliable agents need tool-path discipline, observability, and recovery paths, not just better prompts


Abstract private AI workflow system map with a command-line center, eval loops, data boundary, and protected vault.

Open questions

Product questions frontier AI teams keep running into.

Working question

AI-native messaging

What changes when messages carry people, files, events, and emails as structured context with owners, permissions, and review state?
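
One way to make the question concrete is a minimal sketch of a message whose attachments are typed context items instead of loose text. This is an illustration under stated assumptions, not a description of any shipped system: the names `ContextItem`, `Message`, `ReviewState`, and `visible_context`, and the rule that only the owner and listed readers can see an item, are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ReviewState(Enum):
    """Hypothetical review states for a piece of attached context."""
    UNREVIEWED = "unreviewed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class ContextItem:
    """One structured attachment: a person, file, event, or email."""
    kind: str                 # assumed values: "person", "file", "event", "email"
    ref: str                  # opaque identifier for the underlying object
    owner: str                # who is accountable for this item
    readers: frozenset[str]   # who else is allowed to see it
    review_state: ReviewState = ReviewState.UNREVIEWED


@dataclass
class Message:
    """A message that carries structured context alongside its body."""
    sender: str
    body: str
    context: list[ContextItem] = field(default_factory=list)

    def visible_context(self, viewer: str) -> list[ContextItem]:
        # Assumed permission rule: the owner and explicit readers can see an item.
        return [
            item for item in self.context
            if viewer == item.owner or viewer in item.readers
        ]
```

Under this sketch, the same message can show different context to different viewers, and an agent reading it can tell which items have been reviewed before acting on them.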

Notes

Agent product notes

Essays about agents, evals, product constraints, workflow observability, recovery paths, and the gap between demos and real work.