Selected work
Public edges of AI systems that shipped.
I work where frontier AI ideas become product surfaces: agentic workflows, eval gates, review paths, and recovery systems that have to hold up outside a demo. Some diagrams are simplified because the real data and implementation details are private.
Microsoft Copilot Mobile
Multi-agent development system
I created and shipped a production multi-agent development system for Microsoft Copilot Mobile. It gives PMs, designers, and engineers a safer path from product intent to mobile changes: agents do the mechanical work, evals catch regressions, and review stays close to the diff.
- Role: Original builder, active developer, and product owner for the workflow
- Signal: PM and Design teammates used it directly to create and merge a meaningful volume of production PRs with engineering review
- Focus: Agent workflows, eval gates, review capacity, source-of-truth checks, and recovery paths
Atlas
Private AI workflow and eval lab
Atlas is my private system for studying how agents behave when they have real tools, local context, scheduled jobs, memory, and sensitive data boundaries. The public page shows the system shape without exposing private records or implementation details.
- Role: Creator, primary user, and evaluator
- Shape: CLI spine, agent layer, memory boundary, always-on jobs, eval harness, and replayable failures
- Learning: Reliable agents need tool-path discipline, observability, and recovery paths, not just better prompts
Open questions
Product questions frontier AI teams keep running into.
AI-native messaging
What changes when messages carry people, files, events, and emails as structured context with owners, permissions, and review state?
Agent product notes
Essays about agents, evals, product constraints, workflow observability, recovery paths, and the gap between demos and real work.