Daniel Chu / notes from real workflows
Notes from building AI tools that have to survive real workflows.
I write about the awkward gap between AI demos and work that has users, deadlines, data, quality bars, support paths, and people who need the system to behave.
Some notes are polished. Some are field observations from things I am still trying to understand: agents, tools, workflows, evals, recovery paths, and product judgment.
Current focus: multi-agent development systems, workflow evals, and recovery paths for AI actions.
Threads
Problems I am working through.
Agent tool use
When an agent says it used a tool, how do we know?
The prose can sound grounded even when execution did not happen.
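One way I think about this: trust the execution log, not the transcript. A minimal sketch, with all names (`ToolLogger`, `run`, `verify`) as illustrative assumptions rather than any real agent framework:

```python
# Hypothetical sketch: check an agent's tool-use claims against an
# execution log, instead of trusting the prose in its transcript.
from dataclasses import dataclass, field


@dataclass
class ToolLogger:
    """Records a tool name only when the tool actually runs."""
    calls: list = field(default_factory=list)

    def run(self, name, fn, *args, **kwargs):
        result = fn(*args, **kwargs)
        self.calls.append(name)  # appended only after real execution
        return result

    def verify(self, claimed: list) -> list:
        """Return tool names the transcript claims but the log never saw."""
        return [c for c in claimed if c not in self.calls]


log = ToolLogger()
log.run("search", lambda q: ["hit"], "query")

# The agent's prose claims it also wrote a file; the log disagrees.
print(log.verify(["search", "write_file"]))  # -> ['write_file']
```

The point is not the wrapper itself but where the evidence lives: the claim is checked against something the model cannot narrate into existence.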
Generalized tools
Why does giving agents more power sometimes make them less reliable?
How narrow should a tool be before it stops being useful?
Workflow evals
What would an eval look like if it tested the real workflow?
Many evals test chat, while product behavior lives in files and side effects.
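A rough sketch of what I mean, assuming a task whose real deliverable is a file on disk. The task shape and file layout here (`report.json`, a `summary` key) are assumptions for illustration, not a fixed harness:

```python
# Minimal workflow eval: instead of grading the chat reply, assert on
# the side effects the task was supposed to produce.
import json
import tempfile
from pathlib import Path


def eval_workflow(agent_task) -> dict:
    workdir = Path(tempfile.mkdtemp())
    agent_task(workdir)  # the agent runs against a real directory

    report = workdir / "report.json"
    checks = {"report_exists": report.exists(),
              "report_parses": False,
              "has_summary": False}
    if checks["report_exists"]:
        try:
            data = json.loads(report.read_text())
            checks["report_parses"] = True
            checks["has_summary"] = "summary" in data
        except json.JSONDecodeError:
            pass
    return checks


# A stand-in "agent" that actually produces the artifact.
def fake_agent(workdir: Path):
    (workdir / "report.json").write_text(json.dumps({"summary": "ok"}))


print(eval_workflow(fake_agent))  # every check should be True
```

An agent that writes a convincing chat message but never touches the directory fails every check, which is exactly the gap this thread is about.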
Recovery paths
If an AI system writes data, what should undo and audit look like?
How much recovery should be automatic before it becomes another risk?
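The smallest version of this I keep coming back to: every AI-initiated write goes through a journal that records the prior value, so any action can be audited and reverted. The store and journal shapes below are assumptions, not a specific product:

```python
# Hedged sketch: a key-value store where writes are journaled with the
# old value, and undo is itself a journaled action (the trail stays
# append-only so the audit history survives the recovery).
class JournaledStore:
    def __init__(self):
        self.data = {}
        self.journal = []  # entries: (key, previous_value, actor)

    def write(self, key, value, actor="agent"):
        self.journal.append((key, self.data.get(key), actor))
        self.data[key] = value

    def undo(self, journal_index):
        """Revert the write recorded at journal_index."""
        key, old, _actor = self.journal[journal_index]
        self.journal.append((key, self.data.get(key), "undo"))
        if old is None:
            self.data.pop(key, None)  # key did not exist before the write
        else:
            self.data[key] = old


store = JournaledStore()
store.write("status", "open")
store.write("status", "closed")
store.undo(1)  # revert the second write
print(store.data["status"])  # -> 'open'
```

Note that `undo` appends rather than pops: the recovery action leaves its own audit entry, which is one answer to "how much recovery before it becomes another risk."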
AI-native messaging
What if people, files, events, and emails were structured context instead of attachments?
The hard part is not the entity card. It is deciding who owns the context.
Field notes
Essays and drafts from building, breaking, and thinking in public.
AI Features Are Not the Same as AI Leverage
Observation: When an agent says it used a tool, that is not evidence
Failure mode: AI agents debug symptoms before systems
Experiment: When more powerful tools make agents worse
Evaluation: Evals should test the workflow, not the demo
Design note: If an AI can write data, it needs a recovery path
Open questions
These are not conclusions. They are questions I am still working through.
- What should an AI system refuse to automate?
- When does flexibility become unreliability?
- How should agents explain uncertainty without creating more work?
- What does a good recovery path look like for AI-generated actions?
- What kinds of constraints make agents more dependable?
- How do you evaluate work when the task itself is ambiguous?
Artifacts
Small diagrams, sketches, and work-in-progress fragments.
Compare notes