Atlas
Atlas is my private AI workflow system. I use it to explore how agents become useful in daily work: how they call tools, preserve context, recover from mistakes, and operate around sensitive personal data.
I do not publish Atlas source code or raw data. The useful public version is the system shape: what I tried, what broke, and what the failures taught me about AI product design.
The project started as CLI-first personal software and grew into a practical testbed for agentic workflows. The durable lesson has been that AI products are less about impressive one-off answers and more about boring reliability: source-of-truth discipline, safe writes, eval harnesses, recovery paths, and interfaces that make the right action easy for the agent.
Architecture
The public diagram below is intentionally simplified. It shows the setup without exposing private schemas, credentials, raw records, or operational secrets.
How the pieces fit
The CLI is the product's spine. Agents are steered toward calling commands instead of reasoning from memory or reaching into the database directly. That makes the system easier to test and gives the agent a smaller, more trustworthy surface.
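A minimal sketch of what that narrow command surface can look like, assuming a CLI built with Python's argparse. The program name `atlas`, the `task-add` verb, and its fields are illustrative, not Atlas's real interface:

```python
# Hypothetical sketch: expose one narrow verb to the agent instead of
# open-ended database access. All names here are illustrative.
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="atlas")
    sub = parser.add_subparsers(dest="command", required=True)

    # A single, auditable write path with a small schema.
    add = sub.add_parser("task-add", help="Append a task to the source of truth")
    add.add_argument("title")
    add.add_argument("--due", default=None)
    return parser


def run(argv: list[str]) -> dict:
    args = build_parser().parse_args(argv)
    # Structured output: easy for an agent to parse, easy for an eval to assert on.
    return {"command": args.command, "title": args.title, "due": args.due}


if __name__ == "__main__":
    print(json.dumps(run(sys.argv[1:])))
```

The point of the shape, not the specifics: every verb is small, validated, and emits structured output, so both the agent and the eval harness see the same contract.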
The Azure VM runs the always-on pieces: scheduled jobs, a daemon, health checks, and notifications. The VM is not interesting because it is complex. It is interesting because it forces product questions: what should run automatically, what should require confirmation, and what should still work when the AI layer is unavailable?
The eval harness exists because prompts alone were not enough. Atlas replays real failure shapes, checks source-of-truth behavior, and catches cases where the agent sounds right but uses the wrong path. Over time, the evals became as important as the agent itself.
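A minimal sketch of that kind of harness, assuming recorded cases pair a prompt with the tool call the agent should make. The `Case` fields are illustrative, not Atlas's schema; the check is on the path taken, not on whether the answer sounds right:

```python
# Hypothetical eval harness: replay recorded failure shapes and assert on
# behavior (which command was chosen), not on the wording of the reply.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    name: str
    prompt: str
    expected_command: str  # the tool call the agent *should* make


def run_evals(agent: Callable[[str], str], cases: list[Case]) -> list[str]:
    failures = []
    for case in cases:
        command = agent(case.prompt)
        if command != case.expected_command:
            failures.append(
                f"{case.name}: got {command!r}, want {case.expected_command!r}"
            )
    return failures
```

Each real-world failure becomes a permanent `Case`, so regressions in tool choice are caught mechanically instead of re-discovered in daily use.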
Product questions Atlas keeps raising
- How do you make a tool the path of least resistance for an agent?
- When should an AI ask for confirmation before writing data?
- How do you evaluate behavior that happens in files or side effects, not chat?
- What belongs in always-loaded context, and what belongs in a command?
- How do you design recovery when the AI is the thing that failed?
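One concrete answer to the confirmation question can be sketched as a write gate: destructive operations return a confirmation request instead of executing. The operation names and return strings are illustrative assumptions, not Atlas code:

```python
# Hedged sketch: destructive writes require an explicit confirmation flag,
# so an agent cannot delete or overwrite by accident. Names are illustrative.
DESTRUCTIVE = {"delete", "overwrite"}


def execute(op: str, payload: dict, confirmed: bool = False) -> str:
    if op in DESTRUCTIVE and not confirmed:
        # Refuse and explain, instead of performing the write.
        return f"CONFIRM_REQUIRED: rerun with confirmed=True to {op} {payload.get('id')}"
    return f"OK: {op} applied"
```

The gate lives in the tool, not in the prompt, so it holds even when the agent's reasoning is the thing that failed.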