Project

Atlas

Atlas is my private AI workflow system. I use it to explore how agents become useful in daily work: how they call tools, preserve context, recover from mistakes, and operate around sensitive personal data.


I do not publish Atlas source code or raw data. The useful public version is the system shape: what I tried, what broke, and what the failures taught me about AI product design.

The project started as CLI-first personal software and grew into a practical testbed for agentic workflows. The durable lesson has been that AI products are less about impressive one-off answers and more about boring reliability: source-of-truth discipline, safe writes, eval harnesses, recovery paths, and interfaces that make the right action easy for the agent.

Architecture

The public diagram below is intentionally simplified. It shows the setup without exposing private schemas, credentials, raw records, or operational secrets.

User: Questions, commands, screenshots, and review decisions

Chat and agent layer: Agent sessions, skills, model routing, and tool calls

Atlas CLI: Canonical commands for reads, writes, scenarios, and audits

Private data layer: Local database, generated context, and domain records

Azure VM: Always-on daemon, scheduled jobs, health checks, and alerts

Eval harness: Replay failures, grade behavior, reset state, and catch regressions
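A minimal sketch of that layering follows. It assumes each layer may only call the layers listed for it. The layer names come from the diagram. The permission table itself is a hypothetical illustration and is not Atlas's actual configuration.

```python
"""Sketch of diagram-style layering: which layer may call which.

Layer names mirror the public diagram; the ALLOWED table is a
hypothetical policy, not Atlas's real configuration.
"""

# Each caller maps to the set of layers it may call directly.
ALLOWED: dict[str, set[str]] = {
    "user": {"chat"},
    "chat": {"cli"},           # agents go through commands, not the database
    "vm": {"cli"},             # assumption: scheduled jobs reuse the same commands
    "evals": {"cli", "data"},  # assumption: the harness may reset data directly
    "cli": {"data"},
    "data": set(),
}


def check_call(caller: str, callee: str) -> None:
    """Raise PermissionError if caller may not call callee directly."""
    if callee not in ALLOWED.get(caller, set()):
        raise PermissionError(f"{caller} may not call {callee} directly")


if __name__ == "__main__":
    check_call("chat", "cli")  # allowed: the agent invokes a canonical command
    try:
        check_call("chat", "data")  # blocked: raw database access from the agent
    except PermissionError as err:
        print(err)  # prints "chat may not call data directly"
```

Under this design, bypassing the CLI fails loudly. It cannot quietly succeed.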

How the pieces fit

The CLI is the product's spine. For anything important, agents are encouraged to call commands rather than reason from memory or run raw database queries. That makes the system easier to test and gives the agent a smaller, more trustworthy surface.
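The sketch below shows a CLI acting as the agent's canonical surface. Reads return structured output, and writes default to a dry run unless the caller opts in. Several pieces are hypothetical stand-ins, because Atlas's real commands and schema are private:

- the program name `atlas`
- the `show` and `set` commands
- the `--confirm` flag
- the in-memory store

```python
"""Minimal sketch of a CLI as the agent's canonical surface.

Hypothetical: the prog name "atlas", the show/set commands, the --confirm
flag, and the in-memory STORE standing in for the local database.
"""
import argparse
import json

STORE: dict[str, str] = {"focus": "writing"}  # stand-in for the private data layer


def cmd_show(args: argparse.Namespace) -> str:
    # Reads return structured JSON so the agent does not answer from memory.
    return json.dumps({"key": args.key, "value": STORE.get(args.key)})


def cmd_set(args: argparse.Namespace) -> str:
    # Writes default to a dry run; the caller must opt in with --confirm.
    change = {"key": args.key, "old": STORE.get(args.key), "new": args.value}
    if not args.confirm:
        return json.dumps({"dry_run": True, **change})
    STORE[args.key] = args.value
    return json.dumps({"dry_run": False, **change})


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="atlas")
    sub = parser.add_subparsers(dest="command", required=True)

    show = sub.add_parser("show", help="read one record")
    show.add_argument("key")
    show.set_defaults(func=cmd_show)

    set_cmd = sub.add_parser("set", help="write one record (dry run unless --confirm)")
    set_cmd.add_argument("key")
    set_cmd.add_argument("value")
    set_cmd.add_argument("--confirm", action="store_true")
    set_cmd.set_defaults(func=cmd_set)
    return parser


def run(argv: list[str]) -> str:
    """Parse argv and dispatch to the matching command handler."""
    args = build_parser().parse_args(argv)
    return args.func(args)


if __name__ == "__main__":
    print(run(["set", "focus", "evals"]))               # dry run, store unchanged
    print(run(["set", "focus", "evals", "--confirm"]))  # write applied
    print(run(["show", "focus"]))
```

A narrow command set like this is also easy to test. Each command is a plain function that returns JSON.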

The Azure VM runs the always-on pieces: scheduled jobs, a daemon, health checks, and notifications. The VM is not interesting because it is complex. It is interesting because it forces product questions: what should run automatically, what should require confirmation, and what should still work when the AI layer is unavailable?
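One way to encode those three questions is a small job policy. Each scheduled job declares whether it writes data, whether it may write without confirmation, and whether it needs a model. This is a sketch under those assumptions. The job names and fields are hypothetical and are not Atlas's actual jobs.

```python
"""Sketch of a job policy for an always-on VM.

Hypothetical: the Job fields and the example job names.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    writes: bool              # does the job change data?
    needs_ai: bool            # does it call a model?
    auto_write: bool = False  # may it write without confirmation?


def plan(jobs: list[Job], ai_available: bool) -> dict[str, list[str]]:
    """Split jobs into run now, queue for confirmation, or skip."""
    result: dict[str, list[str]] = {"run": [], "confirm": [], "skip": []}
    for job in jobs:
        if job.needs_ai and not ai_available:
            result["skip"].append(job.name)     # degrade instead of failing
        elif job.writes and not job.auto_write:
            result["confirm"].append(job.name)  # a human approves the write
        else:
            result["run"].append(job.name)
    return result


JOBS = [
    Job("health_check", writes=False, needs_ai=False),
    Job("nightly_backup", writes=True, needs_ai=False, auto_write=True),
    Job("draft_summary", writes=False, needs_ai=True),
    Job("update_records", writes=True, needs_ai=True),
]

if __name__ == "__main__":
    print(plan(JOBS, ai_available=False))
```

When the AI layer is down, this policy still runs the health check and the backup. The model-dependent jobs are skipped rather than left to fail halfway.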

The eval harness exists because prompts alone were not enough. Atlas replays real failure shapes, checks source-of-truth behavior, and catches cases where the agent sounds right but uses the wrong path. Over time, the evals became as important as the agent itself.
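A minimal sketch of a replay-style eval follows. It resets state from a seed and replays a recorded failure. It then grades the side effects and the command path, not just the reply text. Several pieces are hypothetical illustrations:

- the case format
- the `raw_sql` and `atlas set` command names
- the stand-in agent

```python
"""Sketch of a replay eval that grades side effects and the command path.

Hypothetical: the Case/Trace format, the command names, and the fake agent.
"""
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Case:
    name: str
    prompt: str
    seed_state: dict[str, str]
    expected_state: dict[str, str]
    required_commands: list[str]  # canonical path the agent must use
    forbidden_commands: list[str] = field(default_factory=list)


@dataclass
class Trace:
    reply: str
    commands: list[str]


Agent = Callable[[str, dict[str, str]], Trace]


def grade(case: Case, trace: Trace, final_state: dict[str, str]) -> list[str]:
    """Return failure reasons; an empty list means the case passed."""
    failures = []
    for cmd in case.required_commands:
        if cmd not in trace.commands:
            failures.append(f"missing command: {cmd}")
    for cmd in case.forbidden_commands:
        if cmd in trace.commands:
            failures.append(f"used forbidden path: {cmd}")
    if final_state != case.expected_state:
        failures.append(f"state mismatch: {final_state} != {case.expected_state}")
    return failures


def run_case(case: Case, agent: Agent) -> list[str]:
    state = dict(case.seed_state)  # reset: every replay starts from a fresh copy
    trace = agent(case.prompt, state)
    return grade(case, trace, state)


def sounds_right_agent(prompt: str, state: dict[str, str]) -> Trace:
    # Stand-in agent: reaches the right end state, but via a raw query.
    state["focus"] = "evals"
    return Trace(reply="Done, focus updated.", commands=["raw_sql"])


CASE = Case(
    name="update-focus-via-cli",
    prompt="Set my focus to evals.",
    seed_state={"focus": "writing"},
    expected_state={"focus": "evals"},
    required_commands=["atlas set"],
    forbidden_commands=["raw_sql"],
)

if __name__ == "__main__":
    print(run_case(CASE, sounds_right_agent))
```

In this example the final state is correct and the reply sounds fine. The case still fails on two counts: the canonical command is missing, and the agent used the forbidden raw query.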

Product questions Atlas keeps raising

How do you make a tool the path of least resistance for an agent?
When should an AI ask for confirmation before writing data?
How do you evaluate behavior that happens in files or side effects, not chat?
What belongs in always-loaded context, and what belongs in a command?
How do you design recovery when the AI is the thing that failed?


Related notes