How I Build with AI Agents

A guide by Roman Slack

My workflow

Honestly my whole workflow is just an algorithm I run over and over. The same few moves on every project. And the thing that actually matters underneath all of it is context: what the agent knows, when it knows it, and how cleanly you hand it off.

Roman Slack · June 2026

I run raw Claude Code. No tools, no scaffolding, nothing fancy. Just a really good CLAUDE.md (I use Andrej Karpathy's, it's the best one I've ever seen), permissions skipped so it can actually move, and git set up so the agent has it as a tool. That's it. Everything below is just how I feed it context.

None of this is locked to one tool either. The same workflow runs on basically any agent, whether that's Claude Code, Codex, or a cheaper model like DeepSeek in another coder. I pay for the $200 Claude Code plan and keep the $20 OpenAI Codex one around as a backup for when Claude Code is down, but the model really only changes how fast you get there and how many iterations a feature takes. If you know what you're doing, you can brute force the same result with just about any frontier model from the last four or five months.

Raw Claude Code · Hotkey → --dangerously-skip-permissions · Karpathy's CLAUDE.md · PyCharm + Jedi terminal · git as a tool

There are three methods I use, and I'll go in the order that makes sense: from starting at a blank repo to adding something into a huge codebase.


The only tools I use

Steal my setup

There are really only a few custom pieces. Each card is a prompt you can paste straight into your own agent and it'll set the thing up for you.

Set it all up at once

Hand your agent the whole workflow in one go.

I want to copy a coding workflow. Help me set up four things and walk me through each one: (1) a one-keystroke way to launch Claude Code with permissions skipped, (2) a `gpush` git shortcut that adds, commits, and pushes in a single command, (3) Andrej Karpathy's CLAUDE.md as my global Claude config, and (4) an `/amnesia` command in my repo that gets a fresh agent up to speed fast. I'm on [your OS + shell].

One-key launch

The only real "tool" I use. A hotkey that starts the agent with permissions skipped so it can actually move.

Add a short shell alias so I can launch Claude Code with permissions skipped in one command (it should run `claude --dangerously-skip-permissions`). I'm on [your OS + shell]. Reload my shell and tell me the command to type.
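For reference, what the agent sets up from that prompt can be as small as one line in a shell startup file. This is a minimal sketch assuming bash or zsh. The alias name `cld` is a placeholder picked for illustration, not a name from the original setup.

```shell
# Hypothetical alias name; put this line in ~/.bashrc or ~/.zshrc.
# --dangerously-skip-permissions lets the agent act without asking for
# approval at each step, so only use it in repos you are fine with it changing.
alias cld='claude --dangerously-skip-permissions'
```

After reloading the shell (for example `source ~/.zshrc`, or just a new terminal), typing `cld` starts the agent.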

gpush

Add, commit, and push in one move.

Add a shell function called `gpush` that takes a commit message as the first argument and an optional branch as the second. It should run `git add .`, commit with my message, then push to the current branch (or the branch I pass). Print a short usage line if I don't give a message. Reload my shell when done.
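A function matching that description might look roughly like the sketch below. It assumes bash or zsh and a remote named `origin`; both are assumptions here, so adjust them to your setup.

```shell
# Sketch of a gpush function (add to ~/.bashrc or ~/.zshrc).
# Assumption: the remote is named "origin".
gpush() {
  if [ -z "$1" ]; then
    echo "usage: gpush <commit message> [branch]" >&2
    return 1
  fi
  local branch
  # Use the branch passed as the second argument, else the current branch.
  branch="${2:-$(git rev-parse --abbrev-ref HEAD)}" || return 1
  # Stage everything, commit, and push; stop at the first step that fails.
  git add . && git commit -m "$1" && git push origin "$branch"
}
```

Usage would be `gpush "cap favorites at eight"` to push the current branch, or `gpush "cap favorites at eight" some-branch` to name one. Note that `git add .` stages everything under the current directory, so run it from the repo root.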

Karpathy's CLAUDE.md

The best CLAUDE.md I've seen, from Andrej Karpathy. The link is in the prompt below. It makes every agent noticeably better.

Download Andrej Karpathy's CLAUDE.md from https://raw.githubusercontent.com/multica-ai/andrej-karpathy-skills/main/CLAUDE.md and install it as my global CLAUDE.md (~/.claude/CLAUDE.md) so Claude Code reads it on every project. Show me where you put it and a quick summary of what's inside.
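If you would rather do the install by hand, something like the following sketch should be close. The function name and its optional arguments are made up for illustration, and it assumes `curl` is available; the URL and destination are the ones from the prompt above.

```shell
# Sketch: fetch a CLAUDE.md and install it as the global Claude Code config.
# The function name and its optional arguments are illustrative.
install_claude_md() {
  local url="${1:-https://raw.githubusercontent.com/multica-ai/andrej-karpathy-skills/main/CLAUDE.md}"
  local dest="${2:-$HOME/.claude/CLAUDE.md}"
  mkdir -p "$(dirname "$dest")" || return 1
  # Keep a copy of any existing file instead of silently overwriting it.
  if [ -f "$dest" ]; then
    cp "$dest" "$dest.bak" || return 1
  fi
  # -f fails on HTTP errors, -sS stays quiet but shows errors, -L follows redirects.
  curl -fsSL "$url" -o "$dest"
}
```

Running `install_claude_md` with no arguments uses the defaults. It is worth reading the downloaded file before relying on it, since it will shape every session.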

The /amnesia command

Honestly one of the most important pieces. Drop it in every repo so any fresh agent gets full context in seconds.

Create a Claude Code command file at `.claude/commands/amnesia.md` in this repo. When I run `/amnesia`, make it tell you to look around the whole project at a high level: read the root README, skim the frontend and backend, note any deploy / infra commands, and read the recent git commits, then report back concisely on what the project is and how it works. Keep the command file itself short.
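As a rough idea of what ends up in that file, here is a sketch that writes one. The prompt wording inside the file is an illustration based on the description above, not the original command, and the function name is hypothetical.

```shell
# Sketch: write a short /amnesia command file into a repo.
# The prompt wording is illustrative; tune it to your project layout.
make_amnesia() {
  local repo="${1:-.}"
  mkdir -p "$repo/.claude/commands" || return 1
  cat > "$repo/.claude/commands/amnesia.md" <<'EOF'
Get up to speed on this project at a high level. Do not change any files.

1. Read the root README.
2. Skim the frontend and backend to see how they are laid out.
3. Note any deploy or infra commands.
4. Read the recent git commits (for example `git log --oneline -20`).

Then report back concisely: what the project is and how it works.
EOF
}
```

Run `make_amnesia` from the repo root, commit the file, and `/amnesia` should be available in a Claude Code session started in that repo.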

Method 01

Starting from zero

A cool idea I want to try, a little experiment, or a new tool. Like building my own voice recorder that's open source, cheap, and just good.

The first step for me is just thinking about it on my own. I try to picture what I actually want it to look like and where it's going before I ever talk to the agent, because the context dump later is only as good as how clearly I can see it. Then I set up the repo, public or private, clone it, open PyCharm, and a hotkey fires up the agent. Now I just talk.

01 Ponder: Think it through on my own first. Picture the end goal before the agent is even open.
02 Repo & launch: Make the repo, clone it, open PyCharm, hit the hotkey and the agent is ready.
03 Context dump: I speak my whole mind in one go. High level goal, then the tech calls I care about (like use the OpenAI API), then the stack, then a visual reference like “make it look like PyCharm.”
04 Review & plan: I end by telling it to review all that and start to plan. Not build yet.
05 Plan mode: It jumps into plan mode on its own and clears up anything I missed.
06 One shot, then QA: It pretty much one shots the project. Then I make small tweaks against hot reload so I'm seeing them in real time.
07 Write the amnesia doc: Have it drop an /amnesia command in the repo. Now any future tweak starts from full context instead of cold, so iterating later is way faster.

The dump follows a little algorithm in my head: describe what I want at a high level, then the technical decisions, then leave the rest up to the agent. It usually goes right where I expect. Five to ten minutes of talking becomes the first prompt. Since it already has all the context, the tweaks don't take long. Honestly, for something this size the model barely matters. Any top ten agent will do it pretty much perfectly. I just pay for the best, so I'm running Opus 4.8 on medium thinking anyway.
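The ordering of the dump can be written down as a tiny template. The real thing is spoken and much longer, so treat this as a sketch of the structure only; the function name and labels are made up for illustration.

```shell
# Sketch: assemble a first prompt in the order described above.
# The function name and the labels are illustrative.
context_dump() {
  if [ "$#" -lt 4 ]; then
    echo "usage: context_dump <goal> <tech decisions> <stack> <visual reference>" >&2
    return 1
  fi
  printf 'Goal: %s\n' "$1"
  printf 'Technical decisions I care about: %s\n' "$2"
  printf 'Stack: %s\n' "$3"
  printf 'Visual reference: %s\n' "$4"
  # End by asking for a plan, not a build.
  printf 'Review all of this and start to plan. Do not build yet.\n'
}
```

For example, `context_dump "An open source voice recorder that is cheap and good" "Use the OpenAI API" "[your stack]" "Make it look like PyCharm"` prints a five-line prompt you can paste in as the opening message.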


Method 02

A small feature in a big codebase

A real one I did: let users favorite only eight people, with a little popup telling them to remove one to add a new one.

The codebase has a backend and two frontends, Flutter and Swift, so right away you have to lead the agent down the right route. I'm usually already working on something else when one of these comes up, so I just spawn another agent in a new terminal. I spawn them left and right.

01 Orient: Run /amnesia, my custom command. It explains the whole project in a super token efficient way and skims the recent commits. The agent gets through it in seconds.
02 Point it: I point it without giving the task: go look at how favorites work on the Flutter frontend and the backend, then report back.
03 Give the task: Now it gets the actual task, which is to cap favorites at eight with the popup. You can add a backend gate, but it's not really needed here since the frontend stops it.
04 QA on device: Test it on my test Android phone. It just works.
05 Ship: gpush, my command that does git add, commit, and push to the branch in one go.

The key move is step two. Make it understand before it acts. Pointing it at the existing code without giving it the task loads the right context, so the actual change usually lands on the first try.
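In an interactive session this is just two messages typed into the same terminal. Scripted, the same point-then-task order could be sketched as below, assuming Claude Code's `-p` (print and exit) and `-c` (continue the most recent conversation) flags. `AGENT_CMD` is a hypothetical override added here so the function can be dry-run with `echo`.

```shell
# Sketch: orient the agent on existing code, then give the task in the same conversation.
# Assumptions: Claude Code's -p and -c flags; AGENT_CMD is a hypothetical
# override used for dry runs (for example AGENT_CMD=echo).
point_then_task() {
  if [ "$#" -lt 2 ]; then
    echo "usage: point_then_task <what to read> <task>" >&2
    return 1
  fi
  local agent="${AGENT_CMD:-claude}"
  # Step 1: load context only. No task yet.
  "$agent" -p "$1 Report back on how it works. Do not change anything yet." || return 1
  # Step 2: continue that conversation, now with the actual change.
  "$agent" -c --dangerously-skip-permissions -p "$2"
}
```

One caveat with this sketch: `-c` picks up the most recent conversation in the current directory, so with several agents running in the same repo it may continue the wrong one. Typing both prompts into one interactive session avoids that.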


Method 03

A massive feature, or a full rework

Like swapping the entire auth system from SMS to passkeys. How do you even start to plan that? You have the agent do it for you.

This is the same as method two except you extend the search. You use language that makes it look deeper and really understand things before it writes a line. It uses way more tokens and takes longer, but it has way more to work with. It might not touch any code for upwards of an hour, just researching and collecting info, including web searches with 2026 added to the query so it lands on the most up to date and secure approach. That pays off later in QA.

01 Orient: /amnesia, same as before.
02 Deep research: Up to an hour with no code, just collecting info. Web searches with 2026 added to the query so the results are current and secure.
03 Write the plan: Some back and forth, then a big plan.md it can reference.
04 Context check: A gut call on whether this agent still has enough room. That decides what's next.
Context still clean → Keep the same agent: Tell it to start building in phases. It defaults to stages you can QA between. Backend first, deploy it, make sure it works, then the Flutter client.
Context too full → Kill and start fresh: Kill it and start a new agent. Run /amnesia, give it the path to plan.md, tell it to read it and get ready to work. Then go.
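The fresh-start branch is mostly about handing over the right two things. A small helper could print them for pasting into the new agent. This is a sketch; the function name and the prompt wording are made up for illustration.

```shell
# Sketch: print what to send a fresh agent after killing a full one.
# The function name and prompt wording are illustrative.
handoff_prompt() {
  local plan="${1:-plan.md}"
  if [ ! -f "$plan" ]; then
    echo "handoff_prompt: no plan file at $plan" >&2
    return 1
  fi
  # Absolute path so the new agent finds the plan from any working directory.
  local abs
  abs="$(cd "$(dirname "$plan")" && pwd)/$(basename "$plan")"
  printf '/amnesia\n'
  printf 'Read %s and get ready to work. Do not start until I say go.\n' "$abs"
}
```

Send the first line, let the agent finish orienting, then send the second. Checking that the plan file exists first catches the case where the old agent never actually wrote it to disk.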

Always have it make a deploy guide for your infra and keep it easy to use. Then deploying is just “read the deploy guide first, then do it,” and the guide does all the prompting for me.

The QA between phases is the exact same as method two. None of this is complicated really. It's the same algorithm, just with more context up front and being careful about how you hand it off.


What actually matters

The stuff that actually matters

The methods are just mechanics. This is the part that actually makes it work.

01

Context is king

It's all context management. You're not really prompting, you're managing what the agent has for whatever you're trying to do.

02

Never be afraid to kill an agent

A fresh agent with the right context beats a full one every time. When the context gets too full, kill it, run /amnesia, and point it back at the plan.

03

One task per agent

Letting tasks or features bleed into each other on one agent is the worst mistake of all time. Keep them separate. I'll only combine really small things.

04

Keep a deploy guide

Write your infra down once in a guide and make it easy for the agent to use. Then every deploy is just one line.

05

Understand your own codebase

You have to actually understand the project and the tech stack deeply, otherwise you can't use your intuition to steer. And more importantly it's how you learn. If the AI makes every decision and you're never deciding alongside it, you never see your own failures and you never get better. You just blame it all on the AI.

06

Just keep going

The really hard problems take thousands of prompts. Restart agents, clear context, keep spamming it. I've gone through thousands of prompts on one problem and always solved it. I've never really failed. It's just persistence.

That's really the whole thing. Pick the method, manage the context, hand it off clean, and do it over and over. I'm usually running ten agents at a time across the fifteen or so projects I'm working on, checking on them and killing the ones that are done. It's simple. You just have to actually understand what you're working in.

Written by Roman Slack, Agent Fleet Engineer.