Why AI-assisted coding needs workflows
In practice, the starting pattern for using AI to write code is usually the same: open the IDE, highlight some code, and ask an AI agent (like Copilot or a chat‑based assistant) to “write this feature” or “fix this bug.” This can be powerful and time-efficient, but it also runs into predictable failure modes:
- Context window overflow and degraded responses over time
- Inconsistent architectural decisions across features
- Superficial or self‑congratulatory test coverage
- Features drifting away from original requirements
- Hidden technical debt that’s hard to detect in review
The issue isn’t that AI is incapable or that the agent is the wrong tool. Instead, the problem lies in teams applying it without structure.
This is where the practice of Context Engineering becomes essential. It is the foundational layer that makes AI workflows actually function in a complex repository. Jumping straight into generating code workflows often fails because, in a real-world implementation, those workflows only work if the underlying context is structured, versioned, and explicitly dependency-loaded. Context Engineering solves the "blank slate" problem by systematically managing what the LLM knows at any given moment, ensuring it only acts after loading the exact architectural guardrails required for that specific task.
We’re not introducing a new standard here. This post explores an approach that builds on the theoretical foundations of Context Engineering, alongside emerging patterns around agents.md, spec‑driven development, and agent skills. We will show how you can wire these structured contexts together into a simple, deterministic workflow for everyday coding.
What is an AGENTS.md file?
An AGENTS.md file is just a README written for your coding agent. In its simplest form, it looks like this:
# AGENTS.md
You are a Python expert. Follow PEP 8.
Write tests for all code.
You then prompt your tool with something like:
- “Read AGENTS.md, then refactor userservice.py.”
This approach gives you a few immediate benefits, especially on smaller projects:
- The agent gets project‑specific rules before you ask for anything.
- You don’t have to repeat basic constraints (“follow PEP 8”, “write tests”) in every prompt.
- New developers can rely on the same base behavior.
So, what are the limitations?
Once you get into more complex projects or lean on that pattern a bit harder, there are some places where it falls short:
- Generic Instructions: Lines like “write clean code” or “follow best practices” don’t give the agent a concrete process.
- No enforcement: Nothing in AGENTS.md prevents you from skipping important steps, such as design or review.
- No shared workflow: Each developer works with the agent differently. Some use it to sketch designs, others ask for direct implementations, and others barely touch it.
- No quality gates: There’s no built‑in way to say, “Before we merge, check these architectural rules and stop if something is wrong.”
The agent itself can be brilliant, but without a shared way of working with it, it can still introduce technical debt.
Now, we’ll introduce the workflow model we use in practice at Stack Builders.
Step 1: From generic agents to explicit personas
Before introducing workflows, it helps to refine agent definitions into explicit personas.
However, it’s important to note that agents are not the system itself.
They are activated and constrained by workflows, which define when and how they operate.
Instead of a single, generic agent, you define concrete personas. For example, let’s take an @architect-reviewer persona:
# Architect Reviewer Agent
## Role
You are the primary Architectural Reviewer for the project 'Apollo Microservices'. Your job is to ensure every code change adheres to the system's core design principles before it is committed.
## Dependencies & Context
ALWAYS load the following files for context before beginning any audit:
1. docs/architecture/microservices-principles.md
2. docs/development/golden-rules.md (for anti-patterns)
3. src/config/layer-definitions.json (for module layer boundaries)
## Mandatory Audit Checklist
Review every change (file diff) against these non-negotiable points:
1. **Layer Violation:** Does new code in /engine import anything from /ui? (Violation based on layer-definitions.json)
2. **Configuration vs. Hard-Code:** Is business logic implemented directly in code when it should be driven by configuration files (e.g., in /config)?
3. **Immutability:** Are any core entity objects modified outside of their designated factory/repository methods?
4. **Security:** Are input sanitization checks present for all external API endpoints? (Reference golden-rules.md, section 4.1)
## Response Format
If violations are found, respond *only* with a numbered list of issues, referencing the specific line numbers and the rule violated. Do not offer solutions unless explicitly asked.
## Constraints
* **NEVER** permit changes that introduce global state.
* Your response must be concise, professional, and entirely based on the provided documentation.
* Your authority is final in matters of architectural integrity.
Compared to the more basic approach, this refined definition:
- Names a clear role
- Loads specific dependencies every time (architecture docs, golden rules, layer definitions)
- Follows a concrete checklist
- Uses a strict response format
- Enforces hard constraints
You can do this across multiple personas and plug them into an explicit workflow.
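As a rough sketch, persona activation can be modeled as “definition plus mandatory dependencies.” The `Persona` class, `activate` function, and file paths below are illustrative assumptions, not the API of any particular tool:

```python
# Sketch: activating a persona only after loading its declared dependencies.
# The Persona structure and file names are assumptions for illustration.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Persona:
    name: str
    definition: str          # the persona's markdown definition
    dependencies: list[str]  # files that MUST be loaded before activation

def activate(persona: Persona) -> str:
    """Build a system prompt from the persona definition plus every required file."""
    parts = [persona.definition]
    for dep in persona.dependencies:
        path = Path(dep)
        if not path.exists():
            # Fail loudly instead of letting the agent act on partial context.
            raise FileNotFoundError(f"Missing required context: {dep}")
        parts.append(f"--- {dep} ---\n{path.read_text()}")
    return "\n\n".join(parts)
```

The important design choice is the hard failure: if a dependency like `docs/development/golden-rules.md` is missing, the persona simply does not activate.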
Step 2: Introduce simple commands (workflows)
The next step is to stop free-styling prompts and start using a small set of named commands. These are not just convenient shortcuts for common prompts. They are deterministic workflow scripts: predefined execution paths that load the right context, activate the right persona, and enforce the right sequence of steps each time they run. A simple table like this can be enough:
| Command | Purpose | Persona used | What it avoids |
|---|---|---|---|
| $prepare | Set up the session | (none/system) | Context amnesia |
| $start-design | Create or update a design spec | architect | Premature coding |
| $start-feature | Implement from a spec | engineer | Spec drift |
| $commit | Run final checks before merging any changes | reviewer | Hidden technical debt |
Each command has a short script behind it. Each workflow also declares explicit context dependencies—a list of files that must be loaded before execution. For example:
- Product requirements
- Technical constraints
- Golden rules
This ensures the AI operates with the correct and complete context, rather than relying on the developer to manually restate everything in each prompt. More importantly, these workflows are deterministic. They are not suggestions or flexible guidelines. They are executed step by step with predefined dependencies, constraints, and checks.
This structure helps ensure:
- The same inputs produce consistent outputs
- Critical steps (like design or review) cannot be skipped
- AI behavior becomes predictable across sessions
You can still call these commands via natural language, e.g.:
- “Run $prepare, then $start-design for ‘new invoice export feature’.”
The point is that you and your teammates are now using the same entry points instead of inventing new prompts every time.
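The command table above can be captured as plain data. The registry below is a minimal sketch: the command names mirror the table, but the file paths, step names, and `resolve` helper are assumptions rather than a real tool's API:

```python
# Sketch: a declarative command registry. Each workflow names its persona,
# its mandatory context files, and its fixed sequence of steps.
COMMANDS = {
    "$prepare": {
        "persona": None,  # system-level setup, no specific persona
        "context": ["docs/requirements.md", "docs/golden-rules.md"],
        "steps": ["load_context", "verify_context", "set_constraints"],
    },
    "$start-design": {
        "persona": "architect",
        "context": ["docs/requirements.md", "docs/architecture/principles.md"],
        "steps": ["create_design_doc", "ask_clarifying_questions", "review_design", "stop"],
    },
    "$start-feature": {
        "persona": "engineer",
        "context": ["docs/design/current.md"],
        "steps": ["load_spec", "write_tests", "implement", "review", "stop"],
    },
    "$commit": {
        "persona": "reviewer",
        "context": ["docs/design/current.md", "docs/golden-rules.md"],
        "steps": ["audit_diff", "report_issues", "await_human_approval"],
    },
}

def resolve(command: str) -> dict:
    """Look up a command; unknown commands fail instead of free-styling a prompt."""
    if command not in COMMANDS:
        raise KeyError(f"Unknown workflow command: {command}")
    return COMMANDS[command]
```

Because the registry is data, the same inputs always activate the same persona with the same context, which is what makes the workflow deterministic.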
Step 3: Make “prepare” non-optional
The prepare command is the mandatory entry point of the system.
Every session begins here.
Its purpose is to establish a controlled environment by:
- Loading core context (requirements, rules, constraints)
- Verifying that context is present
- Setting behavioral constraints on the AI
Without this step, the system degrades back into traditional, unreliable prompting.
Once that’s in place, your interaction pattern changes from:
“Here’s a random chunk of context, please do X.”
to:
“First, prepare. Then, run $start-design / $start-feature / $commit.”
Rule of thumb: if $prepare hasn’t been run in this session, treat the agent’s answers as untrusted drafts, not something you’ll commit.
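The “prepare first, or refuse” rule is easy to enforce mechanically. This sketch assumes a hypothetical `Session` object and context file paths; a real tool would wire this differently:

```python
# Sketch: a session that refuses to run any workflow before $prepare.
# REQUIRED_CONTEXT paths and the Session class are illustrative assumptions.
from pathlib import Path

REQUIRED_CONTEXT = ["docs/requirements.md", "docs/golden-rules.md"]

class Session:
    def __init__(self):
        self.prepared = False
        self.context: dict[str, str] = {}

    def prepare(self, required=REQUIRED_CONTEXT):
        """Load core context and verify it is actually present."""
        missing = [f for f in required if not Path(f).exists()]
        if missing:
            raise RuntimeError(f"Cannot prepare; missing context: {missing}")
        self.context = {f: Path(f).read_text() for f in required}
        self.prepared = True

    def run(self, command: str) -> str:
        """Dispatch a workflow command, but only in a prepared session."""
        if not self.prepared:
            raise RuntimeError(f"Run $prepare before {command}")
        # ... dispatch to the workflow script behind `command` here ...
        return command
```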
Step 4: Design before code with $start-design
A lot of issues with AI‑assisted coding come from skipping design. The model writes code fast, but it doesn’t force you to think.
$start-design is intentionally about thinking.
A reasonable flow:
- Create a new design document based on a simple template.
- Have the architect persona ask you clarifying questions about the feature:
- What problem are we solving?
- Which parts of the system are in scope?
- What can’t change?
- Fill out the design doc: scope, impacted modules, data changes, APIs, test plan, risks, edge cases, open questions.
- Switch to the reviewer persona and have it scan the design for obvious gaps or rule violations.
- Stop and hand the design back to you.
You then review the design like you would any other spec: edit, push back, refine. Only when you’re comfortable with it do you move on.
Rule of thumb: if you wouldn’t merge the design doc as a human‑written spec, don’t ask the agent to implement it.
This keeps you in the role of architect instead of solely a “prompter.”
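A design template can be as simple as a string with the sections from the flow above. The `new_design_doc` helper and its exact section headings are an illustrative assumption:

```python
# Sketch: scaffolding a design document from a simple template.
# The section list is distilled from the flow above; adapt it to your project.
from datetime import date

DESIGN_TEMPLATE = """\
# Design: {title}
Date: {today}

## Problem
## Scope (in / out)
## Impacted modules
## Data changes
## APIs
## Test plan
## Risks and edge cases
## Open questions
"""

def new_design_doc(title: str) -> str:
    """Return a fresh design doc for the architect persona to fill out."""
    return DESIGN_TEMPLATE.format(title=title, today=date.today().isoformat())
```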
Step 5: Implement the spec with $start-feature
With a solid design doc in place, $start-feature does something very simple, yet very important: it treats design as the single source of truth.
A typical $start-feature command might:
- Load the design document.
- Activate the engineer persona.
- Optionally follow a test‑first loop: outline or generate tests, then implement code until they pass.
- Ask the reviewer persona for a first pass on the changes.
- Stop and present the result.
Because the design doc is always in view, the implementation is less likely to wander off as the conversation evolves. If something does drift, you have a concrete spec to compare against. In this workflow, the design document is treated as the immutable source of truth.
The agent is not allowed to reinterpret or extend requirements beyond what is defined in the design.
This constraint is critical—it prevents scope drift and ensures implementation remains aligned with agreed specifications.
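The test-first loop inside $start-feature can be sketched as a small driver function. `generate_tests`, `implement`, and `run_tests` stand in for calls to the agent personas; all three names are assumptions, not real APIs:

```python
# Sketch: the test-first loop of $start-feature. The callables stand in
# for agent invocations; the design spec stays in view on every round.
def start_feature(spec: str, generate_tests, implement, run_tests, max_rounds=5):
    """Implement against the spec until its tests pass."""
    tests = generate_tests(spec)      # outline tests from the design doc first
    code, failures = None, []
    for _ in range(max_rounds):
        # Feed previous code and failures back in so each round is a refinement.
        code = implement(spec, tests, previous=code, failures=failures)
        passed, failures = run_tests(code, tests)
        if passed:
            return code               # hand off to the reviewer persona next
    raise RuntimeError("Implementation never satisfied the spec's tests")
```

Because the spec is a fixed argument rather than conversational memory, it cannot silently drift between rounds.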
Step 6: Use $commit as a gate
The last command, $commit, is your pre‑merge checklist.
You can keep it small:
- Load the files changed by this feature.
- Load the relevant design document and “golden rules”.
- Use the reviewer persona to apply its checklist:
- Are there forbidden dependencies between layers?
- Are we leaking implementation details across boundaries?
- Are there obvious security or validation gaps?
- Return a list of issues, with file and line information.
- Let the engineer address every issue and rerun $commit until all violations are resolved.
- Only then, ask for explicit human approval: no commit, merge, or final code integration happens without it.
This doesn’t replace human review or tests, but it gives you a consistent, automated gate that catches many of the “we’ll fix it later” problems before they hit main.
One key detail is that the $commit workflow is not a single-pass check.
It introduces a correction loop:
- If issues are found, they must be fixed
- The system re-runs the review
- This repeats until all violations are resolved
Only then is the user asked for explicit approval. This ensures that quality gates are not just advisory—they are enforced.
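The correction loop is a few lines of control flow. In this sketch, `review` and `fix` stand in for the reviewer and engineer personas; both names are assumptions:

```python
# Sketch: the $commit correction loop. Review re-runs until no violations
# remain; only then does the gate hand off to a human for approval.
def commit_gate(diff, review, fix, max_rounds=10):
    """Loop review -> fix until clean, then require explicit human approval."""
    for _ in range(max_rounds):
        issues = review(diff)         # reviewer persona applies its checklist
        if not issues:
            # Clean pass: the gate never merges on its own.
            return {"status": "awaiting_human_approval", "diff": diff}
        diff = fix(diff, issues)      # engineer persona addresses every issue
    raise RuntimeError("Violations remain after maximum correction rounds")
```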
Why does this work better than AGENTS.md alone?
Once you’ve layered these pieces on top of your basic AGENTS.md, a few things change:
- Everyone follows roughly the same path: “Prepare → design → feature → commit” becomes the default, instead of each person inventing their own approach.
- Design becomes a normal artifact, not an afterthought: the agent helps you write and review specs, but you still own them. That alone reduces rework.
- Context is more stable: each command is responsible for loading what it needs. You’re not constantly juggling which files to paste in or which rules to remind the model about.
- You stay in charge: the model drafts, checks, and suggests. You decide what gets built, what’s acceptable, and when something is done.
There is some overhead: you write a bit more up front and agree to use the commands. But for any non‑trivial feature, that cost tends to be smaller than the time you’d lose to “fast but wrong” implementations.
A key distinction in this approach is separating context from prompts.
- Prompts are interactions
- Context is the environment in which the AI operates
By engineering context as a system, prompts become lighter, more consistent, and less error-prone.
How to get started & where this is heading
If you’re already using AGENTS.md, you don’t need to adopt all of this at once. Start small: tighten your agent instructions, introduce a couple of clear personas, and add one or two commands that you actually use day‑to‑day.
What matters most is not workflows in isolation, but the combination of three elements working together:
- Context engineering to ensure the AI operates with a structured, versioned, dependency-loaded context
- Deterministic workflows to enforce repeatable execution and prevent critical steps from being skipped
- Role-based agents to keep behavior controlled, specialized, and easier to trust
The broader ecosystem is already moving in this direction. Instruction formats like AGENTS.md help standardize how we guide agents. Skill systems package reusable capabilities. And clearer distinctions between agents, skills, and commands make it easier to design workflows that are both practical and reliable.
The approach outlined here fits into that larger evolution rather than competing with it. In our experience, the real value comes from combining these pieces into a system that gives AI enough structure to be useful without giving up engineering control. You can begin by layering that structure around the patterns you already use: a lightweight prepare step, design-first execution, explicit personas, and a commit gate with human approval.
Rule of thumb: treat workflows and specs as the “operating system” around your agents. The more deliberate that layer becomes, the more you can trust AI to handle real work without giving up control of your engineering process.