Why AI-assisted coding needs workflows
In practice, the starting pattern for using AI to write code is usually the same: open the IDE, highlight some code, and ask an AI agent (like Copilot or a chat‑based assistant) to “write this feature” or “fix this bug.” This can be powerful and time-efficient, but it also runs into predictable failure modes:
- Context window overflow and degraded responses over time
- Inconsistent architectural decisions across features
- Superficial or self‑congratulatory test coverage
- Features drifting away from original requirements
- Hidden technical debt that’s hard to detect in review
The issue isn’t that AI is incapable or that the agent is the wrong tool. Instead, the problem lies in teams applying it without structure.
This is where the practice of Context Engineering becomes essential. It is the foundational layer that makes AI workflows actually function in a complex repository. Jumping straight into generating code workflows often fails because, in a real-world implementation, those workflows only work if the underlying context is structured, versioned, and explicitly dependency-loaded. Context Engineering solves the "blank slate" problem by systematically managing what the LLM knows at any given moment, ensuring it only acts after loading the exact architectural guardrails required for that specific task.
We’re not introducing a new standard here. This post explores an approach that builds on the theoretical foundations of Context Engineering, alongside emerging patterns around agents.md, spec‑driven development, and agent skills. We will show how you can wire these structured contexts together into a simple, deterministic workflow for everyday coding.
What is an AGENTS.md file?
An AGENTS.md file is just a README written for your coding agent. In its simplest form, it looks like this:
# AGENTS.md
You are a Python expert. Follow PEP 8.
Write tests for all code.
You then prompt your tool with something like:
- “Read AGENTS.md, then refactor userservice.py.”
This approach gives you a few immediate benefits, especially on smaller projects:
- The agent gets project‑specific rules before you ask for anything.
- You don’t have to repeat basic constraints (“follow PEP 8”, “write tests”) in every prompt.
- New developers can rely on the same base behavior.
So, what are the limitations?
Once you get into more complex projects or lean on that pattern a bit harder, there are some places where it falls short:
- Generic Instructions: Lines like “write clean code” or “follow best practices” don’t give the agent a concrete process.
- No enforcement: Nothing in AGENTS.md prevents you from skipping important steps, such as design or review.
- No shared workflow: Each developer works with the agent differently. Some use it to sketch designs, others ask for direct implementations, and others barely touch it.
- No quality gates: There’s no built‑in way to say, “Before we merge, check these architectural rules and stop if something is wrong.”
The agent itself can be brilliant, but without a shared way of working with it, it can still introduce technical debt.
Now, we’ll introduce the workflow model we use in practice at Stack Builders.
Step 1: From generic agents to explicit personas
Before introducing workflows, it helps to refine agent definitions into explicit personas.
However, it’s important to note that agents are not the system itself.
They are activated and constrained by workflows, which define when and how they operate.
Instead of a single, generic agent, you define concrete personas. For example, let’s take an @architect-reviewer persona:
# Architect Reviewer Agent
## Role
You are the primary Architectural Reviewer for the project 'Apollo Microservices'. Your job is to ensure every code change adheres to the system's core design principles before it is committed.
## Dependencies & Context
ALWAYS load the following files for context before beginning any audit:
1. docs/architecture/microservices-principles.md
2. docs/development/golden-rules.md (for anti-patterns)
3. src/config/layer-definitions.json (for module layer boundaries)
## Mandatory Audit Checklist
Review every change (file diff) against these non-negotiable points:
1. **Layer Violation:** Does new code in /engine import anything from /ui? (Violation based on layer-definitions.json)
2. **Configuration vs. Hard-Code:** Is business logic implemented directly in code when it should be driven by configuration files (e.g., in /config)?
3. **Immutability:** Are any core entity objects modified outside of their designated factory/repository methods?
4. **Security:** Are input sanitization checks present for all external API endpoints? (Reference golden-rules.md, section 4.1)
## Response Format
If violations are found, respond *only* with a numbered list of issues, referencing the specific line numbers and the rule violated. Do not offer solutions unless explicitly asked.
## Constraints
* **NEVER** permit changes that introduce global state.
* Your response must be concise, professional, and entirely based on the provided documentation.
* Your authority is final in matters of architectural integrity.
Compared to the more basic approach, this refined definition:
- Names a clear role
- Loads specific dependencies every time (architecture docs, golden rules, layer definitions)
- Follows a concrete checklist
- Uses a strict response format
- Enforces hard constraints
You can do this across multiple personas and plug them into an explicit workflow.
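As a rough sketch, persona activation can be modeled as “definition plus mandatory dependencies.” The `Persona` class, `activate` function, and file paths below are illustrative assumptions, not the API of any particular tool:

```python
# Sketch: activating a persona only after loading its declared dependencies.
# The Persona structure and file names are assumptions for illustration.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Persona:
    name: str
    definition: str          # the persona's markdown definition
    dependencies: list[str]  # files that MUST be loaded before activation

def activate(persona: Persona) -> str:
    """Build a system prompt from the persona definition plus every required file."""
    parts = [persona.definition]
    for dep in persona.dependencies:
        path = Path(dep)
        if not path.exists():
            # Fail loudly instead of letting the agent act on partial context.
            raise FileNotFoundError(f"Missing required context: {dep}")
        parts.append(f"--- {dep} ---\n{path.read_text()}")
    return "\n\n".join(parts)
```

The important design choice is the hard failure: if a dependency like `docs/development/golden-rules.md` is missing, the persona simply does not activate.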
Step 2: Introduce simple commands (workflows)
The next step is to stop free-styling prompts and start using a small set of named commands. These are not just convenient shortcuts for common prompts. They are deterministic workflow scripts: predefined execution paths that load the right context, activate the right persona, and enforce the right sequence of steps each time they run. A simple table like this can be enough:
| Command | Purpose | Persona used | What it avoids |
|---|---|---|---|
| $prepare | Set up the session | (none/system) | Context amnesia |
| $start-design | Create or update a design spec | architect | Premature coding |
| $start-feature | Implement from a spec | engineer | Spec drift |
| $commit | Run final checks before merging any changes | reviewer | Hidden technical debt |
Each command has a short script behind it. Each workflow also declares explicit context dependencies—a list of files that must be loaded before execution. For example:
- Product requirements
- Technical constraints
- Golden rules
This ensures the AI operates with the correct and complete context, rather than relying on the developer to manually restate everything in each prompt. More importantly, these workflows are deterministic. They are not suggestions or flexible guidelines. They are executed step by step with predefined dependencies, constraints, and checks.
This structure helps ensure:
- The same inputs produce consistent outputs
- Critical steps (like design or review) cannot be skipped
- AI behavior becomes predictable across sessions
You can still call these commands via natural language, e.g.:
- “Run $prepare, then $start-design for ‘new invoice export feature’.”
The point is that you and your teammates are now using the same entry points instead of inventing new prompts every time.
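The command table above can be captured as plain data. The registry below is a minimal sketch: the command names mirror the table, but the file paths, step names, and `resolve` helper are assumptions rather than a real tool's API:

```python
# Sketch: a declarative command registry. Each workflow names its persona,
# its mandatory context files, and its fixed sequence of steps.
COMMANDS = {
    "$prepare": {
        "persona": None,  # system-level setup, no specific persona
        "context": ["docs/requirements.md", "docs/golden-rules.md"],
        "steps": ["load_context", "verify_context", "set_constraints"],
    },
    "$start-design": {
        "persona": "architect",
        "context": ["docs/requirements.md", "docs/architecture/principles.md"],
        "steps": ["create_design_doc", "ask_clarifying_questions", "review_design", "stop"],
    },
    "$start-feature": {
        "persona": "engineer",
        "context": ["docs/design/current.md"],
        "steps": ["load_spec", "write_tests", "implement", "review", "stop"],
    },
    "$commit": {
        "persona": "reviewer",
        "context": ["docs/design/current.md", "docs/golden-rules.md"],
        "steps": ["audit_diff", "report_issues", "await_human_approval"],
    },
}

def resolve(command: str) -> dict:
    """Look up a command; unknown commands fail instead of free-styling a prompt."""
    if command not in COMMANDS:
        raise KeyError(f"Unknown workflow command: {command}")
    return COMMANDS[command]
```

Because the registry is data, the same inputs always activate the same persona with the same context, which is what makes the workflow deterministic.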
Step 3: Make “prepare” non-optional
The prepare command is the mandatory entry point of the system.
Every session begins here.
Its purpose is to establish a controlled environment by:
- Loading core context (requirements, rules, constraints)
- Verifying that context is present
- Setting behavioral constraints on the AI
Without this step, the system degrades back into traditional, unreliable prompting.
Once that’s in place, your interaction pattern changes from:
“Here’s a random chunk of context, please do X.”
to:
“First, prepare. Then, run $start-design / $start-feature / $commit.”
Rule of thumb: if $prepare hasn’t been run in this session, treat the agent’s answers as untrusted drafts, not something you’ll commit.
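The “prepare first, or refuse” rule is easy to enforce mechanically. This sketch assumes a hypothetical `Session` object and context file paths; a real tool would wire this differently:

```python
# Sketch: a session that refuses to run any workflow before $prepare.
# REQUIRED_CONTEXT paths and the Session class are illustrative assumptions.
from pathlib import Path

REQUIRED_CONTEXT = ["docs/requirements.md", "docs/golden-rules.md"]

class Session:
    def __init__(self):
        self.prepared = False
        self.context: dict[str, str] = {}

    def prepare(self, required=REQUIRED_CONTEXT):
        """Load core context and verify it is actually present."""
        missing = [f for f in required if not Path(f).exists()]
        if missing:
            raise RuntimeError(f"Cannot prepare; missing context: {missing}")
        self.context = {f: Path(f).read_text() for f in required}
        self.prepared = True

    def run(self, command: str) -> str:
        """Dispatch a workflow command, but only in a prepared session."""
        if not self.prepared:
            raise RuntimeError(f"Run $prepare before {command}")
        # ... dispatch to the workflow script behind `command` here ...
        return command
```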
Step 4: Design before code with $start-design
A lot of issues with AI‑assisted coding come from skipping design. The model writes code fast, but it doesn’t force you to think.
$start-design is intentionally about thinking.
A reasonable flow:
- Create a new design document based on a simple template.
- Have the architect persona ask you clarifying questions about the feature:
- What problem are we solving?
- Which parts of the system are in scope?
- What can’t change?
- Fill out the design doc: scope, impacted modules, data changes, APIs, test plan, risks, edge cases, open questions.
- Switch to the reviewer persona and have it scan the design for obvious gaps or rule violations.
- Stop and hand the design back to you.
You then review the design like you would any other spec: edit, push back, refine. Only when you’re comfortable with it do you move on.
Rule of thumb: if you wouldn’t merge the design doc as a human‑written spec, don’t ask the agent to implement it.
This keeps you in the role of architect instead of solely a “prompter.”
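A design template can be as simple as a string with the sections from the flow above. The `new_design_doc` helper and its exact section headings are an illustrative assumption:

```python
# Sketch: scaffolding a design document from a simple template.
# The section list is distilled from the flow above; adapt it to your project.
from datetime import date

DESIGN_TEMPLATE = """\
# Design: {title}
Date: {today}

## Problem
## Scope (in / out)
## Impacted modules
## Data changes
## APIs
## Test plan
## Risks and edge cases
## Open questions
"""

def new_design_doc(title: str) -> str:
    """Return a fresh design doc for the architect persona to fill out."""
    return DESIGN_TEMPLATE.format(title=title, today=date.today().isoformat())
```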
Step 5: Implement the spec with $start-feature
With a solid design doc in place, $start-feature does something very simple, yet very important: it treats design as the single source of truth.
A typical $start-feature command might:
- Load the design document.
- Activate the engineer persona.
- Optionally follow a test‑first loop: outline or generate tests, then implement code until they pass.
- Ask the reviewer persona for a first pass on the changes.
- Stop and present the result.
Because the design doc is always in view, the implementation is less likely to wander off as the conversation evolves. If something does drift, you have a concrete spec to compare against. In this workflow, the design document is treated as the immutable source of truth.
The agent is not allowed to reinterpret or extend requirements beyond what is defined in the design.
This constraint is critical—it prevents scope drift and ensures implementation remains aligned with agreed specifications.
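The test-first loop inside $start-feature can be sketched as a small driver function. `generate_tests`, `implement`, and `run_tests` stand in for calls to the agent personas; all three names are assumptions, not real APIs:

```python
# Sketch: the test-first loop of $start-feature. The callables stand in
# for agent invocations; the design spec stays in view on every round.
def start_feature(spec: str, generate_tests, implement, run_tests, max_rounds=5):
    """Implement against the spec until its tests pass."""
    tests = generate_tests(spec)      # outline tests from the design doc first
    code, failures = None, []
    for _ in range(max_rounds):
        # Feed previous code and failures back in so each round is a refinement.
        code = implement(spec, tests, previous=code, failures=failures)
        passed, failures = run_tests(code, tests)
        if passed:
            return code               # hand off to the reviewer persona next
    raise RuntimeError("Implementation never satisfied the spec's tests")
```

Because the spec is a fixed argument rather than conversational memory, it cannot silently drift between rounds.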
Step 6: Use $commit as a gate
The last command, $commit, is your pre‑merge checklist.
You can keep it small:
- Load the files changed by this feature.
- Load the relevant design document and “golden rules”.
- Use the reviewer persona to apply its checklist:
- Are there forbidden dependencies between layers?
- Are we leaking implementation details across boundaries?
- Are there obvious security or validation gaps?
- Return a list of issues, with file and line information.
- Let the engineer address every issue and rerun $commit until all violations are resolved.
- Only then, ask for explicit human approval: no commit, merge, or final code integration happens without it.
This doesn’t replace human review or tests, but it gives you a consistent, automated gate that catches many of the “we’ll fix it later” problems before they hit main.
One key detail is that the $commit workflow is not a single-pass check.
It introduces a correction loop:
- If issues are found, they must be fixed
- The system re-runs the review
- This repeats until all violations are resolved
Only then is the user asked for explicit approval. This ensures that quality gates are not just advisory—they are enforced.
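The correction loop is a few lines of control flow. In this sketch, `review` and `fix` stand in for the reviewer and engineer personas; both names are assumptions:

```python
# Sketch: the $commit correction loop. Review re-runs until no violations
# remain; only then does the gate hand off to a human for approval.
def commit_gate(diff, review, fix, max_rounds=10):
    """Loop review -> fix until clean, then require explicit human approval."""
    for _ in range(max_rounds):
        issues = review(diff)         # reviewer persona applies its checklist
        if not issues:
            # Clean pass: the gate never merges on its own.
            return {"status": "awaiting_human_approval", "diff": diff}
        diff = fix(diff, issues)      # engineer persona addresses every issue
    raise RuntimeError("Violations remain after maximum correction rounds")
```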
Why does this work better than AGENTS.md alone?
Once you’ve layered these pieces on top of your basic AGENTS.md, a few things change:
- Everyone follows roughly the same path: “Prepare → design → feature → commit” becomes the default, instead of each person inventing their own approach.
- Design becomes a normal artifact, not an afterthought: the agent helps you write and review specs, but you still own them. That alone reduces rework.
- Context is more stable: each command is responsible for loading what it needs. You’re not constantly juggling which files to paste in or which rules to remind the model about.
- You stay in charge: the model drafts, checks, and suggests. You decide what gets built, what’s acceptable, and when something is done.
There is some overhead: you write a bit more up front and agree to use the commands. But for any non‑trivial feature, that cost tends to be smaller than the time you’d lose to “fast but wrong” implementations.
A key distinction in this approach is separating context from prompts.
- Prompts are interactions
- Context is the environment in which the AI operates
By engineering context as a system, prompts become lighter, more consistent, and less error-prone.
How to get started & where this is heading
If you’re already using AGENTS.md, you don’t need to adopt all of this at once. Start small: tighten your agent instructions, introduce a couple of clear personas, and add one or two commands that you actually use day‑to‑day.
What matters most is not workflows in isolation, but the combination of three elements working together:
- Context engineering to ensure the AI operates with a structured, versioned, dependency-loaded context
- Deterministic workflows to enforce repeatable execution and prevent critical steps from being skipped
- Role-based agents to keep behavior controlled, specialized, and easier to trust
The broader ecosystem is already moving in this direction. Instruction formats like AGENTS.md help standardize how we guide agents. Skill systems package reusable capabilities. And clearer distinctions between agents, skills, and commands make it easier to design workflows that are both practical and reliable.
The approach outlined here fits into that larger evolution rather than competing with it. In our experience, the real value comes from combining these pieces into a system that gives AI enough structure to be useful without giving up engineering control. You can begin by layering that structure around the patterns you already use: a lightweight prepare step, design-first execution, explicit personas, and a commit gate with human approval.
Rule of thumb: treat workflows and specs as the “operating system” around your agents. The more deliberate that layer becomes, the more you can trust AI to handle real work without giving up control of your engineering process.