Introduction
Visual QA is one of those activities everyone agrees is important…right up until it becomes the bottleneck.
A page looks “basically right,” you’re under deadline, and that last review pass turns into a game of spot-the-difference: margin tweaks, heading sizes, tiny spacing inconsistencies that are easy to miss and painful to repeat across dozens (or hundreds) of pages.
In a recent Quito Lambda talk at Stack Builders, our team explored a practical approach to reducing manual visual QA time using AI-assisted development and pixel-based visual comparison: pulling a baseline from Figma, capturing the “about to go live” view from Adobe Experience Manager (AEM), and generating a visual diff report that shows exactly where the UI diverges.
Stack Builders works extensively with AEM and is an official Adobe Experience Manager partner, so this kind of workflow is directly aligned with the kind of enterprise-grade content operations we help teams modernize.
The Pain: Manual Visual QA Doesn’t Scale
If you’ve ever reviewed two screenshots that look identical, you know how this goes:
- A paragraph is shifted by ~40px.
- A heading is an H2 instead of an H3—visually “almost the same,” but not quite.
- Spacing changes by a couple of pixels, and nobody notices until a stakeholder does.
Manual checks are:
- Repetitive and tiring
- Time-consuming
- Inconsistent (different reviewers notice different things)
- Risky (small UI regressions slip through and show up in production)
And importantly, you repeat the same effort for every page, every time.
The Real-World Workflow: From Content to “Live”
In many organizations (especially those running AEM), the pipeline often looks like this:
- Content writing (messaging, paragraphs, structure)
- Design in Figma (layouts, tokens, components, specs)
- Authoring in AEM (drag-and-drop components, build pages from templates)
- Visual QA (verify AEM matches Figma)
- Publish (page goes live)
AEM is particularly powerful here because it enables non-developers to assemble pages from controlled templates and components. That is great for scale, but it also means small configuration differences can produce subtle visual drift.
The Goal: Faster QA, More Consistency, Better Evidence
The objective isn’t to “remove QA”; it’s to make QA more reliable and dramatically less manual.
A good automated approach should:
- Reduce the time spent visually inspecting pages
- Increase consistency across reviews
- Produce evidence (diff images + percentage change) that teams can act on quickly
This is where pixel-based visual comparison shines.
Pixel-Based Comparison: Simple Idea, Huge Leverage
At the core is a straightforward method:
- Capture Screenshot A (baseline, e.g., from Figma export)
- Capture Screenshot B (actual UI, e.g., AEM preview)
- Compare pixels (RGB values by position)
- Output:
- Diff image/heatmap
- Percent difference
- Optional: segmented diffs per section (header, hero, etc.)
This is a classic form of visual regression testing, where you compare screenshots to catch unintended UI changes.
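The comparison step above can be sketched in a few lines. This is a minimal illustration, not the talk’s actual implementation: it assumes both screenshots are already decoded into same-sized RGBA byte buffers (4 bytes per pixel) and returns the percent of pixels that differ beyond a small per-channel tolerance.

```typescript
// Minimal pixel-diff sketch (hypothetical): compares two same-sized RGBA
// buffers and returns the percentage of pixels that differ.
function percentDiff(
  a: Uint8Array,
  b: Uint8Array,
  width: number,
  height: number,
  tolerance = 10 // per-channel tolerance to absorb anti-aliasing noise
): number {
  if (a.length !== b.length) throw new Error("Buffers must be the same size");
  let changed = 0;
  for (let i = 0; i < width * height; i++) {
    const o = i * 4; // 4 bytes per pixel: R, G, B, A
    const differs =
      Math.abs(a[o] - b[o]) > tolerance ||
      Math.abs(a[o + 1] - b[o + 1]) > tolerance ||
      Math.abs(a[o + 2] - b[o + 2]) > tolerance;
    if (differs) changed++;
  }
  return (changed / (width * height)) * 100;
}
```

In practice a library like pixelmatch does this (plus anti-aliasing detection and diff-image output), but the core idea really is this simple.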
Where AI Fits: Building the Tool Faster (and Better) with “Vibe Engineering”
A key theme from the talk was the difference between:
- Vibe coding: “Prompt it and ship it.”
- Vibe engineering: Use AI for speed, but keep engineering discipline—security, reliability, maintainability, and real-world scalability.
The AI helped accelerate:
- Rapid prototyping of integrations (Figma + AEM preview capture)
- Refactoring guidance
- Documentation generation
- Security improvements (e.g., safer credential/token handling)
But the takeaway was clear: AI is strongest when paired with experienced engineering judgment that sets constraints, reviews outputs, and enforces standards.
A Practical Architecture: Figma + AEM + Screenshot Diffing
A lightweight architecture for AI-assisted visual QA looks like this:
Inputs
- Figma: design source of truth
- AEM Preview: “view as published” preview before release
Pipeline
- Pull/export the relevant frame from Figma (via API)
- Use browser automation to load AEM preview and capture a screenshot
- Normalize:
- crop / resize
- reduce whitespace
- align viewport
- Compare images (pixel-by-pixel)
- Produce a report: baseline, actual, diff/heatmap, percent change
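The final report step can be as simple as a typed record per page. The shape below is a hypothetical sketch (field names and the 3% default threshold are assumptions, not the tool’s actual output format):

```typescript
// Hypothetical per-page report record produced at the end of the pipeline.
interface VisualDiffReport {
  page: string;         // AEM page path or URL
  baselinePath: string; // Figma export screenshot
  actualPath: string;   // AEM preview screenshot
  diffPath: string;     // generated diff/heatmap image
  percentDiff: number;  // 0–100
  passed: boolean;      // under the agreed threshold?
}

function buildReport(
  page: string,
  baselinePath: string,
  actualPath: string,
  diffPath: string,
  percentDiff: number,
  threshold = 3 // assumed default; tune per project
): VisualDiffReport {
  return {
    page,
    baselinePath,
    actualPath,
    diffPath,
    percentDiff,
    passed: percentDiff <= threshold,
  };
}
```

Keeping the report as structured data (rather than just images) is what makes thresholds, CI gating, and batch summaries possible later.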
Example Tech Stack
- Node.js + TypeScript
- Express for APIs + Helmet for security headers
- Playwright (Chromium) for headless browser automation + screenshot capture
- Sharp for image preprocessing (crop/resize/cleanup)
- pixelmatch for pixel-based diffs
This combination is popular because it’s scriptable, fast, and easy to run locally or in CI.
What the Report Gives You (and Why It Matters)
Instead of “it looks off somewhere,” you get:
- A diff heatmap that pinpoints the UI drift
- A difference percentage that helps establish thresholds
- A repeatable process that’s consistent across reviewers/pages
A “good” page might show ~3% difference (often driven by tiny nav or content mismatches), while subtle layout issues (like heading sizing plus a 40px indentation) can push the diff higher (~5%), with the heatmap immediately highlighting the problem areas.
This is the big win: you can move from subjective review to actionable evidence.
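The heatmap itself can be generated in the same pass as the percentage. As a sketch, assuming the same RGBA buffers as before (libraries like pixelmatch produce a similar diff image for you):

```typescript
// Hypothetical diff-image sketch: copies the actual screenshot and paints
// differing pixels red so drift is easy to spot at a glance.
function diffImage(a: Uint8Array, b: Uint8Array, tolerance = 10): Uint8Array {
  const out = Uint8Array.from(b); // start from the actual screenshot
  for (let o = 0; o < a.length; o += 4) {
    const differs =
      Math.abs(a[o] - b[o]) > tolerance ||
      Math.abs(a[o + 1] - b[o + 1]) > tolerance ||
      Math.abs(a[o + 2] - b[o + 2]) > tolerance;
    if (differs) {
      out[o] = 255;     // R
      out[o + 1] = 0;   // G
      out[o + 2] = 0;   // B
      out[o + 3] = 255; // A (fully opaque)
    }
  }
  return out;
}
```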
Why “AI Image Analysis” Didn’t Fully Replace Pixel Diffs (Yet)
We’ve also experimented with using an AI model to interpret differences more semantically (“this heading should be smaller,” “this padding is off”). That part didn’t work as reliably as hoped.
The likely reason: pure screenshot-based AI analysis can struggle to infer intent and structure unless it’s grounded in the design system and underlying specs.
Which leads to the most important next step…
Roadmap: From Pixel Diffs to Design-System Validation
Pixel diffs are powerful, but the long-term path is even better:
1) Tighten Your Design System Bridge (Figma ↔ Implementation)
If Figma tokens and component structure map cleanly to your code (or CMS components), you can validate:
- typography scales
- spacing rules
- component variants
- layout constraints
This reduces false positives and moves QA closer to “verify intent,” not just pixels.
2) Use Design Tokens Consistently
Define tokens once (e.g., “Small = 14px”) and ensure they’re respected across:
- Figma
- CSS / component library
- AEM component styles
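A token check can be a one-line lookup against a shared definition. The token names and values below are illustrative assumptions, not a real design system:

```typescript
// Hypothetical shared token definitions ("define once").
const tokens: Record<string, string> = {
  "font-size.small": "14px", // e.g., "Small = 14px" from the design system
  "font-size.body": "16px",
  "spacing.md": "24px",
};

// Verify a rendered CSS value matches the design token it should come from.
function matchesToken(tokenName: string, renderedValue: string): boolean {
  const expected = tokens[tokenName];
  if (expected === undefined) throw new Error(`Unknown token: ${tokenName}`);
  return expected === renderedValue.trim();
}
```

Running checks like this against computed styles (from Figma specs on one side and the rendered page on the other) is what turns “pixels look different” into “this heading is using the wrong token.”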
3) Expand Breakpoints
Desktop-only diffs are a start. Add:
- tablet
- mobile
- responsive states
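Extending the runs is mostly a matter of crossing pages with viewports. The breakpoint values here are assumptions; swap in your design system’s actual breakpoints:

```typescript
// Assumed breakpoint set; adjust to match your design system.
interface Viewport { name: string; width: number; height: number; }

const viewports: Viewport[] = [
  { name: "desktop", width: 1440, height: 900 },
  { name: "tablet", width: 768, height: 1024 },
  { name: "mobile", width: 375, height: 812 },
];

// One capture task per page x viewport combination, ready to feed to the
// browser-automation step (e.g., Playwright's page.setViewportSize).
function captureTasks(pages: string[]): { page: string; viewport: Viewport }[] {
  return pages.flatMap(page => viewports.map(viewport => ({ page, viewport })));
}
```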
4) Batch Runs
Instead of page-by-page:
- run an entire path, site section, or folder of pages
- produce a consolidated report for review
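A consolidated report can be a simple aggregation over per-page results. This is a sketch under the same assumed 3% threshold as earlier; field names are hypothetical:

```typescript
// Hypothetical batch summary: aggregate per-page diff percentages.
interface PageDiff { page: string; percentDiff: number; }

function summarize(results: PageDiff[], threshold = 3) {
  const failing = results.filter(r => r.percentDiff > threshold);
  return {
    total: results.length,
    failing: failing.map(r => r.page), // pages needing human review
    worst: results.reduce(            // most-divergent page, reviewed first
      (m, r) => (r.percentDiff > m.percentDiff ? r : m),
      results[0]
    ),
  };
}
```

Sorting reviewers toward the worst offenders first is a small touch that makes the batch report genuinely faster than page-by-page inspection.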
5) Broaden CMS Compatibility
AEM is a great first target, but the concept generalizes to other CMS platforms.
Want to Make Visual QA Faster in Your AEM Pipeline?
If your team is authoring high volumes of pages in AEM and spending too much time on repetitive reviews, this kind of workflow can pay off quickly, especially once it’s wired into CI or editorial release processes.
Stack Builders works with organizations modernizing their AEM implementations and delivery pipelines.