Vizipy Launch: Our 14-Day MVP Journey
Day 0: The Problem Statement
We'd been on both sides of the visual bug equation: shipping broken UIs to clients and catching them too late. After yet another "the button disappeared" incident, we decided to build what we wished existed — visual regression testing that lives natively in GitHub PRs.
The constraint: 14 days from first commit to public MVP.
The Tech Stack Decision
We needed speed without sacrificing reliability. Here's what we chose:
- TypeScript — end to end, no context switching
- Playwright — for deterministic browser screenshots (not Puppeteer — Playwright's auto-wait and multi-browser support won us over)
- Pixelmatch — pixel-level image comparison at sub-millisecond speed
- OpenAI API — for generating human-readable summaries of visual changes ("The button shifted 12px left, which may affect the click area on mobile")
- GitHub Actions — the runtime for everything
Days 1-3: The Screenshot Engine
The first challenge was deterministic screenshots. Browsers render slightly differently depending on fonts, animations, anti-aliasing, and timing. A naive screenshot would produce false positives on every single run.
Our approach:
- Disable animations — inject CSS that sets `* { animation: none !important; transition: none !important; }`
- Wait for network idle — no pending requests means no loading spinners
- Font loading — explicitly wait for `document.fonts.ready`
- Consistent viewport — lock width and pixel ratio across all runs
By day 3, we had a screenshot engine that produced byte-identical images across consecutive runs of the same page. Zero false positives.
Days 4-6: The Baseline Strategy
The key insight: your baseline is the main branch. When a PR opens, we compare screenshots of the PR branch against the same pages on main. This means:
- No manual baseline management
- Baselines automatically update when PRs merge
- Branch-specific changes are isolated correctly
- No "approve all" fatigue from cascading baseline drift
We store baseline images as artifacts attached to the workflow run on the main branch's HEAD commit.
Days 7-9: The Diff Engine
Pixelmatch gives us a pixel-level diff, but raw pixel counts aren't useful for developers. "1,247 pixels changed" doesn't tell you anything actionable.
So we built a layer on top:
- Cluster changed pixels into bounding boxes (connected components analysis)
- Classify each region by size, position, and type of change
- Generate a severity score — a 2px anti-aliasing difference scores low; a missing button scores critical
- Pipe the diff regions into GPT-4 with the page context for natural language summaries
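The clustering and scoring steps can be sketched as below; the flood fill and the severity thresholds are illustrative, not Vizipy's actual tuning:

```typescript
type Box = { x0: number; y0: number; x1: number; y1: number; pixels: number };

// Cluster changed pixels (a flat boolean mask derived from pixelmatch's
// diff output) into bounding boxes via 4-connected flood fill.
function clusterDiff(mask: boolean[], width: number, height: number): Box[] {
  const seen = new Array(mask.length).fill(false);
  const boxes: Box[] = [];

  for (let i = 0; i < mask.length; i++) {
    if (!mask[i] || seen[i]) continue;
    const x0 = i % width, y0 = Math.floor(i / width);
    const box: Box = { x0, y0, x1: x0, y1: y0, pixels: 0 };
    const stack = [i];
    seen[i] = true;
    while (stack.length) {
      const p = stack.pop()!;
      const x = p % width, y = Math.floor(p / width);
      box.x0 = Math.min(box.x0, x); box.x1 = Math.max(box.x1, x);
      box.y0 = Math.min(box.y0, y); box.y1 = Math.max(box.y1, y);
      box.pixels++;
      // Visit the four orthogonal neighbors still inside the image.
      for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
        const nx = x + dx, ny = y + dy;
        const n = ny * width + nx;
        if (nx >= 0 && nx < width && ny >= 0 && ny < height && mask[n] && !seen[n]) {
          seen[n] = true;
          stack.push(n);
        }
      }
    }
    boxes.push(box);
  }
  return boxes;
}

// Tiny regions (likely anti-aliasing) score low; large regions score high.
function severity(box: Box): "low" | "medium" | "critical" {
  if (box.pixels < 16) return "low";
  if (box.pixels < 2000) return "medium";
  return "critical";
}
```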
Days 10-12: The GitHub Integration
This was the "it has to feel native" phase. We wanted the experience to be: open a PR, see visual changes inline, never leave GitHub.
The bot posts a PR comment with:
- A summary of all visual changes across all tested pages
- Before/after screenshot pairs for each changed page
- AI-generated explanation of what changed and why it matters
- A status check that blocks merge if regressions exceed the threshold
We chose to ship as a GitHub Action first (not a GitHub App) because:
- Zero authentication complexity for users
- Runs in the user's own CI environment
- No server infrastructure to manage on our end
- Users can see exactly what runs in their workflow file
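A user-side workflow for this model might look roughly like the following; the action reference and its inputs are hypothetical, shown only for shape:

```yaml
name: Visual regression
on: pull_request

jobs:
  vizipy:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write  # post the diff comment
      statuses: write       # set the blocking status check
    steps:
      - uses: actions/checkout@v4
      # Hypothetical action reference, for illustration only.
      - uses: vizipy/vizipy-action@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
```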
Days 13-14: Polish and Launch
The last two days were all about developer experience:
- Clear error messages when screenshots fail
- Helpful PR comments even when there are zero visual changes ("All 5 pages match baseline")
- A `vizipy.config.ts` file for customizing routes, viewports, and thresholds
- Documentation and a 5-minute quick start guide
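A config in that spirit might look like the following; the field names are illustrative, not the documented schema:

```typescript
// vizipy.config.ts — illustrative shape, not the documented schema.
const config = {
  // Routes to screenshot on both the PR branch and main.
  routes: ["/", "/pricing", "/dashboard"],
  // Each route is captured at every viewport.
  viewports: [
    { width: 1280, height: 720 }, // desktop
    { width: 390, height: 844 },  // mobile
  ],
  // Fail the status check if more than 1% of pixels change on any page.
  threshold: 0.01,
};

export default config;
```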
What We Shipped
On day 14, we had:
- A working GitHub Action that screenshots pages and compares against main
- AI-powered visual diff summaries posted as PR comments
- Support for multiple viewports (desktop + mobile)
- Baseline management that "just works" with the baseline=main strategy
- Sub-2-minute run times for typical 5-page setups
What's Next
The MVP validates the core loop: screenshot → compare → comment → block. Now we're building toward:
- Ignore regions — mask headers, timestamps, and dynamic content
- Component-level snapshots — test individual components, not just full pages
- Smarter baselines — per-branch baseline management for long-lived feature branches
- A dashboard — historical trends, flakiness tracking, and team-wide visibility
The Takeaway
14 days is tight, but constraint breeds focus. We didn't build a platform — we built a sharp tool that does one thing well: catch visual bugs before they ship.
If you're building something similar, our advice: start with the narrowest possible use case and make it feel magical before expanding scope.
*Want to try what we built? Get started with Vizipy for free — it takes less than 5 minutes to set up.*