Agentic Software Delivery: How I Ship Features Without Writing Code 🤖
Over the past several months, I've been developing a method for delivering software that fundamentally changes the role of the engineer. Instead of writing code line by line, I describe what needs to be built, and an orchestrated system of AI agents handles the how — from generating requirements documents, to creating implementation plans with reviewable milestones, to executing tasks in parallel with built-in quality gates.
I call this approach Agentic Software Delivery, and in this post I'm going to walk you through every step of how it works. This isn't theory. This is what I actually do, every day, to ship production software.
The Big Picture
At a high level, here's the lifecycle:

1. Brainstorm a messy idea into a clear product vision
2. Formalize the vision into a PRD
3. Transform the PRD into a peer-reviewed implementation plan of milestones, phases, and tasks
4. Execute tasks in parallel, with built-in quality gates
5. Review, course-correct, and fold learnings back into the plan
That's the simple version. Each of those steps contains real depth — product thinking, architectural reasoning, quality gates, and hard-won lessons about what actually works when you hand execution to AI agents. Before I show you the tooling, let me walk you through the philosophy behind each step.
From Messy Ideas to Product Vision
The best features don't start with a spec. They start with a conversation.
Sometimes I'm talking out loud — pacing around my office, working through an idea in real time with Claude. Sometimes it's a voice memo from a brainstorm with a colleague where we riffed on what could be possible. Sometimes it's me typing furiously into a Claude prompt at 11pm, brain-dumping fragments of thought as fast as they come — half-formed user stories, architectural hunches, competitive observations, things that annoy me about existing solutions.
None of this is structured. None of it is "ready." And that's the point.
What I've learned is that the messy, fragmented beginning is where the real insight lives. The trick is having a thinking partner who can take that raw material and help you find the signal. Claude doesn't just accept my brain dump and start coding. Instead, it pushes back. It asks clarifying questions. It finds the contradictions in my thinking. It surfaces assumptions I didn't realize I was making.
"You mentioned this is for coaches, but you also described a parent-facing dashboard. Are these the same user or two different personas with different needs?"
"You said performance is critical, but you're describing real-time updates across hundreds of concurrent users. Have you thought about whether this is a WebSocket architecture or if polling with smart caching would be simpler and sufficient?"
This back-and-forth — sometimes 10 or 15 rounds — is where a vague idea crystallizes into a real product vision. Not a feature list. A vision: the problem we're solving, the opportunity in front of us, and the value we're delivering to actual humans.

Formalizing the PRD
Once the vision is clear, we formalize it into a Product Requirements Document. But even here, the PRD isn't just a list of features to build. It's a document that captures the why alongside the what.
The product principles matter more than most people realize. When I'm building something, I need the downstream agents — the ones writing code, designing interfaces, making architectural choices — to understand what kind of product this is. Is it a power-user tool where speed and information density matter? Is it a consumer app where visual delight and simplicity are non-negotiable? These principles cascade through every decision that follows, from the tech stack to the component library to the error handling patterns.
Before we move on from the PRD, we make sure we haven't lost the thread. The problem we're solving, the opportunity we're seizing, the value we're delivering to customers — all of that needs to be front and center, not buried under feature lists.
Turning Vision Into an Executable Plan
Now comes the part where most AI-assisted development goes wrong.
Most people take a vague idea, paste it into a prompt, and say "build this." What you get back is code that technically runs but doesn't cohere — no architecture, no phasing, no understanding of what matters first and what can wait. It's like handing someone a pile of lumber and saying "make a house" without blueprints.

The implementation plan is the blueprint.
We take the PRD — with its vision, requirements, and principles — and transform it into a plan that could be picked up by a team of developers at any moment. To get there, I ask Claude to look at the problem from multiple perspectives:
- A technical architect who understands how to design the systems, APIs, and data models needed to realize the product
- A product designer who can translate our product principles into concrete UX decisions that deliver real delight to customers
- A tech lead who ensures the plan is thorough and clear enough for her team to execute successfully — adding a sprinkle of low-level reality to a plan that's currently living at a high level
The result is an implementation plan organized into milestones, each containing phases, each containing tasks. And every task reads like a well-composed Jira story: not just the what, but — critically — the why, and a set of acceptance criteria that can prove the task has been completed correctly and successfully.
Delivering Incremental Value
This is where careful planning pays off. Each task in the plan should deliver one of two kinds of value:
- Intrinsic value — foundational work that benefits the architecture or developers. Standing up infrastructure, creating interfaces, implementing a framework, establishing patterns.
- Extrinsic value — features a user can actually experience. Screens they can see, workflows they can complete, capabilities they can use.
We plan phases so that each one delivers something meaningful. Not "Phase 1: set up everything, Phase 2: build everything, Phase 3: test everything." Instead, each phase produces a working increment that someone could look at and say "yes, this is moving in the right direction."
Staying in the Loop
This incremental delivery model unlocks something critical: I can interject at any moment.
Between any task, any phase, any milestone — I have an opportunity to course correct. "That design isn't quite right, let me spend some time giving you feedback so you can adjust." "Looks like I described this incorrectly, let's fix it." "Check the test coverage for me." "Explain the architecture so I can make sure it's correct."

And those course corrections aren't lost. Depending on the complexity and blast radius, they can be folded back into the implementation plan as formal updates, or handled on the fly as ad-hoc adjustments. A typo in a label? Fix it inline and move on. A fundamental misunderstanding of the data model? That goes back into the plan so every downstream task reflects the correction.
This is one of the most important checkpoints in the entire process. I'm still in control when I need to be: watching the instruments, monitoring outputs, validating decisions. Claude provides the autopilot, but I'm the pilot. I decide when to take the stick.
Dependencies are mapped explicitly, and we optimize aggressively for parallelization. If Task A and Task B don't depend on each other, they should be executable at the same time. This is where agentic delivery starts to pull away from traditional development — we can actually act on that parallelism.
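Turning an explicit dependency map into parallel batches is a standard scheduling problem. Here's a minimal sketch, assuming a simple task-to-dependencies dictionary (the task IDs echo the example task in this post; the data format is my illustration, not Synthex's internal representation):

```python
def parallel_waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group tasks into waves: every task in a wave has all of its
    dependencies satisfied by earlier waves, so each wave can run
    fully in parallel."""
    remaining = {task: set(d) for task, d in deps.items()}
    done: set[str] = set()
    waves: list[set[str]] = []
    while remaining:
        # A task is ready once all of its dependencies are done.
        ready = {t for t, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done |= ready
        for t in ready:
            del remaining[t]
    return waves

deps = {
    "2.1.3": set(),               # WebSocket infrastructure
    "2.2.1": set(),               # User preferences
    "2.3.1": {"2.1.3", "2.2.1"},  # Notifications need both
    "2.3.2": {"2.3.1"},           # Email digest builds on notifications
    "2.3.3": {"2.3.1"},           # Mobile push builds on notifications
}
waves = parallel_waves(deps)  # three waves; tasks within a wave run concurrently
```

If Task A and Task B land in the same wave, an agent can be dispatched for each at the same time.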
Why a Single Markdown Document
I keep the implementation plan as a single markdown document inside the repository where the product is being delivered. This is a deliberate choice.
A single document represents a product roadmap that can be analyzed in a single operation. An agent can read the entire plan, understand where we are, what's done, what's next, and what's blocked — all in one context load. Compare that with Jira or Linear, where loading the equivalent roadmap requires dozens of API calls to fetch boards, sprints, stories, subtasks, and comments before the agent can even begin to reason about what to do next.
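To make "analyzed in a single operation" concrete, here's a rough sketch of pulling every task's status out of such a plan in one pass; the `Task N:` / `- Status:` conventions mirror the task example in this post, and the parser is my illustration, not Synthex's actual reader:

```python
import re

# A miniature plan in the same shape as the task example in this post.
PLAN = """\
Task 2.3.1: Real-Time Notification System
- Status: IN PROGRESS
Task 2.3.2: Email Digest
- Status: NOT STARTED
Task 2.3.3: Mobile Push
- Status: DONE
"""

def task_statuses(plan: str) -> dict[str, str]:
    """One pass over the whole document: no API calls, no pagination."""
    statuses: dict[str, str] = {}
    current = None
    for line in plan.splitlines():
        if m := re.match(r"Task ([\d.]+):", line):
            current = m.group(1)
        elif current and (m := re.match(r"- Status: (.+)", line)):
            statuses[current] = m.group(1)
    return statuses
```

A single file read gives an agent (or a script) the full roadmap state, which is exactly what makes the one-document approach context-efficient.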
That said, Jira and Linear exist for good reason — they're where the human overlords live. Product managers need burndown charts. Leadership wants dashboards. Teammates need to know what's in flight.
So we pair the implementation plan with Jira or Linear stories by referencing the external ticket directly in each task within the plan. This gives us the best of both worlds: an agent-optimized document for execution, and a business-friendly view for everyone else. And yes, we can use agents to keep the two in sync — but that's a topic for another post.
Here's what a single task looks like inside the plan:
```markdown
Task 2.3.1: Real-Time Notification System

- Story: ENG-247
- Status: IN PROGRESS
- Priority: P0
- Size: L (5-7 days)
- Dependencies: Task 2.1.3 (WebSocket infrastructure), Task 2.2.1 (User preferences)
- Parallel: Can run alongside Task 2.3.2 (Email digest) and Task 2.3.3 (Mobile push)
- Requirement: PRD §4.2 — Users receive immediate feedback when teammates interact with their content

Description: Build the server-side notification pipeline and client-side
notification center that delivers real-time updates when collaborators comment,
share, or modify shared documents. This is the foundation that email digest
(2.3.2) and mobile push (2.3.3) build on top of.

Implementation Details:
- Notification Data Model: Create `Notification` table with polymorphic
  `event_type` (comment, share, edit, mention), recipient, sender, read/unread
  status, and timestamps. Index on recipient + unread for fast queries.
- Event Pipeline: Listen for domain events (`CommentCreated`, `DocumentShared`,
  `UserMentioned`) and fan out `Notification` records to all collaborators,
  excluding the actor who triggered the event.
- WebSocket Delivery: Push new notifications to connected clients via existing
  WebSocket infrastructure (Task 2.1.3). Include catch-up mechanism for
  notifications missed while disconnected.
- Notification Center UI: Slide-out panel (desktop) or bottom sheet (mobile)
  with grouped notifications, unread badge, mark-as-read, and virtual scrolling
  for users with hundreds of notifications.

Acceptance Criteria:
- `CommentCreated` produces notification for document owner and all
  collaborators (excluding the commenter)
- `UserMentioned` produces notification for the mentioned user, even if they
  aren't a collaborator yet
- Notifications delivered via WebSocket within 2s of the triggering event
- Catch-up delivers missed notifications on reconnect (max 100, paginated)
- Notification center displays unread count badge (caps display at "99+")
- "Mark all read" completes in <500ms for 1,000+ notifications
- Duplicate notifications prevented (idempotency key on event_type + target + recipient)
- Fan-out for a 50-person document completes in <5s
- Mobile bottom sheet has 44px+ touch targets

Testing:
- Unit tests for event handlers and fan-out logic (90% target)
- Integration tests for WebSocket delivery and reconnection
- Load test: 50 concurrent recipients, verify <2s delivery
- E2E: Create comment → notification appears in collaborator's panel within 2s

Observability:
- Track notification pipeline latency (event → delivery)
- Monitor WebSocket delivery success rate (target >99.5%)
- Alert on fan-out failures or delivery latency >5s
```
There's a lot to unpack there, but the key things to notice are:
- Every task traces back to a requirement in the PRD and a ticket in your project tracker. The agent knows why it's building this, not just what.
- Implementation details give the agent enough architectural direction to make sound decisions without over-constraining it.
- Acceptance criteria are concrete, testable, and provable — including performance targets and edge cases. There's no ambiguity about when a task is "done."
- Testing and observability are first-class concerns, not afterthoughts. The agent knows it needs to write tests and instrument the code as part of the task itself.
- Dependencies and parallelization are explicit, so the agent (and the orchestration layer) can reason about what's safe to run concurrently.
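One of those acceptance criteria, duplicate prevention via an idempotency key, is concrete enough to sketch. The key construction mirrors the criterion's `event_type + target + recipient` triple; the in-memory set is a stand-in for whatever store the real task would use:

```python
import hashlib

def idempotency_key(event_type: str, target_id: str, recipient_id: str) -> str:
    """Identical (event_type, target, recipient) triples produce the
    same key, so re-delivered events can't create duplicate notifications."""
    raw = f"{event_type}:{target_id}:{recipient_id}"
    return hashlib.sha256(raw.encode()).hexdigest()

seen: set[str] = set()  # stand-in for a uniqueness constraint in the database

def should_notify(event_type: str, target_id: str, recipient_id: str) -> bool:
    key = idempotency_key(event_type, target_id, recipient_id)
    if key in seen:
        return False  # duplicate: suppress
    seen.add(key)
    return True
```

Because the criterion names the exact key composition, an agent can prove it satisfied the requirement with a test rather than a claim.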
Parallel Execution: Where It Gets Fun
Now we have a plan. Time to build.

I have to give Geoffrey Huntley a lot of credit here. Reading about his Ralph Wiggum-style approach to agentic coding — spinning up independent agents that each chew through a piece of work — was a major inspiration. You'll see a ton of similarities in what follows, and his writing on the subject is well worth your time.
The execution phase works like this: the orchestrator reads the plan, picks the highest-priority unblocked tasks, spins up an independent agent for each one in its own isolated workspace, and then routes every result through automated review gates before anything merges.

The key insight is that final review step. We're not just automating the creation of code. We're automating the safety and quality gates of software development that engineers have come to trust over the last couple of decades. Code reviews. Security reviews. Architectural consistency checks. Test coverage analysis. These aren't optional nice-to-haves — they're the guardrails that keep agentic delivery from becoming "AI-generated code that nobody checked."
And we're constantly improving those gates. Every time we find a pattern of mistakes, we add a check. Every time a security issue slips through, we tighten the review. The system gets smarter over time because the prompts and review criteria evolve with what we learn.
From Philosophy to Practice
That's the method — discovery, planning, execution — with human oversight woven throughout. Now you have a feel for the depth behind each phase.
The question you're probably asking at this point: "This sounds great in theory, but how do you actually do all of this consistently, project after project, without burning out on prompt engineering?"
That's exactly the problem I ran into. For months, this workflow lived as a collection of carefully crafted prompts. I was copy-pasting the same patterns, adjusting the same review criteria, re-establishing the same agent roles at the start of every session. The philosophy was solid, but the operational overhead was real.
So I turned it into a tool.
Enter Synthex
Synthex is a Claude Code plugin that encodes this entire process — every phase, every agent role, every review gate — into reusable commands and specialized AI agents. It's open source, part of the LumenAI marketplace, and it evolved alongside the method itself.
Synthex didn't appear fully formed. It grew as I perfected each prompt, and it accelerated as Anthropic shipped new capabilities — sub-agents for parallel work, skills for reusable commands, plugins for distribution, and agent teams for orchestrating specialist roles. Each feature unlocked a new level of sophistication in the workflow.
The Agent Organization
At the heart of Synthex is a structured organization of 15 specialized AI agents — a virtual engineering team where each agent has a clearly defined role and domain expertise.
The Orchestration Layer coordinates execution. The Tech Lead, Lead Frontend Engineer, and Product Manager break down work, delegate to specialists, and roll up results.
The Specialist Layer provides deep domain expertise. The Architect reviews system design. The Security Reviewer checks for vulnerabilities. The Quality Engineer ensures test coverage.
The Research & Analysis Layer drives continuous improvement. UX Researchers inform product decisions. The Metrics Analyst tracks delivery health with DORA metrics. The Retrospective Facilitator helps the team learn.
How Synthex Handles Each Phase
Let's walk through how each phase maps to actual Synthex commands.
Phase 1: The PRD Process
The Product Manager agent conducts the structured interview. It doesn't accept a brief description and start generating requirements autonomously — it asks targeted questions across multiple dimensions, typically 3-5 per round.
The output lands at `docs/reqs/main.md` — version controlled, reviewable, ready for planning.
Phase 2: Peer-Reviewed Planning
The `write-implementation-plan` command transforms the PRD into a peer-reviewed plan.
The peer reviewers include an Architect, a Designer, and a Tech Lead. A key design decision: each review cycle spawns fresh agent instances to prevent context exhaustion.
Phase 3: Parallel Execution
The `next-priority` command drives execution, reading the plan and dispatching the highest-priority unblocked tasks in parallel.
Each Tech Lead orchestrates a team of specialists.
Quality Gates: Not Optional
Code review in Synthex isn't a single pass by a single agent. The `review-code` command runs a multi-perspective review in parallel.
The verdict follows strict rules: if any reviewer reports FAIL, the overall verdict is FAIL. Security review is a mandatory quality gate — the Tech Lead cannot bypass it.
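That aggregation rule is simple enough to pin down in a few lines. This is a sketch of the policy as described, not Synthex's actual implementation, and the reviewer names are illustrative:

```python
def overall_verdict(reviews: dict[str, str]) -> str:
    """Strict gate: any FAIL fails the whole review, and because the
    security review is mandatory, its absence is itself a failure."""
    if "security" not in reviews:
        return "FAIL"  # the gate cannot be bypassed by skipping the reviewer
    return "FAIL" if "FAIL" in reviews.values() else "PASS"
```

The useful property is that the gate is monotone: adding more reviewers can only make it harder to pass, never easier.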
The Full Command Reference
Synthex provides 11 commands spanning the entire delivery lifecycle:
| Command | Phase | What It Does |
|---|---|---|
| `init` | Setup | Scaffolds project structure and configuration |
| `write-implementation-plan` | Plan | Creates peer-reviewed implementation plans from PRDs |
| `next-priority` | Build | Executes highest-priority tasks in parallel |
| `review-code` | Build | Multi-perspective code review with fix loop |
| `write-adr` | Build | Documents architectural decisions |
| `write-rfc` | Build | Creates Requests for Comments for proposals |
| `test-coverage-analysis` | Build | Analyzes and improves test coverage |
| `design-system-audit` | Build | Audits UI compliance with design system |
| `performance-audit` | Ship | Full-stack performance analysis |
| `reliability-review` | Operate | SLO compliance and operational readiness |
| `retrospective` | Learn | Structured team retrospectives |
These map to the five phases of the delivery lifecycle: Plan, Build, Ship, Operate, and Learn.
Why This Works
I've been using this approach for months, and there are a few things that make it fundamentally different from just "asking ChatGPT to write code":
Structure creates quality. By separating requirements, planning, and execution into distinct phases with review gates between them, you avoid the biggest pitfall of AI-assisted coding: generating code without understanding what you're building or why.
Specialization beats generalization. A single AI agent asked to "build a login page" will produce something that works. Fifteen specialized agents — one for architecture, one for security, one for testing, one for frontend, one for code review — will produce something that's production-ready.
Parallel execution changes the game. With git worktrees isolating each task, you can implement three features simultaneously without conflict. What used to take a sprint takes an afternoon.
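The worktree mechanics here are plain git. A minimal sketch, run in a throwaway sandbox repository (branch and directory names are illustrative):

```shell
# Throwaway sandbox so the demo doesn't touch a real repo.
sandbox=$(mktemp -d) && cd "$sandbox"
git init -q repo && cd repo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -qm "initial commit"

# One worktree (and branch) per parallel task: agents never share a
# working tree, so simultaneous edits can't conflict on disk.
git worktree add ../task-2.3.1 -b task/2.3.1-notifications
git worktree add ../task-2.3.2 -b task/2.3.2-email-digest

git worktree list   # main checkout plus one checkout per in-flight task
```

Each agent works in its own directory on its own branch; integration happens later through ordinary merges and the review gates.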
Review loops catch what you miss. The peer review on implementation plans catches ambiguity before code is written. The code review catches bugs before they ship. The security review catches vulnerabilities before they become incidents.
Everything is traceable. Every task maps to a requirement. Every code change maps to a task. Every architectural decision is recorded.
What This Changes About Your Role
This approach doesn't eliminate the need for engineering skill. It transforms it. Instead of spending your time writing `for` loops and debugging CSS, you spend it on:
- Product thinking — What should we build and why?
- Architecture decisions — How should the system be structured?
- Quality judgment — Is this output good enough for production?
- Strategic prioritization — What matters most right now?
These are the highest-leverage activities an engineer can do. Agentic Software Delivery lets you spend all your time on them.
Try It Yourself
Everything I've described in this post is available today, for free, as an open-source Claude Code plugin. If any of this resonated, the best way to understand it is to try it on a real project.
Synthex
Part of the LumenAI plugin marketplace
15 specialized AI agents. 11 commands. Full delivery lifecycle coverage — from brainstorming your first idea to shipping validated, reviewed, production-ready code.
```
/plugin marketplace add bluminal/lumenai
/synthex:init
/synthex:next-priority
```

Start small — pick your next feature, write the PRD with Claude, generate the plan, and let `next-priority` handle the rest. I think you'll be surprised at how quickly you get from idea to working code.
In future posts, I'll go deeper into each phase — how to write PRDs that produce great plans, how to tune the review loop for your team's needs, advanced patterns for managing complex multi-milestone projects, and how to keep your implementation plan in sync with Jira and Linear.