Workflow Configuration for Agentic Systems
Understanding agentic workflow systems
*This piece is part of How to Design an Agentic System, a series on the design decisions required to specify an internal agentic system.*
If you have repeatable, high-value tasks that you want your agentic system to perform, you’re going to need a way to describe those tasks so the agent does them consistently. That’s a workflow. It’s the same concept as a pipeline in CI/CD or a runbook in operations, just with an LLM doing the work instead of a script or a human.
The Components
A workflow definition has five parts:
Steps. The sequence of things that need to happen. Gather context, generate code, run tests, fix failures, submit for review. Or: download email, classify by urgency, draft responses, queue for human review. The steps depend on your task. The point is that they’re explicit.
Verification gates. What must be true before the agent moves to the next step. Tests pass. Schema validates. Output matches a format. Without gates, the agent will happily hand broken output from step 3 to step 4 and keep going.
Constraints. What the agent can and cannot do at each step. Which files it can edit. Which APIs it can call. Which branches it can push to. Constraints are how you keep blast radius small.
Exit criteria. What “done” looks like, specifically. Not “write good code” but “all tests pass, lint is clean, PR description follows the template.” The more concrete, the less judgment the agent needs to apply.
Escalation rules. When the agent stops and asks a human. After two failed test runs. When it encounters a file it doesn’t have permission to edit. When confidence is low. Every workflow needs a “pull the emergency brake” condition.
That’s it. The format doesn’t matter much. Some teams use YAML. Some use markdown. Squarespace checks prompts into their repo as part of CI. Gas Town uses git commits as handoffs between steps. The representation is less important than having the five components defined.
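To make the five components concrete, here is a minimal sketch of one way to hold them in plain Python. Every name in it (`Step`, `Workflow`, `may_edit`, `max_failed_gates`, the state keys) is a hypothetical chosen for illustration, not taken from any of the systems mentioned above.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical; they only illustrate the five components.
Check = Callable[[dict], bool]


@dataclass
class Step:
    name: str
    gate: Check                              # verification gate for this step
    allowed_paths: tuple[str, ...] = ()      # constraint: path prefixes the step may edit


@dataclass
class Workflow:
    steps: list[Step]                        # steps: the explicit, ordered sequence
    exit_criteria: list[Check]               # exit criteria: what "done" means
    max_failed_gates: int = 2                # escalation rule: stop and ask a human

    def is_done(self, state: dict) -> bool:
        return all(check(state) for check in self.exit_criteria)

    def should_escalate(self, failed_gates: int) -> bool:
        return failed_gates >= self.max_failed_gates


def may_edit(step: Step, path: str) -> bool:
    """Constraint check: is `path` under one of the step's allowed prefixes?"""
    return any(path.startswith(prefix) for prefix in step.allowed_paths)


code_change = Workflow(
    steps=[
        Step("generate_code", gate=lambda s: s.get("diff") is not None,
             allowed_paths=("src/",)),
        Step("run_tests", gate=lambda s: s.get("tests_passed") is True),
    ],
    exit_criteria=[
        lambda s: s.get("tests_passed") is True,
        lambda s: s.get("lint_clean") is True,
    ],
)
```

The same structure could just as easily live in YAML or markdown. What matters is that each of the five parts has a named home.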
Deterministic Shell, Non-Deterministic Core
Here’s the key architectural insight, and every production system I’ve looked at converges on it independently: wrap the non-deterministic model calls in deterministic code. The workflow handles sequencing, verification, and constraints. Those are deterministic. Step 1 runs before step 2. The gate checks pass or fail. The constraint allows or blocks. The model handles judgment within those boundaries. It decides how to implement the function. It decides what to say in the email. It figures out why the test failed and how to fix it.
Stripe’s Blueprints are the clearest example: deterministic code nodes connected by flexible agent loops. The sequence is fixed. The agent’s behavior within each node is not. Reliability at the pipeline level. Flexibility at the step level.
You get this wrong when you let the model control the sequencing. It will skip steps, reorder things creatively, and occasionally decide that step 4 isn’t necessary today. The deterministic shell is what makes the system predictable.
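A minimal sketch of that split, assuming the model is just a callable you inject. The names `run_workflow`, `Model`, `Gate`, and `fake_model` are hypothetical, and a real system would call an LLM where the stub is.

```python
from typing import Callable

# Hypothetical types: a "model" takes a step name and the current state and
# returns some output. A real system would call an LLM here.
Model = Callable[[str, dict], str]
Gate = Callable[[str], bool]


def run_workflow(steps: list[tuple[str, Gate]], model: Model) -> dict:
    """Run steps in a fixed order; the model never chooses what runs next."""
    state: dict = {}
    for name, gate in steps:           # sequencing is plain code
        output = model(name, state)    # judgment happens only inside the step
        if not gate(output):           # the gate is pass/fail, decided by code
            state["halted_at"] = name
            return state
        state[name] = output
    state["halted_at"] = None
    return state


def fake_model(step: str, state: dict) -> str:
    # Stand-in for a model call: produces a draft, then an empty review.
    return {"draft": "hello", "review": ""}[step]


non_empty: Gate = lambda out: len(out) > 0
result = run_workflow([("draft", non_empty), ("review", non_empty)], fake_model)
```

Here the run halts at `review` because the gate rejects the empty output. The model had no say in whether that step ran or what came after it.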
DAGs with GOTOs
If you’re evaluating orchestration frameworks, most will claim “DAG-based execution.” That isn’t quite true. Most of them borrow DAG semantics, but the A stands for acyclic (no cycles allowed), and agentic engineering is nothing but loops.
Agentic workflows have feedback loops: generate code, run tests, tests fail, regenerate. That’s a cycle, not a DAG. LangGraph handles this by adding conditional edges that point backwards (DAGs with GOTOs). Temporal and Restate skip the pretense and use imperative code with while loops. The honest framing: these are bounded cycles with retry limits and token budgets. The boundedness is what makes them tractable. If your workflow has feedback loops (and it probably does), make sure your tooling handles cycles honestly instead of forcing you to pretend they don’t exist.
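Here is roughly what a bounded cycle looks like as imperative code. This is a sketch under assumptions: `generate` and `run_tests` are hypothetical callables standing in for a model call and a test runner, and the default limits are arbitrary.

```python
from typing import Callable


def generate_until_green(
    generate: Callable[[str], tuple[str, int]],    # feedback -> (code, tokens used)
    run_tests: Callable[[str], tuple[bool, str]],  # code -> (passed, failure log)
    max_attempts: int = 3,
    token_budget: int = 10_000,
) -> dict:
    """Generate, test, and regenerate, bounded by attempts and tokens."""
    feedback = ""
    tokens_used = 0
    for attempt in range(1, max_attempts + 1):
        code, tokens = generate(feedback)
        tokens_used += tokens
        if tokens_used > token_budget:
            return {"status": "escalate", "reason": "token budget exceeded",
                    "attempts": attempt}
        passed, log = run_tests(code)
        if passed:
            return {"status": "done", "code": code, "attempts": attempt}
        feedback = log  # the backward edge: failures feed the next attempt
    return {"status": "escalate", "reason": "retry limit reached",
            "attempts": max_attempts}


# Stubs for illustration: the second generated version passes.
calls: list[str] = []


def stub_generate(feedback: str) -> tuple[str, int]:
    calls.append(feedback)
    return (f"v{len(calls)}", 100)


def stub_tests(code: str) -> tuple[bool, str]:
    return (code == "v2", f"{code} failed")


result = generate_until_green(stub_generate, stub_tests)
```

The loop is explicit and both exits are explicit: it ends in `done` or it ends in `escalate`, never in an unbounded retry.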
Tooling
You have more options than you probably think, and they range from heavyweight to surprisingly simple.
Durable execution frameworks (Temporal, Restate, Inngest). Reach for these if your workflows outlive a single session, need retry logic, or involve long-running steps. These are real infrastructure, but they handle the hard coordination problems for you.
Graph-based orchestrators (LangGraph). A fit if you want visual workflow design and are comfortable with the DAG-with-GOTOs tradeoff.
Declarative configs. Daunis (2025) built a declarative workflow language validated at PayPal that compressed 500+ lines of imperative code to under 50 lines of DSL. Non-engineers could safely modify agent behaviors through configuration.
Scripts and markdown. Many production systems are simpler than you’d expect. Squarespace checks prompts into CI. Gas Town uses git. OpenAI’s Harness Engineering team stored plans as first-class git artifacts and built roughly a million lines of code across 1,500 PRs with zero manually written source code. The sophistication was in the workflow definitions, not in the tooling that ran them.
If you’re building, start simple. A script that runs steps in sequence with a test gate between each one is a workflow system. You can add complexity when you know where you need it.
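For a sense of how small that starting point can be, here is a sketch that runs shell commands in order and stops at the first failed command or gate. The `run_pipeline` function and the step names are hypothetical, and the commands are placeholders that just invoke the Python interpreter. In practice you would swap in your own agent call and test runner.

```python
import subprocess
import sys
from typing import Optional

Command = list[str]


def run_pipeline(steps: list[tuple[str, Command, Command]]) -> Optional[str]:
    """Run each (name, command, gate) in order.

    Returns the name of the first step whose command or gate exits non-zero,
    or None if every step and every gate succeeded.
    """
    for name, command, gate in steps:
        if subprocess.run(command).returncode != 0:
            return name
        if subprocess.run(gate).returncode != 0:
            return name
    return None


# Placeholder commands: each just runs a one-line Python program.
PY = sys.executable
steps = [
    ("generate", [PY, "-c", "print('generated')"], [PY, "-c", "raise SystemExit(0)"]),
    ("test", [PY, "-c", "print('testing')"], [PY, "-c", "raise SystemExit(1)"]),
    ("submit", [PY, "-c", "print('submitted')"], [PY, "-c", "raise SystemExit(0)"]),
]
failed = run_pipeline(steps)
```

In this run the gate after `test` exits non-zero, so `submit` never executes and `failed` names the step that needs attention.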

