Turtles All the Way Up

I have no a priori reason to believe there's anything that my agents can't eventually do.

Mar 19, 2026

I have no idea what the limits are on what today’s foundation models can achieve with clear instructions, appropriate context, and a thoughtful harness. I find it useful to joke that my job is simply to train my future robot overlords. (I try not to think about that piece too much.)

But here’s what I do know. Any activity can be decomposed. Researchers can be dispatched to find appropriate context. Decision heuristics can be suggested. You can build a deterministic pipeline with some combination of script, model, and human-in-the-loop steps to achieve your business outcome, whether that’s writing some Python to import a CSV file or proposing a competitive strategy for 2027.
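To make that concrete, here is a minimal sketch of such a pipeline in Python, using the CSV import as the task. Everything in it is an assumption for illustration: the step names, the `state` dictionary, and especially `summarize_with_model`, which is a hypothetical stand-in for a real model call. The point is only the shape: an ordered list of steps, each one a script, a model, or a human.

```python
import csv
import io
from typing import Callable

# Each step takes the current state dict and returns an updated copy.
Step = Callable[[dict], dict]

def parse_csv(state: dict) -> dict:
    """Script step: plain code, no model involved."""
    rows = list(csv.DictReader(io.StringIO(state["raw_csv"])))
    return {**state, "rows": rows}

def summarize_with_model(state: dict) -> dict:
    """Model step: hypothetical placeholder for a foundation-model call."""
    draft = f"Imported {len(state['rows'])} rows"
    return {**state, "draft": draft}

def make_human_review(approve: Callable[[str], bool]) -> Step:
    """Human-in-the-loop step: a reviewer callback gates the draft."""
    def review(state: dict) -> dict:
        return {**state, "approved": approve(state["draft"])}
    return review

def run_pipeline(steps: list[Step], state: dict) -> dict:
    """Run the steps in a fixed order, threading the state through."""
    for step in steps:
        state = step(state)
    return state

result = run_pipeline(
    [parse_csv, summarize_with_model, make_human_review(lambda draft: True)],
    {"raw_csv": "name,team\nElise,Ops\nMorgan,Sales\n"},
)
```

In this sketch the reviewer is a lambda that approves everything. In practice that callback is where a real person would look at the draft, and where their feedback would be captured.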

When you start, your agents will probably be bad at all of those things. That’s fine. Start with a human in the loop, build a compounding system, and make sure that no keystroke goes to waste. If you provide an input to the model, it should be used immediately to improve the current work product and then retrospectively overnight to improve the capabilities of that agent.

Every correction is training data. Every “that’s not my voice” or “wrong priority” or “ask Elise, not Morgan” makes the next session better. The question isn’t whether you’re doing RLHF. You are. The question is whether you’re efficiently leveraging all of the training data that you’re providing.
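One lightweight way to avoid wasting those corrections, sketched here under my own assumptions, is to store each one against the task it applied to and replay the relevant ones as context at the start of the next session. The `CorrectionLog` class and the task names are hypothetical, and this version only reuses corrections as context. It does not retrain any model.

```python
from dataclasses import dataclass, field

@dataclass
class CorrectionLog:
    """Hypothetical store of (task, correction) pairs from human review."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def record(self, task: str, correction: str) -> None:
        """Save a correction the moment the human provides it."""
        self.entries.append((task, correction))

    def context_for(self, task: str) -> str:
        """Build a bullet list of past corrections for one task."""
        notes = [c for t, c in self.entries if t == task]
        return "\n".join(f"- {note}" for note in notes)

log = CorrectionLog()
log.record("email_draft", "That's not my voice: shorter sentences.")
log.record("routing", "Ask Elise, not Morgan.")
log.record("email_draft", "Wrong priority: lead with the deadline.")

# Prepend this to the agent's instructions the next time it drafts an email.
email_context = log.context_for("email_draft")
```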

The progression looks something like this:

1. Review everything.
2. Review some things.
3. Get a report on progress.
4. Long weekends and three-martini lunches :)

At Gather, three weeks in, we’re somewhere between steps one and two on most tasks. I’ll let you know when we hit step four!

I’m not claiming that agents can build and run an entire company. I’m just saying that I have no a priori reason to believe there are specific tasks that silicon-based life forms are fundamentally incapable of performing. And if I’m wrong, I’m still getting a substantial speedup across my company while maintaining robust guardrails and continuing as a human in the loop to bring whatever my unique contribution might be.

That’s a pretty good worst case.

What are you assuming your agents can’t do? And have you tested that assumption recently?

