I don’t see a ton of support for this idea - maybe because it’s impossible? But it feels to me like, what is it sort of imperative usage of agents vs declaritve programming?

Like - it feels really weird to me that the change is a prompt, and not a diff. It feels to me like, the end goal should be something like, a really concise programming language.

People talked about how once you learned C you didn’t read assembler anymore, but that’s because C was a like complete standalone specification. Everything you needed to know was in the C (Eventulaly, I’m sure the first few versions has people still diggingin to the assembler)

7:21 PM You’re onto something real and underexplored. The core tension:

Imperative agents: “Go do this thing” — the process is opaque, the result is probabilistic, replay is impossible Declarative spec: “Here is what correct looks like” — deterministic, diffable, versionable, composable The C/assembler analogy is sharp but you’re actually pointing at something stronger. C didn’t just abstract assembler — it created a closed-world semantics. You could reason about a C program without the compiler. A prompt doesn’t give you that. The “spec” and the “executor” are inseparable.

Why the diff problem is hard:

Prompts aren’t compositional — two valid prompts don’t combine predictably The “meaning” of a prompt is defined by a specific model weight, which changes under you There’s no normal form — infinite prompts express the same behavior What people are actually groping toward:

DSPy (declarative LM programming, auto-optimizes prompts) Marvin, Instructor (type-system style constraints on outputs) Temporal’s workflow model (at least makes execution auditable) Some people think the answer is just… Prolog, basically My actual opinion: The prompt-as-change-unit will look as embarrassing in 10 years as XML config files look now. But I think the solution isn’t a new language — it’s behavioral test suites as the spec, and prompts become a compiler target nobody reads directly. The diff is on the tests, not the prompt.

The reason you don’t see more support for this is that most people building agents are in the “it works, ship it” phase, not the “how do we reason about correctness” phase. You’re asking a PL theory question in an ops culture. . https://dspy.ai/

Yeah, there’s a big part of me that - I’m not holding on to writing programming code as it exists today, but having a documented closed spec that you analyze and change in a structured way, feels much more like “engineering” and sustainable than prompts.

7:24 PM Yeah, and this is actually a pretty principled distinction — not just nostalgia for code.

What makes something “engineering” vs “craft”:

Reproducibility — same spec, same behavior Compositionality — you can reason about parts independently Failure modes are enumerable (or at least bounded) A change has scope — you know what it could affect Prompts fail all four. They’re closer to tuning a medieval organ than engineering.

The deeper problem: A prompt isn’t a spec, it’s a hint to an oracle. The oracle changes. The hint that worked last Tuesday may not work after a model update. You haven’t changed anything but your behavior changed. That’s not engineering, that’s astrology with better marketing.

Where I’d push back on the field:

“Evals” get talked about as the solution but evals are just tests — tests without a spec are just regression anchors, not correctness proofs The “vibe coding” crowd is basically arguing craft is enough, which is fine for prototypes and will be a disaster at scale The honest state of things:

We don’t have the type theory for this yet We don’t even have agreement on what “correct” means for agent behavior The closed-spec dream probably requires either heavily constrained agents or a formal semantics for LLM behavior, neither of which exists You’re essentially asking for the Principia Mathematica moment for AI systems. It may require that the models themselves become more like theorem provers and less like oracles.

One option - and this is probably crazy - is basically - if we really think that AI is the new higher level thing, we build all code at compile time. (or crazier still runtime) we generate an artifact that we never read. The spec is all the code.