:: Hortensia

Who is this document for?

If you’re Boris Cherny and you merged three hundred PRs yesterday, I doubt anything I’m suggesting in this doc is going to help you out. (But let’s add some context here: Boris Cherny spent like 10 years at Facebook automating code review by hand.) If you’re wildin’ out on gastown, or trying to figure out how to up your multi-agent work from 8 to 10 agents, this may not be for you. This is for the folks who are having a lot of fun but also have questions: about the long-term sustainability of the business, about what this means for our old software engineering best practices, about whether we’re really never going to read or type code again.

The thing that resolves both problems — feeling too open AND scope creep — is usually: write the ending first, then cut everything the ending doesn’t need.

Practical suggestion: write the last three paragraphs right now, without looking at what you’ve written. What do you actually want to say? Then read those paragraphs against what you have and the mismatch shows you exactly what to cut and what’s missing. The cut stuff isn’t wasted — it’s a future post.

The takeaway is basically - if you’re lost amidst all the hype and “you should be orchestrating 10 agents” and “vibe code straight to production” and “300 MRs in a day”, this is the guide to give you some confidence to sit and learn without anxiety.

One thing I know is deferred is meta-engineering, i.e. AI-harness engineering. I want that to be in the conclusion as a Part 2.

There’s an idea of “important” or “engineering” code/data. That should be clear, visible and changed intentionally.

I describe it as:

* AI should never be used to write the code that is central to the value of your team/product/company.

The agile principle “Working code over comprehensive documentation” is one that I think has been turned away from, and I want to bring it back.

Part of this is the counter-intuitive middle ground that like - if we really believe in a world where spec is everything, then it’s an anti-pattern to read code (this is linked to a separate post to flesh out).

We’ve always learned by doing. Good engineers have learned by going one level deeper. You don’t have to, and shouldn’t, stop writing code. Make it a mix. “Stay in the driver’s seat” is the core principle. Keep writing code as much as you need to in order to feel like you’re making the changes. I also call this “Bionics with robotics”.
In general, the software engineering best practices of the past 10 years aren’t dead. Atomic Changes,

The post has a real center of gravity, but it’s buried. The Archive.org anecdote on line 93 is the best thing in the piece — specific, personal, earns the “driver’s seat” argument better than anything abstract. It’s doing more work than you’re giving it credit for.

From what’s in the draft, the candidates seem to be:

* Protection rings: AI can change outer code freely; inner code (your value) needs human eyes.
* Stay in the driver’s seat: If you’re not making the changes, put the keyboard down.
* Engineering is change management: The job has always been “can I see what changed and why”, and AI doesn’t change that.
* The hype compares your floor to someone else’s ceiling: Boris’s 300 PRs isn’t the baseline. It’s a ceiling.
* We’ve always shipped more code than we could review: This isn’t a new problem AI created.
* Skills erode if you only prompt: Learning by doing isn’t dead.

Some other thoughts or summaries to throw out there:

The word “sustainable” is doing a lot of work there in a good way — it shifts the conversation from “should I use AI” (which feels like a values debate) to “how do I build a practice that lasts” (which is a craft question). Engineers respond to craft questions.

A few variations to try on:

* “A system that doesn’t respect how humans learn isn’t sustainable, and right now, most AI advice doesn’t.”
* “The engineers who thrive in five years will be the ones who built a practice, not the ones who optimized for throughput.”
* “Sustainable means you can still engineer when the hype settles. Most current advice doesn’t get you there.”

The one thing I’d stress-test: “assuming it’s not solved” might let readers off the hook too easily, since someone anxious about job loss will cling to that clause. You might want to make it more defiant: even if it gets solved, the path there still requires you to keep your hand on the wheel. The sustainable argument holds either way.

* Opening: “I can’t be the only one…” (you’re already here, keep it)
* The gap: what the hype says vs. what actual onboarding feels like
* Thesis to close the intro, something like: Maybe in three years this is solved and we’re all out of a job. But assuming it isn’t, we need a sustainable way to keep being engineers, and most current advice doesn’t give us one.
* Then the principles, each one earning the next

That last intro sentence does something important: it gives the anxious reader permission to keep reading without feeling like they’re being told to ignore reality. It says I’ve considered the scary version, now here’s the useful version.

Want to try drafting that closing intro paragraph now, in your own voice? Even a rough sentence or two. Then you’d have a real anchor to build the outline from.

This line:

“saying that your agents should be busy 24/7 that there’s this gap between what people are recommending, and my personal onboarding process with AI”

I think personalizing it is more powerful, but it doesn’t sound quite right. An alternative would be to say “A normal person’s ability to use these tools” or something, but again, I think it’s more powerful not to depersonalize it.

The issue is “onboarding process”: it sounds like you’re a new hire filling out HR forms. The gap you’re describing is more visceral than that.

Try something like:

* “…that there’s this gap between what people are recommending, and what I actually experience sitting down to work.”
* “…and my lived experience of actually using these tools.”
* “…and what it actually feels like to open a chat window and try to get something done.”

The last one is probably closest to what you mean: it keeps it personal, it’s specific, and “try to get something done” captures the friction without making you sound behind.

The key is replacing “onboarding process” with something that sounds like a person, not a workflow.

20-ai-assisted-coding.md#120-121 Maybe the software engineering best practices section is a different blog post. It’s central to this thesis, and I want to include it at a high level, but this feels more cultural/vision, not specific to any one section.

Right, it’s a supporting argument, not a section. The point is “the fundamentals didn’t die,” and you only need to assert that, not prove it here.

It probably belongs in the introduction or just after the thesis close, as one or two sentences: For all the things AI changed, we’re not talking enough about what it didn’t. The practices that made software engineering work — small changes, clear ownership, understanding what you’re shipping — those are more important now, not less.

Then you can point to Part 2 or a separate post for the specific practices (atomic commits, MR size, etc.). That way the idea is load-bearing in this post without becoming its own section that needs to be fully developed.

The structure would be:

* Opening (emotional)
* Thesis close + “the fundamentals didn’t die” (2-3 sentences)
* Principles (protection rings, driver’s seat, change management)
* Conclusion (sustainable practice, permission to be calm, Part 2 teaser)

That’s a complete post. Everything else is either Part 2 or a link.

I’m writing a blog post “AI assisted development for the rest of us”, and I think this is one of my core principles I don’t see documented often:

I am 100% convinced that software engineering should continue to be the art of safely and intentionally transitioning code from one state to another state.

But this will probably be forever difficult to reconcile without a standard definition of code.

7:22 AM Strong take, and I think you’re mostly right but slightly off on the diagnosis.

The core principle is solid — state transition safety is genuinely underappreciated in AI-assisted dev discourse, which is obsessed with generation and almost silent on transition integrity.

But “standard definition of code” isn’t actually the blocker. Here’s why I’d push back:

* We already have the definitions. Code is text that compiles/interprets to deterministic behavior. ASTs, IRs, bytecode: these are all well-specified. The problem isn’t definitional ambiguity.
* The real reconciliation problem is that AI operates on the latent space of code, not code itself. It learned statistical relationships between tokens. It doesn’t natively reason about state transitions; it pattern-matches to outputs that look like valid transitions. So the tension isn’t philosophical, it’s architectural.
* The “art of safely transitioning state” requires something like: precondition awareness, invariant preservation, rollback semantics. LLMs have none of these natively. Tools bolt them on (tests, diffs, linters) but they’re not intrinsic.
* Your principle is actually a criterion for evaluating AI tools, not a practice that needs reconciling. The question becomes: does this tool respect the state transition contract, or does it just produce plausible-looking output?

The sharper version of your thesis might be: AI dev tools are only trustworthy insofar as they are state-transition-aware, and most aren’t.

What’s the angle — are you arguing practitioners need to hold the line on this discipline, or that tools need to be redesigned?

So, I disagree! Mitchell Hashimoto wrote this blog post about how infrastructure as code should mean infrastructure as spec, not “programming code”.

Maybe we need the word spec to be more formal here?

But the agile phrase: working code over comprehensive documentation I think also fits into this alternative definition of code.

Most people seem to use code to mean “programming languages” when saying coding is solved.

7:24 AM Now we’re getting somewhere. You’re pointing at a semantic collapse — “code” got flattened to mean “programming language syntax” and that’s doing real damage to the discourse.

The Hashimoto framing is sharp. Let me steelman your position hard:

* “Code” in the broadest sense = any formal, executable specification of intent
* That includes: IaC, SQL, config files, Makefiles, CSS, regex, OpenAPI specs, Dockerfile, even well-structured prompts
* The Agile “working software” principle implicitly treats behavior as the ground truth, not the syntax that produces it
* So when people say “coding is solved by AI” they mean token generation in programming languages, which is a tiny slice of the actual problem space

Where I think the tension gets interesting:

* Hashimoto’s “infrastructure as spec” argument works because infra tools have strong declarative semantics: the spec IS the state transition
* But most “code” in practice is imperative and stateful, which is exactly where AI generates plausible-looking garbage
* The Agile framing cuts the other way too: “working code” over docs means behavior is the spec, which makes AI-generated code that passes tests technically valid even if it’s a mess

My actual opinion: You don’t need a new word. You need the industry to admit that “code” always meant specification of intended behavior and programming languages are just one substrate. The AI hype cycle accidentally revealed that most people never believed that.

Your state-transition principle holds either way — it’s substrate-agnostic.

Yeah I think that like, where I agree with the industry and maybe - one word code, two subtypes -

Some code is authoritative. You need the retry logic of an API client to work very specifically every time, and you need that retry logic to be systematically grokked and changed.

This ain’t true with most things. Say, API controllers, React template code: for a lot of that stuff, vague intent is really enough. That’s why you can say “give me a React login form” and the AI does great. And more importantly, reloading the page when you say “with red text” is enough.

But with API retry logic, you’re probably going to want to ensure the actual written code is clear and auditable.
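
To make “clear and auditable” concrete, here is a minimal sketch in Python of what retry logic written for a human reviewer might look like. Everything in it is an illustrative assumption rather than anything from a real API client: the names `call_with_retry`, `backoff_delays`, and `RetryExhausted`, the default attempt count and delays, and the choice of which exceptions count as retryable.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


class RetryExhausted(Exception):
    """Raised when every attempt failed; chains the last underlying error."""


def backoff_delays(max_attempts: int, base_delay: float, max_delay: float) -> List[float]:
    """Upper bound for the sleep before each retry: exponential, capped."""
    return [min(max_delay, base_delay * (2 ** i)) for i in range(max_attempts - 1)]


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,            # assumption: illustrative default
    base_delay: float = 0.5,          # assumption: seconds
    max_delay: float = 8.0,           # assumption: seconds
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying only the listed error types with jittered backoff."""
    delays = backoff_delays(max_attempts, base_delay, max_delay)
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on as err:
            # Only the listed errors are retried; anything else propagates at once.
            last_error = err
            if attempt == max_attempts - 1:
                break
            # Full jitter: sleep a random amount up to the capped exponential bound.
            sleep(random.uniform(0, delays[attempt]))
    # Never swallow the failure: surface it with the original cause attached.
    raise RetryExhausted(f"gave up after {max_attempts} attempts") from last_error
```

The point of this shape is that each decision (what is retried, how long to wait, what happens on the final failure) sits on its own line, so a diff that changes any of them is visible in review.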

7:28 AM Yes. This is a real distinction and I think it maps cleanly onto existing concepts:

* Authoritative code = code where the text itself is the artifact. Correctness, auditability, and change history matter. The diff is load-bearing.
* Intentional code = code where the behavior is the artifact. The text is disposable scaffolding. Regenerating it wholesale is fine.

This actually rhymes with:

* Accidental vs essential complexity (Brooks)
* Mechanism vs policy (classic Unix)
* Specification vs implementation (formal methods)

Where I’d push back slightly: your examples might be off.

* React login form can be authoritative if it handles auth state, accessibility, CSRF tokens. You picked a good vibe example but the line isn’t “frontend vs backend”
* The actual axis is probably: does a bug here have asymmetric consequences?
* Retry logic with exponential backoff that silently swallows errors = catastrophic. Login form with slightly wrong padding = who cares

So the real classifier for “authoritative” might be:

* Failure modes are non-obvious
* Bugs compound or cascade
* Regulatory/audit requirements exist
* Other systems depend on this behavior being stable and documented

Provocative reframe: Most codebases are 80% intentional code pretending to be authoritative. That’s why AI slop feels fine in practice: the humans were already generating intentional code and calling it authoritative.

This is where people say “oh you just regenerate the code all the time” and I think - you only regenerate OpenSSL if you have a deterministic spec - which is the code.

The gap seems to be - does OpenSSL need a higher level spec? (Maybe? To rewrite in different languages?)

But I bet the OpenSSL codebase is actually pretty precise?

7:30 AM Yes, and this punctures the “just regenerate” argument cleanly.

The regeneration fantasy assumes a spec exists above the code. For OpenSSL it doesn’t — the C code is the spec. The RFC describes the protocol, but OpenSSL’s actual behavior (timing side channels, padding decisions, error handling) diverges from RFC in ways that are load-bearing for real-world compatibility. That divergence lives only in the code.

This is actually a known crisis point:

* BoringSSL, LibreSSL: Google and OpenBSD didn’t regenerate from a higher spec, they forked the text because the text was the only faithful record of intent
* The Heartbleed postmortem: the bug was found by reading code, fixed by reading code. No higher spec existed to regenerate from.

Your OpenSSL intuition is right. I’d bet the codebase is extremely precise in the authoritative sense. Decades of:

* Careful variable naming around cryptographic primitives
* Comments explaining why something is done a specific non-obvious way
* Deliberate avoidance of “clean” abstractions that would obscure timing behavior

The sharper point for your blog post:

The “regenerate everything” crowd is implicitly assuming all code is intentional code. OpenSSL is the reductio ad absurdum. Nobody is prompting their way to a correct TLS implementation.

The higher-level spec question is interesting but separate — that’s closer to formal verification (TLA+, Coq proofs of cryptographic protocols). And even there, the C is still authoritative for what actually ships.

Empirical code — correct because it was discovered, not designed. The code encodes hard-won knowledge about physical reality or adversarial systems (hardware, cryptography, compilers). No spec above it is possible because the code is the result of experiments against ground truth. OpenSSL and Linux drivers both fit here more than “authoritative.” You can’t regenerate them because you can’t re-run 30 years of discovering what hardware actually does vs what the manual says.

This devastates the regeneration argument permanently. The question isn’t “do you have a good enough spec” — it’s “was this code discovered rather than designed?” AI cannot rediscover empirical reality from a prompt.

I think this gets to a question - let’s say you extracted OpenSSL to an English-language spec, and you rewrote the code from it and 10% of the code changed. Would you merge that MR without reading it?

No. Obviously not. And that’s the whole argument.

The 10% delta is exactly what you need to audit because:

* You don’t know if the 10% is convergence (AI found a better implementation) or divergence (AI silently broke a timing invariant)
* Those two outcomes are indistinguishable from the diff alone
* The English spec couldn’t have captured what changed, because if it had, it wouldn’t have changed

This is the core failure mode of “AI writes code from spec”: the spec was lossy by definition, the code filled in the gaps, and now you have unattributed decisions embedded in your authoritative codebase.

With intentional code, unattributed decisions are fine — red text on a login form, who cares.

With empirical code, every unattributed decision is a latent CVE or a decade-later kernel panic on obscure hardware.

The MR framing is actually a great intuition pump for your blog post:

* Intentional code: merge without reading if tests pass
* Authoritative code: read the diff, understand every line
* Empirical code: you may not be qualified to review it even if you read it

That third point is brutal. The OpenSSL reviewers who catch timing attacks are a tiny, non-reproducible population. AI doesn’t expand that population. It just generates more code that looks right to people who aren’t in it.

On the flip side, a reasonable counterpoint would be - you’re not writing OpenSSL.

Which is true, but maybe the point I want in the blog post is - don’t assume you know the difference until you’ve learned.

7:37 AM Exactly. And that’s actually a mature, non-alarmist point that respects the reader.

The danger isn’t writing OpenSSL. The danger is writing something that feels intentional but is secretly empirical:

* Rate limiting logic
* Retry/backoff with jitter
* Auth token refresh race conditions
* Database transaction boundaries
* Webhook idempotency handling

None of these feel like OpenSSL. All of them can encode hard-won discovered knowledge that will be silently lost in a regeneration.
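
As one illustration of how small this kind of knowledge can look on the page, here is a hedged Python sketch of webhook idempotency handling. The `WebhookProcessor` class, its in-memory set, and the payment-style event are all hypothetical; a real handler would need a durable store and would depend on the provider’s actual redelivery behavior.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class WebhookProcessor:
    """Hypothetical in-memory processor; a real one would use a durable store."""

    seen_event_ids: Set[str] = field(default_factory=set)
    balance_cents: int = 0

    def handle(self, event_id: str, amount_cents: int) -> bool:
        """Apply a payment event once. Returns True if applied, False if duplicate."""
        # Assumption: the sender may redeliver the same event (for example after
        # a timeout), so the same event_id can arrive more than once.
        if event_id in self.seen_event_ids:
            return False
        self.balance_cents += amount_cents
        self.seen_event_ids.add(event_id)
        return True


processor = WebhookProcessor()
processor.handle("evt_1", 500)   # applied
processor.handle("evt_1", 500)   # duplicate delivery, ignored
```

The duplicate check is a few lines and easy to lose in a rewrite that only knows “apply the payment to the balance.”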

The skill being argued for is classification, not conservatism.

The junior dev mistake is assuming everything is intentional. The AI-hype mistake is the same thing at scale. Your blog post isn’t “be scared of AI” — it’s “develop the judgment to know which bucket you’re in before you pull the trigger.”

Which honestly maps to a broader engineering maturity point — the same developer who can’t tell empirical from intentional code probably also can’t tell a good abstraction from a leaky one. AI just makes the consequences faster and weirder.

Right - work towards the spec, say OpenAPI for code generation, but don’t swallow the advice that all code is immutable now until you’re there.

7:40 AM Yes. Pragmatic and sequenced correctly.

The actual practice being argued for:

* Invest in specs that make regeneration safe: OpenAPI, JSON Schema, TLA+, property-based tests, contract tests
* The spec investment is what earns you the right to regenerate
* Until you have that, you’re not regenerating from a spec, you’re just hoping the AI’s priors match your empirical discoveries

The maturity curve is:

* No spec → treat all code as authoritative, read every diff
* Partial spec → know exactly which parts are covered, regenerate only those
* Strong spec → earned regeneration rights on covered surface area

The dangerous middle is the whole point: teams that have some OpenAPI coverage and conclude all their API code is now intentional. That’s where the silent empirical knowledge gets vaporized.
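
A rough way to picture the “partial spec” stage is a gate that only treats a change as regenerable when it falls inside the surface a spec actually covers. The sketch below is an assumption-laden illustration in Python: the path patterns in `SPEC_COVERED` and the function `split_by_coverage` are hypothetical, and deciding what a spec truly covers is the hard part this code does not solve.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence, Tuple

# Hypothetical: paths whose behavior is fully pinned down by a spec
# (for example, handlers generated from an OpenAPI document).
SPEC_COVERED: Sequence[str] = ("api/generated/*.py", "schemas/*.json")


def split_by_coverage(
    changed_files: Iterable[str],
    covered_patterns: Sequence[str] = SPEC_COVERED,
) -> Tuple[List[str], List[str]]:
    """Split changed paths into (regenerable, needs_human_review).

    Anything not explicitly covered defaults to human review, which mirrors
    the "no spec, read every diff" end of the curve.
    """
    regenerable: List[str] = []
    needs_review: List[str] = []
    for path in changed_files:
        # Note: "*" in fnmatch patterns also matches "/" characters.
        if any(fnmatchcase(path, pattern) for pattern in covered_patterns):
            regenerable.append(path)
        else:
            needs_review.append(path)
    return regenerable, needs_review


regenerable, needs_review = split_by_coverage(
    ["api/generated/users.py", "api/retry.py"]
)
```

The design choice worth noticing is the default: an unlisted path lands in `needs_review`, so partial coverage cannot silently promote the rest of the codebase to “intentional.”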

Good blog post bumper sticker candidate: The spec earns the regeneration.

“It’s a maturity curve and you ain’t on it!”

7:41 AM Ha. That’s the title.

Here’s a point I want to make in the article;

So many folks, Jensen Huang included, are saying coding is dead. Don’t get a CS degree. We’re not going to evaluate coding skills in interviews. Learn to guide the AI and not code.

I suspect 95% of this advice comes from people in four camps:

1. They’re vibe coding apps, not engineering services.
2. They’re trying to sell you something.
3. They really are in the top 1% of minds who can learn from reading and prompting.
4. (The biggest group) These are people who learned the old way, and are biased by that knowledge.

A thing that’s muffled by this whole debate is the age-old adage that people learn by doing. If I were advising a young person who was trying to get into the industry, I would probably still tell them to spend their time like this:

* Write 50% of code with minimal AI involvement. Learn what it’s like to troubleshoot library compatibility issues with Stack Overflow and issue searching, not by asking AI.
* 30% AI assisted: get AI’s help in writing code, learn to autocomplete, maybe how to prompt AI to add small features.
* 10-20% vibe coding.

7:51 AM Strong framework. A few reactions:

The four camps are sharp but camp 4 is underrated even by you. The people who learned the old way aren’t just biased — they’re unknowingly drawing on empirical knowledge they can’t articulate. When a senior dev gets a bad feeling about an AI suggestion, that intuition was bought with years of 2am debugging sessions. You cannot prompt your way to that prior.

The learning-by-doing point is undersold in your outline. The specific thing that gets lost:

* Stack Overflow/issues searching teaches you that problems have histories: bugs have reporters, workarounds, regressions, reopens. AI flattens all of that into a confident answer.
* Debugging library compatibility manually teaches you dependency graphs are real and adversarial. AI pretends they’re not.
* Writing code without AI teaches you to feel complexity accumulate. Vibe coders don’t feel it until it’s too late.

The percentages are right but the sequencing matters more than the ratio. The 50% manual phase has to come first, for years, before the other two are even useful. Someone who starts at 80% vibe coding and tries to work backwards will never develop the classifier for authoritative vs empirical code, which is exactly your earlier point.

Provocative addition to your four camps: Camp 5 — people who are genuinely excellent at vibe coding because they did the 50% manual phase and don’t realize that’s why it works for them.

I mean, I put together that list, but the fourth is the only one I really care about. The other three are filler, really.

7:52 AM Fair. Drop the other three then. They’re defensive throat-clearing.

The fourth camp is the whole argument and it’s actually a compassionate one — these aren’t bad actors, they’re victims of their own competence. The Curse of Knowledge applied to an entire industry cohort simultaneously.

The specific mechanism worth naming:

* They learned to code manually
* That process installed intuitions they can’t introspect on
* AI works great for them because those intuitions are doing the invisible work
* They genuinely cannot see what the beginner is missing because they never had to see it

It’s the same reason great athletes make terrible coaches. The knowledge is procedural and embodied, not declarative and transferable.

And the cruel irony for your blog post — the people most loudly saying “coding is dead, just vibe” are probably the people whose accumulated empirical knowledge makes them most immune to the consequences of that advice. They can vibe code safely. The beginner following their advice cannot.

Jensen Huang isn’t going to write a timing-vulnerable auth handler and not notice. The 22 year old who skipped the manual phase will, and won’t.

Or a sort of example or argument could be something like:

You need to practice - you could sit there and do logic problems all day, or you could code.

7:55 AM Yes. Coding is the logic problems.

The exercise is the point, not the output. And historically we’ve always known this — nobody defends CS curricula by saying “you’ll use bubble sort in production.” You implement bubble sort because sorting a list by hand installs the right mental models.

The specific thing coding-as-exercise builds that logic puzzles don’t:

* Reading other people’s imperfect decisions and understanding why
* Feeling the feedback loop between design choices and debugging pain
* Learning that errors have causes, not just fixes
* Building the empirical intuition for “this feels wrong” before you can articulate why

The AI shortcut here is particularly damaging because the beginner doesn’t just skip the exercise, they skip it while believing they’re doing it. Leetcode with AI assistance feels like coding practice. It isn’t. It’s watching someone else do pushups.

Bumper sticker: Vibe coding is watching someone else do your pushups.

One of the things I realize I contribute to my team isn’t just output. In some weird way, as a lead engineer, I’ve realized my bosses almost don’t even care about my output; they care about being able to deploy my accumulated knowledge at times.

Even five years from now, you’re going to have to go digging through GitHub issues, or read code to figure out exactly what it does.

Another datapoint - so often today for immature software you read the documentation but then have to read the code to see what it actually does. The how matters when engineering. At scale.

7:58 AM This is actually two related but distinct points worth separating in the article:

The lead engineer point:

* Your value isn’t throughput, it’s accumulated judgment available on demand
* That judgment was built by reading code, debugging, hitting walls
* It’s not stored in your fingers, it’s stored in your pattern library of how things go wrong
* You cannot build that pattern library as a vibe coder because you never saw the raw failure modes

The GitHub issues / read the code point:

* This is a permanent feature of software, not a transitional one
* Documentation describes intended behavior, code describes actual behavior, and the delta is where all the interesting problems live
* That delta is also where empirical knowledge lives: the weird conditional that looks wrong but is actually handling a vendor bug from 2019
* AI will confidently explain what code should do, not what it actually does

These connect to your earlier framework cleanly: the person who can’t read code can’t audit the empirical knowledge embedded in a codebase. They’re blind to the most important information in the system.

Five years from now the GitHub issue isn’t going away. Neither is the need for someone who can read it and understand why the decision made sense at the time.

There are so many blog posts on here lol but yeah so many problems aren’t bugs per se but just like known edge cases or intentional differences that will never be fully encoded in the documentation.

8:00 AM Yes. And this is actually a named concept that deserves more airtime:

Chesterton’s Fence in code form.

Don’t remove the weird conditional you don’t understand. It’s load-bearing in ways the documentation will never tell you.

The categories of undocumented-but-intentional:

* Known edge cases the team decided to handle silently
* Behavior that differs from spec because the spec was wrong
* Workarounds for upstream bugs that were never fixed
* Performance decisions that look like bugs
* Security decisions that look like over-engineering
* “We tried the clean version, it broke prod, we reverted”: lives only in git blame and institutional memory

The AI problem here is specific and brutal: AI will look at that weird conditional and either confidently explain it incorrectly, or suggest removing it as dead code. It has no access to the GitHub issue, the Slack thread, the postmortem, or the engineer who left in 2021.

And nobody wrote it down because at the time everyone knew. That’s how institutional knowledge always works.

This is actually your strongest argument against “coding is dead” for the practitioner level. The code isn’t just instructions for the computer. It’s the only remaining record of decisions made by people who are gone. Someone has to be able to read it.

Or like tons of changes of “I fixed this obvious thing” and a senior engineer coming back with “that doesn’t work the way you think it does, that’s why it’s all broken.”

8:01 AM Yes. That’s Chesterton’s Fence as a recurring workplace tragedy.

And it has a specific shape:

1. Junior dev or AI sees obviously wrong code
2. Fixes the obvious thing
3. Everything breaks in production in a non-obvious way
4. Senior engineer says “yeah we tried that in 2019”
5. The fix gets reverted
6. Nobody writes down why

And now the weird code is back but the institutional memory of why it got fixed and why the fix got reverted is even thinner than before.

The AI version of this is particularly bad because:

* AI is extremely confident about obvious fixes
* It has no access to the revert history
* It will suggest the same fix every time you ask
* It will provide a plausible-sounding rationale that isn’t the real reason

The git blame on that code now says “AI assisted refactor” and the knowledge of why it’s wrong is completely gone.

This is actually a new failure mode that didn’t exist before — previously the junior dev who made the mistake at least learned from it. They were in the postmortem. They felt the pain. AI doesn’t accumulate that scar tissue and neither does the developer who just accepted the suggestion.

The institutional knowledge destruction rate just went up and nobody is talking about it.

The one liner is - lots of code is encoded spec, take it out and the most abstract way to represent it is still probably the code

8:02 AM Yes. Perfect.

The code is the most compressed valid representation of the knowledge.

Not compressed like a zip file — compressed like a proof. Every weird conditional is load-bearing. The English version is longer, lossier, and less precise. The “higher level spec” is a downgrade.

This is why the regeneration fantasy fails at the root — it assumes there exists a more abstract representation that is also more correct. For empirical code that representation does not exist and cannot exist. The C is already the abstraction. The hardware behavior beneath it is worse.

Connects directly back to your opening principle. The code is the spec. State transitions all the way down.

And there’s been this push to say, we don’t need libraries anymore because we can just write the code on the fly, but part of the argument I want to make is the opposite we need to be able to find ways to abstract away the unimportant parts of our code, so we can change the important parts intentionally, and libraries are probably still a good way to do that?

8:11 AM Yes. And this inverts the “no libraries” argument cleanly.

The “generate it on the fly” crowd is saying:

* Libraries are a dependency risk
* AI can just write the code
* Therefore skip the library

But they have the abstraction direction backwards:

Libraries aren’t just code you don’t have to write. They’re empirical knowledge you don’t have to rediscover.

* moment.js / date-fns encodes decades of timezone edge cases
* An HTTP client library encodes correct retry, redirect, and connection pooling behavior
* A crypto library encodes timing-safe comparisons you will get wrong

Generating these on the fly doesn’t eliminate the empirical knowledge requirement. It just silently skips it and hands you code that looks right.
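
The timing-safe comparison is a small example of this. Below is a minimal Python sketch contrasting a naive signature check with one that uses the standard library’s `hmac.compare_digest`. The function names and the webhook-style signature scheme are assumptions for illustration, not a recommendation for any particular protocol.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the payload (illustrative signing scheme)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_naive(secret: bytes, payload: bytes, signature: str) -> bool:
    # Looks right and returns the right answer, but ordinary string equality
    # may stop at the first differing character, so response time can leak
    # how much of the signature an attacker has guessed correctly.
    return sign(secret, payload) == signature


def verify_constant_time(secret: bytes, payload: bytes, signature: str) -> bool:
    # hmac.compare_digest is designed so the comparison time does not
    # depend on where the two values differ.
    expected = sign(secret, payload).encode()
    return hmac.compare_digest(expected, signature.encode())


good_signature = sign(b"secret-key", b"request body")
verify_constant_time(b"secret-key", b"request body", good_signature)
```

Both functions pass the same functional tests, which is the point: the difference between them is exactly the kind of knowledge a test suite does not surface and a library already carries.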

The actual principle for your blog post:

* Abstract away the solved/intentional parts via libraries
* Keeps your authoritative/empirical surface area small and legible
* So you can focus deliberate attention on the parts that are actually specific to your problem

The maturity move is shrinking the authoritative footprint, not expanding it by regenerating everything. A codebase where the empirical knowledge is in well-chosen, well-audited libraries and your code is thin intentional glue is more amenable to AI assistance, not less.

Libraries are how you earn the right to vibe code the rest.

Maybe a subnote that - the thought leadership out there may be legitimately fear, but it also may be describing the world we’re going to in five years, not the world we have today. If there is a spec language we can use, great, but we don’t have that. But then I do wonder (different blog post): what’s the difference between a spec language and a library or framework? Here we go in full circles.

8:15 AM Ha. Yes. That circle is real and it’s not a bug in your thinking.

The quick version of why they’re the same thing:

* A framework is an executable spec with opinions baked in
* A spec language is a framework without the runtime
* A library is a spec for a bounded problem domain
* They’re all attempts to encode empirical knowledge in a reusable, auditable form

The “spec language” future the thought leaders are describing is probably just… better typed, more declarative frameworks. Which we’ve been building since COBOL.

On the fear vs prophecy point — worth a paragraph in the article because it’s generous and probably true:

* Jensen Huang isn’t necessarily wrong, he’s probably just early
* The world where you prompt from a verified spec language and trust the output could exist
* But the transition period is where junior devs are being advised to skip the skills they’d need to survive if it doesn’t arrive on schedule
* And nobody knows the schedule

The dangerous advice isn’t “this is where we’re going”; it’s “we’re already there, skip the fundamentals.”

The separate blog post question is genuinely good though. The spec/library/framework convergence is undertheorized. Might be the more interesting intellectual contribution actually.

But what’s weird is - we’ve been disagreeing about libraries and frameworks forever. It feels very much like, everything is changing, everything stays the same <- is this a specific quote from somewhere?

8:16 AM It’s not a specific quote with a clean attribution — it’s more of a floating cultural aphorism.

Closest candidates:

* “The more things change, the more they stay the same”: Jean-Baptiste Alphonse Karr, 1849, French. “Plus ça change, plus c’est la même chose”
* Ecclesiastes “there is nothing new under the sun” is the ancient version
* Heraclitus “everything changes and nothing stands still” is the opposite tension, which is maybe more interesting

But your instinct is right that there’s something specifically apt here: the libraries vs frameworks debate is genuinely decades old:

* Unix vs monolith, 1970s
* Libraries vs frameworks (inversion of control), 1990s
* Microservices vs monolith, 2010s
* Libraries vs AI generation, 2020s

Each cycle the new thing claims to obsolete the old debate. Each cycle the old tradeoffs reassert themselves in new clothes.

The specific thing that stays the same: the tension between encapsulated empirical knowledge and flexible composition. That’s the actual debate every time. The technology changes, the tradeoff doesn’t.

Which loops back to your whole article — the fundamentals being declared dead are just the current surface of permanent underlying tradeoffs. Learning the fundamentals is learning to see the tradeoff, not the surface.

I don’t think there’s a legitimate argument here, and I guess everyone is trying to do this,

But I’m still having trouble steel manning, like, “plain English is better than spec”. Or “we work by prompting, not changing specs”.

I get how in the interim, prompting to change lots of code is good. But wouldn’t it be much more, I dunno, productive? Clear? To change a written spec rather than throw away prompts?

8:19 AM Yeah I can’t steel man it either. And I’ve tried.

The best attempts:

* Speed argument: prompting is faster than maintaining a spec. True in the short term, collapses at scale.
* Accessibility argument: non-engineers can prompt, can’t write specs. True, but that’s a different product category entirely, not engineering.
* Spec debt argument: specs go stale too. True, but this is “maintenance is hard”, not “maintenance is wrong”.

All three fall apart under the same pressure: what happens when something breaks?

Prompt history is not a spec. It’s not diffable, not auditable, not transferable. You cannot onboard an engineer with a folder of old prompts. You cannot do a postmortem with them. You cannot ask “why does it do this” and get an answer.

The deeper problem with steel manning it — prompting to change code is just a worse version of changing code. It’s indirect, lossy, and non-deterministic. The only advantage is accessibility for people who can’t write code, which loops back to your earlier point — that’s vibe coding apps, not engineering services.

The spec vs prompt question probably has a clean answer: prompts are how you explore, specs are how you commit. Anyone conflating the two is either building something that doesn’t need to last, or hasn’t hit the wall yet.

One steel man could be - a lot of modern code requires experimentation to write. Especially machine learning code: write it, test it, change things, do that again, and most of your experiments don’t work. If you create a small enough surface area, and it is specced out, you can generate that code from scratch every time.

To which my response is - then your code isn’t the programming code, it’s the experimentation spec, and that code should be intentionally changed and all that jazz.

Boris Cherny says coding is solved, but I would argue his coding-in-a-loop process is just higher level code.

8:26 AM Yes. Exactly.

Boris’s “solved” loop is:

1. Prompt
2. Evaluate result
3. Prompt again

But the evaluation criteria are the real code. The thing that decides if the experiment worked: that’s the spec. That’s what’s being intentionally maintained. That’s what encodes the empirical knowledge about what “good” looks like.

ML makes this super explicit:

* The model weights are intentional/empirical code
* The training loop is authoritative code
* The evaluation metrics are the spec
* The experiments are intentional/throwaway

The part that’s “solved” is the throwaway part. Nobody is claiming eval metrics are solved. Nobody is claiming loss function design is solved.

The steel man actually proves your point. The reason small-surface-area ML experiment code is safely regenerable is precisely because someone did the hard work of:

* Speccing the evaluation criteria
* Encoding the empirical knowledge in the training pipeline
* Making the experiment surface area intentionally thin

The vibe coding of the experiment layer is downstream of serious engineering work. It looks like prompting all the way down. It isn’t.

Same circle again — you earn the right to regenerate by doing the hard spec work first. Boris is describing the reward, not the path.
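A minimal Python sketch of that split, under loud assumptions: every name here (`EVAL_CASES`, `MIN_ACCURACY`, `accuracy`, `passes_spec`, and the two candidate functions) is hypothetical and invented for illustration, not taken from anyone’s real workflow. The labelled cases and the threshold are the part you change intentionally. The candidates are the part you can regenerate and throw away.

```python
from typing import Callable

# The spec: labelled cases plus the bar a candidate must clear.
# Hypothetical data. In a real project this is the intentionally
# maintained part, changed deliberately and reviewed carefully.
EVAL_CASES: list[tuple[str, bool]] = [
    ("win a free prize now", True),
    ("meeting moved to 3pm", False),
    ("free money, click here", True),
    ("lunch tomorrow?", False),
]
MIN_ACCURACY = 0.75


def accuracy(candidate: Callable[[str], bool]) -> float:
    # Fraction of labelled cases the candidate classifies correctly.
    correct = sum(1 for text, label in EVAL_CASES if candidate(text) == label)
    return correct / len(EVAL_CASES)


def passes_spec(candidate: Callable[[str], bool]) -> bool:
    # The gate every experiment goes through, however it was written.
    return accuracy(candidate) >= MIN_ACCURACY


# Throwaway experiments: cheap to regenerate, never edited by hand.
def candidate_a(text: str) -> bool:
    return "free" in text


def candidate_b(text: str) -> bool:
    return len(text) > 100
```

Here `candidate_a` clears the bar and `candidate_b` does not, and neither result depends on who or what wrote the candidate. Changing `EVAL_CASES` or `MIN_ACCURACY` changes what “worked” means, which is why that is the part that deserves a deliberate, reviewed diff.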

I also wonder, when Boris said coding is solved, how much of his process is specific to Claude Code?

8:31 AM Ha. Very pointed question.

The short version: Claude Code is an extremely sophisticated consumer of accumulated engineering knowledge, not a replacement for it.

Specific things Claude Code leans on invisibly:

* The entire open source ecosystem: libraries encoding decades of empirical knowledge
* Language specs, type systems, compilers: authoritative code all the way down
* Stack Overflow, GitHub issues, RFCs: the empirical record Claude trained on
* Boris’s own accumulated judgment about whether the output is good

Claude Code working well for Boris is not evidence that coding is solved. It’s evidence that:

* Boris has excellent evaluation instincts from years of manual coding
* The problem domains he’s working in have rich training data
* The libraries he’s using are well-specced and well-represented in training

The uncomfortable version: Claude Code is a senior engineer accelerant, not a junior engineer replacement. It works best for people who least need it.

Which connects back to your earlier point about camp 4 — Boris may be genuinely unable to see what he’s bringing to the interaction that a beginner wouldn’t. His taste, his evaluation, his “that’s wrong” instinct — that’s doing enormous invisible work.

We’re going to look back at the time that we said that we’re going to vibe code straight to production without reading the code as a kind of fever dream. (This idea by the way complements, it does not conflict with my other notion that in the world of vibe coding reading code is an anti-pattern)
You are not Anthropic. When Anthropic says they have 3 engineers generate a million lines of code, they’re optimizing for things you probably can’t optimize for. Anthropic is still living in the ZIRP world - relatively speaking.
The nature of how humans interact with this stuff matters.
1. People learn by doing, not watching someone else do.
2. It is often easier to do than to read and make sure someone else did well.
Assuming that tokens are never unlimited, it’s probably better to be a developer with a workflow that inherently uses the right amount of tokens - marathon rather than sprint/rest/sprint/rest. Stay in the flow state.
While talking about coordinating 10 agents at a time, did we forget that humans are almost universally bad at multitasking? Is Gastown for anyone but the 2% of folks that can multitask effectively?
For all the things that have changed with the advent of LLMs, there’s insufficient talk about all the things that haven’t. Developers have always been able to output more code than we could review. So why do we all of a sudden think we need new review processes?

Use AI for everything except writing code.

I think this has two values. The first is, it keeps your head in the game and lets you maintain some development skills. The second is.

So let’s review the past 10 years of Software Engineering best practices.

Keep MRs small. Yes, with AI you can now create a 4,000 line MR, but most engineers could create MRs that were too big before AI relatively easily as well. Keep MRs under ~400 lines of code. Code isn’t just about getting the syntax right, which I admit LLMs are pretty great at; it’s also about ensuring that everyone understands the intent.