My AI Adoption Journey

(mitchellh.com)

388 points | by anurag 8 hours ago

35 comments

  • noosphr 19 minutes ago
    I've been building systems like what the OP is using since pgt3 came out.

    This is the honeymoon phase. You're learning the ins and outs of the specific model you're using and becoming more productive. It's magical. Nothing can stop you. Then you might not be improving as fast as you did at the start, but things are getting better every day. Or maybe every week. But it's heaps better than doing it by hand because you have so much mental capacity left.

    Then a new release comes up. An arbitrary fraction of your hard earned intuition is not only useless but actively harmful to getting good results with the new models. Worse you will never know which part it is without unlearning everything you learned and starting over again.

    I've had to learn the quirks of three generations of frontier families now. It's not worth the hassle. I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months. Copy and paste is the universal interface and being able to do surgery on the chat history is still better than whatever tooling is out there.

    Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.

    • tudelo 3 minutes ago
      First off, appreciate you sharing your perspective. I just have a few questions.

      > I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.

      Can you expand more on what you mean by that? I'm a bit of a noob on llm enabled dev work. Do you mean that you will kick off new sessions and provide a context that you manage yourself instead of relying on a longer running session to keep relevant information?

      > Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.

      I appreciate your insight but I'm failing to understand how exactly knowing these tools increases performance of llms. Is it because you can more precisely direct them via prompts?

  • libraryofbabel 5 hours ago
    This is such a lovely balanced thoughtful refreshingly hype-free post to read. 2025 really was the year when things shifted and many first-rate developers (often previously AI skeptics, as Mitchell was) found the tools had actually got good enough that they could incorporate AI agents into their workflows.

    It's a shame that AI coding tools have become such a polarizing issue among developers. I understand the reasons, but I wish there had been a smoother path to this future. The early LLMs like GPT-3 could sort of code enough for it to look like there was a lot of potential, and so there was a lot of hype to drum up investment and a lot of promises made that weren't really viable with the tech as it was then. This created a large number of AI skeptics (of whom I was one, for a while) and a whole bunch of cynicism and suspicion and resistance amongst a large swathe of developers. But could it have been different? It seems a lot of transformative new tech is fated to evolve this way. Early aircraft were extremely unreliable and dangerous and not yet worthy of the promises being made about them, but eventually with enough evolution and lessons learned we got the Douglas DC-3, and then in the end the 747.

    If you're a developer who still doesn't believe that AI tools are useful, I would recommend you go read Mitchell's post, and give Claude Code a trial run like he did. Try and forget about the annoying hype and the vibe-coding influencers and the noise and just treat it like any new tool you might put through its paces. There are many important conversations about AI to be had, it has plenty of downsides, but a proper discussion begins with close engagement with the tools.

    • keyle 3 hours ago
      Architects went from drawing everything on paper, to using CAD products over a generation. That's a lot of years! They're still called architects.

      Our tooling just had a refresh in less than 3 years and it leaves heads spinning. People are confused, fighting for or against it. Torn even between 2025 to 2026. I know I was.

      People need a way to describe it from 'agentic coding' to 'vibe coding' to 'modern AI assisted stack'.

      We don't call architects 'vibe architects' even though they copy-paste 4/5th of your next house and use a library of things in their work!

      We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...

      When was the last time you reviewed the machine code produced by a compiler? ...

      The real issue this industry is facing, is the phenomenal speed of change. But what are we really doing? That's right, programming.

      • AlotOfReading 1 hour ago
        Don't take this as criticizing LLMs as a whole, but architects also don't call themselves engineers. Engineers are an entirely distinct set of roles that among other things validate the plan in its totality, not only the "new" 1/5th. Our job spans both of these.

        "Architect" is actually a whole career progression of people with different responsibilities. The bottom rung used to be the draftsmen, people usually without formal education who did the actual drawing. Then you had the juniors, mid-levels, seniors, principals, and partners who each oversaw different aspects. The architects with their name on the building were already issuing high level guidance before the transition instead of doing their own drawings.

            When was the last time you reviewed the machine code produced by a compiler?
        
        Last week, to sanity check some code written by an LLM.
        • throwup238 57 minutes ago
          > Engineers are an entirely distinct set of roles that among other things validate the plan in its totality, not only the "new" 1/5th. Our job spans both of these.

          Where this analogy breaks down is that the work you’re describing is done by Professional Engineers that have strict licensing and are (criminally) liable for the end result of the plans they approve.

          That is an entirely different role from the army of civil, mechanical, and electrical engineers (some who are PEs and some who are not) who do most of the work for the principal engineer/designated engineer/engineer of record, that have to trust building codes and tools like FEA/FEM that then get final approval from the most senior PE. I don’t think the analogy works, as software engineers rarely report to that kind of hierarchy. Architects of Record on construction projects are usually licensed with their own licensing organization too, with layers of licensed and unlicensed people working for them.

          • AlotOfReading 32 minutes ago
            That diversity of roles is what "among other things" was meant to convey. My job at least isn't terribly different, except that licensing doesn't exist and I don't get an actual stamp. My company (and possibly me depending on the facts of the situation) is simply liable if I do something egregious that results in someone being hurt.
          • blibble 25 minutes ago
            > Where this analogy breaks down is that the work you’re describing is done by Professional Engineers that have strict licensing and are (criminally) liable for the end result of the plans they approve.

            there are plenty of software engineers that work in regulated industries, with individual licensing, criminal liability, and the ability to be struck off and banned from the industry by the regulator

            ... such as myself

      • allworms 1 hour ago
        > We don't call architects 'vibe architects' even though they copy-paste 4/5th of your next house and use a library of things in their work!

        > We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...

        > When was the last time you reviewed the machine code produced by a compiler?

        Sure, because those are categorically different. You are describing shortcuts of two classes: boilerplate (library of things) and (deterministic/intentional) automation. Vibe coding doesn't use either of those things. The LLM agents involved might use them, but the vibe coder doesn't.

        Vibe coding is delegation, which is a completely different class of shortcut or "tool" use. If an architect delegates all their work to interns, directs outcomes based on whims not principals, and doesn't actually know what the interns are delivering, yeah, I think it would be fair to call them a vibe architect.

        We didn't have that term before, so we usually just call those people "arrogant pricks" or "terrible bosses". I'm not super familiar but I feel like Steve Jobs was pretty famously that way - thus if he was an engineer, he was a vibe engineer. But don't let this last point detract from the message, which is that you're describing things which are not really even similar to vibe coding.

    • datsci_est_2015 1 hour ago
      I skimmed over it, and didn’t find any discussion of:

        - Pull requests
        - Merge requests
        - Code review
      
      I feel like I’m taking crazy pills. Are SWE supposed to move away from code review, one of the core activities for the profession? Code review is as fundamental for SWE as double entry is for accounting.

      Yes, we know that functional code can get generated at incredible speeds. Yes, we know that apps and what not can be bootstrapped from nothing by “agentic coding”.

      We need to read this code, right? How can I deliver code to my company without security and reliability guarantees that, at their core, come from me knowing what I’m delivering line-by-line?

      • AloysB 22 minutes ago
        Give it a read, he mentions briefly how he uses for PR triages and resolving GH issues.

        He doesn't go in details, but there is a bit:

        > Issue and PR triage/review. Agents are good at using gh (GitHub CLI), so I manually scripted a quick way to spin up a bunch in parallel to triage issues. I would NOT allow agents to respond, I just wanted reports the next day to try to guide me towards high value or low effort tasks.

        > More specifically, I would start each day by taking the results of my prior night's triage agents, filter them manually to find the issues that an agent will almost certainly solve well, and then keep them going in the background (one at a time, not in parallel).

        This is a short excerpt, this article is worth reading. Very grounded and balanced.

      • bthornbury 1 hour ago
        Either really comprehensive tests (that you read) or read it. Usually i find you can skim most of it, but like in core sections like billing or something you gotta really review it. The models still make mistakes.
      • Quarrelsome 1 hour ago
        we're talking about _this_ post? He specifically said he only runs one agent, so sure he probably reviews the code or as he stated finds means of auto-verifying what the agent does (giving the agent a way to self-verify as part of its loop).
      • tptacek 1 hour ago
        So read the code.
        • datsci_est_2015 1 hour ago
          Cool, code review continues to be one of the biggest bottlenecks in our org, with or without agentic AI pumping out 1k LOC per hour.
          • tptacek 1 hour ago
            Ok? You still have to read the code.
        • IhateAI 1 hour ago
          I didn't get into creating software so I could read plagiarism laundering machines output. Sorry, miss me with these takes. I love using my keyboard, and my brain.
          • acessoproibido 31 minutes ago
            So then type the code as well and read it after. Why are you mad
      • codyb 58 minutes ago
        I think this is the crux of why, when used as an enhancement to solo productivity, you'll have a pretty strict upper bound on productivity gains given that it takes experienced engineers to review code that goes out at scale.

        That being said, software quality seems to be decreasing, or maybe it's just cause I use a lot of software in a somewhat locked down state with adblockers and the rest.

        Although, that wouldn't explain just how badly they've murdered the once lovely iTunes (now Apple Music) user interface. (And why does CMD-C not pick up anything 15% of the time I use it lately...)

        Anyways, digressions aside... the complexity in software development is generally in the organizational side. You have actual users, and then you have people who talk to those users and try to see what they like and don't like in order to distill that into product requirements which then have to be architected, and coordinated (both huge time sinks) across several teams.

        Even if you cut out 100% of the development time, you'd still be left with 80% of the timeline.

        Over time though... you'll probably see people doing what I do all day (which is move around among many repositories (although I've yet to use the AI much, got my Cursor license recently and am gonna spin up some POCs that I want to see soon)), enabled by their use of AI to quickly grasp what's happening in the repo, and the appropriate places to make changes.

        Enabling developers to complete features from tip to tail across deep, many pronged service architectures would could bring project time down drastically and bring project management, and cross team coordination costs down tremendously.

        Similarly, in big companies, the hand is often barely aware at best of the foot. And space exploration is a serious challenge. Often folk know exactly one step away, and rely on well established async communication channels which also only know one step further. Principal engineers seem to know large amounts about finite spaces and are often in the dark small hops away to things like the internal tooling for the systems they're maintaining (and often not particularly great at coming in to new spaces and thinking with the same perspective... no we don't need individual micro services for every 12 request a month admin api group we want to set up).

        Once systems can take a feature proposal and lay out concrete plans which each little kingdom can give a thumbs up or thumbs down to for further modifications, you can again reduce exploration, coordination, and architecture time down.

        Sadly, seems like User Experience design is an often terribly neglected part of our profession. I love the memes about an engineer building the perfect interface like a water pitcher only for the person to position it weirdly in order to get a pour out of the fill hole or something. Lemme guess how many users you actually talked to (often zero), and how many layers of distillation occurred before you received a micro picture feature request that ends up being build and taking input from engineers with no macro understanding of a user's actual needs, or day to day.

        And who often are much more interested in perfecting some little algorithm thank thinking about enabling others.

        So my money is on money flowing to... - People who can actually verify system integrity, and can fight fires and bugs (but a lot of bug fixing will eventually becoming prompting?) - Multi-talented individuals who can say... interact with users well enough to understand their needs as well as do a decent job verifying system architecture and security

        It's outside of coding where I haven't seen much... I guess people use it to more quickly scaffold up expense reports, or generate mocks. So, lots of white collar stuff. But... it's not like the experience of shopping at the supermarket has changed, or going to the movies, or much of anything else.

    • majormajor 2 hours ago
      GPT-4 showed the potential but the automated workflows (context management, loops, test-running) and pure execution speed to handle all that "reasoning"/workflows (remember watching characters pop in slowly in GPT-4 streaming API response calls) are gamechangers.

      The workflow automation and better (and model-directed) context management are all obvious in retrospect but a lot of people (like myself) were instead focused on IDE integration and such vs `grep` and the like. Maybe multi-agent with task boards is the next thing, but it feels like that might also start to outrun the ability to sensibly design and test new features for non-greenfield/non-port projects. Who knows yet.

      I think it's still very valuable for someone to dig in to the underlying models periodically (insomuch as the APIs even expose the same level of raw stuff anymore) to get a feeling for what's reliable to one-shot vs what's easily correctable by a "ran the tests, saw it was wrong, fixed it" loop. If you don't have a good sense of that, it's easy to get overambitious and end up with something you don't like if you're the sort of person who cares at all about what the code looks like.

    • beoberha 4 hours ago
      Your sentiment resonates with me a lot. I wonder what we’ll consider the inflection point 10 years from now. It seemed like the zeitgeist was screaming about scaling limits and running out of training data, then we got Claude code, sonnet 4.5, then Opus 4.5 and no ones looked back since.
      • libraryofbabel 3 hours ago
        I wonder too. It might be that progress on the underlying models is going to plateau, or it might be that we haven't yet reached what in retrospect will be the biggest inflection point. Technological developments can seem to make sense in hindsight as a story of continuous progress when the dust has settled and we can write and tell the history, but when you go back and look at the full range of voices in the historical sources you realize just how deeply nothing was clear to anyone at all at the time it was happening because everyone was hurtling into the unknown future with a fog of war in front of them. In 1910 I'd say it would have been perfectly reasonable to predict airplanes would remain a terrifying curiosity reserved for daredevils only (and people did); or conversely, in the 1960s a lot of commentators thought that the future of passenger air travel in the 70s and 80s would be supersonic jets. I keep this in mind and don't really pay too much attention to over-confident predictions about the technological future.
    • chrysoprace 1 hour ago
      I think for a lot of people the turn off is the constant churn and the hype cycle. For a lot of people, they just want to get things done and not have to constantly keep on top of what's new or SOTA. Are we still using MCPs or are we using Skills now? Not long ago you had to know MCP or you'd be left behind and you definitely need to know MCP UI or you'll be left behind. I think. It just becomes really tiring, especially with all the FUD.

      I'm embracing LLMs but I think I've had to just pick a happy medium and stick with Claude Code with MCPs until somebody figures out a legitimate way to use the Claude subscription with open source tools like OpenCode, then I'll move over to that. Or if a company provides a model that's as good value that can be used with OpenCode.

    • Lich 3 hours ago
      Is there any reason to use Claude Code specifically over Codex or Gemini? I’ve found the both Codex and Gemini similar in results, but I never tried Claude because of I keep hearing usage runs out so fast on pro plans and there’s no free trial for the CLI.
      • libraryofbabel 3 hours ago
        I mostly mentioned Claude Code because it's what Mitchell first tried according to his post, and it's what I personally use. From what I hear Codex is pretty comparable; it has a lot of fans. There are definitely some differences and strengths and weaknesses of both the CLIs and the underlying LLMs that others who use more than one tool might want to weight in on, but they're all fairly comparable. (Although, we'll see how the new models released from Anthropic and OpenAI today stack up.) Codex and Gemini CLI are basically Claude Code clones with different LLMs behind them, after all.
      • majormajor 2 hours ago
        IME Gemini is pretty slow in comparison to Claude - but hey, it's super cheap at least.

        But that speed makes a pretty significant difference in experience.

        If you wait a couple minutes and then give the model a bunch of feedback about what you want done differently, and then have to wait again, it gets annoying fast.

        If the feedback loop is much tighter things feel much more engaging. Cursor is also good at this (investigate and plan using slower/pricier models, implement using fast+cheap ones).

    • arcxi 4 hours ago
      but annoying hype is exactly the issue with AI in my eyes. I get it's a useful tool in moderation and all, but I also experience that management values speed and quantity of delivery above all else, and hype-driven as they are I fear they will run this industry to the ground and we as users and customers will have to deal with the world where software is permanently broken as a giant pile of unmaintainable vibe code and no experienced junior developers to boot.
      • acessoproibido 29 minutes ago
        >management values speed and quantity of delivery above all else

        I don't know about you but this has been the case for my entire career. Mgmt never gave a shit about beautiful code or tech debt or maintainability or how enlightened I felt writing code.

    • whatifnomoney 5 hours ago
      [dead]
  • mjr00 7 hours ago
    > Break down sessions into separate clear, actionable tasks. Don't try to "draw the owl" in one mega session.

    This is the key one I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and they'll do this successfully, but the scope is so small there's not much point in using an LLM. On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.

    A lot of successful LLM adoption for code is finding this sweet spot. Overly specific instructions don't make you feel productive, and overly broad instructions you end up redoing too much of the work.

    • sho_hn 6 hours ago
      This is actually an aspect of using AI tools I really enjoy: Forming an educated intuition about what the tool is good at, and tastefully framing and scoping the tasks I give it to get better results.

      It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.

      • bandrami 35 minutes ago
        "Become better at intuiting the behavior of this non-deterministic black box oracle maintained by a third party" just isn't a strong professional development sell for me, personally. If the future of writing software is chasing what a model trainer has done with no ability to actually change that myself I don't think that's going to be interesting to nearly as many people.
      • allenu 6 hours ago
        I agree that framing and scoping tasks is becoming a real joy. The great thing about this strategy is there's a point at which you can scope something small enough that it's hard for the AI to get it wrong and it's easy enough for you as a human to comprehend what it's done and verify that it's correct.

        I'm starting to think of projects now as a tree structure where the overall architecture of the system is the main trunk and from there you have the sub-modules, and eventually you get to implementations of functions and classes. The goal of the human in working with the coding agent is to have full editorial control of the main trunk and main sub-modules and delegate as much of the smaller branches as possible.

        Sometimes you're still working out the higher-level architecture, too, and you can use the agent to prototype the smaller bits and pieces which will inform the decisions you make about how the higher-level stuff should operate.

        • audience_mem 4 hours ago
          [Edit: I may have been replying to another comment in my head as now I re-read it and I'm not sure I've said the same thing as you have. Oh well.]

          I agree. This is how I see it too. It's more like a shortcut to an end result that's very similar (or much better) than I would've reached through typing it myself.

          The other day I did realise that I'm using my experience to steer it away from bad decisions a lot more than I noticed. It feels like it does all the real work, but I have to remember it's my/our (decades of) experience writing code playing a part also.

          I'm genuinely confused when people come in at this point and say that it's impossible to do this and produce good output and end results.

      • meowface 4 hours ago
        I feel the same, but, also, within like three years this might look very different. Maybe you'll give the full end-to-end goal upfront and it just polls you when it needs clarification or wants to suggest alternatives, and it self-manages cleanly self-delegating.

        Or maybe something quite different but where these early era agentic tooling strategies still become either unneeded or even actively detrimental.

        • zxor 4 hours ago
          > it just polls you when it needs clarification

          I think anyone who has worked on a serious software project would say, this means it would be polling you constantly.

          Even if we posit that an LLM is equivalent to a human, humans constantly clarify requirements/architecture. IMO on both of those fronts the correct path often reveals itself over time, rather than being knowable from the start.

          So in this scenario it seems like you'd be dealing with constant pings and need to really make sure you're understanding of the project is growing with the LLM's development efforts as well.

          To me this seems like the best-case of the current technology, the models have been getting better and better at doing what you tell it in small chunks but you still need to be deciding what it should be doing. These chunks don't feel as though they're getting bigger unless you're willing to accept slop.

    • mapontosevenths 1 hour ago
      > Break down sessions into separate clear, actionable tasks.

      What this misses, of course, is that you can just have the agent do this too. Agent's are great at making project plans, especially if you give them a template to follow.

    • iamacyborg 5 hours ago
      > On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.

      Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me to describe the detailed app I was attempting to build.

      Taking it piece by piece in Claude Code has been significantly more successful.

    • jedbrooke 6 hours ago
      so many times I catch myself asking a coding agent e.g “please print the output” and it will update the file with “print (output)”.

      Maybe there’s something about not having to context switch between natural language and code just makes it _feel_ easier sometimes

    • kcorbitt 4 hours ago
      And lately, the sweet spot has been moving upwards every 6-8 weeks with the model release cycle.
    • apercu 5 hours ago
      I actually enjoy writing specifications. So much so that I made it a large part of my consulting work for a huge part of my career. SO it makes sense that working with Gen-AI that way is enjoyable for me.

      The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am going to get output that isn't 30% wrong.

    • oulipo2 5 hours ago
      Exactly. The LLMs are quite good at "code inpainting", eg "give me the outline/constraints/rules and I'll fill-in the blanks"

      But not so good at making (robust) new features out of the blue

  • EastLondonCoder 6 hours ago
    This matches my experience, especially "don’t draw the owl" and the harness-engineering idea.

    The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).

    What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.

    Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.

    • protocolture 2 hours ago
      >The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).

      Yeah I would get patterns where, initial prototypes were promising, then we developed something that was 90% close to design goals, and then as we try to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.

      So I would start getting to 90% and basically starting a new project with that as the baseline to add to.

    • ricardobeat 3 hours ago
      No harm meant, but your writing is very reminiscent of an LLM. It is great actually, there is just something about it - "it wasn't.. it was", "it stopped being.. and started". Claude and ChatGPT seem to love these juxtapositions. The triplets on every other sentence. I think you are a couple em-dashes away from being accused of being a bot.

      These patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable.

      • pixl97 1 hour ago
        >makes the human race seem quite easily hackable.

        If the human race were not hackable then society would not exist, we'd be the unchanging crocodiles of the last few hundred million years.

        Have you ever found yourself speaking a meme? Had a catchy toon repeating in your head? Started spouting nation state level propaganda? Found yourself in crowd trying to burn a witch at the stake?

        Hacking the flow of human thought isn't that hard, especially across populations. Hacking any one particular humans thoughts is harder unless you have a lot of information on them.

    • bdangubic 6 hours ago
      This is the most common answer from people that are rocking and rolling with AI tools but I cannot help but wonder how is this different from how we should have built software all along. I know I have been (after 10+ years…)
      • EastLondonCoder 5 hours ago
        I think you are right, the secret is that there is no secret. The projects I have been involved with thats been most successful was using these techniques. I also think experience helps because you develop a sense that very quickly knows if the model wants to go in a wonky direction and how a good spec looks like.

        With where the models are right now you still need a human in the loop to make sure you end up with code you (and your organisation) actually understands. The bottle neck has gone from writing code to reading code.

        • sksisksbbs 4 hours ago
          > The bottle neck has gone from writing code to reading code.

          This has always been the bottleneck. Reviewing code is much harder and gets worse results than writing it, which is why reviewing AI code is not very efficient. The time required to understand code far outstrips the time to type it.

          Most devs don’t do thorough reviews. Check the variable names seem ok, make sure there’s no obvious typos, ask for a comment and call it good. For a trusted teammate this is actually ok and why they’re so valuable! For an AI, it’s a slot machine and trusting it is equivalent to letting your coworkers/users do your job so you can personally move faster.

  • scarrilho 3 hours ago
    With so much noise in the AI world and constant model updates (just today GPT-5.3-Codex and Claude Opus 4.6 were announced), this was a really refreshing read. It’s easy to relate to his phased approach to finding real value in tooling and not just hype. There are solid insights and practical tips here. I’m increasingly convinced that the best way not to get overwhelmed is to set clear expectations for what you want to achieve with AI and tailor how you use it to work for you, rather than trying to chase every new headline. Very refreshing.
  • sho_hn 7 hours ago
    Much more pragmatic and less performative than other posts hitting frontpage. Good article.
    • alterom 6 hours ago
      Finally, a step-by-step guide for even the skeptics to try to see what spot the LLM tools have in their workflows, without hype or magic like I vibe-coded an entire OS, and you can too!.
  • senko 5 hours ago
    For those wondering how that looks in practice, here's one of OP's past blog posts describing a coding session to implement a non-trivial feature: https://mitchellh.com/writing/non-trivial-vibing (covered on HN here: https://news.ycombinator.com/item?id=45549434)
  • energy123 46 minutes ago
    > Immediately cease trying to perform meaningful work via a chatbot.

    That depends on your budget. To work within my pro plan's codex limits, I attach the codebase as a single file to various chat windows (GPT 5.2 Thinking - Heavy) and ask it to find bugs/plan a feature/etc. Then I copy the dense tasklist from chat to codex for implementation. This reduces the tokens that codex burns.

    Also don't sleep on GPT 5.2 Pro. That model is a beast for planning.

  • keyle 5 hours ago
    It's amusing how everyone seems to be going through the same journey.

    I do run multiple models at once now. On different parts of the code base.

    I focus solely on the less boring tasks for myself and outsource all of the slam dunk and then review. Often use another model to validate the previous models work while doing so myself.

    I do git reset still quite often but I find more ways to not get to that point by knowing the tools better and better.

    Autocompleting our brains! What a crazy time.

  • seemaze 1 hour ago
    What a lovely read. Thank you for sharing your experience.

    The human-agent relationship described in the article made me wonder: are natural, or experienced, managers having more success with AI as subordinates than people without managerial skill? Are AI agents enormously different than arbitrary contractors half a world away where the only communication is daily text exchanges?

  • bthornbury 1 hour ago
    AI is getting to the game-changing point. We need more hand-written reflections on how individuals are managing to get productivity gains for real (not a vibe coded app) software engineering.
  • underdeserver 6 hours ago
    > At a bare minimum, the agent must have the ability to: read files, execute programs, and make HTTP requests.

    That's one very short step removed from Simon Willison's lethal trifecta.

    • smj-edison 1 hour ago
      I will say one thing Claude does is it doesn't run a command until you approve it, and you can choose between a one-time approval and always allowing a command's pattern. I usually approve the simple commands like `zig build test`, since I'm not particularly worried about the test harness. I believe it also scopes file reading by default to the current directory.
    • recursive 5 hours ago
      I'm definitely not running that on my machine.
      • margalabargala 4 hours ago
        The way this is generally implemented is that agents have the ability to request a tool use. Then you confirm "yes, you may run this grep".
  • rthak 1 hour ago
    Now that the Nasdaq crashes, people switch from the stick to the carrot:

    "Please let us sit down and have a reasonable conversation! I was a skeptic, too, but if all skeptics did what I did, they would come to Jesus as well! Oh, and pay the monthly Anthropic tithe!"

  • cal_dent 5 hours ago
    Just wanted to say that was a nice and very grounded write up; and as a result very informative. Thank you. More stuff like this is a breath of fresh air in a landscape that has veered into hyperbole territory both in the for and against ai sides
  • josh-sematic 3 hours ago
    This is yet one more indication to me that the winds have shifted with regards to the utility of the “agent” paradigm of coding with an LLM. With all the talk around Opus 4.5 I decided to finally make the jump there myself and haven’t yet been disappointed (though admittedly I’m starting it on some pretty straightforward stuff).
  • e40 2 hours ago
    For those of working on large proprietary, in fringe languages as well, what can we do? Upload all the source code to the cloud model? I am really wary of giving it a million lines of code it’s never seen.
  • zubspace 5 hours ago
    It's so sad that we're the ones who have to tell the agent how to improve by extending agent.md or whatever. I constantly have to tell it what I don't like or what can be improved or need to request clarifications or alternative solutions.

    This is what's so annoying about it. It's like a child that does the same errors again and again.

    But couldn't it adjust itself with the goal of reducing the error bit by bit? Wouldn't this lead to the ultimate agent who can read your mind? That would be awesome.

    • pixl97 1 hour ago
      While this may be the end goal, I do think humanity needs to take the trip along with AI to this point.

      A mind reading ultimate agent sounds more like a deity, and there are more than enough fables warning one not to create gods because things tend to go bad. Pumping out ASI too quickly will cause massive destabilization and horrific war. Not sure who against really either. Could be us humans against the ASI, could be the rich humans with ASI against us. Anyway about it, it would represent a massive change in the world order.

    • audience_mem 4 hours ago
      > It's so sad that we're the ones who have to tell the agent how to improve by extending agent.md or whatever.

      Your improvement is someone else's code smell. There's no absolute right or wrong way to write code, and that's coming from someone who definitely thinks there's a right way. But it's my right way.

      Anyway, I don't know why you'd expect it to write code the way you like after it's been trained on the whole of the Internet & the the RLHF labelers' preferences and the reward model.

      Putting some words in AGENTS.md hardly seems like the most annoying thing.

      tip: Add a /fix command that tells it to fix $1 and then update AGENTS.md with the text that'd stop it from making that mistake in the future. Use your nearest LLM to tweak that prompt. It's a good timesaver.

    • cactusplant7374 4 hours ago
      It is not a mind reader. I enjoy giving it feedback because it shows I am in charge of the engineering.

      I also love using it for research for upcoming features. Research + pick a solution + implement. It happens so fast.

  • raphinou 6 hours ago
    I recently also reflected on the evolution of my use of ai in programming. Same evolution, other path. If anyone is interested: https://www.asfaload.com/blog/ai_use/
  • davidw 5 hours ago
    This seems like a pretty reasonable approach that charts a course between skepticism and "it's a miracle".

    I wonder how much all this costs on a monthly basis?

    • tptacek 5 hours ago
      As long as we're on the same page that what he's describing is itself a miracle.
  • butler14 6 hours ago
    I'd be interested to know what agents you're using. You mentioned Claude and GPT in passing, but don't actually talk about which you're using or for which tasks.
  • mwigdahl 7 hours ago
    Good article! I especially liked the approach to replicate manual commits with the agent. I did not do that when learning but I suspect I'd have been much better off if I had.
  • taikahessu 4 hours ago
    Do you have any ideas on how to harness AI to only change specific parts of a system or workpiece? Like "I consider this part 80/100 done and only make 'meaningful' or 'new contributions' here" ...?
  • fix4fun 6 hours ago
    Thanks for sharing your experiences :)

    You mentioned "harness engineering". How do you approach building "actual programmed tools" (like screenshot scripts) specifically for an LLM's consumption rather than a human's? Are there specific output formats or constraints you’ve found most effective?

  • henry_bone 3 hours ago
    LLMs are not for me. My position is that the advantage we humans have over the rest of the natural world, is our minds. Our ability to think, create and express ideas is what separates us from the rest of the animal kingdom. Once we give that over to "thinking" machines, we weaken ourselves, both individually and as a species.

    That said, I've given it a go. I used zed, which I think is a pretty great tool. I bought a pro subscription and used the built in agent with Claude Sonnet 4.x and Opus. I'm a Rails developer in my day job, and, like MitchellH and many others, found out fairly quickly that tasks for the LLM need to be quite specific and discrete. The agent is great a renames and minor refactors, but my preferred use of the agent was to get it to write RSpec tests once I'd written something like a controller or service object.

    And generally, the LLM agent does a pretty great job of this.

    But here's the rub: I found that I was losing the ability to write rspec.

    I went to do it manually and found myself trying to remember API calls and approaches required to write some specs. The feeling of skill leaving me was quite sobering and marked my abandonment of LLMs and Zed, and my return to neovim, agent-free.

    The thing is, this is a common experience generally. If you don't use it, you lose it. It applies to all things: fitness, language (natural or otherwise), skills of all kinds. Why should it not apply to thinking itself.

    Now you may write me and my experience off as that of a lesser mind, and that you won't have such a problem. You've been doing it so long that it's "hard-wired in" by now. Perhaps.

    It's in our nature to take the path of least resistance, to seek ease and convenience at every turn. We've certainly given away our privacy and anonymity so that we can pay for things with our phones and send email for "free".

    LLMs are the ultimate convenience. A peer or slave mind that we can use to do our thinking and our work for us. Some believe that the LLM represents a local maxima, that the approach can't get much better. I dunno, but as AI improves, we will hand over more and more thinking and work to it. To do otherwise would be to go against our very nature and every other choice we've made so far.

    But it's not for me. I'm no MitchellH, and I'm probably better off performing the mundane activities of my work, as well as the creative ones, so as to preserve my hard-won knowledge and skills.

    YMMV

    I'll leave off with the quote that resonates the most with me as I contemplate AI:-

    "I say your civilization, because as soon as we started thinking for you, it really became our civilization, which is, of course, what this is all about." -- Agent Smith "The Matrix"

    • FeteCommuniste 1 hour ago
      AI adoption is being heavily pushed at my work and personally I do use it, but only for the really "boilerplate-y" kinds of code I've already written hundreds of times before. I see it as a way to offload the more "typing-intensive" parts of coding (where the bottleneck is literally just my WPM on the keyboard) so I have more time to spend on the trickier "thinking-intensive" parts.
  • dudewhocodes 4 hours ago
    Refreshing to read a balanced opinion, from a person who has significant experience and grounding in the real world.
  • 0xbadcafebee 5 hours ago
    > I'm not [yet?] running multiple agents, and currently don't really want to

    This is the main reason to use AI agents, though: multitasking. If I'm working on some Terraform changes and I fire off an agent loop, I know it's going to take a while for it to produce something working. In the meantime I'm waiting for it to come back and pretend it's finished (really I'll have to fix it), so I start another agent on something else. I flip back and forth between the finished runs as they notify me. At the end of the day I have 5 things finished rather than two.

    The "agent" doesn't have to be anything special either. Anything you can run in a VM or container (vscode w/copilot chat, any cli tool, etc) so you can enable YOLO mode.

  • polyrand 5 hours ago
    > a period of inefficiency

    I think this is something people ignore, and is significant. The only way to get good at coding with LLMs is actually trying to do it. Even if it's inefficient or slower at first. It's just another skill to develop [0].

    And it's not really about using all the plugins and features available. In fact, many plugins and features are counter-productive. Just learn how to prompt and steer the LLM better.

    [0]: https://ricardoanderegg.com/posts/getting-better-coding-llms...

  • apercu 5 hours ago
    I find it interesting that this thread is full of pragmatic posts that seem to honestly reflect the real limits of current Gen-Ai.

    Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".

  • jonathanstrange 5 hours ago
    There are so many stories about how people use agentic AI but they rarely post how much they spend. Before I can even consider it, I need to know how it will cost me per month. I'm currently using one pro subscription and it's already quite expensive for me. What are people doing, burning hundreds of dollars per month? Do they also evaluate how much value they get out of it?
    • JoshuaDavid 5 hours ago
      Low hundreds ($190 for me) but yes.
    • latchkey 5 hours ago
      I quickly run out of the JetBrains AI 35 monthly credits for $300/yr and spending an additional $5-10/day on top of that, mostly for Claude.

      I just recently added in Codex, since it comes with my $20/mo subscription to GPT and that's lowering my Claude credit usage significantly... until I hit those limits at some point.

      2012 + 300 + 5~200... so about $1500-$1600/year.

      It is 100% worth it for what I'm building right now, but my fear is that I'll take a break from coding and then I'm paying for something I'm not using with the subscriptions.

      I'd prefer to move to a model where I'm paying for compute time as I use it, instead of worrying about tokens/credits.

  • whatifnomoney 5 hours ago
    [dead]
  • jeffrallen 5 hours ago
    > babysitting my kind of stupid and yet mysteriously productive robot friend

    LOL, been there, done that. It is much less frustrating and demoralizing than babysitting your kind of stupid colleague though. (Thankfully, I don't have any of those anymore. But at previous big companies? Oh man, if only their commits were ONLY as bad as a bad AI commit.)

  • vonneumannstan 7 hours ago
    For the AI skeptics reading this, there is an overwhelming probability that Mitchell is a better developer than you. If he gets value out of these tools you should think about why you can't.
    • jorvi 6 hours ago
      The AI skeptics instead stick to hard data, which so far shows a 19% reduction in productivity when using AI.
      • simonw 5 hours ago
        https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

        > 1) We do NOT provide evidence that AI systems do not currently speed up many or most software developers. Clarification: We do not claim that our developers or repositories represent a majority or plurality of software development work.

        > 2) We do NOT provide evidence that AI systems do not speed up individuals or groups in domains other than software development. Clarification: We only study software development.

        > 3) We do NOT provide evidence that AI systems in the near future will not speed up developers in our exact setting. Clarification: Progress is difficult to predict, and there has been substantial AI progress over the past five years [3].

        > 4) We do NOT provide evidence that there are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Clarification: Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.

      • raincole 4 hours ago
        There is no such hard data. It's just research done on 16 developers using Cursor and Sonnet 3.5.
    • recursive 5 hours ago
      Perhaps that's the reason. Maybe I'm just not a good enough developer. But that's still not actionable. It's not like I never considered being a better developer.
    • z0r 6 hours ago
      I'm not as good as Fabrice Bellard either but I don't let that bother me as I go about my day.
    • dakiol 7 hours ago
      Don't get it. What's the relation between Mitchell being a "better" developer than most of us (and better is always relative, but that's another story) and getting value out of AI? That's like saying Bezos is a way better businessman than you, so you should really hear his tips about becoming a billionaire. No sense (because what works for him probably doesn't work for you)

      Tons of respect for Mitchell. I think you are doing him a disservice with these kinds of comments.

      • tux1968 6 hours ago
        Maybe you disagree with it, but it seems like a pretty straightforward argument: A lot of us dismiss AI because "it can't be trusted to do as good a job as me". The OP is arguing that someone, who can do better than most of us, disagrees with this line of thinking. And if we have respect for his abilities, and recognize them as better than our own, we should perhaps re-assess our own rationale in dismissing the utility of AI assistance. If he can get value out of it, surely we can too if we don't argue ourselves out of giving it a fair shake. The flip side of that argument might be that you have to be a much better programmer than most of us are, to properly extract value out of the AI... maybe it's only useful in the hands of a real expert.
        • jplusequalt 6 hours ago
          >A lot of us dismiss AI because "it can't be trusted to do as good a job as me"

          Some of us enjoy learning how systems work, and derive satisfaction from the feeling of doing something hard, and feel that AI removes that satisfaction. If I wanted to have something else write the code, I would focus on becoming a product manager, or a technical lead. But as is, this is a craft, and I very much enjoy the autonomy that comes with being able to use this skill and grow it.

          • mitchellh 6 hours ago
            There is no dichotomy of craft and AI.

            I consider myself a craftsman as well. AI gives me the ability to focus on the parts I both enjoy working on and that demand the most craftsmanship. A lot of what I use AI for and show in the blog isn’t coding at all, but a way to allow me to spend more time coding.

            This reads like you maybe didn’t read the blog post, so I’ll mention there many examples there.

          • fizx 6 hours ago
            I enjoy Japanese joinery, but for some reason the housing market doesn't.
          • tux1968 6 hours ago
            Nobody is trying to talk anyone out of their hobby or artisanal creativeness. A lot of people enjoy walking, even after the invention of the automobile. There's nothing wrong with that, there are even times when it's the much more efficient choice. But in the context of say transporting packages across the country... it's not really relevant how much you enjoy one or the other; only one of them can get the job done in a reasonable amount of time. And we can assume that's the context and spirit of the OP's argument.
            • mold_aid 6 hours ago
              >Nobody is trying to talk anyone out of their hobby or artisanal creativeness.

              Well, yes, they are, some folks don't think "here's how I use AI" and "I'm a craftsman!" are consistent. Seems like maybe OP should consider whether "AI is a tool, why can't you use it right" isn't begging the question.

              Is this going to be the new rhetorical trick, to say "oh hey surely we can all agree I have reasonable goals! And to the extent they're reasonable you are unreasonable for not adopting them"?

            • jplusequalt 6 hours ago
              >But in the context of say transporting packages across the country... it's not really relevant how much you enjoy one or the other; only one of them can get the job done in a reasonable amount of time.

              I think one of the more frustrating aspects of this whole debate is this idea that software development pre-AI was too "slow", despite the fact that no other kind of engineering has nearly the same turn around time as software engineering does (nor does they have the same return on investment!).

              I just end up rolling my eyes when people use this argument. To me it feels like favoring productivity over everything else.

    • mold_aid 6 hours ago
      "Why can't you be more like your brother Mitchell?"
  • xyst 6 hours ago
    [flagged]
    • dang 6 hours ago
      "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

      "Don't be snarky."

      "Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."

      https://news.ycombinator.com/newsguidelines.html

  • therein 7 hours ago
    [flagged]
    • dang 6 hours ago
      Ok, but please don't post unsubstantive comments to Hacker News.
    • stronglikedan 7 hours ago
      most AI adoption journeys are
    • alterom 6 hours ago
      >Underwhelming

      Which is why I like this article. It's realistic in terms of describing the value-propositio of LLM-based coding assist tools (aka, AI agents).

      The fact that it's underwhelming compared to the hype we see every day is a very, very good sign that it's practical.