• designated_fridge@lemmy.world · 4 days ago

    Interesting!

    I have gone through my ups and downs with it. Lately I’ve been more and more convinced. I use Claude Code (Opus 4.5) hooked up to our internal Atlassian and Google Drive MCPs. Of course I still have to do a lot of writing up front (gathering requirements, writing context, etc.), but instead of spending two days coding, I’ll spend half a day on that and then kick off a Claude Code agent to carry it out.
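
    The MCP hookup itself is just configuration. Ours points at internal servers, so the names and commands below are placeholders rather than our real setup, but a project-scoped .mcp.json that Claude Code picks up looks roughly like this:

```json
{
  "mcpServers": {
    "atlassian": {
      "command": "node",
      "args": ["/path/to/internal/atlassian-mcp/index.js"]
    },
    "google-drive": {
      "command": "node",
      "args": ["/path/to/internal/gdrive-mcp/index.js"]
    }
  }
}
```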

    I then do a self-review when it’s done, and a colleague reviews as well before merge.

    And I don’t use it for architectural work. Rather for features, fixing tech debt, etc.

    This also has the benefit that our Jira tickets are 1000x better than they were in the pre-LLM era.

    • very_well_lost@lemmy.world · edited · 4 days ago

      I’m primarily using Opus 4.5 as well (via Cursor). We’ve tried pointing it at JIRA/Confluence via MCP and just letting the agent do its thing, but we always get terrible results (even when starting with solid requirements and good documentation). Letting an agent run unsupervised always makes a mess.

      We never get code that conforms to the existing style and architecture patterns of our application, no matter how much we fuss with rules files or MCP context. We also frequently end up with solutions that compromise security, performance, or both. Code reviews take longer than they used to (even with CodeRabbit doing a first-pass review of every PR), and critical issues are still sneaking through the review process and out to prod.

      My team has been diligent enough to avoid any major outages so far, but other teams in the organization have had production outages that were all traced back to AI-generated code.

      I’ve managed to carve out a workflow that does at least produce production-ready code, but it’s hardly efficient:

      • Start in plan mode. Define what I need, provide context, and answer any clarifying questions from the model. Once I’m happy with the ‘plan’, I tell Cursor to save a copy of it to a file on my local machine. This is important, because it serves as a rolling checkpoint for when Cursor inevitably crashes.
      • Have the agent generate any unit tests we’ll need to validate the feature when it’s done (there’s a rough sketch of what I mean just after this list).
      • Review the generated unit tests and inevitably rewrite them. Tell Cursor to update the plan based on the changes I’ve made to the tests.
      • Put the AI in “Ask” mode (so it doesn’t touch the code just yet) and tell it to summarize the first step of the plan. This makes sure that the step I care about is in the model’s context window so it doesn’t get confused or over-extend.
      • Pop back to agent mode and tell the model to proceed with step 1 and then STOP.
      • Review the model’s output for any issues. At this stage I’ll frequently point out flaws in the output and have the model correct them.
      • Back to “Ask” mode, summarize the next step of the plan.
      • Execute the next step, review the output, ask for changes, and so on.
      • Repeat until all steps are complete.
      • Run the unit tests and, if there are failures, have the model try to fix them. 50% of the time it fixes the issues; the other 50% it makes an enormous mess and I have to fix it myself.
      • Once the unit tests are all passing, review all of the generated code together to catch any issues I missed earlier (of which there are usually several).
      • When I’m finally satisfied, I tell the agent to create the PR and the rest of the team very carefully reviews it.
      • PR is approved and off we go to QA.
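
      To make the unit-test step less abstract, here’s roughly the kind of test I end up keeping. Everything in it is invented for illustration (a made-up discount feature and made-up names), and the stand-in implementation is only there so the snippet runs on its own; in the real workflow the agent writes that part later, one plan step at a time, against the tests:

```python
# Invented example of an up-front test from the workflow above.
# The feature (a tiered discount) and every name here are hypothetical.
import pytest


def apply_discount(total: float, customer_tier: str) -> float:
    """Stand-in implementation so this sketch runs on its own.

    In the actual workflow this function doesn't exist yet; the agent
    implements it later, step by step, against the tests below.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    return total * 0.9 if customer_tier == "gold" else total


def test_gold_tier_gets_ten_percent_off():
    assert apply_discount(100.0, "gold") == pytest.approx(90.0)


def test_standard_tier_pays_full_price():
    assert apply_discount(100.0, "standard") == pytest.approx(100.0)


def test_negative_totals_are_rejected():
    with pytest.raises(ValueError):
        apply_discount(-5.0, "gold")
```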

      This is almost always slower than if I’d just written the code myself and hadn’t spent all that extra time babysitting the LLM. It’s also slower to debug if QA comes back with issues, because my understanding of the code is now worse than if I’d written it myself.

      I’ve spoken about this in other comments, but I’m going to repeat it again here because I don’t see anyone else talking about it: When you write code yourself, your understanding of that code is always better. Think of it like taking notes. Studies have shown over and over that humans retain information better when they take notes — not because they refer back to those notes later (although that obviously helps), but because by actively engaging with the material while they’re absorbing it, they build more connections in the brain than they would by just passively listening. This is a fundamental feature in how we learn (active is better than passive), and with the rise of code generation, we’re creating a major learning gap.

      There was a time when I could create a new feature and then six months later still remember all of the intimate details of the requirements I followed, the approach I took, and the compromises I had to make. Now? I’m lucky to retain that same information for 3 weeks, and I’m seeing the same in my coworkers.