It’s impossible to avoid debates about Large Language Model (LLM) AI these days, and I’ve been hesitant to step into them when I’ve had little novel to offer over the excellent analysis of people like Gary Marcus, Ed Zitron, and Maria Sukhareva. The longer this has gone on, though, the clearer it’s become that most people aren’t reading research or expert analysis much at all. The noise of AI Boosters (and, heads up, I’ll mostly be using LLMs as the term and not AI in this piece) is so loud, the hype statements of industry CEOs are so routinely treated as factual roadmaps, and the intrusion of those little star icons into every tool we use is so relentless, that stepping back to talk about how to assess and navigate these tools felt important for me to do. Thus, here I am, talking about LLMs on the internet. God help me.
Anyway, let’s get going. This piece will be focused on LLM use by software engineers and engineering leaders, but I’ve worked hard to make this accessible to everyone, perhaps even translatable into other fields I know less about.
What is an LLM?
They are not built like traditional software. They are not spreadsheets. They are not databases. They do not work like humans. They do not sanity check their own work. They don’t think like economists (who would immediately think of inflation) or even ordinary people (who also might take into account price changes). They don’t induce abstract models of the world that they can reliably engage, even when those models are given to them on a silver platter (as with the rules of chess).
Instead, LLMs are built, essentially, to autocomplete. Autocomplete as we know it works surprisingly well, for some purposes, but just isn’t good enough for world model induction.
- Gary Marcus, “LLMs are not like you or me—and never will be”
I’m not going to get too deep into this, as this is one of those subjects others have covered well, but I want to lay my cards on the table and be transparent about what I see these tools as doing. If you disagree with my later opinions, it may be because we see LLMs themselves differently. Should that be the case, let your problems start here.
An LLM is an extremely complex pattern matcher and pattern generator. Saying this can sound like I’m oversimplifying as part of my critique, and that’s fair. These tools are, again, extraordinarily complex. Technically speaking, they’re somewhat of a marvel. They function by taking what you ask (your prompt) and any additional context you provide (access to code files, links to websites, etc.), breaking it all down into tokens, matching those tokens and their order against the token patterns in the model itself, and generating a response by choosing the tokens that most probabilistically fit the inquiry.
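To make that pipeline concrete, here is a deliberately toy sketch of the core loop: break the input into tokens, look up which tokens tend to come next, sample one, repeat. Everything in it (the tiny vocabulary, the hard-coded probabilities, the two-token lookback) is invented for illustration; a real model does this over enormous vocabularies with billions of learned weights, but the shape of the operation is the same.

```python
import random

# Toy illustration only: a real LLM learns next-token probabilities from
# training data; here they are hard-coded to show the shape of the loop.
NEXT_TOKEN_PROBS = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "slept": 0.1},
    ("cat", "sat"): {"on": 0.9, "quietly": 0.1},
    ("sat", "on"): {"the": 0.8, "a": 0.2},
    ("on", "the"): {"mat": 0.7, "roof": 0.3},
}

def generate(prompt_tokens, max_new_tokens=4):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tuple(tokens[-2:])          # this toy only ever looks at the last two tokens
        probs = NEXT_TOKEN_PROBS.get(context)
        if not probs:
            break                             # no matching pattern in the toy table; stop generating
        choices, weights = zip(*probs.items())
        tokens.append(random.choices(choices, weights=weights)[0])
    return " ".join(tokens)

print(generate(["the", "cat"]))  # e.g. "the cat sat on the mat"
```

Notice there is no step where the program checks whether the output is true, useful, or consistent with anything else it has said; it only checks whether the next token is probable.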
What LLMs are not is “artificial intelligence”, or any kind of intelligence at all. LLMs do not learn outside of the training of the initial model, but can give the illusion of learning depending on the size of what the industry refers to as their “context window” — essentially the amount of Stuff related to your inquiry it can retain and reprocess all at once. Exceed the context window, though, and things that it felt like the LLM “learned” are now gone as if they never existed. The thing LLMs are most unambiguously good at, as a result, is natural language processing. Unfortunately, because they can’t learn and don’t have any sort of model of the world or understanding of what those tokens mean, language processing is effectively how LLMs solve every problem.
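A crude way to picture the context window: the model only ever sees the most recent chunk of the conversation, and anything pushed past that boundary is simply gone. The sketch below is hypothetical bookkeeping (the window size and the word-level “tokens” are made up for illustration), but the truncation behavior is the point.

```python
# Hypothetical illustration of a context window. Real models measure theirs
# in thousands of tokens; the truncation behavior is what matters here.
CONTEXT_WINDOW = 8

def visible_context(conversation_tokens):
    """Return only the most recent tokens the model can 'see' this turn."""
    return conversation_tokens[-CONTEXT_WINDOW:]

conversation = "my name is Ada remember that".split()
conversation += "now summarize this unrelated five hundred page report".split()

print(visible_context(conversation))
# ['now', 'summarize', 'this', 'unrelated', 'five', 'hundred', 'page', 'report']
# "Ada" has scrolled out of the window; as far as the model is concerned,
# it was never said.
```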
How to assess an LLM
If you’re a leader or executive in technology, it’s unlikely you aren’t being asked by investors, CEOs, or peers in the industry how you can use LLMs to accelerate your work. Not if you should, but how. You may be lucky enough to have a role, like my own, where you have the ability to thoughtfully chart your way through things. You may not. You may even be one of those executives asking that question of the people you manage. Either way, the question you’re being asked (or asking of others) is to assess these tools and use them for what they’re good at.
How do you assess an LLM, though? How do you know what it’s good at, when you or your team should use them, and when they’re a hindrance and not a boost? Let’s start with how not to assess them.
First, if you read a post or a news article talking about the impact of LLMs, and the source is entirely people like Sam Altman, Dario Amodei, or other AI executives: that is not how to assess these tools. There’s a very mistaken belief that the role of these CEOs is that of a technologist giving facts about what their products are doing. It’s better to read these statements as PR and marketing, aimed at the people who might buy their products or at the investors they need to infuse more money into their cash-burning enterprises. If the hype is coming from inside the house, it has little to no value in understanding what the tools can do today, and what they may do tomorrow.
Second, your personal anecdata is of limited usefulness. If you use an LLM for something a couple of times and get good results, you haven’t proven anything. This isn’t because you’re personally lacking the ability to analyze the tooling, but because these tools are massively complex and truly gigantic pieces of technology. There is simply no way for superficial usage to tell you or your team what the experience at scale of these tools will be.
Third, step back and ask why you’re enthusiastic about these tools. How much of your drive to push their use comes from fear that, if you don’t, others will and they’ll run laps around you? One of the ways the LLM hype cycle has been thrown into overdrive is through fear of being left behind. Hype from the Altmans and Amodeis of the world around being able to stand up applications in hours leads to fear that competitors only need to sprinkle some LLM on their teams to gain dominance. Fear clouds our judgment, and being honest about how much you’re worried about missing the boat will help you assess more clearly.
If that’s how not to assess, then how should you do it? That’s easy: start reading research. Read a lot of it. Keep up on every shred of it you can. Where your personal experiences are too limited, research will describe how the tools perform under rigorously defined conditions, and at higher scale than you can simulate. Further, don’t let Boosters or Hypers convince you the research is bad, outdated, or missing the secret sauce of the tools. No research is perfect, but it’s highly unlikely a Y Combinator commenter has given the work a real peer review. If there’s a conflict between the research and your personal experience, or the statements of AI CEOs… that conflict isn’t proof the research is wrong. Remember: these tools are complex and behave unexpectedly at scale.
I’m extremely LLM skeptical, but I know LLMs are not bad for every use case. How? Because I read the research.
How to navigate today’s LLM tooling: For software engineers
If executives are feeling and passing along the pressure to use LLMs, software engineers are squarely at the business end of that firehose of pressure. While not every company is demanding their use, the fear that any underperformance will be attributed to refusal to LLM up is real. So is the fear that failing to Git Gud will lead to the rapid outmoding of all your other skills. It doesn’t help that there are articles out there by serious engineers saying you’re a straight-out fool for not using LLMs.
Before getting into how to navigate all of this at the workplace and in your career, I’d advise you to do something like I did above for leaders: get deeper into the research. That research will tell you things that can, at minimum, allay your fears that prompt engineering is the full future of development. There is no evidence, at present, to justify the belief that LLMs are a consistent net boost to engineering productivity. While, unlike most LLM use cases, they do have benefits in engineering, those benefits are nowhere near as dramatic as the proclamations that they’ll make you 10 times as productive.
With that fear put aside, let me share some of my own assessments of where LLMs are useful, where they’re neutral, and where they should not be trusted. Why? Because, if you’re facing demands to use the tools, the best you can do for your sanity is use them in ways that have a better chance of being helpful.
Overall, the impact of LLMs in my assessment is effort-shifting, not time-saving. LLMs can ease the effort of getting started on a feature by generating full-stack solutions in one go. One of the reasons managers tend to overstate the benefits of LLMs is because, for a manager with too little focus time and too many meetings, reducing the effort early on by prompting between meetings really can help out. It’s also relatively simple to wipe away a branch that went nowhere and try again. And again. Each time, the effort expended is low, as the tooling iterates through approaches.
This often doesn’t result in a real reduction in time spent on the feature overall, though. While LLMs are optimized for rapidly generating things, one of their downsides is that they generate a lot of things. LLM code can be verbose, repetitive, and chaotic, because LLMs don’t “know” what good code is. They may produce it if their training data has a clean, matching solution to the problem, but the more you push an LLM around a big code base with its own patterns and idioms, the more the generated work piles up code you never would have written yourself. This matters because, once that initial run of code generation is done, you’re going to have to read it all to make sure it’s doing what you want.
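As a purely hypothetical illustration of what that verbosity tends to look like (neither function is from a real model transcript): both versions below pull the emails of active users, but the first is the belt-and-suspenders style generators often default to, while the second is the one-liner a codebase with its own idioms probably wanted. Multiply the difference across a whole feature and the reading burden adds up fast.

```python
# The kind of verbose, defensive output a code generator often produces.
def get_active_user_emails_generated(users):
    active_user_emails = []
    if users is not None:
        for user in users:
            if user is not None:
                if user.get("active") is True:
                    email = user.get("email")
                    if email is not None and email != "":
                        active_user_emails.append(email)
    return active_user_emails

# What the surrounding codebase probably already expresses in one line.
def get_active_user_emails(users):
    return [u["email"] for u in users if u.get("active") and u.get("email")]
```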
Revising, refactoring, and debugging can thus be a lot more work with LLMs if you’re doing your job correctly. The same goes for impacts on the rest of the team as they have to review what you produced. The network impacts of LLM time-to-prototype on the overall team can get out of hand if you don’t do the work necessary to adjust the LLM’s output. That easing and speeding of the early part of the process creates debts that need to be paid down the line. (Also, don’t get me started on LLM-produced UIs and CSS, you’ll be here all day, just like I was when I had to un-break layouts the last time I asked it to build a feature.)
So, the first thing to ask yourself is which phase of work you get the most value from, and where you provide the most value. If refactoring is a joy to you, but getting started is not, then allowing LLMs to kickstart the process may make work feel better. It probably won’t speed things up, but work feeling less onerous has important benefits for us. That’s a great place to use it. Another area where LLMs can be useful is one where code quality is less important: writing automated tests. Allowing the machine to generate Too Much Code in tests has far lower impacts, and testing is one of the most tedious phases of a lot of feature development. If you enjoy getting started, but are feeling pressure to use these anyway, try throwing an LLM at fleshing out your tests.
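As a hypothetical example of why that trade-off can land well (the function and cases below are invented for illustration): generated tests are repetitive in exactly the way generated code tends to be, but repetitive tests are cheap to read, cheap to delete, and still catch regressions.

```python
# Repetitive, boilerplate-heavy tests of the sort an LLM happily churns out.
# The verbosity is mostly harmless here: each case is trivial to verify at a glance.

def normalize_username(name: str) -> str:
    return name.strip().lower()

def test_normalize_username_lowercases():
    assert normalize_username("Ada") == "ada"

def test_normalize_username_strips_whitespace():
    assert normalize_username("  ada  ") == "ada"

def test_normalize_username_leaves_clean_input_alone():
    assert normalize_username("ada") == "ada"

def test_normalize_username_empty_string():
    assert normalize_username("") == ""
```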
Where LLMs are not any good, though, is documentation. This may sound counterintuitive given how many LLM tutorials talk about having them generate plans and produce documentation based on the code you’ve written, but on this, well, why are you booing, I’m right.
LLMs are, again, very verbose. They also hallucinate. When coding, automated tests can corral a lot of those hallucinations, but not so in documentation. Verbosity plus random lies equals likely bad and untrustworthy documentation slipping by. The counterargument you’ll hear is that it’s better to have documentation than not, but this is where I’m going to most ruffle some feathers. You’re better off with no documentation than reams of bad documentation. Especially machine-generated bad documentation.
What’s the goal of documentation? To transfer learnings and context to people in the future. When interacting with a new part of the codebase, or seeing a novel bug, the first recourse is (or, uh, should be) going to the docs to see if someone has explained what’s going on. Is there a runbook? Did someone lay out how to replicate this pattern? Are there clear descriptions of what this API is meant to do?
Now, think about the moment of needing that documentation. Down one path, the “better none than bad” path, you search for docs and find none. Quickly, you can feel confident that an answer doesn’t exist and shift gears into other avenues of research. That speed to determining whether an answer already exists has way more value than many realize. Especially in a crisis like an outage, closing down fruitless avenues quickly allows mitigation efforts to triage and move on. The same goes when developing a new feature. If the answer exists, that’s awesome, but if it doesn’t, you might as well start experimenting yourself, or just asking others on the team if they know things they didn’t document.
Down the other path, the one where every new feature gets five pages of LLMed lists, you’re very likely to find documentation on what you’re searching for. Unfortunately, you’re going to find a lot of it. That means spending time going through it all, trying to understand what it’s saying, and potentially running into hallucinations the generator did not catch. You may find yourself wasting time in a crisis running commands that don’t work, or building a feature while missing key context that would allow you to do it correctly. The existence of those dubious guides will give you and the team confidence that you didn’t miss anything, making bugs more likely to slip past.
Remember what I said about how LLMs are primarily good at natural language processing, but not in having a model of the world to guide that processing? In code, automated tests can give a systematic way for an LLM to correct itself. You can, to some degree, enforce the definitions of business context onto the tools. This is not the same when generating documentation. Because they’re good at language generation, the output will look and sound good. What it won’t be is actually organized for clarity of context delivery.
What I’m saying is that documentation and project plans are communication, but their goal is not primarily to communicate detail. It’s to provide the context necessary to understand that detail, and an LLM’s inability to understand what anything in the code means will lead to documentation that is not organized to communicate that meaning. If what you want is for an LLM to reword documentation, to edit it, and you’ve supplied the structure and context, then it has a chance at helping you. It cannot and will not generate that context, because that is not what the tool does.
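To tie the coding and documentation threads together, here is a minimal sketch of what “enforcing business context through tests” can look like, using an invented discount rule. Any generated implementation either satisfies these assertions or fails visibly; there is no equivalent backstop for generated documentation, which is exactly the problem.

```python
# A hypothetical business rule pinned down as executable assertions.
# A generated implementation must pass these or fail loudly; generated
# documentation gets no such check.

def apply_discount(total_cents: int, is_member: bool) -> int:
    """Members get 10% off orders of $100 or more; everyone else pays full price."""
    if is_member and total_cents >= 100_00:
        return total_cents - total_cents // 10
    return total_cents

def test_member_discount_applies_at_threshold():
    assert apply_discount(100_00, is_member=True) == 90_00

def test_member_discount_not_applied_below_threshold():
    assert apply_discount(99_99, is_member=True) == 99_99

def test_non_members_never_discounted():
    assert apply_discount(500_00, is_member=False) == 500_00
```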
How to navigate today’s LLM tooling: for tech leaders and executives
You’re in charge of an engineering team, or maybe an entire organization, and you’ve effectively set your team up for success through a mix of thoughtful guidance on whether and how to use LLMs, and an approach that provides options without demanding workflows. That’s a great step one, and so consider my first piece of advice to be: read the above section and approach your team in that spirit.
Mission accomplished, right? That was a short section.
Only, wait, no, your job isn’t just to set up the conditions for tactical success, but to think strategically about the success of your org long term. That means, beyond understanding how the tools work, you need to have an eye on where these tools are going. To do that, I strongly encourage you to look beyond the technology and into the economics of the LLM industry.
And by “economics of the LLM industry” I mean the fact that they do not work.
When I and others bring up the extreme cash burn of LLM startups, the defenses tend to fall into either the bucket of “all startups lose money” or “even if the companies go out of business, the technology will still exist”. The problem with both of these defenses is something that plagues all of us in technology: scale.
Yes, startups often lose money, but how much money matters. There’s not an infinite amount of investment capital in the system. There will come a point where reality sets in and the well dries up. The question on cash burn is thus not whether it matters, but at what scale. I strongly advise tech executives like myself to really look into the amount of money in play, and to compare it to every other startup and tech subindustry over all of time. There is a significant phase shift in scale between them, and that phase shift certainly looks like enough to bankrupt a lot of people if the current trajectory holds.
As for the technology continuing to exist, that’s sort of true. Yes, the technical foundations of LLMs and the models themselves will still be around. Stopping there, and assuming everything will simply continue on, misunderstands a few things. The biggest is understanding why the companies are losing money in the first place. Compute costs are a consistent and serious drain on money and resources, and the death of the companies doing all of the major LLM research means no one will be making those more efficient. Essentially all LLM tools lose their providers money today, which means any success your team has using these tools is coming at an artificially reduced cost. When investment capital propping those losses up goes away, the full cost of running a system that bankrupted everyone who provided it will now be on you and your team to pay.
The issue of these tools not being ready for prime time also becomes far more acute if OpenAI and Anthropic shrink or die, and Microsoft, Google, and Amazon pull back investment. Much of what drives LLM adoption today is the belief that these tools will eventually close their gaps and, having already gotten them implemented internally, companies will be ready to leverage them to race ahead. What if they never get better, though? What if GPT-5, a new release almost universally considered a disappointment, is the ceiling? Does a far more expensive version of it, in its current state, bring meaningful value to your organization?
What I’m getting at here is that, as an executive, you need to be thinking a lot more about what adoption of a tool that may skyrocket in price and never get better could do to your org in a world where the bubble has popped. It may seem like a no-brainer to get what value you can out of it now, but today’s new processes become expectations over time. If Claude Code is suddenly too expensive to internally support, and you have an engineering team that’s become reliant on that workflow… what then?
My advice isn’t to avoid their use entirely, but to be more strategic about assessing the risks. Treating LLMs as a guaranteed permanent, always-improving feature of your teams’ workflows could be leading your org into a trap. Treat LLMs like you would other new, unstable technologies and navigate their adoption in a way that doesn’t tie your future success to industry outcomes that may not come to pass.
Wrapping up
I’ve avoided giving too many opinions on whether the code produced by LLMs is valuable enough at all to bother with. That’s intentional, because on that, I risk leaning too much on personal anecdata. That doesn’t mean I’m bullish on the prospect, though. LLM-generated code that’s been carefully revised by an engineer is not a problem, because how you start is less important than how you finish. The issue is that a lot of LLM code isn’t being carefully revised, because the goal of the tools is explicitly to save time. That code, as it goes into production without useful oversight, can cause problems. In my experience, those problems are too common and too severe to advise anyone to rely on these tools for too many parts of their workflow.
If you disagree with the above, that’s fine. It’s here, in the conclusion, because the things I stand by are already said. Whatever value you’re getting from LLMs cannot be separated from their known failure modes, nor from the failure modes of the industry at large. Treating LLMs, as they’re being deployed today, as the guaranteed future is almost certainly folly. Even if it somehow works out, you’re still playing craps with your org’s future.
As engineers and leaders, we aren’t here to break from God’s example and play dice with the universe. We need to drive success today, but only in ways that won’t lay the foundation for disaster tomorrow. We need to be clear-eyed about the territory and avoid making decisions based on fear and hope.
You probably can’t avoid using LLMs entirely. A portion of those uses will even be helpful. Do what you need to for success today, and learn where this technology can be leveraged and where it can’t. Just don’t stop there. Keep your eyes up. Don’t forget the much higher value the people in your organization bring, and don’t buy into marketing hype for a tool that was never going to be able to do everything for everyone. Make choices based on the research, and prepare for a possible AI winter on the horizon. The hardest thing to do in technology is to be responsible in the face of investor and market pressure. And like most of the work people do, being responsible is not something an LLM can do for us.