It just doesn't work. I'm trying to build a simple tool that will let me visualize grid layouts.
It needs to toggle between landscape/portrait, and implement some design strategies so I can see different visualizations of the grid. I asked it to give me a slider to simulate the number of grids.
1st pass, it made something, but it was squished. And toggling between landscape and portrait just made it squish itself the other way, so I couldn't see anything at all.
2nd pass, syntax error.
3rd try I ask it to redo everything from scratch. It now has a working slider, but the landscape/portrait is still broken.
4th try, it manages to fix the landscape/portrait issue, but now the issue is that the controls are behind the display so I have to reload the page.
5th try, it manages to fix this issue, but now it is squished again.
6th try, I ask it to try again from scratch. This time it gives me a syntax error.
This is so frustrating.
You need to be reasonably experienced and guide it.
First, you need to know that Claude will create nonsensical code. On a macro level it's not exactly smart; it just has a lot of contextual static knowledge.
Debugging is not its strongest skill. Most models don't do well at it at all. Opus is able to one-shot "troubleshooting" prompts occasionally, but there's a high probability that it veers off on a tangent if you just tell it to "fix things" based on errors or descriptions. You need to have an idea of what you want fixed.
Another problem is that it can create very convincing-looking - but stupid - code. If you can't guide it, that's almost guaranteed. It can create code that's totally backwards and overly complicated.
If it IS going on a wrong tangent, it's often hopeless to get it back on track. The conversation and context might be polluted. Restart and reframe the prompt and the problems at hand and try again.
I'm not totally sure about the language you are using, but syntax errors typically happen when it "forgets" to update some of the code, and very seldom from just a single file or edit.
I like to create a design.md and think a bit on my own, or maybe prompt to create it with a high-level problem to get going, and make sure it's in the context (and mentioned in the prompts).
Like trying to write with a wet noodle - always off in some way.
Writing the code myself feels way more precise and no less efficient.
While this may be possible, it likely requires a very detailed prompt and/or spec document.
---
Here is an example of something I successfully built with Claude: https://rift-transcription.vercel.app
Apparently I have had over 150 chat sessions related to the research and development of this tool.
- First, we wrote a spec together: https://github.com/Leftium/rift-transcription/blob/main/spec...
- The spec broke down development into major phases. I reviewed detailed plans for each phase before Claude started. I often asked Claude to update these detailed plans before starting. And after implementation, I often had to have Claude fix bugs in the implementation.
- I tried to share the chat session where Claude got the first functional MVP working: https://opncd.ai/share/fXsPn1t1 (unfortunately the shared session is truncated)
---
"AI mistakes you're probably making": https://youtu.be/Jcuig8vhmx4
I think the most relevant point is: AI is best for accelerating development tasks you could do on your own; not new tasks you don't know how to do.
---
Finally: Cloudflare builds OAuth with Claude and publishes all the prompts: https://hw.leftium.com/#/item/44159166
https://github.com/lawless-m/Devolver
it uses hooks to export the session
https://github.com/lawless-m/Devolver/blob/master/HOOKS.md
and then parses the session logs and dumps them out
https://github.com/lawless-m/Devolver/blob/master/JSONL_FORM...
I then have it go to a central location because I use multiple machines and it creates a website so I can see what I've been working on.
Reading these figures now, I think it counts its own prompts - you know, it talks to itself. There's no way I've typed ten thousand prompts on that project. Lord help us.
Like OP, I've been similarly struggling to get as much value from CC (Grok, etc.) as "everyone" else seems to be.
I'm quite curious about the workflow around the spec you link. To me, it looks like quite an extensive amount of work/writing - comparable to or even greater than the coding work. Basically trading writing code files for writing .md files. 150 chat sessions is also nothing to sneeze at.
Would you say that the spec work was significantly faster (pure time) than coding up the project would have been? Or perhaps a less taxing cognitive input?
(I also implemented the previous version with only a high-level, basic understanding of websockets: https://rift-transcription.vercel.app/sherpa)
I think of Claude like a "force-multiplier."
I have been able to implement ideas I previously gave up on. I can test out new ideas much faster.
For example, https://github.com/Leftium/gg started out as 100% hand-crafted code. I wanted gg to be able to print out the expression in the code in addition to the value, like Python's icecream[1]. (It's more useful to get both the value and the variable name/expression.) I previously tried and gave up. Claude helped me add this feature within a few hours.
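Roughly, the icecream-style trick is to capture the expression itself, not just its value. Here is a minimal sketch of that idea in TypeScript (this is not gg's actual implementation; the `dbg` helper and the thunk trick are just an illustration):

```ts
// Sketch only: pass the expression as a thunk so both its source text and its value can be printed.
function dbg<T>(expr: () => T): T {
  const value = expr();
  // Recover the expression text from the arrow function's source ("() => <expr>").
  const source = expr.toString().replace(/^\(\)\s*=>\s*/, "");
  console.log(`${source} = ${JSON.stringify(value)}`);
  return value;
}

// Usage:
const items = [1, 2, 3];
dbg(() => items.length * 2); // logs: items.length * 2 = 6
```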
And now gg has its own virtual dev console optimized for interacting with coding agents. A very useful feature that I would probably not have attempted without Claude. It's taken the "open in editor" feature to a completely new level.
I have implemented other features that I would never have attempted or even thought about. For example, without Claude's assistance https://ws.leftium.com would not have many features like the animated background that resembles the actual sky color.
The 60-minute forecast was on my TODO list for a long time. Claude helped me add it within an afternoon or so.
Note: depending on the complexity of the feature I want to add, the spec varies in its level of detail. Sometimes there is no spec outside Claude's plans in the chat session.
[1]: https://github.com/gruns/icecream
In my experience, trying to make a plan/specs that really match what I want often ends in a struggle with Claude trying to regress to the mean.
Also it’s so easy to write code that I always have tons of ideas I end up implementing that diverge from the original plan…
- It was definitely not one CC session. In fact, this spec is a spin-off of several other specs on several other branches/projects.
- I've actually experienced quite the opposite: I suggest an idea for the spec and Claude says "great idea!" Then I change my mind and go in the opposite direction: "great idea!" again. Once in a while, I have to argue with Claude to get my idea implemented (like adding dependencies to parse into a proper AST instead of regex.)
- One tip: it's very useful to explain the "why" to Claude vs the "what." In fact, if you just explain the why/problem without a specific solution, Claude's suggestions may surprise you!
Yes, this is what the hype says doesn't it?
Or... are they all lying?
A novice was trying to fix a broken Lisp machine by turning the power off and on.
Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”
Knight turned the machine off and on.
The machine worked.
http://www.catb.org/jargon/html/koans.html
Your first one-shot might be a good rough prototype. From there, you continue the conversation with your refinements. While Claude goes and works on that for 15 minutes - you can go and do other work. Or talk with another Claude in another window to make progress on another project.
A good mental model is to imagine you're talking to a remote developer. You need to give them an extremely detailed spec on the first go if you expect them to get it right the first time. Sometimes it's better to explain "this is my grand vision, but how about we first mockup a prototype to see if that's actually how I want it to work". Sometimes Claude will suggest you talk about your plan together first to remove the ambiguities from the plan, or you can encourage Claude to do that with you.
(Also, the remote developer mindset is useful - treat the remote developer with respect, with humanity, and they're more likely to be helpful towards you and motivated to align with your goals.)
Consider that in an hour or two of conversation, you now have your app, completed, fully debugged... and not once did you look at the code, and you spent half of that time catching up on your other tasks. That's vibe coding.
HN - posts and comments - is full of it.
And my personal experiments with the free chatbots contradict it ofc.
As for listening to Hacker News... yeah, this is one of the worst places (well, Mastodon is worse) and HN is surprisingly AI-doomerish. I don't check in here very often anymore, and as of this week I just get Claude to summarize HN headlines as a morning podcast for me instead.
My own experience: my first few uses of Claude in Dec 2024 seemed rubbish, I didn't get it. Then one day I asked it to make me a search engine. The one shot from that wasn't perfect, but it worked, and I saw it build it in front of my eyes. That was the moment & I kept iterating on it. I haven't used Google or Kagi in almost a year now.
Anyway, hope it helps, but if not using AI makes you feel more comfortable, go with what fills your life with more value & meaning & enjoyment.
So you have the resources to index the whole www on your own?
My pattern-matching brain says this is normal for hype. It's a good product, but nowhere near the level you read about in some places (like HN in this case).
You haven't even said what programming language you're trying to use, or even what platform.
It sounds to me like you didn't do much planning, you just gave it a prompt to build away.
My preferred method of building things, and I've built a lot of things using Claude, is to have a discussion with it in the chatbot. The back and forth of exploring the idea gives you a more solid idea of what you're looking for. Once we've established the idea I get it to write a spec and a plan.
I have this as an instruction in my profile.
> When we're discussing a coding project, don't produce code unless asked to. We discuss projects here, Claude Code does the actual coding. When we're ready, put all the documents in a zip file for easy transfer (downloading files one at a time and uploading them is not fun on a phone). Include a CONTENTS.md describing the contents and where to start.
So I'll give you this one as an example. It's a Qwen-driven system monitor.
https://github.com/lawless-m/Marvinous
here are the documents generated in chat before trying to build anything
https://github.com/lawless-m/Marvinous/tree/master/ai-monito...
At this point I can usually say "The instructions are in the zip, read the contents and make a start." and the first pass mostly works.
It seems you're trying to tell the tool to do everything in one shot. That is a very wrong approach, not just with Claude but with everything (you ask a woman for a date and if you do not get laid in five minutes, you failed?). When I program something manually and it compiles, I expect it to be wrong. You have to iron it out and debug it.
Instead of that:
1. Divide the work into independent units. I call these "steps".
2. Subdivide steps into "subsets". You work on those subsets in an isolated manner.
3. Use an immediate-mode GUI library like Dear ImGui to prototype your tool. Translating it into something else once it works is quite easy for LLMs.
4. Visualize everything. You do not need to see the code, but you need to visualize every single thing you ask it to do.
5. Tell Claude what you want and why you want it, and update the documentation constantly.
6. Use git to make rock-solid steps that Claude will not touch once they work, so you can revert changes or ask the AI to explore a branch, explaining how you did something and that you want something similar.
7. Do not modify code that already works rock solid. Copy it into another step, leaving the original step as a reference, and modify it there.
8. Use logs. Lots of logs. For every step, create text logs and debug problems by giving Claude the logs to read (see the sketch after this list).
9. Use screenshots. Claude can read screenshots. If you visualize everything, Claude can see the errors too.
10. Use asserts, lots of asserts, just like with manual programming.
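To make points 8 and 10 concrete, here is a minimal TypeScript sketch; the function, the log file name, and the layout formula are all made up for illustration:

```ts
// Sketch: one plain-text log per "step" that you can hand back to Claude, plus loud asserts.
import { appendFileSync } from "node:fs";
import assert from "node:assert";

const LOG_FILE = "step-03-grid-layout.log"; // hypothetical log file for the current step

function log(message: string): void {
  appendFileSync(LOG_FILE, `${new Date().toISOString()} ${message}\n`);
}

// Hypothetical unit of work for this step: lay out N cells in a grid.
function layoutGrid(cells: number, landscape: boolean): { cols: number; rows: number } {
  assert(cells > 0, `cells must be positive, got ${cells}`);
  const cols = landscape
    ? Math.ceil(Math.sqrt(cells * 1.5))
    : Math.ceil(Math.sqrt(cells / 1.5));
  const rows = Math.ceil(cells / cols);
  log(`layoutGrid(cells=${cells}, landscape=${landscape}) -> cols=${cols}, rows=${rows}`);
  assert(cols * rows >= cells, "grid does not fit all cells"); // fail loudly, then give Claude the log
  return { cols, rows };
}
```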
It is not that different from managing a real team of people...
If successfully using Claude Code is as difficult as successful dating, I'm not sure this tech will prevail. ;)
Stuff like "divide the work up" is something you do when doing it yourself. Making a GUI prototype isn't really much work at all in the age of LLMs, akin to drawing up a few ideas on a notepad. Using git for small steps is something lots of people do for their own work and rebase later. Using extensive logging is mostly just something you have in your AGENTS.md for all your projects and forget about, similarly getting it setup to make and look at screenshots.
What part of this is more work than doing it yourself?
This is especially true when the vision is a little hazy and the path isn’t clear. When doing it yourself, you can make decisions in the moment, try things, pivot… when trying to delegate these things, it becomes a chore to try to clarify things that are inherently unclear, and pivot an idea when the person (or AI) being delegated to doesn’t fully grasp the pivot and keeps bringing in old ideas.
I think most people have had an experience trying to delegate a task, where it becomes so much work to wrangle the person, that they just do it themselves. I’ve run into this countless times. That’s how it feels to use AI.
Documentation. Comments. Writing a plan and/or a spec before you begin coding. Being smart with git commits and branches.
In about 2 weeks we have a functional game: 60 levels, 28 different types of enemies, a procedurally generated daily challenge mode, an infinity mode, tower crafting and upgrades, and an in-game economy to pay for the upgrades.
This likely would have taken us months to get to the point that we are at, it was playable on Day 2.
---
I'd like to add an archives mode to the daily challenge. This will allow players to complete any daily challenges they didn't attempt on the actual day.
It will look like a calendar, with the dates in Green if it was played, and in white if not.
The archive should only go back to January 30, 2026, the day the project started. Include a to do to change this date prior to release.
Rewards for completing daily challenges via the archive should be 25% of the normal value.
---
Claude Code then asked me a couple of clarifying questions before it harnessed the superpowers:writing-plans skill and generated a document to plan the work. The document it put together is viewable at https://gist.github.com/Jeremy1026/cee66bf6d4b67d9a527f6e30f...
There were a couple of edits that I made to the document before I told it to implement. It then fired off a couple of agents to perform the tasks in parallel where possible.
Once it finished, I tested and it worked as I had hoped. But there were a couple of follow-up things that would make it more intertwined with everything else going on around daily challenges. So I followed up with:
---
lets give 1 cell for compelting an archived daily challenge
---
And finally:
---
Now that we are tracking completions, can we update the notification to complete daily mission to include "Keep your X day streak"
---
-Start Prompt-
Currently, a towers level sets the maximum a single stat can be. Can you tell me what those stat caps are?
-End Prompt-
This primed the context to have information about the stat caps and how they are tied to levels. After it gave me back a chart of Tower Level and Max Stat Rank, I followed up with some real stats from play:
-Start Prompt-
Lets change the stat cap, the caps are currently far too high. All towers start at 1 for each IMPACT stat, my oldest tower is Level 5, and its stats are I-3, M-4, P-6, A-3, C-1, T-1. How do you think I could go about reducing the cap in a meaningful way.
-End Prompt-
It came back with a solution to reduce the per-stat cap to tower level + 1. But I felt that was too limiting; I want players to be able to specialize a tower, so I told it to make the stat cap total, not per stat.
-Start Prompt-
I'm thinking about having a total stat cap, so in this towers case, the total stats are 18.
-End Prompt-
It generated a couple of structures of how the cap could increase and presented them to me.
-Start Prompt-
Yes, it would replace the per-stat cap entirely. If a player wants to specialize a tower in one stat using the entire cap that is fine.
Lets do 10 + (rank * 3), that will give the user a little bit of room to train a new tower.
Since it's a total stat cap, if a user is training and the tower earns enough stat xp to level beyond the cap, lock the tower at max XP for that stat, and autoamtically level the stat when the user levels up the tower.
-End Prompt-
It added the cap, but introduced a couple of build errors, so I sent it just the build errors.
-Start Prompt-
/Users/myuser/Development/Shelter Defense/Shelter Defense/Views/DebugTowerDetailView.swift:231:39 Left side of mutating operator isn't mutable: 'tower' is a 'let' constant
/Users/myuser/Development/Shelter Defense/Shelter Defense/Views/DebugTowerEditorView.swift:181:47 Left side of mutating operator isn't mutable: 'towerInstance' is a 'let' constant
-End Prompt-
And thus, a new stat cap system was implemented.
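For illustration only (the real game is Swift, and the field names and the exact meaning of "rank" here are assumptions), the cap rule described in those prompts boils down to something like this TypeScript sketch:

```ts
// Hypothetical stat record; IMPACT in the prompt is an acronym for six stats.
interface TowerStats {
  impact: number;
  might: number;
  precision: number;
  agility: number;
  cunning: number;
  tenacity: number;
}

// Total cap from the prompt above: 10 + (rank * 3), applied to the SUM of all six stats.
function totalStatCap(rank: number): number {
  return 10 + rank * 3;
}

function canGainStat(stats: TowerStats, rank: number): boolean {
  const total =
    stats.impact + stats.might + stats.precision + stats.agility + stats.cunning + stats.tenacity;
  return total < totalStatCap(rank); // at the cap, further gains are locked until the tower levels up
}
```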
Put that down! What are you doing? Don't put that in your mouth. Where are you going? Stop that! Why are you sitting there alone, Johnny?
However, ChatGPT is really helpful for sysadmin-style tasks on Linux.
The LLM under the hood is essentially a very fancy autocomplete. This always needs to be kept in mind when working with these tools. So you have to focus a lot on what the source text is that’s going to be used to produce the completion. The better the source text, the better the completion. In other words, you need to make sure you progressively fill the context window with stuff that matters for the task that you’re doing.
In particular, first explore the problem space with the tool (iterate), then use the exploration results to plan what needs doing (iterate), when the plan looks good and makes sense, only then you ask to actually implement.
Claude’s built in planning mode kind of does this, but in my opinion it sucks. It doesn’t make iterating on the exploration and the plan easy or natural. So I suggest just setting up some custom prompts (skills) for this with instructions that make sense for the particular domain/use case, and use those in the normal mode.
For context, I’ve built about 15k loc since Christmas on the $20 plan, plus $18 of extra usage. Since this is a side project, I only hit the limits once or twice per week.
Two questions:
1. How are you using Claude? Are you using https://claude.ai and copying and pasting things back and forth, or are you running one of the variants of Claude Code? If so, which one?
2. If you're running Claude Code have you put anything in place to ensure it can test the code it's writing, including accessing screenshots of what's going on?
There are 3 major steps:
(Plan mode)
1. assuming this is an existing codebase, load the relevant docs/existing code into context (usually by typing @<PATH>)
2. Ask it to make a plan for the feature you want to implement. Assuming you’ve already put some thought into this, be as specific and detailed as you can. Ask it to build a plan that’s divided into individually verifiable steps. Read the plan file that it spits out, correct any bad assumptions it made, ask it questions if you’re unclear on what it’s saying, refine, etc.
3. (Agent mode) Ask it to build the plan, one step at a time. After it builds each step, verify that it’s correct, or have it help you verify it’s correct in a way you can observe.
I have been following this basic process mostly with Opus 4.5 in a mixture of claude code and cursor working on a pretty niche image processing pipeline (also some advanced networking stuff on the side) and have hand-written basically zero code.
People say - “your method sounds like a lot of work too” and that’s true, it is still work, but designing at a high level how I want some CUDA kernel to work and how it fits into the wider codebase and then describing it in a few sentences is still much faster than doing all of the above anyway and then hand writing 100 lines of CUDA (which I don’t know that well).
I’d conservatively estimate that i’ve made 2x the progress in the same amount of time as if I had been doing this without LLM tools.
It will ask you questions, break down the project into smaller tasks, and work through them one by one with UAT checkpoints along the way.
It also handles managing your context.
Good luck!
This reminds me of someone who dropped into #java on Undernet once upon a time in the 90s. "I can't get it to work", and we kept trying to debug, and for some reason we kept hitting random new walls. It just never would work! Turns out that they were deleting their .java file and starting over each time. Don't do that.
---
Take it as a sequence of exercises.
Maybe start like this:
Don't use claude code at all to begin with. It's a pair programming exercise, and you start at the keyboard, where you're confident and in control. Have claude open in the web interface alongside, talk through the design with it while working; and ask to google stuff for you, look up the api, maybe ask if it remembers the best way(s) to approach the problem. Once you trust it a bit, maybe ask for code snippets or even entire functions. They can't be 100% correct because it doesn't have context... you might need to paste in some code to begin with. When there's errors, paste them in, maybe you'll get advice.
If you're comfy? Switch seats, start using claude code. Now you're telling claude what to do. And you can still ask the same questions you were asking before. But now you don't need to paste into the web interface anymore, and the AI sure as heck can type faster than you can.
Aren't you getting tired of every iteration where you're telling the AI "this went wrong", "that went wrong"? Maybe make sure there's a way for the AI to test stuff itself, so it can iterate a few cycles automatically. Your LLM can iterate through troubleshooting steps faster than you can type the first one. Still... keep an eye on it.
And, really that's about where I am now.
In my little experience, what I've seen work is that you need to provide a lot of constraints in the form of:
- Scope: Don't build a website, but build a feature (either user-facing or infra, it doesn't matter). I've found that chunking my prompts into human-manageable tasks that would take 0.5-1 day is enough of a scale-down.
- Docs: .md files that describe how the main parts of the application work, what a component/module/unit of code looks like, and what tools & technologies to use (with links to the latest documentation and quickstart pages). You should commit these to the repo and update them with every code change (which with Claude is just a reminder in each prompt).
- Existing code, if it's not a greenfield project.
It really moves away from the advertised paradigm of one-shot vibe-coding, but since the quality of the output is really good these days, this long preparation will give you production-ready output much sooner than with traditional methods.
Read through Anthropic's knowledge share, check out their system prompts extracted on GitHub, and write more words in AGENTS.md/CLAUDE.md; you need to give them some warm-up to do better at tasks.
What model are you using? Size matters and Gemini is far better at UI design work. At the same time, pairing gemini-3-flash with claude-code derived prompts makes it nearly as good as Pro
Words matter, the way you phrase something can have disproportionate effect. They are fragile at times, yet surprisingly resilient at others. They will deeply frustrate you and amaze you on a daily basis. The key is to get better at recognizing this earlier and adjusting
You can find many more anecdotes and recommendations by looking through HN stories and social media (Bluesky has a growing AI crowd coming over from X, with a good community bump recently; there are anti-AI labeler/block lists to keep the flak down).
If you're building a web app, give it a script that (re)starts the full stack, along with Playwright MCP or Chrome DevTools MCP or agent-browser CLI or something similar. Then add instructions to CLAUDE.md on how and when to use these tools. As in: "IMPORTANT: You must always validate your change end-to-end using Playwright MCP, with screenshot evidence, before reporting back to me that you are finished.".
You can take this further with hooks to more forcefully enforce this behavior, but it's usually not necessary ime.
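As a plain-Playwright variant of the same feedback loop (the URL, selector, and file name below are assumptions, not part of any real setup), the kind of check the agent can run and attach as evidence might look like this:

```ts
// Sketch: the agent runs this after every change and attaches the screenshot as evidence.
import { chromium } from "playwright";

async function validate(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("http://localhost:5173");  // wherever your (re)start script serves the app
  await page.click("#orientation-toggle");   // exercise the feature under test
  await page.screenshot({ path: "after-toggle.png", fullPage: true });
  await browser.close();
}

validate().catch((err) => {
  console.error(err);
  process.exit(1); // a non-zero exit tells the agent the check failed
});
```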
* have Claude produce wireframes of the screens you want. Iterate on those and save them as images.
* then develop. Make sure Claude has the ability to run the app, interact with controls, and take screenshots.
* loop autonomously until the app looks like the wireframes.
Feedback loops are required. Only very simple problems get one-shot.
To answer the question I would highlight the wrong regions in neon green manually via code. Now feed the code (zipped if necessary) to the AI along with a screenshot. Now give it relatable references for the code and say "xxxx css class/gtk code/whatever is highlighted in the screenshot in neon. I expect it to be larger but it's not, why?"
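A minimal sketch of that neon-green trick for a DOM-based app (the selector and color are arbitrary):

```ts
// Outline the suspect elements so the layout bug is obvious in a screenshot.
function highlightForDebug(selector: string): void {
  document.querySelectorAll<HTMLElement>(selector).forEach((el) => {
    el.style.outline = "3px solid #39ff14"; // neon green
    el.style.outlineOffset = "-3px";        // keep the outline inside the box so sizes stay honest
  });
}

// e.g. highlightForDebug(".grid-cell"); then screenshot and ask:
// "the elements outlined in neon should be larger -- why aren't they?"
```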
The idea is, you want to build up the right context before starting development. I will either describe exactly what I want to build, or I ask the agent for guidance on different approaches. Sometimes I’ll even do this in a separate Claude (not Claude Code) conversation, which I feel works a bit faster. Once we have an approach, I will ask it to create an implementation plan in a markdown file, I clear context and then tell it to implement the plan.
Check out the “brainstorming” skill and the “git worktrees” skill. They will usually trigger the planning -> implementation workflow when the work is complex enough.
https://github.com/obra/superpowers
For example, I have this project where the idea is to use code verification to ensure the code is correct. The stated goal of the project is to produce verified software, and the daffy robot still can't seem to understand that the verification part is the critical piece, so... it cheats on the tests so they pass. I had the newest Claude Code (4.6?) look over the tests on the day it was released and the issues it found were really, really bad.
Now, the newest plan is to produce a tool which generates the tests from a DSL so they can't be made to pass and/or match buggy code instead of the clearly defined specification. Oh, I guess I didn't mention there's an actual spec for what we're trying to do which is very clear, in fact it should be relatively trivial to ensure the tests match for some super-human coding machine.
The alternative to this isn’t even necessarily no AI, just not using it this way.
Furthermore, and more generally, one of the great things about (traditional) coding is that it allows 'thinking through making' - by building something you learn more about the problem and thus how best to solve it. Code generation just leaves you with reviewing, which is less powerful in this way I believe. See also 'thinking through writing [prose]'.
It doesn't mean those tools do not have value, but they're not capable of "coding", in the sense we mean in the industry, and generating code isn't coding.
That said: I suspect that OP is providing low-detail prompts.
These tools cannot read your mind. If you provide an under-specified prompt, they will fill in all the details for things that are necessary to complete the task, but that you didn't provide. This is how you end up with slop.
You still need knowledge of what you are building so you can drive it, guide it, fix things.
This is the core of the question about LLM assisted programming - what happens when non programmers use it?
We already have the answer: which product was fully built by a non-programmer with those tools? I can't find an example.
They just trip over their own code at some point, and if there's nobody watching, they end up with something they can't recover from.
It's especially devastating when they don't know enough git to get back on track.
I wanted to tear my ears out.
What is crystal clear to me now is that using LLMs to develop is a learned and practiced skill. If you expect to just drop in and be productive on day one, forget it. The smartest guy I know, _who has a PhD in AI_, is hopeless at using it.
Practice practice practice. It's a tool, it takes practice. Learn on hobby projects before using it at work.
I don’t blame people for being upset when it can’t do what all the hype says it will do.
The way people talk about the latest Claude Code is the same way people were talking 2-3 years ago about whatever the latest model was then. Every release gets marketed as if it’s a new level of magic, yet we’re still here having the same debates about merit, because reality doesn’t match the marketing and hype.
It has gotten better: I tried something with early ChatGPT that failed horribly (a basic snake game written in C), and just tried the exact same thing again last week and it worked. It wasn’t good, but it technically worked. But if it took 3 years to get good enough to pass my basic test, why was I being fed those lies 3 years ago? The AI companies are like the boy who cried wolf. At this point, it’s on them to prove they can do what they say, not up to me to put in extraordinary efforts to try and get value out of their product.
Last week I sat through a talk from one of our SVPs who said development is cheap and easy now, then he went on about the buy vs build debate for 20 minutes. It’s like he read a couple articles and drank the kool-aid. I also saw someone talking about ephemeral programs… seeing a future where if you want to listen to some MP3s, you’ll just type in a prompt to generate a bespoke music player. This would require AI to reliably one-shot apps like Winamp or iTunes in a few words from a layperson with no programming background. These are the ideas the hype machine is putting in people’s minds that seem detached from reality.
I don’t think the, “you’re holding it wrong”, type responses are a good defense. It’s more that it’s being marketed wrong, because all these companies need to maintain the hype to keep raising money. When people use the AI the way the hype tells them it should work… it doesn’t work.
I agree with you, expectations are not being set correctly.
That's my point. Learn to use the tools, including as they were three years ago, and magic does happen.
The process being described by many in the comments removes all the magic. It sounds laborious and process heavy. It removes the part of the job I like, while loading the job with more work I don’t enjoy. This feels like the opposite of what we should want AI to do.
Then, repeatedly ask Claude to criticize the plan and use the "AskUserQuestion" tool to ask for your input.
Keep criticizing and updating the plan until your gut says Claude is just trying to come up with things that aren't actually issues anymore.
Then unleash it (allow edits) and see where you get. From there you may ask for one-off small edits, or go back into plan mode again.
CC was slow and the results I was getting were subpar having it debug some easy systems tasks. Later in the afternoon it recovered and was able to complete all my tasks. There’s another aspect to these coding agents: the providers can randomly quantize (lobotomize) models based on their capacity, so the model you’re getting may not be the one someone else is getting, or the same model you used yesterday.
1. Good for proofs of concept and prototypes, but nothing that really goes to heavy production usage.
2. Can do some debugging and fixing that usually requires looking at the stack, reading the docs, and checking the tree.
3. Code is spaghetti all the way down. One might say that is OK because it is fast to generate, but the bigger the application, the more expensive every change gets, and it always forgets to do something.
4. The tests it generates are mostly useless. 9/10 times it passes all the tests it creates for itself, but the code does not even start. No matter what type of test.
5. Frequently lied about the current state of the code, and only when pushed would it admit it was wrong.
As others said, it is a mix of the (misapplied) Dunning-Kruger effect and some hype.
I tried possibly every single trick to get it working better, but I feel most are just tricks. They are not necessarily making it work better.
It is not completely useless; my work involves doing prototypes now and then, and usually they need to be quite extensive. For that it has been a help. But I don't feel it is close to what they sell.
Also, I suggest giving it low-level instructions. It's half-decent for low-level stuff, especially if it has access to preexisting code. Also note that it does exactly what you tell it to do, like a genie. I've asked it to write a function that already exists in the codebase and it wrote a massive chunk of code. It wasn't until after it was done that I remembered we already had the solution to that problem. Anyhow, the hype is unreal, so tailor expectations accordingly.
For example, if you tell it to compile and run tests, you should never be in a situation with syntax errors.
But if you don’t give a prompt that allows to validate the result, then it’s going to get you whatever.
> Include tests, screenshots, or expected outputs so Claude can check itself. This is the single highest-leverage thing you can do.
Think about AI the same way you'd think about trading courses: would you buy a course that promises 10,000% returns? If such returns were possible, the course seller would just trade instead of selling courses.
Same logic here - if "vibe-coding" really worked at scale, Claude would be selling software, not tokens.
From the basics: did you actually tell it that you want those things? It's not a mind reader. Did you use plan mode? Did you ask it to describe what it's going to make?
If you treat it as an astonishingly sophisticated and extremely powerful autocomplete (which it is) - you have plenty of opportunities to make your life better.
Personally, I'm trying to learn the "make it write the plan, fix the plan, break it down even more, etc." loops that are necessary; but I haven't had a use case (yet?) where the total time spent developing the thing was radically shorter.
LLMs work wonders bootstrapping a greenfield project. Unfortunately, you only get to do that once ;)
This is why LLMs look so impressive in demos. Demos are nearly always greenfield, small in scale, and as long as it launches, it looks successful.
and then try again.
I’m not a software engineer by training or trade, so caveats apply, but I found that the brainstorming -> plan writing -> plan execution flow provided by the skills in this plugin helps immensely with extracting assumptions and unsaid preferences into a comprehensive plan, very similar to the guidance elsewhere in this thread, except automated/guided along by the plugin skills.
First prompt: ask it to come up with a plan, break it down into steps, and save it to a file.
Edit the file as needed.
Launch CC again and use the plan file to implement stage by stage; verify and correct. No technical debugging needed. Just saying "X is supposed to be like this, but it’s actually like that" goes a long way.
It is much better than other models I have tried. Didn't think the post would blow up so much tbh..
for now, anyway.
Relevant excerpt:
I spent a bit of time last month building a site to solve a problem I’ve always found super-annoying about the legislative process. It’s hard to read Federal bills because they don’t map to the underlying code. Anyone who has worked in Congress knows what I mean, you get a bill that says “change this word from ‘may’ to ‘shall’ in section XYZ of Federal law.” To understand what it does, and find possible loopholes they are trying to sneak in, you have to go to that underlying Federal law and look at where it says “may” and then put “shall” in there and read it. It’s basically like a manual version of copy and pasting, except much more complicated and with lawyers trying to trick you.
So I wrote an app that lets you upload legislation, and it automatically shows you how it changes Federal law. There are commercial versions of this software, and some states do it for their proposed legislation. But I haven’t seen anything on the Federal level that is free, so I built it. (The code is here.) It’s not very good. It’ll probably break a lot. There’s no “throat to choke” if you use it and it’s wrong. And my guess is that Anthropic or Gemini ultimately will be able to do this function itself eventually. But the point is that if I can build something like this in my spare time and deploy it without any training at all, then it’s just not that hard for an organization with some capital to get rid of some of its business software tools.
[1] https://www.thebignewsletter.com/p/monopoly-round-up-the-2-t...
A syntax error is nothing; I just paste the error into the TUI and it usually fixes it.
There used to be more or less one answer to the question of "how do I implement this UI feature in this language".
Now there are countless. Welcome to the brave new world of non-deterministic programming where the inputs can produce anything and nothing is for certain.
Everyone promises it can do something different if you "just use it this way".