A year of vibes

(lucumr.pocoo.org)

80 points | by lumpa 5 hours ago

12 comments

  • anshulbhide 2 minutes ago
    > The pull request model on GitHub doesn’t carry enough information to review AI generated code properly — I wish I could see the prompts that led to changes. It’s not just GitHub, it’s also git that is lacking.

    Yes! Who is building this?

  • simonw 2 hours ago
    I really feel this bit:

    > With agentic coding, part of what makes the models work today is knowing the mistakes. If you steer it back to an earlier state, you want the tool to remember what went wrong. There is, for lack of a better word, value in failures. As humans we might also benefit from knowing the paths that did not lead us anywhere, but for machines this is critical information. You notice this when you are trying to compress the conversation history. Discarding the paths that led you astray means that the model will try the same mistakes again.

    I've been trying to find the best ways to record and publish my coding agent sessions so I can link to them in commit messages, because increasingly the work I do IS those agent sessions.

    Claude Code defaults to expiring those records after 30 days! Here's how to turn that off: https://simonwillison.net/2025/Oct/22/claude-code-logs/

    I share most of my coding agent sessions through copying and pasting my terminal session like this: https://gistpreview.github.io/?9b48fd3f8b99a204ba2180af785c8... - via this tool: https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...

    Recently been building new timeline sharing tools that render the session logs directly - here's my Codex CLI one (showing the transcript from when I built it): https://tools.simonwillison.net/codex-timeline?url=https%3A%...

    And my similar tool for Claude Code: https://tools.simonwillison.net/claude-code-timeline?url=htt...

    What I really want is first-class support for this from the coding agent tools themselves. Give me a "share a link to this session" button!

    • CuriouslyC 14 minutes ago
      You can export all agent traces to OTel, either directly or via output logging, then dump them into ClickHouse with metadata such as repo, git user, cwd, etc. (rough sketch below).

      You can do evals and give agents long term memory with the exact same infrastructure a lot of people already have to manage ops. No need to retool, just use what's available properly.
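
      A minimal sketch of the export side of this, using the OpenTelemetry Python SDK (the attribute names, the localhost:4317 OTLP endpoint, and the idea of a collector forwarding into ClickHouse are all assumptions, not part of any particular agent's tooling):

        # Sketch: emit one agent session as an OTel span with repo/user/cwd metadata.
        # Assumes an OTLP-capable collector (which could forward to ClickHouse) on localhost:4317.
        import os
        import subprocess

        from opentelemetry import trace
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor

        provider = TracerProvider()
        provider.add_span_processor(
            BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
        )
        trace.set_tracer_provider(provider)
        tracer = trace.get_tracer("agent-traces")

        git_user = subprocess.run(
            ["git", "config", "user.name"], capture_output=True, text=True
        ).stdout.strip()

        with tracer.start_as_current_span("agent-session") as span:
            span.set_attribute("repo", os.path.basename(os.getcwd()))
            span.set_attribute("git.user", git_user)
            span.set_attribute("cwd", os.getcwd())
            # Prompts, tool calls, and failures could go in as events or child spans.
            span.add_event("prompt", {"text": "example prompt text"})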

    • neutronicus 25 minutes ago
      Emacs gptel just produces md or org files.

      Of course the agentic capabilities are very much on a roll-your-own-in-elisp basis.

    • NeutralForest 1 hour ago
      I think we already have the tools, but not the communication between them. Instead of recording actions taken and failures as commit messages, you could have wide-event-style logs with all the context: failures, tools used, steps taken, and so on. Those logs could also serve as checkpoints to roll back to, and you could reference the specific action ID you walked back to when encountering an error (rough sketch below).

      In turn, this could all be plain text and made accessible through version control in a repo or in a central logging platform.
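
      A rough sketch of what one such wide-event log line might look like, written as JSONL from Python (every field name here is an assumption about what you might want to capture, not an existing format):

        import json
        import time
        import uuid

        def log_agent_event(path, **fields):
            """Append one wide event (all context in a single record) to a JSONL log
            and return its action ID so later events, or a rollback, can reference it."""
            event = {"action_id": str(uuid.uuid4()), "timestamp": time.time(), **fields}
            with open(path, "a", encoding="utf-8") as f:
                f.write(json.dumps(event) + "\n")
            return event["action_id"]

        action_id = log_agent_event(
            "agent-events.jsonl",
            step="apply_patch",
            prompt="add retry logic to the onboarding flow",
            tools_used=["edit_file", "run_tests"],
            failure="tests failed: test_onboarding_retry",
            rolled_back_to=None,  # set to an earlier action_id when walking back
        )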

      • pigpop 1 hour ago
        I'm currently experimenting with doing this through documentation and project planning. Two core practices I use are a docs/roadmap/ directory with an ordered list of milestone documents and a docs/retros/ directory with dated retrospectives for each session. I'm considering adding architectural decision records (ADRs) as a dedicated space for documenting how things evolve. The quote from the article could be handled by the ADRs if they included notes on alternatives that were tried and why they didn't work as part of the justification for the decision that was made.

        The trouble with this quickly becomes finding the right documents to include in the current working session. For milestones and retros it's simple: include the current milestone and the last X retros that are relevant, but even then you may sometimes want specific information from older retros. With ADR documents you'd have to find the relevant ones somehow, and the same goes for any other additional documentation that gets added.

        There is clearly a need for some standardization and for learning which techniques work best, as well as potential for building a system that makes it easy for both you and the LLM to find the correct information for the current task.

    • stacktraceyo 2 hours ago
      I’d like to make something like this, but running in the background, so I can better search my history of sessions and start building my own knowledge base of sorts.
      • simonw 1 hour ago
        Running "rg" in your ~/.claude/ directory is a good starting point, but it's pretty inconvenient without a nicer UI for viewing the results.
        • the_mitsuhiko 1 hour ago
          Amp represents threads in the UI, and an agent can search and reference its own history; that's also how the handoff feature leverages that functionality. It's an interesting system and I quite like it, but because it's not integrated into either GitHub or git, it is sufficiently awkward that I don't leverage it enough.
        • simonw 1 hour ago
          ... this inspired me to try using a "rg --pre" script to help reformat my JSONL sessions for a better experience. This prototype seems to work reasonably well: https://gist.github.com/simonw/b34ab140438d8ffd9a8b0fd1f8b5a...

          Use it like this:

            cd ~/.claude/projects
            rg --pre cc_pre.py 'search term here'
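
          For context, a rough sketch of what a pre-filter like cc_pre.py could look like (this is not the actual gist, and the JSONL field names are assumptions about the session log format):

            #!/usr/bin/env python3
            # rg --pre invokes this with the file path as its only argument and
            # searches whatever gets printed to stdout. Make the script executable.
            import json
            import sys

            path = sys.argv[1]
            with open(path, encoding="utf-8") as f:
                for line in f:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        continue
                    message = record.get("message")
                    if not isinstance(message, dict):
                        message = {}
                    role = message.get("role", record.get("type", "?"))
                    content = message.get("content", "")
                    # Content may be a plain string or a list of content blocks.
                    if isinstance(content, list):
                        content = " ".join(
                            block.get("text", "") for block in content
                            if isinstance(block, dict)
                        )
                    if content:
                        print(f"{role}: {content}")
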
    • kgwxd 1 hour ago
      > There is, for lack of a better word, value in failures

      Learning? Isn't that what these things are supposedly doing?

      • the_mitsuhiko 1 hour ago
        If by "these things" you mean large language models: they are not learning. Famously so, that's part of the problem.
    • 0_____0 1 hour ago
      "all my losses is lessons"
  • kashyapc 1 hour ago
    "Because LLMs now not only help me program, I'm starting to rethink my relationship to those machines. I increasingly find it harder not to create parasocial bonds with some of the tools I use. I find this odd and discomforting [...] I have tried to train myself for two years, to think of these models as mere token tumblers, but that reductive view does not work for me any longer."

    It's wild to read this bit. Of course, if it quacks like a human, it's hard not to quack back. As the article says, being less reckless with the vocabulary ("agents", "general intelligence", etc.) could be one way to mitigate this.

    I appreciate the frank admission that the author struggled with this for two years. Maybe the balance of spending time with machines vs. fellow primates is out of whack. It feels dystopian to see very smart people being insidiously driven to sleepwalk into "parasocial bonds" with large language models!

    It reminds me of the movie Her[1], where the guy falls "madly in love with his laptop" (as the lead character's ex-wife expresses in anguish). The film was way ahead of its time.

    [1] https://www.imdb.com/title/tt1798709/

    • the_mitsuhiko 14 minutes ago
      > Maybe the balance of spending time with machines vs. fellow primates is out of whack.

      It's not that simple. Proportionally I spend more time with humans, but if the machine behaves like a human and has the ability to recall, it becomes a human-like interaction. From my experience, what makes the system "scary" is the ability to recall. I have an agent that recalls conversations you have had with it before, and as a result it changes how you interact with it; I can see that triggering behaviors in humans that are unhealthy.

      But our inability to name these things properly doesn't help. I think pretending it is a machine, on the same level as a coffee maker, does help set the right boundaries.

    • mjr00 23 minutes ago
      It helps a lot if you treat LLMs like a computer program instead of a human. It always confuses me when I see shared chats with prompts and interactions that have proper capitalization, punctuation, grammar, etc. I've never had issues getting results I've wanted with much simpler prompts like (looking at my own history here) "python grpc oneof pick field", "mysql group by mmyy of datetime", "python isinstance literal". Basically the same way I would use Google; after all, you just type in "toledo forecast" instead of "What is the weather forecast for the next week in Toledo, Ohio?", don't you?

      There's a lot of black magic and voodoo and assumptions that speaking in proper English with a lot of detailed language helps, and maybe it does with some models, but I suspect most of it is a result of (sub)consciously anthropomorphizing the LLM.

      • skydhash 12 minutes ago
        Very much this. My guess is that common words like articles have very little impact as they just occur too frequently. If the LLM can generate a book, then your prompt should be like the index of that book instead of the abstract.
    • mlinhares 55 minutes ago
      Same here. I'm seeing more and more people getting into these interactions and wondering how long until we have widespread social issues due to these relationships, like the ones people have with "influencers" on social networks today.

      It feels like this situation is much more worrisome as you can actually talk to the thing and it responds to you alone, so it definitely feels like there's something there.

  • CuriouslyC 6 minutes ago
    I understand the parasocial bit. I actively dislike the idea of gooning, ERP and AI therapists/companions, but I still notice I'm lonelier and more distant on the days when I'm mostly writing/editing content rather than chatting with my agents to build something. It feels enough like interacting with a human to keep me grounded in a strange way.
  • mritchie712 57 minutes ago
    tacking on to the "New Kind Of" section:

    New Kind of QA: One bottleneck I have (as a founder of a B2B SaaS) is testing changes. We have unit tests, we review PRs, etc., but those don't account for taste. I need to know if the feature feels right to the end user.

    One example: we recently changed something about our onboarding flow. I needed to create a fresh team and go through the onboarding flow dozens of times. It involves adding third party integrations (e.g. Postgres, a CRM, etc.) and each one can behave a little differently. The full process can take 5 to 10 minutes.

    I want an agent to go through the flow hundreds of times, trying different things (i.e. trying to break it) before I do it myself. There are some obvious things I catch on the first pass that an agent should easily identify and figure out solutions to.

    New Kind of "Note to Self": Many of the voice memos, Loom videos, or notes I make (and later email to myself) are feature ideas. These could be 10x better with agents. If there were a local app recording my screen while I talk thru a problem or feature, agents could be picking up all sorts of context that would improve the final note.

    Example: You're recording your screen and say "this drop down menu should have an option to drop the cache". An agent could be listening in, capture a screenshot of the menu, find the frontend files / functions related to caching, and trace to the backend endpoints. That single sentence would become a full spec for how to implement the feature.

  • divbzero 2 hours ago
    > My biggest unexpected finding: we’re hitting limits of traditional tools for sharing code. The pull request model on GitHub doesn’t carry enough information to review AI generated code properly — I wish I could see the prompts that led to changes. It’s not just GitHub, it’s also git that is lacking.

    The limits seem to be not just in the pull request model on GitHub, but also in the conventions around how often, and with what context, AI-generated changes get committed to Git. We already have AGENTS.md (or CLAUDE.md, GEMINI.md, .github/copilot-instructions.md) for repository-level context. More frequent commits and commit-level context could aid in reviewing AI-generated code properly.
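
    One possible convention for that commit-level context (purely a sketch, not something the thread or GitHub prescribes) is to attach the prompt that produced a commit as a git note under a dedicated ref, so reviewers can pull it up alongside the diff:

      import subprocess

      def attach_prompt(commit, prompt):
          """Attach the prompt that produced a commit as a git note under a
          dedicated 'prompts' ref, so it travels with the repo history."""
          subprocess.run(
              ["git", "notes", "--ref=prompts", "add", "-f", "-m", prompt, commit],
              check=True,
          )

      # Reviewers could then read it back with:
      #   git notes --ref=prompts show <commit>
      attach_prompt("HEAD", "Refactor onboarding flow to retry failed integration setup")

    Notes aren't pushed by default, so sharing them would take an explicit push of refs/notes/prompts (or a wrapper that does it for you).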

  • tolerance 1 hour ago
    Armin has some interesting thoughts about the current social climate. There was a point where I even considered sending a cold e-mail and asking him to write more about them. So I’m looking forward to his writing for Dark Thoughts—the separate blog he mentions.
  • rootnod3 54 minutes ago
    Sorry, but why would including the prompt in the pull request make any difference? Explain what you DID in the pull request. If you can't summarize it yourself, it means you didn't review it yourself, so why should I have to do it for you?
  • JKCalhoun 2 hours ago
    Got distracted: love the "WebGL metaballs" header and footer on the site.
  • rootnod3 50 minutes ago
    "I have seen some people be quite successful with this."

    Wait until those people hit a snafu and have to debug something in prod after they mindlessly handed their brains and critical thinking to a water-wasting behemoth and atrophied their minds.

    EDIT: typo, and yes I see the irony :D

    • wiseowise 29 minutes ago
      > Wait until those people hit a snafu and have to debug something in prod after they mindlessly handed their brains and critical thinking to a water-wasting behemoth and atrophied their minds.

      You've just described a typical run-of-the-mill company that has software. LLMs will make it easier to shoot yourself in the foot, but let's not rewrite history as if Stack Overflow coders are not a thing.

      • rootnod3 13 minutes ago
        Difference: companies are not pushing their employees to use Stack Overflow. Stack Overflow doesn't waste massive amounts of water and energy. Stack Overflow does not easily abuse millions of copyrights in a second by scraping without permission.
        • rootnod3 5 minutes ago
          Another difference: Stack Overflow tells you when you are wrong, or tells you to do your own research or to read the manual (which in a high percentage of cases is the right answer). It doesn't tell you that you are right and then proceed to hallucinate non-existent flags for some command invocation.
  • bgwalter 2 hours ago
    It is nice that he speaks about some of the downsides as well.

    In many respects 2025 was a lost year for programming. People speak about tools, setups and prompts instead of algorithms, applications and architecture.

    People who are not convinced are forced to speak against the new bureaucratic madness in the same way that they are forced to speak against EU ChatControl.

    I think 2025 was less productive, certainly for open source, except that enthusiasts now pay the Anthropic tax (to use the term that was previously used for Windows being preinstalled on machines).

    • data-ottawa 21 minutes ago
      Maybe it's because I'm a data scientist and not a dedicated programmer/engineer, but setup+tooling gains this year have made 2025 a stellar year for me.

      DS tooling feels like it hit a much needed 2.0 this year. Tools are faster, easier, more reliable, and more reproducible.

      Polars+pyarrow+ibis have replaced most of my pandas usage. UDFs were the thing holding me back from these tools; this year Polars hit the sweet spot there, and it's been awesome to work with.
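
      For illustration, a minimal sketch of the kind of Polars UDF this refers to (the data and column names are made up):

        import polars as pl

        df = pl.DataFrame({"text": ["alpha", "beta", "gamma"]})

        # A per-element Python UDF; return_dtype tells Polars what type to expect back.
        result = df.with_columns(
            pl.col("text").map_elements(lambda s: s.upper(), return_dtype=pl.Utf8).alias("upper")
        )
        print(result)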

      Marimo has made notebooks into apps. They're easier to deploy, and I can use anywidget+llms to build super interactive visualizations. I build a lot of internal tools on this stack now and it actually just works.

      PyMC uses JAX under the hood now, so my MCMC workflows are GPU accelerated.

      All this tooling improvement means I can do more, faster, cheaper, and with higher quality.

      I should probably write a blog post on this.

    • r2_pilot 1 hour ago
      >>"I think 2025 was less productive"

      I think 2025 is more productive for me based on measurable metrics such as code contribution to my projects, better ability to ingest and act upon information, and generally I appreciate the Anthropic tax because Claude genuinely has been a step-change improvement in my life.

    • JimDabell 1 hour ago
      > In many respects 2025 was a lost year for programming. People speak about tools, setups and prompts instead of algorithms, applications and architecture.

      I think the opposite. Natural language is the most significant new programming language in years, and this year has had a tremendous amount of progress in collectively figuring out how to use this new programming language effectively.

      • 9rx 9 minutes ago
        > and this year has had a tremendous amount of progress in collectively figuring out how to use this new programming language effectively.

        Hence the lost year. Instead of productively building things, we spent a lot of resources on trying to figure out how to build things.

    • grim_io 2 hours ago
      I'm glad there has been a break in the endless bikeshedding over TDD, OOP, ORMs (partially), and similar.
    • sixtyj 1 hour ago
      Absolutely. So much noise.

      "There’s an AI for that" lists 44,172 AI tools for 11,349 tasks. Most of them are probably just wrappers…

      As Cory Doctorow uses "enshittification" for the internet, there should be something similar for AI/LLMs, like "dumbaification".

      It reminds me of the late '90s when everything was "World Wide Web". :)

      Gold rush it is.

    • wiseowise 39 minutes ago
      > algorithms, applications and architecture.

      Which one is that? Endless leetcode madness? Or constant bikeshedding about today's flavor of MVC (MVI, MVVM, MVVMI) or whatever other bullshit people come up with instead of actually shipping?

  • Rakshath_1 24 minutes ago
    [dead]