I find this impressive: in my experience, codex-rs loves to add tests even when not prompted. Of course, it’s a bit of a crap shoot as to whether the test tests useful behavior.
(My favorite so far: it created an empty file in /home/whatever and added a test to verify that some code it wrote would indeed fail when tested on this empty input and that it would fail with the correct error message. Never mind that this covered approximately none of the desired behavior and that the test would, of course, fail on any other system.)
That would be really interesting. I doubt that's the case; if anything, probably the opposite? The harnesses seem very happy to write extensive test suites without me having to ask much.
Not ready for production yet, but I've been working on https://wingman.actor for quite a while. It's a Go-based portable agent runtime with minimal dependencies.
I think the TypeScript ecosystem is more suitable for this.
I do not think Rust is a bad language. But the agent ecosystem changes very quickly, and in Rust, assembling and reshaping agent workflows is difficult.
Many people prefer Rust, and I understand why. It is a genuinely excellent language, and “Rust is a great language” is a strong message that attracts many developers. But as long as lifetimes exist, I think it will remain difficult.
The lifetime system assumes, in some sense, that humans can fully predict the lifecycle of values and resources. I am not sure that is truly possible in all domains, and I am not sure that model, as a language design, suits the agent ecosystem.
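To make the friction concrete, here is a small sketch (all names hypothetical, not from any real framework): a workflow step that borrows its tool acquires a lifetime parameter, and reshaping the workflow dynamically usually means refactoring those borrows into owned or reference-counted data.

```rust
use std::sync::Arc;

// Hypothetical example: a tool in an agent workflow.
struct Tool {
    name: String,
}

// A step that borrows its tool needs a lifetime parameter, and that
// parameter propagates to every struct that holds a Step.
struct Step<'a> {
    tool: &'a Tool, // 'a must outlive the Step
}

// Reshaping the workflow at runtime (e.g. swapping tools per request)
// typically means dropping the borrow for reference-counted ownership:
struct DynamicStep {
    tool: Arc<Tool>, // no lifetime parameter; freely movable between tasks
}

fn run(step: &DynamicStep) -> String {
    format!("running {}", step.tool.name)
}

fn main() {
    let tool = Tool { name: "search".to_string() };
    let borrowed = Step { tool: &tool };
    println!("borrowed: {}", borrowed.tool.name);

    let dynamic = DynamicStep {
        tool: Arc::new(Tool { name: "search".to_string() }),
    };
    println!("{}", run(&dynamic));
}
```

Neither version is wrong, but every reshuffle of the workflow forces this kind of ownership decision up front, which is exactly the cost being described.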
In agent systems, requirements change constantly. Tools change, workflows change, providers change, schemas change, and failure policies change. In that kind of environment, I am not sure Rust is the right fit.
I like Rust a lot, and it is a language I genuinely want to learn. But I am not sure that applying Rust to everything is really the right answer.
I think Rust makes a lot of sense in relatively stable infrastructure ecosystems: operating systems, runtimes, sandboxes, and core low-level layers. But agent code usually requires high-level abstraction and rapid workflow composition. Doing that in Rust takes a tremendous amount of time.
Why do agent systems change more than other things? Maybe while we're here: what even is an agent system, anyway? Does one work on agent systems as the final product, or is the agent system what you work with to make something else?
The definition of “agent” has changed quite a bit, even in ACL papers and other academic work.
Looking at recent examples, the practical boundary seems to be whether an LLM uses tools. In some 2023 papers, certain pipeline-based systems were still referred to as agents. More recently, the term seems to mean something looser but more action-oriented: a system that understands a goal, uses tool calls, selects actions, and executes them.
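That action-oriented definition can be sketched as a loop: decide on an action from the goal and what has been observed so far, execute tools, stop when done. Everything here (the names select_action, run_tool, the Action enum) is illustrative; in a real system the decision step would be an LLM call.

```rust
// Minimal sketch of the "goal -> tool calls -> actions" definition.
#[derive(Debug, PartialEq)]
enum Action {
    CallTool { name: &'static str, input: String },
    Finish(String),
}

// Stand-in for the model's decision step: a real agent would ask an LLM
// to choose the next action given the goal and prior observations.
fn select_action(goal: &str, observations: &[String]) -> Action {
    if observations.is_empty() {
        Action::CallTool { name: "search", input: goal.to_string() }
    } else {
        Action::Finish(observations.join("; "))
    }
}

// Stand-in for actually executing a tool.
fn run_tool(name: &str, input: &str) -> String {
    format!("{} result for '{}'", name, input)
}

// The agent loop: select, execute, observe, repeat until Finish.
fn run_agent(goal: &str) -> String {
    let mut observations = Vec::new();
    loop {
        match select_action(goal, &observations) {
            Action::CallTool { name, input } => {
                observations.push(run_tool(name, &input));
            }
            Action::Finish(answer) => return answer,
        }
    }
}

fn main() {
    println!("{}", run_agent("find the capital of France"));
}
```

Under that reading, a 2023-style fixed pipeline is the degenerate case where select_action never actually branches.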
In other words, there is still no fully settled engineering definition of what an agent is. I am not an expert or a graduate student; I mostly work as a subcontractor who gets hired by university professors to reproduce specific paper metrics.
In general, every system changes frequently in its early stage. Agent systems are no different. The workflows keep changing because the field does not yet have stable, openly accepted standards for AI development.
That is also why Claude, Codex, and others are fighting to define the standard. I think the term "harness," which Anthropic has been popularizing recently, is part of the same trend. By harness I mean the execution layer around the model call itself: context management, tool dispatch, retry and fallback policies, eval loops. That layer is still actively shifting. The naming is not settled, the responsibilities are not settled, and the boundaries between the harness and the model are not settled either. Each provider is drawing those lines a little differently right now.
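One of the harness responsibilities named above, retry and fallback policy, can be sketched like this. The provider names and the simulated failure flag are hypothetical; the point is that the fallback policy lives in the harness layer, outside the model call itself.

```rust
// Stand-in for a model API call that may fail; `fail_primary` simulates
// an outage of the first provider.
fn call_provider(provider: &str, prompt: &str, fail_primary: bool) -> Result<String, String> {
    if fail_primary && provider == "primary" {
        Err("primary unavailable".to_string())
    } else {
        Ok(format!("{}: reply to '{}'", provider, prompt))
    }
}

// The harness, not the model, owns the fallback policy: try each
// provider in order and return the first success.
fn call_with_fallback(prompt: &str, fail_primary: bool) -> Result<String, String> {
    for provider in ["primary", "fallback"] {
        match call_provider(provider, prompt, fail_primary) {
            Ok(reply) => return Ok(reply),
            Err(_) => continue, // policy decision: move to the next provider
        }
    }
    Err("all providers failed".to_string())
}

fn main() {
    println!("{:?}", call_with_fallback("hi", true));
    println!("{:?}", call_with_fallback("hi", false));
}
```

Because none of this is standardized, each vendor currently ships its own version of this layer, which is the boundary-drawing described above.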
So my view is this: agent systems change frequently because the definition differs from person to person, the field keeps updating rapidly, and there is no engineering standard that has been firmly established yet.
Even the I/O standard itself is not really settled.
Go/Rust are way better choices. Besides, if it's all vibe coded, it shouldn't matter to the author.
Go, C#, what have you.
Nah, thank god we have JavaScript.
I love C# too.