What changed is that the models got good enough to follow written instructions reliably. A skill is just a tested workflow in markdown that the agent reads and follows instead of improvising. You can also bundle scripts that the agent runs during the workflow, which covers what most people use lightweight MCP servers for, except the agent can read the script source and extend it.
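For concreteness, a skill in my setup looks something like this. The name, frontmatter fields, and section layout are my own convention, not an official schema:

```markdown
---
name: changelog
description: Draft release notes from merged PRs
---

## Steps
1. Run `scripts/prs.sh <tag>` to list PRs merged since the last tag.
2. Group the PRs by area and summarize each in one line.
3. Append the result to CHANGELOG.md under a new heading; do not rewrite old entries.
```

The bundled `scripts/prs.sh` is the part that replaces a lightweight MCP server: the agent runs it, but can also open it and modify it when the workflow needs to change.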
Karpathy talks about an "economy of agents" and says we should stop writing HTML docs for humans and start writing markdown docs for agents [1]. Anthropic just shipped a skill-creator that benchmarks whether a skill still works after model updates. There are already tens of thousands of community skills on GitHub.
Distribution still feels early. Most useful skills are tiny: a markdown file, maybe one script. Useful enough to keep reusing, but not something anyone turns into a proper GitHub repo with a README and install instructions. So they stay on one machine.
I have been writing skills for my own agents for a while and I keep running into this. The format works. Moving them between machines or handing one to someone else does not.
Curious if others are hitting the same wall or if there are approaches I am missing.
[1] https://www.youtube.com/watch?v=kwSVtQ7dziU (Karpathy on the No Priors podcast, skills discussion around 1:03:40)
But in practice you end up with skills that depend on other skills, skills that assume specific instructions are already loaded, conflicting skills, and versioning and supply-chain issues - and suddenly you need dependency resolution.
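The dependency-resolution part is at least mechanically simple for the acyclic case. A sketch with Python's stdlib `graphlib`, using a made-up dependency map (skill name to the skills it assumes are already loaded):

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical dependency map: skill -> skills it assumes are loaded first.
skills = {
    "release-notes": {"changelog", "git-summary"},
    "changelog": {"git-summary"},
    "git-summary": set(),
}

def load_order(deps):
    """Return an order that loads every dependency before its dependents,
    or raise if two skills depend on each other."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as e:
        raise ValueError(f"skill dependency cycle: {e.args[1]}") from e
```

The hard parts are everything around this: version constraints, conflicting instructions that don't declare themselves as conflicts, and trusting the scripts a transitive skill bundles.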
I've built a package-manager approach for this (APM - github.com/microsoft/apm) and the thing that surprised me most was how quickly even small teams end up with config sprawl - and how much a manifest that travels with the project helps.
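I won't reproduce APM's actual manifest schema from memory, but the general shape of a manifest that travels with the project is roughly this (all field names hypothetical):

```yaml
# skills.yaml - checked into the project root, so the config travels with the repo
skills:
  - name: changelog
    source: github.com/acme/skills    # hypothetical registry repo
    version: "1.2.0"                  # pinned for reproducibility
    integrity: sha256:<digest>        # supply-chain check
  - name: git-summary
    version: ">=0.3, <1.0"
```

Once that file exists, "works on my machine" skills become an install step instead of a copy-paste ritual.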
The "too small for a repo" thing is real, but one pattern that works is a monorepo per dev team or org that holds all the skills, so everyone contributes to and pulls from one place.
The "cognitive load" problem latand6 raised about reviewing every skill is real. That's where you integrate security and quality gates: treat these skills like any other software artifact. You wouldn't manually review every line of every dependency, so automate the validation here too.
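A gate like that can start absurdly small. For example, a check that rejects skills missing required structure or containing risky shell patterns; the required section name and the pattern list here are an invented policy, not anyone's standard:

```python
import re

# Hypothetical policy: patterns that should never appear in a shared skill.
DANGEROUS = [
    r"curl[^\n]*\|\s*(?:ba)?sh",  # piping a download straight into a shell
    r"rm\s+-rf\s+/",              # destructive path footgun
]

def lint_skill(text: str) -> list[str]:
    """Cheap automated gate run in CI before any human looks at the skill."""
    problems = []
    if "## Steps" not in text:
        problems.append("no '## Steps' section")
    for pat in DANGEROUS:
        if re.search(pat, text):
            problems.append(f"flagged pattern: {pat}")
    return problems
```

It won't catch a cleverly malicious skill, but it moves the baseline review from "read everything" to "read what the gate flags", which is the same trade we already accept for ordinary dependencies.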
I do agree with Jacques Ellul in Technological Society that technique precedes science, and that's certainly the case with LLMs; however, this whole industry waves off rigorous validation in favor of personal anecdotes ("it feels more productive to me!" "they didn't study after Opus 4.5 was released").
That’s what I do for each ticket.