Skills Are the Wrong Abstraction for Production Agentic Systems


I keep hearing the same word in product reviews, architecture sessions, standups: skills. PMs spec them. Engineers build libraries of them. Teams spend weeks curating them.

Most of the time, the resulting system is harder to operate, harder to secure, and harder to reason about than what they started with.

A document is not a capability

When teams say “skills” they usually mean one of two things: a document that tells an agent how to behave, or a callable capability the agent can invoke. Most implementations conflate the two in a single markdown file that tries to do both jobs at once.

That conflation is where things break. Instructions about how to do something are not the same as the ability to do it. When you bundle them together you get something too vague to be a reliable tool and too operational to be treated as passive knowledge.

In practice this becomes a system prompt with better filing. Token cost is high, context is often irrelevant, and the agent has no mechanism to decide which part of a loaded skill applies right now.

The security boundary no one names

Many skill files contain operational instructions. Not conceptual guidance. Literal invocations: run this CLI, execute this bash script, install this dependency.

That made sense for developer tooling. Claude Code, Cursor, local agents running inside a sandbox the developer controls. Fine.

Production is a different context. A production agent has access to real infrastructure, real data, real APIs. When it can execute shell commands because a skill file said to, you have not built an intelligent assistant. You have built a privileged process with a natural language interface and no audit trail.

The argument against this is usually “we control which skills get loaded.” That is a brittle guarantee. Skills get updated, copied, and misrouted. The blast radius of a confused agent that can run shell commands is categorically different from that of an agent constrained to defined tool contracts.

Production agents should have no ambient capability. Every action is a named tool with typed inputs, typed outputs, and defined failure modes.

If you cannot express a capability as a function signature, it does not belong in a production agent.
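To make that test concrete, here is a minimal sketch of one capability written as a function signature. Everything in it is hypothetical and chosen for illustration: the refund domain, the names `issue_refund`, `RefundRequest`, `RefundReceipt`, and `RefundError`, and the in-memory `ORDERS` store standing in for a real backend. What matters is the shape: typed input, typed output, and an enumerated set of failure modes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class RefundFailure(Enum):
    """The only ways this tool is allowed to fail."""
    ORDER_NOT_FOUND = "order_not_found"
    INVALID_AMOUNT = "invalid_amount"
    AMOUNT_EXCEEDS_ORDER = "amount_exceeds_order"


@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount_cents: int


@dataclass(frozen=True)
class RefundReceipt:
    order_id: str
    refunded_cents: int


@dataclass(frozen=True)
class RefundError:
    reason: RefundFailure
    detail: str


# Hypothetical in-memory order store (order id -> order total in cents).
ORDERS = {"ord_1": 5000}


def issue_refund(req: RefundRequest) -> Union[RefundReceipt, RefundError]:
    """A named tool: typed input, typed output, defined failure modes."""
    total = ORDERS.get(req.order_id)
    if total is None:
        return RefundError(RefundFailure.ORDER_NOT_FOUND, f"no order {req.order_id}")
    if req.amount_cents <= 0:
        return RefundError(RefundFailure.INVALID_AMOUNT, "amount must be positive")
    if req.amount_cents > total:
        return RefundError(RefundFailure.AMOUNT_EXCEEDS_ORDER, f"order total is {total}")
    return RefundReceipt(req.order_id, req.amount_cents)
```

The agent can request a refund, but it cannot do anything the signature does not describe, and every outcome is a value the caller can log and audit.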

The PM problem

The skills framing also does something damaging upstream: it gives product managers the wrong unit of design.

When a PM thinks in skills, they think in nouns. What skills does this agent have? What skills does it need? They build a skill inventory and treat it like a feature list.

What they are not thinking about is prompt composition, context strategy, or what the agent needs to know at each decision point. The result is over-specified, under-targeted agents loaded with everything they might ever need, behaving inconsistently because nothing clarifies what applies when.

The better question is not “what skills does the agent have” but “what does it need to know at this step, and where does that come from.” That is a context engineering question. It leads to very different architecture.

What you actually need

The useful parts of the skills concept decompose into three things that should be separate:

[Figure: a skill file decomposing into three separate concerns: tool contracts, retrieval surfaces, and context composition.]

Tool contracts define what an agent can do. Function signatures, typed inputs, typed outputs, defined failure modes. These belong in code with versioning and access control, not in markdown files.
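As a rough sketch of what "in code with versioning and access control" could look like, the registry below keys each tool by name and version and checks the caller's role before dispatching. `ToolSpec`, `ToolRegistry`, and the role names are assumptions made for this example, not an existing library's API. A real system would also validate arguments and record each call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str
    allowed_roles: FrozenSet[str]
    handler: Callable[[dict], dict]


class ToolRegistry:
    """Holds the only actions an agent may take; nothing is ambient."""

    def __init__(self) -> None:
        self._tools: Dict[Tuple[str, str], ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        key = (spec.name, spec.version)
        if key in self._tools:
            raise ValueError(f"{spec.name}@{spec.version} already registered")
        self._tools[key] = spec

    def call(self, name: str, version: str, role: str, args: dict) -> dict:
        spec = self._tools.get((name, version))
        if spec is None:
            raise LookupError(f"unknown tool {name}@{version}")
        if role not in spec.allowed_roles:
            raise PermissionError(f"role {role!r} may not call {name}@{version}")
        return spec.handler(args)


# Hypothetical tool and roles, for illustration only.
registry = ToolRegistry()
registry.register(
    ToolSpec(
        name="lookup_order",
        version="1.0.0",
        allowed_roles=frozenset({"support_agent"}),
        handler=lambda args: {"order_id": args["order_id"], "status": "shipped"},
    )
)
```

A call such as `registry.call("lookup_order", "1.0.0", "support_agent", {"order_id": "ord_1"})` goes through. The same call from an unlisted role raises `PermissionError`, and an unregistered name or version raises `LookupError`.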

Retrieval surfaces provide context on demand. When an agent needs domain knowledge, it queries for it based on current task state, not because a skill file was preloaded at startup.
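A toy illustration of querying from task state rather than preloading: the function below scores a small in-memory knowledge base by word overlap with the current task description and returns only what matches. The `Snippet` type, the `KNOWLEDGE` entries, and the overlap scoring are all placeholders made up for this sketch. A production retrieval surface would sit on a real search or vector index.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snippet:
    topic: str
    text: str


# Hypothetical knowledge base; the entries are invented for this example.
KNOWLEDGE = [
    Snippet("refund policy", "Refunds over 100 USD need manager approval."),
    Snippet("shipping delays", "Offer a voucher when a parcel is over 5 days late."),
    Snippet("account deletion", "Deletion requests must be confirmed by email."),
]


def retrieve(task_state: str, limit: int = 1) -> List[Snippet]:
    """Return the snippets whose topic shares the most words with the task state."""
    words = set(task_state.lower().split())
    scored = []
    for snippet in KNOWLEDGE:
        score = len(words & set(snippet.topic.split()))
        if score > 0:
            scored.append((score, snippet))
    # Sort on the score only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:limit]]
```

Nothing enters the context until a step asks for it, and a task that matches no topic gets an empty list rather than the whole library.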

Context composition is the actual design problem. What information is in scope at each step? What gets dropped between agents? These are architecture decisions, not content decisions.
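One way to make those decisions explicit is to declare, per step, which pieces of state are in scope and filter everything else out before the model sees it. The step names and state keys below are hypothetical. This is a sketch of the idea, not a prescribed design.

```python
from typing import Dict, List

# Hypothetical steps and the state keys each one is allowed to see.
SCOPE: Dict[str, List[str]] = {
    "classify": ["user_message"],
    "plan": ["user_message", "intent", "policy_snippets"],
    "execute": ["intent", "plan", "tool_results"],
}


def compose_context(step: str, state: Dict[str, str]) -> Dict[str, str]:
    """Return only the state keys declared in scope for this step.

    An undeclared step raises KeyError, so a new step cannot silently
    inherit everything.
    """
    allowed = SCOPE[step]
    return {key: state[key] for key in allowed if key in state}


state = {
    "user_message": "I want my money back for order ord_1",
    "intent": "refund_request",
    "plan": "verify order, then issue refund",
    "scratch_notes": "internal reasoning from an earlier step",
}

execute_context = compose_context("execute", state)
# execute_context holds only "intent" and "plan"; the raw user message and
# the scratch notes are dropped at this boundary.
```

The scope table is small, but it is the artifact that answers "what applies when", and it lives in code where it can be reviewed and versioned.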

When you separate these, the skill library becomes unnecessary. You have tools the agent can call, knowledge it can retrieve, and a context strategy governing what is in scope. That is more secure and more predictable than a skill library.
