Anthropic announced a new feature yesterday: ‘Skills’, a method for adding ‘abilities’ to their models. I think it’s a step toward something more reasonable than the Model Context Protocol.
MCP servers can easily dump tens of thousands of tokens into your context—GitHub’s official server is notorious for doing so. Combine that with the Lost in the Middle problem, and it’s easy to see why many have struggled to integrate MCP into their workflows effectively.
In my opinion, the ‘Skills pattern’ is primed to alleviate that issue through a ‘progressive disclosure’ approach. (Which is a fancy way of saying that Skills are files the LLM essentially lazy-loads.)
The problem of context
The Transformer architecture has a limit on how many tokens it can process—specifically, on the number of vectors the Attention mechanism / MLP blocks can handle. This limit is what we call the ‘context window’: how much the model ‘sees’ at once.
Given a limited window, we obviously want to fill it with as much relevant information as possible. Anything extraneous has a serious impact on the model’s ability to generate useful completions.
MCP servers are famously verbose: the function and schema definitions alone can take up significant space, and that’s before any tool calls and responses are pushed into the window. State-of-the-art models cope by offering huge context windows, but that doesn’t mean the response is optimal (or even usable).
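To make that boilerplate concrete, here’s a rough sketch of what a single tool definition might look like, expressed as a Python dict. The field layout (a name, a description, and a JSON Schema for inputs) mirrors what MCP servers advertise via tools/list, but the specific tool and its parameters are hypothetical, and real servers often expose dozens of these.

```python
# Hypothetical MCP-style tool definition. Every tool a server exposes ships a
# blob like this into the context window before the model has made a single call.
find_users_tool = {
    "name": "find_users",
    "description": (
        "Search for users matching a set of filters. "
        "Returns the full user record for every match."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "name_contains": {
                "type": "string",
                "description": "Substring to match against the display name.",
            },
            "created_after": {
                "type": "string",
                "format": "date-time",
                "description": "Only return users created after this timestamp.",
            },
            "team_ids": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Restrict results to these team IDs.",
            },
            "limit": {
                "type": "integer",
                "default": 50,
                "description": "Maximum number of users to return.",
            },
        },
        "required": [],
    },
}

# Multiply this by every tool on every connected server, and 'tens of thousands
# of tokens' stops sounding like an exaggeration.
```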
For most MCPs, the signal-to-noise ratio can be quite low. For example: an MCP call to ‘find all users matching [filters]’ will return the users you’re looking for—but you’ll only actually use a subset of them. It doesn’t matter: all of them end up taking up space in your context window.
Solving the issue
The main value-add of Skills is that they’re loaded only when the LLM deems them relevant — if you’re not parsing Excel files, there’s no need for that skill’s instructions to take up space in the context window.
The engineering article goes into great detail describing the ‘three-level’ approach, but the reality is much simpler: each Skill’s name and description are preloaded into Claude’s context at startup. Just like a tool, the LLM can choose to ‘call’ the Skill (i.e. read the file), which loads the rest of the file’s content. The third level is simply references to other files within that body content.
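Putting the three levels side by side, a skill might look something like the file below. The frontmatter name and description are what gets preloaded; the body is only read when the Skill is invoked; and the references to bundled files are the third level. The contents here are hypothetical and only meant to show the shape:

```markdown
---
name: excel-report-extractor
description: Extract tabular data from .xlsx files and summarise it. Use when the user asks about the contents of an Excel spreadsheet.
---

# Excel report extractor

When the user provides an .xlsx file:

1. Run `scripts/extract.py <path>` to dump each worksheet as CSV.
2. Summarise the resulting CSVs rather than reading the raw spreadsheet.

For unusually large workbooks, see `references/large-files.md`.
```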
This ‘preloaded instructions’ pattern is essentially what already exists in Rules (à la Cursor rules, for example), only more context-friendly because the body content of each skill file is lazy-loaded.
Their power comes from pairing them with an LLM that has access to a filesystem (i.e. Claude) and CLI tools: you get the determinism of a tool call without all the boilerplate required to define one. Describe your task, include a small code snippet that performs the relevant operation, and you’re done: the LLM can reliably invoke that snippet when needed.
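For instance, here’s a sketch of the kind of script the hypothetical Excel skill above could bundle as scripts/extract.py (the filename and the use of openpyxl are my assumptions, not anything Anthropic prescribes):

```python
#!/usr/bin/env python3
# Hypothetical scripts/extract.py: dump every worksheet in an .xlsx file to CSV
# so the LLM can work with plain text instead of a binary spreadsheet.
import csv
import sys
from pathlib import Path

from openpyxl import load_workbook  # third-party: pip install openpyxl


def extract(xlsx_path: str) -> None:
    workbook = load_workbook(xlsx_path, read_only=True, data_only=True)
    out_dir = Path(xlsx_path).with_suffix("")
    out_dir.mkdir(exist_ok=True)

    for sheet in workbook.worksheets:
        out_file = out_dir / f"{sheet.title}.csv"
        with out_file.open("w", newline="") as handle:
            writer = csv.writer(handle)
            for row in sheet.iter_rows(values_only=True):
                writer.writerow(row)
        print(f"wrote {out_file}")


if __name__ == "__main__":
    extract(sys.argv[1])
```

The script itself is beside the point: what matters is that the model runs the same deterministic command every time instead of improvising spreadsheet parsing in its head.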
Tips for using skills
Here are a few things I’ve found in my (brief) time playing with these:
- Apply them only when needed — LLMs are really good at most everyday tasks: you probably don’t need a skill for yours
- The name and description matter a lot — just like tools, the metadata made available to the LLM is critical for ensuring reliable use of the Skill
- Avoid premature grouping — prefer more, smaller skill files over fewer, larger ones, to avoid extraneous content in the context window
- Ask the LLM to create/update Skills when things go wrong — if responses get wonky, ask it to reflect on past responses and update the Skills it used accordingly