Claude skills, MCP, and knowing which one to use.

I use Claude constantly for writing insights like this, writing code, for incident analysis, for drafting client deliverables. At some point the repeat instructions piling up in every chat become a productivity tax, and you start looking at the tooling Anthropic actually built to solve that. That is where skills and MCP come in, and where most people get confused about which one to reach for.

Most people building Claude automations run into the same wall: they start with skills, realize they need live data, bolt on an MCP server, and then wonder why the whole thing is fragile and they blow through a million tokens in a short period. The answer is not that skills or MCP are wrong. It is that they solve different problems, and conflating them creates a mess that is hard to debug and harder to maintain.

This is the mental model I use when scoping out agentic AI work, both for internal tooling and for client engagements. Skills handle how Claude thinks and executes. MCP handles what Claude can see and touch. Get that separation right from the start and everything else follows cleanly.

The three primitives

Anthropic gives you three distinct tools for extending Claude's capability. Each occupies a different layer of the stack.

Primitive

What it is

Best analogy

Projects

A persistent knowledge base. Upload documents, brand guidelines, SOPs and Claude reads them as static reference material every time it responds.

A library. Reference only. Nothing executes.

Skills

A structured instruction file (SKILL.md) that tells Claude exactly how to perform a specific repeatable task: the workflow, the output format, the edge cases.

A trained employee with a written SOP. Procedural. Deterministic.

MCP

A protocol layer that plugs Claude into live external systems: calendars, databases, APIs, inboxes. Gives it real-time read/write access to the world.

A set of tools on the employee's desk. Connectivity. Dynamic.

These three layers compose, they do not compete. A well-built agentic system will often use all three: projects hold background context, skills define the execution logic, MCP provides live data access. The mistake is treating them as interchangeable.

Skills in depth

A skill is not a prompt. It is a file-system artifact: a folder containing a SKILL.md file and optionally a references/ directory and a scripts/ directory. Drop the folder into ~/.claude/skills/ and Claude discovers it automatically. No configuration file. No registration step.

                        
                        
                        
                    
Skill folder structure

                # Minimal skill
csv-cleaner/
  SKILL.md

# Full skill with references and scripts
incident-reporter/
  SKILL.md
  references/
    severity-guide.md
    escalation-matrix.md
  scripts/
    parse-logs.py
    calculate-mttr.py
                

The YAML frontmatter block at the top of SKILL.md controls when a skill activates. Claude scans all available skills and scores each description against your prompt. The highest-scoring skill fires. If nothing clears the threshold, Claude responds normally. This means your description is not documentation. It is the activation mechanism, and it needs to be explicit to the point of being almost embarrassingly specific about trigger phrases.

                        
                        
                        
                    
SKILL.md — YAML frontmatter

                ---
name: incident-reporter
description: |
  Generates structured incident reports from raw on-call logs, alert
  data, or freeform incident notes. Use this skill whenever the user
  says "write up the incident", "create a postmortem", "draft the
  incident report", or provides a raw timeline and asks for a
  structured output. Also activates when the user pastes alert
  data and asks for a summary.

  Do NOT use for: general log analysis, SLO reviews, or capacity
  planning tasks — those have separate skills.
---

## Overview
This skill transforms raw incident data into a structured postmortem
following the DURESS framework...
                

Key insight

The description field is the single most load-bearing line in the whole file. If it is vague, the skill never fires. If it is too broad, it hijacks unrelated conversations. The sweet spot is an aggressively specific list of trigger phrases paired with explicit negative boundaries: what the skill should NOT activate for.

When to add scripts

Plain English instructions handle judgment, formatting, and language tasks cleanly. But some tasks require actual computation: calculating running averages, parsing XML, resizing images in a batch. That is when you add a scripts/ directory.

The division is clean in practice: scripts handle computation, instructions handle judgment. Your SKILL.md tells Claude exactly which script to run, with what arguments, and how to interpret the output. The script does the math. Claude does the writing.

Script interface rule

Every script should accept input/output file paths as command-line arguments and never hardcode paths. Include a comment block at the top explaining arguments, expected output, and possible error conditions. Claude reads that documentation to communicate failures back to the user.

MCP in depth

MCP (Model Context Protocol) is a connectivity layer. It has nothing to do with how Claude executes a task. It is purely about what external systems Claude can interact with during that execution. Think of it as giving Claude a set of tools: a calendar tool, a database query tool, an email send tool. Skills tell Claude what to do; MCP tools give Claude the ability to do it.

How the layers compose

Claude (reasoning layer)

reads SKILL.md, decides what to do

↓

MCP server

exposes tools to Claude via protocol

↓

Your calendar

Your database

Your inbox

live systems with real-time read/write access

MCP servers expose a defined set of tools (list_events, create_task, query_db, send_email) and Claude decides when and how to call them. You write an MCP server once and any agent or skill can use it. That is the key architectural advantage at scale: MCP tools are reusable across skills. Your New Relic MCP server that queries alert data can be used by your incident reporter skill, your SLO review skill, and your capacity planning skill without duplicating a line of code.

Running your own MCP server

MCP servers are lightweight HTTP or stdio processes. You can build one in Python or TypeScript in an afternoon. Each tool you expose follows the same pattern: a name, a description Claude uses to decide when to call it, an input schema, and the implementation.

                        
                        
                        
                    
Python MCP server — minimal example

                from mcp.server import Server
from mcp.types import Tool, TextContent
import httpx

app = Server("newrelic-mcp")

@app.list_tools()
async def list_tools():
    return [
        Tool(
            name="query_alerts",
            description="Query open alerts from New Relic for a given service",
            inputSchema={
                "type": "object",
                "properties": {
                    "service_name": {"type": "string"},
                    "severity":    {"type": "string", "enum": ["critical", "warning"]}
                }
            }
        )
    ]

@app.call_tool()
async def call_tool(name, arguments):
    if name == "query_alerts":
        # hit New Relic API, return results
        data = await fetch_newrelic_alerts(**arguments)
        return [TextContent(type="text", text=str(data))]
                

MCP server design principle

Keep each MCP server scoped to a single system or domain: one server for New Relic, one for PagerDuty, one for your internal CMDB. Monolithic MCP servers that talk to five different systems are hard to version, hard to secure, and hard to debug when something breaks.

Skills vs MCP: the honest decision matrix

The question I get most often is: should I build this as a skill or as an MCP server? The answer depends on what the task actually needs.

Use a skill when...

You are codifying a repeatable workflow that Claude already has the data for
The task is primarily about judgment, formatting, or language transformation
You want Claude to follow a precise step-by-step process every time
The input arrives in the conversation: pasted text, uploaded file, typed request
You are automating something you currently type out as instructions 3+ times a week

Use MCP when...

Claude needs to read or write to a live external system during the task
The data changes between sessions and cannot be pasted in each time
Multiple different skills or agents will need access to the same external system
You need Claude to take actions in the world: create tickets, send messages, update records
You are building a tool that other developers will also need to use

Use both when...

The task requires live data and structured execution logic
Example: an incident reporter skill that queries alert data via New Relic MCP
Example: a capacity planning skill that reads current utilization via a metrics MCP
Example: an SLO review skill that pulls current burn rates before generating the report

Use neither when...

The task is genuinely one-off and will not repeat
The output format changes every time, so just prompt Claude directly
You are still figuring out the process. Build the skill after you know what you want.

What this actually costs

This is the part people skip until the bill shows up. The architecture choices you make have a direct token cost, and a bloated MCP setup can burn through budget 10x faster than a clean skill-only workflow for the same task.

Here is a realistic comparison for a single incident report generation, using current API pricing (Haiku 4.5 at $1.00/$5.00 per million tokens):

Clean architecture

Skill description scoring ~500 tokens

SKILL.md instructions read ~800 tokens

MCP: 3 targeted tool calls ~3,000 tokens

Output (postmortem draft) ~1,200 tokens

Total ~$0.007 / run

Bloated MCP setup

Monolithic MCP context dump ~50,000 tokens

Redundant tool calls (no scope) ~12,000 tokens

System prompt repeated in skill ~5,000 tokens

Output (same postmortem draft) ~1,200 tokens

Total ~$0.069 / run

Same output. 10x the cost. At 50 runs a day on a team, that is the difference between $12.50/month and $103.50/month on Haiku alone. Scale that to Sonnet and the numbers hurt more. The discipline of keeping each layer doing only its job is not just architectural cleanliness, it is budget discipline.

The most expensive mistake

Building an MCP server that dumps an entire system's state into context on every call instead of exposing targeted query tools. If your MCP server returns 50k tokens of "here is everything about this service" instead of answering the specific question asked, you are paying for context you do not need. Scope your tools. Return only what the skill actually asks for.

Managing skills at scale

One skill is simple. Ten skills start to conflict. At twenty skills, without deliberate architecture, you will have activation collisions, ambiguous triggers, and skills that fire in the wrong context. This is the multi-skill orchestration problem, and it gets worse the more you build.

Claude's selection mechanism is straightforward: it reads all available skill descriptions, scores each against your request, and activates the highest-scoring match. If two skills have overlapping trigger phrases, the wrong one will sometimes win. If descriptions are too vague, nothing fires. If they are too broad, everything fires.

The conflict pattern to watch for

The most common failure at scale is when two skills share a trigger verb. "Review this" matches both a code review skill and a document review skill. The fix is not to disambiguate the trigger. It is to write the negative boundaries explicitly. Your code review skill should say "Do NOT activate for document, contract, or prose reviews."

Three rules for a conflict-free skill library

01

Non-overlapping territories

Every skill owns a clearly defined domain. The incident reporter handles postmortems. The SLO reviewer handles burn rates and error budgets. The capacity planner handles utilization trends. No overlap. Draw the lines before you write a single SKILL.md.

02

Aggressive negative boundaries

Every skill's YAML description must explicitly list other skills' territories as exclusions. This is not optional at scale. Your incident reporter should say "Do NOT use for SLO reviews, capacity planning, or alert triage." Your SLO reviewer should say "Do NOT use for postmortems or incident timelines."

03

Distinctive trigger language per skill

If you find yourself using the same trigger phrase for two different skills, one of them has a scope problem. "Write up the incident" should match exactly one skill. "Review our SLO burn" should match exactly one skill. If there is ambiguity, resolve it at the description level rather than hoping Claude picks correctly.

State management across sessions

Context windows are finite. For long-running projects like a multi-week observability rollout, an ongoing audit, or a book written chapter by chapter, Claude forgets what happened in the previous session. The standard workaround is a context log pattern baked into the skill itself.

                        
SKILL.md — session continuity instruction

                ## Session continuity

At the start of every session, read context-log.md to understand
what was completed last time, what is pending, and what decisions
were made.

At the end of every session, append a summary to context-log.md
with:
- What was completed this session
- What is blocked and why
- What should happen next
- Any decisions made that affect future sessions

This is the shift handover pattern from incident response. The incoming responder reads the chart, knows exactly what happened, and picks up without losing context. Your skill does the same thing. It reads its own notes from the previous session before it does anything else.

MCP at scale: one server per domain

The equivalent management problem for MCP is different. Instead of activation conflicts, you get tool name collisions, authentication sprawl, and versioning headaches. The cleanest architecture keeps MCP servers scoped tightly.

Scale tier	Skills approach	MCP approach	What breaks without discipline
1-5 skills	Write as you go, minimal architecture needed	1-2 MCP servers, probably personal use	Not much. This is the happy path.
5-15 skills	Map territories before writing. Audit descriptions for overlap.	Dedicated server per external system. Shared auth layer.	Activation collisions, tool name conflicts, debug hell
15+ skills	Formal skills registry. Regular conflict audits. Versioning discipline.	API gateway in front of MCP servers. Access control per skill.	Skills library becomes unmaintainable. MCP becomes a sprawl of undocumented endpoints.

Common mistakes I keep seeing

These are not theoretical. They are patterns I have run into repeatedly, both in my own builds and in client engagements where someone handed me a broken automation to fix.

Mistake 01

Putting your whole system prompt in a skill

A skill is for repeatable task execution, not for personality and global instructions. If you are copying your entire system prompt into SKILL.md, you are using the wrong layer. Global context, tone guidelines, and background knowledge about your company go in a Project. The skill handles the specific task.

Fix: Projects hold who Claude is and what it knows. Skills hold what Claude does. Keep them separate.

Mistake 02

Building an MCP server for data that never changes

If the data is static, or changes once a quarter, you do not need an MCP server. Uploading a reference document to a Project costs nothing and works immediately. I have seen teams build and maintain a full MCP server just to serve a severity matrix that gets updated twice a year.

Fix: MCP is for live, dynamic data. If you can put it in a PDF and upload it, do that instead.

Mistake 03

Versioning skills inside the chat

Iterating on a SKILL.md by pasting new versions into the conversation and asking Claude to compare them. Your skills/ directory is code. It goes in git. You get diffs, rollbacks, and a history of what changed and why. Treating it as a chat artifact means you will lose changes, and you will have no idea why a skill started behaving differently last Tuesday.

Fix: git init your skills directory on day one. Commit every description change with a note on what you were trying to fix.

Mistake 04

One MCP server for everything

Building a single MCP server that connects to New Relic, PagerDuty, Slack, your CMDB, and your ticketing system. It feels efficient until you need to update the PagerDuty auth token and the whole server goes down. Or until you want to give one team access to metrics but not ticketing.

Fix: One server per external system. Deploy them independently. Auth scoped per server.

Mistake 05

Writing the skill before you know the process

Encoding a workflow into a skill before you have run it manually enough times to know all the edge cases. You end up with a skill that works 80% of the time and fails in ways that are hard to diagnose because the instructions do not cover the cases you had not seen yet.

Fix: Run the task manually 10+ times first. Write down every decision point and edge case. Then write the skill.

How to know if your skill is actually working

This is the part of the docs nobody writes. You drop a SKILL.md into your skills directory, send a message that should trigger it, and Claude responds. But did the skill fire, or did Claude just happen to do something similar on its own? Here is how to tell, and how to iterate without breaking everything.

Verifying activation

The most direct check: ask Claude explicitly. After any response that should have used a skill, ask "which skill did you use for that?" Claude will tell you the skill name if one activated, or tell you no skill fired. Do this every time you write a new skill until you have confidence in the trigger language.

The second signal is output structure. If your skill specifies a precise output format (specific sections, a particular heading structure, code blocks in a defined order) and the response matches it exactly, the skill fired. If the output is Claude's default style for that kind of task, it probably did not.

Debug workflow

When a skill does not fire: first check if your trigger phrase exactly matches something in the description. Then check for competing skills that might be outscoring it. Then broaden the description with more trigger phrase variants. When a skill fires too often: add explicit negative boundaries. Start with the most obvious adjacent tasks it should not handle.

Iterating safely

Never edit a live skill description and immediately test it in a real workflow. Keep a test-prompts.md file next to each skill with 5-10 example inputs that should trigger it and 3-5 that should not. Run through that list after every description change before you use it in production. Takes two minutes. Saves a lot of confusion later.

When you are tuning a description that keeps misfiring, change one thing at a time. Add one trigger phrase, test. Add one negative boundary, test. If you rewrite the whole description at once and something breaks, you will not know which change caused it.

                        
                        
                        
                    
test-prompts.md — example structure

                # incident-reporter — test prompts

## Should activate
- "write up last night's incident"
- "create a postmortem for the payment service outage"
- "draft the incident report, here's the timeline: [...]"
- "we had a p1 this morning, can you structure it up"

## Should NOT activate
- "what's our current SLO burn rate"
- "review this runbook and tell me what's missing"
- "analyze these logs for anomalies"
- "can you summarize what happened in standup"
                

Turning your existing prompts into skills in 15 minutes

If you have been using Claude for a while, you already have skills. They just live in your chat history as repeated instructions you paste every time. Here is how to extract them properly.

01

Find your highest-frequency repeated prompt

Scroll back through your Claude chats. What instruction do you paste most often? The one you have practically memorized because you type it three times a week. That is your first skill. Copy the best version of it.

02

Create the folder structure

Make a directory in ~/.claude/skills/ named after the task (use kebab-case, keep it short). Create a SKILL.md file inside it. That is all you need to start.

03

Write the YAML frontmatter first

Before you paste in your prompt, write the name and description fields. List every phrase you actually use to kick off this task. Think about what you type, not what sounds official. "clean up these meeting notes" is better than "meeting transcript summarization".

04

Paste your prompt as the skill body

Your existing prompt goes in as the Overview and Instructions sections. Do not rewrite it yet. Get it working first. You can refine later once you know the activation is solid.

05

Add negative boundaries

Think about the two or three tasks most likely to be confused with this one. Add a "Do NOT use for" section in the description listing them explicitly. This one step prevents most activation problems before they happen.

06

Test it and commit

Send the exact phrase you normally use to kick off the task. Verify the skill fired (ask Claude which skill it used). If it works, commit to git immediately. You now have a versioned, reproducible workflow instead of a prompt in your clipboard.

The whole thing should take 15 minutes for a skill you already know well. The investment pays back on the first time a teammate asks "how do you do that so consistently?" and you can just point them at a file.

The environment split that matters

Skills behave differently depending on where they run, and this affects what you can build.

Claude Code is the command-line interface for developers. Skills here run in your project directory under .claude/skills/ or globally at ~/.claude/skills/. They have full access to the file system, bash, and can execute scripts. This is where you build skills that manipulate files, run Python scripts, or interact with your codebase. MCP servers registered here have access to local tools, local credentials, and local processes. This is the environment for serious automation work.

Claude Desktop (CoWork) is the desktop agent for non-developers. Same SKILL.md format, different execution context. Skills here can interact with your screen and applications through the agent's GUI capabilities, but they cannot run arbitrary bash commands. MCP servers in this context tend to be hosted services rather than local processes.

Practical note

If you are building skills for a team, Claude Code is the right environment. It supports scripting, version control for your skills/ directory, and proper MCP server configuration per project. CoWork is better for non-technical users who need the same skill to work through a GUI without touching a terminal.

What this looks like end to end

Here is a concrete example from an SRE context. We needed an incident reporter that could pull live alert data, structure it, and output a formatted postmortem draft. Three layers:

1

Projects layer: static context

Upload severity definitions, escalation matrix, and team roster as reference documents in a project. Claude reads these as background knowledge in every session.

2

MCP layer: live data access

Build a New Relic MCP server exposing three tools: query_alerts, get_deployment_history, and fetch_error_rate. Claude calls these during execution to pull current data. No copy-pasting alert payloads into the chat.

3

Skills layer: execution logic

Write an incident-reporter SKILL.md that defines the exact postmortem structure, tells Claude when to call each MCP tool, specifies the output format, and handles edge cases like missing data or multi-service incidents. Add a calculate-mttr.py script for the timeline math.

The result: type "write up last night's incident" and Claude pulls the alert data via MCP, calculates the timeline via the script, applies the structure from the skill, and outputs a draft postmortem that follows the team's format. No instructions pasted. No data copied. One command.

Where to start

If you are new to this: start with one skill, no MCP. Find the task you type instructions for most often and encode it. Get comfortable with the SKILL.md format and the activation mechanism. Add MCP when you hit the wall where Claude needs data it cannot see in the conversation.

If you are building for a team: map your skill territories before writing a single file. Define the negative boundaries. Set up a shared ~/.claude/skills/ directory in version control. Build your first MCP server scoped to a single external system. Expand from there.

The primitives are simple. The discipline is in knowing which layer solves which problem and not conflating them when you are under pressure to ship.