1. Introduction
The Claude Code Skill ecosystem is expanding rapidly. As of March 2026, the anthropics/skills repository reached over 87,000 stars on GitHub and more people are building and sharing Skills every week.
So how do you build one in a structured way? This article walks through designing, building, and distributing a Skill from scratch. I’ll use my own experience shipping an e-commerce review Skill (Link) as a running example throughout.
2. What Is a Claude Skill?
A Claude skill is a set of instructions that teaches Claude how to handle specific tasks or workflows. Skills are one of the most powerful ways to customize Claude to your specific needs.
Skills are built around progressive disclosure. Claude fetches information in three stages:
- Metadata (name + description): Always in Claude’s context. About 100 tokens. Claude decides whether to load a Skill based on this alone.
- SKILL.md body: Loaded only when triggered.
- Bundled resources (scripts/, references/, assets/): Loaded on demand when needed.
With this structure, you can install many Skills without blowing up the context window. If you keep copy-pasting the same long prompt, just turn it into a Skill.
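The three stages map directly onto the file itself. Here is an illustrative skeleton (the name and contents are placeholders, not a real Skill):

```markdown
---
# Stage 1: metadata — always in context, ~100 tokens
name: my-skill
description: One-line summary of what this Skill does and when to use it.
---

<!-- Stage 2: this body is loaded only when the Skill triggers -->
# My Skill
Follow the workflow below...

<!-- Stage 3: bundled resources are loaded only on demand -->
For edge cases, read references/details.md.
To validate input, run scripts/check.py.
```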
3. Skills vs MCP vs Subagents
Before building a Skill, let me walk through how Skills, MCP, and Subagents differ, so you can make sure a Skill is the right choice.
- Skills teach Claude how to behave — analysis workflows, coding standards, brand guidelines.
- MCP servers give Claude new tools — sending a Slack message, querying a database.
- Subagents let Claude run independent work in a separate context.
An analogy that helped me: MCP is the kitchen — knives, pots, ingredients. A Skill is the recipe that tells you how to use them. You can combine them. Sentry’s code review Skill, for example, defines the PR analysis workflow in a Skill and fetches error data via MCP. But in many cases a Skill alone is enough to start.
4. Planning and Design
I jumped straight into writing SKILL.md the first time and ran into problems. If the description is not well designed, the Skill will not even trigger. I’d say spend time on design before you write the prompts or code.
4a. Start with Use Cases
The first thing to do is define 2–3 concrete use cases. Not “a helpful Skill” in the abstract, but actual repetitive work that you observe in practice.
Let me share my own example. I noticed that many colleagues and I were repeating the same monthly and quarterly business reviews. In e-commerce and retail, the process of breaking down KPIs tends to follow a similar pattern.
That was the starting point. Instead of building a generic ‘data analysis Skill,’ I defined it like this: “A Skill that takes order CSV data, decomposes KPIs into a tree, summarizes findings with priorities, and generates a concrete action plan.”
Here, it is important to imagine how users will actually phrase their requests:
- “run a review of my store using this orders.csv”
- “analyze last 90 days of sales data, break down why revenue dropped”
- “compare Q3 vs Q4, find the top 3 things I should fix”
When you write concrete prompts like these first, the shape of the Skill becomes clear. The input is CSV. The analysis axis is KPI decomposition. The output is a review report and action plan. The user is not a data scientist — they are someone running a business and they want to know what to do next.
That level of detail shapes everything else: Skill name, description, file formats, output format.
Questions to ask when defining use cases:
- Who will use it?
- In what situation?
- How will they phrase their request?
- What is the input?
- What is the expected output?
4b. YAML Frontmatter
Once use cases are clear, write the name and description. This metadata decides whether your Skill actually triggers.
As mentioned earlier, when a user request comes in, Claude decides which Skills to load based on this metadata alone. If the description is vague, Claude will never reach the Skill — no matter how good the instructions in the body are.
To make things trickier, Claude tends to handle simple tasks on its own without consulting Skills. It defaults to not triggering. So your description needs to be specific enough that Claude recognizes “this is a job for the Skill, not for me.”
So the description needs to be somewhat “pushy.” Here is what I mean.
```yaml
# Bad — too vague. Claude does not know when to trigger.
name: data-helper
description: Helps with data tasks

# Good — specific trigger conditions, slightly "pushy"
name: sales-data-analyzer
description: >
  Analyze sales/revenue CSV and Excel files to find patterns,
  calculate metrics, and create visualizations. Use when user
  mentions sales data, revenue analysis, profit margins, churn,
  ad spend, or asks to find patterns in business metrics.
  Also trigger when user uploads xlsx/csv with financial or
  transactional column headers.
```
The most important thing is being explicit about what the Skill does and what input it expects — “Analyze sales/revenue CSV and Excel files” leaves no ambiguity. After that, list the trigger keywords. Go back to the use case prompts you wrote in 4a and pull out the words users actually say: sales data, revenue analysis, profit margins, churn. Finally, think about the cases where the user doesn’t mention your Skill by name. “Also trigger when user uploads xlsx/csv with financial or transactional column headers” catches those silent matches.
The constraints are: name up to 64 characters, description up to 1,024 characters (per the Agent Skills API spec). You have room, but prioritize information that directly affects triggering.
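If you want to enforce those limits mechanically, a quick check is easy to script. This is my own sketch, not part of any official tooling — only the two limits (64 and 1,024 characters) come from the Agent Skills API spec:

```python
MAX_NAME, MAX_DESC = 64, 1024  # limits from the Agent Skills API spec

def check_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of problems with a Skill's metadata (empty = OK)."""
    problems = []
    if not name:
        problems.append("name is missing")
    elif len(name) > MAX_NAME:
        problems.append(f"name is {len(name)} chars (max {MAX_NAME})")
    if not description:
        problems.append("description is missing")
    elif len(description) > MAX_DESC:
        problems.append(f"description is {len(description)} chars (max {MAX_DESC})")
    return problems
```

Run it against your frontmatter values before packaging; an empty list means both fields fit.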
5. Implementation Patterns
Once the design is set, let’s implement. First, understand the file structure, then pick the right pattern.
5a. File Structure
The physical structure of a Skill is simple:
```
my-skill/
├── SKILL.md            # Required. YAML frontmatter + Markdown instructions
├── scripts/            # Optional. Python/JS for deterministic processing
│   ├── analyzer.py
│   └── validator.js
├── references/         # Optional. Loaded by Claude as needed
│   ├── advanced-config.md
│   └── error-patterns.md
└── assets/             # Optional. Templates, fonts, icons, etc.
    └── report-template.docx
```
Only SKILL.md is required. That alone makes a working Skill. Try to keep SKILL.md under 500 lines. If it gets longer, move content into the references/ directory and tell Claude in SKILL.md where to look. Claude will not read reference files unless you point it there.
For Skills that branch by domain, the variant approach works well:
```
cloud-deploy/
├── SKILL.md            # Shared workflow + selection logic
└── references/
    ├── aws.md
    ├── gcp.md
    └── azure.md
```
Claude reads only the relevant reference file based on the user’s context.
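The selection logic lives in SKILL.md. A sketch of what that section might look like — the detection signals here are my own examples, not from any particular Skill:

```markdown
## Provider selection

1. Detect the target provider from context: an explicit mention ("deploy to GCP"),
   provider-specific files (cloudbuild.yaml, template.yaml), or CLI flags.
2. Read ONLY the matching reference file:
   - AWS   → references/aws.md
   - GCP   → references/gcp.md
   - Azure → references/azure.md
3. If the provider is ambiguous, ask the user before proceeding.
```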
5b. Pattern A: Prompt-Only
The simplest pattern. Just Markdown instructions in SKILL.md, no scripts.
Good for: brand guidelines, coding standards, review checklists, commit message formatting, writing style enforcement.
When to use: If Claude’s language ability and judgment are enough for the task, use this pattern.
Here is a compact example:
```markdown
---
name: commit-message-formatter
description: >
  Format git commit messages using Conventional Commits.
  Use when user mentions commit, git message, or asks to
  format/write a commit message.
---

# Commit Message Formatter

Format all commit messages following Conventional Commits 1.0.0.

## Format

<type>(<scope>): <subject>

## Rules

- Imperative mood, lowercase, no period, max 72 chars
- Breaking changes: add `!` after type/scope

## Example

Input: "added user auth with JWT"
Output: `feat(auth): implement JWT-based authentication`
```
That’s it. No scripts, no dependencies. If Claude’s judgment is enough for the task, this is all you need.
5c. Pattern B: Prompt + Scripts
Markdown instructions plus executable code in the scripts/ directory.
Good for: data transformation/validation, PDF/Excel/image processing, template-based document generation, numerical reports.
Supported languages: Python and JavaScript/Node.js. Here is an example structure:
```
data-analysis-skill/
├── SKILL.md
└── scripts/
    ├── analyze.py          # Main analysis logic
    └── validate_schema.js  # Input data validation
```
In the SKILL.md, you specify when to call each script:
```markdown
## Workflow

1. User uploads a CSV or Excel file
2. Run `scripts/validate_schema.js` to check column structure
3. If validation passes, run `scripts/analyze.py` with the file path
4. Present results with visualizations
5. If validation fails, ask user to clarify column mapping
```
The SKILL.md defines the “when and why.” The scripts handle the “how.”
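To make the "how" concrete, here is a minimal sketch of what a script like `analyze.py` could contain. The file name comes from the structure above, but the column names (`revenue`, `orders`) and metrics are assumptions for illustration — adapt them to your schema:

```python
import csv
import json
import sys

def analyze(csv_path: str) -> dict:
    """Compute simple summary metrics from an orders CSV.

    Assumes 'revenue' and 'orders' columns (illustrative only).
    """
    total_revenue = 0.0
    total_orders = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            total_revenue += float(row["revenue"])
            total_orders += int(row["orders"])
    avg = total_revenue / total_orders if total_orders else 0.0
    return {
        "total_revenue": round(total_revenue, 2),
        "total_orders": total_orders,
        "avg_order_value": round(avg, 2),
    }

if __name__ == "__main__":
    # Claude runs this with the uploaded file path and reads the JSON output.
    print(json.dumps(analyze(sys.argv[1]), indent=2))
```

Printing JSON keeps the contract between script and SKILL.md simple: the script does deterministic math, Claude interprets the numbers.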
5d. Pattern C: Skill + MCP / Subagent
This pattern calls MCP servers or Subagents from within the Skill’s workflow. Good for workflows involving external services — think create issue → create branch → fix code → open PR. More moving parts mean more things to debug, so I’d recommend getting comfortable with Pattern A or B first.
Choosing the Right Pattern
If you are not sure which pattern to pick, follow this order:
- Need real-time communication with external APIs? → Yes → Pattern C
- Need deterministic processing like calculations, validation, or file conversion? → Yes → Pattern B
- Claude’s language ability and judgment handle it alone? → Yes → Pattern A
When in doubt, start with Pattern A. It is easy to add scripts later and evolve into Pattern B. But simplifying an overly complex Skill is harder.
6. Testing
Writing the SKILL.md is not the end. What makes a Skill good is how much you test and iterate.
6a. Writing Test Prompts
“Testing” here does not mean unit tests. It means throwing real prompts at the Skill and checking whether it behaves correctly.
The one rule for test prompts: write them the way real users actually talk.
```
# Good test prompt (realistic)
"ok so my boss just sent me this XLSX file (its in my downloads,
called something like 'Q4 sales final FINAL v2.xlsx') and she wants
me to add a column that shows the profit margin as a percentage.
The revenue is in column C and costs are in column D i think"

# Bad test prompt (too clean)
"Please analyze the sales data in the uploaded Excel file
and add a profit margin column"
```
The problem with clean test prompts is that they do not reflect reality. Real users make typos, use casual abbreviations, and forget file names. A Skill tested only with clean prompts will break in unexpected ways in production.
6b. The Iteration Loop
The basic testing loop is simple:
1. Run the Skill with test prompts
2. Evaluate whether the output matches what you defined as good output in 4a
3. Fix the SKILL.md if needed
4. Go back to step 1
You can run this loop manually, but Anthropic’s skill-creator can help a lot. It semi-automates test case generation, execution, and review. It uses a train/test split for evaluation and lets you review outputs in an HTML viewer.
6c. Optimizing the Description
As you test, you may find the Skill works well when triggered but doesn’t trigger often enough. The skill-creator has a built-in optimization loop for this: it splits test cases 60/40 into train/test, measures trigger rate, generates improved descriptions, and picks the best one by test score.
One thing I learned: Claude rarely triggers Skills for short, simple requests. So make sure your test set includes prompts with enough complexity.
7. Distribution
Once your Skill is ready, you need to get it to users. The best method depends on whether it is just for you, your team, or everyone.
Getting Your Skill to Users
For most people, two methods cover everything:
ZIP upload (claude.ai): ZIP the Skill folder and upload via Settings > Customize > Skills. One gotcha — the ZIP must contain the folder itself at the root, not just the contents.
.claude/skills/ directory (Claude Code): Place the Skill in your project repo under .claude/skills/. When teammates clone the repo, everyone gets the same Skill.
Beyond these, there are more options as your distribution needs grow: the Plugin Marketplace for open-source distribution, the Anthropic Official Marketplace for broader reach, Vercel’s npx skills add for cross-agent installs, and the Skills API for programmatic management. I won’t go into detail on each here — the docs cover them well.
Before sharing, check three things: the ZIP has the folder at root (not just contents), the frontmatter has both name and description within the character limits, and there are no hardcoded API keys.
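The folder-at-root requirement is the one I see people trip over most. A small sketch of packaging it correctly in Python — `package_skill` is my own helper name, not part of any Anthropic tooling:

```python
import pathlib
import zipfile

def package_skill(skill_dir: str, zip_path: str) -> None:
    """ZIP a Skill so the folder itself sits at the archive root.

    claude.ai expects entries like 'my-skill/SKILL.md', not 'SKILL.md'.
    """
    root = pathlib.Path(skill_dir).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # arcname relative to the PARENT keeps the top-level folder
                zf.write(path, arcname=path.relative_to(root.parent))
```

The equivalent shell one-liner is to zip from the parent directory (`zip -r my-skill.zip my-skill/`), never from inside the Skill folder.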
And one more thing — bump the version field when you update. Auto-update won’t kick in otherwise. Treat user feedback like “it didn’t trigger on this prompt” as new test cases. The iteration loop from Section 6 doesn’t stop at launch.
Conclusion
A Skill is a reusable prompt with structure. You package what you know about a domain into something others can install and run.
The flow: decide whether you need a Skill, MCP, or Subagent. Design from use cases and write a description that actually triggers. Pick the simplest pattern that works. Test with messy, realistic prompts. Ship it and keep iterating.
Skills are still new and there is plenty of room. If you keep doing the same analysis, the same review, the same formatting work over and over, that repetition is your Skill waiting to be built.
If you have questions or want to share what you built, find me on LinkedIn.

