Agent Skills Best Practices

What it is

An agent skill is a self-contained, named behaviour module that an autonomous agent can discover, trigger, and execute. Skills define what to do (instructions), when to do it (triggers), what tools are available, and what permissions are required. Well-authored skills are the difference between a reliable agent workflow and a chaotic, unpredictable one.

This document covers skill authoring for the two main runtimes used in this stack:

  • Claude Code skills (.claude/skills/ directory, Markdown + YAML frontmatter)
  • OpenClaw skills (skills/ directory, YAML + optional Markdown instructions)

The principles apply equally to both.
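
A minimal on-disk layout for the two runtimes, matching the locations above. The per-skill SKILL.md sub-directory is the usual Claude Code convention; the OpenClaw file names are illustrative:

.claude/skills/
  commit/
    SKILL.md                 # Claude Code: Markdown body with YAML frontmatter
skills/
  paperless-intake.yaml      # OpenClaw: YAML definition
  paperless-intake.md        # OpenClaw: optional Markdown instructions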

What problem it solves

Poorly authored skills:

  • Trigger on the wrong input (false positives waste tokens and cause incorrect actions)
  • Fail to trigger when needed (false negatives lose automation value)
  • Bloat the context window with unnecessary preamble
  • Produce inconsistent outputs because instructions are ambiguous
  • Silently fail without surfacing the right error to the operator

Best-practice structure prevents all of these.

Where it fits in the stack

Pattern. Governs prompt/skill engineering for all autonomous agent workflows. Applies at design time (when authoring skills) and at runtime (when the agent selects and executes them).

Skill anatomy

Claude Code skill (Markdown)

---
name: commit
description: Create a git commit with a well-structured message. Trigger when the user
             says "commit", "save changes", "/commit", or asks to finalise staged changes.
             Do NOT trigger for general git questions or status checks.
---

# Commit Skill

## When to use
Invoke this skill when the user asks to commit staged changes. Do not invoke if
the user is asking about git status, history, or other operations.

## Steps
1. Run `git status` to confirm there are staged changes
2. Run `git diff --staged` to understand what is changing
3. Draft a commit message: imperative mood, 72-char subject, optional body
4. Run the commit with the drafted message
5. Confirm success with `git log -1 --oneline`

## Output
Respond with the one-line commit SHA and message. No preamble.

OpenClaw skill (YAML)

name: paperless-intake
version: 1.0.0
description: Extract key fields from a forwarded document and file it in Paperless-ngx
trigger:
  keywords: ["file this", "intake document", "add to paperless"]
  file_types: ["application/pdf", "image/jpeg", "image/png"]
permissions:
  channels: ["telegram:personal", "whatsapp:personal"]
  users: ["owner"]
tools:
  - paperless_api
  - ocr_extract
instructions: |
  You are a document intake assistant. When the user forwards a document:
  1. Extract: document type, date, correspondent, and any monetary amounts
  2. POST to Paperless-ngx with the correct tag based on document type
  3. Reply with: "Filed as [Type] from [Correspondent], [Date]. ID: [paperless_id]"
  Never ask follow-up questions. If fields are missing, use "unknown" and file anyway.

Trigger design

The trigger is the most critical part of a skill. A bad trigger causes the skill to run when it should not (false positive) or to stay silent when it should run (false negative).

Good trigger patterns

| Approach | When to use |
| --- | --- |
| Explicit keywords | User clearly uses a phrase that maps to one skill ("commit", "file this") |
| Slash commands | Deterministic invocation; user types /commit, /deploy |
| Schedule (cron) | Time-triggered tasks; no ambiguity |
| File type + keyword | Document intake; only fires when a PDF is forwarded |
| Webhook event type | GitHub PR opened, Paperless document added |
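
Keyword, file-type, and schedule triggers appear in the skill examples elsewhere in this document. A minimal sketch of the webhook approach follows; the webhook trigger fields are an assumption about OpenClaw's syntax, not a documented schema, so verify the field names against your runtime:

name: pr-opened-review
trigger:
  webhook:                    # assumed field names; verify against your OpenClaw version
    source: github
    event: pull_request.opened
permissions:
  users: ["owner"]
tools:
  - github_api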

Trigger anti-patterns

# BAD — too broad; fires on any question
trigger:
  keywords: ["help", "what", "how"]

# BAD — description is the full instructions; agent reads too much to decide
description: |
  This skill handles all document management tasks including filing, searching,
  retrieving, updating, and deleting documents from Paperless-ngx ...

# GOOD — tight, discriminative description
description: >
  File a forwarded document into Paperless-ngx. Trigger ONLY when the user
  forwards a file and says "file this" or "intake". Do not trigger for searches.

Rule: The description field is the routing signal. It must be specific enough that another skill with similar semantics will NOT also fire.
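
A pair of sibling skills from the taxonomy below shows what this looks like in practice; each description excludes the other's territory so the router has a clean boundary (wording is illustrative):

# paperless-intake
description: >
  File a forwarded document into Paperless-ngx. Trigger ONLY when a file is
  forwarded with "file this" or "intake". Do NOT trigger for search requests.

# paperless-search
description: >
  Search existing documents in Paperless-ngx. Trigger on "find", "search", or
  "look up". Do NOT trigger when a new file is forwarded for filing.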

Instructions design

Lean instructions

Instructions should be step-by-step, deterministic, and minimal. Every extra word in the instruction costs context tokens at every invocation.

# BAD — verbose, repeats context the model already has
## Instructions
You are a helpful assistant that will carefully and thoughtfully help the user
commit their changes to git. Git is a version control system. When you commit...
[200 more words of context]

# GOOD — direct, step-by-step, terminates with expected output
## Steps
1. `git status` — confirm staged changes exist; abort if none
2. `git diff --staged` — identify what changed
3. Draft: imperative subject (<72 chars) + optional body
4. `git commit -m "$(cat <<'EOF'\n{message}\nEOF\n)"`
5. Output: "{sha} {subject}"

Progressive disclosure

Put the core steps in the skill itself. Reference external docs only for edge cases:

## Steps
1. [core happy path — 3–6 steps]
2. [one sentence for each error case]

## Edge cases
See: ../../playbooks/dev-workflow-ai-assisted.md for conflict resolution

Deterministic scripts over open-ended instructions

For predictable tasks, give the exact command to run rather than describing the goal:

# BAD — model decides how to do it
instructions: "Push the changes to the remote repository"

# GOOD — exact command with escape hatches
instructions: |
  Run: git push -u origin HEAD
  If the push fails with "rejected":
    1. Run git fetch && git rebase origin/main
    2. Retry the push once
    3. If still failing, report the error and stop — do not force push

Permission model

Always declare the minimum permissions a skill needs:

permissions:
  channels: ["telegram:personal"]    # not ["*"]
  users: ["owner"]                   # not ["*"]
  tools:
    - paperless_api                  # only what is needed
    # NOT: shell_exec, file_system, http_any

For Claude Code skills, document any dangerous operations with a warning:

!!! danger "Destructive operation"
    This skill can delete files. Always confirm with the user before running
    if the `--confirm` flag is omitted.

Skill taxonomy for this stack

Organise skills into layers to avoid overlap and to make discovery predictable:

| Layer | Examples | Notes |
| --- | --- | --- |
| Intake | paperless-intake, email-parse | Receive and classify inputs |
| Retrieve | vikunja-tasks, paperless-search | Read from services |
| Transform | summarise, extract-fields, translate | Process content |
| Write | create-task, file-document, send-message | Mutate state |
| Report | daily-digest, weekly-summary | Aggregate and present |
| Dev | commit, pr-review, deploy | Software engineering actions |
| Maintenance | knowledge-base-update, cleanup | Repo/system maintenance |
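
One way to carry these layers onto disk is a sub-folder per layer under skills/. Whether the runtime discovers skills in nested folders is an assumption to verify; a name prefix such as intake-paperless achieves the same grouping if it does not:

skills/
  intake/
    paperless-intake.yaml
    email-parse.yaml
  retrieve/
    vikunja-tasks.yaml
    paperless-search.yaml
  report/
    daily-digest.yaml
  dev/
    commit.yaml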

Quality assurance checklist

Use this checklist during design, execution, and performance reviews to ensure high-quality agent skills.

Design review

  • [ ] Trigger specificity: Trigger description is specific enough to avoid false positives.
  • [ ] Trigger coverage: Trigger description is broad enough to catch all legitimate invocations.
  • [ ] Actionable steps: Instructions are step-by-step, not open-ended goals.
  • [ ] Principle of least privilege: Permissions are minimised to what is strictly needed.
  • [ ] Error handling: Edge cases (missing inputs, API errors, ambiguous requests) are handled explicitly.

Execution review

  • [ ] Positive test: Invoke with an exact-match trigger phrase → skill fires correctly.
  • [ ] Negative test: Invoke with a near-miss phrase that should NOT trigger → skill does not fire.
  • [ ] Output validation: Run the skill on a test input → output matches expected format.
  • [ ] Resilience test: Introduce a simulated API error → skill handles it gracefully and reports the error.
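
The positive and negative trigger tests above can be partially automated for keyword-triggered OpenClaw skills. The sketch below assumes routing reduces to literal keyword containment, which is a simplification; model-driven routing still needs manual spot checks:

import yaml  # PyYAML

def keyword_trigger_fires(skill_path: str, message: str) -> bool:
    """Return True if any declared trigger keyword appears in the message."""
    with open(skill_path) as f:
        skill = yaml.safe_load(f)
    keywords = skill.get("trigger", {}).get("keywords", [])
    return any(kw.lower() in message.lower() for kw in keywords)

# Positive test: an exact trigger phrase should fire
assert keyword_trigger_fires("skills/paperless-intake.yaml", "please file this invoice")

# Negative test: a near-miss phrase should not fire
assert not keyword_trigger_fires("skills/paperless-intake.yaml", "where did I file my tax return?")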

Performance review

  • [ ] Token efficiency: Instruction length is under 300 tokens (measure with tiktoken).
  • [ ] Context minimization: No redundant context that the model already knows from its system prompt.
  • [ ] Consistency: Output format is consistent across 3+ separate test runs.
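
For the token-efficiency check, a minimal measurement with tiktoken. The encoding name is an assumption (Claude and Qwen use their own tokenisers), so treat the count as an approximation of the budget rather than an exact figure:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding; pick one and use it consistently

with open("skills/paperless-intake.yaml") as f:
    instructions = f.read()

token_count = len(enc.encode(instructions))
print(f"{token_count} tokens")
assert token_count < 300, "instructions exceed the 300-token budget"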

Common skill patterns

Document intake skill

name: document-intake
trigger:
  keywords: ["file this", "add to paperless", "intake"]
  file_types: ["application/pdf"]
instructions: |
  Extract: type, date (ISO-8601), correspondent, amount (or null).
  POST to /api/documents/ with tag derived from type.
  Reply: "Filed [type] from [correspondent] on [date]. ID: {id}"
  On API error: reply with the HTTP status and stop.

Scheduled digest skill

name: morning-brief
trigger:
  schedule: "0 7 * * 1-5"  # weekdays 07:00
instructions: |
  1. GET /api/tasks?status=open from Vikunja → list today's tasks
  2. GET today's weather from wttr.in/London?format=3
  3. Respond with: "**Today**: {weather}\n**Tasks ({n})**: {bullet_list}"
  Under 100 words. No preamble.

Code review skill (Claude Code)

---
name: review-pr
description: Review a pull request for correctness, style, and potential regressions.
             Trigger when user says "review this PR", "review PR #N", or "/review-pr".
---
## Steps
1. `gh pr diff {number}` — fetch the diff
2. Identify: logic errors, missing tests, style violations, security issues
3. Output structured review: **Summary** (2 sentences) + **Issues** (bulleted, max 5)
4. For each issue: file:line, severity (critical/warning/info), brief explanation

Skill versioning and maintenance

# Include version and last-updated in all skills
name: daily-digest
version: 2.1.0
last_updated: "2026-03-21"
changelog:
  "2.1.0": "Added weather to morning brief"
  "2.0.0": "Migrated from Notion to Vikunja for task retrieval"
  "1.0.0": "Initial version"

Review skills when:

  • The tool they invoke changes its API
  • The model runtime changes (new models may handle instructions differently)
  • You observe consistent false positives or false negatives
  • The task requirements change
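
A quick way to spot skills that are missing maintenance metadata, assuming OpenClaw skills live as YAML files under skills/ (adjust the glob to your layout):

grep -L "last_updated" skills/*.yaml   # lists skill files with no last_updated field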

Comparing skill systems

| Runtime | Skill format | Trigger types | Tool binding |
| --- | --- | --- | --- |
| Claude Code | Markdown + YAML frontmatter | Keywords, slash commands | Any tool available in session |
| OpenClaw | YAML + Markdown instructions | Keywords, schedule, webhooks, file types | Declared tool list |
| Anthropic Agent SDK | Python + @skill decorator | Programmatic | Function calls |
| n8n | Visual node workflow | Webhooks, schedules, events | n8n node library |

Strengths

  • Clear trigger guidance prevents the most common failure mode (wrong skill fires)
  • Step-by-step instructions eliminate ambiguity in execution
  • Minimised permissions reduce blast radius of agent errors
  • Versioning enables safe iteration without breaking existing automation

Limitations

  • Strict conventions add overhead for very simple, one-off tasks
  • Skill behaviour depends on the underlying model; a skill tested on Claude 3.5 may perform differently on Qwen 2.5
  • Trigger keyword lists need maintenance as vocabulary drifts
  • Complex branching logic is better expressed as code than as skill instructions

When to use it

  • When building a growing library of reusable skills across one or more agent runtimes
  • When skill false-positives or context bloat are recurring problems
  • When multiple people (or agents) will be writing and sharing skills
  • When automation reliability is more important than development speed

When not to use it

  • For one-off tasks that will never be reused
  • When no autonomous skill routing is in use (pure API calls don't need skills)
  • When the task logic is inherently complex enough to require code, not instructions

Contribution Metadata

  • Last reviewed: 2026-03-21
  • Confidence: high