Most developers use Claude Code interactively — you type a request, Claude responds, you review. That’s powerful. But there’s a second mode that most developers haven’t unlocked: agents that run autonomously, make decisions, and complete tasks without you staying at your keyboard.
This guide covers how Claude Code agents differ from skills, how autonomous agents are configured, and practical patterns for work that benefits from multi-agent coordination.
## What Makes Something an “Agent” vs a Skill?
The word “agent” gets used loosely in AI tooling. For Claude Code specifically, the distinction is concrete:
A **skill** is a prompt template invoked by a human. You type `/security-audit`, Claude Code runs the audit, you review the output. The human is in the loop throughout.

An **agent** is a configuration that defines how Claude Code should behave when operating autonomously — what tools it can use, what it should do when it encounters a decision point, and how far it should proceed before stopping for review.
The agent runs; you monitor. You’re not absent, but you’re not the pacemaker.
In Claude Code’s Agent SDK model:
- A **subagent** handles a specific task domain (security, testing, documentation)
- An **orchestrator** coordinates multiple subagents and synthesizes their outputs
- A **swarm** is a set of agents working in parallel on different aspects of the same problem
## Why Agents Matter for Development Workflows
Interactive Claude Code sessions have a ceiling. You can only context-switch so fast. If you need Claude to refactor 40 files, analyze 200 test failures, and update documentation — doing it synchronously means watching Claude Code work for hours.
Agents break the ceiling. You define the task, set the boundaries, and come back to results.
Common agent patterns in production:
- **Background audits** — security or code quality scans that run while you work on features and surface a prioritized list when they finish
- **Test generation sweeps** — agents that walk through untested code and generate test cases systematically, working through a queue rather than waiting for you to point at each file
- **Documentation syncs** — agents that monitor recent commits and update API docs, changelogs, or README files when code changes
- **DevOps responders** — agents that watch build logs, classify failures, attempt known fixes, and escalate to a human only when they can’t resolve a failure
## Configuring a Claude Code Agent
Agents are defined in `CLAUDE.md` or in dedicated skill files that specify autonomous operating parameters:

```markdown
---
name: test-generator-agent
description: Autonomously generate tests for untested code modules
mode: agent
---

## Agent Scope

Analyze the `src/` directory and identify functions and modules with insufficient test coverage.

Work through the queue systematically:

1. Use coverage reports (run `npm run coverage`) to identify untested code
2. Generate test files for the top 10 least-covered modules
3. Ensure tests pass before moving to the next module

## Decision Authority

You are authorized to:

- Create new test files in `__tests__/` directories
- Run `npm test` to verify tests pass
- Fix simple test failures (wrong assertions, missing mocks)

You must stop and ask a human before:

- Modifying any source file (only modify test files)
- Deleting existing tests
- Creating more than 20 new test files in one run

## Progress Reporting

After every 3 modules completed, output a status summary:

- Modules completed
- Test files created
- Coverage improvement estimate
- Any modules skipped (with reason)

## Completion

When the queue is exhausted or the 10-module limit is reached:

1. Run `npm test` one final time
2. Output a summary of all work done
3. List any modules that need human review
```
The key sections: scope (what to work on), decision authority (autonomous vs. escalate), and reporting (how often to update you).
## Multi-Agent Patterns
The real leverage comes from coordinating multiple agents. Three patterns cover most use cases:
### Pattern 1: Parallel Specialists
Different agents work on different domains simultaneously. An orchestrator kicks them off and waits.
```
Orchestrator (release-prep)
├── Security agent → scan for vulnerabilities
├── Test agent → verify test coverage
├── Docs agent → update API documentation
└── [ waits for all three ] → synthesize report
```
Practical example: before a major release, you run `/release-prep v2.4.0` and an orchestrator spins up three specialists in parallel. Twenty minutes later you have a release checklist: security issues to fix, test coverage gaps, outdated docs pages.
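The fan-out/fan-in shape of this pattern can be sketched in plain TypeScript. Here `runAgent` is a hypothetical stand-in for however you dispatch a subagent (for example, a headless Claude Code invocation); the orchestration logic is what the sketch shows:

```typescript
// Report returned by one specialist subagent.
type Report = { agent: string; findings: string[] };

// Hypothetical stand-in for dispatching one subagent and collecting its report.
// In practice this might shell out to a headless Claude Code run; here it just
// echoes the task so the orchestration shape stays visible.
async function runAgent(name: string, task: string): Promise<Report> {
  return { agent: name, findings: [`${name}: ${task}`] };
}

// Orchestrator: fan out to three specialists in parallel, then synthesize.
async function releasePrep(version: string): Promise<string> {
  const reports = await Promise.all([
    runAgent("security", `scan ${version} for vulnerabilities`),
    runAgent("tests", `verify coverage for ${version}`),
    runAgent("docs", `check API docs against ${version}`),
  ]);
  // Synthesis step: flatten every specialist's findings into one checklist.
  return reports.flatMap((r) => r.findings).join("\n");
}
```

`releasePrep("v2.4.0")` resolves once the slowest specialist finishes, mirroring the "waits for all three" step in the diagram.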
### Pattern 2: Sequential Pipeline
Output of one agent becomes input to the next.
```
Spec agent → writes feature specification
        ↓
Implementation agent → writes the code to spec
        ↓
Test agent → writes tests against the implementation
        ↓
Review agent → checks implementation against spec
```
This mimics a mini development team. You write the user story; the pipeline handles the rest.
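The same pipeline can be sketched as a chain of async stages. The four stage functions below are hypothetical stubs standing in for real subagent calls; the threading of one stage's output into the next is the point:

```typescript
// Each stage is a hypothetical subagent call: it takes the previous
// stage's output as input and returns its own artifact as a string.
type Stage = (input: string) => Promise<string>;

const specAgent: Stage = async (story) => `SPEC for: ${story}`;
const implAgent: Stage = async (spec) => `CODE implementing [${spec}]`;
const testAgent: Stage = async (code) => `TESTS covering [${code}]`;
const reviewAgent: Stage = async (tests) => `REVIEW of [${tests}]`;

// Run the stages in order, threading each result into the next stage.
async function pipeline(story: string, stages: Stage[]): Promise<string> {
  let current = story;
  for (const stage of stages) {
    current = await stage(current);
  }
  return current;
}
```

Calling `pipeline("user can reset password", [specAgent, implAgent, testAgent, reviewAgent])` resolves to the review agent's output, which transitively wraps every earlier artifact.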
### Pattern 3: Supervisor with Workers
One supervisor agent breaks a large task into subtasks and dispatches them to workers.
```
Supervisor: "Refactor the auth module"
├── Worker A: Extract JWT handling to separate file
├── Worker B: Update all import paths
├── Worker C: Update unit tests
└── Supervisor: verify all workers succeeded, clean up
```
Workers run in parallel; the supervisor synthesizes. This pattern is especially powerful for large-scale refactors where tasks are independent but need coordination.
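A minimal sketch of the supervisor shape, with `decompose` and `runWorker` as hypothetical stand-ins for the real planning and dispatch steps:

```typescript
// Types for the hypothetical supervisor/worker protocol.
type Subtask = { id: string; description: string };
type WorkerResult = { id: string; ok: boolean };

// Supervisor step 1: break the large task into independent subtasks.
// A real supervisor agent would plan this; here the plan is hard-coded.
function decompose(task: string): Subtask[] {
  return [
    { id: "A", description: `${task}: extract JWT handling to its own file` },
    { id: "B", description: `${task}: update all import paths` },
    { id: "C", description: `${task}: update unit tests` },
  ];
}

// Hypothetical worker dispatch; in practice each worker is its own agent run.
async function runWorker(subtask: Subtask): Promise<WorkerResult> {
  return { id: subtask.id, ok: subtask.description.length > 0 };
}

// Supervisor step 2: dispatch all workers in parallel, then verify success.
async function supervise(task: string): Promise<boolean> {
  const results = await Promise.all(decompose(task).map(runWorker));
  return results.every((r) => r.ok);
}
```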
## Real Example: Automated Security Response

Here’s a concrete agent workflow that development teams run in production:

**The setup:** A security-responder agent that monitors CI build output, classifies failures, and takes automated action on known vulnerability classes.
**What it does autonomously:**
- Reads the failing security scan output
- Classifies the vulnerability type and severity
- For known patterns (outdated dependencies, specific OWASP categories), applies the fix automatically
- Commits with a descriptive message and a link to the CVE
- Opens a draft PR titled “Auto-patch: [vulnerability type]”
**What it escalates:**
- Vulnerabilities requiring logic changes in business code
- CVEs with CVSS score > 8.0 (too risky to auto-patch)
- Any package update that might break other dependencies
The developer reviews the PR queue in the morning rather than spending time doing routine patching interactively.
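A responder like this can be expressed in the same skill-file format as the test-generator example above. Every name and threshold below is illustrative, not a shipped configuration:

```markdown
---
name: security-responder
description: Classify failing security scans and auto-patch known vulnerability classes
mode: agent
---

## Decision Authority

You are authorized to:

- Bump dependency versions for known-vulnerable packages
- Commit with a descriptive message linking the CVE
- Open a draft PR titled "Auto-patch: [vulnerability type]"

You must stop and escalate when:

- The fix requires changes to business logic
- The CVE has a CVSS score above 8.0
- The package update could break other dependencies
```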
## Configuring Agent Permissions

Agents need explicit permissions to do useful work. In `~/.claude/settings.json` (user-level) or `.claude/settings.json` (project-level):
```json
{
  "permissions": {
    "allow": [
      "Bash(npm test:*)",
      "Bash(npm run coverage:*)",
      "Write(*.test.ts)",
      "Write(*.spec.ts)",
      "Edit(__tests__/*)"
    ],
    "deny": [
      "Bash(git push:*)",
      "Write(src/*)"
    ]
  }
}
```
The principle: give the agent write access where it should be working, deny access to things that should require human review (pushing code, modifying production source).
Use hooks to add guardrails at key decision points:
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{
          "type": "command",
          "command": "echo \"Agent running: $CLAUDE_TOOL_INPUT\" >> ~/.claude/agent-audit.log"
        }]
      }
    ]
  }
}
```
That gives you an audit log of every command an agent runs. Simple, effective.
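Once that log accumulates, a few lines of Node can summarize it. This sketch assumes each hook invocation wrote one `Agent running: …` line, as in the configuration above, and counts commands by their first token:

```typescript
import * as fs from "node:fs";

// Count how many times each command (first token after the prefix)
// appears in the agent audit log written by the PreToolUse hook.
function summarizeAuditLog(path: string): Map<string, number> {
  const counts = new Map<string, number>();
  const text = fs.existsSync(path) ? fs.readFileSync(path, "utf8") : "";
  for (const line of text.split("\n")) {
    const match = line.match(/^Agent running: (\S+)/);
    if (!match) continue; // skip blank or non-matching lines
    counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
  return counts;
}
```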
## GraphRAG: Agents with Memory
Standard Claude Code agents have context window memory — they remember the current session but not previous runs. For agents that need to accumulate knowledge across sessions (e.g., an agent tracking technical debt over weeks), you need persistent memory.
GraphRAG (Graph-based Retrieval-Augmented Generation) gives agents a knowledge graph that persists across sessions. An agent can write observations to the graph during a run and retrieve them in future runs:
```
Run 1: Agent discovers auth module has 3 security issues → writes to graph
Run 2: Agent picks up where it left off, knows auth issues are already documented
Run 3: Agent detects auth issues were fixed, updates graph
```
Without persistent memory, every agent run starts cold. With GraphRAG, agents compound knowledge over time.
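The persistence contract can be illustrated without a full graph database. The sketch below is a deliberately minimal stand-in (a JSON file rather than a real knowledge graph), but it shows the write-during-run, read-next-run loop that GraphRAG provides:

```typescript
import * as fs from "node:fs";

// One fact the agent learned about a subject during a given run.
type Observation = { subject: string; fact: string; run: number };

// Minimal persistent memory: a JSON file standing in for a knowledge graph.
class AgentMemory {
  constructor(private path: string) {}

  private load(): Observation[] {
    return fs.existsSync(this.path)
      ? JSON.parse(fs.readFileSync(this.path, "utf8"))
      : [];
  }

  // Called during a run to record what the agent learned.
  record(obs: Observation): void {
    const all = this.load();
    all.push(obs);
    fs.writeFileSync(this.path, JSON.stringify(all));
  }

  // Called at the start of the next run to warm-start the agent.
  recall(subject: string): Observation[] {
    return this.load().filter((o) => o.subject === subject);
  }
}
```

Run 2 of an agent would call `recall("auth")` before doing any work, instead of rediscovering the same issues cold.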
## Getting Started: Your First Autonomous Agent
If you’re new to agent workflows, start small. Pick one task you do repeatedly that’s mechanical and time-consuming.
Good first candidates:
- **Dependency update reviewer** — an agent that runs weekly, checks for outdated packages, and creates a summary with breaking-change notes
- **Test failure classifier** — an agent that reads CI failures and categorizes them (environment issues, logic bugs, flaky tests)
- **Doc drift detector** — an agent that compares code comments/types against actual behavior and flags inconsistencies
For each: define the scope clearly, set conservative decision authority (lots of “escalate to human” cases early on), and expand autonomy once you trust the behavior.
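As a concrete starting point, the dependency update reviewer might be defined like this, in the same skill-file format as the earlier example; the name and commands are illustrative assumptions:

```markdown
---
name: dependency-update-reviewer
description: Check for outdated packages and summarize breaking changes
mode: agent
---

## Agent Scope

Run `npm outdated` and review each package with a newer version available.

## Decision Authority

You are authorized to:

- Read package changelogs and release notes
- Produce a summary with breaking-change notes per package

You must stop and ask a human before:

- Updating any dependency
- Modifying `package.json` or lockfiles
```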
## The Agent Library in Claude Skills 360
Building agents from scratch requires careful design of their scope, decision boundaries, and coordination patterns. The Claude Skills 360 full bundle includes 45 pre-built autonomous agents spanning categories like:
- **DevOps agents:** CI/CD monitors, deployment validators, infrastructure drift detectors
- **Security agents:** Vulnerability scanners, dependency auditors, OWASP checkers
- **Testing agents:** Coverage generators, regression detectors, test suite maintainers
- **Documentation agents:** API doc syncs, changelog generators, onboarding guides
- **Code quality agents:** Refactor suggesters, dead code finders, complexity analyzers
Plus 12 multi-agent swarms that coordinate groups of these agents for bigger operations. The swarms include patterns like the “release readiness swarm” (security + testing + docs in parallel) and the “codebase health swarm” (quality + coverage + documentation all at once).
The free starter kit includes 360 skills and some single-agent workflows to get you started.
## Further Reading
- How to Create Your Own Claude Code Skills — foundation for building custom agents
- Claude Code Hooks — automating behavior before/after agent actions
- CLAUDE.md Guide — giving agents project context that makes their decisions better
- Claude Code Slash Commands — the interactive side of the same system