
Claude Code for Performance Optimization: Profiling, Bottlenecks, and Speed

Published: May 22, 2026
Read time: 9 min read
By: Claude Skills 360

Performance optimization follows a pattern: measure, identify the bottleneck, fix the bottleneck, measure again. The mistake most developers make is guessing where the problem is and optimizing the wrong thing. Claude Code is effective for performance work because it reads profiler output and code together — it doesn’t guess where the bottleneck is, it reads the data.

This guide covers performance optimization with Claude Code: profiling Node.js and Python apps, memory leak detection, database query analysis, and frontend performance.

The Optimization Workflow

Before showing Claude the code to optimize, collect the profiler data:

Here's the flamegraph output from profiling this Node.js service 
under load for 60 seconds:
[paste flamegraph data or screenshot description]
The hottest path is: requestHandler → getUserOrders → formatOrders
Total time: getUserOrders (45%), formatOrders (38%), other (17%)
The service processes 200 req/s and we need 500 req/s.

With this context, Claude identifies where to focus:

Based on the flamegraph:
1. getUserOrders (45%) — likely N+1 queries or missing index
2. formatOrders (38%) — probably expensive transformation, try:
   - Move computation to DB (computed column or view)
   - Pre-format at write time instead of read time
   - Cache the formatted output

Let me see getUserOrders and formatOrders:

Lead with data, not code. Then show the code.

Node.js Performance

CPU Profiling

My API is CPU-bound and not IO-bound. Here's the 
V8 CPU profile (collected with --prof):
[paste profile output or describe what's hot]

To generate the profile:

# Run with CPU profiler
node --prof app.js

# Under load for 30s, then convert the profile
node --prof-process isolate-*.log > cpu-profile.txt

# Or use clinic.js (easier):
npx clinic flame -- node app.js

Claude reads the sorted function list (hottest functions by % of samples) and identifies optimization targets. Common findings:

  • JSON.stringify called on large objects repeatedly → cache or reduce payload size
  • Regex operations compiled repeatedly → hoist to module level
  • String concatenation in hot loops → use Buffer.allocUnsafe + write
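The regex item applies across runtimes: compile once, reuse everywhere. A sketch of the same fix in Python (the word-counting function and pattern here are illustrative, not from the original code):

```python
import re

# Hoisted: compiled once at import time, reused on every call
WORD_RE = re.compile(r"[A-Za-z']+")

def count_words(lines):
    """Count words across lines using the precompiled pattern."""
    return sum(len(WORD_RE.findall(line)) for line in lines)

# Anti-pattern for comparison: calling re.compile(...) inside the loop
# redoes the work (or at best hits re's small internal cache) per iteration.
```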

Memory Leak Detection

My Node.js service grows from 100MB to 800MB over 2 hours then crashes.
Here's the heap snapshot diff between startup and 30 minutes:
[paste or describe the diff]

To collect heap snapshots:

import { writeHeapSnapshot } from 'v8';

// Expose as a debug endpoint
app.get('/debug/heapdump', (req, res) => {
  if (req.ip !== '127.0.0.1') return res.status(403).end();
  const file = writeHeapSnapshot();
  res.json({ file });
});

Claude reads the heap diff and looks for: objects growing unboundedly (leak candidate), circular references preventing GC, large closure captures, undrained EventEmitter listeners. It identifies the most likely leak with specific line numbers.
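The snapshot-diff technique is language-agnostic. In Python, tracemalloc produces an equivalent heap diff; a minimal sketch (the bytearray list simulates a leaking module-level collection):

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulate a leak: a collection that only ever grows (~1MB here)
leaked = [bytearray(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()

# Allocation sites ranked by growth since the first snapshot;
# the leaking line should dominate the top of the list
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```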

Common Node.js Performance Patterns

This function processes a CSV with 100,000 rows.
It takes 45 seconds. Target is 5 seconds.
[paste function]

Claude analyzes and generates the optimized version using streaming:

// Before: Load entire file into memory
const rows = (await fs.readFile(path, 'utf8')).split('\n').map(parse);

// After: Stream processing with backpressure
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

async function processCsvStream(path: string, onRow: (row: Row) => Promise<void>) {
  const stream = createReadStream(path);
  const rl = createInterface({ input: stream, crlfDelay: Infinity });
  
  const BATCH_SIZE = 1000;
  let batch: Row[] = [];
  
  for await (const line of rl) {
    batch.push(parseRow(line));
    
    if (batch.length >= BATCH_SIZE) {
      await processBatch(batch); // Process in chunks to control memory
      batch = [];
    }
  }
  
  if (batch.length > 0) await processBatch(batch);
}

Streaming + batching: processes 100K rows in ~2 seconds vs 45 seconds, using constant memory instead of O(n).
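The same streaming-plus-batching shape applies in Python, reading the file line by line instead of loading it whole. A sketch (process_batch stands in for your own per-batch logic):

```python
import csv

BATCH_SIZE = 1000

def process_csv_stream(path, process_batch):
    """Read CSV rows lazily and hand them off in fixed-size batches."""
    with open(path, newline="") as f:
        reader = csv.reader(f)   # iterates lazily; the file is never fully in memory
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= BATCH_SIZE:
                process_batch(batch)   # memory stays bounded by BATCH_SIZE
                batch = []
        if batch:                      # flush the final partial batch
            process_batch(batch)
```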

Python Performance

cProfile Analysis

This Python function takes 8 seconds on a 50K-record dataset.
Here's cProfile output:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  50000    6.234    0.000    6.234    0.000 utils.py:45(format_record)
      1    0.123    0.123    7.891    7.891 processor.py:23(process_all)
format_record accounts for 6.2s of the 7.9s total.
Here's the function: [paste format_record]

Claude identifies that format_record being called 50K times with 6s total means 0.12ms per call — typically string operations, regex, or dict/list construction that can be vectorized with pandas or NumPy instead of a Python for loop.
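If you don't yet have profiler output to paste, cProfile collects it in a few lines. A sketch (profile_top is a hypothetical helper built on the stdlib, not part of cProfile itself):

```python
import cProfile
import io
import pstats

def profile_top(func, *args, n=10):
    """Run func under cProfile and return the n hottest entries by total time."""
    pr = cProfile.Profile()
    pr.enable()
    func(*args)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("tottime").print_stats(n)
    return buf.getvalue()

# Or from the shell, with no code changes:
#   python -m cProfile -s tottime app.py
```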

Vectorization with NumPy/Pandas

This loop processes 1M rows taking 120 seconds.
[paste for loop that does numeric operations]
Vectorize it.
# Before: Python loop (120s for 1M rows)
results = []
for _, r in df.iterrows():
    result = r['value'] * 1.1 + r['adjustment'] if r['flag'] else r['value']
    results.append(result)
df['result'] = results

# After: NumPy vectorized (0.3s for 1M rows — 400x faster)
df['result'] = np.where(
    df['flag'],
    df['value'] * 1.1 + df['adjustment'],
    df['value']
)

np.where vectorizes conditional operations across the entire array in C, bypassing Python’s interpreter overhead per element.
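A cheap way to de-risk such a rewrite: run both versions on a small sample and assert they agree before swapping the loop out. A sketch, assuming the value/adjustment/flag columns above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [10.0, 20.0, 30.0],
    "adjustment": [1.0, 2.0, 3.0],
    "flag": [True, False, True],
})

# Original loop semantics, kept as the reference implementation
expected = [
    r["value"] * 1.1 + r["adjustment"] if r["flag"] else r["value"]
    for _, r in df.iterrows()
]

vectorized = np.where(df["flag"], df["value"] * 1.1 + df["adjustment"], df["value"])

assert np.allclose(vectorized, expected)  # same answers before the swap
```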

Async Concurrency for IO-Bound Python

My Python service makes 200 HTTP requests sequentially.
It takes 40 seconds. Speed it up.
# Before: Sequential (40s for 200 requests)
import requests

for url in urls:
    response = requests.get(url)
    results.append(response.json())

# After: Concurrent with rate limiting (2s for 200 requests)
import asyncio
import aiohttp

async def fetch_all(urls: list[str]) -> list[dict]:
    async with aiohttp.ClientSession() as session:
        # Semaphore: max 20 concurrent requests
        sem = asyncio.Semaphore(20)
        
        async def fetch_one(url: str) -> dict:
            async with sem:
                async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
                    return await resp.json()
        
        return await asyncio.gather(*[fetch_one(url) for url in urls])

20 concurrent requests instead of 1: if each request takes 200ms, 200 requests takes 40s sequential vs 2s concurrent.

Database Query Performance

This query takes 8 seconds in production.
We have 5M rows in the orders table.
EXPLAIN ANALYZE output:
Seq Scan on orders (cost=0.00..94821.00 rows=4893 width=142)
  Filter: ((created_at > '2026-01-01') AND (user_id = 12345))

Claude reads the EXPLAIN ANALYZE and identifies: sequential scan on 5M rows, filtering afterwards. The fix:

-- Add composite index for the query pattern
CREATE INDEX CONCURRENTLY idx_orders_user_created 
ON orders (user_id, created_at DESC);

-- Expected plan after index:
-- Index Scan using idx_orders_user_created on orders
--   Index Cond: ((user_id = 12345) AND (created_at > '2026-01-01'))

CONCURRENTLY builds the index without blocking writes to the table. If most queries also exclude cancelled orders, a partial index (adding WHERE status != 'cancelled') is smaller and faster, but the planner only uses a partial index when the query's own WHERE clause implies the index predicate, and this query does not filter on status. See the database guide for comprehensive query optimization patterns.

Frontend Performance

My React app has a 4-second Largest Contentful Paint.
Lighthouse report shows: render-blocking resources, 
large JS bundle, no lazy loading on images.

Claude reads the Lighthouse report and provides specific fixes:

// 1. Code splitting — lazy load routes
const DashboardPage = lazy(() => import('./pages/Dashboard'));
const ProfilePage = lazy(() => import('./pages/Profile'));

// 2. Lazy load heavy components
const HeavyChart = lazy(() => import('./components/HeavyChart'));

// 3. Preload critical routes
<link rel="preload" href="/fonts/main.woff2" as="font" crossOrigin="anonymous" />

// 4. Image lazy loading with proper sizing
<img 
  src="/hero.webp" 
  width={1200} 
  height={600}
  loading="eager"  // Hero image — load eagerly
  fetchPriority="high"
  alt="Hero"
/>
<img 
  src="/product.webp"
  width={400}
  height={300}
  loading="lazy"   // Below-fold — lazy load
  alt="Product"
/>

Bundle Size Analysis

My webpack bundle is 2.4MB. Here's the bundle analyzer output.
[paste or describe top modules by size]

Common findings and fixes:

  • moment.js (700KB) → replace with date-fns (tree-shakeable, 20KB)
  • Large icon libraries → import only used icons (import { XIcon } from 'lucide-react', not import * as Icons from 'lucide-react')
  • Dev-only packages in production bundle → check devDependencies vs dependencies
  • Duplicate packages (two versions of react-dom) → enforce single version

Caching Strategies

Which parts of this API should be cached and at which layer?
Show me the request path: client → CDN → API → DB

Claude analyzes your endpoint types and recommends:

  • CDN caching: public, read-heavy endpoints (product catalogs, blog posts) — Cache-Control: public, max-age=3600
  • Application cache (Redis): expensive computation results, session data, rate limit counters
  • Database query cache: queries with stable results (aggregations, counts)
  • No cache: user-specific data, mutations, real-time data

// Redis cache with TTL and cache-aside pattern
async function getCachedUserProfile(userId: string): Promise<UserProfile> {
  const key = `user:profile:${userId}`;
  
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
  
  const profile = await db.user.findUnique({ where: { id: userId }, include: { ... } });
  
  // Cache for 5 minutes — acceptable staleness for profile data
  await redis.setex(key, 300, JSON.stringify(profile));
  
  return profile;
}
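The cache-aside shape is store-agnostic. For a single process, a dict with expiry timestamps gives the same behavior without Redis; a sketch (TtlCache is a hypothetical helper, with an injectable clock so expiry is testable):

```python
import time

class TtlCache:
    """Minimal cache-aside helper: get_or_load with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]                    # fresh hit
        value = loader()                       # miss or expired: reload
        self._store[key] = (self.clock() + self.ttl, value)
        return value
```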

Performance Work with Claude Code

The best performance workflow: collect real data (profiler output, slow query logs, Lighthouse scores), show it to Claude with the relevant code, and get targeted optimization rather than generic advice.

For the database layer see the database guide. For observability to collect the performance data in the first place, see the observability guide. The Claude Skills 360 bundle includes performance optimization skill sets for Node.js, Python, and database query analysis. Start with the free tier for the profiling and analysis patterns.

Put these ideas into practice

Claude Skills 360 gives you production-ready skills for everything in this article — and 2,350+ more. Start free or go all-in.
