Prompt Caching Quick Reference Cheat Sheet

April 26, 2026
Anablock
AI Insights & Innovations

Anablock is a technology and AI systems company helping businesses automate workflows, connect tools, improve lead handling, and build smarter digital growth systems. The Anablock team writes about AI implementation, automation, CRM, lead generation, SEO/AEO, and practical ways businesses can use technology to operate better and grow.


🚀 Quick Start

Enable Caching (TypeScript)

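import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();  // Reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: 'Your system prompt...',
      cache_control: { type: 'ephemeral' }  // ← Add this
    }
  ],
  messages: [...]
});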

Enable Caching (Python)

import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your system prompt...",
            "cache_control": {"type": "ephemeral"}  # ← Add this
        }
    ],
    messages=[...]
)

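📊 Key Metrics

| Metric | Value |
|--------|-------|
| Cache Lifetime | 5 minutes by default, refreshed on each cache hit (a 1-hour TTL is available as an option) |
| Minimum Tokens | 1,024 tokens (higher on some models, e.g. 2,048 on Claude 3.5 Haiku) |
| Max Breakpoints | 4 per request |
| Speed Improvement | Up to 85% lower latency on long prompts |
| Cost Reduction | ~90% on cached tokens |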

💰 Pricing (Claude 3.5 Sonnet)

| Type | Cost per 1M tokens |
|------|--------------------|
| Regular Input | $3.00 |
| Cache Write | $3.75 (+25%) |
| Cache Read | $0.30 (-90%) |
| Output | $15.00 |
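
To see how these rates combine on a single request, the sketch below prices one usage object. The names `PRICES_PER_MTOK` and `estimate_cost_usd` are illustrative and not part of the Anthropic SDK, and usage is assumed to be a plain dict with the API's field names. The token counts are example values only.

```python
# Illustrative helper (not part of the Anthropic SDK): price one request's
# usage with the Claude 3.5 Sonnet rates from the table, in USD per 1M tokens.
PRICES_PER_MTOK = {
    "input_tokens": 3.00,
    "cache_creation_input_tokens": 3.75,
    "cache_read_input_tokens": 0.30,
    "output_tokens": 15.00,
}


def estimate_cost_usd(usage: dict) -> float:
    """Multiply each token count by its rate; missing or None fields count as 0."""
    total = 0.0
    for field, price in PRICES_PER_MTOK.items():
        total += (usage.get(field) or 0) * price / 1_000_000
    return total


# Example values: the same 1,772-token prefix is written once, then read.
first_request = {"input_tokens": 750, "cache_creation_input_tokens": 1772,
                 "cache_read_input_tokens": 0, "output_tokens": 350}
later_request = {"input_tokens": 750, "cache_creation_input_tokens": 0,
                 "cache_read_input_tokens": 1772, "output_tokens": 350}

print(f"cache write request: ${estimate_cost_usd(first_request):.6f}")
print(f"cache hit request:   ${estimate_cost_usd(later_request):.6f}")
```

Output tokens are billed the same either way, so the saving shows up only in the input portion of the bill.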

🎯 When to Use Caching

✅ Good Use Cases:

  • Same document, multiple questions
  • Consistent system prompts
  • Repeated tool definitions
  • Multi-turn conversations
  • Batch processing with shared context

❌ Poor Use Cases:

  • One-off requests
  • Constantly changing content
  • Very short prompts (<1,024 tokens)
  • Requests spaced further apart than the cache lifetime (5 minutes by default)
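
Whether reuse pays off is a matter of arithmetic on the write and read rates. The sketch below compares the input cost of sending the same prefix repeatedly, measured in units of one uncached send. The function name is hypothetical, the multipliers come from the pricing table (+25% for a write, -90% for a read), and it assumes every request after the first hits the cache.

```python
def relative_input_cost(num_requests: int, cached: bool,
                        write_multiplier: float = 1.25,
                        read_multiplier: float = 0.10) -> float:
    """Input cost of sending one prefix num_requests times, in units of a
    single uncached send. Assumes all requests after the first are cache hits."""
    if num_requests <= 0:
        return 0.0
    if not cached:
        return float(num_requests)
    # One cache write, then (num_requests - 1) cache reads.
    return write_multiplier + (num_requests - 1) * read_multiplier


for n in (1, 2, 10):
    uncached = relative_input_cost(n, cached=False)
    with_cache = relative_input_cost(n, cached=True)
    print(f"{n} request(s): uncached={uncached:.2f} cached={with_cache:.2f}")
```

Under these assumptions a one-off request costs more with caching than without, which is why it appears among the poor use cases, while the second request already brings the total below the uncached cost.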

🔧 Common Patterns

Pattern 1: Cache System Prompt + Tools

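// TypeScript
const tools = [...];  // Your tools

// Put the breakpoint on the last tool so every tool definition before it is cached too
const toolsWithCache = tools.map((tool, idx) =>
  idx === tools.length - 1
    ? { ...tool, cache_control: { type: 'ephemeral' } }
    : tool
);

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: systemPrompt,
      cache_control: { type: 'ephemeral' }
    }
  ],
  tools: toolsWithCache,
  messages: [...]
});

# Python
def add_cache_to_tools(tools):
    """Return a copy of tools with cache_control set on the last tool."""
    if not tools:
        return []
    tools_clone = tools.copy()          # Shallow copy: the caller's list is untouched
    last_tool = tools_clone[-1].copy()  # Copy the dict before modifying it
    last_tool["cache_control"] = {"type": "ephemeral"}
    tools_clone[-1] = last_tool
    return tools_clone

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=add_cache_to_tools(tools),
    messages=[...]
)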

Pattern 2: Cache Document in Messages

// TypeScript
const messages = [
  {
    role: 'user',
    content: [
      {
        type: 'text',
        text: `Document:\n\n${document}`,
        cache_control: { type: 'ephemeral' }  // Cached
      },
      {
        type: 'text',
        text: question  // Not cached - changes each request
      }
    ]
  }
];

# Python
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": f"Document:\n\n{document}",
                "cache_control": {"type": "ephemeral"}  # Cached
            },
            {
                "type": "text",
                "text": question  # Not cached
            }
        ]
    }
]

Pattern 3: Multi-Level Caching

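// TypeScript
// Cache hierarchy: Tools → System → Document → Question
// addCacheToLastTool is a helper that applies the Pattern 1 mapping
// (cache_control on the last tool only)
const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  tools: addCacheToLastTool(tools),  // Breakpoint 1
  system: [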
    {
      type: 'text',
      text: systemPrompt,
      cache_control: { type: 'ephemeral' }  // Breakpoint 2
    }
  ],
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: document,
          cache_control: { type: 'ephemeral' }  // Breakpoint 3
        },
        {
          type: 'text',
          text: question  // Not cached
        }
      ]
    }
  ]
});
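
With several breakpoints in one request it is easy to drift past the four-breakpoint limit listed under Key Metrics. The following sketch counts `cache_control` markers in a request before it is sent. It assumes the request is held as a plain dict in the same shape as the examples here; `count_cache_breakpoints` is a hypothetical helper name and the tool definitions are trimmed to their names for brevity.

```python
MAX_BREAKPOINTS = 4  # Per-request limit from the Key Metrics table


def count_cache_breakpoints(request: dict) -> int:
    """Count blocks carrying cache_control across tools, system, and messages."""
    blocks = list(request.get("tools", []))
    system = request.get("system", [])
    if isinstance(system, list):  # A plain string cannot carry cache_control
        blocks.extend(system)
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):  # Shorthand string content has no blocks
            blocks.extend(content)
    return sum(1 for b in blocks if isinstance(b, dict) and "cache_control" in b)


ephemeral = {"type": "ephemeral"}
request = {
    "tools": [{"name": "search"}, {"name": "lookup", "cache_control": ephemeral}],
    "system": [{"type": "text", "text": "system prompt", "cache_control": ephemeral}],
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "document", "cache_control": ephemeral},
            {"type": "text", "text": "question"},
        ],
    }],
}

used = count_cache_breakpoints(request)
print(f"{used} of {MAX_BREAKPOINTS} breakpoints used")
```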

📈 Reading Cache Metrics

Response Usage Object

{
  "usage": {
    "input_tokens": 750,
    "cache_creation_input_tokens": 1772,  // First request
    "cache_read_input_tokens": 1772,      // Subsequent requests
    "output_tokens": 350
  }
}

What Each Means

| Field | Meaning |
|-------|---------|
| input_tokens | Fresh tokens processed |
| cache_creation_input_tokens | Tokens written to cache (first request) |
| cache_read_input_tokens | Tokens read from cache (subsequent) |
| output_tokens | Response length |

Calculate Cache Hit Rate

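// Total input = uncached tokens + cache writes + cache reads
const cacheRead = usage.cache_read_input_tokens ?? 0;
const totalInput =
  usage.input_tokens + (usage.cache_creation_input_tokens ?? 0) + cacheRead;
const cacheHitRate = totalInput > 0 ? (cacheRead / totalInput) * 100 : 0;

console.log(`Cache hit rate: ${cacheHitRate.toFixed(1)}%`);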
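
A single response gives only a snapshot, so it can help to aggregate over a batch of requests. The Python sketch below does that under two assumptions: usage objects are available as plain dicts with the API's field names, and cache writes count toward total input. The name `cache_hit_rate` is illustrative.

```python
def cache_hit_rate(usages: list[dict]) -> float:
    """Percentage of all input tokens that were served from the cache."""
    read = sum((u.get("cache_read_input_tokens") or 0) for u in usages)
    total = sum(
        (u.get("input_tokens") or 0)
        + (u.get("cache_creation_input_tokens") or 0)
        + (u.get("cache_read_input_tokens") or 0)
        for u in usages
    )
    return 100.0 * read / total if total else 0.0


# Example values: one request that writes the cache, one that reads it.
usages = [
    {"input_tokens": 750, "cache_creation_input_tokens": 1772, "cache_read_input_tokens": 0},
    {"input_tokens": 750, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1772},
]
print(f"Cache hit rate: {cache_hit_rate(usages):.1f}%")
```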

⚠️ Common Pitfalls

❌ Pitfall 1: Content Changed Slightly

// Request 1
const doc = "Analyze this document.";

// Request 2
const doc = "Analyze this document!";  // Added "!" - cache miss!

Fix: Keep cached content identical.
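
One way to catch accidental edits is to fingerprint the text you intend to cache and compare it between requests. This is a minimal sketch using only the standard library; `prefix_fingerprint` is a hypothetical helper, and it checks your own text for changes rather than inspecting the cache itself.

```python
import hashlib


def prefix_fingerprint(text: str) -> str:
    """SHA-256 hex digest of the exact text sent as the cached prefix."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


previous = prefix_fingerprint("Analyze this document.")
current = prefix_fingerprint("Analyze this document!")

if previous != current:
    print("Cached content changed: expect a cache miss and a new cache write")
```

Logging the fingerprint alongside each request makes it straightforward to line up unexpected cache writes with the change that caused them.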

❌ Pitfall 2: Using Shorthand Format

// Wrong - no place for cache_control
messages: [
  { role: 'user', content: 'Hello' }
]

// Correct - longhand format
messages: [
  {
    role: 'user',
    content: [
      {
        type: 'text',
        text: 'Hello',
        cache_control: { type: 'ephemeral' }
      }
    ]
  }
]

❌ Pitfall 3: Below Token Threshold

// Too short - won't cache
system: "You are helpful."  // ~50 tokens

// Better - combine elements to exceed 1024 tokens
system: [detailed_instructions] + [examples] + [guidelines]

❌ Pitfall 4: Reordering Tools

// Request 1
tools = [toolA, toolB, toolC]

// Request 2
tools = [toolB, toolA, toolC]  // Different order - cache miss!

Fix: Maintain consistent tool order.
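
If tools are assembled dynamically, sorting them by name before each request is one simple way to keep the order stable. The sketch below assumes each tool is a dict with a `name` key, as in the API's tool definitions; `canonical_tools` is an illustrative helper, and the tools shown are trimmed to their names.

```python
def canonical_tools(tools: list[dict]) -> list[dict]:
    """Return a new list sorted by tool name so every request uses one order."""
    return sorted(tools, key=lambda tool: tool["name"])


request_1 = canonical_tools([{"name": "toolA"}, {"name": "toolB"}, {"name": "toolC"}])
request_2 = canonical_tools([{"name": "toolB"}, {"name": "toolA"}, {"name": "toolC"}])

print(request_1 == request_2)
```

Apply any `cache_control` marker after sorting, so that it always lands on the same (last) tool.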


🔍 Debugging Cache Issues

Check if Cache is Working

// TypeScript
import Anthropic from '@anthropic-ai/sdk';

function analyzeCacheUsage(usage: Anthropic.Usage): void {
  if (usage.cache_creation_input_tokens) {
    console.log('✓ Cache created:', usage.cache_creation_input_tokens, 'tokens');
  }
  
  if (usage.cache_read_input_tokens) {
    console.log('✓ Cache hit:', usage.cache_read_input_tokens, 'tokens');
  }
  
  if (!usage.cache_creation_input_tokens && !usage.cache_read_input_tokens) {
    console.log('⚠️  No caching occurred - check:');
    console.log('  - Cached content is at least 1,024 tokens');
    console.log('  - cache_control is set');
    console.log('  - Using longhand format');
  }
}

# Python
def analyze_cache_usage(usage):
    if usage.cache_creation_input_tokens:
        print(f"✓ Cache created: {usage.cache_creation_input_tokens} tokens")
    
    if usage.cache_read_input_tokens:
        print(f"✓ Cache hit: {usage.cache_read_input_tokens} tokens")
    
    if not usage.cache_creation_input_tokens and not usage.cache_read_input_tokens:
        print("⚠️  No caching occurred - check:")
        print("  - Cached content is at least 1,024 tokens")
        print("  - cache_control is set")
        print("  - Using longhand format")

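💡 Pro Tips

  1. Cache Warming: Send a small request containing the cached prefix to populate the cache before real traffic arrives
  2. Refresh Before Expiry: Each cache hit resets the lifetime, so schedule a keep-alive request shortly before it lapses (about every 4 minutes on the default 5-minute lifetime)
  3. Monitor Hit Rates: Track cache_read_input_tokens to measure effectiveness
  4. Use Defensive Copying: Clone tools/prompts before adding cache_control so shared objects are not mutated
  5. Combine with Redis: Two-tier caching, where exact-repeat requests are served from an application-level store such as Redis and prompt caching reduces the cost of everything else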
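
Tip 2 comes down to arithmetic on timestamps. The sketch below works out how long to wait before the next keep-alive request is due. The cache lifetime is passed in explicitly rather than hard-coded, timestamps are plain seconds (for example from `time.time()`), and both the function name and the 30-second safety margin are assumptions chosen for illustration.

```python
def seconds_until_refresh(last_used_at: float, now: float,
                          ttl_seconds: float, safety_margin: float = 30.0) -> float:
    """Seconds to wait before a keep-alive request is due (0.0 if already due).

    last_used_at: time of the last request that wrote or read the cache.
    ttl_seconds:  cache lifetime that applies to your requests.
    """
    refresh_at = last_used_at + ttl_seconds - safety_margin
    return max(0.0, refresh_at - now)


# Cache last used at t=0 with a 300-second lifetime; it is now t=100.
wait = seconds_until_refresh(last_used_at=0.0, now=100.0, ttl_seconds=300.0)
print(f"Send the next keep-alive in {wait:.0f} seconds")
```

A scheduler can sleep for the returned duration and then send the warming request. Keep-alive requests are still billed as cache reads, so this is worth doing only when real traffic is expected to follow.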

📚 Processing Order

Claude processes components in this order:

  1. Tools (if provided)
  2. System prompt (if provided)
  3. Messages (in order)

Caching matches on the request prefix, so a change at one level invalidates the cache for that level and everything after it. Place breakpoints strategically based on what changes:

  • Tools rarely change → cache first
  • System prompt changes occasionally → cache second
  • Messages change per request → cache selectively
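
Because the cache matches on the request prefix in the order listed above, editing an earlier component also invalidates whatever comes after it. The sketch below models that rule. It is a simplified illustration of the ordering and not an SDK call, and the names are hypothetical.

```python
PROCESSING_ORDER = ["tools", "system", "messages"]


def invalidated_levels(changed: str) -> list[str]:
    """Levels whose cached prefix no longer matches once `changed` is edited."""
    start = PROCESSING_ORDER.index(changed)  # Raises ValueError for unknown levels
    return PROCESSING_ORDER[start:]


for level in PROCESSING_ORDER:
    print(f"change {level:<8} -> invalidates {invalidated_levels(level)}")
```

This is the reason to put the most stable content earliest: a tool edit costs a full re-cache, while a new question in the final message leaves every earlier breakpoint intact.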

🎓 Quick Decision Tree

Do you send the same content repeatedly?
├─ No → Don't use caching
└─ Yes
   └─ Is content at least 1,024 tokens?
      ├─ No → Combine elements or skip caching
      └─ Yes
         └─ Requests within the cache lifetime (5 minutes by default)?
            ├─ No → Consider cache warming or the 1-hour TTL
            └─ Yes → ✅ Use caching!
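
The tree above can also be written as a small function, which is handy for logging why a given workload was or was not cached. This is a sketch with hypothetical names; the cache lifetime is a required argument so the function makes no assumption about which TTL applies, and the minimum defaults to the 1,024-token figure used throughout this sheet.

```python
def caching_recommendation(sends_same_content: bool, token_count: int,
                           seconds_between_requests: float,
                           cache_lifetime_seconds: float,
                           min_tokens: int = 1024) -> str:
    """Walk the decision tree top to bottom and return the matching advice."""
    if not sends_same_content:
        return "Don't use caching"
    if token_count < min_tokens:
        return "Combine elements or skip caching"
    if seconds_between_requests > cache_lifetime_seconds:
        return "Consider cache warming"
    return "Use caching"


print(caching_recommendation(True, 5000, seconds_between_requests=20,
                             cache_lifetime_seconds=300))
```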

📋 Checklist for Production

  • System prompt uses longhand format with cache_control
  • Tools have cache control on last tool
  • Content exceeds 1,024 token minimum
  • Cached content is identical across requests
  • Monitoring cache hit rates
  • Handling cache expiry gracefully
  • Testing with real usage patterns
  • Documented caching strategy for team

Print this cheat sheet and keep it handy while implementing prompt caching!
