# Stop Paying Twice
Every time a user asks “What’s the capital of France?” you pay for a new API call. With Raptor’s semantic cache, you pay once.
## How It Works
```
Request: "What is the capital of France?"
  → Compute embedding
  → Check cache for similar queries
  → HIT! Return cached response in ~5ms
```
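Under the hood, the lookup step amounts to a nearest-neighbor search over stored query embeddings. Here is a minimal sketch of that idea in Python; the `lookup` helper and the similarity threshold are hypothetical, not Raptor's actual internals:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.92  # hypothetical cutoff; Raptor's real value is internal

def lookup(query_vec: np.ndarray, cache: list[tuple[np.ndarray, str]]) -> str | None:
    """Return the cached response whose embedding best matches the query."""
    best_sim, best_response = 0.0, None
    for vec, response in cache:
        sim = float(np.dot(query_vec, vec))  # unit vectors -> cosine similarity
        if sim > best_sim:
            best_sim, best_response = sim, response
    if best_sim >= SIMILARITY_THRESHOLD:
        return best_response  # HIT: skip the upstream call entirely
    return None  # MISS: call the model, then store (query_vec, response)
```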
Unlike traditional caches, we don’t require exact text matches:
| Query | Cache Result |
|---|---|
| "What is the capital of France?" | Original query (stored on miss) |
| "What's France's capital?" | Cache HIT |
| "capital of france?" | Cache HIT |
| "Tell me the capital of France" | Cache HIT |
All four return the same cached response: three of the four calls are served from cache, which is 75% fewer API calls from query variations alone.
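To see why these variants match, you can compare their embeddings directly. Below is a sketch using OpenAI's embeddings API purely for illustration (the embedding model Raptor uses internally is not documented here):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)  # returned vectors are unit-normalized

base = embed("What is the capital of France?")
for q in ["What's France's capital?", "capital of france?", "Tell me the capital of France"]:
    print(f"{q!r}: similarity {np.dot(base, embed(q)):.3f}")  # all close to 1.0
```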
## Typical Savings
| Use Case | Cache Hit Rate | Monthly Savings |
|---|---|---|
| FAQ bots | 60-80% | ~$400 per $500 of spend |
| Customer support | 40-60% | ~$250 per $500 of spend |
| Search/RAG | 30-50% | ~$200 per $500 of spend |
| Code assistants | 20-40% | ~$150 per $500 of spend |
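The arithmetic behind these figures is linear: every cache hit avoids one paid upstream call, so savings scale directly with the hit rate. For example:

```python
def estimated_savings(monthly_spend: float, hit_rate: float) -> float:
    """Each cache hit avoids one paid upstream call, so savings scale linearly."""
    return monthly_spend * hit_rate

estimated_savings(500, 0.80)  # 400.0 -> the FAQ-bot upper bound in the table above
```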
Check cache status in every response:
```
X-Raptor-Cache: hit               # or "miss"
X-Raptor-Latency-Ms: 5            # Total Raptor time
X-Raptor-Upstream-Latency-Ms: 0   # 0 on cache hit
```
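With the OpenAI Python SDK you can read these headers through the raw-response wrapper; the Raptor base URL below is illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.raptor.example/v1")  # illustrative URL

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(raw.headers.get("X-Raptor-Cache"))        # "hit" or "miss"
print(raw.headers.get("X-Raptor-Latency-Ms"))   # total time spent in Raptor
completion = raw.parse()                        # the usual ChatCompletion object
```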
## Streaming Support
Cache hits work with streaming too. We replay cached responses with realistic timing (~15ms between words) so users still see the “typing” effect.
```python
from openai import OpenAI

client = OpenAI(base_url="https://api.raptor.example/v1")  # illustrative Raptor URL

# This hits cache and streams just like the original
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    stream=True,
)
```
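For intuition, the replay pacing amounts to a small delay between words. Raptor does this server-side; the client-side sketch below only illustrates the ~15ms-per-word effect described above:

```python
import time

def replay(cached_text: str, delay_s: float = 0.015) -> None:
    """Print a cached answer word by word with ~15ms pacing."""
    for word in cached_text.split():
        print(word, end=" ", flush=True)
        time.sleep(delay_s)
    print()

replay("The capital of France is Paris.")
```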
## Cache Behavior
| Request Type | Caches? | Serves from Cache? |
|---|---|---|
| Non-streaming | Yes | Yes |
| Streaming | Yes | Yes (replayed) |
| With tools | No | No |
| Image generation | No | No |
## Best Practices
- **Normalize inputs** - strip extra whitespace and lowercase where your domain allows (see the sketch after this list)
- **Use specific system prompts** - consistent, specific prompts help the cache recognize similar queries
- **Monitor hit rates** - check your dashboard for optimization opportunities
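A possible client-side normalizer for the first tip; whether lowercasing is safe depends on your inputs (case-sensitive code snippets, for example, should be left alone):

```python
import re

def normalize(query: str) -> str:
    """Collapse whitespace and lowercase before sending to the API."""
    return re.sub(r"\s+", " ", query).strip().lower()

normalize("  What's   the capital  of France? ")  # "what's the capital of france?"
```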
Your first request is always a cache miss. Test with realistic query patterns to see actual savings.