Welcome to Raptor Data
Raptor Data is the Version Control Layer for RAG. Stop re-embedding the whole document when one sentence changes. Raptor handles parsing, chunking, and diff-based updates to cut embedding costs by 60-90%.
The Problem with Traditional RAG
When you update a document in traditional RAG systems, you have two bad options:
- Re-embed everything: Expensive and wasteful when only a few sentences changed
- Skip updates: Your RAG system serves stale information
How Raptor Saves Costs
Raptor uses sentence-level deduplication to identify which chunks can be reused:
- Exact Match (100% reuse): Same chunk from previous version → Reuse existing embedding
- High Reuse (>80% reuse): Most sentences unchanged → Consider reusing embedding
- New Content: Significant changes → Generate new embedding
For example:
- Document v1: 100 pages → 200 chunks → 200 embeddings
- Document v2: Added 2 pages, changed 5 paragraphs → 210 chunks
- Traditional RAG: Re-embed all 210 chunks → 210 embeddings
- Raptor: Only embed the 15 new or changed chunks → 15 embeddings instead of 210, over 90% cost savings
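The reuse levels and the savings arithmetic above can be sketched in a few lines. This is a minimal illustration, not Raptor's actual implementation: the `ReuseLevel` labels, the naive sentence splitter, and the function names `classifyChunk` and `embeddingSavings` are all assumptions made for this example, and only the 80% threshold comes from the list above.

```typescript
// Hypothetical labels mirroring the three reuse levels listed above.
type ReuseLevel = "exact" | "high" | "new";

// Naive sentence splitter (assumption): breaks on '.', '!' and '?' only.
function splitSentences(text: string): string[] {
  const parts: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Classify a new chunk against the chunks of the previous version.
function classifyChunk(newChunk: string, previousChunks: string[]): ReuseLevel {
  // Identical chunk text: the existing embedding can be reused as-is.
  if (previousChunks.indexOf(newChunk) !== -1) return "exact";

  // Collect every sentence seen in the previous version.
  const known = new Set<string>();
  for (const chunk of previousChunks) {
    for (const sentence of splitSentences(chunk)) known.add(sentence);
  }

  // Share of the new chunk's sentences that already existed.
  const sentences = splitSentences(newChunk);
  if (sentences.length === 0) return "new";
  const reused = sentences.filter((s) => known.has(s)).length;
  return reused / sentences.length > 0.8 ? "high" : "new";
}

// Fraction of embedding calls avoided compared with re-embedding everything.
function embeddingSavings(totalChunks: number, chunksToEmbed: number): number {
  return totalChunks === 0 ? 0 : 1 - chunksToEmbed / totalChunks;
}
```

With the numbers from the example, `embeddingSavings(210, 15)` evaluates to 195/210, or about 0.93.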
Key Features
Auto-Linking
Automatically detect when uploaded documents are versions of existing documents based on filename patterns and content similarity
Intelligent Deduplication
Sentence-level diff analysis identifies reusable chunks, cutting embedding costs by 60-90%
Version Control
Full version history with lineage tracking, comparison tools, and revert capabilities
Duplicate Detection
Identify duplicate documents across your corpus to optimize storage and embeddings
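To make the auto-linking idea above more concrete, here is a rough sketch of how filename patterns and content similarity could be combined. Everything in it is an assumption for illustration: the version-marker regex, the word-level Jaccard similarity, the 0.5 threshold, and the names `normalizeFilename`, `jaccard`, and `isLikelyNewVersion` are not taken from Raptor, which may use different signals entirely.

```typescript
// Hypothetical shape of an uploaded file for this sketch.
interface UploadedFile {
  filename: string;
  text: string;
}

// Strip the extension and a trailing version marker such as "_v2" or "-v1.1".
function normalizeFilename(filename: string): string {
  return filename
    .toLowerCase()
    .replace(/\.[a-z0-9]+$/, "") // drop the extension
    .replace(/[-_ ]?v\d+(\.\d+)*$/, "") // drop a trailing version marker
    .replace(/[-_ ]+/g, " ") // unify separators
    .trim();
}

// Lowercased set of alphanumeric words in a text.
function wordSet(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set<string>(words);
}

// Jaccard similarity: shared words divided by all distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Assumption: link only when the names match AND the content is similar enough.
function isLikelyNewVersion(
  existing: UploadedFile,
  uploaded: UploadedFile,
  threshold = 0.5
): boolean {
  const sameName =
    normalizeFilename(existing.filename) === normalizeFilename(uploaded.filename);
  const similarity = jaccard(wordSet(existing.text), wordSet(uploaded.text));
  return sameName && similarity >= threshold;
}
```

Requiring both signals keeps unrelated files with similar names from being linked; a production system would likely tune the threshold and use a more robust similarity measure.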
Architecture
Raptor uses a four-level hierarchy:
- Document: The logical entity (e.g., “User Manual”)
- Version: A content snapshot (e.g., v1.0, v1.1, v2.0)
- Variant: A processing configuration (different chunk sizes, strategies)
- Chunk: The processed content with deduplication metadata
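One way to picture the hierarchy above is as nested TypeScript types. The interface and field names below (`DocRecord`, `VersionRecord`, `VariantRecord`, `ChunkRecord`, `reusedFromChunkId`, and so on) are hypothetical and chosen for this sketch only; they are not the SDK's actual types.

```typescript
// Chunk: processed content plus (hypothetical) deduplication metadata.
interface ChunkRecord {
  id: string;
  text: string;
  reusedFromChunkId?: string; // set when an earlier chunk's embedding can be reused
}

// Variant: one processing configuration applied to a version.
interface VariantRecord {
  id: string;
  chunkSize: number;
  strategy: string;
  chunks: ChunkRecord[];
}

// Version: a content snapshot such as v1.0 or v1.1.
interface VersionRecord {
  label: string;
  variants: VariantRecord[];
}

// Document: the logical entity that groups all versions.
interface DocRecord {
  title: string;
  versions: VersionRecord[];
}

// Example instance: one document, one version, one variant, two chunks.
const userManual: DocRecord = {
  title: "User Manual",
  versions: [
    {
      label: "v1.0",
      variants: [
        {
          id: "variant-512",
          chunkSize: 512,
          strategy: "sentence",
          chunks: [
            { id: "c1", text: "Press the power button." },
            { id: "c2", text: "Wait for the light to turn green." },
          ],
        },
      ],
    },
  ],
};

// Walk the hierarchy and count every chunk under a document.
function countChunks(doc: DocRecord): number {
  let total = 0;
  for (const version of doc.versions) {
    for (const variant of version.variants) {
      total += variant.chunks.length;
    }
  }
  return total;
}
```

Keeping variants under a version means the same snapshot can be chunked several ways without duplicating the source content.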
Getting Started
Quickstart
Get up and running in 5 minutes with the TypeScript SDK
API Reference
Explore the REST API endpoints
Auto-Linking
Learn how auto-linking saves time and ensures consistency
Deduplication
Understand how deduplication cuts costs by 60-90%
Use Cases
Document Version Management
Keep track of policy documents, user manuals, or contracts as they evolve. Raptor maintains full version history and automatically links new versions.
Cost-Optimized RAG
Reduce embedding costs by 60-90% by only re-embedding changed content. Perfect for large document corpora with frequent updates.
Compliance & Audit
Track exactly what changed between document versions with detailed changelogs and diff analysis. Perfect for regulated industries.
Multi-Variant Processing
Process the same document with different chunking strategies to find the optimal configuration without storing multiple copies.
Next Steps
1. Install the SDK
   Install the TypeScript SDK and configure your API key
2. Upload Your First Document
   Process a document and get intelligent chunks with deduplication metadata
3. Enable Auto-Linking
   Configure auto-linking to automatically build version history
4. Optimize Embeddings
   Use deduplication metadata to reuse embeddings and cut costs
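The last step, reusing embeddings, could look roughly like the sketch below on the application side. It does not call the Raptor SDK: the `EmbeddingPlan` type and the `planEmbeddings` function are hypothetical, and exact chunk text is used as a stand-in for whatever deduplication metadata the service actually returns.

```typescript
// Result of splitting chunks into "already embedded" and "needs embedding".
interface EmbeddingPlan {
  reused: Map<string, number[]>; // chunk text → existing embedding vector
  toEmbed: string[]; // distinct chunk texts that still need an embedding
}

// Decide which chunks can reuse a cached embedding and which must be embedded.
function planEmbeddings(
  chunkTexts: string[],
  embeddingCache: Map<string, number[]>
): EmbeddingPlan {
  const reused = new Map<string, number[]>();
  const toEmbed: string[] = [];

  for (const text of chunkTexts) {
    const cached = embeddingCache.get(text);
    if (cached !== undefined) {
      reused.set(text, cached); // unchanged chunk: no embedding call needed
    } else if (toEmbed.indexOf(text) === -1) {
      toEmbed.push(text); // new chunk: embed once, even if it repeats
    }
  }

  return { reused, toEmbed };
}
```

Only the texts in `toEmbed` would then be sent to your embedding provider, and the resulting vectors added to the cache for the next version.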