Welcome to Raptor Data

Raptor Data is the Version Control Layer for RAG (retrieval-augmented generation). Stop re-embedding the whole document when one sentence changes. Raptor handles parsing, chunking, and diff-based updates to cut embedding costs by 60-90%.

The Problem with Traditional RAG

When you update a document in traditional RAG systems, you have two bad options:
  1. Re-embed everything - Expensive and wasteful when only a few sentences changed
  2. Skip updates - Your RAG system serves stale information
Neither is acceptable: the first wastes money and the second degrades answer quality. Raptor avoids both by versioning documents and re-embedding only the chunks that changed.

How Raptor Saves Costs

Raptor uses sentence-level deduplication to identify which chunks can be reused:
  • Exact Match (100% reuse): Same chunk from previous version → Reuse existing embedding
  • High Reuse (>80% reuse): Most sentences unchanged → Consider reusing embedding
  • New Content: Significant changes → Generate new embedding
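The three tiers above can be sketched as a small classifier. This is a minimal illustration, not Raptor's implementation: the function names, the naive sentence splitter, and the use of 0.8 as a hard cutoff are all assumptions made for the example.

```typescript
// Hypothetical sketch of the reuse tiers; none of these names come from the Raptor SDK.
type ReuseTier = "exact" | "high" | "new";

// Naive sentence splitter (assumption): a real pipeline would use a proper tokenizer.
function splitSentences(text: string): string[] {
  const parts: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim().toLowerCase()).filter((s) => s.length > 0);
}

// Fraction of the new chunk's sentences that already appear in the old chunk.
function reuseRatio(oldChunk: string, newChunk: string): number {
  const oldSentences = new Set(splitSentences(oldChunk));
  const newSentences = splitSentences(newChunk);
  if (newSentences.length === 0) return 0;
  const reused = newSentences.filter((s) => oldSentences.has(s)).length;
  return reused / newSentences.length;
}

// Map a chunk pair onto the three tiers: exact match, high reuse (>80%), new content.
function classifyChunk(oldChunk: string, newChunk: string): ReuseTier {
  if (oldChunk === newChunk) return "exact";
  return reuseRatio(oldChunk, newChunk) > 0.8 ? "high" : "new";
}
```

In this sketch a chunk with five of six sentences unchanged lands in the "high" tier, where the caller still decides whether the old embedding is close enough to reuse.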
Real Example:
  • Document v1: 100 pages → 200 chunks → 200 embeddings
  • Document v2: Added 2 pages, changed 5 paragraphs
  • Traditional RAG: Re-embed all 210 chunks → 210 embeddings
  • Raptor: Only embed 15 new chunks and reuse the other 195 → about 93% cost savings
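To check the arithmetic in the example, the savings can be computed as the share of embedding calls that are avoided. The helper below is a hypothetical one-liner written for this page, and it assumes every chunk costs the same to embed.

```typescript
// Hypothetical helper: fraction of embedding calls avoided when only
// `newChunks` out of `totalChunks` need a fresh embedding.
function embeddingSavings(totalChunks: number, newChunks: number): number {
  if (totalChunks <= 0) throw new RangeError("totalChunks must be positive");
  return 1 - newChunks / totalChunks;
}

// Numbers from the example: 210 chunks in v2, 15 of them new.
const exampleSavings = embeddingSavings(210, 15);
console.log(`${(exampleSavings * 100).toFixed(1)}% of embedding calls avoided`);
// prints "92.9% of embedding calls avoided"
```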

Key Features

Auto-Linking

Automatically detect when uploaded documents are versions of existing documents based on filename patterns and content similarity
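One way to picture auto-linking is a check that combines a normalized filename with a content-overlap score. The sketch below is an assumption about how such a check could work, not a description of Raptor's matching rules: the normalization patterns, the Jaccard similarity measure, and the 0.5 threshold are all illustrative.

```typescript
// Hypothetical sketch of version detection; rules and threshold are assumptions.
interface Upload {
  filename: string;
  text: string;
}

// Strip the extension and a trailing version marker such as "_v1.0", " v2", or "-final".
function normalizeFilename(name: string): string {
  return name
    .toLowerCase()
    .replace(/\.[a-z0-9]+$/, "")
    .replace(/[-_ ]*(v?\d+(\.\d+)*|final|draft)$/, "")
    .replace(/[-_ ]+/g, " ")
    .trim();
}

function sentenceSet(text: string): Set<string> {
  const parts: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return new Set(parts.map((s) => s.trim().toLowerCase()).filter((s) => s.length > 0));
}

// Jaccard similarity: shared sentences divided by all distinct sentences.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// In this sketch, a link requires both a matching base name and enough shared content.
function isLikelyNewVersion(existing: Upload, incoming: Upload, minSimilarity = 0.5): boolean {
  const sameName =
    normalizeFilename(existing.filename) === normalizeFilename(incoming.filename);
  const similarity = jaccard(sentenceSet(existing.text), sentenceSet(incoming.text));
  return sameName && similarity >= minSimilarity;
}
```

With these rules, "User_Manual_v1.0.pdf" and "User Manual v1.1.pdf" both normalize to "user manual", so they are linked if their text overlaps sufficiently.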

Intelligent Deduplication

Sentence-level diff analysis identifies reusable chunks, cutting embedding costs by 60-90%

Version Control

Full version history with lineage tracking, comparison tools, and revert capabilities

Duplicate Detection

Identify duplicate documents across your corpus to optimize storage and embeddings
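As a rough illustration of corpus-level duplicate detection, documents can be grouped by their normalized content. The types and function names here are hypothetical, and the sketch only catches exact duplicates after whitespace and case normalization. A production system would more likely key on a content hash and may also handle near-duplicates.

```typescript
// Hypothetical sketch: group documents whose normalized text is identical.
interface Doc {
  id: string;
  text: string;
}

// Collapse whitespace and case so trivial formatting differences do not hide duplicates.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// Returns one array of document ids per group of duplicates (groups of size 1 are dropped).
function findDuplicateGroups(docs: Doc[]): string[][] {
  const byContent = new Map<string, string[]>();
  for (const doc of docs) {
    const key = normalizeText(doc.text);
    const ids = byContent.get(key) ?? [];
    ids.push(doc.id);
    byContent.set(key, ids);
  }
  return Array.from(byContent.values()).filter((ids) => ids.length > 1);
}
```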

Architecture

Raptor uses a three-level hierarchy (Document, Version, Variant), with the processed chunks stored under each variant:
Document (logical entity)
  └─ Version (content snapshot)
      └─ Variant (processing configuration)
          └─ Chunks (processed content)
  • Document: The logical entity (e.g., “User Manual”)
  • Version: A content snapshot (e.g., v1.0, v1.1, v2.0)
  • Variant: A processing configuration (different chunk sizes, strategies)
  • Chunk: The processed content with deduplication metadata
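The hierarchy maps naturally onto nested types. The interfaces below are an illustrative model only: the field names (including `reusedFromChunkId` as a stand-in for deduplication metadata) are assumptions and do not reflect Raptor's actual schema.

```typescript
// Illustrative types only; field names are assumptions, not Raptor's schema.
interface Chunk {
  id: string;
  text: string;
  reusedFromChunkId?: string; // hypothetical deduplication metadata
}

interface Variant {
  name: string; // e.g. "default"
  chunkSize: number; // the processing configuration
  chunks: Chunk[];
}

interface Version {
  label: string; // e.g. "v1.0"
  variants: Variant[];
}

interface LogicalDocument {
  title: string; // e.g. "User Manual"
  versions: Version[];
}

// Walk Document -> Version -> Variant and count every chunk.
function countChunks(doc: LogicalDocument): number {
  return doc.versions.reduce(
    (total, version) =>
      total + version.variants.reduce((sum, variant) => sum + variant.chunks.length, 0),
    0,
  );
}

const manual: LogicalDocument = {
  title: "User Manual",
  versions: [
    {
      label: "v1.0",
      variants: [{ name: "default", chunkSize: 512, chunks: [
        { id: "v1-c1", text: "Install the app." },
        { id: "v1-c2", text: "Log in." },
      ] }],
    },
    {
      label: "v1.1",
      variants: [{ name: "default", chunkSize: 512, chunks: [
        { id: "v2-c1", text: "Install the app.", reusedFromChunkId: "v1-c1" },
        { id: "v2-c2", text: "Log in with SSO." },
      ] }],
    },
  ],
};
```

Here `countChunks(manual)` walks both versions and returns 4, while only the chunk without `reusedFromChunkId` in v1.1 would need a new embedding.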

Use Cases

Document Version Management

Keep track of policy documents, user manuals, or contracts as they evolve. Raptor maintains full version history and automatically links new versions.

Cost-Optimized RAG

Reduce embedding costs by 60-90% by only re-embedding changed content. Perfect for large document corpora with frequent updates.
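The idea of re-embedding only changed content can be sketched as a lookup against the previous version's embeddings. Everything below is hypothetical: the `reusedFromChunkId` field, the function name, and the synchronous `embed` callback are assumptions for illustration (a real embedding client would typically be asynchronous).

```typescript
// Hypothetical sketch: reuse embeddings from the previous version where possible.
type Embedding = number[];

interface DedupChunk {
  id: string;
  text: string;
  reusedFromChunkId?: string; // assumed metadata pointing at an unchanged earlier chunk
}

function embedWithReuse(
  chunks: DedupChunk[],
  previous: Map<string, Embedding>,
  embed: (text: string) => Embedding,
): { embeddings: Map<string, Embedding>; embedCalls: number } {
  const embeddings = new Map<string, Embedding>();
  let embedCalls = 0;
  for (const chunk of chunks) {
    const reused =
      chunk.reusedFromChunkId !== undefined
        ? previous.get(chunk.reusedFromChunkId)
        : undefined;
    if (reused !== undefined) {
      // Unchanged content: copy the stored vector instead of paying for a new one.
      embeddings.set(chunk.id, reused);
    } else {
      // New or changed content (or a missing stored vector): embed it.
      embeddings.set(chunk.id, embed(chunk.text));
      embedCalls++;
    }
  }
  return { embeddings, embedCalls };
}
```

The `embedCalls` counter makes the saving measurable: it equals the number of chunks that could not be served from the previous version.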

Compliance & Audit

Track exactly what changed between document versions with detailed changelogs and diff analysis. Perfect for regulated industries.
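A sentence-level changelog between two versions can be approximated with set differences. This is a simplified sketch under stated assumptions: the names are hypothetical, the splitter is naive, and reordered or repeated sentences are ignored, so it is not a full diff algorithm.

```typescript
// Hypothetical sketch of a sentence-level changelog between two versions.
interface Changelog {
  added: string[];
  removed: string[];
}

function toSentences(text: string): string[] {
  const parts: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Sentences only in the new text are "added"; sentences only in the old text are "removed".
function sentenceChangelog(oldText: string, newText: string): Changelog {
  const oldSentences = toSentences(oldText);
  const newSentences = toSentences(newText);
  const oldSet = new Set(oldSentences);
  const newSet = new Set(newSentences);
  return {
    added: newSentences.filter((s) => !oldSet.has(s)),
    removed: oldSentences.filter((s) => !newSet.has(s)),
  };
}
```

An edited sentence shows up as one removal plus one addition, which is usually enough for an audit trail of what changed.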

Multi-Variant Processing

Process the same document with different chunking strategies to find the optimal configuration without storing multiple copies.
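To illustrate multi-variant processing, the sketch below derives several chunkings from a single stored source text. The config shape and the sentences-per-chunk strategy are assumptions chosen for brevity, not Raptor's chunking options.

```typescript
// Hypothetical sketch: several chunking variants derived from one stored source text.
interface VariantConfig {
  name: string;
  sentencesPerChunk: number;
}

function chunkBySentences(text: string, sentencesPerChunk: number): string[] {
  if (sentencesPerChunk < 1) throw new RangeError("sentencesPerChunk must be at least 1");
  const parts: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const sentences = parts.map((s) => s.trim()).filter((s) => s.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < sentences.length; i += sentencesPerChunk) {
    chunks.push(sentences.slice(i, i + sentencesPerChunk).join(" "));
  }
  return chunks;
}

// The source text is passed in once; each variant only stores its own derived chunks.
function buildVariants(sourceText: string, configs: VariantConfig[]): Record<string, string[]> {
  const variants: Record<string, string[]> = {};
  for (const config of configs) {
    variants[config.name] = chunkBySentences(sourceText, config.sentencesPerChunk);
  }
  return variants;
}
```

Comparing retrieval quality across the resulting variants is then a matter of evaluating each chunk list, without uploading the document again.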

Next Steps

  1. Install the SDK - Install the TypeScript SDK and configure your API key:
     npm install @raptor-data/ts-sdk
  2. Upload Your First Document - Process a document and get intelligent chunks with deduplication metadata
  3. Enable Auto-Linking - Configure auto-linking to automatically build version history
  4. Optimize Embeddings - Use deduplication metadata to reuse embeddings and cut costs
Ready to get started? Check out the Quickstart Guide.