Welcome to Raptor Data

Raptor Data is the Version Control Layer for RAG. Stop re-embedding the whole document when one sentence changes. Raptor handles parsing, chunking, and diff-based updates to cut embedding costs by 60-90%.

The Problem with Traditional RAG

When you update a document in traditional RAG systems, you have two bad options:

Re-embed everything - Expensive and wasteful when only a few sentences changed
Skip updates - Your RAG system serves stale information

Raptor avoids both by versioning your documents and re-embedding only the content that actually changed.

How Raptor Saves Costs

Raptor uses sentence-level deduplication to identify which chunks can be reused:

Exact Match (100% reuse): Same chunk from previous version → Reuse existing embedding
High Reuse (>80% reuse): Most sentences unchanged → Consider reusing embedding
New Content: Significant changes → Generate new embedding
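
The three tiers above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names, the naive sentence splitter, and the reading of ">80%" as a strict threshold on the share of reused sentences are hypothetical, not Raptor's actual implementation or SDK API.

```typescript
// Hypothetical sketch: none of these names come from the Raptor SDK.
type ReuseDecision = "exact_match" | "high_reuse" | "new_content";

// Naive sentence splitter: a run of text followed by any ., !, or ? characters.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Fraction of the new chunk's sentences that already appear in the old chunk.
function reuseRatio(oldChunk: string, newChunk: string): number {
  const oldSentences = new Set(splitSentences(oldChunk));
  const newSentences = splitSentences(newChunk);
  if (newSentences.length === 0) return 0;
  const reused = newSentences.filter((s) => oldSentences.has(s)).length;
  return reused / newSentences.length;
}

// Map a (previous chunk, new chunk) pair onto the three tiers.
function classifyChunk(oldChunk: string, newChunk: string): ReuseDecision {
  if (oldChunk === newChunk) return "exact_match";
  // Assumed threshold: more than 80% of sentences unchanged.
  return reuseRatio(oldChunk, newChunk) > 0.8 ? "high_reuse" : "new_content";
}
```

A chunk classified as `high_reuse` is only a candidate for reuse; whether a slightly stale embedding is acceptable depends on your retrieval quality requirements.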

Real Example:

Document v1: 100 pages → 200 chunks → 200 embeddings
Document v2: Added 2 pages, changed 5 paragraphs
Traditional RAG: Re-embed all 210 chunks → 210 embeddings
Raptor: Embed only the 15 new or changed chunks → over 90% cost savings (195 of 210 embeddings reused)
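
The arithmetic behind this example is a one-line ratio. The helper below is a hypothetical illustration (not part of any SDK) and assumes cost scales linearly with the number of chunks embedded.

```typescript
// Hypothetical helper: fraction of embedding calls avoided by reusing chunks.
function embeddingSavings(totalChunks: number, newChunks: number): number {
  if (totalChunks <= 0) return 0;
  return 1 - newChunks / totalChunks;
}

// Numbers from the example: 210 chunks in v2, 15 of which need new embeddings.
const savings = embeddingSavings(210, 15);
console.log(`${(savings * 100).toFixed(1)}% fewer embeddings`); // prints "92.9% fewer embeddings"
```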

Key Features

Auto-Linking

Automatically detect when uploaded documents are versions of existing documents based on filename patterns and content similarity
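
As a rough illustration of how filename patterns and content similarity could be combined, here is a minimal sketch. Everything in it is an assumption made for the example: the version-suffix pattern, the word-level Jaccard similarity, the 0.5 threshold, and the choice to require both signals. Raptor's real matching rules are not described here.

```typescript
// Hypothetical sketch: not Raptor's actual auto-linking logic.

// Strip the extension and a trailing version marker such as "_v1.2" or "-3".
function normalizeFilename(filename: string): string {
  return filename
    .toLowerCase()
    .replace(/\.[a-z0-9]+$/, "")
    .replace(/[-_ ]*v?\d+(\.\d+)*$/, "");
}

// Set of lowercase alphanumeric tokens in a text.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: shared words divided by all distinct words.
function jaccard(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  let shared = 0;
  setA.forEach((token) => {
    if (setB.has(token)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

interface Upload {
  filename: string;
  content: string;
}

// Assumed rule: same base filename AND content similarity above a threshold.
function isLikelyNewVersion(existing: Upload, incoming: Upload, threshold = 0.5): boolean {
  const sameName = normalizeFilename(existing.filename) === normalizeFilename(incoming.filename);
  return sameName && jaccard(existing.content, incoming.content) >= threshold;
}
```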

Intelligent Deduplication

Sentence-level diff analysis identifies reusable chunks, cutting embedding costs by 60-90%

Version Control

Full version history with lineage tracking, comparison tools, and revert capabilities

Duplicate Detection

Identify duplicate documents across your corpus to optimize storage and embeddings
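
One simple way to think about duplicate detection is grouping documents by a normalized-content fingerprint. The sketch below is hypothetical and only catches duplicates that differ in case or whitespace; it uses the normalized text itself as the key, where a production system would more likely store a cryptographic hash.

```typescript
// Hypothetical sketch: not Raptor's actual duplicate detection.
interface Doc {
  id: string;
  content: string;
}

// Normalize case and whitespace so trivially different copies compare equal.
function fingerprint(content: string): string {
  return content.toLowerCase().replace(/\s+/g, " ").trim();
}

// Return groups of document ids that share the same fingerprint.
function findDuplicateGroups(docs: Doc[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const doc of docs) {
    const key = fingerprint(doc.content);
    const ids = groups.get(key) ?? [];
    ids.push(doc.id);
    groups.set(key, ids);
  }
  const duplicates: string[][] = [];
  groups.forEach((ids) => {
    if (ids.length > 1) duplicates.push(ids);
  });
  return duplicates;
}
```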

Architecture

Raptor organizes chunks under a three-level hierarchy:

Document (logical entity)
  └─ Version (content snapshot)
      └─ Variant (processing configuration)
          └─ Chunks (processed content)

Document: The logical entity (e.g., “User Manual”)
Version: A content snapshot (e.g., v1.0, v1.1, v2.0)
Variant: A processing configuration (different chunk sizes, strategies)
Chunk: The processed content with deduplication metadata
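
The hierarchy can be pictured as nested types. The field names below are assumptions chosen for illustration and do not reflect the actual SDK or API schema.

```typescript
// Hypothetical type sketch of the Document > Version > Variant > Chunk hierarchy.
interface Chunk {
  id: string;
  text: string;
  // Deduplication metadata (assumed shape): id of an earlier chunk whose embedding can be reused.
  reusedFromChunkId?: string;
}

interface Variant {
  id: string;
  chunkSize: number; // processing configuration
  strategy: string;
  chunks: Chunk[];
}

interface Version {
  id: string;
  label: string; // e.g. "v1.0"
  variants: Variant[];
}

interface RaptorDocument {
  id: string;
  name: string; // e.g. "User Manual"
  versions: Version[];
}

// Walk the hierarchy and count every chunk under a document.
function countChunks(doc: RaptorDocument): number {
  let total = 0;
  for (const version of doc.versions) {
    for (const variant of version.variants) {
      total += variant.chunks.length;
    }
  }
  return total;
}
```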

Getting Started

Quickstart

Get up and running in 5 minutes with the TypeScript SDK

API Reference

Explore the REST API endpoints

Auto-Linking

Learn how auto-linking saves time and ensures consistency

Deduplication

Understand how deduplication cuts costs by 60-90%

Use Cases

Document Version Management

Keep track of policy documents, user manuals, or contracts as they evolve. Raptor maintains full version history and automatically links new versions.

Cost-Optimized RAG

Reduce embedding costs by 60-90% by only re-embedding changed content. Perfect for large document corpora with frequent updates.

Compliance & Audit

Track exactly what changed between document versions with detailed changelogs and diff analysis. Perfect for regulated industries.

Multi-Variant Processing

Process the same document with different chunking strategies to find the optimal configuration without storing multiple copies.

Next Steps

Install the SDK

Install the TypeScript SDK and configure your API key

npm install @raptor-data/ts-sdk

Upload Your First Document

Process a document and get intelligent chunks with deduplication metadata

Enable Auto-Linking

Configure auto-linking to automatically build version history

Optimize Embeddings

Use deduplication metadata to reuse embeddings and cut costs
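
A minimal sketch of what reusing embeddings could look like in your own pipeline. The `reuseEmbeddingFrom` field, the in-memory store, and the injected `embed` function are all hypothetical; consult the SDK documentation for the real metadata fields. The embedding call is synchronous here for brevity, whereas real embedding clients are typically asynchronous.

```typescript
// Hypothetical sketch: field and function names are assumptions, not the Raptor SDK.
type Embedding = number[];

interface ChunkWithDedup {
  id: string;
  text: string;
  // Assumed metadata: id of a previous chunk whose embedding is still valid.
  reuseEmbeddingFrom?: string;
}

// Copy existing embeddings where possible and call `embed` only for the rest.
function embedWithReuse(
  chunks: ChunkWithDedup[],
  store: Map<string, Embedding>,
  embed: (text: string) => Embedding,
): { embedded: number; reused: number } {
  let embedded = 0;
  let reused = 0;
  for (const chunk of chunks) {
    const prior =
      chunk.reuseEmbeddingFrom !== undefined ? store.get(chunk.reuseEmbeddingFrom) : undefined;
    if (prior !== undefined) {
      store.set(chunk.id, prior); // reuse: no embedding call
      reused++;
    } else {
      store.set(chunk.id, embed(chunk.text)); // new content, or the prior embedding is missing
      embedded++;
    }
  }
  return { embedded, reused };
}
```

Falling back to a fresh embedding when the referenced chunk is missing from the store keeps the pipeline correct even if the cache was cleared.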

Ready to get started? Check out the Quickstart Guide.
