Skip to main content

Quickstart Guide

This guide will help you upload your first document, enable auto-linking, and understand the deduplication metadata.

Installation

Install the TypeScript SDK:
npm install @raptor-data/ts-sdk

Get Your API Key

  1. Sign up at raptordata.dev/app
  2. Navigate to Settings → API Keys
  3. Click Create API Key and copy the key
Store your API key securely. It’s only shown once during creation.

Basic Usage

Initialize the Client

import Raptor from '@raptor-data/ts-sdk';

const raptor = new Raptor({
  apiKey: 'your-api-key-here'
});

Upload Your First Document

// From a File object (browser)
const file = document.querySelector('input[type="file"]').files[0];

const result = await raptor.process(file, {
  chunkSize: 512,
  strategy: 'semantic',
  storeContent: false // Set to true to retrieve original content later
});

console.log('Document processed:', result.document_id);
console.log('Chunks created:', result.chunks_count);

From File Path (Node.js)

import fs from 'fs';

const fileBuffer = fs.readFileSync('./document.pdf');
const blob = new Blob([fileBuffer]);

const result = await raptor.process(blob, {
  chunkSize: 512,
  strategy: 'semantic'
});

Enable Auto-Linking

Auto-linking automatically detects when uploaded documents are new versions of existing documents:
// Upload first version
const v1 = await raptor.process(file1, {
  versionLabel: 'v1.0'
});

// Upload second version - Raptor auto-detects it's a new version
const v2 = await raptor.process(file2, {
  autoLink: true,
  autoLinkThreshold: 0.85, // 85% similarity threshold
  versionLabel: 'v2.0'
});

if (v2.auto_linked) {
  console.log('Auto-linked to:', v2.parent_document_id);
  console.log('Confidence:', v2.auto_link_confidence);
  console.log('Method:', v2.auto_link_method); // 'metadata' or 'content'
}
Auto-linking uses filename patterns and content similarity to detect versions. The default threshold is 85%.

Understanding Deduplication

When you upload a new version, Raptor analyzes each chunk to determine reusability:
// Get chunks from the new version
const chunks = await raptor.getDocumentChunks(v2.document_id, {
  includeFullMetadata: true
});

chunks.forEach(chunk => {
  console.log('Chunk:', chunk.chunk_index);
  console.log('Strategy:', chunk.dedup_strategy); // 'exact', 'high_reuse', 'new'
  console.log('Reuse ratio:', chunk.content_reuse_ratio); // 0.0 - 1.0
  console.log('Recommendation:', chunk.embedding_recommendation); // 'reuse', 'consider_reuse', 'regenerate'

  if (chunk.embedding_recommendation === 'reuse') {
    // Reuse embedding from source chunk
    console.log('Reuse embedding from:', chunk.dedup_source_chunk_id);
  } else if (chunk.embedding_recommendation === 'consider_reuse') {
    console.log('80%+ reused - consider reusing embedding');
  } else {
    // Generate new embedding
    console.log('Generate new embedding');
  }
});

Deduplication Summary

Get aggregate statistics about content reuse:
const summary = await raptor.getDedupSummary(v2.variant_id);

console.log('Total chunks:', summary.total_chunks);
console.log('Reusable embeddings:', summary.embedding_recommendations.reuse);
console.log('Consider reuse:', summary.embedding_recommendations.consider_reuse);
console.log('New embeddings needed:', summary.embedding_recommendations.regenerate);
console.log('Sentence reuse ratio:', summary.sentence_reuse_ratio); // 0.7 = 70% reused

Working with Versions

List Document Versions

const versions = await raptor.listVersions(documentId);

versions.forEach(version => {
  console.log('Version:', version.version_number);
  console.log('Label:', version.version_label);
  console.log('Variants:', version.variants.length);
});

Compare Versions

const comparison = await raptor.compareDocuments(v1.document_id, v2.document_id);

console.log('Similarity:', comparison.similarity_score);
console.log('Changes:', comparison.changes);

Get Document Lineage

const lineage = await raptor.getDocumentLineage(documentId);

console.log('Total versions:', lineage.total_versions);
lineage.documents.forEach(doc => {
  console.log(doc.version_label, '→', doc.filename);
});

Stream Processing (Long Documents)

For large documents, stream progress updates:
for await (const update of raptor.processStream(file)) {
  if (update.status === 'processing') {
    console.log('Processing...');
  } else if (update.status === 'completed') {
    console.log('Completed!', update.data);
  } else if (update.status === 'failed') {
    console.error('Failed:', update.error);
  }
}

React Integration

Use React hooks for easy integration:
import { useProcessingStatus } from '@raptor-data/ts-sdk/hooks';

function DocumentStatus({ variantId }) {
  const { variant, loading, error } = useProcessingStatus(raptor, variantId);

  if (loading) return <div>Processing...</div>;
  if (error) return <div>Error: {error.message}</div>;

  return (
    <div>
      Status: {variant.status}
      Chunks: {variant.chunks_count}
    </div>
  );
}

Next Steps

Common Patterns

Pattern: Version Control for User Manuals

// Upload first version
const v1 = await raptor.process(manual_v1, {
  versionLabel: 'v1.0',
  chunkSize: 512
});

// Upload updated version with auto-linking
const v2 = await raptor.process(manual_v2, {
  versionLabel: 'v2.0',
  autoLink: true,
  parentDocumentId: v1.document_id // Optional: explicit parent
});

// Get deduplication summary
const savings = await raptor.getDedupSummary(v2.variant_id);
console.log(`Cost savings: ${(savings.sentence_reuse_ratio * 100).toFixed(0)}%`);
// Find similar documents
const similar = await raptor.findSimilarDocuments(documentId, 0.8);

// Manually link as parent-child
if (similar.suggestions.length > 0) {
  await raptor.linkToParent(
    documentId,
    similar.suggestions[0].documentId,
    'v2.1'
  );
}

Pattern: Reprocess with Different Configuration

// Create a new variant with different chunking
const variant = await raptor.reprocessDocument(documentId, {
  chunkSize: 1024,
  strategy: 'recursive',
  tableExtraction: true
});

console.log('New variant created:', variant.variant_id);