Quickstart Guide
This guide will help you upload your first document, enable auto-linking, and understand the deduplication metadata.
Installation
Install the TypeScript SDK:
npm install @raptor-data/ts-sdk
Get Your API Key
- Sign up at raptordata.dev/app
- Navigate to Settings → API Keys
- Click Create API Key and copy the key
Store your API key securely. It’s only shown once during creation.
Basic Usage
Initialize the Client
import Raptor from '@raptor-data/ts-sdk';
const raptor = new Raptor({
apiKey: 'your-api-key-here'
});
Upload Your First Document
// From a File object (browser)
const file = document.querySelector('input[type="file"]').files[0];
const result = await raptor.process(file, {
chunkSize: 512,
strategy: 'semantic',
storeContent: false // Set to true to retrieve original content later
});
console.log('Document processed:', result.document_id);
console.log('Chunks created:', result.chunks_count);
From File Path (Node.js)
import fs from 'fs';
const fileBuffer = fs.readFileSync('./document.pdf');
const blob = new Blob([fileBuffer]);
const result = await raptor.process(blob, {
chunkSize: 512,
strategy: 'semantic'
});
Enable Auto-Linking
Auto-linking automatically detects when uploaded documents are new versions of existing documents:
// Upload first version
const v1 = await raptor.process(file1, {
versionLabel: 'v1.0'
});
// Upload second version - Raptor auto-detects it's a new version
const v2 = await raptor.process(file2, {
autoLink: true,
autoLinkThreshold: 0.85, // 85% similarity threshold
versionLabel: 'v2.0'
});
if (v2.auto_linked) {
console.log('Auto-linked to:', v2.parent_document_id);
console.log('Confidence:', v2.auto_link_confidence);
console.log('Method:', v2.auto_link_method); // 'metadata' or 'content'
}
Auto-linking uses filename patterns and content similarity to detect versions. The default threshold is 85%.
Understanding Deduplication
When you upload a new version, Raptor analyzes each chunk to determine reusability:
// Get chunks from the new version
const chunks = await raptor.getDocumentChunks(v2.document_id, {
includeFullMetadata: true
});
chunks.forEach(chunk => {
console.log('Chunk:', chunk.chunk_index);
console.log('Strategy:', chunk.dedup_strategy); // 'exact', 'high_reuse', 'new'
console.log('Reuse ratio:', chunk.content_reuse_ratio); // 0.0 - 1.0
console.log('Recommendation:', chunk.embedding_recommendation); // 'reuse', 'consider_reuse', 'regenerate'
if (chunk.embedding_recommendation === 'reuse') {
// Reuse embedding from source chunk
console.log('Reuse embedding from:', chunk.dedup_source_chunk_id);
} else if (chunk.embedding_recommendation === 'consider_reuse') {
console.log('80%+ reused - consider reusing embedding');
} else {
// Generate new embedding
console.log('Generate new embedding');
}
});
Deduplication Summary
Get aggregate statistics about content reuse:
const summary = await raptor.getDedupSummary(v2.variant_id);
console.log('Total chunks:', summary.total_chunks);
console.log('Reusable embeddings:', summary.embedding_recommendations.reuse);
console.log('Consider reuse:', summary.embedding_recommendations.consider_reuse);
console.log('New embeddings needed:', summary.embedding_recommendations.regenerate);
console.log('Sentence reuse ratio:', summary.sentence_reuse_ratio); // 0.7 = 70% reused
Working with Versions
List Document Versions
const versions = await raptor.listVersions(documentId);
versions.forEach(version => {
console.log('Version:', version.version_number);
console.log('Label:', version.version_label);
console.log('Variants:', version.variants.length);
});
Compare Versions
const comparison = await raptor.compareDocuments(v1.document_id, v2.document_id);
console.log('Similarity:', comparison.similarity_score);
console.log('Changes:', comparison.changes);
Get Document Lineage
const lineage = await raptor.getDocumentLineage(documentId);
console.log('Total versions:', lineage.total_versions);
lineage.documents.forEach(doc => {
console.log(doc.version_label, '→', doc.filename);
});
Stream Processing (Long Documents)
For large documents, stream progress updates:
for await (const update of raptor.processStream(file)) {
if (update.status === 'processing') {
console.log('Processing...');
} else if (update.status === 'completed') {
console.log('Completed!', update.data);
} else if (update.status === 'failed') {
console.error('Failed:', update.error);
}
}
React Integration
Use React hooks for easy integration:
import { useProcessingStatus } from '@raptor-data/ts-sdk/hooks';
function DocumentStatus({ variantId }) {
const { variant, loading, error } = useProcessingStatus(raptor, variantId);
if (loading) return <div>Processing...</div>;
if (error) return <div>Error: {error.message}</div>;
return (
<div>
Status: {variant.status}
Chunks: {variant.chunks_count}
</div>
);
}
Next Steps
Common Patterns
Pattern: Version Control for User Manuals
// Upload first version
const v1 = await raptor.process(manual_v1, {
versionLabel: 'v1.0',
chunkSize: 512
});
// Upload updated version with auto-linking
const v2 = await raptor.process(manual_v2, {
versionLabel: 'v2.0',
autoLink: true,
parentDocumentId: v1.document_id // Optional: explicit parent
});
// Get deduplication summary
const savings = await raptor.getDedupSummary(v2.variant_id);
console.log(`Cost savings: ${(savings.sentence_reuse_ratio * 100).toFixed(0)}%`);
// Find similar documents
const similar = await raptor.findSimilarDocuments(documentId, 0.8);
// Manually link as parent-child
if (similar.suggestions.length > 0) {
await raptor.linkToParent(
documentId,
similar.suggestions[0].documentId,
'v2.1'
);
}
Pattern: Reprocess with Different Configuration
// Create a new variant with different chunking
const variant = await raptor.reprocessDocument(documentId, {
chunkSize: 1024,
strategy: 'recursive',
tableExtraction: true
});
console.log('New variant created:', variant.variant_id);