Version Control
Raptor provides built-in version control for documents, similar to Git for code. Track document evolution, compare versions, and manage document lineage.
Core Concepts
Documents, Versions, and Variants
- Document: Represents a file and its complete version history
- Version: A specific iteration of a document (e.g., v1, v2, v3)
- Variant: Different processing configurations of the same version (e.g., different chunk sizes)
Document (contract.pdf)
├── Version 1 (original)
│ ├── Variant A (chunk_size=512)
│ └── Variant B (chunk_size=1024)
└── Version 2 (revised)
└── Variant A (chunk_size=512)
Creating Versions
Manual Linking
Link a new upload to an existing document:
import Raptor from '@raptor-data/ts-sdk';
const raptor = new Raptor({ apiKey: process.env.RAPTOR_API_KEY });
// Upload initial version
const v1 = await raptor.process('contract_v1.pdf', {
versionLabel: 'Initial Draft'
});
console.log(`Document ID: ${v1.documentId}`);
console.log(`Version: ${v1.versionNumber}`); // 1
// Upload new version with manual link
const v2 = await raptor.process('contract_v2.pdf', {
parentDocumentId: v1.documentId,
versionLabel: 'Client Revisions'
});
console.log(`Version: ${v2.versionNumber}`); // 2
console.log(`Parent: ${v2.parentDocumentId}`); // Same as v1.documentId
Auto-Linking
Raptor can automatically detect new versions:
// Upload original
const v1 = await raptor.process('contract_2024.pdf');
// Upload updated version (different filename)
const v2 = await raptor.process('contract_2024_revised.pdf');
if (v2.autoLinked) {
console.log('Auto-linked to parent!');
console.log(`Confidence: ${v2.autoLinkConfidence}`); // 0.92
console.log(`Method: ${v2.autoLinkMethod}`); // "metadata_and_content"
// Explanations
v2.autoLinkExplanation?.forEach(reason => {
console.log(` - ${reason}`);
});
// Output:
// - High filename similarity: 0.95
// - Upload time proximity: 2 hours apart
// - Content similarity: 94% overlap
}
See Auto-Linking for more details.
Listing Versions
Get All Versions
const versions = await raptor.listVersions('document-id');
console.log(`Total versions: ${versions.total_count}`);
versions.versions.forEach(version => {
console.log(`Version ${version.version_number}: ${version.filename}`);
console.log(` Created: ${version.created_at}`);
console.log(` Variants: ${version.variants.length}`);
// Primary variant info
const primary = version.variants.find(v => v.is_primary);
if (primary) {
console.log(` Chunks: ${primary.chunks_count}`);
console.log(` Status: ${primary.status}`);
}
});
Get Specific Version
const version = await raptor.getVersion('document-id', 2);
console.log(`Version 2: ${version.filename}`);
console.log(`Variants: ${version.variants.length}`);
version.variants.forEach(variant => {
console.log(`Variant ${variant.config_hash.substring(0, 8)}:`);
console.log(` Status: ${variant.status}`);
console.log(` Chunks: ${variant.chunks_count}`);
console.log(` Primary: ${variant.is_primary}`);
});
Version Labels
Add human-readable labels to versions:
// Set label during upload
const result = await raptor.process('contract.pdf', {
versionLabel: 'Final - Approved by Legal'
});
// Update label later
await raptor.updateVersionLabel('document-id', 'v2.1 - Bug Fixes');
console.log('Label updated');
Default Versions
Set which version is returned by default:
// Get current default
const doc = await raptor.getDocument('document-id');
console.log(`Default version: ${doc.version_number}`);
// Set version 3 as default
await raptor.setDefaultVersion('document-id', 3);
// Now queries return version 3
const updated = await raptor.getDocument('document-id');
console.log(`New default: ${updated.version_number}`); // 3
Deleting Versions
Delete Specific Version
await raptor.deleteVersion('document-id', 2);
console.log('Version 2 deleted');
// If deleted version was latest, previous version is promoted
const doc = await raptor.getDocument('document-id');
console.log(`New latest: ${doc.version_number}`); // 1 (promoted)
Delete Document
Deleting a document deletes all versions and variants:
await raptor.deleteDocument('document-id');
console.log('Document and all versions deleted');
Soft deletes: Raptor uses soft deletes. Deleted content is marked as deleted but metadata is preserved for audit trails and analytics.
Reverting Versions
Create a new version based on an old version:
// Revert to version 1
const reverted = await raptor.revertToVersion('document-id', 1);
console.log(`Reverted from version ${reverted.revertedFromVersion}`);
console.log(`New version: ${reverted.newVersionNumber}`);
console.log(`New version ID: ${reverted.newVersionId}`);
Comparing Versions
See what changed between versions:
const diff = await raptor.compareDocuments('doc-v1-id', 'doc-v2-id');
console.log(diff.summary);
console.log(`Similarity: ${(diff.similarityScore * 100).toFixed(1)}%`);
console.log(`\nAdded: ${diff.diff.addedCount} chunks`);
diff.diff.addedChunks.forEach(chunk => {
console.log(`+ [${chunk.chunk_index}] ${chunk.text.substring(0, 100)}...`);
});
console.log(`\nRemoved: ${diff.diff.removedCount} chunks`);
diff.diff.removedChunks.forEach(chunk => {
console.log(`- [${chunk.chunk_index}] ${chunk.text.substring(0, 100)}...`);
});
console.log(`\nModified: ${diff.diff.modifiedCount} chunks`);
Processing Variants
Multiple processing configurations for the same version:
Reprocess with Different Config
// Original processing
const v1 = await raptor.process('document.pdf', {
chunkSize: 512
});
// Reprocess with different chunk size (creates new variant)
const v2 = await raptor.process('document.pdf', {
chunkSize: 1024
});
// Same document, same version, different variants
console.log(v1.documentId === v2.documentId); // true
console.log(v1.versionId === v2.versionId); // true
console.log(v1.variantId === v2.variantId); // false (different variants)
List Variants
const version = await raptor.getVersion('document-id', 1);
version.variants.forEach(variant => {
console.log(`Variant ${variant.config_hash.substring(0, 8)}:`);
console.log(` Chunk size: ${variant.config.chunk_size}`);
console.log(` Strategy: ${variant.config.strategy}`);
console.log(` Chunks: ${variant.chunks_count}`);
console.log(` Primary: ${variant.is_primary}`);
});
Delete Variant
await raptor.deleteVariant('variant-id');
console.log('Variant deleted');
// If deleted variant was primary, another variant is promoted
Advanced: Document Lineage
Get Lineage
Get complete version history:
const lineage = await raptor.getDocumentLineage('document-id');
console.log(`Lineage ID: ${lineage.lineage_id}`);
console.log(`Total versions: ${lineage.total_versions}`);
lineage.documents.forEach(doc => {
console.log(`${doc.version_label || doc.filename}`);
console.log(` Parent: ${doc.parent_document_id || 'None'}`);
console.log(` Similarity: ${doc.similarity_score}`);
console.log(` Created: ${doc.created_at}`);
});
Lineage Tree
Visualize version relationships:
const tree = await raptor.getDocumentLineageTree('document-id');
function printTree(node, indent = '') {
console.log(`${indent}${node.filename}`);
node.children.forEach(child => printTree(child, indent + ' '));
}
tree.roots.forEach(root => printTree(root));
// Output:
// contract_v1.pdf
// contract_v2.pdf
// contract_v3.pdf
// contract_v2_alt.pdf
Lineage Stats
const stats = await raptor.getLineageStats('document-id');
console.log(`Total versions: ${stats.total_versions}`);
console.log(`Oldest version: ${stats.oldest_version.filename}`);
console.log(`Created: ${stats.oldest_version.created_at}`);
Lineage Changelog
Generate a changelog for entire lineage:
const changelog = await raptor.getLineageChangelog('document-id');
console.log(`${changelog.total_transitions} version transitions`);
changelog.changelogs.forEach(entry => {
console.log(`\n${entry.from_label} → ${entry.to_label}`);
console.log(`Similarity: ${entry.similarity_score}`);
console.log(`Added: ${entry.diff.added_count} chunks`);
console.log(`Removed: ${entry.diff.removed_count} chunks`);
console.log(`Modified: ${entry.diff.modified_count} chunks`);
});
Best Practices
Use version labels: Add meaningful labels like “v1.0”, “Draft”, “Final” to make versions easily identifiable.
Auto-linking is smart: Raptor uses filename similarity, upload time proximity, file size, and content sampling to detect versions with 85%+ accuracy.
Deleting is soft: Deleted versions are soft-deleted (marked as deleted but not removed). This preserves audit trails and billing records.