Skip to main content

Version Control

Raptor provides Git-like version control for documents, tracking changes over time and enabling comparisons, rollbacks, and lineage visualization.

Hierarchy

Raptor uses a three-level hierarchy:
Document (logical entity)
  └─ Version (content snapshot)
      └─ Variant (processing configuration)
          └─ Chunks (processed content)

Document

The logical entity representing a conceptual document:
  • Unique ID: Persistent identifier across all versions
  • Latest Version: Points to the current version
  • Lineage: Links to related documents (parent/children)
{
  id: "doc-uuid",
  filename: "user-manual.pdf",
  latest_version_number: 3,
  lineage_id: "lineage-uuid",
  created_at: "2024-01-01T00:00:00Z"
}

Version

A content snapshot of the document at a point in time:
  • Version Number: Auto-incremented (1, 2, 3…)
  • Version Label: Human-readable label (“v1.0”, “draft”, “final”)
  • Content Hash: SHA-256 hash of file content
  • Parent Version: Link to previous version
{
  id: "version-uuid",
  document_id: "doc-uuid",
  version_number: 2,
  version_label: "v2.0",
  parent_version_id: "version-1-uuid",
  content_hash: "abc123...",
  filename: "user-manual-v2.pdf",
  file_size_bytes: 1024000,
  created_at: "2024-02-01T00:00:00Z"
}

Variant

A processing configuration applied to a version:
  • Config Hash: SHA-256 hash of processing parameters
  • Status: pending, processing, completed, failed
  • Primary Flag: Whether this is the default variant
  • Deduplication Stats: Reuse metrics
{
  id: "variant-uuid",
  version_id: "version-uuid",
  config_hash: "def456...",
  config: {
    chunk_size: 512,
    chunk_overlap: 50,
    strategy: "semantic",
    table_extraction: true
  },
  status: "completed",
  chunks_count: 105,
  total_tokens: 25000,
  is_primary: true,
  dedup_stats: {
    sentence_reuse_ratio: 0.87
  }
}

Creating Versions

Let Raptor detect versions automatically:
// Upload first version
const v1 = await raptor.process(manual_v1, {
  versionLabel: 'v1.0'
});

// Upload second version - auto-detected as new version
const v2 = await raptor.process(manual_v2, {
  autoLink: true,
  versionLabel: 'v2.0'
});

console.log(v2.version_number); // 2
console.log(v2.parent_document_id); // v1.document_id

Manual Linking

Explicitly specify the parent:
const v2 = await raptor.process(manual_v2, {
  parentDocumentId: v1.document_id,
  versionLabel: 'v2.0'
});

Retroactive Linking

Link documents after upload:
// Find similar documents
const similar = await raptor.findSimilarDocuments(docId, 0.85);

// Link to parent
await raptor.linkToParent(
  docId,
  similar.suggestions[0].documentId,
  'v2.1'
);

Listing Versions

Get all versions of a document:
const { versions } = await raptor.listVersions(documentId);

versions.forEach(version => {
  console.log(`Version ${version.version_number}: ${version.version_label}`);
  console.log(`  Variants: ${version.variants.length}`);
  console.log(`  Created: ${version.created_at}`);
});
Response:
{
  "versions": [
    {
      "id": "version-1-uuid",
      "version_number": 1,
      "version_label": "v1.0",
      "filename": "manual-v1.pdf",
      "created_at": "2024-01-01T00:00:00Z",
      "variants": [
        {
          "id": "variant-uuid",
          "status": "completed",
          "chunks_count": 100,
          "is_primary": true
        }
      ]
    },
    {
      "id": "version-2-uuid",
      "version_number": 2,
      "version_label": "v2.0",
      "filename": "manual-v2.pdf",
      "created_at": "2024-02-01T00:00:00Z",
      "variants": [
        {
          "id": "variant-2-uuid",
          "status": "completed",
          "chunks_count": 105,
          "is_primary": true
        }
      ]
    }
  ],
  "total_count": 2
}

Managing Versions

Set Default Version

Change which version is considered “current”:
await raptor.setDefaultVersion(documentId, 2);

Update Version Label

Add or change version labels:
await raptor.updateVersionLabel(documentId, 'v2.1-beta');

Delete Version

Remove a version (soft delete with auto-promotion):
await raptor.deleteVersion(documentId, 2);
// If v2 was the latest, v1 becomes the new latest

Revert to Previous Version

Create a new version from an old one:
await raptor.revertToVersion(documentId, 1);
// Creates new version 3 with content from version 1

Version Comparison

Compare two versions to see changes:
const comparison = await raptor.compareDocuments(
  v1.document_id,
  v2.document_id
);

console.log(comparison);
Response:
{
  "doc1": {
    "id": "doc-1-uuid",
    "filename": "manual-v1.pdf",
    "version_number": 1,
    "chunks_count": 100
  },
  "doc2": {
    "id": "doc-2-uuid",
    "filename": "manual-v2.pdf",
    "version_number": 2,
    "chunks_count": 105
  },
  "similarity_score": 0.89,
  "changes": {
    "added_chunks": 10,
    "removed_chunks": 5,
    "modified_chunks": 8,
    "unchanged_chunks": 87
  }
}

Lineage

View Lineage

Get the complete version history:
const lineage = await raptor.getDocumentLineage(documentId);

console.log(`Total versions: ${lineage.total_versions}`);
lineage.documents.forEach(doc => {
  console.log(`${doc.version_label}: ${doc.filename}`);
});

Lineage Tree

Visualize parent-child relationships:
const tree = await raptor.getDocumentLineageTree(documentId);

console.log(tree);
Response:
{
  "totalVersions": 4,
  "roots": [
    {
      "id": "doc-1-uuid",
      "filename": "manual-v1.0.pdf",
      "versionLabel": "v1.0",
      "children": [
        {
          "id": "doc-2-uuid",
          "filename": "manual-v1.1.pdf",
          "versionLabel": "v1.1",
          "children": [
            {
              "id": "doc-3-uuid",
              "filename": "manual-v2.0.pdf",
              "versionLabel": "v2.0",
              "children": []
            }
          ]
        }
      ]
    }
  ]
}

Lineage Stats

Get statistics about the lineage:
const stats = await raptor.getLineageStats(documentId);

console.log(stats);
Response:
{
  "totalVersions": 4,
  "oldestVersion": {
    "id": "doc-1-uuid",
    "filename": "manual-v1.0.pdf",
    "createdAt": "2024-01-01T00:00:00Z"
  },
  "averageSimilarity": 0.91,
  "totalChunks": 420
}

Lineage Changelog

Generate a changelog across all versions:
const changelog = await raptor.getLineageChangelog(documentId);

console.log(changelog);

Unlinking

Remove a document from its lineage:
await raptor.unlinkFromLineage(documentId);
// Document becomes its own root with new lineage_id

Multi-Variant Processing

Process the same version with different configurations:

Create Variants

// Primary variant with semantic chunking
const result1 = await raptor.process(file, {
  chunkSize: 512,
  strategy: 'semantic'
});

// Alternative variant with recursive chunking
const result2 = await raptor.reprocessDocument(result1.document_id, {
  chunkSize: 1024,
  strategy: 'recursive'
});

console.log('Primary variant:', result1.variant_id);
console.log('Alternative variant:', result2.variant_id);

Get Variant

Retrieve specific variant details:
const variant = await raptor.getVariant(variantId);

console.log('Config:', variant.config);
console.log('Status:', variant.status);
console.log('Chunks:', variant.chunks_count);

Get Variant Chunks

Retrieve chunks from a specific variant:
const { chunks } = await raptor.getChunks(variantId, {
  limit: 100,
  offset: 0
});

chunks.forEach(chunk => {
  console.log(chunk.text);
});

Delete Variant

Remove a variant (auto-promotes if primary):
await raptor.deleteVariant(variantId);
// If this was the primary variant, most recent becomes primary

Use Cases

Use Case 1: Policy Document Timeline

// Track policy changes over years
const policies = [
  { file: policy_2022, label: '2022' },
  { file: policy_2023, label: '2023' },
  { file: policy_2024, label: '2024' }
];

let previousId = null;
for (const { file, label } of policies) {
  const result = await raptor.process(file, {
    parentDocumentId: previousId,
    versionLabel: label
  });
  previousId = result.document_id;
}

// View complete history
const lineage = await raptor.getDocumentLineage(previousId);
console.log(`${lineage.total_versions} years of policy changes`);

Use Case 2: A/B Testing Chunking Strategies

// Test different chunking on same document
const doc = await raptor.process(file, {
  chunkSize: 512,
  strategy: 'semantic'
});

// Create alternative variants
const variant1 = await raptor.reprocessDocument(doc.document_id, {
  chunkSize: 1024,
  strategy: 'recursive'
});

const variant2 = await raptor.reprocessDocument(doc.document_id, {
  chunkSize: 256,
  strategy: 'semantic',
  tableExtraction: false
});

// Compare RAG performance
// ... then delete non-optimal variants
await raptor.deleteVariant(variant2.variant_id);

Use Case 3: Draft → Final Workflow

// Upload draft
const draft = await raptor.process(draft_file, {
  versionLabel: 'draft'
});

// Upload final version
const final = await raptor.process(final_file, {
  parentDocumentId: draft.document_id,
  versionLabel: 'final'
});

// Get deduplication summary
const dedupSummary = await raptor.getDedupSummary(final.variant_id);
console.log(`${dedupSummary.sentence_reuse_ratio * 100}% unchanged from draft`);

// Set final as default
await raptor.setDefaultVersion(final.document_id, 2);

Best Practices

Always add descriptive version labels:
Good labels:
- 'v1.0', 'v1.1', 'v2.0'
- '2024-Q1', '2024-Q2'
- 'draft', 'review', 'final'
- 'pre-release', 'release', 'patch-1'

Avoid:
- No label (confusing)
- Generic labels like 'new', 'updated'
Use the same chunking config across versions for better deduplication:
const config = {
  chunkSize: 512,
  chunkOverlap: 50,
  strategy: 'semantic'
};

// Apply to all versions
const v1 = await raptor.process(file1, config);
const v2 = await raptor.process(file2, { ...config, parentDocumentId: v1.document_id });
Long lineages (>10 versions) may slow processing:
const stats = await raptor.getLineageStats(docId);

if (stats.totalVersions > 10) {
  console.warn('Consider creating a new lineage for major revisions');
}
Delete unused variants to reduce storage:
const { versions } = await raptor.listVersions(docId);

for (const version of versions) {
  // Keep only primary variant
  const nonPrimaryVariants = version.variants.filter(v => !v.is_primary);
  for (const variant of nonPrimaryVariants) {
    await raptor.deleteVariant(variant.id);
  }
}

Next Steps