Document Lineage

Document lineage tracks the complete evolution history of documents, similar to Git tracking code history. Understand parent-child relationships, compare versions, and visualize document evolution over time.

Core Concepts

Lineage Structure

Lineage (UUID-based group)
├── Document A (root)
│   ├── Document B (child of A)
│   │   └── Document D (child of B)
│   └── Document C (child of A)
└── Document E (root - different branch)
Each document can have:
  • One parent (except roots)
  • Multiple children
  • Version label (e.g., “v1.0”, “Draft”)
  • Similarity score to parent (0.0-1.0)
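
The parent-child structure above can be rebuilt from a flat list of records by following parent links. The following is a minimal sketch, not part of the Raptor SDK: the `FlatDoc` and `TreeNode` shapes and the `buildForest` helper are hypothetical names chosen for illustration.

```typescript
// Hypothetical flat record for this sketch; not an SDK type.
interface FlatDoc {
  id: string;
  parentId: string | null; // null marks a root
}

// Hypothetical tree node for this sketch; not an SDK type.
interface TreeNode {
  id: string;
  children: TreeNode[];
}

// Group flat records into a forest by following parentId links.
function buildForest(docs: FlatDoc[]): TreeNode[] {
  const nodes = new Map<string, TreeNode>();
  for (const doc of docs) {
    nodes.set(doc.id, { id: doc.id, children: [] });
  }

  const roots: TreeNode[] = [];
  for (const doc of docs) {
    const node = nodes.get(doc.id)!;
    const parent = doc.parentId === null ? undefined : nodes.get(doc.parentId);
    if (parent) {
      parent.children.push(node);
    } else {
      // No parent, or the parent is not in the list: treat as a root.
      roots.push(node);
    }
  }
  return roots;
}

// The lineage drawn above: A and E are roots, B and C are children of A, D is a child of B.
const forest = buildForest([
  { id: 'A', parentId: null },
  { id: 'B', parentId: 'A' },
  { id: 'C', parentId: 'A' },
  { id: 'D', parentId: 'B' },
  { id: 'E', parentId: null },
]);

console.log(forest.map(root => root.id).join(', ')); // A, E
```

Because each document has at most one parent, a single pass over the records is enough to attach every child.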

Get Lineage

Linear Lineage

Get all documents in chronological order:
import Raptor from '@raptor-data/ts-sdk';

const raptor = new Raptor({ apiKey: process.env.RAPTOR_API_KEY });

const lineage = await raptor.getDocumentLineage('document-id');

console.log(`Lineage ID: ${lineage.lineage_id}`);
console.log(`Total versions: ${lineage.total_versions}`);

lineage.documents.forEach(doc => {
  const label = doc.version_label || doc.filename;
  const parent = doc.parent_document_id ? `→ child of ${doc.parent_document_id}` : '(root)';

  console.log(`${label} ${parent}`);
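  // Roots have no parent to compare against, so guard against a missing score
  const similarity = doc.similarity_score != null
    ? `${(doc.similarity_score * 100).toFixed(1)}%`
    : 'N/A';
  console.log(`  Similarity: ${similarity}`);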
  console.log(`  Created: ${doc.created_at}`);
  console.log(`  Current: ${doc.is_current ? 'Yes' : 'No'}`);
});
Example output:
contract_v1.pdf (root)
  Similarity: N/A
  Created: 2024-01-15T10:00:00Z
  Current: No

contract_v2.pdf → child of doc-abc
  Similarity: 87.3%
  Created: 2024-01-20T14:30:00Z
  Current: No

contract_final.pdf → child of doc-def
  Similarity: 94.1%
  Created: 2024-01-25T09:15:00Z
  Current: Yes
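
The example output prints `N/A` for the root and a one-decimal percentage for every other version. A small formatting helper along these lines keeps that behaviour consistent wherever scores are printed. This is a sketch: `formatSimilarity` is a hypothetical helper, and it assumes a root's score arrives as `null` or `undefined`.

```typescript
// Hypothetical helper: format a 0.0-1.0 similarity score as a percentage,
// or 'N/A' when the score is absent (assumed to be the case for roots).
function formatSimilarity(score: number | null | undefined): string {
  if (score === null || score === undefined) {
    return 'N/A';
  }
  return `${(score * 100).toFixed(1)}%`;
}

console.log(formatSimilarity(0.873)); // 87.3%
console.log(formatSimilarity(null));  // N/A
console.log(formatSimilarity(0));     // 0.0%
```

Checking explicitly for `null` and `undefined`, rather than relying on truthiness, keeps a genuine score of 0 from being reported as `N/A`.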

Include Deleted Documents

const lineage = await raptor.getDocumentLineage('document-id', {
  includeDeleted: true
});

lineage.documents.forEach(doc => {
  const deleted = doc.deleted_at ? '(deleted)' : '';
  console.log(`${doc.filename} ${deleted}`);
});

Lineage Tree

Visualize parent-child relationships as a tree:
const tree = await raptor.getDocumentLineageTree('document-id');

console.log(`Total versions: ${tree.totalVersions}`);

function printTree(node, indent = '') {
  console.log(`${indent}${node.filename} (${node.versionLabel || 'no label'})`);
  node.children.forEach(child => printTree(child, indent + '  '));
}

tree.roots.forEach(root => printTree(root));
Example output:
contract_v1.pdf (Initial Draft)
  contract_v2.pdf (Client Review)
    contract_v3.pdf (Final)
  contract_v2_alt.pdf (Alternative Approach)
template.pdf (Template)
  instance_1.pdf (Instance 1)
  instance_2.pdf (Instance 2)

Tree Structure

interface LineageTreeNode {
  id: string;
  filename: string;
  contentHash: string;
  versionLabel: string | null;
  createdAt: string;
  children: LineageTreeNode[];
}

interface DocumentLineageTreeResponse {
  totalVersions: number;
  roots: LineageTreeNode[];
}
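
Since every node carries its own `children` array, most tree questions reduce to a short recursive walk. The sketch below repeats the `LineageTreeNode` interface so it is self-contained; `countNodes`, `maxDepth`, `collectLeaves`, and `sampleNode` are hypothetical helpers, not SDK functions, and the sample metadata is placeholder data.

```typescript
// Mirrors the LineageTreeNode shape shown above.
interface LineageTreeNode {
  id: string;
  filename: string;
  contentHash: string;
  versionLabel: string | null;
  createdAt: string;
  children: LineageTreeNode[];
}

// Count every document in the forest.
function countNodes(roots: LineageTreeNode[]): number {
  let total = 0;
  for (const node of roots) {
    total += 1 + countNodes(node.children);
  }
  return total;
}

// Length of the longest root-to-leaf chain, counted in documents.
function maxDepth(roots: LineageTreeNode[]): number {
  let deepest = 0;
  for (const node of roots) {
    deepest = Math.max(deepest, 1 + maxDepth(node.children));
  }
  return deepest;
}

// Collect documents with no children, i.e. the tip of each branch.
function collectLeaves(roots: LineageTreeNode[], out: LineageTreeNode[] = []): LineageTreeNode[] {
  for (const node of roots) {
    if (node.children.length === 0) {
      out.push(node);
    } else {
      collectLeaves(node.children, out);
    }
  }
  return out;
}

// Hypothetical factory that fills in placeholder metadata for sample data.
function sampleNode(filename: string, children: LineageTreeNode[] = []): LineageTreeNode {
  return {
    id: filename,
    filename,
    contentHash: '',
    versionLabel: null,
    createdAt: '2024-01-15T10:00:00Z',
    children,
  };
}

const sampleRoots: LineageTreeNode[] = [
  sampleNode('contract_v1.pdf', [
    sampleNode('contract_v2.pdf', [sampleNode('contract_v3.pdf')]),
    sampleNode('contract_v2_alt.pdf'),
  ]),
];

console.log(countNodes(sampleRoots)); // 4
console.log(maxDepth(sampleRoots));   // 3
console.log(collectLeaves(sampleRoots).map(n => n.filename).join(', '));
// contract_v3.pdf, contract_v2_alt.pdf
```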

Lineage Stats

Get summary statistics:
const stats = await raptor.getLineageStats('document-id');

console.log(`Total versions: ${stats.total_versions}`);
console.log(`Oldest version: ${stats.oldest_version.filename}`);
console.log(`Created: ${stats.oldest_version.created_at}`);

Finding Similar Documents

Discover documents that might be related:
const similar = await raptor.findSimilarDocuments('document-id', {
  minSimilarity: 0.7,  // Minimum 70% similarity
  limit: 10
});

console.log(`Found ${similar.count} similar documents`);

similar.suggestions.forEach(doc => {
  console.log(`${doc.filename}`);
  console.log(`  Similarity: ${(doc.similarityScore * 100).toFixed(1)}%`);
  console.log(`  Created: ${doc.createdAt}`);
  console.log(`  Label: ${doc.versionLabel || 'N/A'}`);
});
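
One possible use of these suggestions is to shortlist a candidate parent before linking documents manually. The sketch below picks the highest-scoring suggestion at or above a threshold. `pickLikelyParent` is a hypothetical helper, the `id` field on a suggestion is an assumption (only `filename`, `similarityScore`, `createdAt`, and `versionLabel` appear above), and the 0.85 default is simply the "strong relationship" figure from Best Practices.

```typescript
// Assumed suggestion shape for this sketch; the `id` field is an assumption.
interface Suggestion {
  id: string;
  filename: string;
  similarityScore: number; // 0.0-1.0
}

// Hypothetical helper: return the best suggestion at or above the threshold,
// or null when nothing is similar enough.
function pickLikelyParent(suggestions: Suggestion[], threshold = 0.85): Suggestion | null {
  let best: Suggestion | null = null;
  for (const candidate of suggestions) {
    if (candidate.similarityScore < threshold) {
      continue;
    }
    if (best === null || candidate.similarityScore > best.similarityScore) {
      best = candidate;
    }
  }
  return best;
}

const candidates: Suggestion[] = [
  { id: 'doc-1', filename: 'contract_v1.pdf', similarityScore: 0.72 },
  { id: 'doc-2', filename: 'contract_v2.pdf', similarityScore: 0.91 },
  { id: 'doc-3', filename: 'contract_v2_alt.pdf', similarityScore: 0.88 },
];

const likelyParent = pickLikelyParent(candidates);
console.log(likelyParent ? likelyParent.filename : 'no strong match'); // contract_v2.pdf
```

A high score is only a hint; review the candidate before creating a link.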

Manual Linking

Create a parent-child relationship manually:
const result = await raptor.linkToParent('child-id', {
  parentDocumentId: 'parent-id',
  versionLabel: 'v2.0 - Updated'
});

console.log(`Linked ${result.documentId} to ${result.parentDocumentId}`);
console.log(`Lineage ID: ${result.lineageId}`);
console.log(`Similarity: ${result.similarityScore}`);
console.log(`Label: ${result.versionLabel}`);
Remove a document from its lineage (this creates a new lineage for it):
const result = await raptor.unlinkFromLineage('document-id');

console.log(`Document ${result.documentId} unlinked`);
console.log(`New lineage ID: ${result.newLineageId}`);
console.log(`Parent: ${result.parentDocumentId}`); // null

Lineage Changelog

Generate a changelog showing all version transitions:
const changelog = await raptor.getLineageChangelog('document-id');

console.log(`${changelog.totalTransitions} version transitions\n`);

changelog.changelogs.forEach(entry => {
  console.log(`${entry.from_label || entry.from_filename} → ${entry.to_label || entry.to_filename}`);
  console.log(`  Similarity: ${(entry.similarity_score * 100).toFixed(1)}%`);
  console.log(`  Date: ${entry.created_at}`);

  const diff = entry.diff;
  console.log(`  Changes:`);
  console.log(`    + ${diff.added_count} chunks added`);
  console.log(`    - ${diff.removed_count} chunks removed`);
  console.log(`    ~ ${diff.modified_count} chunks modified`);

  if (diff.summary) {
    console.log(`  Summary: ${diff.summary}`);
  }
  console.log();
});
Example output:
Initial Draft → Client Review
  Similarity: 87.3%
  Date: 2024-01-20T14:30:00Z
  Changes:
    + 12 chunks added
    - 5 chunks removed
    ~ 8 chunks modified
  Summary: Added client feedback sections

Client Review → Final
  Similarity: 94.1%
  Date: 2024-01-25T09:15:00Z
  Changes:
    + 3 chunks added
    - 2 chunks removed
    ~ 4 chunks modified
  Summary: Minor corrections and formatting
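
To report how much a document changed across its whole history, the per-transition counts can be summed. This sketch uses only the `diff` count fields shown above; the `DiffCounts` and `ChangelogEntry` interfaces and the `totalChanges` helper are hypothetical names for illustration.

```typescript
// Assumed shapes for this sketch, limited to the count fields used above.
interface DiffCounts {
  added_count: number;
  removed_count: number;
  modified_count: number;
}

interface ChangelogEntry {
  diff: DiffCounts;
}

// Hypothetical helper: sum chunk-level changes over every transition.
function totalChanges(entries: ChangelogEntry[]): DiffCounts {
  const totals: DiffCounts = { added_count: 0, removed_count: 0, modified_count: 0 };
  for (const { diff } of entries) {
    totals.added_count += diff.added_count;
    totals.removed_count += diff.removed_count;
    totals.modified_count += diff.modified_count;
  }
  return totals;
}

// The two transitions from the example output above.
const totals = totalChanges([
  { diff: { added_count: 12, removed_count: 5, modified_count: 8 } },
  { diff: { added_count: 3, removed_count: 2, modified_count: 4 } },
]);

console.log(`+${totals.added_count} -${totals.removed_count} ~${totals.modified_count}`);
// +15 -7 ~12
```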

Comparing Documents

See detailed differences between any two documents:
const diff = await raptor.compareDocuments('doc1-id', 'doc2-id');

console.log(diff.summary);
console.log(`Similarity: ${(diff.similarityScore * 100).toFixed(1)}%`);

// Added chunks
console.log(`\nAdded (${diff.diff.addedCount}):`);
diff.diff.addedChunks.forEach(chunk => {
  console.log(`+ [Chunk ${chunk.chunk_index}, Page ${chunk.page_number}]`);
  console.log(`  ${chunk.text.substring(0, 150)}...`);
});

// Removed chunks
console.log(`\nRemoved (${diff.diff.removedCount}):`);
diff.diff.removedChunks.forEach(chunk => {
  console.log(`- [Chunk ${chunk.chunk_index}, Page ${chunk.page_number}]`);
  console.log(`  ${chunk.text.substring(0, 150)}...`);
});

// Modified chunks
console.log(`\nModified (${diff.diff.modifiedCount}):`);
diff.diff.modifiedChunks.forEach(pair => {
  console.log(`~ [Chunk ${pair.old.chunk_index}]`);
  console.log(`  Before: ${pair.old.text.substring(0, 100)}...`);
  console.log(`  After:  ${pair.new.text.substring(0, 100)}...`);
  console.log(`  Similarity: ${(pair.similarity * 100).toFixed(1)}%`);
});

Practical Examples

Track Contract Evolution

// Upload contract versions
const v1 = await raptor.process('contract_draft.pdf', {
  versionLabel: 'Initial Draft'
});

const v2 = await raptor.process('contract_client_review.pdf', {
  parentDocumentId: v1.documentId,
  versionLabel: 'Client Review'
});

const v3 = await raptor.process('contract_final.pdf', {
  parentDocumentId: v2.documentId,
  versionLabel: 'Final - Signed'
});

// Get complete history
const lineage = await raptor.getDocumentLineage(v3.documentId);

lineage.documents.forEach(doc => {
  console.log(`${doc.version_label}: ${doc.filename}`);
});

// Generate changelog
const changelog = await raptor.getLineageChangelog(v3.documentId);
console.log(`\nComplete changelog:`);
changelog.changelogs.forEach(entry => {
  console.log(`\n${entry.from_label} → ${entry.to_label}`);
  console.log(`Changes: +${entry.diff.added_count}, -${entry.diff.removed_count}`);
});

Visualize Document Tree

async function visualizeLineage(documentId: string) {
  const tree = await raptor.getDocumentLineageTree(documentId);

  console.log(`\nDocument Lineage (${tree.totalVersions} versions):`);
  console.log('='.repeat(50));

  function printNode(node, prefix = '', isLast = true) {
    const connector = isLast ? '└── ' : '├── ';
    const label = node.versionLabel || 'Unlabeled';

    console.log(`${prefix}${connector}${node.filename} (${label})`);

    const newPrefix = prefix + (isLast ? '    ' : '│   ');
    node.children.forEach((child, index) => {
      const childIsLast = index === node.children.length - 1;
      printNode(child, newPrefix, childIsLast);
    });
  }

  tree.roots.forEach((root, index) => {
    const isLast = index === tree.roots.length - 1;
    printNode(root, '', isLast);
  });
}

await visualizeLineage('document-id');
Output:
Document Lineage (7 versions):
==================================================
├── contract_v1.pdf (Initial Draft)
│   ├── contract_v2.pdf (Client Review)
│   │   └── contract_v3.pdf (Final)
│   └── contract_v2_alt.pdf (Alternative)
└── template.pdf (Template)
    ├── instance_1.pdf (Instance 1)
    └── instance_2.pdf (Instance 2)

Find Related Documents

async function findRelatedDocs(documentId: string) {
  // Get documents in same lineage
  const lineage = await raptor.getDocumentLineage(documentId);

  console.log('Documents in lineage:');
  lineage.documents.forEach(doc => {
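    // Scores are 0.0-1.0; roots have no score
    const score = doc.similarity_score != null
      ? `${(doc.similarity_score * 100).toFixed(1)}% similar`
      : 'root';
    console.log(`  - ${doc.filename} (${score})`);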
  });

  // Find similar documents outside lineage
  const similar = await raptor.findSimilarDocuments(documentId, {
    minSimilarity: 0.7,
    limit: 5
  });

  console.log('\nSimilar documents (outside lineage):');
  similar.suggestions.forEach(doc => {
    console.log(`  - ${doc.filename} (${(doc.similarityScore * 100).toFixed(1)}% similar)`);
  });
}

await findRelatedDocs('document-id');

Audit Document History

async function auditHistory(documentId: string) {
  const lineage = await raptor.getDocumentLineage(documentId, {
    includeDeleted: true
  });

  console.log('Document History Audit');
  console.log('='.repeat(50));

  lineage.documents.forEach((doc, index) => {
    console.log(`\n${index + 1}. ${doc.filename}`);
    console.log(`   ID: ${doc.id}`);
    console.log(`   Label: ${doc.version_label || 'N/A'}`);
    console.log(`   Parent: ${doc.parent_document_id || 'None (root)'}`);
    console.log(`   Similarity: ${doc.similarity_score != null ? `${(doc.similarity_score * 100).toFixed(1)}%` : 'N/A'}`);
    console.log(`   Created: ${doc.created_at}`);
    console.log(`   Status: ${doc.deleted_at ? 'Deleted' : 'Active'}`);
    console.log(`   Current: ${doc.is_current ? 'Yes' : 'No'}`);
  });

  // Get stats
  const stats = await raptor.getLineageStats(documentId);
  console.log(`\nTotal versions: ${stats.total_versions}`);
  console.log(`Oldest: ${stats.oldest_version.filename} (${stats.oldest_version.created_at})`);
}

await auditHistory('document-id');

Best Practices

Use version labels: Add meaningful labels to make lineage easier to understand.
Similarity scores: Scores above 0.85 (85%) indicate strong version relationships.
Circular references: You cannot create circular lineage (A → B → A). Raptor prevents this automatically.
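
To illustrate what a circular-reference check involves, the sketch below walks up the ancestor chain of a proposed parent and reports a cycle if it reaches the child. It is an illustration only and does not describe how Raptor implements the check; `wouldCreateCycle` and the `parents` map are hypothetical.

```typescript
// Hypothetical check: would making `newParentId` the parent of `childId`
// create a cycle? `parents` maps each document ID to its parent ID (null for roots).
function wouldCreateCycle(
  parents: Map<string, string | null>,
  childId: string,
  newParentId: string
): boolean {
  const seen = new Set<string>();
  let current: string | null = newParentId;

  // Walk from the proposed parent up towards its root.
  while (current !== null && !seen.has(current)) {
    if (current === childId) {
      return true; // the child is already an ancestor of the proposed parent
    }
    seen.add(current);
    current = parents.get(current) ?? null;
  }
  return false;
}

// A is a root, B is a child of A, D is a child of B.
const parents = new Map<string, string | null>([
  ['A', null],
  ['B', 'A'],
  ['D', 'B'],
]);

console.log(wouldCreateCycle(parents, 'A', 'D')); // true: A → B → D → A
console.log(wouldCreateCycle(parents, 'D', 'A')); // false
```

The `seen` set guards against looping forever if the stored data already contains a cycle.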