Auto-Linking

Auto-linking automatically detects when an uploaded document is a new version of an existing document, so you do not have to link parent and child documents manually.

How It Works

When you upload a document with autoLink: true, Raptor uses a two-stage matching algorithm:

Stage 1: Metadata Matching (Fast)

Raptor extracts and compares filename patterns:
user-manual-v1.pdf → user-manual-v2.pdf (MATCH)
contract_2023.docx → contract_2024.docx (MATCH)
report-draft.pdf → report-final.pdf (MATCH)
Common patterns detected:
  • Version numbers: v1.0, v2.0, 2023, 2024
  • Status indicators: draft, final, revised
  • Date patterns: 2024-01-15, jan-2024
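
The idea behind Stage 1 can be illustrated with a small sketch. This is not Raptor's actual matcher: the token list and the helper names filenameStem and looksLikeNewVersion are assumptions made up for illustration. It strips the extension and any version, status, or date tokens, then compares what is left.

```javascript
// Hypothetical sketch of filename-pattern matching; not Raptor's real implementation.
// Tokens treated as "version-like": v1 / v1.0, 4-digit years, 1-2 digit numbers,
// status words, and month abbreviations.
const VERSION_TOKEN =
  /^(v\d+(\.\d+)*|\d{4}|\d{1,2}|draft|final|revised|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$/;

// Reduce a filename to its stable "stem" by removing the extension
// and every version-like token.
function filenameStem(filename) {
  const base = filename.toLowerCase().replace(/\.[a-z0-9]+$/, ""); // drop extension
  return base
    .split(/[-_\s]+/)
    .filter((token) => token !== "" && !VERSION_TOKEN.test(token))
    .join("-");
}

// Two filenames are version candidates when their non-empty stems are equal.
function looksLikeNewVersion(a, b) {
  const stemA = filenameStem(a);
  return stemA !== "" && stemA === filenameStem(b);
}

console.log(looksLikeNewVersion("user-manual-v1.pdf", "user-manual-v2.pdf"));
console.log(looksLikeNewVersion("abc123.pdf", "xyz789.pdf"));
```

Names such as abc123.pdf carry no separable version token, so nothing is stripped and the stems differ. That is why a consistent naming convention matters for this stage.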

Stage 2: Content Matching (Fallback)

If metadata matching fails or yields low confidence, Raptor falls back to fuzzy matching on the first 2 KB of extracted text content. Example:
Doc A: "Introduction\n\nThis user manual covers the installation..."
Doc B: "Introduction\n\nThis user manual covers installation and..."

Similarity: 92% → AUTO-LINKED
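
The page does not say which fuzzy-matching algorithm Raptor uses. As a rough illustration of the concept, the sketch below scores two text prefixes with a Sørensen-Dice coefficient over character bigrams, one common similarity measure. The function names and the use of 2048 characters as a stand-in for "2 KB" are assumptions.

```javascript
// Hypothetical sketch of content-prefix similarity; Raptor's real algorithm may differ.
const PREFIX_CHARS = 2048; // approximates the "first 2 KB" using characters, not bytes

// Count character bigrams in the normalized prefix of a text.
function bigramCounts(text) {
  const normalized = text
    .slice(0, PREFIX_CHARS)
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
  const counts = new Map();
  for (let i = 0; i < normalized.length - 1; i++) {
    const gram = normalized.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Sørensen-Dice coefficient: 2 * |shared bigrams| / (|bigrams A| + |bigrams B|).
// Returns a value between 0 (nothing shared) and 1 (identical bigram multisets).
function diceSimilarity(a, b) {
  const gramsA = bigramCounts(a);
  const gramsB = bigramCounts(b);
  let overlap = 0;
  let total = 0;
  for (const [gram, count] of gramsA) {
    overlap += Math.min(count, gramsB.get(gram) || 0);
    total += count;
  }
  for (const count of gramsB.values()) total += count;
  return total === 0 ? 0 : (2 * overlap) / total;
}

const docA = "Introduction\n\nThis user manual covers the installation...";
const docB = "Introduction\n\nThis user manual covers installation and...";
console.log(diceSimilarity(docA, docB).toFixed(2));
```

A score like this would then be compared against autoLinkThreshold. The exact number depends on the measure chosen, so it will not necessarily equal the 92% shown above.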

Configuration

Global Settings (Account Level)

Set default auto-linking preferences at the account level, for example through the SDK:
const raptor = new Raptor({ apiKey: 'your-key' });

// Update account defaults
await raptor.updateAutoLinkSettings({
  autoLinkEnabled: true,
  autoLinkThreshold: 0.85 // 85% similarity required
});

// Get current settings
const settings = await raptor.getAutoLinkSettings();
console.log(settings);
// { autoLinkEnabled: true, autoLinkThreshold: 0.85 }

Per-Request Override

Override account settings for specific uploads:
// Enable auto-linking for this upload only
const result = await raptor.process(file, {
  autoLink: true,
  autoLinkThreshold: 0.9 // Higher threshold for this document
});

// Disable auto-linking for this upload
const result2 = await raptor.process(file2, {
  autoLink: false // Skip auto-linking even if enabled globally
});
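
One way to picture the override behaviour is as a field-by-field merge in which any value supplied on the request wins over the account default. The helper below is a hypothetical sketch of that rule, not part of the Raptor SDK. It assumes the account settings have the shape returned by getAutoLinkSettings().

```javascript
// Hypothetical helper illustrating "per-request options override account defaults".
// Not an SDK function; the merge rule is an assumption based on the text above.
function resolveAutoLinkOptions(accountSettings, requestOptions = {}) {
  return {
    // ?? keeps an explicit `false` from the request instead of falling back
    autoLink: requestOptions.autoLink ?? accountSettings.autoLinkEnabled,
    autoLinkThreshold:
      requestOptions.autoLinkThreshold ?? accountSettings.autoLinkThreshold,
  };
}

const account = { autoLinkEnabled: true, autoLinkThreshold: 0.85 };
console.log(resolveAutoLinkOptions(account, { autoLinkThreshold: 0.9 }));
console.log(resolveAutoLinkOptions(account, { autoLink: false }));
```

The nullish-coalescing operator matters here: using || would treat autoLink: false as "not set" and silently re-enable auto-linking.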

Response Format

When auto-linking succeeds, the response includes:
{
  document_id: "new-doc-uuid",
  parent_document_id: "parent-doc-uuid",
  auto_linked: true,
  auto_link_confidence: 0.92,
  auto_link_method: "metadata", // or "content"
  auto_link_explanation: "Matched filename pattern: contract_v1.pdf → contract_v2.pdf"
}
auto_link_method indicates how the match was made:
  • "metadata": Matched based on filename patterns (faster, more reliable)
  • "content": Matched based on content similarity (slower, used as fallback)
auto_link_confidence is a score from 0.0 to 1.0 indicating match confidence:
  • 0.95+: Very high confidence (near-identical filenames or content)
  • 0.85-0.95: High confidence (clear version relationship)
  • 0.70-0.85: Medium confidence (similar but less obvious)
  • Below 0.70: Low confidence (auto-linking skipped)
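
If you want to surface these bands in logs or a review UI, a small client-side helper can translate the score into a label. The sketch below is hypothetical and not part of the SDK. The listed ranges share their endpoints, so it assumes each lower bound is inclusive.

```javascript
// Hypothetical helper mapping auto_link_confidence to the bands listed above.
// Assumption: lower bounds are inclusive (0.95 is "very high", 0.85 is "high").
function confidenceBand(score) {
  if (typeof score !== "number" || Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError("confidence must be a number between 0.0 and 1.0");
  }
  if (score >= 0.95) return "very high";
  if (score >= 0.85) return "high";
  if (score >= 0.7) return "medium";
  return "low";
}

console.log(confidenceBand(0.92)); // "high"
```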

Examples

Example 1: Version-Numbered Documents

// Upload v1.0
const v1 = await raptor.process(manual_v1_pdf, {
  versionLabel: 'v1.0',
  autoLink: true
});

// Upload v2.0 - automatically links to v1.0
const v2 = await raptor.process(manual_v2_pdf, {
  versionLabel: 'v2.0',
  autoLink: true
});

console.log(v2.auto_linked); // true
console.log(v2.auto_link_method); // "metadata"
console.log(v2.parent_document_id); // v1.document_id

Example 2: Manual Parent Override

If auto-linking picks the wrong parent, override it:
const result = await raptor.process(file, {
  autoLink: true,
  parentDocumentId: 'specific-parent-uuid' // Override auto-detection
});

Example 3: Finding Suggestions

Find candidate parents for a document you have already uploaded, then link manually:
// Upload document first
const doc = await raptor.process(file);

// Find similar documents
const similar = await raptor.findSimilarDocuments(doc.document_id, 0.8);

if (similar.suggestions.length > 0) {
  console.log('Potential parents:');
  similar.suggestions.forEach(s => {
    console.log(`- ${s.filename} (${s.similarityScore})`);
  });

  // Manually link to best match
  await raptor.linkToParent(
    doc.document_id,
    similar.suggestions[0].documentId,
    'v2.1'
  );
}
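
The example above links to suggestions[0], which is only the best match if the list comes back sorted by score. This page does not state the ordering, so a defensive sketch is to pick the highest similarityScore explicitly. The helper name bestSuggestion is hypothetical. The field names documentId and similarityScore are taken from the example above.

```javascript
// Hypothetical helper: choose the highest-scoring suggestion at or above minScore.
// Returns null when no suggestion qualifies.
function bestSuggestion(suggestions, minScore = 0.8) {
  return suggestions
    .filter((s) => s.similarityScore >= minScore)
    .reduce(
      (best, s) =>
        best === null || s.similarityScore > best.similarityScore ? s : best,
      null
    );
}

// Illustrative data only; real suggestions come from findSimilarDocuments().
const sample = [
  { documentId: "doc-a", filename: "manual-v1.pdf", similarityScore: 0.83 },
  { documentId: "doc-b", filename: "manual-v2.pdf", similarityScore: 0.91 },
];
console.log(bestSuggestion(sample).documentId); // "doc-b"
```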

Benefits

1. Automatic Version History

No manual tracking needed:
// Just upload sequentially
await raptor.process(contract_2023, { autoLink: true });
const latest = await raptor.process(contract_2024, { autoLink: true });

// Lineage is automatically built; look it up by the ID returned from process()
const lineage = await raptor.getDocumentLineage(latest.document_id);
console.log(lineage.total_versions); // 2

2. Deduplication

Auto-linked documents benefit from deduplication:
const v2 = await raptor.process(manual_v2, { autoLink: true });

if (v2.auto_linked) {
  const dedupSummary = await raptor.getDedupSummary(v2.variant_id);
  console.log(`${(dedupSummary.sentence_reuse_ratio * 100).toFixed(1)}% reused`);
}

3. Consistency

Ensures related documents are properly linked:
// Upload multiple related docs
const docs = [
  'policy-2023.pdf',
  'policy-2024.pdf',
  'policy-2024-revised.pdf'
];

for (const filename of docs) {
  await raptor.process(filename, { autoLink: true });
}

// All automatically linked in sequence

Troubleshooting

Auto-Linking Not Working

Issue: auto_linked: false even though documents are related
Solutions:
  1. Lower the threshold:
    autoLinkThreshold: 0.75 // More lenient
    
  2. Check filename patterns:
    // Good patterns (will match)
    report-v1.pdf → report-v2.pdf
    manual_2023.docx → manual_2024.docx
    
    // Poor patterns (may not match)
    abc123.pdf → xyz789.pdf
    document.pdf → file.pdf
    
  3. Use manual linking:
    await raptor.linkToParent(childId, parentId, 'v2.0');
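
The three steps can be combined into a single fallback flow: try auto-linking first, and if it does not link, search with a more lenient threshold and link manually. The sketch below uses only the calls shown on this page (process, findSimilarDocuments, linkToParent). The wrapper processWithFallback and its return shape are hypothetical. In practice you may prefer to review the candidate before linking.

```javascript
// Hypothetical wrapper; `client` is a Raptor instance (or any object with the
// same three methods). Not part of the SDK.
async function processWithFallback(client, file, versionLabel, fallbackThreshold = 0.75) {
  // Step 1: normal upload with auto-linking enabled.
  const doc = await client.process(file, { autoLink: true, versionLabel });
  if (doc.auto_linked) return { doc, linkedVia: "auto" };

  // Step 2: search for candidate parents with a more lenient threshold.
  const similar = await client.findSimilarDocuments(doc.document_id, fallbackThreshold);
  const candidates = similar.suggestions || [];
  if (candidates.length === 0) return { doc, linkedVia: null };

  // Step 3: manually link to the highest-scoring candidate.
  const best = candidates.reduce((a, b) =>
    b.similarityScore > a.similarityScore ? b : a
  );
  await client.linkToParent(doc.document_id, best.documentId, versionLabel);
  return { doc, linkedVia: "manual", parentId: best.documentId };
}
```

Usage would look like `await processWithFallback(raptor, file, 'v2.0')`. The linkedVia field tells the caller whether the link came from auto-linking, from the manual fallback, or was not made at all.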
    

Wrong Parent Detected

Issue: Auto-linking chose the wrong parent document
Solution: Override with explicit parent:
const result = await raptor.process(file, {
  autoLink: true,
  parentDocumentId: 'correct-parent-uuid' // Explicit override
});

Performance Concerns

Issue: Uploads are slow when auto-linking is enabled
Explanation: Content-based matching can take 1-2 seconds for large corpora.
Solutions:
  1. Use metadata matching (ensure good filename patterns)
  2. Reduce candidate pool (manually link if you know the parent)
  3. Disable for non-version uploads:
    autoLink: false // Skip auto-linking for unrelated docs
    

Best Practices

Adopt a naming convention for versions:
✅ Good patterns:
- product-manual-v1.0.pdf, product-manual-v1.1.pdf
- report-2024-01.docx, report-2024-02.docx
- contract-draft.pdf, contract-final.pdf

❌ Poor patterns:
- doc1.pdf, doc2.pdf
- untitled.pdf, untitled (1).pdf
Adjust the similarity threshold based on your use case:
  • High precision (few false positives): 0.90+
  • Balanced (recommended): 0.85
  • High recall (catch more matches): 0.75
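
To make the trade-off concrete, the sketch below counts how many candidate matches would be linked at each of the three thresholds. The scores are made-up illustrative values, the names are hypothetical, and it assumes a match is accepted when its score is greater than or equal to the threshold.

```javascript
// Illustrative only: scores are invented, and ">= threshold" is an assumption
// about how the comparison works.
const THRESHOLDS = { highPrecision: 0.9, balanced: 0.85, highRecall: 0.75 };

function countLinked(scores, threshold) {
  return scores.filter((score) => score >= threshold).length;
}

const candidateScores = [0.97, 0.91, 0.88, 0.8, 0.76, 0.6];
for (const [name, threshold] of Object.entries(THRESHOLDS)) {
  const linked = countLinked(candidateScores, threshold);
  console.log(`${name} (${threshold}): ${linked} of ${candidateScores.length} linked`);
}
```

With these scores the three settings link 2, 3, and 5 of the 6 candidates. Lowering the threshold catches more true versions but also admits weaker matches that may be false positives.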
Always include version labels for clarity:
await raptor.process(file, {
  autoLink: true,
  versionLabel: 'v2.1' // Clear version identification
});

Next Steps