Auto-Linking

Auto-linking detects when you’re uploading a new version of an existing document, so no manual version tracking is required. Raptor achieves 85%+ accuracy using metadata and content analysis.

How It Works

When you upload a document, Raptor analyzes:
  1. Metadata signals (fast)
    • Filename similarity
    • Upload time proximity
    • File size range
    • User upload patterns
  2. Content sampling (medium speed)
    • First 2KB text comparison
    • Trigram similarity
    • Chunk overlap detection
  3. Confidence scoring
    • Combines signals into 0-100% confidence
    • Auto-links if above threshold (default: 85%)
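One way to picture the scoring step is a weighted sum of per-signal scores compared against the threshold. The sketch below is illustrative only: the signal names, the weights, and the `combineSignals` and `shouldAutoLink` helpers are assumptions made for this example, not Raptor's actual scoring model.

```typescript
// Hypothetical per-signal scores, each normalized to the range 0..1.
interface Signals {
  filenameSimilarity: number;
  timeProximity: number;
  sizeSimilarity: number;
  contentOverlap: number;
}

// Assumed weights for illustration; they sum to 1 so the result stays in 0..1.
const WEIGHTS: Signals = {
  filenameSimilarity: 0.3,
  timeProximity: 0.1,
  sizeSimilarity: 0.1,
  contentOverlap: 0.5,
};

// Keeps a value inside the range 0..1.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted sum of the clamped signals.
function combineSignals(s: Signals): number {
  const keys = Object.keys(WEIGHTS) as (keyof Signals)[];
  const score = keys.reduce((sum, k) => sum + WEIGHTS[k] * clamp01(s[k]), 0);
  return clamp01(score);
}

// Links only when the confidence reaches the threshold (default 0.85, as in the text).
function shouldAutoLink(confidence: number, threshold = 0.85): boolean {
  return confidence >= threshold;
}
```

With these assumed weights, a strong content match dominates the score, which mirrors the idea that content analysis can rescue a weak filename match.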

Basic Usage

Auto-linking is enabled by default:
import Raptor from '@raptor-data/ts-sdk';

const raptor = new Raptor({ apiKey: process.env.RAPTOR_API_KEY });

// Upload original contract
const v1 = await raptor.process('contract_2024.pdf');
// documentId: "doc-abc"

// Upload updated contract (different filename!)
const v2 = await raptor.process('contract_2024_revised.pdf');

// Auto-linking detected the relationship
if (v2.autoLinked) {
  console.log('Automatically linked to parent!');
  console.log(`Parent: ${v2.parentDocumentId}`); // "doc-abc"
  console.log(`Confidence: ${((v2.autoLinkConfidence ?? 0) * 100).toFixed(0)}%`); // "92%"
  console.log(`Method: ${v2.autoLinkMethod}`); // "metadata_and_content"

  // See why it was linked
  v2.autoLinkExplanation?.forEach(reason => console.log(`  - ${reason}`));
  // Output:
  //   - High filename similarity: 0.95
  //   - Upload time proximity: 2 hours apart
  //   - Same file size range: within 10%
  //   - Content similarity: 94% chunk overlap
}

Get Current Settings

const settings = await raptor.getAutoLinkSettings();

console.log(`Enabled: ${settings.autoLinkEnabled}`);
console.log(`Threshold: ${settings.autoLinkThreshold}`); // 0.85 (85%)

Update Settings

Change auto-linking behavior for your account:
await raptor.updateAutoLinkSettings({
  autoLinkEnabled: true,
  autoLinkThreshold: 0.90  // Require 90% confidence
});

console.log('Auto-link settings updated');

Per-Upload Override

Override settings for a specific upload:
// Require very high confidence for this upload
const result = await raptor.process('sensitive-doc.pdf', {
  autoLink: true,
  autoLinkThreshold: 0.95  // Require 95% confidence
});

// Disable auto-link for this upload
const standalone = await raptor.process('new-doc.pdf', {
  autoLink: false  // Don't auto-link this one
});
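
The override behavior can be modeled as a field-by-field fallback: a per-upload option wins when present, otherwise the account setting applies. This is a minimal sketch of that idea. The `effectiveSettings` helper and both interfaces are hypothetical names for illustration, and the field-by-field fallback is an assumption about how the override is resolved.

```typescript
// Account-level settings, shaped like the values returned by getAutoLinkSettings().
interface AccountSettings {
  autoLinkEnabled: boolean;
  autoLinkThreshold: number;
}

// Per-upload options, shaped like the second argument to process().
interface UploadOptions {
  autoLink?: boolean;
  autoLinkThreshold?: number;
}

// Hypothetical helper: per-upload values take precedence, account values fill the gaps.
function effectiveSettings(
  account: AccountSettings,
  upload: UploadOptions = {}
): AccountSettings {
  return {
    autoLinkEnabled: upload.autoLink ?? account.autoLinkEnabled,
    autoLinkThreshold: upload.autoLinkThreshold ?? account.autoLinkThreshold,
  };
}
```

Under this model, passing only `autoLink: false` disables linking for one upload while leaving the account threshold untouched.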

Each result from process() carries auto-linking metadata:
interface ProcessResult {
  // Auto-linking metadata
  autoLinked: boolean;
  autoLinkConfidence?: number;
  autoLinkExplanation?: string[];
  autoLinkMethod?: 'metadata' | 'metadata_and_content' | 'content_only' | 'none';
  parentDocumentId?: string;

  // ... other fields
}

Response Fields

autoLinked
boolean
Whether auto-linking detected a parent document and linked to it
autoLinkConfidence
number
Final confidence score (0.0-1.0)
autoLinkExplanation
string[]
Human-readable reasons for the linking decision
autoLinkMethod
string
Detection method used:
  • metadata: Linked based on metadata only
  • metadata_and_content: Combined metadata and content analysis
  • content_only: Linked based on content similarity (low metadata confidence)
  • none: No parent detected
parentDocumentId
string
ID of the detected parent document
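
Because most of these fields are optional, it can help to narrow a result once and then read the link fields without repeated undefined checks. The sketch below shows one way to do that with a TypeScript type guard. `LinkedResult`, `isLinked`, and `describeLink` are hypothetical helpers written for this example and are not part of the SDK.

```typescript
type AutoLinkMethod = 'metadata' | 'metadata_and_content' | 'content_only' | 'none';

// Mirrors the auto-linking fields listed above.
interface ProcessResult {
  autoLinked: boolean;
  autoLinkConfidence?: number;
  autoLinkExplanation?: string[];
  autoLinkMethod?: AutoLinkMethod;
  parentDocumentId?: string;
}

// A result whose link fields are known to be present.
type LinkedResult = ProcessResult & {
  autoLinked: true;
  autoLinkConfidence: number;
  parentDocumentId: string;
};

// Hypothetical type guard: true only when the link fields are actually populated.
function isLinked(r: ProcessResult): r is LinkedResult {
  return (
    r.autoLinked &&
    typeof r.autoLinkConfidence === 'number' &&
    typeof r.parentDocumentId === 'string'
  );
}

// Builds a one-line summary of the linking decision.
function describeLink(r: ProcessResult): string {
  if (!isLinked(r)) return 'standalone document';
  const pct = (r.autoLinkConfidence * 100).toFixed(0);
  return `linked to ${r.parentDocumentId} (${pct}%, ${r.autoLinkMethod ?? 'none'})`;
}
```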

Detection Methods

Metadata-Only Linking

Very high confidence from metadata alone (95%+):
// Same filename, uploaded 10 minutes apart, same size
const v1 = await raptor.process('contract.pdf');
const v2 = await raptor.process('contract.pdf');

console.log(v2.autoLinkMethod); // "metadata"
console.log(v2.autoLinkConfidence); // 0.98

Metadata + Content Linking

Medium metadata confidence, boosted by content analysis:
// Different filename, but similar content
const v1 = await raptor.process('contract_draft.pdf');
const v2 = await raptor.process('contract_final.pdf');

console.log(v2.autoLinkMethod); // "metadata_and_content"
console.log(v2.autoLinkConfidence); // 0.87

v2.autoLinkExplanation?.forEach(e => console.log(e));
// Output:
//   - Metadata confidence: 0.78
//   - Content overlap: 0.95
//   - Content boost: +0.09
//   - Final confidence: 0.87

Content-Only Linking

Low metadata confidence, but very high content overlap:
// Completely different filename, but identical content
const v1 = await raptor.process('old_name.pdf');
const v2 = await raptor.process('completely_different_name.pdf');

console.log(v2.autoLinkMethod); // "content_only"
console.log(v2.autoLinkConfidence); // 0.94
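
The three cases above can be read as a decision over two numbers: the metadata confidence and the content overlap. The following sketch makes that reading concrete. Only the 0.95 metadata-only cutoff comes from the text. The other two cutoffs, and the `pickMethod` and `applyBoost` helpers, are assumptions chosen for illustration, and how Raptor derives the size of the content boost is not specified here.

```typescript
type AutoLinkMethod = 'metadata' | 'metadata_and_content' | 'content_only' | 'none';

const METADATA_ONLY_MIN = 0.95;  // from the text: "95%+"
const MEDIUM_METADATA_MIN = 0.5; // assumption: what counts as "medium" metadata confidence
const HIGH_OVERLAP_MIN = 0.9;    // assumption: what counts as "very high" content overlap

// Hypothetical: picks which analysis path a candidate parent would take.
function pickMethod(metadataConfidence: number, contentOverlap: number): AutoLinkMethod {
  if (metadataConfidence >= METADATA_ONLY_MIN) return 'metadata';
  if (metadataConfidence >= MEDIUM_METADATA_MIN) return 'metadata_and_content';
  if (contentOverlap >= HIGH_OVERLAP_MIN) return 'content_only';
  return 'none';
}

// Adds a content boost to the metadata score, rounded to two decimals and capped at 1.
function applyBoost(metadataConfidence: number, boost: number): number {
  return Math.min(1, Math.round((metadataConfidence + boost) * 100) / 100);
}
```

For the explanation shown earlier, `applyBoost(0.78, 0.09)` reproduces the final confidence of 0.87.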

Examples

Track Contract Revisions

const raptor = new Raptor({ apiKey: process.env.RAPTOR_API_KEY });

// Initial draft
const draft = await raptor.process('contract_draft_v1.pdf', {
  versionLabel: 'Initial Draft'
});

// Client revisions (auto-linked)
const revised = await raptor.process('contract_revised_by_client.pdf', {
  versionLabel: 'Client Revisions'
});

if (revised.autoLinked) {
  console.log(`Auto-linked with ${((revised.autoLinkConfidence ?? 0) * 100).toFixed(0)}% confidence`);

  // Get version history
  const lineage = await raptor.getDocumentLineage(revised.documentId);
  console.log(`Version ${lineage.total_versions} of ${lineage.total_versions}`);
}

Handle High-Confidence Linking

const result = await raptor.process('document.pdf');

if (result.autoLinked && (result.autoLinkConfidence ?? 0) >= 0.95) {
  console.log('Very confident auto-link!');
  console.log('Explanation:', result.autoLinkExplanation);

  // Safe to assume this is a new version
  await sendNotification({
    message: `New version uploaded: v${result.versionNumber}`,
    confidence: result.autoLinkConfidence
  });
}

Review an Auto-Link Decision

const result = await raptor.process('updated-doc.pdf');

if (result.autoLinked) {
  console.log(`Linked to: ${result.parentDocumentId}`);
  console.log(`Confidence: ${result.autoLinkConfidence}`);
  console.log(`Method: ${result.autoLinkMethod}`);

  console.log('\nReasons:');
  result.autoLinkExplanation?.forEach((reason, i) => {
    console.log(`${i + 1}. ${reason}`);
  });

  // Verify the link is correct
  const parent = await raptor.getDocument(result.parentDocumentId);
  console.log(`\nParent document: ${parent.filename}`);

  // If incorrect, you can unlink.
  // needsManualReview is a placeholder for your own review logic.
  if (needsManualReview) {
    await raptor.unlinkFromLineage(result.documentId);
    console.log('Unlinked from incorrect parent');
  }
}

Disable for Specific Use Cases

// Don't auto-link for template documents
const template = await raptor.process('template.pdf', {
  autoLink: false,
  versionLabel: 'Template'
});

// Don't auto-link for bulk imports
async function bulkImport(files: File[]) {
  for (const file of files) {
    await raptor.process(file, {
      autoLink: false  // Treat each as independent
    });
  }
}

Confidence Thresholds

Threshold    Use Case               Behavior
0.95+        Very high confidence   Metadata-only linking
0.85-0.95    High confidence        Metadata + content linking (default)
0.70-0.85    Medium confidence      Content-only fallback
Below 0.70   Low confidence         No link created
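
The bands in the table can be expressed as a small lookup, which is handy for logging or dashboards. This is a sketch under one assumption: each band includes its lower bound (so exactly 0.95 counts as very high), which the table does not state. The `Band` type and `confidenceBand` function are hypothetical names.

```typescript
type Band = 'very_high' | 'high' | 'medium' | 'low';

// Behavior per band, copied from the table above.
const BAND_BEHAVIOR: Record<Band, string> = {
  very_high: 'Metadata-only linking',
  high: 'Metadata + content linking (default)',
  medium: 'Content-only fallback',
  low: 'No link created',
};

// Assumption: lower bounds are inclusive.
function confidenceBand(score: number): Band {
  if (score >= 0.95) return 'very_high';
  if (score >= 0.85) return 'high';
  if (score >= 0.7) return 'medium';
  return 'low';
}

console.log(BAND_BEHAVIOR[confidenceBand(0.87)]);
```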

Choosing a Threshold

// Conservative (fewer false positives)
await raptor.updateAutoLinkSettings({
  autoLinkThreshold: 0.95  // Only link if very confident
});

// Balanced (recommended)
await raptor.updateAutoLinkSettings({
  autoLinkThreshold: 0.85  // Default
});

// Aggressive (more links, some false positives)
await raptor.updateAutoLinkSettings({
  autoLinkThreshold: 0.75  // Link more liberally
});
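
To get a feel for the trade-off before changing account settings, you can count how many candidate links would survive each threshold. The scores below are made-up illustrative values, not measurements, and `countLinks` is a hypothetical helper.

```typescript
// Illustrative confidence scores for seven candidate links (not real data).
const sampleScores = [0.98, 0.93, 0.88, 0.86, 0.79, 0.76, 0.62];

// Number of candidates that would be linked at a given threshold.
function countLinks(scores: number[], threshold: number): number {
  return scores.filter((s) => s >= threshold).length;
}

for (const t of [0.95, 0.85, 0.75]) {
  console.log(`threshold ${t}: ${countLinks(sampleScores, t)} of ${sampleScores.length} linked`);
}
```

On this sample, the conservative setting links 1 candidate, the default links 4, and the aggressive setting links 6. Running the same count over confidence scores from your own uploads gives a more realistic picture.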

Troubleshooting

Document Not Auto-Linked

If a document should have been linked but wasn’t:
const result = await raptor.process('document.pdf');

if (!result.autoLinked) {
  console.log('Not auto-linked');
  console.log(`Confidence: ${result.autoLinkConfidence ?? 'N/A'}`);

  // Check threshold
  const settings = await raptor.getAutoLinkSettings();
  console.log(`Threshold: ${settings.autoLinkThreshold}`);

  // Manual link if needed
  if ((result.autoLinkConfidence ?? 0) >= 0.70) {
    // Link manually
    await raptor.linkToParent(result.documentId, 'parent-id', 'v2.0');
  }
}
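
When triaging missed links, a short message stating how far a score fell below the threshold is often enough. The helper below is a sketch of that idea. `explainMiss` and `LinkOutcome` are hypothetical names, and treating a missing `autoLinkConfidence` as "no score reported" is an assumption about the response.

```typescript
// The two result fields this check needs.
interface LinkOutcome {
  autoLinked: boolean;
  autoLinkConfidence?: number;
}

// Hypothetical helper: summarizes why a document was or was not linked.
function explainMiss(result: LinkOutcome, threshold: number): string {
  if (result.autoLinked) return 'linked';
  const confidence = result.autoLinkConfidence;
  if (confidence === undefined) return 'no confidence score reported';
  // Work in whole percentage points to avoid floating-point noise.
  const have = Math.round(confidence * 100);
  const need = Math.round(threshold * 100);
  if (have >= need) return `scored ${have}% against a ${need}% threshold but was not linked`;
  return `scored ${have}%, ${need - have} points below the ${need}% threshold`;
}
```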

Wrong Parent Linked

If auto-linking detected the wrong parent:
const result = await raptor.process('document.pdf');

if (result.autoLinked) {
  // Verify parent
  const parent = await raptor.getDocument(result.parentDocumentId);
  console.log(`Linked to: ${parent.filename}`);

  if (isWrongParent(parent)) {
    // Unlink from wrong parent
    await raptor.unlinkFromLineage(result.documentId);

    // Link to correct parent
    await raptor.linkToParent(result.documentId, 'correct-parent-id', 'v2.0');
  }
}

Too Many False Positives

Increase the threshold:
await raptor.updateAutoLinkSettings({
  autoLinkThreshold: 0.95  // Require higher confidence
});

Too Many False Negatives

If genuine revisions are not being linked, lower the threshold or check file naming:
// Lower threshold
await raptor.updateAutoLinkSettings({
  autoLinkThreshold: 0.75
});

// Or use consistent naming
// Good: contract_v1.pdf, contract_v2.pdf
// Bad: abc.pdf, xyz.pdf
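
To see why naming matters, compare the two filename pairs with a simple trigram similarity. The sketch below uses Jaccard similarity over character trigrams, a common technique. It is a stand-in for illustration, and Raptor's actual filename comparison may differ. `trigrams` and `trigramSimilarity` are hypothetical helpers.

```typescript
// Distinct character trigrams of a lowercased string, e.g. "abcd" -> {"abc", "bcd"}.
function trigrams(text: string): Set<string> {
  const s = text.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= s.length; i++) {
    grams.add(s.slice(i, i + 3));
  }
  return grams;
}

// Jaccard similarity: shared trigrams divided by all distinct trigrams, in 0..1.
function trigramSimilarity(a: string, b: string): number {
  const ga = trigrams(a);
  const gb = trigrams(b);
  if (ga.size === 0 && gb.size === 0) return 1;
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared++;
  });
  return shared / (ga.size + gb.size - shared);
}

console.log(trigramSimilarity('contract_v1.pdf', 'contract_v2.pdf')); // 0.625
console.log(trigramSimilarity('abc.pdf', 'xyz.pdf'));                 // 0.25
```

The consistently named pair shares 10 of 16 distinct trigrams, while the unrelated names share only the 2 that come from the `.pdf` extension.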

Best Practices

Consistent naming helps: Use patterns like contract_v1.pdf, contract_v2.pdf for higher confidence detection.
Upload timing matters: Files uploaded close together (within 24 hours) get a confidence boost.
Review high-stakes links: For critical documents, review auto-link decisions before proceeding.
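
If you schedule uploads yourself, a small check can tell you whether two uploads fall inside the 24-hour window mentioned above. This is a sketch only: `withinBoostWindow` is a hypothetical helper, and treating the window as inclusive and measured between upload timestamps is an assumption.

```typescript
// 24 hours in milliseconds, per the best practice above.
const BOOST_WINDOW_MS = 24 * 60 * 60 * 1000;

// True when the two upload times are at most 24 hours apart (order does not matter).
function withinBoostWindow(a: Date, b: Date): boolean {
  return Math.abs(a.getTime() - b.getTime()) <= BOOST_WINDOW_MS;
}

const first = new Date('2024-01-01T10:00:00Z');
const second = new Date('2024-01-01T12:00:00Z');
console.log(withinBoostWindow(first, second)); // true: 2 hours apart
```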