File Lineage & Version Tracking: Complete Recovery from Any Point in Time #44

@djdarcy

Summary

As preserve evolves beyond simple COPY/MOVE operations, we need to address its core promise: helping users get a file back to its original source, or to wherever it was at any point in its journey.

This is the essence of preserve's value proposition:

"Track files and always get them back to where you need them."

Problem Statement

Currently, preserve tracks source→destination mappings within a single operation. But real-world file management involves:

  1. Multi-step workflows: Files move through multiple locations over time
  2. Content evolution: Files get modified between operations
  3. Path structure changes: Flat backups lose original directory structure
  4. Cross-manifest relationships: Each operation creates a new manifest with no link to previous ones

Example Workflow That Breaks Today

1. MOVE orig/ → backup/ (--flat)     # Path structure lost
2. RESTORE backup/ → restored/       # New location created
3. User modifies files in restored/  # Content changes
4. MOVE restored/ → backup/ (--flat) # Overwrites previous backup

After step 4, the manifest only knows restored/ → backup/. The original path orig/projects/foo/main.py is lost. The pre-modification version is lost. The connection between manifests is lost.

Vision: File Journey Tracking

Preserve should become a system that tracks a file's complete journey:

File Identity (hash-based)
    ├── Original Location: orig/projects/foo/main.py
    ├── First Move: backup/main.py (manifest #1, 2025-01-01)
    ├── Restored To: restored/main.py (manifest #1, 2025-01-02)
    ├── Modified: hash changed ABC→XYZ (2025-01-03)
    └── Second Move: backup/main.py (manifest #2, 2025-01-04)

With this, users can:

  • "Show me where this file originally came from"
  • "Restore this file to where it was on January 2nd"
  • "What happened to files from my orig/ folder?"
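The journey above can be pictured as an ordered list of events. The following is a minimal Python sketch under stated assumptions: `JourneyEvent` and `location_on` are hypothetical names used only for illustration, not part of preserve today, and the hashes are the same placeholders used in the tree.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class JourneyEvent:
    """One step in a file's journey (hypothetical structure)."""
    when: date
    action: str        # e.g. "move", "restore", "modify"
    path: str          # where the file lived after this step
    content_hash: str  # placeholder hashes, as in the tree above


def location_on(events: list[JourneyEvent], day: date) -> Optional[str]:
    """Return the file's path as of `day`, or None if no event is that old."""
    current = None
    for event in sorted(events, key=lambda e: e.when):
        if event.when > day:
            break
        current = event.path
    return current


journey = [
    JourneyEvent(date(2025, 1, 1), "move", "backup/main.py", "ABC"),
    JourneyEvent(date(2025, 1, 2), "restore", "restored/main.py", "ABC"),
    JourneyEvent(date(2025, 1, 3), "modify", "restored/main.py", "XYZ"),
    JourneyEvent(date(2025, 1, 4), "move", "backup/main.py", "XYZ"),
]

# "Restore this file to where it was on January 2nd"
print(location_on(journey, date(2025, 1, 2)))
```

Keeping events as immutable records means a query like "where was it on a given day" reduces to a scan over sorted history, with no need to replay the operations themselves.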

Proposed Features (0.8.x)

Phase 1: Manifest Chaining

  • Manifests reference parent manifests
  • lineage field tracks original source through chain
  • Query: "trace file back to origin"
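As a rough illustration of the "trace file back to origin" query, here is a Python sketch that walks a chain of manifests from newest to oldest. The in-memory layout (a single `parent` id and a `files` map of destination to source) and the id `"001-restore"` are assumptions made for this example only; the real schema may differ, for instance by allowing several parents.

```python
from typing import Optional

# Hypothetical manifests: each maps dest_path -> source_path and names its parent.
MANIFESTS = {
    "001": {
        "parent": None,
        "files": {"backup/main.py": "orig/projects/foo/main.py"},
    },
    "001-restore": {
        "parent": "001",
        "files": {"restored/main.py": "backup/main.py"},
    },
    "002": {
        "parent": "001-restore",
        "files": {"backup/main.py": "restored/main.py"},
    },
}


def trace_to_origin(manifests: dict, manifest_id: Optional[str], path: str) -> str:
    """Follow dest -> source mappings up the parent chain and return the oldest path."""
    seen = set()  # guards against accidental cycles in the chain
    while manifest_id is not None and manifest_id not in seen:
        seen.add(manifest_id)
        manifest = manifests[manifest_id]
        source = manifest["files"].get(path)
        if source is None:
            break  # this manifest never handled the path; stop here
        path = source
        manifest_id = manifest["parent"]
    return path


print(trace_to_origin(MANIFESTS, "002", "backup/main.py"))
```

With the example workflow from the problem statement, the walk goes backup/ to restored/ to backup/ to orig/, recovering the path that a single manifest no longer knows.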

Phase 2: Content-Addressed Identity

  • File identity = content hash (not path)
  • Moving a file doesn't change its identity
  • Modifying creates new identity with link to previous
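A small Python sketch of what content-addressed identity could look like, using SHA-256 from the standard library. The `content_id` helper and the `{"hash", "previous"}` record are hypothetical, shown only to make the three bullets concrete.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def content_id(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "main.py"
    original.write_text("print('v1')\n")
    id_before = content_id(original)

    # Moving the file changes its path but not its identity.
    moved = Path(tmp) / "backup_main.py"
    shutil.move(str(original), str(moved))
    id_after_move = content_id(moved)

    # Modifying the file yields a new identity that links to the previous one.
    moved.write_text("print('v2')\n")
    new_identity = {"hash": content_id(moved), "previous": id_before}

print(id_before == id_after_move)
print(new_identity["hash"] != new_identity["previous"])
```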

Phase 3: Version Tracking (Optional)

  • Store previous versions in .preserve/versions/
  • Or: track that version existed without storing content
  • User choice: full versioning vs. metadata-only
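The two modes could share one code path, with a flag deciding whether the bytes are kept. This Python sketch assumes versions are stored as one file per content hash inside `.preserve/versions/`; that layout and the `record_version` helper are illustrative assumptions, not a settled design.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def record_version(file_path: Path, preserve_dir: Path, store_content: bool) -> dict:
    """Record one version of a file, optionally keeping a copy of its bytes."""
    digest = hashlib.sha256(file_path.read_bytes()).hexdigest()
    entry = {
        "hash": f"sha256:{digest}",
        "size": file_path.stat().st_size,
        "stored": False,
    }
    if store_content:
        versions_dir = preserve_dir / "versions"
        versions_dir.mkdir(parents=True, exist_ok=True)
        target = versions_dir / digest  # assumption: one blob per content hash
        if not target.exists():
            shutil.copy2(file_path, target)
        entry["stored"] = True
    return entry


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sample = root / "main.py"
    sample.write_text("print('v1')\n")

    full = record_version(sample, root / ".preserve", store_content=True)
    meta = record_version(sample, root / ".preserve", store_content=False)
    blob_name = full["hash"].split(":")[1]
    blob_exists = (root / ".preserve" / "versions" / blob_name).exists()

print(full["stored"], meta["stored"], blob_exists)
```

Either mode produces the same metadata entry, so a history recorded as metadata-only stays comparable with one that kept full content.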

Phase 4: Relink Resolution

  • Given a file, find all its historical locations
  • "Relink" a file back to any previous location
  • Integration point with dazzlelink's relinker vision
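Finding every historical location amounts to an index lookup by content hash, followed by a walk along the "previous version" links. The flattened record tuples below are a hypothetical shape chosen for brevity, and the hashes are the placeholders from the journey example.

```python
from collections import defaultdict

# Hypothetical records: (manifest_id, path, content_hash, previous_hash)
RECORDS = [
    ("001", "orig/projects/foo/main.py", "ABC", None),
    ("001", "backup/main.py", "ABC", None),
    ("001", "restored/main.py", "ABC", None),
    ("002", "restored/main.py", "XYZ", "ABC"),
    ("002", "backup/main.py", "XYZ", "ABC"),
]


def historical_locations(records, content_hash):
    """List (manifest_id, path) pairs for a hash and all of its ancestor versions."""
    by_hash = defaultdict(list)
    previous = {}
    for manifest_id, path, file_hash, prev_hash in records:
        by_hash[file_hash].append((manifest_id, path))
        if prev_hash is not None:
            previous[file_hash] = prev_hash

    locations = []
    seen = set()  # guards against cycles in the version links
    current = content_hash
    while current is not None and current not in seen:
        seen.add(current)
        locations.extend(by_hash.get(current, []))
        current = previous.get(current)
    return locations


for manifest_id, path in historical_locations(RECORDS, "XYZ"):
    print(manifest_id, path)
```

Any entry in the returned list is a candidate target for a relink; choosing among them is left to the user or to a later policy.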

Relationship to 0.7.x Work

This builds on the foundation from 0.7.x.

The 0.7.x work addresses "what files exist now and how do I handle conflicts."
The 0.8.x work addresses "what is a file's identity and history across time."

Technical Considerations

Manifest Schema Evolution

{
  "schema_version": "3.0",
  "parent_manifests": [...],
  "files": {
    "main.py": {
      "content_hash": "sha256:xyz789...",
      "source_path": "restored/main.py",
      "dest_path": "backup/main.py",
      "lineage": {
        "original_source": "orig/projects/foo/main.py",
        "version_history": [
          {"hash": "sha256:abc123...", "timestamp": "2025-01-01", "manifest": "001"}
        ]
      }
    }
  }
}
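A reader for this schema might look roughly like the Python sketch below. It is trimmed to the fields needed to answer "where did this file originally come from"; the `original_source` function, the 3.x version check, and the fallback to `source_path` for entries without lineage are assumptions for illustration, and `parent_manifests` is left empty because the example above elides it.

```python
import json

SUPPORTED_MAJOR = 3  # assumption: a reader accepts any 3.x manifest

MANIFEST_TEXT = """
{
  "schema_version": "3.0",
  "parent_manifests": [],
  "files": {
    "main.py": {
      "source_path": "restored/main.py",
      "dest_path": "backup/main.py",
      "lineage": {"original_source": "orig/projects/foo/main.py"}
    }
  }
}
"""


def original_source(manifest: dict, name: str) -> str:
    """Return a file's original source, falling back to source_path if no lineage."""
    major = int(manifest["schema_version"].split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema_version: {manifest['schema_version']}")
    entry = manifest["files"][name]
    return entry.get("lineage", {}).get("original_source", entry["source_path"])


manifest = json.loads(MANIFEST_TEXT)
print(original_source(manifest, "main.py"))
```

The fallback keeps pre-lineage entries readable, which matters if older manifests are ever upgraded in place.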

Storage Considerations

  • Metadata tracking: minimal overhead
  • Full version storage: significant for large files
  • Hybrid: store versions for small files, metadata for large
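The hybrid option comes down to a size threshold. In this Python sketch the 10 MiB default and the `storage_mode` name are arbitrary placeholders; a real cutoff would likely be user-configurable.

```python
DEFAULT_THRESHOLD = 10 * 1024 * 1024  # 10 MiB, an arbitrary placeholder


def storage_mode(size_bytes: int, threshold_bytes: int = DEFAULT_THRESHOLD) -> str:
    """Pick full version storage for small files, metadata-only for large ones."""
    return "full" if size_bytes <= threshold_bytes else "metadata-only"


print(storage_mode(4_096))
print(storage_mode(2 * 1024 ** 3))
```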

Dazzlelink Integration

Dazzlelink's "relinker" vision aligns with this:

  • Content-addressed storage
  • Cross-location file resolution
  • Similarity mapping for related versions

Analysis

See these analysis documents for detailed exploration:

  • 2025-12-18__04-10-38__file-versioning-lineage-tracking-analysis.md
  • 2025-12-18__03-50-08__destination-aware-ops-collision-handling-analysis.md
  • 2025-12-16__18-44-05__manifest-incorporate-split-location-recovery.md
  • 2025-12-17__17-15-00__move-rollback-resume-analysis.md

Success Criteria

  1. User can trace any file back to its original location
  2. User can see a file's journey through the system
  3. User can restore to any point in the file's history
  4. System handles content changes gracefully
  5. Works across machines (portable manifests)

Related Work

Metadata

  • Labels: enhancement
  • Project status: ⚙️ 0.8.x: Lineage