apt-tx/docs/dev-rollback-discussion.md
robojerk 2daad2837d Major improvements: rollbacks, testing, docs, and code quality
- Fixed rollback implementation with two-phase approach (remove new, downgrade upgraded)
- Added explicit package tracking (newly_installed vs upgraded)
- Implemented graceful error handling with fail-fast approach
- Added comprehensive test suite (20 tests across 3 test files)
- Created centralized APT command execution module (apt_commands.rs)
- Added configuration system with dry-run, quiet, and APT options
- Reduced code duplication and improved maintainability
- Added extensive documentation (rollbacks.md, rollbacks-not-featured.md, ffi-bridge.md)
- Created configuration usage example
- Updated README with crate usage instructions
- All tests passing, clean compilation, production-ready
2025-09-13 11:21:35 -07:00

Rollback Implementation Discussion

This file was used to keep track of ideas for implementing rollbacks. It is not an official document; it is kept here in case we want to revisit our process.

Current Issues

Problem 1: Tries to install specific versions that may no longer be available

// Current problematic code in rollback():
packages_to_restore.push(format!("{}={}", package, before_version));

Issue: If the old version is no longer in repositories, this will fail.
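One way to detect this case before attempting the downgrade is to ask apt which versions are still installable. The sketch below assumes the output format of `apt-cache madison <package>` (one ` package | version | source` line per candidate); the function names `version_in_madison` and `version_available` are hypothetical and not part of the existing code.

```rust
use std::process::Command;

/// Returns true if `version` appears in `apt-cache madison` output.
/// Each line is expected to look like: " package | version | source".
fn version_in_madison(madison_output: &str, version: &str) -> bool {
    madison_output
        .lines()
        // The version is the second '|'-separated field.
        .filter_map(|line| line.split('|').nth(1))
        .any(|candidate| candidate.trim() == version)
}

/// Hypothetical wrapper: runs `apt-cache madison` and checks the result.
#[allow(dead_code)]
fn version_available(package: &str, version: &str) -> std::io::Result<bool> {
    let output = Command::new("apt-cache")
        .args(&["madison", package])
        .output()?;
    Ok(version_in_madison(
        &String::from_utf8_lossy(&output.stdout),
        version,
    ))
}
```

Keeping the parsing separate from the command call makes the check testable without touching the system package database.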

Problem 2: Doesn't properly handle newly installed packages

// Current code treats newly installed packages the same as upgraded ones
if let Some(after_version) = self.after_versions.get(package) {
    // Only rollback if version changed
    if before_version != after_version {
        packages_to_restore.push(format!("{}={}", package, before_version));
    }
} else {
    // Package was newly installed, remove it
    packages_to_restore.push(format!("{}", package));
}

Issue: Newly installed packages should be removed with apt remove, not downgraded.

Proposed Solutions

Solution 1: Separate handling for new vs upgraded packages

// Track what was newly installed vs upgraded
let mut packages_to_remove = Vec::new();
let mut packages_to_downgrade = Vec::new();

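// Assumes a package that was not installed before the transaction has no
// entry in `before_versions`, so iterate over the post-transaction state.
for (package, after_version) in &self.after_versions {
    match self.before_versions.get(package) {
        Some(before_version) if before_version != after_version => {
            packages_to_downgrade.push(format!("{}={}", package, before_version));
        }
        // Version unchanged: nothing to roll back
        Some(_) => {}
        // Package was newly installed
        None => packages_to_remove.push(package.clone()),
    }
}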

Solution 2: Use apt remove for newly installed packages

// Remove newly installed packages
if !packages_to_remove.is_empty() {
    let output = Command::new("apt")
        .args(&["remove", "-y"])
        .args(&packages_to_remove)
        .output()
        .context("Failed to execute APT remove command")?;
    
    if !output.status.success() {
        let error = String::from_utf8_lossy(&output.stderr);
        return Err(anyhow!("APT removal failed: {}", error)
            .context(format!("Failed to remove packages: {:?}", packages_to_remove)));
    }
}

Solution 3: Fallback for unavailable versions

// Try to downgrade, with fallback to removal
for package_downgrade in packages_to_downgrade {
    let output = Command::new("apt")
        // apt refuses to downgrade under -y unless --allow-downgrades is given
        .args(&["install", "-y", "--allow-downgrades", &package_downgrade])
        .output()
        .context("Failed to execute APT downgrade command")?;
    
    if !output.status.success() {
        // Fallback: remove the package entirely
        let package_name = package_downgrade.split('=').next().unwrap();
        let remove_output = Command::new("apt")
            .args(&["remove", "-y", package_name])
            .output()
            .context("Failed to execute APT remove fallback")?;
        
        if !remove_output.status.success() {
            // Log warning but continue
            eprintln!("Warning: Could not rollback package {}", package_name);
        }
    }
}

Questions for Discussion

  1. Should we track package installation state more explicitly?

    • Currently we infer it from before/after version comparison
    • Could track "newly_installed" packages separately
  2. How to handle partial rollback failures?

    • Continue with other packages?
    • Stop and report what failed?
  3. Should we use apt-mark for version pinning?

    • Pin versions before changes to prevent upgrades
    • More complex but potentially more reliable
  4. What about dependency conflicts during rollback?

    • Some packages might have been installed as dependencies
    • Removing them might break other packages

Discussion & Decision

Scope: "Good Enough" vs. "Complex"

Decision: Aim for "good enough" (80/20 rule)

  • Vast majority of package updates are simple and don't involve complex dependency trees
  • Complex dependency graph analysis would push code well beyond line count goals
  • Acknowledging apt's inherent limitations is better than fragile workarounds

Failure Mode: "Fail Fast"

Decision: Stop completely and let user fix manually

  • Most honest approach - communicates exactly what failed
  • Prevents further damage from partial rollback
  • Gives user control to make informed decisions
  • Aligns with "fail with loud, clear error" principle
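As a rough illustration of the fail-fast decision, the sketch below runs a list of named steps, stops at the first failure, and reports what completed, what failed, and what was never attempted. `FailFastReport` and `run_fail_fast` are hypothetical names, and the step runner is passed in as a closure so the policy is independent of how apt is invoked.

```rust
/// Outcome of a rollback run under the fail-fast policy.
#[derive(Debug, PartialEq)]
struct FailFastReport {
    completed: Vec<String>,
    failed: Option<(String, String)>, // (step name, error message)
    skipped: Vec<String>,
}

/// Runs steps in order and stops at the first error.
fn run_fail_fast<F>(steps: &[&str], mut run_step: F) -> FailFastReport
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = FailFastReport {
        completed: Vec::new(),
        failed: None,
        skipped: Vec::new(),
    };
    for (i, step) in steps.iter().enumerate() {
        match run_step(*step) {
            Ok(()) => report.completed.push(step.to_string()),
            Err(message) => {
                // Record the failure and everything left untouched, then stop.
                report.failed = Some((step.to_string(), message));
                report.skipped = steps[i + 1..].iter().map(|s| s.to_string()).collect();
                break;
            }
        }
    }
    report
}
```

The `skipped` list is what gives the user the information needed to finish the rollback by hand.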

Future Direction: Stepping Stone

Decision: This is a stepping stone toward more robust solutions

  • Document limitations clearly in README
  • Demonstrate value of rollback feature
  • Highlight problems that snapshot-based solutions would solve
  • Create foundation for more advanced approaches

Final Approach: Hybrid with Explicit Tracking

use std::collections::{HashMap, HashSet};

pub struct AptTransaction {
    packages: Vec<String>,
    newly_installed: HashSet<String>,  // Explicit tracking
    upgraded: HashMap<String, String>, // package -> old_version
}

Benefits:

  • Handles 80% of common cases correctly
  • Clear failure modes
  • Maintains simplicity goals
  • Sets stage for future improvements
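A minimal sketch of how the explicit tracking could be populated from before/after snapshots. It assumes each snapshot maps package name to installed version and that a package missing from a snapshot was not installed at that point; the constructor name `from_snapshots` is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Default)]
pub struct AptTransaction {
    packages: Vec<String>,
    newly_installed: HashSet<String>,  // Explicit tracking
    upgraded: HashMap<String, String>, // package -> old_version
}

impl AptTransaction {
    /// Classifies every package in `after` as newly installed, upgraded,
    /// or unchanged, based on its entry (or absence) in `before`.
    pub fn from_snapshots(
        packages: Vec<String>,
        before: &HashMap<String, String>,
        after: &HashMap<String, String>,
    ) -> Self {
        let mut tx = AptTransaction {
            packages,
            ..Default::default()
        };
        for (name, after_version) in after {
            match before.get(name) {
                // Not present before the transaction: newly installed.
                None => {
                    tx.newly_installed.insert(name.clone());
                }
                // Present with a different version: remember the old one.
                Some(old) if old != after_version => {
                    tx.upgraded.insert(name.clone(), old.clone());
                }
                // Same version before and after: nothing to roll back.
                Some(_) => {}
            }
        }
        tx
    }
}
```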

Refined Scope: "Good Enough + Safety"

Decision: Aim for common cases with robust safety measures

  • Handle 80% of rollbacks correctly
  • Add essential safety features (locking, critical package detection)
  • Avoid complex dependency graph analysis
  • Stop short of filesystem snapshots or full atomicity

Key Design Principle: The tracked state represents the explicitly managed packages in the transaction. Due to cascading dependencies, the actual system changes may be larger. The apt-clone safety net provides complete system state capture for these edge cases.

Additional Considerations

Package Locking & Concurrency

  • Lock apt during operations to prevent concurrent package management
  • Run dpkg --configure -a before operations to finish any interrupted package configuration, and release the lock afterward
  • Prevents race conditions with other package managers

Critical Package Handling

  • Use dpkg's built-in Essential field for detection: dpkg-query -W -f='${Package} ${Essential}\n' | grep ' yes$'
  • Prevent rollback of essential packages to avoid system breakage
  • No hardcoded whitelist - rely on apt's own package classification
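The essential-package guard could look roughly like this. The sketch assumes the input comes from `dpkg-query -W -f='${Package} ${Essential}\n'`, where the second field is `yes` for essential packages and empty or `no` otherwise; `parse_essential` and `blocked_by_essential` are hypothetical helper names.

```rust
use std::collections::HashSet;

/// Parses "<package> <essential-flag>" lines and keeps the essential ones.
fn parse_essential(dpkg_query_output: &str) -> HashSet<String> {
    dpkg_query_output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            match (fields.next(), fields.next()) {
                (Some(name), Some("yes")) => Some(name.to_string()),
                _ => None,
            }
        })
        .collect()
}

/// Returns the packages in `to_remove` that must not be rolled back.
fn blocked_by_essential<'a>(
    to_remove: &'a [String],
    essential: &HashSet<String>,
) -> Vec<&'a String> {
    to_remove.iter().filter(|p| essential.contains(*p)).collect()
}
```

If `blocked_by_essential` returns anything, the rollback would stop before running any apt command.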

State Persistence & Retry

  • Simple JSON file: /var/lib/apt-wrapper/rollback-state.json
  • Store before each operation: Package lists, versions, operation type
  • Enable retry: Read state file to resume from last successful operation
  • Crash recovery: Detect incomplete rollback and offer retry option
  • Corruption handling: Simple checksum validation, fallback to clean state if corrupted

Enhanced Error Handling

  • Fail fast on critical errors: Package not found, version unavailable, permission denied
  • Continue with warnings on soft errors: Package already removed, non-critical dependency warnings
  • Clear distinction: Critical = system-breaking, Soft = recoverable or ignorable
  • Accumulate warnings: Collect all soft errors and display summary at end
  • Informative fallback messages: Explain why package removal occurred (version unavailable)

Soft Error Definition

Soft errors are non-critical issues that don't break the rollback:

  • Package already removed (no action needed)
  • Non-essential dependency warnings (system still functional)
  • Version downgrade warnings (package still works)

Critical errors stop the entire rollback:

  • Package not found in repository
  • Permission denied
  • Essential package conflicts

Exit Code Strategy

  • 0: Rollback successful; any warnings were informational and need no follow-up
  • 1: Rollback failed on a critical error
  • 2: Rollback completed, but unresolved soft issues need attention
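A small sketch of how accumulated issues could map to these exit codes. The `resolved` flag is an assumption introduced here to separate "succeeded with warnings" (0) from "unresolved soft issues" (2); `RollbackIssue` and `exit_code` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
enum RollbackIssue {
    /// Stops the rollback (package not found, permission denied, ...).
    Critical(String),
    /// Recoverable or ignorable; `resolved` is an assumed flag marking
    /// issues that need no follow-up (e.g. package already removed).
    Soft { message: String, resolved: bool },
}

/// Maps the collected issues to the process exit code.
fn exit_code(issues: &[RollbackIssue]) -> i32 {
    if issues
        .iter()
        .any(|i| matches!(i, RollbackIssue::Critical(_)))
    {
        return 1;
    }
    if issues
        .iter()
        .any(|i| matches!(i, RollbackIssue::Soft { resolved: false, .. }))
    {
        return 2;
    }
    0
}
```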

Rollback Command Strategy

Decision: Separate commands for remove vs downgrade (non-atomic)

  • Phase 1: apt remove -y package1 package2 (newly installed packages)
  • Phase 2: apt install -y --allow-downgrades package3=oldver package4=oldver (upgraded packages; apt refuses downgrades under -y without --allow-downgrades)
  • Acknowledge: This is not atomic, but simpler and more reliable than complex single commands
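The two phases can be sketched as a pure function that builds the argument vectors, leaving execution to the caller. This is an illustration only: `phase_commands` is a hypothetical name, `BTreeSet`/`BTreeMap` stand in for the `HashSet`/`HashMap` fields so the argument order is stable, and `--allow-downgrades` is included because apt rejects downgrades under `-y` without it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Builds the argv for each rollback phase; empty phases are omitted.
fn phase_commands(
    newly_installed: &BTreeSet<String>,
    upgraded: &BTreeMap<String, String>, // package -> old_version
) -> Vec<Vec<String>> {
    let mut commands = Vec::new();

    // Phase 1: remove packages the transaction newly installed.
    if !newly_installed.is_empty() {
        let mut argv: Vec<String> = ["apt", "remove", "-y"]
            .iter()
            .map(|s| s.to_string())
            .collect();
        argv.extend(newly_installed.iter().cloned());
        commands.push(argv);
    }

    // Phase 2: reinstall the previous version of upgraded packages.
    if !upgraded.is_empty() {
        let mut argv: Vec<String> = ["apt", "install", "-y", "--allow-downgrades"]
            .iter()
            .map(|s| s.to_string())
            .collect();
        argv.extend(upgraded.iter().map(|(pkg, old)| format!("{}={}", pkg, old)));
        commands.push(argv);
    }

    commands
}
```

Because the phases are separate commands, a failure in phase 2 leaves phase 1 applied, which is the non-atomic behavior acknowledged above.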

Implementation Details

Warning Accumulation

let mut warnings = Vec::new();

// During rollback operations
if some_error_condition {
    warnings.push(format!("Warning: Could not remove package {}", package));
}

// After the rollback
if !warnings.is_empty() {
    eprintln!("Warnings during rollback:");
    for warning in warnings {
        eprintln!("{}", warning);
    }
}

Informative Fallback Messages

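// This is the "continue with warnings" path for a non-critical error
if !output.status.success() {
    // `package_downgrade` has the form "name=version"
    let (package_name, before_version) = package_downgrade
        .split_once('=')
        .unwrap_or((package_downgrade.as_str(), "unknown"));
    eprintln!("Warning: Could not downgrade {} to version {} (version may no longer be available in the repository). Removing instead.", package_name, before_version);

    let remove_output = Command::new("apt")
        .args(&["remove", "-y", package_name])
        .output()
        .context("Failed to execute APT remove fallback")?;

    // Fail fast if the fallback removal itself fails
    if !remove_output.status.success() {
        let error = String::from_utf8_lossy(&remove_output.stderr);
        return Err(anyhow!("Fallback removal of {} failed: {}", package_name, error));
    }
}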

Lightweight Dependency Checking

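// Check what packages depend on the package being removed
let output = Command::new("apt-cache")
    .arg("rdepends")
    .arg(removed_package)
    .output()
    .context("Failed to check reverse dependencies")?;

// The first two lines are the package name and a "Reverse Depends:"
// header, so stdout is never empty; skip them before checking.
let stdout = String::from_utf8_lossy(&output.stdout);
let rdepends: Vec<&str> = stdout.lines().skip(2)
    .map(|line| line.trim().trim_start_matches('|'))
    .filter(|line| !line.is_empty())
    .collect();

if !rdepends.is_empty() {
    eprintln!("Warning: Removing {} may affect the following packages:", removed_package);
    eprintln!("{}", rdepends.join("\n"));
}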

User-Installed Package Awareness

// Quick check for user-installed rdepends before removal
let output = Command::new("apt-mark")
    .args(&["showmanual"])
    .output()?;

// Bind the decoded output first so the borrowed lines outlive the set
let manual_stdout = String::from_utf8_lossy(&output.stdout);
let manual_packages: HashSet<&str> = manual_stdout
    .lines()
    .map(str::trim)
    .collect();

for pkg in &packages_to_remove {
    // `get_reverse_dependencies` is a helper assumed to return Vec<String>
    let rdepends = get_reverse_dependencies(pkg)?;
    // Filter out automatically-installed packages to reduce noise
    let user_rdepends: Vec<_> = rdepends
        .into_iter()
        .filter(|dep| manual_packages.contains(dep.as_str()))
        .collect();

    if !user_rdepends.is_empty() {
        eprintln!("Warning: Removing {} may break manually installed packages: {:?}", pkg, user_rdepends);
    }
}

Meta-Package Detection

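// Detect meta-packages and warn about cascade removals
for pkg in &packages_to_remove {
    let output = Command::new("apt-cache")
        .args(&["show", pkg])
        .output()?;

    if output.status.success() {
        let info = String::from_utf8_lossy(&output.stdout).to_lowercase();
        // Heuristic only: matches "Section: metapackages" or a description
        // that calls the package a metapackage. Purely virtual packages
        // have no `apt-cache show` record, so they are not detected here.
        let is_meta = info.contains("metapackage")
            || info.contains("meta-package")
            || info.contains("meta package");
        if is_meta {
            eprintln!("Warning: {} is a meta-package, removal may trigger cascade removals", pkg);
        }
    }
}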

Dry-Run JSON Output

{
  "would_remove": ["pkg1", "pkg2"],
  "would_downgrade": ["pkg3=1.2.3"],
  "warnings": ["pkg1 has user-installed reverse dependencies: foo, bar"]
}
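A sketch of how this output could be produced using only the standard library. `DryRunPlan`, `json_string`, and `json_array` are hypothetical names; a real implementation would more likely use a JSON crate such as serde_json, which this sketch deliberately avoids to stay self-contained.

```rust
/// Escapes a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders a list of strings as a JSON array.
fn json_array(items: &[String]) -> String {
    let parts: Vec<String> = items.iter().map(|s| json_string(s)).collect();
    format!("[{}]", parts.join(", "))
}

struct DryRunPlan {
    would_remove: Vec<String>,
    would_downgrade: Vec<String>,
    warnings: Vec<String>,
}

impl DryRunPlan {
    /// Serializes the plan with the three keys shown above.
    fn to_json(&self) -> String {
        format!(
            "{{\"would_remove\": {}, \"would_downgrade\": {}, \"warnings\": {}}}",
            json_array(&self.would_remove),
            json_array(&self.would_downgrade),
            json_array(&self.warnings)
        )
    }
}
```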

Strategic Improvements

Pre-Transaction Safety Net

  • Use apt-clone (preferred) or dpkg --get-selections to create a full package-state snapshot before rollback
  • apt-clone: Creates complete package state backup including sources and keyrings
  • dpkg --get-selections: Simpler but only captures package selections
  • Complete restore point if rollback fails catastrophically
  • Better than partial rollback - can restore entire system state

Enhanced Dependency Awareness

  • Warn on non-auto-installed reverse dependencies (user-installed packages that depend on removed package)
  • Prevent breaking user workflows by highlighting critical dependencies
  • Interactive confirmation for potentially destructive operations

Version Handling Sophistication

  • Find closest available older version instead of exact match or remove
  • Interactive fallback: "Version X not found. Install nearest available version Y? [Y/n]"
  • User-configurable fallback strategy:
    • --strict: fail if exact version not found
    • --nearest: choose closest available lower version
    • --remove: remove if version missing (default)
  • Less destructive than complete package removal
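The three fallback strategies can be sketched as a pure decision function. All names here (`FallbackStrategy`, `FallbackAction`, `resolve_fallback`, `compare_dotted`) are hypothetical. The version comparator is injected because real Debian version ordering is more involved than this example handles; `compare_dotted` is a deliberately naive stand-in, and a real implementation could delegate to `dpkg --compare-versions`. Failing when `--nearest` finds no lower version is also an assumption, since the notes do not specify that case.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq)]
enum FallbackStrategy {
    Strict,  // --strict
    Nearest, // --nearest
    Remove,  // --remove (default)
}

#[derive(Debug, PartialEq)]
enum FallbackAction {
    Install(String),
    Remove,
    Fail,
}

/// Naive comparator for purely numeric dotted versions such as "1.2.0".
fn compare_dotted(a: &str, b: &str) -> Ordering {
    let parse = |s: &str| -> Vec<u32> {
        s.split('.').map(|p| p.parse().unwrap_or(0)).collect()
    };
    parse(a).cmp(&parse(b))
}

/// Decides what to do when rolling back to `wanted`.
fn resolve_fallback<F>(
    wanted: &str,
    available: &[&str],
    strategy: FallbackStrategy,
    compare: F,
) -> FallbackAction
where
    F: Fn(&str, &str) -> Ordering,
{
    // Exact match always wins, regardless of strategy.
    if available.iter().any(|v| *v == wanted) {
        return FallbackAction::Install(wanted.to_string());
    }
    match strategy {
        FallbackStrategy::Strict => FallbackAction::Fail,
        FallbackStrategy::Remove => FallbackAction::Remove,
        FallbackStrategy::Nearest => available
            .iter()
            .copied()
            // Keep only versions lower than the wanted one...
            .filter(|v| compare(*v, wanted) == Ordering::Less)
            // ...and pick the highest of those.
            .max_by(|a, b| compare(*a, *b))
            .map(|v| FallbackAction::Install(v.to_string()))
            .unwrap_or(FallbackAction::Fail),
    }
}
```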

Concurrency Safety Improvements

  • Wait for lock with timeout instead of immediate failure
  • Check /var/lib/dpkg/lock-frontend and wait up to 30 seconds
  • Clear error message if timeout expires: "APT is locked by another process, aborting rollback"
  • Optional --wait-indefinitely flag for CI/automation use cases
  • Better user experience when apt is busy with other operations
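The wait-with-timeout behavior could be sketched as a polling loop. How the lock on /var/lib/dpkg/lock-frontend is actually acquired is left out: the `try_acquire` closure is a placeholder for that step, and `wait_for_lock` is a hypothetical name. Passing `None` as the timeout models the optional --wait-indefinitely flag.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `try_acquire` until it succeeds or the timeout expires.
/// `timeout = None` waits indefinitely.
fn wait_for_lock<F>(
    mut try_acquire: F,
    timeout: Option<Duration>,
    poll_interval: Duration,
) -> Result<(), String>
where
    F: FnMut() -> bool,
{
    let start = Instant::now();
    loop {
        if try_acquire() {
            return Ok(());
        }
        if let Some(limit) = timeout {
            if start.elapsed() >= limit {
                return Err("APT is locked by another process, aborting rollback".to_string());
            }
        }
        thread::sleep(poll_interval);
    }
}
```

With the 30-second default from the notes, a call would pass `Some(Duration::from_secs(30))` and a poll interval of, say, half a second.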

Enhanced State Management

  • Schema versioning in rollback-state.json for future compatibility
  • Deterministic checksums using canonical JSON serialization with stable key ordering
  • Atomic persistence after each operation step to enable safe resume
  • State file security: /var/lib/apt-wrapper/rollback-state.json owned by root:root with mode 0600
  • Manual state clearing: apt-wrapper rollback --abort
  • Dry-run mode: apt-wrapper rollback --dry-run to preview changes
  • Machine-readable dry-run output (JSON format) for automation/CI integration
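To illustrate why stable key ordering matters for the checksum, the sketch below serializes a map in sorted key order before hashing, so two states with the same content always produce the same checksum. FNV-1a is used here only as a standard-library-only stand-in; the notes call for a real checksum (the schema example uses a `sha256:` prefix), which would need an external crate. `canonical_form`, `state_checksum`, and `verify` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// 64-bit FNV-1a hash (stand-in for a cryptographic checksum).
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Serializes entries as "key=value" lines. BTreeMap iterates in sorted
/// key order, so the result does not depend on insertion order.
fn canonical_form(upgraded: &BTreeMap<String, String>) -> String {
    upgraded
        .iter()
        .map(|(k, v)| format!("{}={}\n", k, v))
        .collect()
}

/// Computes the checksum string stored alongside the state.
fn state_checksum(upgraded: &BTreeMap<String, String>) -> String {
    format!("fnv1a64:{:016x}", fnv1a64(canonical_form(upgraded).as_bytes()))
}

/// Returns true if the stored checksum matches the current content.
fn verify(upgraded: &BTreeMap<String, String>, stored: &str) -> bool {
    state_checksum(upgraded) == stored
}
```

On a mismatch, the caller would fall back to a clean state, as described under State Persistence & Retry.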

Future Considerations

Auditing & Troubleshooting

  • Track rollback metadata (timestamps, user IDs) for auditing
  • Integration with larger system management tools
  • Logging for troubleshooting and system analysis

Scalability Planning

  • Handle multiple system states simultaneously
  • Related package update rollbacks
  • Service-level rollback coordination

Implementation Phases

The following phased plan will incrementally implement the features outlined in the 'Final Approach' and 'Strategic Improvements' sections.

Phase 1: Core Rollback Logic

  1. Implement separate tracking for new vs upgraded packages
  2. Use apt remove for newly installed packages
  3. Fail fast on critical errors, continue with warnings on soft errors

Phase 2: Robustness Enhancements

  1. Add warning accumulation for better user experience
  2. Add informative fallback messages explaining why operations occurred
  3. Add lightweight dependency checking for removed packages
  4. Add proper error handling and logging
  5. Add package locking with timeout to prevent concurrent operations
  6. Implement critical package detection using apt's essential package list
  7. Add enhanced state persistence with schema versioning and deterministic checksums
  8. Add dry-run mode for previewing rollback operations
  9. Add CLI ergonomics: --resume, --abort, --dry-run --json vs --dry-run --pretty
  10. Add exit code strategy for automation/CI integration

Phase 2.5: Advanced Safety Features

  1. Add pre-transaction snapshot using apt-clone (preferred) or dpkg --get-selections
  2. Enhanced dependency awareness for non-auto-installed packages
  3. Sophisticated version handling with closest available version fallback
  4. Interactive confirmation for potentially destructive operations
  5. Add meta-package detection and cascade removal warnings

Phase 3: Testing and Documentation

  1. Test with various scenarios
  2. Document limitations clearly
  3. Add JSON schema documentation for rollback-state.json

Rollback State JSON Schema

{
  "schema_version": "1.0",
  "checksum": "sha256:abc123...",
  "timestamp": "2024-01-15T10:30:00Z",
  "transaction_id": "tx_12345",
  "operation": "rollback",
  "state": {
    "newly_installed": ["package1", "package2"],
    "upgraded": {
      "package3": "1.0.0",
      "package4": "2.1.0"
    },
    "before_versions": {
      "package3": "0.9.0",
      "package4": "2.0.0"
    },
    "after_versions": {
      "package3": "1.0.0",
      "package4": "2.1.0"
    }
  },
  "operations_completed": [
    {
      "phase": "remove",
      "packages": ["package1", "package2"],
      "status": "success",
      "timestamp": "2024-01-15T10:31:00Z"
    }
  ],
  "operations_pending": [
    {
      "phase": "downgrade",
      "packages": ["package3=0.9.0", "package4=2.0.0"],
      "status": "pending"
    }
  ]
}

Key Features:

  • Schema versioning: Future compatibility
  • Checksum: Corruption detection
  • Transaction tracking: Unique ID for each rollback
  • Phase tracking: What's completed vs pending
  • State preservation: Complete before/after package states
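A rough sketch of how the phase tracking could drive a resume. It models only the `operations_completed` and `operations_pending` arrays from the example above; the struct and method names are hypothetical, and reading or writing the JSON file is left out.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Operation {
    phase: String,
    packages: Vec<String>,
    status: String, // "pending" or "success", as in the example above
}

#[derive(Debug)]
struct RollbackState {
    operations_completed: Vec<Operation>,
    operations_pending: Vec<Operation>,
}

impl RollbackState {
    /// The operation a resumed rollback should run next, if any.
    fn next_pending(&self) -> Option<&Operation> {
        self.operations_pending.first()
    }

    /// Moves the first pending operation to the completed list.
    /// The caller would persist the state file right after this.
    fn mark_next_done(&mut self) -> Option<&Operation> {
        if self.operations_pending.is_empty() {
            return None;
        }
        let mut op = self.operations_pending.remove(0);
        op.status = "success".to_string();
        self.operations_completed.push(op);
        self.operations_completed.last()
    }

    /// True once nothing is left to resume.
    fn is_complete(&self) -> bool {
        self.operations_pending.is_empty()
    }
}
```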