apt-ostree/docs/apt-ostree-daemon-plan/architecture/error-handling-analysis.md

🔍 rpm-ostree Error Handling Analysis

📋 Overview

This document analyzes the error handling patterns in rpm-ostree, examining how errors are managed across the CLI client (rpm-ostree) and the system daemon (rpm-ostreed). Understanding these patterns is crucial for implementing robust error handling in apt-ostree.

🏗️ Error Handling Architecture Overview

Component Error Handling Distribution

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Client    │    │   Error Layer   │    │   System Daemon │
│   (rpm-ostree)  │◄──►│   (GError/DBus) │◄──►│   (rpm-ostreed) │
│                 │    │                 │    │                 │
│ • User-facing   │    │ • Error Types   │    │ • System-level  │
│ • Command-line  │    │ • Error Codes   │    │ • Transaction   │
│ • Progress      │    │ • Error Domain  │    │ • OSTree Ops    │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Error Handling Principles

  1. Separation of Concerns: CLI handles user-facing errors, daemon handles system errors
  2. Error Propagation: Errors flow from daemon to CLI via DBus
  3. Transaction Safety: Failed operations trigger automatic rollback
  4. User Experience: Clear error messages with recovery suggestions
  5. Logging Integration: Comprehensive error logging for debugging

🔍 Detailed Error Handling Analysis

1. Daemon Error Types (rpmostreed-errors.h)

Core Error Definitions

typedef enum
{
  RPM_OSTREED_ERROR_FAILED,                    // Generic operation failure
  RPM_OSTREED_ERROR_INVALID_SYSROOT,           // Invalid system root path
  RPM_OSTREED_ERROR_NOT_AUTHORIZED,            // PolicyKit authorization failure
  RPM_OSTREED_ERROR_UPDATE_IN_PROGRESS,        // Concurrent update prevention
  RPM_OSTREED_ERROR_INVALID_REFSPEC,           // Invalid OSTree reference
  RPM_OSTREED_ERROR_NUM_ENTRIES,               // Enum size marker
} RpmOstreedError;

Error Domain Registration

// From rpmostreed-errors.cxx
static const GDBusErrorEntry dbus_error_entries[] = {
  { RPM_OSTREED_ERROR_FAILED, "org.projectatomic.rpmostreed.Error.Failed" },
  { RPM_OSTREED_ERROR_INVALID_SYSROOT, "org.projectatomic.rpmostreed.Error.InvalidSysroot" },
  { RPM_OSTREED_ERROR_NOT_AUTHORIZED, "org.projectatomic.rpmostreed.Error.NotAuthorized" },
  { RPM_OSTREED_ERROR_UPDATE_IN_PROGRESS, "org.projectatomic.rpmostreed.Error.UpdateInProgress" },
  { RPM_OSTREED_ERROR_INVALID_REFSPEC, "org.projectatomic.rpmostreed.Error.InvalidRefspec" },
};

GQuark rpmostreed_error_quark (void) {
  static gsize quark = 0;
  g_dbus_error_register_error_domain ("rpmostreed-error-quark", &quark, 
                                      dbus_error_entries, G_N_ELEMENTS (dbus_error_entries));
  return (GQuark)quark;
}

Key Characteristics:

  • DBus Integration: Errors are registered as DBus error domains
  • Standardized Codes: Predefined error codes for common failure scenarios
  • Internationalization: Error messages can be localized
  • Error Quarks: Unique identifiers for error domains
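
The table-driven registration above can be sketched in Rust as a single constant table that serves both directions: the daemon maps a code to a D-Bus error name, and the client maps a received name back to a code. This is a minimal illustration, not the rpm-ostree or apt-ostree implementation; the `DaemonError` type, the function names, and the `org.example.aptostreed` prefix are placeholders assumed for the example.

```rust
/// Daemon error codes, mirroring the shape of `RpmOstreedError`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DaemonError {
    Failed,
    InvalidSysroot,
    NotAuthorized,
    UpdateInProgress,
    InvalidRefspec,
}

/// One table drives both lookups, like `dbus_error_entries` in the C++ code.
/// The `org.example.aptostreed` prefix is a placeholder, not a registered name.
const ERROR_ENTRIES: &[(DaemonError, &str)] = &[
    (DaemonError::Failed, "org.example.aptostreed.Error.Failed"),
    (DaemonError::InvalidSysroot, "org.example.aptostreed.Error.InvalidSysroot"),
    (DaemonError::NotAuthorized, "org.example.aptostreed.Error.NotAuthorized"),
    (DaemonError::UpdateInProgress, "org.example.aptostreed.Error.UpdateInProgress"),
    (DaemonError::InvalidRefspec, "org.example.aptostreed.Error.InvalidRefspec"),
];

/// Daemon side: error code -> D-Bus error name.
pub fn dbus_name(code: DaemonError) -> &'static str {
    ERROR_ENTRIES
        .iter()
        .find(|(c, _)| *c == code)
        .map(|(_, name)| *name)
        // Codes missing from the table fall back to the generic failure name.
        .unwrap_or("org.example.aptostreed.Error.Failed")
}

/// Client side: D-Bus error name -> error code; unknown names yield `None`.
pub fn from_dbus_name(name: &str) -> Option<DaemonError> {
    ERROR_ENTRIES
        .iter()
        .find(|(_, n)| *n == name)
        .map(|(c, _)| *c)
}
```

Keeping one table avoids the two directions drifting apart when a new error code is added.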

2. Transaction Error Handling

Transaction Lifecycle Error Management

// From rpmostreed-transaction.cxx
struct _RpmostreedTransactionPrivate {
  GDBusMethodInvocation *invocation;           // DBus method context
  gboolean executed;                            // Transaction completion state
  GCancellable *cancellable;                   // Cancellation support
  
  // System state during transaction
  char *sysroot_path;                          // Sysroot path
  OstreeSysroot *sysroot;                      // OSTree sysroot
  gboolean sysroot_locked;                     // Sysroot lock state
  
  // Client tracking
  char *client_description;                    // Client description
  char *agent_id;                              // Client agent ID
  char *sd_unit;                               // Systemd unit
  
  // Progress tracking
  gint64 last_progress_journal;                // Progress journal timestamp
  gboolean redirect_output;                    // Output redirection flag
  
  // Peer connections
  GDBusServer *server;                         // DBus server
  GHashTable *peer_connections;                // Client connections
  
  // Completion state
  GVariant *finished_params;                   // Completion parameters
  guint watch_id;                              // Watch identifier
};

Error Recovery Mechanisms

// Release the sysroot lock (no-op if the lock is not held)
static void
unlock_sysroot (RpmostreedTransaction *self)
{
  RpmostreedTransactionPrivate *priv = rpmostreed_transaction_get_private (self);

  if (!(priv->sysroot && priv->sysroot_locked))
    return;

  ostree_sysroot_unlock (priv->sysroot);
  sd_journal_print (LOG_INFO, "Unlocked sysroot");
  priv->sysroot_locked = FALSE;
}

// Transaction cleanup
static void
transaction_maybe_emit_closed (RpmostreedTransaction *self)
{
  RpmostreedTransactionPrivate *priv = rpmostreed_transaction_get_private (self);

  if (rpmostreed_transaction_get_active (self))
    return;

  if (g_hash_table_size (priv->peer_connections) > 0)
    return;

  g_signal_emit (self, signals[CLOSED], 0);
  rpmostreed_sysroot_finish_txn (rpmostreed_sysroot_get (), self);
}

Key Characteristics:

  • Automatic Unlock: The sysroot lock is released during transaction teardown, including after a failure
  • Resource Cleanup: Proper cleanup of system resources on failure
  • Signal Emission: Error signals sent to all connected clients
  • Journal Integration: Errors logged to systemd journal
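
In Rust, the unlock-on-teardown behaviour shown above is often expressed as a guard whose `Drop` implementation releases the lock, so every exit path (success, early return via `?`, or unwinding) unlocks the sysroot. The following is a sketch under that assumption; `Sysroot`, `SysrootGuard`, and `failing_transaction` are hypothetical stand-ins that only track a flag and do not call into OSTree.

```rust
use std::cell::Cell;

/// Stand-in for the real sysroot handle; it only tracks the lock flag.
#[derive(Default)]
pub struct Sysroot {
    locked: Cell<bool>,
}

impl Sysroot {
    pub fn is_locked(&self) -> bool {
        self.locked.get()
    }

    /// Take the lock and return a guard that releases it when dropped.
    pub fn lock(&self) -> Result<SysrootGuard<'_>, String> {
        if self.locked.get() {
            return Err("update in progress".to_string());
        }
        self.locked.set(true);
        Ok(SysrootGuard { sysroot: self })
    }
}

/// Holding this value means the sysroot is locked.
pub struct SysrootGuard<'a> {
    sysroot: &'a Sysroot,
}

impl Drop for SysrootGuard<'_> {
    fn drop(&mut self) {
        // Runs on success, on early return via `?`, and during unwinding.
        self.sysroot.locked.set(false);
    }
}

/// A transaction body that fails partway through.
pub fn failing_transaction(sysroot: &Sysroot) -> Result<(), String> {
    let _guard = sysroot.lock()?;
    Err("deployment failed".to_string())
}
```

After `failing_transaction` returns its error, the guard has already been dropped, so a second transaction can take the lock without any explicit cleanup call.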

3. CLI Client Error Handling

DBus Error Handling Patterns

// From rpmostree-clientlib.cxx
static void
on_owner_changed (GObject *object, GParamSpec *pspec, gpointer user_data)
{
  auto tp = static_cast<TransactionProgress *> (user_data);
  tp->error = g_dbus_error_new_for_dbus_error (
    "org.projectatomic.rpmostreed.Error.Failed",
    "Bus owner changed, aborting. This likely means the daemon crashed; "
    "check logs with `journalctl -xe`."
  );
  transaction_progress_end (tp);
}

// Transaction connection error handling
static RPMOSTreeTransaction *
transaction_connect (const char *transaction_address, GCancellable *cancellable, GError **error)
{
  GLNX_AUTO_PREFIX_ERROR ("Failed to connect to client transaction", error);
  
  g_autoptr (GDBusConnection) peer_connection = g_dbus_connection_new_for_address_sync (
    transaction_address, 
    G_DBUS_CONNECTION_FLAGS_AUTHENTICATION_CLIENT, 
    NULL, cancellable, error
  );

  if (peer_connection == NULL)
    return NULL;

  return rpmostree_transaction_proxy_new_sync (
    peer_connection, G_DBUS_PROXY_FLAGS_NONE, NULL, "/", 
    cancellable, error
  );
}

User-Facing Error Display

// Progress and error display
static void
transaction_progress_signal_handler (GDBusConnection *connection, const char *sender_name,
                                    const char *object_path, const char *interface_name,
                                    const char *signal_name, GVariant *parameters,
                                    gpointer user_data)
{
  auto tp = static_cast<TransactionProgress *> (user_data);

  if (g_strcmp0 (signal_name, "Message") == 0) {
    const char *message;
    g_variant_get (parameters, "(&s)", &message);
    
    if (!tp->progress) {
      tp->progress = TRUE;
      rpmostreecxx::console_progress_begin_task (message);
    } else {
      rpmostreecxx::console_progress_set_message (message);
    }
  } else if (g_strcmp0 (signal_name, "PercentProgress") == 0) {
    guint percentage;
    const char *message;
    g_variant_get (parameters, "(u&s)", &percentage, &message);
    
    if (!tp->progress) {
      tp->progress = TRUE;
      rpmostreecxx::console_progress_begin_percent (message);
    }
    rpmostreecxx::console_progress_update (percentage);
  }
}

Key Characteristics:

  • Error Context: Errors include helpful context and recovery suggestions
  • Progress Integration: Error messages integrated with progress display
  • User Guidance: Clear instructions for troubleshooting (e.g., journalctl -xe)
  • Graceful Degradation: Client continues operation when possible
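
The error-context pattern used by `GLNX_AUTO_PREFIX_ERROR` has a straightforward Rust analogue: wrap the fallible call and prepend a description of what was being attempted. The sketch below uses only the standard library; the `PrefixError` trait and `open_peer_connection` are hypothetical names introduced for illustration (crates such as `anyhow` provide a comparable `context` method).

```rust
use std::fmt::Display;

/// Extension trait that prefixes an error message with context, similar in
/// spirit to `GLNX_AUTO_PREFIX_ERROR` in the C++ client.
pub trait PrefixError<T> {
    fn prefix_err(self, prefix: &str) -> Result<T, String>;
}

impl<T, E: Display> PrefixError<T> for Result<T, E> {
    fn prefix_err(self, prefix: &str) -> Result<T, String> {
        self.map_err(|e| format!("{prefix}: {e}"))
    }
}

/// Hypothetical connection step that always fails, for illustration only.
fn open_peer_connection(address: &str) -> Result<(), String> {
    Err(format!("cannot reach {address}"))
}

/// Connect to a transaction address, adding context to any failure.
pub fn transaction_connect(address: &str) -> Result<(), String> {
    open_peer_connection(address).prefix_err("Failed to connect to client transaction")
}
```

The caller sees both the high-level action that failed and the low-level cause in one message.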

4. Rust Error Handling Integration

Error Type Definitions

// From rust/src/lib.rs
/// APIs defined here are automatically bridged between Rust and C++ using https://cxx.rs/
///
/// # Error handling
///
/// For fallible APIs that return a `Result<T>`:
///
/// - Use `Result<T>` inside `lib.rs` below
/// - On the Rust *implementation* side, use `CxxResult<T>` which does error
///   formatting in a more preferred way
/// - On the C++ side, use our custom `CXX_TRY` API which converts the C++ exception
///   into a GError. In the future, we might try a hard switch to C++ exceptions
///   instead, but at the moment having two is problematic, so we prefer `GError`.

System Host Type Validation

// From rust/src/client.rs
/// Return an error if the current system host type does not match expected.
pub(crate) fn require_system_host_type(expected: SystemHostType) -> CxxResult<()> {
    let current = get_system_host_type()?;
    if current != expected {
        let expected = system_host_type_str(&expected);
        let current = system_host_type_str(&current);
        return Err(format!(
            "This command requires an {expected} system; found: {current}"
        ).into());
    }
    Ok(())
}

/// Classify the running system.
#[derive(Clone, Debug, PartialEq, Eq)] // PartialEq is needed for the `!=` comparison above
pub(crate) enum SystemHostType {
    OstreeContainer,
    OstreeHost,
    Unknown,
}

Key Characteristics:

  • Hybrid Approach: Rust Result<T> bridged to C++ GError
  • Type Safety: Rust enums for error classification
  • Context Preservation: Error messages include system context
  • Bridging: CxxResult<T> for Rust-C++ boundary

🔄 Error Flow Patterns

1. Error Propagation Flow

System Error → Daemon → DBus Error → CLI Client → User Display

Detailed Flow:

  1. System Operation Fails (e.g., OSTree operation, file permission)
  2. Daemon Catches Error and creates appropriate error code
  3. DBus Error Sent to connected clients with error details
  4. CLI Client Receives Error and formats for user display
  5. User Sees Error with context and recovery suggestions
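
The five steps above can be sketched end to end with plain types. This is an illustration of the flow only: `DaemonError`, `WireError`, the `org.example.aptostreed` names, and the hint texts are assumptions made for the example, and no real bus is involved.

```rust
use std::io;

/// Daemon-side error produced when a system operation fails.
#[derive(Debug)]
pub enum DaemonError {
    System(io::Error),
    UpdateInProgress,
}

impl From<io::Error> for DaemonError {
    fn from(e: io::Error) -> Self {
        DaemonError::System(e)
    }
}

/// What crosses the bus: an error name plus a human-readable message.
#[derive(Debug, PartialEq)]
pub struct WireError {
    pub name: String,
    pub message: String,
}

impl DaemonError {
    /// Step 3: turn the daemon error into its on-the-wire form.
    pub fn to_wire(&self) -> WireError {
        match self {
            DaemonError::System(e) => WireError {
                name: "org.example.aptostreed.Error.Failed".into(),
                message: format!("System error: {e}"),
            },
            DaemonError::UpdateInProgress => WireError {
                name: "org.example.aptostreed.Error.UpdateInProgress".into(),
                message: "Update in progress".into(),
            },
        }
    }
}

/// Step 1: a system operation that fails (simulated).
fn write_repo() -> io::Result<()> {
    Err(io::Error::new(io::ErrorKind::PermissionDenied, "cannot write /ostree/repo"))
}

/// Step 2: the daemon catches the failure and wraps it via `From`.
pub fn daemon_operation() -> Result<(), DaemonError> {
    write_repo()?;
    Ok(())
}

/// Steps 4 and 5: the client formats the received error with a recovery hint.
pub fn format_for_user(err: &WireError) -> String {
    let hint = if err.name.ends_with(".UpdateInProgress") {
        "wait for the running update to finish"
    } else {
        "check the daemon logs with `journalctl -xe`"
    };
    format!("error: {} ({hint})", err.message)
}
```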

2. Transaction Error Handling Flow

Transaction Start → Operation Execution → Error Detection → Rollback → Error Reporting

Detailed Flow:

  1. Transaction Begins with sysroot locking
  2. Operations Execute in sequence
  3. Error Detected during any operation
  4. Automatic Rollback of completed operations
  5. Sysroot Unlocked and resources cleaned up
  6. Error Reported to all connected clients
  7. Transaction Terminated with error state
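
A synchronous sketch of this flow makes the ordering observable: completed steps are undone most recent first, and the lock is released on both the success and the failure path. The `Step` type and the string log are hypothetical devices for the example; real operations would perform OSTree and package work instead of appending to a log.

```rust
/// A transaction step; `fails` simulates an operation error.
pub struct Step {
    pub name: &'static str,
    pub fails: bool,
}

/// Run the steps in order, recording every action in `log`.
pub fn run_transaction(steps: &[Step], log: &mut Vec<String>) -> Result<(), String> {
    log.push("lock".to_string());

    let mut completed: Vec<&Step> = Vec::new();
    let mut result = Ok(());
    for step in steps {
        if step.fails {
            // Error detected: stop executing further steps.
            result = Err(format!("{} failed", step.name));
            break;
        }
        log.push(format!("do {}", step.name));
        completed.push(step);
    }

    if result.is_err() {
        // Undo completed steps in reverse order of completion.
        while let Some(step) = completed.pop() {
            log.push(format!("undo {}", step.name));
        }
    }

    // The lock is released whether the transaction succeeded or not.
    log.push("unlock".to_string());
    result
}
```

With steps `download`, `stage`, and a failing `deploy`, the log reads: lock, do download, do stage, undo stage, undo download, unlock.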

3. Client Error Recovery Flow

Error Received → Context Analysis → Recovery Attempt → Fallback → User Notification

Detailed Flow:

  1. Error Received from daemon via DBus
  2. Context Analyzed (error type, system state)
  3. Recovery Attempted (retry, alternative approach)
  4. Fallback Executed if recovery fails
  5. User Notified of error and recovery status
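
One way to express this recovery flow is a small driver that retries only errors classified as transient, moves to the fallback on a permanent error or when the attempts run out, and reports the outcome for user notification. The `ClientError` and `Outcome` types and the `recover` function are assumptions for this sketch, not part of rpm-ostree.

```rust
/// Client-side view of an error after context analysis.
#[derive(Debug)]
pub enum ClientError {
    /// Worth retrying (for example, the daemon is busy).
    Transient(String),
    /// Retrying will not help (for example, not authorized).
    Permanent(String),
}

/// Outcome reported to the user after the recovery flow finishes.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Succeeded { attempts: u32 },
    UsedFallback,
    Failed(String),
}

/// Try `primary` up to `max_attempts` times, then fall back.
pub fn recover<P, F>(mut primary: P, fallback: F, max_attempts: u32) -> Outcome
where
    P: FnMut() -> Result<(), ClientError>,
    F: FnOnce() -> Result<(), String>,
{
    for attempt in 1..=max_attempts {
        match primary() {
            Ok(()) => return Outcome::Succeeded { attempts: attempt },
            // Transient errors are retried until attempts run out.
            Err(ClientError::Transient(_)) => continue,
            // Permanent errors skip straight to the fallback.
            Err(ClientError::Permanent(_)) => break,
        }
    }
    match fallback() {
        Ok(()) => Outcome::UsedFallback,
        Err(message) => Outcome::Failed(message),
    }
}
```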

📊 Error Handling Responsibility Matrix

| Error Type | CLI Client | Daemon | Notes |
|---|---|---|---|
| Command Parsing | Primary | None | CLI validates user input |
| DBus Communication | Client | Server | Both handle connection errors |
| OSTree Operations | None | Primary | Daemon handles all OSTree errors |
| Package Management | None | Primary | Daemon handles APT/RPM errors |
| Transaction Errors | Display | Management | Daemon manages, CLI displays |
| System Errors | None | Primary | Daemon handles system-level errors |
| User Input Errors | Primary | None | CLI validates before sending |
| Recovery Actions | Primary | Support | CLI guides user, daemon executes |

🚀 apt-ostree Error Handling Implementation Strategy

1. Error Type Definitions

Core Error Types

// daemon/src/errors.rs
use thiserror::Error;

#[derive(Error, Debug)]
pub enum AptOstreeError {
    #[error("Operation failed: {message}")]
    OperationFailed { message: String },
    
    #[error("Invalid sysroot: {path}")]
    InvalidSysroot { path: String },
    
    #[error("Not authorized: {operation}")]
    NotAuthorized { operation: String },
    
    #[error("Update in progress")]
    UpdateInProgress,
    
    #[error("Invalid package reference: {refspec}")]
    InvalidPackageRef { refspec: String },
    
    #[error("Transaction failed: {reason}")]
    TransactionFailed { reason: String },
    
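    // The ostree crate reports failures as glib::Error (re-exported as ostree::glib)
    #[error("OSTree error: {source}")]
    OstreeError { #[from] source: ostree::glib::Error },
    
    // Assumes the APT binding in use exposes an error type at this path
    #[error("APT error: {source}")]
    AptError { #[from] source: apt_pkg_native::Error },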
    
    #[error("System error: {source}")]
    SystemError { #[from] source: std::io::Error },
}

impl AptOstreeError {
    pub fn dbus_error_code(&self) -> &'static str {
        match self {
            Self::OperationFailed { .. } => "org.projectatomic.aptostree.Error.Failed",
            Self::InvalidSysroot { .. } => "org.projectatomic.aptostree.Error.InvalidSysroot",
            Self::NotAuthorized { .. } => "org.projectatomic.aptostree.Error.NotAuthorized",
            Self::UpdateInProgress => "org.projectatomic.aptostree.Error.UpdateInProgress",
            Self::InvalidPackageRef { .. } => "org.projectatomic.aptostree.Error.InvalidPackageRef",
            Self::TransactionFailed { .. } => "org.projectatomic.aptostree.Error.TransactionFailed",
            _ => "org.projectatomic.aptostree.Error.Unknown",
        }
    }
}

DBus Error Integration

// daemon/src/dbus_errors.rs
use zbus::fdo;

pub fn convert_to_dbus_error(error: &AptOstreeError) -> fdo::Error {
    match error {
        AptOstreeError::NotAuthorized { operation } => {
            // zbus exposes org.freedesktop.DBus.Error.AccessDenied as fdo::Error::AccessDenied
            fdo::Error::AccessDenied(format!("Not authorized for: {}", operation))
        }
        AptOstreeError::UpdateInProgress => {
            fdo::Error::Failed("Update operation already in progress".into())
        }
        AptOstreeError::TransactionFailed { reason } => {
            fdo::Error::Failed(format!("Transaction failed: {}", reason))
        }
        _ => {
            fdo::Error::Failed(error.to_string())
        }
    }
}

2. Transaction Error Management

Transaction Error Handling

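// daemon/src/transaction.rs
impl Transaction {
    pub async fn execute(&mut self, daemon: &AptOstreeDaemon) -> Result<(), AptOstreeError> {
        self.state = TransactionState::InProgress;
        
        // Lock sysroot
        self.sysroot_locked = true;
        
        // Clone the list so the loop holds no borrow of `self` while
        // `rollback` needs `&mut self` (assumes the operation type is Clone)
        let operations = self.operations.clone();
        for operation in &operations {
            match self.execute_operation(operation, daemon).await {
                Ok(()) => {
                    // Record the operation so rollback can undo it later
                    self.completed_operations.push(operation.clone());
                    self.emit_progress(operation, 100, "Completed").await;
                }
                Err(error) => {
                    // Operation failed: roll back, but report the original error
                    // even if the rollback itself fails
                    if let Err(rollback_error) = self.rollback().await {
                        tracing::error!("Rollback failed: {}", rollback_error);
                    }
                    return Err(error);
                }
            }
        }
        
        // Release the lock on the success path as well
        self.unlock_sysroot().await?;
        self.sysroot_locked = false;
        self.state = TransactionState::Committed;
        Ok(())
    }
    
    async fn rollback(&mut self) -> Result<(), AptOstreeError> {
        // Undo completed operations, most recent first; remember the first
        // failure but keep going so the sysroot is always unlocked
        let mut first_error = None;
        while let Some(operation) = self.completed_operations.pop() {
            if let Err(error) = self.rollback_operation(&operation).await {
                first_error.get_or_insert(error);
            }
        }
        
        // Unlock sysroot
        if self.sysroot_locked {
            self.unlock_sysroot().await?;
            self.sysroot_locked = false;
        }
        
        match first_error {
            Some(error) => Err(error),
            None => {
                self.state = TransactionState::RolledBack;
                Ok(())
            }
        }
    }
}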

3. Client Error Handling

CLI Error Display

// src/client.rs
use zbus::fdo;

impl AptOstreeClient {
    pub async fn handle_dbus_error(&self, error: &fdo::Error) -> String {
        match error {
            // Matches the AccessDenied error the daemon returns for authorization failures
            fdo::Error::AccessDenied(message) => {
                format!("❌ Permission denied: {}. Try running with sudo.", message)
            }
            fdo::Error::Failed(message) => {
                format!("❌ Operation failed: {}. Check daemon logs for details.", message)
            }
            fdo::Error::InvalidArgs(message) => {
                format!("❌ Invalid arguments: {}. Check command syntax.", message)
            }
            _ => {
                format!("❌ Unexpected error: {}. Please report this issue.", error)
            }
        }
    }
    
    pub async fn install_packages(&self, transaction_id: &str, packages: Vec<String>) -> Result<bool, Error> {
        match self.daemon.install_packages(transaction_id, packages).await {
            Ok(success) => Ok(success),
            Err(error) => {
                let user_message = self.handle_dbus_error(&error).await;
                eprintln!("{}", user_message);
                
                // Provide recovery suggestions
                eprintln!("💡 Recovery suggestions:");
                eprintln!("   • Check if daemon is running: systemctl status apt-ostreed");
                eprintln!("   • Check daemon logs: journalctl -u apt-ostreed -f");
                eprintln!("   • Verify package names: apt search <package>");
                
                Err(error)
            }
        }
    }
}

4. Error Recovery Strategies

Automatic Recovery

// daemon/src/recovery.rs
use std::future::Future;
use std::time::Duration;

pub struct ErrorRecovery {
    max_retries: u32,
    retry_delay: Duration,
}

impl ErrorRecovery {
    pub async fn retry_operation<F, Fut, T, E>(
        &self,
        operation: F,
        operation_name: &str,
    ) -> Result<T, E>
    where
        // `Future` is a trait, so the closure's return type needs its own
        // generic parameter rather than appearing directly after `->`
        F: Fn() -> Fut,
        Fut: Future<Output = Result<T, E>>,
        E: std::error::Error,
    {
        // Always make at least one attempt, even when max_retries is 0
        let max_attempts = self.max_retries.max(1);
        let mut attempts = 0;
        
        loop {
            match operation().await {
                Ok(result) => return Ok(result),
                Err(error) => {
                    attempts += 1;
                    
                    // Out of attempts: return the last error instead of unwrapping
                    if attempts >= max_attempts {
                        return Err(error);
                    }
                    
                    tracing::warn!(
                        "{} failed (attempt {}/{}): {}; retrying in {:?}...",
                        operation_name, attempts, max_attempts, error, self.retry_delay
                    );
                    tokio::time::sleep(self.retry_delay).await;
                }
            }
        }
    }
}

Fallback Operations

// src/fallback.rs
impl AptOstreeClient {
    pub async fn install_packages_with_fallback(
        &self,
        transaction_id: &str,
        packages: &[String],
    ) -> Result<(), Error> {
        // Try daemon first, using the same signature as install_packages in src/client.rs
        match self.install_packages(transaction_id, packages.to_vec()).await {
            Ok(true) => return Ok(()),
            Ok(false) => tracing::warn!("Daemon reported that installation did not succeed"),
            Err(error) => tracing::warn!("Daemon installation failed: {}", error),
        }
        
        // Fallback to direct operations (limited functionality)
        tracing::info!("Using fallback installation mode");
        self.install_packages_direct(packages).await
    }
}

🎯 Key Implementation Principles

1. Error Classification

  • User Errors: Invalid input, permission issues
  • System Errors: OSTree failures, file system issues
  • Network Errors: Package download failures
  • Transaction Errors: Rollback failures, state corruption

2. Error Recovery Priority

  1. Automatic Recovery: Retry operations, rollback transactions
  2. Graceful Degradation: Fallback to limited functionality
  3. User Guidance: Clear error messages with recovery steps
  4. Logging: Comprehensive error logging for debugging

3. User Experience

  • Clear Messages: Error messages explain what went wrong
  • Recovery Steps: Provide specific actions to resolve issues
  • Progress Integration: Errors integrated with progress display
  • Context Preservation: Maintain context across error boundaries

4. System Reliability

  • Transaction Safety: Failed operations don't leave system in bad state
  • Resource Cleanup: Proper cleanup of locked resources
  • Rollback Support: Automatic rollback of failed operations
  • State Consistency: Maintain consistent system state

This error handling analysis provides the foundation for implementing robust error handling in apt-ostree that maintains the reliability and user experience standards established by rpm-ostree while adapting to the Debian/Ubuntu ecosystem.