# 🔍 **rpm-ostree Error Handling Analysis**

## 📋 **Overview**

This document analyzes the error handling patterns in rpm-ostree, examining how errors are managed across the CLI client (`rpm-ostree`) and the system daemon (`rpm-ostreed`). Understanding these patterns is crucial for implementing robust error handling in apt-ostree.

## 🏗️ **Error Handling Architecture Overview**

### **Component Error Handling Distribution**

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Client    │    │   Error Layer   │    │  System Daemon  │
│  (rpm-ostree)   │◄──►│  (GError/DBus)  │◄──►│  (rpm-ostreed)  │
│                 │    │                 │    │                 │
│ • User-facing   │    │ • Error Types   │    │ • System-level  │
│ • Command-line  │    │ • Error Codes   │    │ • Transaction   │
│ • Progress      │    │ • Error Domain  │    │ • OSTree Ops    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### **Error Handling Principles**

1. **Separation of Concerns**: CLI handles user-facing errors, daemon handles system errors
2. **Error Propagation**: Errors flow from daemon to CLI via DBus
3. **Transaction Safety**: Failed operations trigger automatic rollback
4. **User Experience**: Clear error messages with recovery suggestions
5. **Logging Integration**: Comprehensive error logging for debugging

## 🔍 **Detailed Error Handling Analysis**

### **1. Daemon Error Types (`rpmostreed-errors.h`)**

#### **Core Error Definitions**

```cpp
typedef enum
{
  RPM_OSTREED_ERROR_FAILED,             // Generic operation failure
  RPM_OSTREED_ERROR_INVALID_SYSROOT,    // Invalid system root path
  RPM_OSTREED_ERROR_NOT_AUTHORIZED,     // PolicyKit authorization failure
  RPM_OSTREED_ERROR_UPDATE_IN_PROGRESS, // Concurrent update prevention
  RPM_OSTREED_ERROR_INVALID_REFSPEC,    // Invalid OSTree reference
  RPM_OSTREED_ERROR_NUM_ENTRIES,        // Enum size marker
} RpmOstreedError;
```

#### **Error Domain Registration**

```cpp
// From rpmostreed-errors.cxx
static const GDBusErrorEntry dbus_error_entries[] = {
  { RPM_OSTREED_ERROR_FAILED, "org.projectatomic.rpmostreed.Error.Failed" },
  { RPM_OSTREED_ERROR_INVALID_SYSROOT, "org.projectatomic.rpmostreed.Error.InvalidSysroot" },
  { RPM_OSTREED_ERROR_NOT_AUTHORIZED, "org.projectatomic.rpmostreed.Error.NotAuthorized" },
  { RPM_OSTREED_ERROR_UPDATE_IN_PROGRESS, "org.projectatomic.rpmostreed.Error.UpdateInProgress" },
  { RPM_OSTREED_ERROR_INVALID_REFSPEC, "org.projectatomic.rpmostreed.Error.InvalidRefspec" },
};

GQuark
rpmostreed_error_quark (void)
{
  static gsize quark = 0;
  // Maps each enum value to its DBus error name; registration happens once
  g_dbus_error_register_error_domain ("rpmostreed-error-quark", &quark, dbus_error_entries,
                                      G_N_ELEMENTS (dbus_error_entries));
  return (GQuark)quark;
}
```

**Key Characteristics**:

- **DBus Integration**: Each error code is registered with a DBus error name, so a `GError` raised in the daemon reaches clients as a named DBus error
- **Standardized Codes**: Predefined error codes for common failure scenarios
- **Internationalization**: Error messages can be localized
- **Error Quarks**: The `GQuark` returned by `rpmostreed_error_quark` uniquely identifies the error domain

### **2. Transaction Error Handling**

#### **Transaction Lifecycle Error Management**

```cpp
// From rpmostreed-transaction.cxx
struct _RpmostreedTransactionPrivate
{
  GDBusMethodInvocation *invocation; // DBus method context
  gboolean executed;                 // Transaction completion state
  GCancellable *cancellable;         // Cancellation support

  // System state during transaction
  char *sysroot_path;      // Sysroot path
  OstreeSysroot *sysroot;  // OSTree sysroot
  gboolean sysroot_locked; // Sysroot lock state

  // Client tracking
  char *client_description; // Client description
  char *agent_id;           // Client agent ID
  char *sd_unit;            // Systemd unit

  // Progress tracking
  gint64 last_progress_journal; // Progress journal timestamp
  gboolean redirect_output;     // Output redirection flag

  // Peer connections
  GDBusServer *server;          // DBus server
  GHashTable *peer_connections; // Client connections

  // Completion state
  GVariant *finished_params; // Completion parameters
  guint watch_id;            // Watch identifier
};
```

#### **Error Recovery Mechanisms**

```cpp
// Release the sysroot lock when the transaction ends (no-op if not held)
static void
unlock_sysroot (RpmostreedTransaction *self)
{
  RpmostreedTransactionPrivate *priv = rpmostreed_transaction_get_private (self);

  if (!(priv->sysroot && priv->sysroot_locked))
    return;

  ostree_sysroot_unlock (priv->sysroot);
  sd_journal_print (LOG_INFO, "Unlocked sysroot");
  priv->sysroot_locked = FALSE;
}

// Transaction cleanup: emit CLOSED only once the transaction is inactive
// and every peer has disconnected
static void
transaction_maybe_emit_closed (RpmostreedTransaction *self)
{
  RpmostreedTransactionPrivate *priv = rpmostreed_transaction_get_private (self);

  if (rpmostreed_transaction_get_active (self))
    return;

  if (g_hash_table_size (priv->peer_connections) > 0)
    return;

  g_signal_emit (self, signals[CLOSED], 0);
  rpmostreed_sysroot_finish_txn (rpmostreed_sysroot_get (), self);
}
```

**Key Characteristics**:

- **Automatic Rollback**: Failed transactions release the sysroot lock, so later operations are not blocked
- **Resource Cleanup**: Proper cleanup of system resources on failure
- **Signal Emission**: The `CLOSED` signal is emitted once the transaction is inactive and no peer connections remain
- **Journal Integration**: Lock state changes and errors are logged to the systemd journal

### **3. CLI Client Error Handling**

#### **DBus Error Handling Patterns**

```cpp
// From rpmostree-clientlib.cxx
static void
on_owner_changed (GObject *object, GParamSpec *pspec, gpointer user_data)
{
  auto tp = static_cast<TransactionProgress *> (user_data);
  tp->error = g_dbus_error_new_for_dbus_error (
      "org.projectatomic.rpmostreed.Error.Failed",
      "Bus owner changed, aborting. This likely means the daemon crashed; "
      "check logs with `journalctl -xe`.");
  transaction_progress_end (tp);
}

// Transaction connection error handling
static RPMOSTreeTransaction *
transaction_connect (const char *transaction_address, GCancellable *cancellable, GError **error)
{
  // Prefixes any error set below with this context string
  GLNX_AUTO_PREFIX_ERROR ("Failed to connect to client transaction", error);

  g_autoptr (GDBusConnection) peer_connection = g_dbus_connection_new_for_address_sync (
      transaction_address, G_DBUS_CONNECTION_FLAGS_AUTHENTICATION_CLIENT, NULL, cancellable, error);
  if (peer_connection == NULL)
    return NULL;

  return rpmostree_transaction_proxy_new_sync (peer_connection, G_DBUS_PROXY_FLAGS_NONE, NULL, "/",
                                               cancellable, error);
}
```

#### **User-Facing Error Display**

```cpp
// Progress and error display
static void
transaction_progress_signal_handler (GDBusConnection *connection, const char *sender_name,
                                     const char *object_path, const char *interface_name,
                                     const char *signal_name, GVariant *parameters,
                                     gpointer user_data)
{
  auto tp = static_cast<TransactionProgress *> (user_data);

  if (g_strcmp0 (signal_name, "Message") == 0)
    {
      const char *message;
      g_variant_get (parameters, "(&s)", &message);
      if (!tp->progress)
        {
          tp->progress = TRUE;
          rpmostreecxx::console_progress_begin_task (message);
        }
      else
        {
          rpmostreecxx::console_progress_set_message (message);
        }
    }
  else if (g_strcmp0 (signal_name, "PercentProgress") == 0)
    {
      guint percentage;
      const char *message;
      g_variant_get (parameters, "(u&s)", &percentage, &message);
      if (!tp->progress)
        {
          tp->progress = TRUE;
          rpmostreecxx::console_progress_begin_percent (message);
        }
      rpmostreecxx::console_progress_update (percentage);
    }
}
```

**Key Characteristics**:

- **Error Context**: Errors include helpful context and recovery suggestions
- **Progress Integration**: Error messages integrated with progress display
- **User Guidance**: Clear instructions for troubleshooting (e.g., `journalctl -xe`)
- **Graceful Degradation**: Client continues operation when possible

### **4. Rust Error Handling Integration**

#### **Error Type Definitions**

```rust
// From rust/src/lib.rs
/// APIs defined here are automatically bridged between Rust and C++ using https://cxx.rs/
///
/// # Error handling
///
/// For fallible APIs that return a `Result<T>`:
///
/// - Use `Result<T>` inside `lib.rs` below
/// - On the Rust *implementation* side, use `CxxResult<T>` which does error
///   formatting in a more preferred way
/// - On the C++ side, use our custom `CXX_TRY` API which converts the C++ exception
///   into a GError. In the future, we might try a hard switch to C++ exceptions
///   instead, but at the moment having two is problematic, so we prefer `GError`.
```

#### **System Host Type Validation**

```rust
// From rust/src/client.rs
/// Return an error if the current system host type does not match expected.
pub(crate) fn require_system_host_type(expected: SystemHostType) -> CxxResult<()> {
    let current = get_system_host_type()?;
    if current != expected {
        let expected = system_host_type_str(&expected);
        let current = system_host_type_str(&current);
        return Err(format!(
            "This command requires an {expected} system; found: {current}"
        )
        .into());
    }
    Ok(())
}

/// Classify the running system.
#[derive(Clone, Debug)]
pub(crate) enum SystemHostType {
    OstreeContainer,
    OstreeHost,
    Unknown,
}
```

**Key Characteristics**:

- **Hybrid Approach**: Rust `Result` bridged to C++ `GError`
- **Type Safety**: Rust enums for error classification
- **Context Preservation**: Error messages include system context
- **Bridging**: `CxxResult` for Rust-C++ boundary

## 🔄 **Error Flow Patterns**

### **1. Error Propagation Flow**

```
System Error → Daemon → DBus Error → CLI Client → User Display
```

**Detailed Flow**:

1. **System Operation Fails** (e.g., OSTree operation, file permission)
2. **Daemon Catches Error** and creates appropriate error code
3. **DBus Error Sent** to connected clients with error details
4. **CLI Client Receives Error** and formats for user display
5. **User Sees Error** with context and recovery suggestions

### **2. Transaction Error Handling Flow**

```
Transaction Start → Operation Execution → Error Detection → Rollback → Error Reporting
```

**Detailed Flow**:

1. **Transaction Begins** with sysroot locking
2. **Operations Execute** in sequence
3. **Error Detected** during any operation
4. **Automatic Rollback** of completed operations
5. **Sysroot Unlocked** and resources cleaned up
6. **Error Reported** to all connected clients
7. **Transaction Terminated** with error state

### **3. Client Error Recovery Flow**

```
Error Received → Context Analysis → Recovery Attempt → Fallback → User Notification
```

**Detailed Flow**:

1. **Error Received** from daemon via DBus
2. **Context Analyzed** (error type, system state)
3. **Recovery Attempted** (retry, alternative approach)
4. **Fallback Executed** if recovery fails
5. **User Notified** of error and recovery status

## 📊 **Error Handling Responsibility Matrix**

| Error Type | CLI Client | Daemon | Notes |
|------------|------------|--------|-------|
| **Command Parsing** | ✅ Primary | ❌ None | CLI validates user input |
| **DBus Communication** | ✅ Client | ✅ Server | Both handle connection errors |
| **OSTree Operations** | ❌ None | ✅ Primary | Daemon handles all OSTree errors |
| **Package Management** | ❌ None | ✅ Primary | Daemon handles APT/RPM errors |
| **Transaction Errors** | ✅ Display | ✅ Management | Daemon manages, CLI displays |
| **System Errors** | ❌ None | ✅ Primary | Daemon handles system-level errors |
| **User Input Errors** | ✅ Primary | ❌ None | CLI validates before sending |
| **Recovery Actions** | ✅ Primary | ✅ Support | CLI guides user, daemon executes |

## 🚀 **apt-ostree Error Handling Implementation Strategy**

### **1. Error Type Definitions**

#### **Core Error Types**

```rust
// daemon/src/errors.rs
use thiserror::Error;

#[derive(Error, Debug)]
pub enum AptOstreeError {
    #[error("Operation failed: {message}")]
    OperationFailed { message: String },

    #[error("Invalid sysroot: {path}")]
    InvalidSysroot { path: String },

    #[error("Not authorized: {operation}")]
    NotAuthorized { operation: String },

    #[error("Update in progress")]
    UpdateInProgress,

    #[error("Invalid package reference: {refspec}")]
    InvalidPackageRef { refspec: String },

    #[error("Transaction failed: {reason}")]
    TransactionFailed { reason: String },

    #[error("OSTree error: {source}")]
    OstreeError {
        #[from]
        source: ostree::Error,
    },

    #[error("APT error: {source}")]
    AptError {
        #[from]
        source: apt_pkg_native::Error,
    },

    #[error("System error: {source}")]
    SystemError {
        #[from]
        source: std::io::Error,
    },
}

impl AptOstreeError {
    /// DBus error name reported to clients for this error.
    pub fn dbus_error_code(&self) -> &'static str {
        match self {
            Self::OperationFailed { .. } => "org.projectatomic.aptostree.Error.Failed",
            Self::InvalidSysroot { .. } => "org.projectatomic.aptostree.Error.InvalidSysroot",
            Self::NotAuthorized { .. } => "org.projectatomic.aptostree.Error.NotAuthorized",
            Self::UpdateInProgress => "org.projectatomic.aptostree.Error.UpdateInProgress",
            Self::InvalidPackageRef { .. } => "org.projectatomic.aptostree.Error.InvalidPackageRef",
            Self::TransactionFailed { .. } => "org.projectatomic.aptostree.Error.TransactionFailed",
            // Wrapped OSTree, APT, and I/O errors have no dedicated DBus name
            _ => "org.projectatomic.aptostree.Error.Unknown",
        }
    }
}
```

#### **DBus Error Integration**

```rust
// daemon/src/dbus_errors.rs
use zbus::fdo;

use crate::errors::AptOstreeError;

pub fn convert_to_dbus_error(error: &AptOstreeError) -> fdo::Error {
    match error {
        // `AccessDenied` is the standard DBus name for authorization failures
        AptOstreeError::NotAuthorized { operation } => {
            fdo::Error::AccessDenied(format!("Not authorized for: {}", operation))
        }
        AptOstreeError::UpdateInProgress => {
            fdo::Error::Failed("Update operation already in progress".into())
        }
        AptOstreeError::TransactionFailed { reason } => {
            fdo::Error::Failed(format!("Transaction failed: {}", reason))
        }
        _ => fdo::Error::Failed(error.to_string()),
    }
}
```

### **2. Transaction Error Management**

#### **Transaction Error Handling**

```rust
// daemon/src/transaction.rs
impl Transaction {
    pub async fn execute(&mut self, daemon: &AptOstreeDaemon) -> Result<(), AptOstreeError> {
        self.state = TransactionState::InProgress;

        // Lock sysroot
        self.sysroot_locked = true;

        // Clone the list so `self` can be borrowed mutably inside the loop
        // (assumes the operation type implements `Clone`)
        let operations = self.operations.clone();
        for operation in &operations {
            match self.execute_operation(operation, daemon).await {
                Ok(()) => {
                    // Record the operation so rollback knows what to undo
                    self.completed_operations.push(operation.clone());
                    self.emit_progress(operation, 100, "Completed").await;
                }
                Err(error) => {
                    // Roll back, but always report the original failure
                    if let Err(rollback_error) = self.rollback().await {
                        tracing::error!("Rollback failed: {}", rollback_error);
                    }
                    return Err(error);
                }
            }
        }

        self.state = TransactionState::Committed;
        self.unlock_sysroot().await?;
        self.sysroot_locked = false;
        Ok(())
    }

    async fn rollback(&mut self) -> Result<(), AptOstreeError> {
        // Undo completed operations in reverse order
        while let Some(operation) = self.completed_operations.pop() {
            self.rollback_operation(&operation).await?;
        }

        // Unlock sysroot
        if self.sysroot_locked {
            self.unlock_sysroot().await?;
            self.sysroot_locked = false;
        }

        self.state = TransactionState::RolledBack;
        Ok(())
    }
}
```

### **3. Client Error Handling**

#### **CLI Error Display**

```rust
// src/client.rs
use zbus::fdo;

impl AptOstreeClient {
    pub async fn handle_dbus_error(&self, error: &fdo::Error) -> String {
        match error {
            fdo::Error::AccessDenied(message) => {
                format!("❌ Permission denied: {}. Try running with sudo.", message)
            }
            fdo::Error::Failed(message) => {
                format!("❌ Operation failed: {}. Check daemon logs for details.", message)
            }
            fdo::Error::InvalidArgs(message) => {
                format!("❌ Invalid arguments: {}. Check command syntax.", message)
            }
            _ => {
                format!("❌ Unexpected error: {}. Please report this issue.", error)
            }
        }
    }

    pub async fn install_packages(
        &self,
        transaction_id: &str,
        packages: Vec<String>,
    ) -> Result<bool, fdo::Error> {
        match self.daemon.install_packages(transaction_id, packages).await {
            Ok(success) => Ok(success),
            Err(error) => {
                let user_message = self.handle_dbus_error(&error).await;
                eprintln!("{}", user_message);

                // Provide recovery suggestions
                eprintln!("💡 Recovery suggestions:");
                eprintln!("   • Check if daemon is running: systemctl status apt-ostreed");
                eprintln!("   • Check daemon logs: journalctl -u apt-ostreed -f");
                eprintln!("   • Verify package names: apt search <package>");

                Err(error)
            }
        }
    }
}
```

### **4. Error Recovery Strategies**

#### **Automatic Recovery**

```rust
// daemon/src/recovery.rs
use std::future::Future;
use std::time::Duration;

pub struct ErrorRecovery {
    max_retries: u32,
    retry_delay: Duration,
}

impl ErrorRecovery {
    /// Run `operation` until it succeeds or `max_retries` attempts have failed.
    /// The operation always runs at least once.
    pub async fn retry_operation<F, Fut, T, E>(
        &self,
        operation: F,
        operation_name: &str,
    ) -> Result<T, E>
    where
        F: Fn() -> Fut + Send + Sync,
        Fut: Future<Output = Result<T, E>>,
        E: std::error::Error + Send + Sync + 'static,
    {
        let mut attempts = 0;

        loop {
            match operation().await {
                Ok(result) => return Ok(result),
                Err(error) => {
                    attempts += 1;
                    if attempts >= self.max_retries {
                        // Out of attempts: surface the most recent error
                        return Err(error);
                    }

                    tracing::warn!(
                        "{} failed (attempt {}/{}), retrying in {:?}...",
                        operation_name,
                        attempts,
                        self.max_retries,
                        self.retry_delay
                    );
                    tokio::time::sleep(self.retry_delay).await;
                }
            }
        }
    }
}
```

#### **Fallback Operations**

```rust
// src/fallback.rs
impl AptOstreeClient {
    pub async fn install_packages_with_fallback(&self, packages: &[String]) -> Result<(), Error> {
        // Try daemon first
        if let Ok(client) = AptOstreeClient::new().await {
            // Simplified call: the transaction ID argument is omitted here
            match client.install_packages(packages).await {
                Ok(_) => return Ok(()),
                Err(error) => {
                    tracing::warn!("Daemon installation failed: {}", error);
                    // Fall through to fallback
                }
            }
        }

        // Fallback to direct operations (limited functionality)
        tracing::info!("Using fallback installation mode");
        self.install_packages_direct(packages).await
    }
}
```

## 🎯 **Key Implementation Principles**

### **1. Error Classification**

- **User Errors**: Invalid input, permission issues
- **System Errors**: OSTree failures, file system issues
- **Network Errors**: Package download failures
- **Transaction Errors**: Rollback failures, state corruption

### **2. Error Recovery Priority**

1. **Automatic Recovery**: Retry operations, rollback transactions
2. **Graceful Degradation**: Fallback to limited functionality
3. **User Guidance**: Clear error messages with recovery steps
4. **Logging**: Comprehensive error logging for debugging

### **3. User Experience**

- **Clear Messages**: Error messages explain what went wrong
- **Recovery Steps**: Provide specific actions to resolve issues
- **Progress Integration**: Errors integrated with progress display
- **Context Preservation**: Maintain context across error boundaries

### **4. System Reliability**

- **Transaction Safety**: Failed operations don't leave system in bad state
- **Resource Cleanup**: Proper cleanup of locked resources
- **Rollback Support**: Automatic rollback of failed operations
- **State Consistency**: Maintain consistent system state

This analysis provides the foundation for robust error handling in apt-ostree. The goal is to keep the reliability and user experience standards established by rpm-ostree while adapting them to the Debian/Ubuntu ecosystem.
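
#### **Sketch: Classification and Recovery Priority**

The error classes and the recovery priority order listed under Key Implementation Principles could be encoded roughly as follows. This is a minimal sketch, not part of rpm-ostree or apt-ostree: `ErrorClass`, `RecoveryStep`, `classify`, and `recovery_plan` are hypothetical names, and the mapping from the proposed `org.projectatomic.aptostree.Error.*` names to classes is an assumption made for illustration.

```rust
/// Broad error classes from the "Error Classification" list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorClass {
    User,
    System,
    Network,
    Transaction,
}

/// Recovery steps, named after the "Error Recovery Priority" list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryStep {
    Retry,
    Rollback,
    Fallback,
    GuideUser,
    Log,
}

/// Hypothetical mapping from a DBus error name to an error class.
/// Unrecognized names are treated as system errors.
pub fn classify(dbus_error_name: &str) -> ErrorClass {
    // Only the final component differs between the proposed error names.
    match dbus_error_name.rsplit('.').next().unwrap_or("") {
        "NotAuthorized" | "InvalidPackageRef" => ErrorClass::User,
        "TransactionFailed" | "UpdateInProgress" => ErrorClass::Transaction,
        _ => ErrorClass::System,
    }
}

/// Ordered recovery steps for a class: automatic recovery first, then
/// degradation, then user guidance, with logging always last.
pub fn recovery_plan(class: ErrorClass) -> Vec<RecoveryStep> {
    use RecoveryStep::*;
    let mut plan = match class {
        ErrorClass::User => vec![GuideUser],
        ErrorClass::Network => vec![Retry, Fallback, GuideUser],
        ErrorClass::System | ErrorClass::Transaction => vec![Rollback, GuideUser],
    };
    plan.push(Log);
    plan
}
```

None of the proposed DBus names identifies a network failure, so in this sketch `ErrorClass::Network` would have to be assigned where the download error occurs rather than derived from the error name.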