# APT-OSTree Monitoring and Logging

## Overview

APT-OSTree includes a monitoring and logging system that provides:

- **Structured Logging**: JSON-formatted logs with timestamps and context
- **Metrics Collection**: System, performance, and transaction metrics
- **Health Checks**: Automated health monitoring of system components
- **Real-time Monitoring**: Background service for continuous monitoring
- **Export Capabilities**: Metrics export in JSON format

## Architecture

### Components

1. **Monitoring Manager** (`src/monitoring.rs`)
   - Core monitoring functionality
   - Metrics collection and storage
   - Health check execution
   - Performance monitoring

2. **Monitoring Service** (`src/bin/monitoring-service.rs`)
   - Background service for continuous monitoring
   - Automated metrics collection
   - Health check scheduling
   - Metrics export

3. **CLI Integration** (`src/main.rs`)
   - Monitoring commands
   - Real-time status display
   - Metrics export

### Data Flow

```
┌─────────────────┐    ┌────────────────────┐    ┌─────────────────┐
│  CLI Commands   │───▶│ Monitoring Manager │───▶│ Metrics Storage │
└─────────────────┘    └────────────────────┘    └─────────────────┘
                                 │
                                 ▼
                       ┌────────────────────┐
                       │ Monitoring Service │
                       └────────────────────┘
                                 │
                                 ▼
                       ┌────────────────────┐
                       │   Health Checks    │
                       └────────────────────┘
```

## Configuration

### Monitoring Configuration

```rust
pub struct MonitoringConfig {
    pub log_level: String,                   // "trace", "debug", "info", "warn", "error"
    pub log_file: Option<String>,            // Optional log file path
    pub structured_logging: bool,            // Enable JSON logging
    pub enable_metrics: bool,                // Enable metrics collection
    pub metrics_interval: u64,               // Metrics collection interval (seconds)
    pub enable_health_checks: bool,          // Enable health checks
    pub health_check_interval: u64,          // Health check interval (seconds)
    pub enable_performance_monitoring: bool, // Enable performance monitoring
    pub enable_transaction_monitoring: bool, // Enable transaction monitoring
    pub enable_system_monitoring: bool,      // Enable system resource monitoring
}
```

### Environment Variables

```bash
# Log level
export RUST_LOG=info

# Monitoring configuration
export APT_OSTREE_MONITORING_ENABLED=1
export APT_OSTREE_METRICS_INTERVAL=60
export APT_OSTREE_HEALTH_CHECK_INTERVAL=300
```

## Usage

### CLI Commands

#### Show Monitoring Status

```bash
# Show general monitoring status
apt-ostree monitoring

# Export metrics as JSON
apt-ostree monitoring --export

# Run health checks
apt-ostree monitoring --health

# Show performance metrics
apt-ostree monitoring --performance
```

#### Monitoring Service

```bash
# Start monitoring service
apt-ostree-monitoring start

# Stop monitoring service
apt-ostree-monitoring stop

# Show service status
apt-ostree-monitoring status

# Run health check cycle
apt-ostree-monitoring health-check

# Export metrics
apt-ostree-monitoring export-metrics
```

### Systemd Service

```bash
# Enable and start monitoring service
sudo systemctl enable apt-ostree-monitoring
sudo systemctl start apt-ostree-monitoring

# Check service status
sudo systemctl status apt-ostree-monitoring

# View service logs
sudo journalctl -u apt-ostree-monitoring -f

# Stop service
sudo systemctl stop apt-ostree-monitoring
```

## Metrics

### System Metrics

```json
{
  "timestamp": "2024-12-19T10:30:00Z",
  "cpu_usage": 15.5,
  "memory_usage": 8589934592,
  "total_memory": 17179869184,
  "disk_usage": 107374182400,
  "total_disk": 1073741824000,
  "active_transactions": 0,
  "pending_deployments": 1,
  "ostree_repo_size": 5368709120,
  "apt_cache_size": 1073741824,
  "uptime": 86400,
  "load_average": [1.2, 1.1, 1.0]
}
```

### Performance Metrics

```json
{
  "timestamp": "2024-12-19T10:30:00Z",
  "operation_type": "package_installation",
  "duration_ms": 1500,
  "success": true,
  "error_message": null,
  "context": {
    "packages_count": "5",
    "total_size": "52428800"
  }
}
```

### Transaction Metrics

```json
{
  "transaction_id": "tx-12345",
  "transaction_type": "install",
  "start_time": "2024-12-19T10:25:00Z",
  "end_time": "2024-12-19T10:26:30Z",
  "duration_ms": 90000,
  "success": true,
  "error_message": null,
  "packages_count": 5,
  "packages_size": 52428800,
  "progress": 1.0
}
```

## Health Checks

### Available Health Checks

1. **OSTree Repository Health**
   - Repository integrity
   - Commit accessibility
   - Storage space

2. **APT Database Health**
   - Database integrity
   - Package cache status
   - Repository connectivity

3. **System Resources**
   - Memory availability
   - Disk space
   - CPU usage

4. **Daemon Health**
   - Service status
   - D-Bus connectivity
   - Authentication

### Health Check Results

```json
{
  "check_name": "ostree_repository",
  "status": "healthy",
  "message": "OSTree repository is healthy",
  "timestamp": "2024-12-19T10:30:00Z",
  "duration_ms": 150,
  "details": {
    "repo_size": "5368709120",
    "commit_count": "1250",
    "integrity_check": "passed"
  }
}
```

## Logging

### Log Levels

- **TRACE**: Detailed debugging information
- **DEBUG**: Debugging information
- **INFO**: General information
- **WARN**: Warning messages
- **ERROR**: Error messages

### Log Format

#### Standard Format

```
2024-12-19T10:30:00.123Z INFO apt_ostree::monitoring: Health check passed: ostree_repository
```

#### JSON Format (Structured Logging)

```json
{
  "timestamp": "2024-12-19T10:30:00.123Z",
  "level": "INFO",
  "target": "apt_ostree::monitoring",
  "message": "Health check passed: ostree_repository",
  "fields": {
    "check_name": "ostree_repository",
    "duration_ms": 150
  }
}
```

### Log Configuration

```bash
# Set log level
export RUST_LOG=info

# Enable structured logging
export APT_OSTREE_STRUCTURED_LOGGING=1

# Log to file
export APT_OSTREE_LOG_FILE=/var/log/apt-ostree/app.log
```

## Performance Monitoring

### Performance Wrappers

```rust
use std::collections::HashMap;

use apt_ostree::monitoring::PerformanceMonitor;

// Context is attached to the recorded metric as string key/value pairs
let mut context = HashMap::new();
context.insert("packages_count".to_string(), "5".to_string());

// Monitor an operation (`monitoring_manager` is an Arc<MonitoringManager>)
let monitor = PerformanceMonitor::new(
    monitoring_manager.clone(),
    "package_installation",
    context,
);

// Record success...
monitor.success().await?;

// ...or record failure instead. Both calls consume the monitor,
// so only one of them can run for a given monitor.
// monitor.failure("Package not found".to_string()).await?;
```

### Transaction Monitoring

```rust
use apt_ostree::monitoring::TransactionMonitor;

// Start transaction monitoring
let monitor = TransactionMonitor::new(
    monitoring_manager.clone(),
    "tx-12345",  // transaction ID
    "install",   // transaction type
    5,           // package count
    52428800,    // total package size in bytes
);

// Update progress (0.0 to 1.0)
monitor.update_progress(0.5).await?;

// Complete transaction
monitor.success().await?;
```

## Integration

### With Package Manager

The monitoring system integrates with the package manager to track:

- Package installation/removal operations
- Transaction progress
- Performance metrics
- Error tracking

### With OSTree Manager

Integration with the OSTree manager provides:

- Commit metadata extraction
- Repository health monitoring
- Deployment tracking
- Rollback monitoring

### With Daemon

The monitoring system works with the daemon to provide:

- Service health monitoring
- D-Bus communication tracking
- Authentication monitoring
- Transaction state tracking

## Troubleshooting

### Common Issues

#### Monitoring Service Not Starting

```bash
# Check service status
sudo systemctl status apt-ostree-monitoring

# Check logs
sudo journalctl -u apt-ostree-monitoring -f

# Check permissions
ls -la /usr/bin/apt-ostree-monitoring
ls -la /var/log/apt-ostree/
```

#### Metrics Not Being Collected

```bash
# Check monitoring configuration
apt-ostree monitoring --export

# Verify service is running
sudo systemctl is-active apt-ostree-monitoring

# Check metrics file
cat /var/log/apt-ostree/metrics.json
```

#### Health Checks Failing

```bash
# Run health checks manually
apt-ostree monitoring --health

# Check specific components
apt-ostree status
ostree log debian/stable/x86_64
```

### Debug Mode

```bash
# Enable debug logging
export RUST_LOG=debug

# Run with debug output
apt-ostree-monitoring start

# Check debug logs (messages at debug priority and above)
sudo journalctl -u apt-ostree-monitoring -p debug
```

## Best Practices

### Production Deployment

1. **Enable Structured Logging**
   ```bash
   export APT_OSTREE_STRUCTURED_LOGGING=1
   ```

2. **Configure Log Rotation**
   ```bash
   # Add to /etc/logrotate.d/apt-ostree
   /var/log/apt-ostree/*.log {
       daily
       rotate 30
       compress
       delaycompress
       missingok
       notifempty
   }
   ```

3. **Monitor Metrics Storage**
   ```bash
   # Check metrics file size
   du -sh /var/log/apt-ostree/metrics.json

   # Archive old metrics
   mv /var/log/apt-ostree/metrics.json /var/log/apt-ostree/metrics-$(date +%Y%m%d).json
   ```

4. **Set Up Alerts**
   ```bash
   # Monitor health check failures
   journalctl -u apt-ostree-monitoring | grep "CRITICAL"

   # Monitor high resource usage
   apt-ostree monitoring --performance | grep -E "(cpu_usage|memory_usage)"
   ```

### Development

1. **Use Performance Monitoring**
   ```rust
   let monitor = PerformanceMonitor::new(manager, "operation", context);
   // ... perform operation ...
   monitor.success().await?;
   ```

2. **Add Health Checks**
   ```rust
   // Add custom health checks
   async fn check_custom_component(&self) -> HealthCheckResult {
       // Implementation
   }
   ```

3. **Monitor Transactions**
   ```rust
   // `type` is a reserved word in Rust, so the argument is named `transaction_type`
   let monitor = TransactionMonitor::new(manager, id, transaction_type, count, size);
   // ... perform transaction ...
   monitor.success().await?;
   ```

## API Reference

### MonitoringManager

```rust
impl MonitoringManager {
    pub fn new(config: MonitoringConfig) -> AptOstreeResult<Self>
    pub fn init_logging(&self) -> AptOstreeResult<()>
    pub async fn record_system_metrics(&self) -> AptOstreeResult<()>
    pub async fn record_performance_metrics(&self, ...) -> AptOstreeResult<()>
    pub async fn start_transaction_monitoring(&self, ...) -> AptOstreeResult<()>
    pub async fn run_health_checks(&self) -> AptOstreeResult<Vec<HealthCheckResult>>
    pub async fn get_statistics(&self) -> AptOstreeResult
    pub async fn export_metrics(&self) -> AptOstreeResult<String>
}
```

### PerformanceMonitor

```rust
impl PerformanceMonitor {
    pub fn new(manager: Arc<MonitoringManager>, operation: &str, context: HashMap<String, String>) -> Self
    pub async fn success(self) -> AptOstreeResult<()>
    pub async fn failure(self, error_message: String) -> AptOstreeResult<()>
}
```

### TransactionMonitor

```rust
impl TransactionMonitor {
    pub fn new(manager: Arc<MonitoringManager>, id: &str, transaction_type: &str, count: u32, size: u64) -> Self
    pub async fn update_progress(&self, progress: f64) -> AptOstreeResult<()>
    pub async fn success(self) -> AptOstreeResult<()>
    pub async fn failure(self, error_message: String) -> AptOstreeResult<()>
}
```

## Conclusion

The APT-OSTree monitoring and logging system provides visibility into system operations, performance, and health. It supports proactive monitoring, troubleshooting, and optimization of the APT-OSTree system.

For more information, see:

- [System Administration Guide](system-admin.md)
- [Troubleshooting Guide](troubleshooting.md)
- [API Documentation](api.md)
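
As a companion to the environment variables listed under Configuration, the following is a minimal sketch of how the interval variables could be read with fallbacks. It is not the project's actual loader: the `EnvSettings` struct and `settings_from` function are hypothetical names, the defaults of 60 and 300 seconds are taken from the example values in this document, and treating only `1` as "enabled" is an assumption.

```rust
use std::env;

/// Hypothetical subset of the monitoring settings, for illustration only.
#[derive(Debug, PartialEq)]
struct EnvSettings {
    monitoring_enabled: bool,
    metrics_interval: u64,
    health_check_interval: u64,
}

/// Build settings from any key -> value lookup, so the parsing can be
/// exercised without touching the real process environment.
fn settings_from<F>(lookup: F) -> EnvSettings
where
    F: Fn(&str) -> Option<String>,
{
    // Parse a seconds value, falling back to `default` when the variable
    // is unset or is not a valid unsigned integer.
    let seconds = |key: &str, default: u64| {
        lookup(key)
            .and_then(|value| value.trim().parse::<u64>().ok())
            .unwrap_or(default)
    };

    EnvSettings {
        // Assumption: "1" means enabled, anything else means disabled.
        monitoring_enabled: lookup("APT_OSTREE_MONITORING_ENABLED").as_deref() == Some("1"),
        metrics_interval: seconds("APT_OSTREE_METRICS_INTERVAL", 60),
        health_check_interval: seconds("APT_OSTREE_HEALTH_CHECK_INTERVAL", 300),
    }
}

fn main() {
    // Read from the real environment of the current process.
    let settings = settings_from(|key| env::var(key).ok());
    println!("{settings:?}");
}
```

Passing the lookup as a closure keeps the parsing logic independent of the process environment, which makes malformed or missing values easy to try out in isolation.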
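
To make the fields of the Performance Metrics record more concrete, here is a standard-library-only sketch of how `duration_ms`, `success`, and `error_message` could be derived by timing a fallible operation. The `PerformanceSample` struct and `measure` function are hypothetical and exist only for this illustration; the real `PerformanceMonitor` is asynchronous and reports to the `MonitoringManager`, which this sketch does not attempt to reproduce.

```rust
use std::time::Instant;

/// Hypothetical in-memory record mirroring some fields of the
/// Performance Metrics JSON shown earlier in this document.
#[derive(Debug)]
struct PerformanceSample {
    operation_type: String,
    duration_ms: u128,
    success: bool,
    error_message: Option<String>,
}

/// Run `op`, measure its wall-clock duration, and describe the outcome.
/// The operation's own result is returned unchanged next to the sample.
fn measure<T, E, F>(operation_type: &str, op: F) -> (Result<T, E>, PerformanceSample)
where
    E: ToString,
    F: FnOnce() -> Result<T, E>,
{
    let started = Instant::now();
    let result = op();
    let duration_ms = started.elapsed().as_millis();

    let sample = PerformanceSample {
        operation_type: operation_type.to_string(),
        duration_ms,
        success: result.is_ok(),
        // Only failed operations carry an error message.
        error_message: result.as_ref().err().map(|e| e.to_string()),
    };
    (result, sample)
}

fn main() {
    let (result, sample) = measure("package_installation", || -> Result<u32, String> {
        // Stand-in for real work: pretend five packages were installed.
        Ok(5)
    });
    println!("{result:?} {sample:?}");
}
```

Returning the sample alongside the original result means the caller can still propagate errors with `?` after the measurement has been recorded.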