apt-ostree/docs/monitoring.md
# APT-OSTree Monitoring and Logging
## Overview
APT-OSTree includes a comprehensive monitoring and logging system that provides:
- **Structured Logging**: JSON-formatted logs with timestamps and context
- **Metrics Collection**: System, performance, and transaction metrics
- **Health Checks**: Automated health monitoring of system components
- **Real-time Monitoring**: Background service for continuous monitoring
- **Export Capabilities**: Metrics export in JSON format
## Architecture
### Components
1. **Monitoring Manager** (`src/monitoring.rs`)
   - Core monitoring functionality
   - Metrics collection and storage
   - Health check execution
   - Performance monitoring
2. **Monitoring Service** (`src/bin/monitoring-service.rs`)
   - Background service for continuous monitoring
   - Automated metrics collection
   - Health check scheduling
   - Metrics export
3. **CLI Integration** (`src/main.rs`)
   - Monitoring commands
   - Real-time status display
   - Metrics export
### Data Flow
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ CLI Commands │───▶│ Monitoring Manager│───▶│ Metrics Storage │
└─────────────────┘ └──────────────────┘ └─────────────────┘
                    ┌──────────────────┐
                    │ Monitoring Service│
                    └──────────────────┘
                    ┌──────────────────┐
                    │ Health Checks    │
                    └──────────────────┘
```
## Configuration
### Monitoring Configuration
```rust
pub struct MonitoringConfig {
    pub log_level: String,                   // "trace", "debug", "info", "warn", "error"
    pub log_file: Option<String>,            // Optional log file path
    pub structured_logging: bool,            // Enable JSON logging
    pub enable_metrics: bool,                // Enable metrics collection
    pub metrics_interval: u64,               // Metrics collection interval (seconds)
    pub enable_health_checks: bool,          // Enable health checks
    pub health_check_interval: u64,          // Health check interval (seconds)
    pub enable_performance_monitoring: bool, // Enable performance monitoring
    pub enable_transaction_monitoring: bool, // Enable transaction monitoring
    pub enable_system_monitoring: bool,      // Enable system resource monitoring
}
```
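As a rough illustration of how `metrics_interval` and `health_check_interval` could drive a background loop, the sketch below decides which tasks are due at a given tick. The `Task` enum and `due_tasks` function are hypothetical names introduced here; the actual scheduling in `src/bin/monitoring-service.rs` may work differently.

```rust
/// Hypothetical task kinds a monitoring loop might dispatch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Task {
    CollectMetrics,
    RunHealthChecks,
}

/// Returns the tasks that are due `elapsed_secs` seconds after startup.
/// An interval of 0 is treated as "disabled" (an assumption of this sketch).
pub fn due_tasks(
    elapsed_secs: u64,
    metrics_interval: u64,
    health_check_interval: u64,
) -> Vec<Task> {
    let mut due = Vec::new();
    if metrics_interval > 0 && elapsed_secs % metrics_interval == 0 {
        due.push(Task::CollectMetrics);
    }
    if health_check_interval > 0 && elapsed_secs % health_check_interval == 0 {
        due.push(Task::RunHealthChecks);
    }
    due
}
```

With intervals of 60 and 300 seconds, a tick at 60 seconds would collect metrics only, while a tick at 300 seconds would do both.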
### Environment Variables
```bash
# Log level
export RUST_LOG=info
# Monitoring configuration
export APT_OSTREE_MONITORING_ENABLED=1
export APT_OSTREE_METRICS_INTERVAL=60
export APT_OSTREE_HEALTH_CHECK_INTERVAL=300
```
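One way these variables could be parsed is sketched below. The `Intervals` struct and both functions are hypothetical, and the fallback values of 60 and 300 seconds are simply the example values shown above, not confirmed defaults of APT-OSTree.

```rust
use std::time::Duration;

/// Hypothetical subset of the monitoring settings.
#[derive(Debug, PartialEq)]
pub struct Intervals {
    pub enabled: bool,
    pub metrics: Duration,
    pub health_check: Duration,
}

/// Reads the variables through `lookup`, so the parsing logic can be
/// exercised without touching the real process environment.
pub fn intervals_from<F>(lookup: F) -> Intervals
where
    F: Fn(&str) -> Option<String>,
{
    // Parse a positive number of seconds, falling back when the variable
    // is unset, not a number, or zero.
    let secs = |key: &str, fallback: u64| {
        lookup(key)
            .and_then(|v| v.trim().parse::<u64>().ok())
            .filter(|&n| n > 0)
            .unwrap_or(fallback)
    };
    Intervals {
        enabled: lookup("APT_OSTREE_MONITORING_ENABLED").as_deref() == Some("1"),
        metrics: Duration::from_secs(secs("APT_OSTREE_METRICS_INTERVAL", 60)),
        health_check: Duration::from_secs(secs("APT_OSTREE_HEALTH_CHECK_INTERVAL", 300)),
    }
}

/// Convenience wrapper over the real process environment.
pub fn intervals_from_env() -> Intervals {
    intervals_from(|key| std::env::var(key).ok())
}
```

Passing the lookup as a closure keeps the parsing testable; `intervals_from_env` is the variant a service would call at startup.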
## Usage
### CLI Commands
#### Show Monitoring Status
```bash
# Show general monitoring status
apt-ostree monitoring
# Export metrics as JSON
apt-ostree monitoring --export
# Run health checks
apt-ostree monitoring --health
# Show performance metrics
apt-ostree monitoring --performance
```
#### Monitoring Service
```bash
# Start monitoring service
apt-ostree-monitoring start
# Stop monitoring service
apt-ostree-monitoring stop
# Show service status
apt-ostree-monitoring status
# Run health check cycle
apt-ostree-monitoring health-check
# Export metrics
apt-ostree-monitoring export-metrics
```
### Systemd Service
```bash
# Enable and start monitoring service
sudo systemctl enable apt-ostree-monitoring
sudo systemctl start apt-ostree-monitoring
# Check service status
sudo systemctl status apt-ostree-monitoring
# View service logs
sudo journalctl -u apt-ostree-monitoring -f
# Stop service
sudo systemctl stop apt-ostree-monitoring
```
## Metrics
### System Metrics
```json
{
  "timestamp": "2024-12-19T10:30:00Z",
  "cpu_usage": 15.5,
  "memory_usage": 8589934592,
  "total_memory": 17179869184,
  "disk_usage": 107374182400,
  "total_disk": 1073741824000,
  "active_transactions": 0,
  "pending_deployments": 1,
  "ostree_repo_size": 5368709120,
  "apt_cache_size": 1073741824,
  "uptime": 86400,
  "load_average": [1.2, 1.1, 1.0]
}
```
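The byte counts in this sample are easier to read as percentages. The helper below is a minimal sketch (`usage_percent` is a name introduced here, not part of the crate) that converts a used/total pair into a percentage.

```rust
/// Returns `used / total` as a percentage, or `None` when `total` is zero.
pub fn usage_percent(used: u64, total: u64) -> Option<f64> {
    if total == 0 {
        return None;
    }
    // Multiply before dividing so round ratios stay exact in floating point.
    Some(used as f64 * 100.0 / total as f64)
}
```

Applied to the sample above, `memory_usage` / `total_memory` is 8589934592 / 17179869184, or 50%, and `disk_usage` / `total_disk` is 107374182400 / 1073741824000, or 10%.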
### Performance Metrics
```json
{
  "timestamp": "2024-12-19T10:30:00Z",
  "operation_type": "package_installation",
  "duration_ms": 1500,
  "success": true,
  "error_message": null,
  "context": {
    "packages_count": "5",
    "total_size": "52428800"
  }
}
```
### Transaction Metrics
```json
{
  "transaction_id": "tx-12345",
  "transaction_type": "install",
  "start_time": "2024-12-19T10:25:00Z",
  "end_time": "2024-12-19T10:26:30Z",
  "duration_ms": 90000,
  "success": true,
  "error_message": null,
  "packages_count": 5,
  "packages_size": 52428800,
  "progress": 1.0
}
```
## Health Checks
### Available Health Checks
1. **OSTree Repository Health**
   - Repository integrity
   - Commit accessibility
   - Storage space
2. **APT Database Health**
   - Database integrity
   - Package cache status
   - Repository connectivity
3. **System Resources**
   - Memory availability
   - Disk space
   - CPU usage
4. **Daemon Health**
   - Service status
   - D-Bus connectivity
   - Authentication
### Health Check Results
```json
{
  "check_name": "ostree_repository",
  "status": "healthy",
  "message": "OSTree repository is healthy",
  "timestamp": "2024-12-19T10:30:00Z",
  "duration_ms": 150,
  "details": {
    "repo_size": "5368709120",
    "commit_count": "1250",
    "integrity_check": "passed"
  }
}
```
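When several checks run in one cycle, their results usually need to be folded into a single verdict. The sketch below takes the worst status as the overall one. `Status`, `CheckOutcome`, and both functions are simplified stand-ins introduced for illustration; only the `healthy` status appears in the sample above, so the `Warning` and `Critical` levels are assumptions.

```rust
/// Hypothetical severity levels, declared from best to worst so that the
/// derived ordering makes `Critical` the maximum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Healthy,
    Warning,
    Critical,
}

/// Simplified stand-in for a single health check result.
pub struct CheckOutcome {
    pub check_name: String,
    pub status: Status,
}

/// The overall status is the worst individual one; `None` if nothing ran.
pub fn overall_status(outcomes: &[CheckOutcome]) -> Option<Status> {
    outcomes.iter().map(|o| o.status).max()
}

/// Names of all checks that did not report `Healthy`.
pub fn failing_checks(outcomes: &[CheckOutcome]) -> Vec<&str> {
    outcomes
        .iter()
        .filter(|o| o.status != Status::Healthy)
        .map(|o| o.check_name.as_str())
        .collect()
}
```

Returning `None` for an empty slice avoids reporting a system as healthy when no check actually executed.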
## Logging
### Log Levels
- **TRACE**: Detailed debugging information
- **DEBUG**: Debugging information
- **INFO**: General information
- **WARN**: Warning messages
- **ERROR**: Error messages
### Log Format
#### Standard Format
```
2024-12-19T10:30:00.123Z INFO apt_ostree::monitoring: Health check passed: ostree_repository
```
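For quick ad hoc analysis, a line in this format can be split into its parts. The following is a minimal sketch that assumes the layout is exactly `timestamp level target: message`, as in the sample line; `LogLine` and `parse_log_line` are names introduced here, and real output may differ in spacing or fields.

```rust
/// Borrowed view of one standard-format log line.
#[derive(Debug, PartialEq)]
pub struct LogLine<'a> {
    pub timestamp: &'a str,
    pub level: &'a str,
    pub target: &'a str,
    pub message: &'a str,
}

/// Splits `line` into timestamp, level, target and message.
/// Returns `None` if the line does not match the assumed layout.
pub fn parse_log_line(line: &str) -> Option<LogLine<'_>> {
    let (timestamp, rest) = line.trim().split_once(char::is_whitespace)?;
    let (level, rest) = rest.trim_start().split_once(char::is_whitespace)?;
    // The target ends at the first ": "; module paths use "::" without a
    // trailing space, so they are not split by mistake.
    let (target, message) = rest.trim_start().split_once(": ")?;
    Some(LogLine { timestamp, level, target, message })
}
```

For anything beyond quick inspection, the JSON format is likely the better input, since it does not depend on delimiter conventions.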
#### JSON Format (Structured Logging)
```json
{
  "timestamp": "2024-12-19T10:30:00.123Z",
  "level": "INFO",
  "target": "apt_ostree::monitoring",
  "message": "Health check passed: ostree_repository",
  "fields": {
    "check_name": "ostree_repository",
    "duration_ms": 150
  }
}
```
### Log Configuration
```bash
# Set log level
export RUST_LOG=info
# Enable structured logging
export APT_OSTREE_STRUCTURED_LOGGING=1
# Log to file
export APT_OSTREE_LOG_FILE=/var/log/apt-ostree/app.log
```
## Performance Monitoring
### Performance Wrappers
```rust
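use std::collections::HashMap;
use apt_ostree::monitoring::PerformanceMonitor;

// Context is a string map attached to the recorded metric
let mut context = HashMap::new();
context.insert("packages_count".to_string(), "5".to_string());

// Monitor an operation
let monitor = PerformanceMonitor::new(
    monitoring_manager.clone(),
    "package_installation",
    context,
);

// `success` and `failure` both consume the monitor, so record exactly one.
// `result` stands for the outcome of the monitored operation.
match result {
    Ok(_) => monitor.success().await?,
    Err(_) => monitor.failure("Package not found".to_string()).await?,
}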
```
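The wrapper pattern itself can be shown without the rest of the crate. The following is a self-contained, synchronous sketch: `OperationTimer` and `PerfRecord` are hypothetical names and are not the crate's `PerformanceMonitor`, which is async and reports to the monitoring manager. The timer starts on construction, and both `success` and `failure` take `self` by value so that only one outcome can be recorded.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Simplified stand-in for a recorded performance metric.
#[derive(Debug)]
pub struct PerfRecord {
    pub operation_type: String,
    pub duration_ms: u128,
    pub success: bool,
    pub error_message: Option<String>,
    pub context: HashMap<String, String>,
}

/// Hypothetical timer that starts measuring when it is created.
pub struct OperationTimer {
    operation_type: String,
    context: HashMap<String, String>,
    started: Instant,
}

impl OperationTimer {
    pub fn start(operation_type: &str, context: HashMap<String, String>) -> Self {
        Self {
            operation_type: operation_type.to_string(),
            context,
            started: Instant::now(),
        }
    }

    /// Consumes the timer and records a successful outcome.
    pub fn success(self) -> PerfRecord {
        self.finish(None)
    }

    /// Consumes the timer and records a failure with its message.
    pub fn failure(self, error_message: String) -> PerfRecord {
        self.finish(Some(error_message))
    }

    fn finish(self, error_message: Option<String>) -> PerfRecord {
        PerfRecord {
            operation_type: self.operation_type,
            duration_ms: self.started.elapsed().as_millis(),
            success: error_message.is_none(),
            error_message,
            context: self.context,
        }
    }
}
```

Because the methods consume the timer, calling `failure` after `success` on the same value is rejected at compile time.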
### Transaction Monitoring
```rust
use apt_ostree::monitoring::TransactionMonitor;
// Start transaction monitoring
let monitor = TransactionMonitor::new(
    monitoring_manager.clone(),
    "tx-12345",
    "install",
    5,
    52428800,
);
// Update progress
monitor.update_progress(0.5).await?;
// Complete transaction
monitor.success().await?;
```
## Integration
### With Package Manager
The monitoring system integrates with the package manager to track:
- Package installation/removal operations
- Transaction progress
- Performance metrics
- Error tracking
### With OSTree Manager
Integration with OSTree manager provides:
- Commit metadata extraction
- Repository health monitoring
- Deployment tracking
- Rollback monitoring
### With Daemon
The monitoring system works with the daemon to provide:
- Service health monitoring
- D-Bus communication tracking
- Authentication monitoring
- Transaction state tracking
## Troubleshooting
### Common Issues
#### Monitoring Service Not Starting
```bash
# Check service status
sudo systemctl status apt-ostree-monitoring
# Check logs
sudo journalctl -u apt-ostree-monitoring -f
# Check permissions
ls -la /usr/bin/apt-ostree-monitoring
ls -la /var/log/apt-ostree/
```
#### Metrics Not Being Collected
```bash
# Check monitoring configuration
apt-ostree monitoring --export
# Verify service is running
sudo systemctl is-active apt-ostree-monitoring
# Check metrics file
cat /var/log/apt-ostree/metrics.json
```
#### Health Checks Failing
```bash
# Run health checks manually
apt-ostree monitoring --health
# Check specific components
apt-ostree status
ostree log debian/stable/x86_64
```
### Debug Mode
```bash
# Enable debug logging
export RUST_LOG=debug
# Run with debug output
apt-ostree-monitoring start
# Check debug logs
sudo journalctl -u apt-ostree-monitoring -p debug
```
## Best Practices
### Production Deployment
1. **Enable Structured Logging**
   ```bash
   export APT_OSTREE_STRUCTURED_LOGGING=1
   ```
2. **Configure Log Rotation**
   ```bash
   # Add to /etc/logrotate.d/apt-ostree
   /var/log/apt-ostree/*.log {
       daily
       rotate 30
       compress
       delaycompress
       missingok
       notifempty
   }
   ```
3. **Monitor Metrics Storage**
   ```bash
   # Check metrics file size
   du -sh /var/log/apt-ostree/metrics.json
   # Archive old metrics
   mv /var/log/apt-ostree/metrics.json /var/log/apt-ostree/metrics-$(date +%Y%m%d).json
   ```
4. **Set Up Alerts**
   ```bash
   # Monitor health check failures
   journalctl -u apt-ostree-monitoring | grep "CRITICAL"
   # Monitor high resource usage
   apt-ostree monitoring --performance | grep -E "(cpu_usage|memory_usage)"
   ```
### Development
1. **Use Performance Monitoring**
   ```rust
   let monitor = PerformanceMonitor::new(manager, "operation", context);
   // ... perform operation ...
   monitor.success().await?;
   ```
2. **Add Health Checks**
   ```rust
   // Add custom health checks
   async fn check_custom_component(&self) -> HealthCheckResult {
       // Implementation
       todo!()
   }
   ```
3. **Monitor Transactions**
   ```rust
   // `type` is a reserved word in Rust, so the variable is named `tx_type` here
   let monitor = TransactionMonitor::new(manager, id, tx_type, count, size);
   // ... perform transaction ...
   monitor.success().await?;
   ```
## API Reference
### MonitoringManager
```rust
impl MonitoringManager {
    pub fn new(config: MonitoringConfig) -> AptOstreeResult<Self>
    pub fn init_logging(&self) -> AptOstreeResult<()>
    pub async fn record_system_metrics(&self) -> AptOstreeResult<()>
    pub async fn record_performance_metrics(&self, ...) -> AptOstreeResult<()>
    pub async fn start_transaction_monitoring(&self, ...) -> AptOstreeResult<()>
    pub async fn run_health_checks(&self) -> AptOstreeResult<Vec<HealthCheckResult>>
    pub async fn get_statistics(&self) -> AptOstreeResult<MonitoringStatistics>
    pub async fn export_metrics(&self) -> AptOstreeResult<String>
}
```
### PerformanceMonitor
```rust
impl PerformanceMonitor {
    pub fn new(manager: Arc<MonitoringManager>, operation: &str, context: HashMap<String, String>) -> Self
    pub async fn success(self) -> AptOstreeResult<()>
    pub async fn failure(self, error_message: String) -> AptOstreeResult<()>
}
```
### TransactionMonitor
```rust
impl TransactionMonitor {
    // `type` is a reserved word in Rust; the parameter is shown here as `transaction_type`
    pub fn new(manager: Arc<MonitoringManager>, id: &str, transaction_type: &str, count: u32, size: u64) -> Self
    pub async fn update_progress(&self, progress: f64) -> AptOstreeResult<()>
    pub async fn success(self) -> AptOstreeResult<()>
    pub async fn failure(self, error_message: String) -> AptOstreeResult<()>
}
```
## Conclusion
The APT-OSTree monitoring and logging system provides comprehensive visibility into system operations, performance, and health. It enables proactive monitoring, troubleshooting, and optimization of the APT-OSTree system.
For more information, see:
- [System Administration Guide](system-admin.md)
- [Troubleshooting Guide](troubleshooting.md)
- [API Documentation](api.md)