# Debian Forge Troubleshooting Guide

## Overview

This document covers troubleshooting for Debian Forge: common issues, diagnostic procedures, and step-by-step solutions.

## Quick Diagnostic Commands

### System Health Check

```bash
# Check system resources
htop
df -h
free -h
iostat -x 1

# Check service status (quote the glob so the shell does not expand it)
sudo systemctl status 'debian-forge-*'
sudo supervisorctl status

# Check logs
tail -f /var/log/debian-forge/*.log
journalctl -u 'debian-forge-*' -f

# Check network
ping -c 3 8.8.8.8
curl -I http://deb.debian.org/debian
```

### Debian Forge Status Check

```bash
# Check build queue
curl -s http://localhost:8080/api/v1/queue/status | jq

# Check active builds
curl -s http://localhost:8080/api/v1/builds/active | jq

# Check system health
curl -s http://localhost:8080/api/v1/health | jq

# Check OSTree repository
ostree refs --repo=/var/lib/debian-forge/ostree
ostree log --repo=/var/lib/debian-forge/ostree debian/bookworm/amd64
```

## Common Issues and Solutions

### 1. Build Failures

#### Issue: Build Process Hangs

**Symptoms**: The build appears to be running but makes no progress for an extended period.

**Diagnosis**:

```bash
# Check build process
ps aux | grep osbuild
ps aux | grep debootstrap

# Check system resources
htop
df -h /tmp
df -h /var/lib/debian-forge

# Check build logs
tail -f /var/log/debian-forge/worker.log
```

**Solutions**:

```bash
# Kill hanging processes
sudo pkill -f osbuild
sudo pkill -f debootstrap

# Clean temporary files
sudo rm -rf /tmp/osbuild-*
sudo rm -rf /var/lib/debian-forge/tmp/*

# Restart worker service
sudo supervisorctl restart debian-forge-worker

# Check for disk space issues
sudo du -sh /var/lib/debian-forge/* | sort -hr
```

#### Issue: Package Installation Failures

**Symptoms**: The build fails during package installation with APT errors.

**Diagnosis**:

```bash
# Check APT configuration
cat /etc/apt/sources.list
ls -la /etc/apt/sources.list.d/

# Test package availability
apt-cache policy package-name
apt-cache search package-name

# Check network connectivity
curl -I http://deb.debian.org/debian
ping -c 3 security.debian.org
```

**Solutions**:

```bash
# Update package lists
sudo apt update

# Fix broken packages
sudo apt --fix-broken install

# Clear APT cache
sudo apt clean
sudo apt autoclean

# Check for repository issues
sudo apt update 2>&1 | grep -i error

# Verify GPG keys
# Note: apt-key is deprecated; on newer releases inspect the keyrings under
# /etc/apt/trusted.gpg.d/ and /usr/share/keyrings/ instead.
sudo apt-key list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys KEY_ID
```

#### Issue: OSTree Commit Failures

**Symptoms**: The build fails while creating the OSTree commit.

**Diagnosis**:

```bash
# Check OSTree repository (ostree log needs a ref; adjust it to your branch)
ostree refs --repo=/var/lib/debian-forge/ostree
ostree log --repo=/var/lib/debian-forge/ostree debian/bookworm/amd64

# Check repository permissions
ls -la /var/lib/debian-forge/ostree/
id debian-forge

# Check disk space
df -h /var/lib/debian-forge/ostree
```

**Solutions**:

```bash
# Fix repository permissions
sudo chown -R debian-forge:debian-forge /var/lib/debian-forge/ostree
sudo chmod -R 755 /var/lib/debian-forge/ostree

# Initialize the repository if it does not exist yet
sudo -u debian-forge ostree init --repo=/var/lib/debian-forge/ostree --mode=archive

# Clean old commits (30 days is an example retention window, not a project default)
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only --keep-younger-than="30 days ago"

# Check for corrupted objects
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree
```

### 2. Service Issues

#### Issue: API Service Not Responding

**Symptoms**: HTTP requests to API endpoints time out or return errors.

**Diagnosis**:

```bash
# Check service status
sudo supervisorctl status debian-forge-api
sudo systemctl status nginx

# Check port binding
sudo netstat -tlnp | grep :8080
sudo ss -tlnp | grep :8080

# Check firewall
sudo ufw status
sudo iptables -L

# Test local connectivity
curl -v http://localhost:8080/api/v1/health
```

**Solutions**:

```bash
# Restart API service
sudo supervisorctl restart debian-forge-api

# Check configuration
sudo cat /etc/supervisor/conf.d/debian-forge.conf
sudo cat /etc/nginx/sites-available/debian-forge

# Verify Python environment
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "import flask; print('OK')"

# Check logs for errors
sudo tail -f /var/log/debian-forge/api.log
sudo tail -f /var/log/nginx/error.log
```

#### Issue: Worker Service Not Processing Builds

**Symptoms**: Builds remain in QUEUED status indefinitely.

**Diagnosis**:

```bash
# Check worker status
sudo supervisorctl status debian-forge-worker

# Check worker logs
sudo tail -f /var/log/debian-forge/worker.log

# Check build queue
curl -s http://localhost:8080/api/v1/queue/status | jq

# Check system resources
htop
df -h
free -h
```

**Solutions**:

```bash
# Restart worker service
sudo supervisorctl restart debian-forge-worker

# Check Python dependencies
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/pip list

# Verify build environment
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "import build_orchestrator; print('OK')"

# Check for resource constraints
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python test-resource-allocation.py
```

### 3. Resource Issues

#### Issue: High CPU Usage

**Symptoms**: The system becomes unresponsive and builds slow down.

**Diagnosis**:

```bash
# Check CPU usage
htop
top -b -n 1 | head -20
iostat -x 1

# Identify high-CPU processes
ps aux --sort=-%cpu | head -10

# Check build processes
ps aux | grep osbuild
ps aux | grep debootstrap
```

**Solutions**:

```bash
# Reduce concurrent builds
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "
from build_orchestrator import BuildOrchestrator
o = BuildOrchestrator()
o.resource_manager.set_concurrent_builds(2)
"

# Kill runaway processes
sudo pkill -f osbuild
sudo pkill -f debootstrap

# Check for infinite loops
sudo tail -f /var/log/debian-forge/worker.log | grep -i "cpu\|loop"

# Restart services
sudo supervisorctl restart all
```

#### Issue: High Memory Usage

**Symptoms**: Out-of-memory errors and heavy swapping.

**Diagnosis**:

```bash
# Check memory usage
free -h
grep -E "(MemTotal|MemFree|MemAvailable|SwapTotal|SwapFree)" /proc/meminfo

# Identify the most memory-hungry processes (candidates for leaks)
ps aux --sort=-%mem | head -10

# Check swap usage
swapon --show
cat /proc/swaps
```

**Solutions**:

```bash
# Clear memory caches
sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Restart memory-intensive services
sudo supervisorctl restart debian-forge-worker

# Check for memory leaks in logs
sudo tail -f /var/log/debian-forge/worker.log | grep -i "memory\|leak"

# Reduce memory limits
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "
from build_orchestrator import BuildOrchestrator
o = BuildOrchestrator()
o.resource_manager.set_build_limits(max_memory=70)
"
```

#### Issue: Disk Space Exhaustion

**Symptoms**: Builds fail with "no space left on device" errors.

**Diagnosis**:

```bash
# Check disk usage
df -h
du -sh /var/lib/debian-forge/*
du -sh /tmp/*
du -sh .osbuild/

# Check for large files
find /var/lib/debian-forge -type f -size +100M -exec ls -lh {} \;
find /tmp -type f -size +100M -exec ls -lh {} \;

# Check inode usage
df -i
```

**Solutions**:

```bash
# Clean build artifacts
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force

# Clean temporary files
sudo rm -rf /tmp/osbuild-*
sudo rm -rf /var/lib/debian-forge/tmp/*

# Clean old OSTree commits (30 days is an example retention window, not a project default)
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only --keep-younger-than="30 days ago"

# Clean package cache
sudo apt clean
sudo apt autoclean

# Force log rotation
sudo logrotate -f /etc/logrotate.d/debian-forge
```

### 4. Network Issues

#### Issue: Package Download Failures

**Symptoms**: Builds fail when downloading packages from repositories.

**Diagnosis**:

```bash
# Test network connectivity
ping -c 3 8.8.8.8
ping -c 3 deb.debian.org
ping -c 3 security.debian.org

# Check DNS resolution
nslookup deb.debian.org
dig deb.debian.org

# Test HTTP connectivity
curl -I http://deb.debian.org/debian
curl -I https://security.debian.org/debian-security

# Check proxy configuration
echo "$http_proxy"
echo "$https_proxy"
cat /etc/apt/apt.conf.d/*proxy*
```

**Solutions**:

```bash
# Fix DNS issues (this overwrites /etc/resolv.conf, so back it up first)
sudo cp /etc/resolv.conf /etc/resolv.conf.bak
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf

# Test without proxy
unset http_proxy https_proxy
sudo apt update

# Check firewall rules
sudo ufw status
sudo iptables -L

# Verify repository URLs
sudo apt update 2>&1 | grep -i "failed\|error\|unreachable"
```

#### Issue: apt-cacher-ng Connection Problems

**Symptoms**: Builds fail with proxy connection errors.

**Diagnosis**:

```bash
# Check apt-cacher-ng status
sudo systemctl status apt-cacher-ng
sudo netstat -tlnp | grep :3142

# Test proxy connectivity
curl -I http://192.168.1.101:3142
telnet 192.168.1.101 3142

# Check proxy configuration
cat /etc/apt/apt.conf.d/99proxy
```

**Solutions**:

```bash
# Restart apt-cacher-ng
sudo systemctl restart apt-cacher-ng

# Rewrite the proxy configuration
echo 'Acquire::http::Proxy "http://192.168.1.101:3142";' | sudo tee /etc/apt/apt.conf.d/99proxy
echo 'Acquire::https::Proxy "http://192.168.1.101:3142";' | sudo tee -a /etc/apt/apt.conf.d/99proxy

# Test proxy
curl -x http://192.168.1.101:3142 http://deb.debian.org/debian

# Check proxy logs
sudo tail -f /var/log/apt-cacher-ng/apt-cacher.log
```

### 5. Configuration Issues

#### Issue: Invalid Manifest Format

**Symptoms**: Builds fail with "invalid manifest" errors.

**Diagnosis**:

```bash
# Validate manifest syntax
python3 -m osbuild --libdir . --check-only manifest.json

# Check JSON syntax
python3 -c "import json; json.load(open('manifest.json')); print('Valid JSON')"

# Check that both the schema and the manifest load (validation itself is under Solutions)
python3 -c "
import json
schema = json.load(open('schemas/osbuild2.json'))
manifest = json.load(open('manifest.json'))
print('Schema validation needed')
"
```

**Solutions**:

```bash
# Reformat the manifest; json.tool reports the first syntax error but cannot repair it,
# so only replace the file when parsing succeeds
python3 -m json.tool manifest.json > manifest_fixed.json && mv manifest_fixed.json manifest.json

# Validate against schema
python3 -c "
import json
from jsonschema import validate
schema = json.load(open('schemas/osbuild2.json'))
manifest = json.load(open('manifest.json'))
validate(instance=manifest, schema=schema)
print('Valid manifest')
"

# Check stage names
python3 -c "
import json
manifest = json.load(open('manifest.json'))
stages = manifest.get('pipeline', {}).get('stages', [])
for stage in stages:
    print(f'Stage: {stage.get(\"name\", \"unknown\")}')
"
```

#### Issue: Missing Dependencies

**Symptoms**: Builds fail with "command not found" or import errors.

**Diagnosis**:

```bash
# Check Python dependencies
pip list
pip check

# Check system packages
which debootstrap
which ostree
which sbuild

# Check Python path
python3 -c "import sys; print('\n'.join(sys.path))"
python3 -c "import build_orchestrator; print('OK')"
```

**Solutions**:

```bash
# Install missing Python packages
pip install -r requirements.txt

# Install missing system packages
sudo apt update
sudo apt install -y debootstrap ostree sbuild pbuilder

# Fix Python path
export PYTHONPATH="${PYTHONPATH}:/home/debian-forge/debian-forge"
echo 'export PYTHONPATH="${PYTHONPATH}:/home/debian-forge/debian-forge"' >> ~/.bashrc

# Check virtual environment
source venv/bin/activate
pip list
```

## Advanced Troubleshooting

### 1. Performance Analysis

#### Build Performance Profiling

```bash
# Profile build execution
python3 -m cProfile -o build_profile.prof test-complete-pipeline.py

# Analyze profile results
python3 -c "
import pstats
p = pstats.Stats('build_profile.prof')
p.sort_stats('cumulative')
p.print_stats(20)
"

# Memory profiling
python3 -m memory_profiler test-complete-pipeline.py
```

#### System Performance Monitoring

```bash
# Monitor system during build
sar -u 1 60 > cpu_usage.log &
sar -r 1 60 > memory_usage.log &
sar -d 1 60 > disk_usage.log &

# Start build
python3 -m osbuild --libdir . manifest.json

# Stop monitoring
pkill sar

# Analyze results
python3 -c "
import pandas as pd
cpu = pd.read_csv('cpu_usage.log', sep=r'\s+', skiprows=2)
print(f'Average CPU: {cpu[\"%user\"].mean():.1f}%')
print(f'Peak CPU: {cpu[\"%user\"].max():.1f}%')
"
```

### 2. Debug Mode

#### Enable Debug Logging

```bash
# Set debug environment variables
export OSBUILD_DEBUG=1
export OSBUILD_VERBOSE=1
export DEBIAN_FORGE_DEBUG=1

# Run with debug output
python3 -m osbuild --libdir . --verbose manifest.json

# Check debug logs
tail -f /var/log/debian-forge/debug.log
```

#### Python Debugger

```bash
# Add a breakpoint by inserting this line into the Python code:
#   import pdb; pdb.set_trace()

# Run with debugger
python3 -m pdb test-complete-pipeline.py

# Common debugger commands
# n (next), s (step), c (continue), p variable_name, l (list), q (quit)
```

### 3. Log Analysis

#### Log Parsing and Analysis

```bash
# Extract error patterns
grep -i "error\|fail\|exception" /var/log/debian-forge/*.log | head -20

# Count error types
grep -i "error" /var/log/debian-forge/*.log | cut -d: -f2 | sort | uniq -c | sort -nr

# Extract build timing information
grep "Build completed" /var/log/debian-forge/worker.log | awk '{print $1, $2, $NF}' | tail -10

# Analyze resource usage patterns
grep "Resource usage" /var/log/debian-forge/worker.log | tail -20
```

#### Log Correlation

```bash
# Correlate errors across services
echo "=== API Errors ==="
grep -i "error" /var/log/debian-forge/api.log | tail -5

echo "=== Worker Errors ==="
grep -i "error" /var/log/debian-forge/worker.log | tail -5

echo "=== System Errors ==="
journalctl -u 'debian-forge-*' --since "1 hour ago" | grep -i "error" | tail -5
```

## Recovery Procedures

### 1. Service Recovery

#### Complete Service Restart

```bash
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/service-recovery.sh

echo "Starting complete service recovery..."

# Stop all services
sudo supervisorctl stop all
sudo systemctl stop nginx

# Clean up temporary files
sudo rm -rf /tmp/debian-forge-*
sudo rm -rf /var/lib/debian-forge/tmp/*

# Restart system services
sudo systemctl start postgresql
sudo systemctl start redis-server
sudo systemctl start nginx

# Wait for services to be ready
sleep 10

# Start application services
sudo supervisorctl start all

# Check status
sudo supervisorctl status
sudo systemctl status nginx

echo "Service recovery completed"
```

#### Database Recovery

```bash
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/db-recovery.sh

echo "Starting database recovery..."

# Check database status
sudo systemctl status postgresql

# Test database connection
if ! sudo -u debian-forge psql -d debian_forge -c "SELECT version();"; then
    echo "Database connection failed, attempting recovery..."

    # Restart database
    sudo systemctl restart postgresql
    sleep 10

    # Test connection again
    if sudo -u debian-forge psql -d debian_forge -c "SELECT version();"; then
        echo "Database recovery successful"
    else
        echo "Database recovery failed"
        exit 1
    fi
else
    echo "Database is healthy"
fi
```

### 2. Data Recovery

#### Build Artifact Recovery

```bash
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/artifact-recovery.sh

echo "Starting artifact recovery..."

# Check for corrupted objects
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree

# Remove objects no longer reachable from any ref
# (prune frees space; it does not repair corrupted objects)
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only

# Regenerate the repository summary file
sudo -u debian-forge ostree summary --repo=/var/lib/debian-forge/ostree --update

# Verify repository integrity
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree

echo "Artifact recovery completed"
```

#### Configuration Recovery

```bash
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/config-recovery.sh

BACKUP_DIR="/var/backups/debian-forge"
# Newest backup first; stay quiet when no backup matches the pattern
LATEST_CONFIG=$(ls -t "$BACKUP_DIR"/config_*.tar.gz 2>/dev/null | head -1)

if [ -n "$LATEST_CONFIG" ]; then
    echo "Restoring configuration from: $LATEST_CONFIG"

    # Stop services
    sudo supervisorctl stop all

    # Restore configuration
    sudo tar -xzf "$LATEST_CONFIG" -C /

    # Fix permissions
    sudo chown -R debian-forge:debian-forge /home/debian-forge/debian-forge/config

    # Restart services
    sudo supervisorctl start all

    echo "Configuration recovery completed"
else
    echo "No configuration backup found"
    exit 1
fi
```

## Prevention Strategies

### 1. Monitoring and Alerting

#### Health Check Automation

```bash
#!/bin/bash
# /etc/cron.daily/debian-forge-health-check

# Run health checks; a non-zero exit status signals a critical issue
if ! /home/debian-forge/debian-forge/venv/bin/python /home/debian-forge/debian-forge/health_check.py; then
    # Send alert
    /home/debian-forge/debian-forge/scripts/alert.py "Health check failed" "Critical system issue detected"

    # Attempt auto-recovery
    /home/debian-forge/debian-forge/scripts/service-recovery.sh
fi
```

#### Resource Monitoring

```bash
#!/bin/bash
# /etc/cron.hourly/debian-forge-resource-check

# Check disk space
DISK_USAGE=$(df /var/lib/debian-forge | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 85 ]; then
    /home/debian-forge/debian-forge/scripts/alert.py "High disk usage" "Disk usage is ${DISK_USAGE}%"

    # Trigger cleanup
    /home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force
fi

# Check memory usage
MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
if [ "$MEMORY_USAGE" -gt 90 ]; then
    /home/debian-forge/debian-forge/scripts/alert.py "High memory usage" "Memory usage is ${MEMORY_USAGE}%"
fi
```

### 2. Maintenance Windows

#### Scheduled Maintenance

```bash
#!/bin/bash
# /etc/cron.weekly/debian-forge-maintenance

echo "Starting scheduled maintenance..."

# Stop services
sudo supervisorctl stop all

# Update system packages
sudo apt update && sudo apt upgrade -y

# Clean old artifacts
/home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force

# Rotate logs
sudo logrotate -f /etc/logrotate.d/debian-forge

# Restart services
sudo supervisorctl start all

echo "Scheduled maintenance completed"
```

## Getting Help

### 1. Self-Service Resources

#### Documentation

- **User Guide**: `/home/debian-forge/debian-forge-docs/user-documentation.md`
- **Deployment Guide**: `/home/debian-forge/debian-forge-docs/deployment-documentation.md`
- **Architecture Guide**: `/home/debian-forge/debian-forge-docs/osbuild-architecture.md`

#### Test Scripts

```bash
# Run diagnostic tests
python3 test-apt-stage.py
python3 test-resource-allocation.py
python3 test-build-orchestration.py
python3 test-complete-pipeline.py

# Run performance tests
python3 test-performance-optimization.py
python3 test-stress-testing.py
```

### 2. Community Support

#### Issue Reporting

When reporting issues, include:

- **System information**: OS version, Python version, installed packages
- **Error messages**: Complete error logs and stack traces
- **Reproduction steps**: Detailed steps to reproduce the issue
- **Environment**: Development or production, configuration details
- **Recent changes**: Any recent modifications to the system

#### Debug Information Collection

```bash
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/collect-debug-info.sh

DEBUG_DIR="/tmp/debian-forge-debug-$(date +%Y%m%d_%H%M%S)"
mkdir -p "$DEBUG_DIR"

echo "Collecting debug information..."

# System information
uname -a > "$DEBUG_DIR/system-info.txt"
cat /etc/os-release > "$DEBUG_DIR/os-release.txt"
python3 --version > "$DEBUG_DIR/python-version.txt"

# Service status
sudo supervisorctl status > "$DEBUG_DIR/supervisor-status.txt"
sudo systemctl status 'debian-forge-*' > "$DEBUG_DIR/systemd-status.txt"

# Configuration files
cp -r /home/debian-forge/debian-forge/config "$DEBUG_DIR/"
cp /etc/supervisor/conf.d/debian-forge.conf "$DEBUG_DIR/"
cp /etc/nginx/sites-available/debian-forge "$DEBUG_DIR/"

# Logs (last 1000 lines of each file)
tail -n 1000 /var/log/debian-forge/*.log > "$DEBUG_DIR/recent-logs.txt"

# Resource usage
df -h > "$DEBUG_DIR/disk-usage.txt"
free -h > "$DEBUG_DIR/memory-usage.txt"
ps aux > "$DEBUG_DIR/process-list.txt"

# Package information
pip list > "$DEBUG_DIR/python-packages.txt"
dpkg -l | grep -E "(debian-forge|osbuild|ostree)" > "$DEBUG_DIR/system-packages.txt"

echo "Debug information collected in: $DEBUG_DIR"
echo "Please include this directory when reporting issues"
```

## Conclusion

This guide covers diagnosing and resolving common Debian Forge issues. Key points:

1. **Start with quick diagnostics** to identify the problem area
2. **Use systematic troubleshooting** to isolate root causes
3. **Implement recovery procedures** to restore service
4. **Apply prevention strategies** to avoid future issues
5. **Collect debug information** when seeking community help

For additional support, refer to the project documentation or file a detailed issue report that includes the collected debug information.
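
The first key point, starting with quick diagnostics, can be sketched as a small triage helper built on the same threshold check the resource-monitoring cron job uses. This is only a minimal sketch under stated assumptions: the function names `usage_status` and `disk_usage_pct` are hypothetical and not part of Debian Forge, and the 85% limit is simply borrowed from the disk check in that cron job.

```bash
#!/bin/bash
# Hypothetical triage helper; not shipped with Debian Forge.

# Classify a usage percentage against a warning threshold.
# Usage: usage_status <percent> <threshold>   (prints "warn" or "ok")
usage_status() {
    local pct="$1" limit="$2"
    if [ "$pct" -gt "$limit" ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

# Print the used-space percentage (digits only) of the filesystem holding a path.
# df -P keeps each filesystem on a single line, so field 5 is always the capacity.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Example run against the root filesystem; substitute /var/lib/debian-forge
# on a host where that directory exists.
pct="$(disk_usage_pct /)"
echo "disk /: ${pct}% ($(usage_status "$pct" 85))"
```

Keeping the comparison in its own function means the same check can be reused for memory or inode percentages without repeating the `-gt` test in every script.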