Debian Forge Troubleshooting Guides

Overview

This guide covers troubleshooting for Debian Forge: quick diagnostic commands, common issues with step-by-step solutions, advanced debugging, recovery procedures, and prevention strategies.

Quick Diagnostic Commands

System Health Check

# Check system resources
htop
df -h
free -h
iostat -x 1

# Check service status
sudo systemctl status 'debian-forge-*'
sudo supervisorctl status

# Check logs
tail -f /var/log/debian-forge/*.log
journalctl -u 'debian-forge-*' -f

# Check network
ping -c 3 8.8.8.8
curl -I http://deb.debian.org/debian

Debian Forge Status Check

# Check build queue
curl -s http://localhost:8080/api/v1/queue/status | jq

# Check active builds
curl -s http://localhost:8080/api/v1/builds/active | jq

# Check system health
curl -s http://localhost:8080/api/v1/health | jq

# Check OSTree repository
ostree refs --repo=/var/lib/debian-forge/ostree
ostree log --repo=/var/lib/debian-forge/ostree debian/bookworm/amd64

Common Issues and Solutions

1. Build Failures

Issue: Build Process Hangs

Symptoms: Build appears to be running but makes no progress for extended periods

Diagnosis:

# Check build process
ps aux | grep osbuild
ps aux | grep debootstrap

# Check system resources
htop
df -h /tmp
df -h /var/lib/debian-forge

# Check build logs
tail -f /var/log/debian-forge/worker.log

Solutions:

# Kill hanging processes
sudo pkill -f osbuild
sudo pkill -f debootstrap

# Clean temporary files
sudo rm -rf /tmp/osbuild-*
sudo rm -rf /var/lib/debian-forge/tmp/*

# Restart worker service
sudo supervisorctl restart debian-forge-worker

# Check for disk space issues
sudo du -sh /var/lib/debian-forge/* | sort -hr
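
Before killing every process that matches `osbuild`, it can help to see which ones have been running long enough to count as hung. The following is a minimal sketch, not part of Debian Forge: `filter_stale` and `list_stale_builds` are hypothetical helper names, the one-hour default is an assumed threshold, and the `etimes` (elapsed seconds) column assumes the procps version of `ps`.

```shell
# filter_stale [SECONDS]: read "PID ELAPSED_SECONDS COMMAND..." lines on stdin
# and print those whose elapsed time exceeds the threshold (default 3600).
filter_stale() {
    local threshold="${1:-3600}"
    awk -v limit="$threshold" '$2 + 0 > limit + 0 { print }'
}

# list_stale_builds [SECONDS]: hypothetical wrapper that feeds live process
# data into filter_stale, using the same process names as the commands above.
list_stale_builds() {
    ps -eo pid=,etimes=,args= \
        | grep -E 'osbuild|debootstrap' \
        | grep -v grep \
        | filter_stale "${1:-3600}"
}
```

Running `list_stale_builds 1800` would list only build processes older than 30 minutes, which can then be killed individually by PID instead of with a blanket `pkill -f`.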

Issue: Package Installation Failures

Symptoms: Build fails during package installation with APT errors

Diagnosis:

# Check APT configuration
cat /etc/apt/sources.list
ls -la /etc/apt/sources.list.d/

# Test package availability
apt-cache policy package-name
apt-cache search package-name

# Check network connectivity
curl -I http://deb.debian.org/debian
ping -c 3 security.debian.org

Solutions:

# Update package lists
sudo apt update

# Fix broken packages
sudo apt --fix-broken install

# Clear APT cache
sudo apt clean
sudo apt autoclean

# Check for repository issues
sudo apt update 2>&1 | grep -i error

# Verify GPG keys (apt-key is deprecated; newer APT releases expect keyrings referenced via signed-by)
sudo apt-key list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys KEY_ID

Issue: OSTree Commit Failures

Symptoms: Build fails during OSTree commit creation

Diagnosis:

# Check OSTree repository
ostree refs --repo=/var/lib/debian-forge/ostree
ostree log --repo=/var/lib/debian-forge/ostree debian/bookworm/amd64

# Check repository permissions
ls -la /var/lib/debian-forge/ostree/
id debian-forge

# Check disk space
df -h /var/lib/debian-forge/ostree

Solutions:

# Fix repository permissions
sudo chown -R debian-forge:debian-forge /var/lib/debian-forge/ostree
sudo chmod -R u=rwX,go=rX /var/lib/debian-forge/ostree

# Initialize repository if corrupted
sudo -u debian-forge ostree init --mode=archive-z2 --repo=/var/lib/debian-forge/ostree

# Clean old commits (refs are kept; the 30-day window is an example value)
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only --keep-younger-than="30 days ago"

# Check for corrupted objects
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree

2. Service Issues

Issue: API Service Not Responding

Symptoms: HTTP requests to API endpoints time out or return errors

Diagnosis:

# Check service status
sudo supervisorctl status debian-forge-api
sudo systemctl status nginx

# Check port binding
sudo netstat -tlnp | grep :8080
sudo ss -tlnp | grep :8080

# Check firewall
sudo ufw status
sudo iptables -L

# Test local connectivity
curl -v http://localhost:8080/api/v1/health

Solutions:

# Restart API service
sudo supervisorctl restart debian-forge-api

# Check configuration
sudo cat /etc/supervisor/conf.d/debian-forge.conf
sudo cat /etc/nginx/sites-available/debian-forge

# Verify Python environment
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "import flask; print('OK')"

# Check logs for errors
sudo tail -f /var/log/debian-forge/api.log
sudo tail -f /var/log/nginx/error.log

Issue: Worker Service Not Processing Builds

Symptoms: Builds remain in QUEUED status indefinitely

Diagnosis:

# Check worker status
sudo supervisorctl status debian-forge-worker

# Check worker logs
sudo tail -f /var/log/debian-forge/worker.log

# Check build queue
curl -s http://localhost:8080/api/v1/queue/status | jq

# Check system resources
htop
df -h
free -h

Solutions:

# Restart worker service
sudo supervisorctl restart debian-forge-worker

# Check Python dependencies
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/pip list

# Verify build environment
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "import build_orchestrator; print('OK')"

# Check for resource constraints
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python test-resource-allocation.py

3. Resource Issues

Issue: High CPU Usage

Symptoms: System becomes unresponsive, builds slow down

Diagnosis:

# Check CPU usage
htop
top -b -n 1 | head -20
iostat -x 1

# Identify high-CPU processes
ps aux --sort=-%cpu | head -10

# Check build processes
ps aux | grep osbuild
ps aux | grep debootstrap

Solutions:

# Reduce concurrent builds
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "
from build_orchestrator import BuildOrchestrator
o = BuildOrchestrator()
o.resource_manager.set_concurrent_builds(2)
"

# Kill runaway processes
sudo pkill -f osbuild
sudo pkill -f debootstrap

# Check for infinite loops
sudo tail -f /var/log/debian-forge/worker.log | grep -i "cpu\|loop"

# Restart services
sudo supervisorctl restart all

Issue: High Memory Usage

Symptoms: Out of memory errors, system swapping

Diagnosis:

# Check memory usage
free -h
cat /proc/meminfo | grep -E "(MemTotal|MemFree|MemAvailable|SwapTotal|SwapFree)"

# Check for memory leaks
ps aux --sort=-%mem | head -10

# Check swap usage
swapon --show
cat /proc/swaps

Solutions:

# Clear memory caches (flush dirty pages first, then drop page cache, dentries, and inodes)
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Restart memory-intensive services
sudo supervisorctl restart debian-forge-worker

# Check for memory leaks in logs
sudo tail -f /var/log/debian-forge/worker.log | grep -i "memory\|leak"

# Reduce memory limits
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "
from build_orchestrator import BuildOrchestrator
o = BuildOrchestrator()
o.resource_manager.set_build_limits(max_memory=70)
"

Issue: Disk Space Exhaustion

Symptoms: Builds fail with "no space left on device" errors

Diagnosis:

# Check disk usage
df -h
du -sh /var/lib/debian-forge/*
du -sh /tmp/*
du -sh .osbuild/

# Check for large files
find /var/lib/debian-forge -type f -size +100M -exec ls -lh {} \;
find /tmp -type f -size +100M -exec ls -lh {} \;

# Check inode usage
df -i

Solutions:

# Clean build artifacts
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force

# Clean temporary files
sudo rm -rf /tmp/osbuild-*
sudo rm -rf /var/lib/debian-forge/tmp/*

# Clean old OSTree commits (refs are kept; the 30-day window is an example value)
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only --keep-younger-than="30 days ago"

# Clean package cache
sudo apt clean
sudo apt autoclean

# Check for log rotation
sudo logrotate -f /etc/logrotate.d/debian-forge

4. Network Issues

Issue: Package Download Failures

Symptoms: Builds fail when downloading packages from repositories

Diagnosis:

# Test network connectivity
ping -c 3 8.8.8.8
ping -c 3 deb.debian.org
ping -c 3 security.debian.org

# Check DNS resolution
nslookup deb.debian.org
dig deb.debian.org

# Test HTTP connectivity
curl -I http://deb.debian.org/debian
curl -I https://security.debian.org/debian-security

# Check proxy configuration
echo $http_proxy
echo $https_proxy
cat /etc/apt/apt.conf.d/*proxy*

Solutions:

# Fix DNS issues
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf

# Test without proxy
unset http_proxy https_proxy
sudo apt update

# Check firewall rules
sudo ufw status
sudo iptables -L

# Verify repository URLs
sudo apt update 2>&1 | grep -i "failed\|error\|unreachable"

Issue: apt-cacher-ng Connection Problems

Symptoms: Builds fail with proxy connection errors

Diagnosis:

# Check apt-cacher-ng status
sudo systemctl status apt-cacher-ng
sudo netstat -tlnp | grep :3142

# Test proxy connectivity
curl -I http://192.168.1.101:3142
telnet 192.168.1.101 3142

# Check proxy configuration
cat /etc/apt/apt.conf.d/99proxy

Solutions:

# Restart apt-cacher-ng
sudo systemctl restart apt-cacher-ng

# Verify proxy configuration
echo 'Acquire::http::Proxy "http://192.168.1.101:3142";' | sudo tee /etc/apt/apt.conf.d/99proxy
echo 'Acquire::https::Proxy "http://192.168.1.101:3142";' | sudo tee -a /etc/apt/apt.conf.d/99proxy

# Test proxy
curl -x http://192.168.1.101:3142 http://deb.debian.org/debian

# Check proxy logs
sudo tail -f /var/log/apt-cacher-ng/apt-cacher.log

5. Configuration Issues

Issue: Invalid Manifest Format

Symptoms: Builds fail with "invalid manifest" errors

Diagnosis:

# Validate manifest syntax
python3 -m osbuild --libdir . --check-only manifest.json

# Check JSON syntax
python3 -c "import json; json.load(open('manifest.json')); print('Valid JSON')"

# Check manifest schema
python3 -c "
import json
schema = json.load(open('schemas/osbuild2.json'))
manifest = json.load(open('manifest.json'))
print('Schema validation needed')
"

Solutions:

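# Normalize JSON formatting (json.tool reports where a syntax error is but cannot repair it;
# the && keeps a failed run from overwriting the manifest with an empty file)
python3 -m json.tool manifest.json > manifest_fixed.json && mv manifest_fixed.json manifest.json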

# Validate against schema
python3 -c "
import json
from jsonschema import validate
schema = json.load(open('schemas/osbuild2.json'))
manifest = json.load(open('manifest.json'))
validate(instance=manifest, schema=schema)
print('Valid manifest')
"

# Check stage names
python3 -c "
import json
manifest = json.load(open('manifest.json'))
stages = manifest.get('pipeline', {}).get('stages', [])
for stage in stages:
    print(f'Stage: {stage.get(\"name\", \"unknown\")}')
"

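Rewriting a manifest in place through a shell redirect can leave an empty or truncated file if the formatter exits with an error. One way to guard against that is to write to a temporary file and copy it back only on success. This is a sketch under stated assumptions: `safe_rewrite` is a hypothetical helper, not a Debian Forge command, and it copies the result back with `cat` so the original file's ownership and permissions are preserved.

```shell
# safe_rewrite FILE CMD [ARGS...]: run CMD with FILE on stdin and replace the
# contents of FILE with CMD's output only if CMD exits successfully.
safe_rewrite() {
    local file="$1" tmp
    shift
    tmp="$(mktemp)" || return 1
    if "$@" < "$file" > "$tmp" && cat "$tmp" > "$file"; then
        rm -f "$tmp"
    else
        # CMD failed: leave FILE untouched and clean up the temporary output.
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `safe_rewrite manifest.json python3 -m json.tool` reformats the manifest when it parses and leaves it unchanged when it does not.
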
Issue: Missing Dependencies

Symptoms: Builds fail with "command not found" or import errors

Diagnosis:

# Check Python dependencies
pip list
pip check

# Check system packages
which debootstrap
which ostree
which sbuild

# Check Python path
python3 -c "import sys; print('\n'.join(sys.path))"
python3 -c "import build_orchestrator; print('OK')"

Solutions:

# Install missing Python packages
pip install -r requirements.txt

# Install missing system packages
sudo apt update
sudo apt install -y debootstrap ostree sbuild pbuilder

# Fix Python path
export PYTHONPATH="${PYTHONPATH}:/home/debian-forge/debian-forge"
echo 'export PYTHONPATH="${PYTHONPATH}:/home/debian-forge/debian-forge"' >> ~/.bashrc

# Check virtual environment
source venv/bin/activate
pip list

Advanced Troubleshooting

1. Performance Analysis

Build Performance Profiling

# Profile build execution
python3 -m cProfile -o build_profile.prof test-complete-pipeline.py

# Analyze profile results
python3 -c "
import pstats
p = pstats.Stats('build_profile.prof')
p.sort_stats('cumulative')
p.print_stats(20)
"

# Memory profiling
python3 -m memory_profiler test-complete-pipeline.py

System Performance Monitoring

# Monitor system during build
sar -u 1 60 > cpu_usage.log &
sar -r 1 60 > memory_usage.log &
sar -d 1 60 > disk_usage.log &

# Start build
python3 -m osbuild --libdir . manifest.json

# Stop monitoring
pkill sar

# Analyze results
python3 -c "
import pandas as pd
cpu = pd.read_csv('cpu_usage.log', sep=r'\s+', skiprows=2)
print(f'Average CPU: {cpu[\"%user\"].mean():.1f}%')
print(f'Peak CPU: {cpu[\"%user\"].max():.1f}%')
"

2. Debug Mode

Enable Debug Logging

# Set debug environment variables
export OSBUILD_DEBUG=1
export OSBUILD_VERBOSE=1
export DEBIAN_FORGE_DEBUG=1

# Run with debug output
python3 -m osbuild --libdir . --verbose manifest.json

# Check debug logs
tail -f /var/log/debian-forge/debug.log

Python Debugger

# Add breakpoints in code
import pdb; pdb.set_trace()

# Run with debugger
python3 -m pdb test-complete-pipeline.py

# Common debugger commands
# n (next), s (step), c (continue), p variable_name, l (list), q (quit)

3. Log Analysis

Log Parsing and Analysis

# Extract error patterns
grep -i "error\|fail\|exception" /var/log/debian-forge/*.log | head -20

# Count error types
grep -i "error" /var/log/debian-forge/*.log | cut -d: -f2 | sort | uniq -c | sort -nr

# Extract build timing information
grep "Build completed" /var/log/debian-forge/worker.log | awk '{print $1, $2, $NF}' | tail -10

# Analyze resource usage patterns
grep "Resource usage" /var/log/debian-forge/worker.log | tail -20
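
To see at a glance which log file is noisiest, the per-file error counts can be printed in descending order. This is a small sketch: `count_errors` is a hypothetical helper name, and it uses the same case-insensitive `error` match as the commands above.

```shell
# count_errors FILE...: print "<count> <file>" for each readable log,
# sorted so the file with the most matching lines comes first.
count_errors() {
    local f
    for f in "$@"; do
        [ -r "$f" ] || continue
        # grep -c prints 0 for files without matches, so every file is listed.
        printf '%s %s\n' "$(grep -ci 'error' "$f")" "$f"
    done | sort -rn
}
```

A typical call would be `count_errors /var/log/debian-forge/*.log`.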

Log Correlation

# Correlate errors across services
echo "=== API Errors ==="
grep -i "error" /var/log/debian-forge/api.log | tail -5

echo "=== Worker Errors ==="
grep -i "error" /var/log/debian-forge/worker.log | tail -5

echo "=== System Errors ==="
journalctl -u 'debian-forge-*' --since "1 hour ago" | grep -i "error" | tail -5

Recovery Procedures

1. Service Recovery

Complete Service Restart

#!/bin/bash
# /home/debian-forge/debian-forge/scripts/service-recovery.sh

echo "Starting complete service recovery..."

# Stop all services
sudo supervisorctl stop all
sudo systemctl stop nginx

# Clean up temporary files
sudo rm -rf /tmp/debian-forge-*
sudo rm -rf /var/lib/debian-forge/tmp/*

# Ensure backing services are running, then bring nginx back up
sudo systemctl start postgresql
sudo systemctl start redis-server
sudo systemctl start nginx

# Wait for services to be ready
sleep 10

# Start application services
sudo supervisorctl start all

# Check status
sudo supervisorctl status
sudo systemctl status nginx

echo "Service recovery completed"
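
The fixed `sleep 10` in the script above assumes every service is ready within ten seconds. A bounded retry loop is one alternative that returns as soon as a probe succeeds and fails clearly when it never does. This is a sketch only: `retry` is a hypothetical helper, and the attempt count and delay are values to tune per deployment.

```shell
# retry ATTEMPTS DELAY CMD [ARGS...]: run CMD until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between failed tries.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Skip the sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For instance, `retry 10 3 curl -fsS -o /dev/null http://localhost:8080/api/v1/health` polls the health endpoint used earlier in this guide for up to roughly 30 seconds.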

Database Recovery

#!/bin/bash
# /home/debian-forge/debian-forge/scripts/db-recovery.sh

echo "Starting database recovery..."

# Check database status
sudo systemctl status postgresql

# Test database connection
sudo -u debian-forge psql -d debian_forge -c "SELECT version();"

if [ $? -ne 0 ]; then
    echo "Database connection failed, attempting recovery..."
    
    # Restart database
    sudo systemctl restart postgresql
    sleep 10
    
    # Test connection again
    sudo -u debian-forge psql -d debian_forge -c "SELECT version();"
    
    if [ $? -eq 0 ]; then
        echo "Database recovery successful"
    else
        echo "Database recovery failed"
        exit 1
    fi
else
    echo "Database is healthy"
fi

2. Data Recovery

Build Artifact Recovery

#!/bin/bash
# /home/debian-forge/debian-forge/scripts/artifact-recovery.sh

echo "Starting artifact recovery..."

# Check for corrupted objects
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree

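# Remove objects no longer reachable from any ref
# (corrupted objects reported by fsck can be removed with: ostree fsck --delete)
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only

# Regenerate the repository summary file
sudo -u debian-forge ostree summary --repo=/var/lib/debian-forge/ostree --update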

# Verify repository integrity
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree

echo "Artifact recovery completed"

Configuration Recovery

#!/bin/bash
# /home/debian-forge/debian-forge/scripts/config-recovery.sh

BACKUP_DIR="/var/backups/debian-forge"
LATEST_CONFIG=$(ls -t "$BACKUP_DIR"/config_*.tar.gz 2>/dev/null | head -n 1)

if [ -n "$LATEST_CONFIG" ]; then
    echo "Restoring configuration from: $LATEST_CONFIG"
    
    # Stop services
    sudo supervisorctl stop all
    
    # Restore configuration
    sudo tar -xzf "$LATEST_CONFIG" -C /
    
    # Fix permissions
    sudo chown -R debian-forge:debian-forge /home/debian-forge/debian-forge/config
    
    # Restart services
    sudo supervisorctl start all
    
    echo "Configuration recovery completed"
else
    echo "No configuration backup found"
    exit 1
fi

Prevention Strategies

1. Monitoring and Alerting

Health Check Automation

#!/bin/bash
# /etc/cron.daily/debian-forge-health-check

# Run health checks
/home/debian-forge/debian-forge/venv/bin/python /home/debian-forge/debian-forge/health_check.py

# Check for critical issues
if [ $? -ne 0 ]; then
    # Send alert
    /home/debian-forge/debian-forge/scripts/alert.py "Health check failed" "Critical system issue detected"
    
    # Attempt auto-recovery
    /home/debian-forge/debian-forge/scripts/service-recovery.sh
fi

Resource Monitoring

#!/bin/bash
# /etc/cron.hourly/debian-forge-resource-check

# Check disk space
DISK_USAGE=$(df -P /var/lib/debian-forge | tail -n 1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 85 ]; then
    /home/debian-forge/debian-forge/scripts/alert.py "High disk usage" "Disk usage is ${DISK_USAGE}%"
    
    # Trigger cleanup
    /home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force
fi

# Check memory usage
MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
if [ "$MEMORY_USAGE" -gt 90 ]; then
    /home/debian-forge/debian-forge/scripts/alert.py "High memory usage" "Memory usage is ${MEMORY_USAGE}%"
fi
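
The threshold checks above can be factored into small functions that are easier to test and that treat unparsable input as an error instead of a pass. This is a minimal sketch: `usage_percent` and `over_threshold` are hypothetical helper names, and the parsing assumes POSIX-format output as produced by `df -P`.

```shell
# usage_percent: read `df -P <path>` output on stdin and print the capacity
# column of the last data row as a bare integer.
usage_percent() {
    awk 'NR > 1 { gsub(/%/, "", $5); pct = $5 } END { print pct }'
}

# over_threshold VALUE LIMIT: exit 0 if VALUE is an integer above LIMIT,
# 1 if it is not above LIMIT, and 2 if VALUE is empty or not a number.
over_threshold() {
    local value="$1" limit="$2"
    case "$value" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$value" -gt "$limit" ]
}
```

A check then reads `over_threshold "$(df -P /var/lib/debian-forge | usage_percent)" 85`, and the exit status distinguishes "over the limit" from "could not read a value".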

2. Maintenance Windows

Scheduled Maintenance

#!/bin/bash
# /etc/cron.weekly/debian-forge-maintenance

echo "Starting scheduled maintenance..."

# Stop services
sudo supervisorctl stop all

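# Update system packages (apt-get has a stable scripting interface; noninteractive avoids prompts under cron)
sudo apt-get update && sudo DEBIAN_FRONTEND=noninteractive apt-get upgrade -y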

# Clean old artifacts
/home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force

# Rotate logs
sudo logrotate -f /etc/logrotate.d/debian-forge

# Restart services
sudo supervisorctl start all

echo "Scheduled maintenance completed"

Getting Help

1. Self-Service Resources

Documentation

  • User Guide: /home/debian-forge/debian-forge-docs/user-documentation.md
  • Deployment Guide: /home/debian-forge/debian-forge-docs/deployment-documentation.md
  • Architecture Guide: /home/debian-forge/debian-forge-docs/osbuild-architecture.md

Test Scripts

# Run diagnostic tests
python3 test-apt-stage.py
python3 test-resource-allocation.py
python3 test-build-orchestration.py
python3 test-complete-pipeline.py

# Run performance tests
python3 test-performance-optimization.py
python3 test-stress-testing.py

2. Community Support

Issue Reporting

When reporting issues, include:

  • System information: OS version, Python version, installed packages
  • Error messages: Complete error logs and stack traces
  • Reproduction steps: Detailed steps to reproduce the issue
  • Environment: Development or production, configuration details
  • Recent changes: Any recent modifications to the system

Debug Information Collection

#!/bin/bash
# /home/debian-forge/debian-forge/scripts/collect-debug-info.sh

DEBUG_DIR="/tmp/debian-forge-debug-$(date +%Y%m%d_%H%M%S)"
mkdir -p $DEBUG_DIR

echo "Collecting debug information..."

# System information
uname -a > $DEBUG_DIR/system-info.txt
cat /etc/os-release > $DEBUG_DIR/os-release.txt
python3 --version > $DEBUG_DIR/python-version.txt

# Service status
sudo supervisorctl status > $DEBUG_DIR/supervisor-status.txt
sudo systemctl status 'debian-forge-*' > $DEBUG_DIR/systemd-status.txt

# Configuration files
cp -r /home/debian-forge/debian-forge/config $DEBUG_DIR/
cp /etc/supervisor/conf.d/debian-forge.conf $DEBUG_DIR/
cp /etc/nginx/sites-available/debian-forge $DEBUG_DIR/

# Logs (last 1000 lines)
tail -n 1000 /var/log/debian-forge/*.log > $DEBUG_DIR/recent-logs.txt

# Resource usage
df -h > $DEBUG_DIR/disk-usage.txt
free -h > $DEBUG_DIR/memory-usage.txt
ps aux > $DEBUG_DIR/process-list.txt

# Package information
pip list > $DEBUG_DIR/python-packages.txt
dpkg -l | grep -E "(debian-forge|osbuild|ostree)" > $DEBUG_DIR/system-packages.txt

echo "Debug information collected in: $DEBUG_DIR"
echo "Please include this directory when reporting issues"

Conclusion

This guide covers diagnosing and resolving common Debian Forge issues. Key points:

  1. Start with quick diagnostics to identify the problem area
  2. Use systematic troubleshooting to isolate root causes
  3. Implement recovery procedures to restore service
  4. Apply prevention strategies to avoid future issues
  5. Collect debug information when seeking community help

For additional support, refer to the project documentation or create detailed issue reports with the collected debug information.