Debian Forge Troubleshooting Guides
Overview
This document provides comprehensive troubleshooting information for Debian Forge, including common issues, diagnostic procedures, and step-by-step solutions.
Quick Diagnostic Commands
System Health Check
# Check system resources
htop
df -h
free -h
iostat -x 1
# Check service status
sudo systemctl status 'debian-forge-*'
sudo supervisorctl status
# Check logs
tail -f /var/log/debian-forge/*.log
journalctl -u 'debian-forge-*' -f
# Check network
ping -c 3 8.8.8.8
curl -I http://deb.debian.org/debian
Debian Forge Status Check
# Check build queue
curl -s http://localhost:8080/api/v1/queue/status | jq
# Check active builds
curl -s http://localhost:8080/api/v1/builds/active | jq
# Check system health
curl -s http://localhost:8080/api/v1/health | jq
# Check OSTree repository
ostree refs --repo=/var/lib/debian-forge/ostree
ostree log --repo=/var/lib/debian-forge/ostree debian/bookworm/amd64
Common Issues and Solutions
1. Build Failures
Issue: Build Process Hangs
Symptoms: The build appears to be running but makes no progress for an extended period
Diagnosis:
# Check build process
ps aux | grep osbuild
ps aux | grep debootstrap
# Check system resources
htop
df -h /tmp
df -h /var/lib/debian-forge
# Check build logs
tail -f /var/log/debian-forge/worker.log
Solutions:
# Kill hanging processes
sudo pkill -f osbuild
sudo pkill -f debootstrap
# Clean temporary files
sudo rm -rf /tmp/osbuild-*
sudo rm -rf /var/lib/debian-forge/tmp/*
# Restart worker service
sudo supervisorctl restart debian-forge-worker
# Check for disk space issues
sudo du -sh /var/lib/debian-forge/* | sort -hr
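A hung build usually shows up as a worker log that has stopped growing. The following is a minimal sketch of that check, assuming the worker log path used above; the helper names and the 15-minute idle threshold are hypothetical and not part of Debian Forge.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds have passed since `path` was last modified."""
    now = time.time() if now is None else now
    return max(0.0, now - os.path.getmtime(path))


def is_stalled(path, max_idle_seconds=900, now=None):
    """True when the file has not been written to for longer than the threshold.

    The 900-second default is an assumed value; tune it to typical build times.
    """
    return seconds_since_last_write(path, now) > max_idle_seconds
```

For example, `is_stalled("/var/log/debian-forge/worker.log")` could be checked before deciding to kill `osbuild` or `debootstrap`, so that slow but healthy builds are left alone.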
Issue: Package Installation Failures
Symptoms: Build fails during package installation with APT errors
Diagnosis:
# Check APT configuration
cat /etc/apt/sources.list
ls -la /etc/apt/sources.list.d/
# Test package availability
apt-cache policy package-name
apt-cache search package-name
# Check network connectivity
curl -I http://deb.debian.org/debian
ping -c 3 security.debian.org
Solutions:
# Update package lists
sudo apt update
# Fix broken packages
sudo apt --fix-broken install
# Clear APT cache
sudo apt clean
sudo apt autoclean
# Check for repository issues
sudo apt update 2>&1 | grep -i error
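# Verify GPG keys (apt-key is deprecated; on releases that no longer ship it,
# inspect the keyrings under /etc/apt/trusted.gpg.d/ and /usr/share/keyrings/ instead)
sudo apt-key list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys KEY_ID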
Issue: OSTree Commit Failures
Symptoms: Build fails during OSTree commit creation
Diagnosis:
# Check OSTree repository
ostree refs --repo=/var/lib/debian-forge/ostree
ostree log --repo=/var/lib/debian-forge/ostree
# Check repository permissions
ls -la /var/lib/debian-forge/ostree/
id debian-forge
# Check disk space
df -h /var/lib/debian-forge/ostree
Solutions:
# Fix repository permissions
sudo chown -R debian-forge:debian-forge /var/lib/debian-forge/ostree
sudo chmod -R 755 /var/lib/debian-forge/ostree
# Initialize the repository (ostree init takes the path via --repo)
sudo -u debian-forge ostree init --repo=/var/lib/debian-forge/ostree --mode=archive-z2
# Prune old commits: keep the latest commit of each ref plus one parent
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only --depth=1
# Check for corrupted objects
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree
2. Service Issues
Issue: API Service Not Responding
Symptoms: HTTP requests to API endpoints time out or return errors
Diagnosis:
# Check service status
sudo supervisorctl status debian-forge-api
sudo systemctl status nginx
# Check port binding
sudo netstat -tlnp | grep :8080
sudo ss -tlnp | grep :8080
# Check firewall
sudo ufw status
sudo iptables -L
# Test local connectivity
curl -v http://localhost:8080/health
Solutions:
# Restart API service
sudo supervisorctl restart debian-forge-api
# Check configuration
sudo cat /etc/supervisor/conf.d/debian-forge.conf
sudo cat /etc/nginx/sites-available/debian-forge
# Verify Python environment
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "import flask; print('OK')"
# Check logs for errors
sudo tail -f /var/log/debian-forge/api.log
sudo tail -f /var/log/nginx/error.log
Issue: Worker Service Not Processing Builds
Symptoms: Builds remain in QUEUED status indefinitely
Diagnosis:
# Check worker status
sudo supervisorctl status debian-forge-worker
# Check worker logs
sudo tail -f /var/log/debian-forge/worker.log
# Check build queue
curl -s http://localhost:8080/api/v1/queue/status | jq
# Check system resources
htop
df -h
free -h
Solutions:
# Restart worker service
sudo supervisorctl restart debian-forge-worker
# Check Python dependencies
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/pip list
# Verify build environment
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "import build_orchestrator; print('OK')"
# Check for resource constraints
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python test-resource-allocation.py
3. Resource Issues
Issue: High CPU Usage
Symptoms: System becomes unresponsive, builds slow down
Diagnosis:
# Check CPU usage
htop
top -b -n 1 | head -20
iostat -x 1
# Identify high-CPU processes
ps aux --sort=-%cpu | head -10
# Check build processes
ps aux | grep osbuild
ps aux | grep debootstrap
Solutions:
# Reduce concurrent builds
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "
from build_orchestrator import BuildOrchestrator
o = BuildOrchestrator()
o.resource_manager.set_concurrent_builds(2)
"
# Kill runaway processes
sudo pkill -f osbuild
sudo pkill -f debootstrap
# Check for infinite loops
sudo tail -f /var/log/debian-forge/worker.log | grep -i "cpu\|loop"
# Restart services
sudo supervisorctl restart all
Issue: High Memory Usage
Symptoms: Out of memory errors, system swapping
Diagnosis:
# Check memory usage
free -h
cat /proc/meminfo | grep -E "(MemTotal|MemFree|MemAvailable|SwapTotal|SwapFree)"
# Check for memory leaks
ps aux --sort=-%mem | head -10
# Check swap usage
swapon --show
cat /proc/swaps
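The `/proc/meminfo` fields listed above can also be read programmatically. This is a minimal sketch, not part of Debian Forge; the function names are hypothetical. It computes usage from `MemAvailable` rather than `MemFree`, because `MemFree` excludes page cache that the kernel can reclaim.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Percentage of RAM in use, based on MemAvailable rather than MemFree."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemAvailable"]) / total
```

A typical call would be `memory_used_percent(parse_meminfo(open("/proc/meminfo").read()))`.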
Solutions:
# Clear memory caches
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
# Restart memory-intensive services
sudo supervisorctl restart debian-forge-worker
# Check for memory leaks in logs
sudo tail -f /var/log/debian-forge/worker.log | grep -i "memory\|leak"
# Reduce memory limits
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -c "
from build_orchestrator import BuildOrchestrator
o = BuildOrchestrator()
o.resource_manager.set_build_limits(max_memory=70)
"
Issue: Disk Space Exhaustion
Symptoms: Builds fail with "no space left on device" errors
Diagnosis:
# Check disk usage
df -h
du -sh /var/lib/debian-forge/*
du -sh /tmp/*
du -sh .osbuild/
# Check for large files
find /var/lib/debian-forge -type f -size +100M -exec ls -lh {} \;
find /tmp -type f -size +100M -exec ls -lh {} \;
# Check inode usage
df -i
Solutions:
# Clean build artifacts
sudo -u debian-forge /home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force
# Clean temporary files
sudo rm -rf /tmp/osbuild-*
sudo rm -rf /var/lib/debian-forge/tmp/*
# Prune old OSTree commits: keep the latest commit of each ref plus one parent
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only --depth=1
# Clean package cache
sudo apt clean
sudo apt autoclean
# Check for log rotation
sudo logrotate -f /etc/logrotate.d/debian-forge
4. Network Issues
Issue: Package Download Failures
Symptoms: Builds fail when downloading packages from repositories
Diagnosis:
# Test network connectivity
ping -c 3 8.8.8.8
ping -c 3 deb.debian.org
ping -c 3 security.debian.org
# Check DNS resolution
nslookup deb.debian.org
dig deb.debian.org
# Test HTTP connectivity
curl -I http://deb.debian.org/debian
curl -I https://security.debian.org/debian-security
# Check proxy configuration
echo $http_proxy
echo $https_proxy
cat /etc/apt/apt.conf.d/*proxy*
Solutions:
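# Fix DNS issues (temporary: this overwrites /etc/resolv.conf, which may be
# managed by systemd-resolved or resolvconf, so keep a backup)
sudo cp /etc/resolv.conf /etc/resolv.conf.bak
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf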
# Test without proxy
unset http_proxy https_proxy
sudo apt update
# Check firewall rules
sudo ufw status
sudo iptables -L
# Verify repository URLs
sudo apt update 2>&1 | grep -i "failed\|error\|unreachable"
Issue: apt-cacher-ng Connection Problems
Symptoms: Builds fail with proxy connection errors
Diagnosis:
# Check apt-cacher-ng status
sudo systemctl status apt-cacher-ng
sudo netstat -tlnp | grep :3142
# Test proxy connectivity
curl -I http://192.168.1.101:3142
telnet 192.168.1.101 3142
# Check proxy configuration
cat /etc/apt/apt.conf.d/99proxy
Solutions:
# Restart apt-cacher-ng
sudo systemctl restart apt-cacher-ng
# Verify proxy configuration
echo 'Acquire::http::Proxy "http://192.168.1.101:3142";' | sudo tee /etc/apt/apt.conf.d/99proxy
echo 'Acquire::https::Proxy "http://192.168.1.101:3142";' | sudo tee -a /etc/apt/apt.conf.d/99proxy
# Test proxy
curl -x http://192.168.1.101:3142 http://deb.debian.org/debian
# Check proxy logs
sudo tail -f /var/log/apt-cacher-ng/apt-cacher.log
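The `telnet` reachability test above can be scripted so it runs before each build. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper name, and it confirms only that the port accepts TCP connections, not that apt-cacher-ng is serving packages correctly.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

With the proxy address used in this guide, the call would be `can_connect("192.168.1.101", 3142)`.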
5. Configuration Issues
Issue: Invalid Manifest Format
Symptoms: Builds fail with "invalid manifest" errors
Diagnosis:
# Validate manifest syntax
python3 -m osbuild --libdir . --check-only manifest.json
# Check JSON syntax
python3 -c "import json; json.load(open('manifest.json')); print('Valid JSON')"
# Check manifest schema
python3 -c "
import json
schema = json.load(open('schemas/osbuild2.json'))
manifest = json.load(open('manifest.json'))
print('Schema validation needed')
"
Solutions:
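# Normalize JSON formatting (json.tool does not repair invalid JSON; it reports
# where the syntax error is, and the && keeps a failed run from overwriting the manifest)
python3 -m json.tool manifest.json > manifest_fixed.json && mv manifest_fixed.json manifest.json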
# Validate against schema
python3 -c "
import json
from jsonschema import validate
schema = json.load(open('schemas/osbuild2.json'))
manifest = json.load(open('manifest.json'))
validate(instance=manifest, schema=schema)
print('Valid manifest')
"
# Check stage names
python3 -c "
import json
manifest = json.load(open('manifest.json'))
stages = manifest.get('pipeline', {}).get('stages', [])
for stage in stages:
    print(f'Stage: {stage.get(\"name\", \"unknown\")}')
"
Issue: Missing Dependencies
Symptoms: Builds fail with "command not found" or import errors
Diagnosis:
# Check Python dependencies
pip list
pip check
# Check system packages
which debootstrap
which ostree
which sbuild
# Check Python path
python3 -c "import sys; print('\n'.join(sys.path))"
python3 -c "import build_orchestrator; print('OK')"
Solutions:
# Install missing Python packages
pip install -r requirements.txt
# Install missing system packages
sudo apt update
sudo apt install -y debootstrap ostree sbuild pbuilder
# Fix Python path
export PYTHONPATH="${PYTHONPATH}:/home/debian-forge/debian-forge"
echo 'export PYTHONPATH="${PYTHONPATH}:/home/debian-forge/debian-forge"' >> ~/.bashrc
# Check virtual environment
source venv/bin/activate
pip list
Advanced Troubleshooting
1. Performance Analysis
Build Performance Profiling
# Profile build execution
python3 -m cProfile -o build_profile.prof test-complete-pipeline.py
# Analyze profile results
python3 -c "
import pstats
p = pstats.Stats('build_profile.prof')
p.sort_stats('cumulative')
p.print_stats(20)
"
# Memory profiling
python3 -m memory_profiler test-complete-pipeline.py
System Performance Monitoring
# Monitor system during build
sar -u 1 60 > cpu_usage.log &
sar -r 1 60 > memory_usage.log &
sar -d 1 60 > disk_usage.log &
# Start build
python3 -m osbuild --libdir . manifest.json
# Stop monitoring
pkill sar
# Analyze results
python3 -c "
import pandas as pd
cpu = pd.read_csv('cpu_usage.log', sep=r'\s+', skiprows=2)
print(f'Average CPU: {cpu[\"%user\"].mean():.1f}%')
print(f'Peak CPU: {cpu[\"%user\"].max():.1f}%')
"
2. Debug Mode
Enable Debug Logging
# Set debug environment variables
export OSBUILD_DEBUG=1
export OSBUILD_VERBOSE=1
export DEBIAN_FORGE_DEBUG=1
# Run with debug output
python3 -m osbuild --libdir . --verbose manifest.json
# Check debug logs
tail -f /var/log/debian-forge/debug.log
Python Debugger
# Add breakpoints in code
import pdb; pdb.set_trace()
# Run with debugger
python3 -m pdb test-complete-pipeline.py
# Common debugger commands
# n (next), s (step), c (continue), p variable_name, l (list), q (quit)
3. Log Analysis
Log Parsing and Analysis
# Extract error patterns
grep -i "error\|fail\|exception" /var/log/debian-forge/*.log | head -20
# Count error types
grep -i "error" /var/log/debian-forge/*.log | cut -d: -f2 | sort | uniq -c | sort -nr
# Extract build timing information
grep "Build completed" /var/log/debian-forge/worker.log | awk '{print $1, $2, $NF}' | tail -10
# Analyze resource usage patterns
grep "Resource usage" /var/log/debian-forge/worker.log | tail -20
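The `grep | sort | uniq -c` pipeline above can be approximated in Python when the counts need to feed another tool. This sketch assumes log lines contain an upper-case severity word such as `ERROR` or `WARNING`; that format is an assumption, since the actual Debian Forge log layout is not shown in this guide, and `count_levels` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed severity markers; adjust to match the real log format.
LEVEL_RE = re.compile(r"\b(ERROR|CRITICAL|WARNING)\b")


def count_levels(lines):
    """Count log lines per severity, using the first severity word on each line."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file works directly, for example `count_levels(open("/var/log/debian-forge/worker.log"))`, and `most_common()` on the result gives the severities in descending order of frequency.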
Log Correlation
# Correlate errors across services
echo "=== API Errors ==="
grep -i "error" /var/log/debian-forge/api.log | tail -5
echo "=== Worker Errors ==="
grep -i "error" /var/log/debian-forge/worker.log | tail -5
echo "=== System Errors ==="
journalctl -u 'debian-forge-*' --since "1 hour ago" | grep -i "error" | tail -5
Recovery Procedures
1. Service Recovery
Complete Service Restart
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/service-recovery.sh
echo "Starting complete service recovery..."
# Stop all services
sudo supervisorctl stop all
sudo systemctl stop nginx
# Clean up temporary files
sudo rm -rf /tmp/debian-forge-*
sudo rm -rf /var/lib/debian-forge/tmp/*
# Restart system services
sudo systemctl start postgresql
sudo systemctl start redis-server
sudo systemctl start nginx
# Wait for services to be ready
sleep 10
# Start application services
sudo supervisorctl start all
# Check status
sudo supervisorctl status
sudo systemctl status nginx
echo "Service recovery completed"
Database Recovery
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/db-recovery.sh
echo "Starting database recovery..."
# Check database status
sudo systemctl status postgresql
# Test database connection
sudo -u debian-forge psql -d debian_forge -c "SELECT version();"
if [ $? -ne 0 ]; then
    echo "Database connection failed, attempting recovery..."
    # Restart database
    sudo systemctl restart postgresql
    sleep 10
    # Test connection again
    sudo -u debian-forge psql -d debian_forge -c "SELECT version();"
    if [ $? -eq 0 ]; then
        echo "Database recovery successful"
    else
        echo "Database recovery failed"
        exit 1
    fi
else
    echo "Database is healthy"
fi
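The fixed `sleep 10` in the script above either waits too long or not long enough, depending on how quickly PostgreSQL comes back. One alternative is to poll until a check passes. This is a generic sketch; `wait_until` and its defaults are hypothetical and not part of Debian Forge.

```python
import time


def wait_until(check, attempts=5, delay=2.0, sleep=time.sleep):
    """Call `check()` until it returns a truthy value or attempts run out.

    Sleeps `delay` seconds between attempts, but not after the last one.
    The `sleep` parameter exists so tests can replace the real delay.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

The `check` callable could wrap the same `psql -c "SELECT version();"` command through `subprocess.run` and test its return code.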
2. Data Recovery
Build Artifact Recovery
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/artifact-recovery.sh
echo "Starting artifact recovery..."
# Check for corrupted objects
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree
# Remove objects that are no longer reachable from any ref
sudo -u debian-forge ostree prune --repo=/var/lib/debian-forge/ostree --refs-only
# Regenerate the repository summary file
sudo -u debian-forge ostree summary --repo=/var/lib/debian-forge/ostree --update
# Verify repository integrity
sudo -u debian-forge ostree fsck --repo=/var/lib/debian-forge/ostree
echo "Artifact recovery completed"
Configuration Recovery
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/config-recovery.sh
BACKUP_DIR="/var/backups/debian-forge"
LATEST_CONFIG=$(ls -t "$BACKUP_DIR"/config_*.tar.gz 2>/dev/null | head -1)
if [ -n "$LATEST_CONFIG" ]; then
    echo "Restoring configuration from: $LATEST_CONFIG"
    # Stop services
    sudo supervisorctl stop all
    # Restore configuration
    sudo tar -xzf "$LATEST_CONFIG" -C /
    # Fix permissions
    sudo chown -R debian-forge:debian-forge /home/debian-forge/debian-forge/config
    # Restart services
    sudo supervisorctl start all
    echo "Configuration recovery completed"
else
    echo "No configuration backup found"
    exit 1
fi
Prevention Strategies
1. Monitoring and Alerting
Health Check Automation
#!/bin/bash
# /etc/cron.daily/debian-forge-health-check
# Run health checks
/home/debian-forge/debian-forge/venv/bin/python /home/debian-forge/debian-forge/health_check.py
# Check for critical issues
if [ $? -ne 0 ]; then
    # Send alert
    /home/debian-forge/debian-forge/scripts/alert.py "Health check failed" "Critical system issue detected"
    # Attempt auto-recovery
    /home/debian-forge/debian-forge/scripts/service-recovery.sh
fi
Resource Monitoring
#!/bin/bash
# /etc/cron.hourly/debian-forge-resource-check
# Check disk space (--output=pcent avoids column shifts when df wraps long device names)
DISK_USAGE=$(df --output=pcent /var/lib/debian-forge | tail -1 | tr -dc '0-9')
if [ "$DISK_USAGE" -gt 85 ]; then
    /home/debian-forge/debian-forge/scripts/alert.py "High disk usage" "Disk usage is ${DISK_USAGE}%"
    # Trigger cleanup
    /home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force
fi
# Check memory usage
MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
if [ "$MEMORY_USAGE" -gt 90 ]; then
    /home/debian-forge/debian-forge/scripts/alert.py "High memory usage" "Memory usage is ${MEMORY_USAGE}%"
fi
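The disk check in the script above could also live in Python next to the cleanup code. This is a minimal sketch with hypothetical helper names; the 85% default mirrors the threshold in the cron script. The result can differ slightly from the `Use%` column of `df`, which accounts for blocks reserved for root.

```python
import shutil


def disk_used_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_cleanup(path, threshold=85.0):
    """True when usage on the filesystem holding `path` exceeds `threshold` percent."""
    return disk_used_percent(path) > threshold
```

For the build store this would be called as `needs_cleanup("/var/lib/debian-forge")`.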
2. Maintenance Windows
Scheduled Maintenance
#!/bin/bash
# /etc/cron.weekly/debian-forge-maintenance
echo "Starting scheduled maintenance..."
# Stop services
sudo supervisorctl stop all
# Update system packages
# apt-get is used here because apt does not guarantee a stable interface for scripts
sudo apt-get update && sudo apt-get -y upgrade
# Clean old artifacts
/home/debian-forge/debian-forge/venv/bin/python -m cleanup_manager --force
# Rotate logs
sudo logrotate -f /etc/logrotate.d/debian-forge
# Restart services
sudo supervisorctl start all
echo "Scheduled maintenance completed"
Getting Help
1. Self-Service Resources
Documentation
- User Guide: /home/debian-forge/debian-forge-docs/user-documentation.md
- Deployment Guide: /home/debian-forge/debian-forge-docs/deployment-documentation.md
- Architecture Guide: /home/debian-forge/debian-forge-docs/osbuild-architecture.md
Test Scripts
# Run diagnostic tests
python3 test-apt-stage.py
python3 test-resource-allocation.py
python3 test-build-orchestration.py
python3 test-complete-pipeline.py
# Run performance tests
python3 test-performance-optimization.py
python3 test-stress-testing.py
2. Community Support
Issue Reporting
When reporting issues, include:
- System information: OS version, Python version, installed packages
- Error messages: Complete error logs and stack traces
- Reproduction steps: Detailed steps to reproduce the issue
- Environment: Development or production, configuration details
- Recent changes: Any recent modifications to the system
Debug Information Collection
#!/bin/bash
# /home/debian-forge/debian-forge/scripts/collect-debug-info.sh
DEBUG_DIR="/tmp/debian-forge-debug-$(date +%Y%m%d_%H%M%S)"
mkdir -p $DEBUG_DIR
echo "Collecting debug information..."
# System information
uname -a > $DEBUG_DIR/system-info.txt
cat /etc/os-release > $DEBUG_DIR/os-release.txt
python3 --version > $DEBUG_DIR/python-version.txt
# Service status
sudo supervisorctl status > $DEBUG_DIR/supervisor-status.txt
sudo systemctl status 'debian-forge-*' > $DEBUG_DIR/systemd-status.txt
# Configuration files
cp -r /home/debian-forge/debian-forge/config $DEBUG_DIR/
cp /etc/supervisor/conf.d/debian-forge.conf $DEBUG_DIR/
cp /etc/nginx/sites-available/debian-forge $DEBUG_DIR/
# Logs (last 1000 lines)
tail -1000 /var/log/debian-forge/*.log > $DEBUG_DIR/recent-logs.txt
# Resource usage
df -h > $DEBUG_DIR/disk-usage.txt
free -h > $DEBUG_DIR/memory-usage.txt
ps aux > $DEBUG_DIR/process-list.txt
# Package information
pip list > $DEBUG_DIR/python-packages.txt
dpkg -l | grep -E "(debian-forge|osbuild|ostree)" > $DEBUG_DIR/system-packages.txt
echo "Debug information collected in: $DEBUG_DIR"
echo "Please include this directory when reporting issues"
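A directory is awkward to attach to an issue report, so the collected files can be packed into a single archive first. This is a minimal sketch using the standard `tarfile` module; `archive_directory` is a hypothetical helper, not an existing Debian Forge script. Review the copied configuration files for credentials before sharing the archive.

```python
import os
import tarfile


def archive_directory(directory):
    """Pack `directory` into a sibling .tar.gz file and return the archive path."""
    directory = os.path.normpath(directory)
    archive_path = directory + ".tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the directory name rather than the full /tmp path.
        tar.add(directory, arcname=os.path.basename(directory))
    return archive_path
```

Called on the directory printed by the script above, it returns the path of the archive to attach.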
Conclusion
This troubleshooting guide provides comprehensive information for diagnosing and resolving common Debian Forge issues. Key points:
- Start with quick diagnostics to identify the problem area
- Use systematic troubleshooting to isolate root causes
- Implement recovery procedures to restore service
- Apply prevention strategies to avoid future issues
- Collect debug information when seeking community help
For additional support, refer to the project documentation or create detailed issue reports with the collected debug information.