deb-orchestrator/dev-architecture-docs/pungi-overview.md
2025-08-18 23:45:01 -07:00

Pungi: A Comprehensive Analysis Report

Executive Summary

Pungi is Fedora's sophisticated distribution compose orchestration tool that coordinates the entire release process for Fedora Linux, including bootc image creation. It's a mature, production-grade system that has evolved over years to handle complex multi-artifact generation with deep integration into Fedora's build infrastructure.

This report provides a comprehensive analysis of Pungi's architecture, design patterns, and implementation details based on source code examination and comparison with the deb-compose vision.

What Pungi Actually Does

Core Purpose

Pungi is fundamentally a compose orchestrator - it doesn't build packages or create images directly, but rather coordinates the entire process of generating release artifacts from pre-built packages. Think of it as the "conductor" of an orchestra where each musician (tool) plays their part under Pungi's direction.

Primary Functions

1. Release Coordination & Consistency

  • Package Set Coordination: Ensures all release artifacts use identical package versions across variants
  • Multi-Artifact Generation: Creates ISOs, live images, container images, cloud images, and bootc images
  • Release Identity Management: Generates unique compose IDs, manages respins, and maintains release metadata
  • Quality Gates: Implements checks and validations at each phase
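
The compose IDs and respins mentioned above can be pictured with a small sketch. The layout used here (short name, version, date, an optional type suffix, then a respin counter) is modelled on IDs such as Fedora-40-20240415.n.0. The function names and the suffix table are assumptions made for illustration, not Pungi's or productmd's actual API.

```python
from datetime import date

# Assumed suffix per compose type; treat this table as illustrative.
TYPE_SUFFIXES = {"production": "", "nightly": ".n", "test": ".t", "ci": ".ci"}


def make_compose_id(short, version, compose_date, compose_type="production", respin=0):
    """Build an ID like 'Fedora-40-20240415.n.0' from its parts."""
    suffix = TYPE_SUFFIXES[compose_type]
    stamp = compose_date.strftime("%Y%m%d")
    return "%s-%s-%s%s.%d" % (short, version, stamp, suffix, respin)


def next_respin(existing_ids, prefix):
    """Return the first unused respin number among IDs starting with prefix."""
    used = {
        int(compose_id.rsplit(".", 1)[1])
        for compose_id in existing_ids
        if compose_id.startswith(prefix)
    }
    respin = 0
    while respin in used:
        respin += 1
    return respin
```

A second compose on the same day keeps the date and bumps only the respin, which is what keeps IDs unique.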

2. Build Orchestration

  • Phase Management: Executes 20+ distinct phases in strict order (init → gather → createrepo → buildinstall → createiso → image_build → ostree → ostree_container)
  • Dependency Resolution: Manages complex interdependencies between phases and variants
  • Parallel Execution: Coordinates parallel builds across architectures and variants
  • Failure Handling: Implements sophisticated failure recovery and partial success handling

3. Infrastructure Integration

  • Koji Integration: Deep integration with Fedora's Koji build system for package management
  • Mock Integration: Uses Mock for creating isolated build environments when needed
  • Repository Management: Coordinates with multiple package repositories and metadata
  • OSTree Integration: Orchestrates bootc image creation through rpm-ostree
  • Container Registry: Manages container image creation and distribution

Technical Architecture

Phase-Based Architecture

Pungi uses a strictly ordered phase system where each phase has specific responsibilities and dependencies:

# Core phases in execution order
PHASES_NAMES = [
    'init',           # Initialize compose environment
    'weaver',         # Run other phases in parallel pipelines
    'pkgset',         # Define package sets
    'gather',         # Download and organize packages
    'createrepo',     # Create package repositories
    'buildinstall',   # Build installation trees
    'extra_files',    # Add additional files
    'createiso',      # Create ISO images
    'extra_isos',     # Generate additional ISO variants
    'image_build',    # Build disk images via Koji
    'image_container', # Create container images
    'kiwibuild',      # Build images via Kiwi
    'osbuild',        # Build images via osbuild
    'imagebuilder',   # Build images via imagebuilder
    'repoclosure',    # Validate repository consistency
    'test',           # Run tests
    'image_checksum', # Generate checksums
    'livemedia_phase', # Create live media
    'ostree',         # Create OSTree commits
    'ostree_installer', # Create OSTree installers
    'ostree_container' # Create OSTree containers
]
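
As a rough illustration of how an ordered phase list combines with "skip" and "just" selections, the sketch below filters a shortened phase list while preserving its order. The function name `select_phases` and the five-entry list are hypothetical and exist only for this example.

```python
PHASES = ["init", "pkgset", "gather", "createrepo", "createiso"]


def select_phases(all_phases, skip_phases=(), just_phases=()):
    """Return the phases to run, preserving the declared order."""
    unknown = (set(skip_phases) | set(just_phases)) - set(all_phases)
    if unknown:
        raise ValueError("Unknown phases: %s" % ", ".join(sorted(unknown)))
    selected = []
    for name in all_phases:
        # A non-empty just_phases acts as an allow-list.
        if just_phases and name not in just_phases:
            continue
        if name in skip_phases:
            continue
        selected.append(name)
    return selected


print(select_phases(PHASES, skip_phases=["createiso"]))
print(select_phases(PHASES, just_phases=["gather", "init"]))
```

Rejecting unknown names up front catches a typo in a phase name before the compose starts, instead of silently running the wrong set.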

Core Components

1. Compose Engine (compose.py)

The central orchestrator that manages the entire compose lifecycle:

class Compose(kobo.log.LoggingBase):
    def __init__(self, conf, topdir, skip_phases=None, just_phases=None, ...):
        self.conf = conf                    # Configuration
        self.variants = {}                  # Top-level variants
        self.all_variants = {}              # All variants (including nested)
        self.paths = Paths(self)            # Path management
        self.koji_downloader = KojiDownloadProxy.from_config(self.conf, self._logger)

Key Responsibilities:

  • Variant Management: Handles complex variant hierarchies (layered products, nested variants)
  • Path Coordination: Manages thousands of paths across multiple architectures and variants
  • Status Management: Tracks compose status (STARTED → FINISHED, FINISHED_INCOMPLETE, DOOMED, or TERMINATED)
  • Failure Tracking: Maintains detailed logs of failed deliverables

2. Phase System (phases/)

Each phase inherits from PhaseBase and implements a specific aspect of the compose:

class PhaseBase(object):
    def __init__(self, compose):
        self.compose = compose
        self.msg = "---------- PHASE: %s ----------" % self.name.upper()
        self.finished = False
        self._skipped = False

    def skip(self):
        # Skip when the phase is excluded on the command line or in the configuration
        if self.name in self.compose.skip_phases:
            return True
        if self.name in self.compose.conf["skip_phases"]:
            return True
        return False

    def start(self):
        self._skipped = self.skip()
        if self._skipped:
            self.compose.log_warning("[SKIP ] %s" % self.msg)
            self.finished = True
            return
        self._start_time = time.time()
        self.compose.log_info("[BEGIN] %s" % self.msg)
        self.compose.notifier.send("phase-start", phase_name=self.name)
        self.run()

3. Wrapper System (wrappers/)

Pungi delegates actual work to external tools through wrapper classes:

  • KojiWrapper: Manages Koji build system integration
  • CompsWrapper: Handles package group definitions
  • CreaterepoWrapper: Manages repository metadata creation
  • VariantsWrapper: Parses variant definitions

Data Flow Architecture

1. Koji-Pungi Data Flow

def download_packages_from_koji(self, package_list):
    """Download packages from Koji build system"""
    koji_wrapper = KojiWrapper(self.compose)
    
    for package in package_list:
        # Query Koji for package availability
        build_info = koji_wrapper.get_build_info(package)
        
        if build_info['state'] != 'COMPLETE':
            raise PackageNotReadyError(f"Package {package} not ready in Koji")
            
        # Download package from Koji
        koji_wrapper.download_rpm(build_info['rpm_id'], self.download_dir)

2. Pungi-Mock Data Flow

def execute_mock_build(self, build_spec):
    """Execute build using Mock environment"""
    # Generate Mock configuration
    mock_config = self._create_mock_config(build_spec)
    
    # Initialize Mock environment (-r/--root selects the Mock configuration)
    mock_init_cmd = ['mock', '-r', mock_config, '--init']
    subprocess.run(mock_init_cmd, check=True)
    
    # Execute build in Mock environment
    mock_build_cmd = ['mock', '-r', mock_config, '--rebuild', build_spec['srpm']]
    # check is left off so the caller can inspect result.returncode and the output
    result = subprocess.run(mock_build_cmd, capture_output=True)
    
    return result

3. Configuration Loading

def get_compose_info(conf, compose_type="production", compose_date=None, ...):
    ci = ComposeInfo()
    ci.release.name = conf["release_name"]
    ci.release.short = conf["release_short"]
    ci.release.version = conf["release_version"]
    ci.release.is_layered = True if conf.get("base_product_name", "") else False
    # ... more configuration processing

4. Package Resolution & Download

The gather phase is particularly sophisticated and integrates with Koji:

class GatherPhase(PhaseBase):
    def __init__(self, compose, pkgset_phase):
        PhaseBase.__init__(self, compose)  # sets self.compose, which is used below
        self.pkgset_phase = pkgset_phase
        self.manifest_file = self.compose.paths.compose.metadata("rpms.json")
        self.manifest = Rpms()
        # ... manifest setup

    def run(self):
        # Query Koji for package availability
        self._check_koji_package_status()
        
        # Download packages from Koji
        self._download_packages_from_koji()
        
        # Complex package gathering logic with multiple sources
        # Handles dependencies, exclusions, multilib, etc.
        
    def _check_koji_package_status(self):
        """Ensure all required packages are available in Koji"""
        koji_wrapper = KojiWrapper(self.compose)
        for package in self.required_packages:
            if not koji_wrapper.is_package_available(package):
                raise PackageNotReadyError(f"Package {package} not ready in Koji")

5. OSTree Integration

Pungi's OSTree phases demonstrate its role as an orchestrator:

class OSTreePhase(ConfigGuardedPhase):
    def run(self):
        # Enqueue OSTree builds for each variant/architecture
        for variant in self.compose.get_variants():
            for conf in self.get_config_block(variant):
                for arch in conf.get("arches", []) or variant.arches:
                    self._enqueue(variant, arch, conf)
        self.pool.start()

6. Mock Integration in Build Phases

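Pungi can use Mock for specific build operations. The class below is an illustrative sketch of that flow rather than an excerpt from the Pungi source tree: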

class MockBuildPhase(PhaseBase):
    def run(self):
        """Execute builds using Mock environments"""
        for variant in self.compose.get_variants():
            for arch in variant.arches:
                # Create Mock environment for this variant/arch
                mock_env = self._create_mock_environment(variant, arch)
                
                # Execute build in Mock environment
                result = self._execute_mock_build(mock_env, variant.build_spec)
                
                # Collect results
                self._collect_build_results(result, variant, arch)
                
    def _create_mock_environment(self, variant, arch):
        """Create isolated build environment using Mock"""
        mock_config = self._generate_mock_config(variant, arch)
        mock_cmd = ['mock', '-r', mock_config, '--init']  # -r/--root selects the Mock config
        
        subprocess.run(mock_cmd, check=True)
        return mock_config

Key Design Patterns & Philosophies

1. Orchestration Over Implementation

Pungi follows the "orchestrator pattern" - it coordinates rather than implements:

  • Package Building: Delegates to Koji build system
  • Image Creation: Delegates to Kiwi, osbuild, imagebuilder
  • OSTree Operations: Delegates to rpm-ostree tools
  • Repository Management: Delegates to createrepo, dnf

2. Configuration-Driven Architecture

Everything in Pungi is driven by configuration:

def get_config_block(self, variant, arch=None):
    """Find configuration block for given variant and arch"""
    if arch is not None:
        return util.get_arch_variant_data(
            self.compose.conf, self.name, arch, variant, keys=self.used_patterns
        )
    else:
        return util.get_variant_data(
            self.compose.conf, self.name, variant, keys=self.used_patterns
        )
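
To make the lookup concrete, here is a simplified stand-in for the kind of matching such a helper performs. It assumes options are written as a list of (variant regex, {arch: values}) pairs, with "*" matching any architecture. The name `lookup_arch_variant_data` is hypothetical and this is not the real `util` implementation.

```python
import re


def lookup_arch_variant_data(conf, key, arch, variant_uid):
    """Collect values from conf[key] whose variant regex and arch both match."""
    result = []
    for variant_regex, arch_map in conf.get(key, []):
        if not re.match(variant_regex, variant_uid):
            continue
        for conf_arch, values in arch_map.items():
            if conf_arch == "*" or conf_arch == arch:
                result.extend(values)
    return result


conf = {
    "failable_deliverables": [
        ("^.*$", {"*": ["live"]}),
        ("^Server$", {"x86_64": ["iso"], "s390x": ["buildinstall"]}),
    ]
}

print(lookup_arch_variant_data(conf, "failable_deliverables", "x86_64", "Server"))
print(lookup_arch_variant_data(conf, "failable_deliverables", "aarch64", "Workstation"))
```

Because every matching block contributes, a broad default and a narrow override can coexist in the same option.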

3. Variant-Centric Design

Pungi's architecture revolves around variants - different flavors of the same release:

  • Base Variants: Core system variants (Server, Workstation, etc.)
  • Layered Products: Products built on top of base variants
  • Nested Variants: Sub-variants within main variants
  • Architecture Variants: Different CPU architectures
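
One way to picture the variant hierarchy is a small tree that can be flattened into a single lookup table, mirroring the split between top-level variants and all variants. The `Variant` class and the "Parent-Child" UID scheme below are assumptions for illustration, not Pungi's own data model.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Variant:
    name: str
    arches: list
    parent: "Variant" = field(default=None, repr=False)
    children: dict = field(default_factory=dict)

    @property
    def uid(self):
        # Nested variants get a UID prefixed with their parent's UID.
        return "%s-%s" % (self.parent.uid, self.name) if self.parent else self.name

    def add_child(self, child):
        child.parent = self
        self.children[child.name] = child
        return child


def flatten(variants):
    """Map every variant UID, top-level or nested, to its Variant object."""
    result = {}
    for variant in variants.values():
        result[variant.uid] = variant
        result.update(flatten(variant.children))
    return result


server = Variant("Server", ["x86_64", "aarch64"])
server.add_child(Variant("HighAvailability", ["x86_64"]))
top_level = {"Server": server, "Workstation": Variant("Workstation", ["x86_64"])}
all_variants = flatten(top_level)
print(sorted(all_variants))
```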

4. Failure Resilience

Pungi lets individual deliverables fail without aborting the whole compose, provided the configuration marks them as failable:

def can_fail(self, variant, arch, deliverable):
    """Figure out if deliverable can fail on variant.arch"""
    failable = get_arch_variant_data(
        self.conf, "failable_deliverables", arch, variant
    )
    return deliverable in failable

def fail_deliverable(self, variant, arch, kind, subvariant=None):
    """Log information about failed deliverable"""
    variant_uid = variant.uid if variant else ""
    self.failed_deliverables.setdefault(kind, []).append(
        (variant_uid, arch, subvariant)
    )

Advanced Features

1. Multi-Architecture Support

Pungi handles complex multi-arch scenarios:

  • Primary Architectures: x86_64, aarch64, ppc64le, s390x
  • Architecture-Specific Variants: Different package sets per architecture
  • Cross-Architecture Dependencies: Managing dependencies across architectures

2. Layered Product Support

Pungi can handle complex product layering:

def _prepare_variant_as_lookaside(compose):
    """Handle variant dependencies for layered products"""
    variant_as_lookaside = compose.conf.get("variant_as_lookaside", [])
    graph = SimpleAcyclicOrientedGraph()
    for variant, lookaside_variant in variant_as_lookaside:
        graph.add_edge(variant, lookaside_variant)
    variant_processing_order = reversed(graph.prune_graph())
    return list(variant_processing_order)
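
The same ordering idea can be reproduced with the standard library. The sketch below swaps Pungi's `SimpleAcyclicOrientedGraph` for Python's `graphlib.TopologicalSorter` (Python 3.9+), so it illustrates the technique and is not the original implementation. The function name is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def variant_processing_order(variant_as_lookaside):
    """Order variants so each lookaside variant comes before its dependents."""
    sorter = TopologicalSorter()
    for variant, lookaside_variant in variant_as_lookaside:
        # 'variant' needs 'lookaside_variant' to be finished first.
        sorter.add(variant, lookaside_variant)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("Bad 'variant_as_lookaside' configuration: %s" % (exc,))


order = variant_processing_order([("Everything", "Server"), ("Server", "BaseOS")])
print(order)
```

A cycle in the configuration (A uses B as lookaside while B uses A) surfaces as a `ValueError` rather than an endless loop.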

3. Parallel Execution

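Pungi fans work out across architectures and variants while capping the number of concurrent jobs. An illustrative sketch of that pattern follows (it is not an excerpt from the Pungi source; build_for_architecture and max_parallel_builds are hypothetical names):

from concurrent.futures import ThreadPoolExecutor

def parallel_build(treefile, max_parallel_builds):
    """Build every architecture, running at most max_parallel_builds at once."""
    with ThreadPoolExecutor(max_workers=max_parallel_builds) as pool:
        futures = [
            pool.submit(build_for_architecture, treefile, arch)
            for arch in treefile.architectures
        ]
        # result() blocks until the job is done and re-raises any worker error
        for future in futures:
            future.result()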

4. Caching & Optimization

Pungi implements multiple caching layers:

  • Package Cache: Reuses downloaded packages between builds
  • Build Environment Cache: Reuses build environments when possible
  • Metadata Cache: Caches repository metadata
  • Dogpile Cache: Distributed caching for large deployments
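
A package cache of the kind listed above could look roughly like the sketch below: files are stored once under a key derived from their URL and hardlinked into each compose tree. The function name, the URL-hash key, and the caller-supplied `download` callback are all assumptions for illustration.

```python
import hashlib
import os
import shutil


def cached_fetch(url, cache_dir, dest_dir, download):
    """Place the file for url in dest_dir, downloading only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(dest_dir, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cached = os.path.join(cache_dir, key)
    if not os.path.exists(cached):
        download(url, cached)  # caller-supplied function: (url, target_path)
    dest = os.path.join(dest_dir, os.path.basename(url))
    if os.path.exists(dest):
        return dest
    try:
        os.link(cached, dest)  # hardlink avoids a second copy on the same filesystem
    except OSError:
        shutil.copy2(cached, dest)  # fall back when hardlinks are not possible
    return dest
```

Passing the downloader in as a parameter keeps the cache logic independent of any particular transport.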

Integration Points

The Pungi-Koji-Mock Workflow

Pungi orchestrates a sophisticated workflow that integrates Koji and Mock at different stages:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Koji        │    │     Pungi       │    │      Mock       │
│   Build System  │    │   Orchestrator  │    │ Build Environment│
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │ 1. Build Packages     │                       │
         │──────────────────────▶│                       │
         │                       │                       │
         │ 2. Package Available  │                       │
         │◀──────────────────────│                       │
         │                       │                       │
         │                       │ 3. Create Environment │
         │                       │──────────────────────▶│
         │                       │                       │
         │                       │ 4. Environment Ready  │
         │                       │◀──────────────────────│
         │                       │                       │
         │                       │ 5. Execute Build      │
         │                       │──────────────────────▶│
         │                       │                       │
         │                       │ 6. Build Complete     │
         │                       │◀──────────────────────│
         │                       │                       │
         │ 7. Compose Complete   │                       │
         │◀──────────────────────│                       │

Workflow Stages:

  1. Koji Builds Packages: Koji builds individual RPM packages in isolated environments
  2. Pungi Coordinates: Pungi waits for all required packages to be available in Koji
  3. Mock Creates Environment: Pungi uses Mock to create isolated build environments when needed
  4. Build Execution: Pungi executes build commands within Mock-managed environments
  5. Result Collection: Pungi collects build results and creates final compose artifacts

1. Koji Build System Integration

Pungi has deep integration with Fedora's Koji build system:

class KojiWrapper(object):
    def __init__(self, compose):
        self.compose = compose
        self.profile = self.compose.conf["koji_profile"]
        self.koji_module = koji.get_profile_module(self.profile)
        # Session options (timeouts and similar) come from the profile; elided here
        session_opts = {}
        self.koji_proxy = koji.ClientSession(self.koji_module.config.server, session_opts)

    def download_rpms(self, rpms, dest_dir, arch=None):
        """Download RPMs from Koji"""
        # Complex download logic with authentication and retry

How Pungi Uses Koji:

  • Package Source: Pungi downloads pre-built RPM packages from Koji's build system
  • Build Coordination: Pungi coordinates with Koji to ensure package availability before compose
  • Metadata Integration: Pungi uses Koji's build metadata for package versioning and dependencies
  • Build Status: Pungi tracks Koji build status to ensure all required packages are available

2. Mock Build Environment Integration

Pungi can use Mock for creating isolated build environments when needed:

# Pungi can invoke Mock for creating chroot environments
def create_build_environment(self, variant, arch):
    """Create isolated build environment using Mock"""
    mock_config = self._generate_mock_config(variant, arch)
    mock_cmd = ['mock', '-r', mock_config, '--init']
    
    # Execute Mock to create chroot; text=True decodes stderr for the error message
    result = subprocess.run(mock_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise BuildError(f"Mock environment creation failed: {result.stderr}")
    
    return mock_config

How Pungi Uses Mock:

  • Build Environment Creation: Pungi can use Mock to create isolated chroot environments for specific builds
  • Package Installation: Pungi uses Mock's package management capabilities for installing build dependencies
  • Environment Isolation: Pungi leverages Mock's chroot isolation for reproducible builds
  • Build Execution: Pungi can execute build commands within Mock-managed environments

Integration Mechanisms:

  • Command Invocation: Pungi invokes Mock as a subprocess for environment management
  • Configuration Sharing: Pungi generates Mock configuration files based on compose requirements
  • Result Collection: Pungi collects build results from Mock-managed environments
  • Cleanup Coordination: Pungi coordinates cleanup of Mock environments after builds

3. OSTree/Bootc Integration

Pungi orchestrates bootc image creation:

class OSTreeContainerPhase(ConfigGuardedPhase):
    def worker(self, compose, variant, arch, config):
        # Clone configuration repository
        self._clone_repo(compose, repodir, config["config_url"], config.get("config_branch", "main"))
        
        # Execute rpm-ostree container encapsulate
        # Generate container metadata
        # Handle signing and distribution

4. Repository Management

Pungi coordinates with multiple repository types:

  • Package Repositories: RPM repositories with metadata
  • Module Repositories: Modularity metadata
  • Comps Repositories: Package group definitions
  • Lookaside Repositories: Additional package sources

Performance Characteristics

1. Scalability

Pungi is designed for large-scale operations:

  • Parallel Builds: Supports hundreds of concurrent builds
  • Distributed Execution: Can distribute work across multiple hosts
  • Resource Management: Implements sophisticated resource controls
  • Caching: Multiple layers of caching for performance

2. Resource Usage

Pungi manages resources carefully:

  • Memory Management: Controlled memory usage during builds
  • Disk Space: Monitors and manages disk space usage
  • Network Throttling: Controls bandwidth for package downloads
  • CPU Limits: Manages CPU usage for parallel operations

3. Monitoring & Observability

The compose status is written at each state change, and failed deliverables are logged alongside it:

def write_status(self, stat_msg):
    """Write compose status with comprehensive logging"""
    if stat_msg not in ("STARTED", "FINISHED", "DOOMED", "TERMINATED"):
        self.log_warning("Writing nonstandard compose status: %s" % stat_msg)
    
    if stat_msg == "FINISHED" and self.failed_deliverables:
        stat_msg = "FINISHED_INCOMPLETE"
    
    self._log_failed_deliverables()
    # ... status writing and notification
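
The status handling can be reduced to two small steps: decide the final status, then persist it. The sketch below assumes the status lives in a one-line STATUS file in the compose's top directory; the function names are hypothetical.

```python
import os
import tempfile


def final_status(requested, failed_deliverables):
    """Downgrade FINISHED to FINISHED_INCOMPLETE when anything failed."""
    if requested == "FINISHED" and failed_deliverables:
        return "FINISHED_INCOMPLETE"
    return requested


def write_status_file(topdir, status):
    """Write the status atomically so readers never see a partial file."""
    fd, tmp_path = tempfile.mkstemp(dir=topdir, prefix=".STATUS.")
    with os.fdopen(fd, "w") as handle:
        handle.write(status + "\n")
    path = os.path.join(topdir, "STATUS")
    os.replace(tmp_path, path)  # atomic rename within the same directory
    return path
```

Writing to a temporary file and renaming it means a monitoring job polling the file sees either the old status or the new one, never a half-written line.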

Comparison with deb-compose Vision

Similarities

  • Phase-based architecture: Both use ordered phases for orchestration
  • Configuration-driven: Both rely heavily on configuration files
  • Multi-artifact output: Both generate multiple output formats
  • OSTree integration: Both integrate with OSTree for bootc images

Key Differences

  • Package Management: Pungi uses RPM/Koji, deb-compose uses DEB/sbuild
  • Build System: Pungi delegates to Koji, deb-compose manages sbuild directly
  • Complexity: Pungi is more mature with 20+ phases, deb-compose is simpler
  • Integration: Pungi has deeper integration with Fedora infrastructure

Lessons for deb-compose

1. Architecture Strengths to Emulate

  • Phase-based orchestration: Clear separation of concerns
  • Failure resilience: Sophisticated error handling and recovery
  • Configuration flexibility: Rich configuration system
  • Parallel execution: Efficient resource utilization

2. Complexity to Avoid Initially

  • Layered products: Start with simple variants
  • Complex dependency graphs: Begin with linear phase dependencies
  • Multiple build backends: Focus on one build system initially
  • Advanced caching: Implement basic caching first

3. Implementation Priorities

  • Core orchestration: Focus on phase management
  • Basic OSTree integration: Simple bootc image creation
  • Error handling: Basic failure recovery
  • Configuration system: Simple but flexible configuration

Technical Implementation Details

Entry Point Architecture

Pungi's main entry point (pungi_koji.py) demonstrates its orchestration approach:

def main():
    parser = argparse.ArgumentParser()
    # ... argument parsing
    
    # Load configuration and create compose
    conf = kobo.conf.PyConfigParser()
    conf.load_from_file(args.config)
    
    # Create compose directory and initialize
    compose_dir = get_compose_dir(args.target_dir, conf, ...)
    compose = Compose(conf, topdir=compose_dir)
    
    # Execute phases in order
    for phase_name in PHASES_NAMES:
        if phase_name in args.skip_phase:
            continue
        phase = get_phase(phase_name, compose)
        phase.start()

Path Management System

Pungi centralizes path construction in a dedicated helper class, so phases never assemble directory names by hand:

class Paths(object):
    def __init__(self, compose):
        self.compose = compose
        self.topdir = compose.topdir
        
    def work(self, arch=None, variant=None):
        """Get work directory paths"""
        arch = arch or "global"  # no arch means the shared "global" tree
        path = os.path.join(self.topdir, "work", arch)
        if variant and arch != "global":
            path = os.path.join(path, variant.uid)
        return path

Error Handling Patterns

Pungi wraps optional deliverables in a context manager, so that an allowed failure is recorded instead of aborting the compose:

import contextlib

@contextlib.contextmanager
def failable(compose, can_fail, variant, arch, deliverable):
    """Context manager for failable deliverables"""
    try:
        yield
    except Exception as e:
        if can_fail:
            # Record the failure and keep the compose going
            compose.fail_deliverable(variant, arch, deliverable)
            compose.log_warning("Failed %s on %s/%s: %s" % (deliverable, variant, arch, e))
        else:
            raise
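
To see the pattern end to end, here is a self-contained sketch with a stand-in compose object. `DemoCompose` and its simplified method signatures are hypothetical; only the overall shape (record the failure if allowed, otherwise re-raise) follows the text above.

```python
import contextlib


class DemoCompose:
    """Hypothetical stand-in that records failures instead of running a compose."""

    def __init__(self, failable_deliverables):
        self.failable_deliverables = set(failable_deliverables)
        self.failed_deliverables = {}

    def can_fail(self, deliverable):
        return deliverable in self.failable_deliverables

    def fail_deliverable(self, variant, arch, kind):
        self.failed_deliverables.setdefault(kind, []).append((variant, arch))


@contextlib.contextmanager
def failable(compose, variant, arch, deliverable):
    """Swallow errors only for deliverables that are allowed to fail."""
    try:
        yield
    except Exception:
        if not compose.can_fail(deliverable):
            raise
        compose.fail_deliverable(variant, arch, deliverable)


compose = DemoCompose(failable_deliverables=["live"])
with failable(compose, "Workstation", "x86_64", "live"):
    raise RuntimeError("image build failed")
print(compose.failed_deliverables)
```

A deliverable that is not listed as failable still propagates its exception, which is what stops the compose for blocking artifacts.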

Production Readiness Features

1. Authentication & Security

  • Kerberos Integration: Enterprise-grade authentication
  • OIDC Support: Modern identity provider integration
  • Certificate Management: SSL/TLS certificate handling
  • Access Control: Granular permissions per variant/architecture

2. Monitoring & Alerting

  • Status Tracking: Real-time compose status monitoring
  • Progress Reporting: Detailed progress through phases
  • Failure Notification: Immediate alerts on failures
  • Metrics Collection: Performance and resource usage metrics

3. Recovery & Resilience

  • Partial Success Handling: Continues with successful variants
  • Retry Mechanisms: Automatic retry for transient failures
  • State Persistence: Maintains state across restarts
  • Cleanup Procedures: Automatic cleanup of failed builds
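
Automatic retry for transient failures is commonly written as a decorator with exponential backoff. The sketch below is a generic version under that assumption; its name and parameters are hypothetical and are not taken from Pungi.

```python
import functools
import time


def retry(attempts=3, delay=1.0, backoff=2.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, multiplying the delay after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff

        return wrapper

    return decorator
```

The `sleep` parameter exists so tests can substitute a recorder for the real delay; production callers would leave it at the default.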

Conclusion

Pungi represents a mature, production-grade orchestration system that has evolved over years to handle Fedora's complex release process. Its key insight is that orchestration is more valuable than implementation - by coordinating existing tools rather than rebuilding functionality, it achieves reliability and maintainability.

The Complete Integration Picture

Pungi's success comes from its ability to orchestrate three complementary systems:

  1. Koji: Provides the package foundation - building individual RPM packages in isolated environments
  2. Mock: Provides the build environment - creating isolated chroots for specific build operations
  3. Pungi: Provides the orchestration layer - coordinating the entire compose process

Why This Architecture Works:

  • Separation of Concerns: Each system has a focused responsibility
  • Leverage Existing Tools: Pungi doesn't rebuild what Koji and Mock already do well
  • Flexible Integration: Pungi can use Mock when needed, but doesn't require it for all operations
  • Scalable Coordination: Pungi can coordinate complex workflows across multiple systems

The Orchestration Advantage:

  • Koji handles package building at scale across multiple architectures
  • Mock handles environment isolation when specific build environments are needed
  • Pungi handles release coordination ensuring all pieces fit together correctly

This architecture allows Fedora to maintain massive scale while keeping individual components focused and maintainable.

For deb-compose, the lesson is clear: focus on being an excellent orchestrator rather than trying to implement everything. Pungi's success comes from its ability to coordinate complex workflows while delegating actual work to specialized tools.

The roadmap's approach of building incrementally with clear phases aligns well with Pungi's proven architecture. By starting with core orchestration and gradually adding complexity, deb-compose can achieve similar reliability without the initial complexity that Pungi has accumulated over years of production use.

Key Takeaways for deb-compose Development

  1. Start Simple: Begin with basic phase orchestration rather than complex features
  2. Delegate Wisely: Focus on coordination, not implementation
  3. Fail Gracefully: Implement basic error handling from the start
  4. Grow Incrementally: Add complexity only when needed
  5. Learn from Pungi: Study Pungi's patterns but avoid its complexity initially

This analysis provides a solid foundation for understanding how to build a successful compose orchestration system while avoiding the pitfalls of over-engineering early in development.