tumbi-assembler/pungi/phases
Adam Williamson c8fe99b1aa pkgset: optimize cache check (saves 20 minutes)
The pkgset phase takes around 35 minutes in current composes.
Around 20 minutes of that is spent creating these per-arch
subsets of the global package set. In a rather roundabout way
(see #1794), I figured out that almost all of this time is
spent in this cache check, which is broken for a subtle reason.

Python's `in` operator works by first attempting to call the
container's magic `__contains__` method. If the container does
not implement `__contains__`, it falls back to iteration: it
iterates over the container until it either hits what it's
looking for or runs out. (If the container supports neither,
you get a `TypeError`.)
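
As a minimal sketch of that lookup order (the three classes here are hypothetical and exist only for illustration, they are not part of pungi):

```python
class WithContains:
    """Answers `in` directly through __contains__."""

    def __init__(self, items):
        self._items = set(items)

    def __contains__(self, item):
        return item in self._items


class IterOnly:
    """Has __iter__ but no __contains__, so `in` must walk the items."""

    def __init__(self, items):
        self._items = list(items)
        self.steps = 0  # how many items `in` had to look at

    def __iter__(self):
        for item in self._items:
            self.steps += 1
            yield item


class Neither:
    """Supports neither membership tests nor iteration."""


fast = WithContains(range(1000))
slow = IterOnly(range(1000))

assert 999 in fast
assert 999 in slow          # found, but only after walking all 1000 items
assert slow.steps == 1000

try:
    1 in Neither()
except TypeError as exc:
    print("TypeError:", exc)
```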

The FileCache instance's `file_cache` is a plain Python dict.
Dicts have a very efficient hash-based `__contains__`, so
`foo in (somedict)` is fast on average no matter how large the dict
is. FileCache itself, though, implements `__iter__` by returning an
iterator over the `file_cache` dict's keys, but it does *not*
implement `__contains__`. So when we do `foo in self.file_cache`
(where `self.file_cache` is the FileCache instance), Python has to
iterate over every key in the dict until it hits foo or runs out.
This is massively slower than `foo in self.file_cache.file_cache`,
which uses the dict's efficient `__contains__` method.
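
The difference can be reproduced with a stand-in class. `FileCacheSketch` below is an assumption: it copies only the two traits described above (a `file_cache` dict and an `__iter__` over its keys) and is not the real FileCache. The `keys_visited` counter and the paths are invented for the illustration.

```python
class FileCacheSketch:
    """Hypothetical stand-in for FileCache: only the two traits described above."""

    def __init__(self):
        self.file_cache = {}   # plain dict, as in the description
        self.keys_visited = 0  # instrumentation added for this sketch only

    def __iter__(self):
        # Iterate over the dict's keys; there is deliberately no __contains__.
        for key in self.file_cache:
            self.keys_visited += 1
            yield key


cache = FileCacheSketch()
for i in range(100_000):
    cache.file_cache[f"/path/pkg-{i}.rpm"] = object()

missing = "/path/not-there.rpm"

# Membership test on the wrapper: falls back to iteration, visits every key.
assert missing not in cache
assert cache.keys_visited == 100_000

# Membership test on the dict itself: a hash lookup, no iteration at all.
cache.keys_visited = 0
assert missing not in cache.file_cache
assert cache.keys_visited == 0
```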

Because these package sets are so huge, and we're looping over
*one* huge set and checking each package from it against the cache
of another, increasingly huge, set, this effect becomes massive.
To make it even worse, I ran a few tests where I added a debug log
if we ever hit the cache, and it looks like we never actually do -
so every check has to iterate through the entire dict.
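
A rough way to see how this adds up, assuming (as a simplification) that every check misses and that the cache then grows by exactly one entry per package. The package counts are illustrative, not measured from a real compose:

```python
def key_visits(n_packages):
    """Count keys visited when every check misses and the cache grows by one."""
    visits = 0
    cache_size = 0
    for _ in range(n_packages):
        visits += cache_size  # a miss scans the whole cache
        cache_size += 1       # the package is then added to the cache
    return visits


for n in (1_000, 10_000, 100_000):
    # The sum 0 + 1 + ... + (n - 1) grows quadratically with n.
    assert key_visits(n) == n * (n - 1) // 2
    print(f"{n:>7} packages: {key_visits(n):>13,} key visits vs {n:,} dict lookups")
```

Under those assumptions the iterating check does quadratic work overall, while checking the dict directly stays linear in the number of packages.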

We could probably remove this entirely, but changing it to check
the dict instead of the FileCache instance makes it just about as
fast as taking it out, so I figured let's go with that in case
there's some unusual scenario in which the cache does work here.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-11-19 00:22:09 -08:00
gather gather: Skip lookaside packages from local lookaside repo 2024-06-05 09:45:02 +02:00
pkgset pkgset: optimize cache check (saves 20 minutes) 2024-11-19 00:22:09 -08:00
__init__.py Remove live_images.py (LiveImagesPhase) 2024-08-05 10:55:08 +00:00
base.py Log time taken of each phase 2022-07-12 16:56:41 +08:00
buildinstall.py iso: Extract volume id with xorriso if available 2024-04-23 09:13:43 +02:00
createiso.py createiso: Block reuse if unsigned packages are allowed 2024-08-14 11:09:37 +02:00
createrepo.py Fix module defaults and obsoletes validation 2022-06-10 11:35:26 +00:00
extra_files.py Format code base with black 2020-02-05 17:35:47 +08:00
extra_isos.py createiso: Block reuse if unsigned packages are allowed 2024-08-14 11:09:37 +02:00
image_build.py image_build: drop .tar.gz as an expected extension for docker 2024-09-19 23:03:58 -07:00
image_checksum.py Adding multithreading support for pungi/phases/image_checksum.py 2021-08-12 10:13:15 +02:00
image_container.py Various phases: consistent format of failure message 2024-03-13 12:17:11 +00:00
init.py init: Filter comps for modular variants with tags 2022-11-03 11:11:01 +01:00
kiwibuild.py move osbuild/kiwi-specific EXTENSIONS to each phase 2024-09-19 23:03:15 -07:00
livemedia_phase.py Add ability to download images 2023-08-23 07:26:56 +00:00
osbs.py Include task ID in DONE message for OSBS phase 2024-03-13 12:17:11 +00:00
osbuild.py move osbuild/kiwi-specific EXTENSIONS to each phase 2024-09-19 23:03:15 -07:00
ostree.py checks: don't require "repo" in the "ostree" schema 2024-01-19 08:25:09 +01:00
ostree_container.py ostree_container: make filename configurable, include arch 2024-10-09 16:33:48 -07:00
ostree_installer.py Add skip_branding to ostree_installer. 2022-05-11 15:19:53 +02:00
phases_metadata.py Format code base with black 2020-02-05 17:35:47 +08:00
repoclosure.py Delete outdated comments 2020-04-22 17:14:51 +08:00
test.py Stop trying to validate non-existent metadata 2021-11-04 09:57:20 +01:00
weaver.py Format code base with black 2020-02-05 17:35:47 +08:00