Replace the dnf-json `Hash()` function with a hash calculated by
the `rpmmd.RepoConfig.Hash()` function. The `repoHash` field is
populated when converting a `rpmmd.RepoConfig` to a
`dnfjson.repoConfig` object. The `dnfjson.repoConfig.Hash()` function
then returns the `repoHash` field instead of re-calculating the hash.
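A minimal sketch of the resulting shape (field names beyond `repoHash`
are illustrative):

```go
// dnfjson-side repo config; repoHash is filled in during conversion
// from rpmmd.RepoConfig, so it never needs to be recalculated here.
type repoConfig struct {
	ID       string   `json:"id"`
	BaseURLs []string `json:"baseurls,omitempty"`
	// ... remaining serialized fields elided ...
	repoHash string // set from rpmmd.RepoConfig.Hash(), not serialized
}

// Hash returns the pre-computed hash instead of deriving it again.
func (r *repoConfig) Hash() string {
	return r.repoHash
}
```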
Convert some of the fields in the `RepoConfig` struct
to pointers. Since `RepoConfig` will be used to convert
custom repositories to an array of `osbuild.YumRepository`,
we need to ensure that fields that are not set explicitly
are not saved to the `/etc/yum.repos.d` repository files.
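For example (a reduced sketch; the actual field set is larger):

```go
// Optional fields become pointers so "unset" (nil) is distinguishable
// from a zero value; combined with omitempty, unset fields never end
// up in the generated repository files.
type RepoConfig struct {
	Name      string   `json:"name,omitempty"`
	BaseURLs  []string `json:"baseurls,omitempty"`
	CheckGPG  *bool    `json:"check_gpg,omitempty"` // nil: omit from the .repo file
	IgnoreSSL *bool    `json:"ignoressl,omitempty"` // nil: omit from the .repo file
}
```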
Update the internal RepoConfig object to
accept a slice of baseurls rather than a
single baseurl field. This change was needed
to align RepoConfig with the dnf spec [1].
Additionally, this change adds custom json
marshal and unmarshal functions to ensure
backwards compatibility with older workers.
Add json tags to the internal rpmmd config
since this is serialized in dnfjson.
Add unit tests to check the serialization
is okay.
[1] See dnf.config
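A sketch of the compatibility shim, reduced to the URL fields (struct
and method placement are illustrative):

```go
import "encoding/json"

// Wire format carries both the legacy single "baseurl" and the new
// "baseurls" list, so old and new workers can exchange jobs.
type repoConfigJSON struct {
	BaseURL  string   `json:"baseurl,omitempty"`  // legacy, single URL
	BaseURLs []string `json:"baseurls,omitempty"` // current
}

func (r RepoConfig) MarshalJSON() ([]byte, error) {
	v := repoConfigJSON{BaseURLs: r.BaseURLs}
	if len(r.BaseURLs) > 0 {
		v.BaseURL = r.BaseURLs[0] // keep older workers working
	}
	return json.Marshal(v)
}

func (r *RepoConfig) UnmarshalJSON(data []byte) error {
	var v repoConfigJSON
	if err := json.Unmarshal(data, &v); err != nil {
		return err
	}
	r.BaseURLs = v.BaseURLs
	if len(r.BaseURLs) == 0 && v.BaseURL != "" {
		r.BaseURLs = []string{v.BaseURL} // accept legacy payloads
	}
	return nil
}
```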
This adds a function, CleanupOldCacheDirs, that checks the dirs under
/var/cache/osbuild-composer/rpmmd/ and removes files and directories
that don't match the current list of supported distros.
This will clean up the cache from old releases as they are retired, and
will also clean up the old top-level cache directory structure after an
upgrade.
NOTE: This function does not return errors, any real problems it
encounters will also be caught by the cache initialization code and
handled there.
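Roughly (the real signature and distro matching may differ):

```go
import (
	"os"
	"path/filepath"
)

// CleanupOldCacheDirs removes cache entries under root that don't
// belong to any currently supported distro. Errors are deliberately
// swallowed; anything serious resurfaces in cache initialization.
func CleanupOldCacheDirs(root string, distros []string) {
	supported := make(map[string]bool, len(distros))
	for _, d := range distros {
		supported[d] = true
	}
	entries, err := os.ReadDir(root)
	if err != nil {
		return
	}
	for _, e := range entries {
		if !supported[e.Name()] {
			_ = os.RemoveAll(filepath.Join(root, e.Name()))
		}
	}
}
```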
This causes dnf-json to use separate caches per distribution, allowing
depsolves for different distributions to run in parallel, with one lock
per distribution. Multiple depsolves for the same distribution in the
blueprint will continue to run serially.
This allows verification of repository metadata signatures.
The gpgkeys field is a list of key URLs, or the GPG key itself, starting
with '-----BEGIN PGP PUBLIC KEY BLOCK-----'. Inline keys are written to
a temporary file, and that file:// URL is passed to dnf.
DNF supports more than one GPG key: one may be used for signing packages
and another for signing the repository metadata. This renames GPGKey to
GPGKeys internally. It does not change the on-disk
repository json format.
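A sketch of how an inline key can be handed to dnf (helper name is
illustrative):

```go
import (
	"os"
	"strings"
)

// keyPath returns something dnf can consume: URLs pass through, inline
// keys are written to a temporary file and referenced via file://.
func keyPath(key string) (string, error) {
	if !strings.HasPrefix(key, "-----BEGIN PGP PUBLIC KEY BLOCK-----") {
		return key, nil // already a URL
	}
	f, err := os.CreateTemp("", "gpgkey-*.asc")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(key); err != nil {
		return "", err
	}
	return "file://" + f.Name(), nil
}
```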
This is used to cache the results of dump and search requests for 60s.
Once the timeout has passed the request is repeated and the timeout
reset. The timeout is *not* reset on every cache hit, which prevents,
for example, a request every 59 seconds from keeping the cache from
updating.
When the existing CleanCache() function is called to check the on-disk
metadata cache it will also delete any expired entries from the
resultCache in order to keep it from eventually consuming all memory.
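A minimal sketch of such a cache, assuming responses are keyed by a
request hash:

```go
import (
	"sync"
	"time"
)

// resultCache holds one cached dnf-json response per request hash. The
// expiry is fixed when an entry is stored and never extended on a hit,
// so frequent identical requests can't keep an entry alive forever.
type resultCache struct {
	mu      sync.Mutex
	ttl     time.Duration // e.g. 60 * time.Second
	entries map[string]cacheEntry
}

type cacheEntry struct {
	result  []byte
	expires time.Time
}

func (c *resultCache) get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || time.Now().After(e.expires) {
		return nil, false // miss or expired: caller repeats the request
	}
	return e.result, true
}

func (c *resultCache) store(key string, result []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cacheEntry{result: result, expires: time.Now().Add(c.ttl)}
}
```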
After a depsolve, each package inherits the `IgnoreSSL` value from its
repository configuration.
This information is not yet used; it will eventually be exposed to
osbuild's org.osbuild.curl stage.
The test data is updated to match the new behaviour:
The test repository config specifies `IgnoreSSL=true` and the packages
in the response inherit the value.
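The inheritance itself is a simple copy per package; a sketch, assuming
each package records the ID of the repository it was resolved from:

```go
// Each package takes the flag from the repository it was resolved
// from; pkg.RepoID and the repos map are illustrative names.
for i, pkg := range pkgs {
	pkgs[i].IgnoreSSL = repos[pkg.RepoID].IgnoreSSL
}
```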
The BaseSolver is an object that gets constructed when the worker
starts, and the subscriptions attached to it expire after about 3
days. Refreshing the subscriptions each time a new Solver is created
ensures that valid subscriptions are always used.
If dnf-json returns an error that is related to a repository, it uses
the ID to identify the repository that caused the error. Since IDs
can't easily be mapped back to a configuration, appending the URL and
name (if any) to the error message makes it easier to identify which
repository failed.
Keeping the ID in the message is also useful for finding the cache
directory of the repository if needed.
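For example (a hypothetical helper; field names follow the sketches
above):

```go
import "fmt"

// wrapRepoError keeps the repository ID in the message (handy for
// locating its cache directory) and adds the name and URL for humans.
func wrapRepoError(err error, repo rpmmd.RepoConfig) error {
	msg := fmt.Sprintf("repository %s", repo.Hash())
	if repo.Name != "" {
		msg += fmt.Sprintf(" (%s)", repo.Name)
	}
	if len(repo.BaseURLs) > 0 {
		msg += " at " + repo.BaseURLs[0]
	}
	return fmt.Errorf("%s: %w", msg, err)
}
```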
Apply an RWMutex lock to a cache directory.
A global map of cache locks is maintained, keyed by the absolute path to
the cache directory, so multiple cache instances can coexist and share
locks if they use the same cache root.
Currently, the lock only prevents multiple concurrent `shrink()`
operations when multiple cache instances share the same root.
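A sketch of the lock bookkeeping:

```go
import (
	"path/filepath"
	"sync"
)

var (
	cacheLocksMu sync.Mutex
	cacheLocks   = map[string]*sync.RWMutex{} // keyed by absolute cache root
)

// lockForDir returns the shared lock for a cache directory, creating
// it on first use; instances pointing at the same root share one lock.
func lockForDir(dir string) *sync.RWMutex {
	abs, err := filepath.Abs(dir)
	if err != nil {
		abs = dir // fall back to the raw path
	}
	cacheLocksMu.Lock()
	defer cacheLocksMu.Unlock()
	l, ok := cacheLocks[abs]
	if !ok {
		l = &sync.RWMutex{}
		cacheLocks[abs] = l
	}
	return l
}
```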
- Update timestamps for cache elements whenever a repository is used.
- Call the new `shrink()` function instead of the old `clean()`.
- Remove the old `clean()` function.
If the repoRecency and repoElements somehow become inconsistent (an ID
in repoRecency does not exist in repoElements), ignore and continue.
The stale repoID is still counted in nDeleted, so it gets removed from
the repoRecency list at the end.
Add functions for repository cache management based on a maximum
desirable size for the entire dnf-json cache directory.
While none of the functions are currently used, the workflow should
be as follows:
- Update the timestamp of a repository whenever it's used in a
transaction by calling `touchRepo()` with the repository ID and the
current time.
- Update the internal cache information when desired by calling
`updateInfo()`. This should be called for example after multiple
depsolve transactions are run for a single build request.
- Shrink the cache to below the configured maxSize by calling
`shrink()`.
The most important work happens in `updateInfo()`. It collects all the
information it needs from the on-disk cache directories and organises it
in a way that makes it convenient for the `shrink()` function to run
efficiently. It stores three important pieces of information:
1. repoElements: a map that links a repository ID with all the
information about a repository's cache:
- the top-level elements (files and directories) for the cache
- size of the repository cache (total of all elements)
- most recent mtime from all the elements, which, if the
`touchRepo()` call is consistently used, should reflect the most
recent time the repository was used
2. repoRecency: a list of repository IDs sorted by mtime (oldest first)
3. size: the total size of the cache (total of all repository caches)
This way, when `shrink()` is called, the paths associated with the
least-recently-used repositories can be easily deleted by iterating on
repoRecency, obtaining the repository info from the map, deleting every
path in the repoElements array, and subtracting the repository's size
from the total. The `shrink()` function stops when the new size is
below the maxSize (or when all repositories have been deleted).
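Putting it together, a sketch of `shrink()` (reusing the lockForDir
sketch above; the cache struct and repoInfo field shapes are assumed),
including the stale-ID handling described earlier:

```go
import "os"

// shrink deletes least-recently-used repo caches until the total size
// drops below maxSize. Assumed shapes: repoRecency []string (oldest
// first), repoElements map[string]repoInfo with paths and size.
func (c *rpmCache) shrink() {
	l := lockForDir(c.root)
	l.Lock()
	defer l.Unlock()
	nDeleted := 0
	for _, repoID := range c.repoRecency {
		if c.size <= c.maxSize {
			break
		}
		nDeleted++ // counted even if the ID turns out to be stale
		info, ok := c.repoElements[repoID]
		if !ok {
			continue // recency list and element map are inconsistent: skip
		}
		for _, p := range info.paths {
			_ = os.RemoveAll(p)
		}
		c.size -= info.size
		delete(c.repoElements, repoID)
	}
	// drop the processed entries, including any stale IDs
	c.repoRecency = c.repoRecency[nDeleted:]
}
```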
Move cache handling data and code to a substruct of the BaseSolver.
This is all internal to the dnfjson package.
Paves the way for cache management with a persistent state.
Add a CleanCache() method to the solver that deletes all the caches if
the total size grows above a certain (configurable) limit
(default: 500 MiB).
The function is called externally so that the caller can handle errors
(usually by logging or ignoring them completely) and so that it isn't
called multiple times for multiple depsolves of a single request.
The cleanup is extremely simple and is meant as a placeholder for more
sophisticated cache management. The goal is to simply avoid ballooning
cache sizes that might cause issues for users or our own services.
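A sketch of the check (field names on BaseSolver are illustrative):

```go
import (
	"os"
	"path/filepath"
)

// CleanCache wipes the whole cache when it grows past the limit.
// Deliberately blunt: a placeholder until smarter eviction exists.
func (s *BaseSolver) CleanCache() error {
	size, err := dirSize(s.cacheDir)
	if err != nil {
		return err
	}
	if size > s.maxCacheSize { // default: 500 MiB
		return os.RemoveAll(s.cacheDir)
	}
	return nil
}

// dirSize walks the cache directory, summing file sizes.
func dirSize(path string) (size uint64, err error) {
	err = filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.IsDir() {
			size += uint64(info.Size())
		}
		return nil
	})
	return
}
```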
They were originally added as convenience functions for single-case
calls, but they're not that useful and they have a million function
arguments, which isn't pretty.
The repository checksums in the response from dnf-json aren't used
anywhere. Since we're making changes to dnf-json and depsolving, now is
a good opportunity to drop them completely.
Move package set chain collation to the distro package and add
repositories to the package sets while returning the package sets from
their source, i.e., the ImageType.PackageSets() method.
This also removes the concept of "base repositories". There are no
longer repositories that are added implicitly to all package sets but
instead each package set needs to specify *all* the repositories it will
be depsolved against.
This paves the way for the requirement we have for building RHEL 7
images with a RHEL 8 build root. The build root package set has to be
depsolved against RHEL 8 repositories without any "base repos" included.
This is now possible since package sets and repositories are explicitly
associated from the start and there is no implicit global repository
set.
The change requires adding a list of PackageSet names to the core
rpmmd.RepoConfig. In the cloud API, repositories that are limited to
specific package sets already contain the correct package set names and
these are now copied to the internal RepoConfig when converting types in
genRepoConfig().
The user-specified repositories are only associated with the payload
package sets like before.
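A sketch of the conversion (cloud API field names are illustrative; an
empty PackageSets list means the repository applies to all payload
package sets):

```go
// Sketch of the type conversion in genRepoConfig(): the cloud API
// repo's package-set restriction is copied onto the core config.
func genRepoConfig(apiRepo Repository) rpmmd.RepoConfig {
	return rpmmd.RepoConfig{
		Name:        apiRepo.Name,
		BaseURLs:    apiRepo.BaseURLs,
		PackageSets: apiRepo.PackageSets, // e.g. ["build"]
	}
}
```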
On systems where `dnf` and the Python module aren't available, skip the
unit tests that call into the `dnf-json` script.
A test flag, `-force-dnf`, is added to avoid this check and run the
tests unconditionally. This is useful for cases where the sniff check
might fail for the wrong reasons or, more importantly, for cases where
we want to be sure the tests are run and consider a missing `dnf`
module to be an error state (e.g., in CI).
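A sketch of the skip logic (the exact sniff check is illustrative):

```go
import (
	"flag"
	"os/exec"
	"testing"
)

var forceDNF = flag.Bool("force-dnf", false,
	"force dnf tests to run; a missing dnf module becomes a test failure")

func maybeSkipDNF(t *testing.T) {
	if *forceDNF {
		return // run unconditionally and fail if dnf is broken
	}
	// sniff check: can the python dnf module be imported?
	if err := exec.Command("python3", "-c", "import dnf").Run(); err != nil {
		t.Skip("python3 dnf module not available; use -force-dnf to run anyway")
	}
}
```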
Attach the repository configurations that are specific to a package set
directly on the PackageSet object. This simplifies the Depsolve()
signature and avoids requiring a `nil` when no additional repositories
are required. More importantly, it makes associating repositories to
package sets explicit, no longer relying on matching array indices or
map keys.
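The resulting shape, roughly:

```go
// Repositories travel with the package set itself, so a depsolve needs
// no separate repository argument and no positional matching.
type PackageSet struct {
	Include      []string     // package names to install
	Exclude      []string     // package names to avoid
	Repositories []RepoConfig // all repos this set is depsolved against
}
```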
Define a Hash() method on rpmmd.RepoConfig that calculates a SHA-256 ID
for a repository based on its configuration. Identical configurations
should produce the same ID. The Name and ImageTypeTags of a repository
aren't taken into account, since these attributes don't affect a
repository's functional configuration.
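A sketch of the idea (the exact field set that feeds the hash is
illustrative):

```go
import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// Hash derives a stable ID from the functional parts of the config;
// Name and ImageTypeTags are deliberately left out.
func (r RepoConfig) Hash() string {
	data := strings.Join(r.BaseURLs, ",") + r.Metalink + r.MirrorList +
		strings.Join(r.GPGKeys, ",")
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}
```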
This ID lets us change the way we handle repository configurations in a
few places:
- Preparing the depsolve job arguments is simpler since we have
predictable IDs for the repository configurations. We no longer need
to rely on the index of a RepoConfig in a list to identify or access
it; that reliance previously prevented us from building a single list
of all repository configurations, since each had to be placed in the
list in a certain order.
- Associating packages from the depsolve result with the repository
configuration (in depsToRPMMD) no longer relies on an ID string
converted from and back to an integer index. Repositories define
their own IDs.
- Tests are a bit messier now but the changes simplify the main code, so
it's an acceptable trade-off.
- Fixtures need to change based on the repository configuration for
the test.
- We need to calculate the ID for the repository configuration for
the temporary file server URL.
Remove the single Depsolve function from the dnfjson package and the
depsolve command from the dnf-json tool. The new ChainDepsolve
functions and chain-depsolve command can handle single depsolves in the
same way so there's no need to keep (and have to maintain) two versions
of very similar code.
The ChainDepsolve function (in Go) and chain-depsolve command (in
Python) have been renamed to plain Depsolve and depsolve respectively,
since they are now general purpose depsolve functions.
The rpmrepo mock contains code to be used for testing depsolving. It
creates a file server that serves the metadata in test/data/testrepo and
can be used as a repository for depsolve tests.
The dnfjson tests perform a single depsolve with an expected response.
The chain depsolve tests perform multiple depsolves that should produce
the same expected response:
- Single transaction using the ChainDepsolve() function
- Two transactions for the same packages split in two with no extra
repositories
- Two transactions for the same packages split in two with the main
repository redefined
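The mock boils down to a file server; a sketch (helper name is
illustrative):

```go
import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// Serve the checked-in repo metadata over HTTP so depsolve tests have
// a real baseurl to point at.
func startTestRepo(t *testing.T) *httptest.Server {
	server := httptest.NewServer(http.FileServer(http.Dir("test/data/testrepo")))
	t.Cleanup(server.Close) // shut down when the test finishes
	return server           // server.URL becomes the repo's baseurl
}
```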
This package is meant to serve as the interface between osbuild-composer
and the (new, upcoming) dnf-json. It defines structures and functions
for calling the dnf-json commands ("depsolve" and "dump"). The package
uses the rpmmd types to interface with osbuild-composer and converts
them to the necessary representations (for dnf-json) internally. New
types aren't made public unless necessary.
A lot of the functions and types are copied or adapted from the rpmmd
package and those will eventually be removed. The rpmmd package will
remain to manage RPM package representations and conversion functions.
The FetchMetadata() function sorts the packages it will return, as does
the original implementation in rpmmd, but now the sort key is the NVR.
This is to make package order stable when multiple packages have the
same name (multiple versions of the same package). This way, the
'builds' arrays of the resulting package infos will also have a stable
order.
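A sketch of the sort (the Package type and fields are assumptions;
plain string comparison is enough here since the goal is stability,
not true RPM version ordering):

```go
import (
	"fmt"
	"sort"
)

// Sort fetched packages by name-version-release so packages sharing a
// name keep a stable order across runs.
sort.Slice(pkgs, func(i, j int) bool {
	nvr := func(p Package) string {
		return fmt.Sprintf("%s-%s-%s", p.Name, p.Version, p.Release)
	}
	return nvr(pkgs[i]) < nvr(pkgs[j])
})
```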
The request and result structures differ from the current implementation
of dnf-json. The change is meant to simplify handling multiple
depsolves with the same dnf.Base object and the new dnf-json tool will
be made to handle this request structure.
The dnf-json command is configurable and supports command line arguments
if necessary.
Signed-off-by: Achilleas Koutsou <achilleas@koutsou.net>