Commit graph

28 commits

Author SHA1 Message Date
Brian C. Lane
9827126d30 dnfjson: Add dnf-json result cache to BaseSolver
This is used to cache the results of dump and search requests for 60s.
Once the timeout has passed the request is repeated and the timeout
reset. The timeout is *not* reset on every cache hit which prevents, for
example, a request every 59 seconds from keeping the cache from
updating.

When the existing CleanCache() function is called to check the on-disk
metadata cache it will also delete any expired entries from the
resultCache in order to keep it from eventually consuming all memory.
2022-09-15 11:34:39 +01:00
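
The expiry behaviour described in 9827126d30 can be sketched roughly as follows. This is only an illustration, not the dnfjson implementation: the names `resultCache`, `cacheEntry`, `Store`, `Get` and `CleanExpired` are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow. The point it shows is that the expiry time is fixed when an entry is stored and is not extended by a cache hit.

```go
package main

import (
	"sync"
	"time"
)

// cacheEntry holds one cached result (hypothetical type).
type cacheEntry struct {
	value   string
	expires time.Time // fixed when the entry is stored, never extended
}

// resultCache is a hypothetical in-memory cache keyed by a request hash.
type resultCache struct {
	mu      sync.Mutex
	timeout time.Duration
	entries map[string]cacheEntry
}

func newResultCache(timeout time.Duration) *resultCache {
	return &resultCache{timeout: timeout, entries: make(map[string]cacheEntry)}
}

// Store saves value under key; the entry expires timeout after now.
func (c *resultCache) Store(key, value string, now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cacheEntry{value: value, expires: now.Add(c.timeout)}
}

// Get returns the cached value if it has not expired.
// A hit does not change the expiry time.
func (c *resultCache) Get(key string, now time.Time) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || !now.Before(e.expires) {
		return "", false
	}
	return e.value, true
}

// CleanExpired deletes every entry whose expiry time has passed, so the
// cache does not grow without bound.
func (c *resultCache) CleanExpired(now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, e := range c.entries {
		if !now.Before(e.expires) {
			delete(c.entries, k)
		}
	}
}
```

With a 60s timeout, a lookup at 59s is a hit, but a lookup at 60s is a miss even though the entry was read one second earlier.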
Brian C. Lane
e307a8174a dnfjson: Add Hash functions to repoConfig and Request
These will be used to generate a unique hash to be used with the cache
of dnf-json results.
2022-09-15 11:34:39 +01:00
Brian C. Lane
35059ca60e dnfjson: Add a cache of dnf-json results
This adds a cache structure with timeout handling and cache cleanup.
Also adds some testing of the new functions.
2022-09-15 11:34:39 +01:00
Brian C. Lane
a751dfe71c dnfjson: Add the search support to the Solver
Pass the list of package names or globs to dnf-json and return the
results.
2022-08-23 22:47:46 +01:00
Sanne Raymaekers
03b57f002c jobqueue: Move jobqueue out of internal
2022-07-04 15:37:28 +02:00
Achilleas Koutsou
e340687ab5 rpmmd: add IgnoreSSL field to PackageSpec
After a depsolve, each package inherits the `IgnoreSSL` value from its
repository configuration.

This information is not yet used.  It will later be exposed to osbuild's
org.osbuild.curl stage.

The test data is updated to match the new behaviour:
The test repository config specifies `IgnoreSSL=true` and the packages
in the response inherit the value.
2022-06-15 20:13:47 +02:00
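
As a rough illustration of e340687ab5, the inheritance step could look like the sketch below. The types are reduced, hypothetical stand-ins for the rpmmd ones, and `applyRepoSettings` is an invented helper name; the real code may attach the value at a different point in the depsolve.

```go
package main

// repoConfig is a reduced, hypothetical repository configuration.
type repoConfig struct {
	ID        string
	IgnoreSSL bool
}

// packageSpec is a reduced, hypothetical depsolve result entry.
type packageSpec struct {
	Name      string
	RepoID    string
	IgnoreSSL bool
}

// applyRepoSettings copies IgnoreSSL from each package's repository
// configuration onto the package. Packages whose repository is unknown
// are left unchanged.
func applyRepoSettings(pkgs []packageSpec, repos []repoConfig) {
	byID := make(map[string]repoConfig, len(repos))
	for _, r := range repos {
		byID[r.ID] = r
	}
	for i := range pkgs {
		if r, ok := byID[pkgs[i].RepoID]; ok {
			pkgs[i].IgnoreSSL = r.IgnoreSSL
		}
	}
}
```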
Sanne Raymaekers
fe918fd8a0 dnfjson: Move subscriptions to Solver with config
The BaseSolver is an object which gets constructed when the worker
starts, and the subscriptions attached to it expire after about 3
days. By refreshing the subscriptions each time a new Solver is created,
valid subscriptions are used.
2022-06-15 15:15:23 +02:00
Achilleas Koutsou
af94d28b52 dnfjson: test for repo name and URL in error message
2022-06-14 11:39:07 +02:00
Achilleas Koutsou
0c13277940 dnfjson: append name and URL of a repository to error message
If dnf-json returns an error that is related to a repository, it uses
the ID to identify the repository that caused the error.  Since IDs
can't easily be mapped back to a configuration, appending the URL and
name (if any) to the error message makes it easier to identify which
repository failed.
Keeping the ID in the message is also useful for finding the cache
directory of the repository if needed.
2022-06-14 11:39:07 +02:00
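
One way to picture the change in 0c13277940 is the sketch below. It assumes the error text contains the repository ID verbatim; the `repoConfig` type, the `annotateRepoError` helper and the bracketed output format are all hypothetical and are not taken from the dnfjson package.

```go
package main

import (
	"fmt"
	"strings"
)

// repoConfig is a reduced, hypothetical repository configuration.
type repoConfig struct {
	ID      string
	Name    string // may be empty
	BaseURL string
}

// annotateRepoError appends the name (if any) and URL of every repository
// whose ID appears in msg. The ID itself stays in the message so the
// repository's cache directory can still be found.
func annotateRepoError(msg string, repos []repoConfig) string {
	out := msg
	for _, r := range repos {
		if r.ID == "" || !strings.Contains(msg, r.ID) {
			continue
		}
		if r.Name != "" {
			out += fmt.Sprintf(" [%s: %s]", r.Name, r.BaseURL)
		} else {
			out += fmt.Sprintf(" [%s]", r.BaseURL)
		}
	}
	return out
}
```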
Achilleas Koutsou
9fc3f17117 dnfjson: acquire read lock when calling dnf-json
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
fb34c69e91 dnfjson: lock cache directory when cleaning
Apply a RWMutex lock to a cache directory.
A global map of cache locks is maintained, keyed by the absolute path to
the cache directory, so multiple cache instances can coexist and share
locks if they use the same cache root.

Currently, the lock only prevents multiple concurrent `shrink()`
operations when multiple cache instances share the same root.
2022-06-10 12:45:41 +01:00
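
The global lock map from fb34c69e91 might be sketched like this. The names `cacheLocks`, `cacheLocksMu` and `lockFor` are hypothetical; the only details taken from the commit message are the use of an RWMutex and keying by the absolute path of the cache root.

```go
package main

import (
	"path/filepath"
	"sync"
)

var (
	// cacheLocksMu guards the map itself.
	cacheLocksMu sync.Mutex
	// cacheLocks maps an absolute cache root to its shared lock.
	cacheLocks = map[string]*sync.RWMutex{}
)

// lockFor returns the lock shared by all cache instances that use the
// same root directory, creating it on first use.
func lockFor(root string) (*sync.RWMutex, error) {
	abs, err := filepath.Abs(root)
	if err != nil {
		return nil, err
	}
	cacheLocksMu.Lock()
	defer cacheLocksMu.Unlock()
	l, ok := cacheLocks[abs]
	if !ok {
		l = &sync.RWMutex{}
		cacheLocks[abs] = l
	}
	return l, nil
}
```

A cleanup would take the write lock (`Lock()`), while code that only reads the cache, such as a dnf-json call, would take the read lock (`RLock()`).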
Achilleas Koutsou
31f7040e05 dnfjson: use new size-based cache management
- Update timestamps for cache elements whenever a repository is used.
- Call the new `shrink()` function instead of the old `clean()`.
- Remove the old `clean()` function.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
bd2fbee48c dnfjson: add cache unit tests
Create cache-like directory trees on disk and check that the info is
read as expected and that the expected caches are removed by `shrink()`.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
542da40844 dnfjson: skip deletion if repoID not found in repoElements
If the repoRecency and repoElements somehow become inconsistent (an ID
in repoRecency does not exist in repoElements), ignore and continue.
The repoID will be removed from the repoRecency list at the end as it's
still counted in the nDeleted.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
a7a1f1ac07 dnfjson: size-based cache management
Functions for managing the repository cache based on a maximum
desirable size for the entire dnf-json cache directory.
While none of the functions are currently used, the workflow should
be as follows:
- Update the timestamp of a repository whenever it's used in a
  transaction by calling `touchRepo()` with the repository ID and the
  current time.
- Update the internal cache information when desired by calling
  `updateInfo()`.  This should be called for example after multiple
  depsolve transactions are run for a single build request.
- Shrink the cache to below the configured maxSize by calling
  `shrink()`.

The most important work happens in `updateInfo()`.  It collects all the
information it needs from the on-disk cache directories and organises it
in a way that makes it convenient for the `shrink()` function to run
efficiently.  It stores three important pieces of information:
1. repoElements: a map that links a repository ID with all the
   information about a repository's cache:
    - the top-level elements (files and directories) for the cache
    - size of the repository cache (total of all elements)
    - most recent mtime from all the elements which, if the
      `touchRepo()` call is consistently used, should reflect the most
      recent time the repository was used
2. repoRecency: a list of repository IDs sorted by mtime (oldest first)
3. size: the total size of the cache (total of all repository caches)

This way, when `shrink()` is called, the paths associated with the
least-recently-used repositories can be easily deleted by iterating on
repoRecency, obtaining the repository info from the map, deleting every
path in the repoElements array, and subtracting the repository's size
from the total.  The `shrink()` function stops when the new size is
below the maxSize (or when all repositories have been deleted).
2022-06-10 12:45:41 +01:00
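
The workflow described in a7a1f1ac07 can be sketched as below. This is a simplified model under stated assumptions, not the actual code: the type names and `sortRecency` are hypothetical, mtimes are plain integers, and the deletion function is injected (real code would presumably pass something like `os.RemoveAll`) so the logic can run without touching the disk. Unknown IDs in the recency list are skipped but still dropped from the list, following 542da40844.

```go
package main

import "sort"

// repoInfo describes one repository's cache (hypothetical type).
type repoInfo struct {
	paths []string // top-level files and directories of the cache
	size  uint64   // total size of all elements
	mtime int64    // most recent mtime of any element
}

// cacheInfo holds the three pieces of state named in the commit message.
type cacheInfo struct {
	repoElements map[string]repoInfo
	repoRecency  []string // repository IDs, oldest first
	size         uint64
	maxSize      uint64
}

// sortRecency rebuilds repoRecency from repoElements, oldest mtime first.
func (c *cacheInfo) sortRecency() {
	ids := make([]string, 0, len(c.repoElements))
	for id := range c.repoElements {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		return c.repoElements[ids[i]].mtime < c.repoElements[ids[j]].mtime
	})
	c.repoRecency = ids
}

// shrink removes least-recently-used repositories until the total size
// is below maxSize or no repositories are left.
func (c *cacheInfo) shrink(remove func(path string) error) error {
	nDeleted := 0
	// Drop the processed IDs from the list even on an early return.
	defer func() { c.repoRecency = c.repoRecency[nDeleted:] }()
	for _, id := range c.repoRecency {
		if c.size < c.maxSize {
			break
		}
		if info, ok := c.repoElements[id]; ok {
			for _, p := range info.paths {
				if err := remove(p); err != nil {
					return err
				}
			}
			c.size -= info.size
			delete(c.repoElements, id)
		}
		nDeleted++
	}
	return nil
}
```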
Achilleas Koutsou
b8d16bc395 dnfjson: cache information and methods as a substruct
Move cache handling data and code to a substruct of the BaseSolver.
This is all internal to the dnfjson package.

Paves the way for cache management with a persistent state.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
af4b474e89 dnfjson: add docstrings to public methods and BaseSolver
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
9fda1ff55f dnfjson: cache cleanup
Added CleanCache() method to the solver that deletes all the caches if
the total size grows above a certain (configurable) limit
(default: 500 MiB).

The function is called externally to handle errors (usually log or
ignore completely) and to avoid calling multiple times for multiple
depsolves of a single request.

The cleanup is extremely simple and is meant as a placeholder for more
sophisticated cache management.  The goal is to simply avoid ballooning
cache sizes that might cause issues for users or our own services.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
640dfac7a7 dnfjson: remove one-shot helper functions
They were originally added as convenience functions for single-case
calls, but they're not that useful and they have a million function
arguments, which isn't pretty.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
28862936bf dnfjson: convert depsToRPMMD() to packageSpecs method
New type `packageSpecs` is an alias to `[]PackageSpec`.  The
`depsToRPMMD()` function is now a method of this type.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
7a70a5e69b dnfjson: drop repo checksums
The repository checksums in the response from dnf-json aren't used
anywhere.  Since we're making changes to dnf-json and depsolving, now is
a good opportunity to drop them completely.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
c092783a70 simplify package set chain handling
Move package set chain collation to the distro package and add
repositories to the package sets while returning the package sets from
their source, i.e., the ImageType.PackageSets() method.

This also removes the concept of "base repositories".  There are no
longer repositories that are added implicitly to all package sets but
instead each package set needs to specify *all* the repositories it will
be depsolved against.

This paves the way for the requirement we have for building RHEL 7
images with a RHEL 8 build root.  The build root package set has to be
depsolved against RHEL 8 repositories without any "base repos" included.
This is now possible since package sets and repositories are explicitly
associated from the start and there is no implicit global repository
set.

The change requires adding a list of PackageSet names to the core
rpmmd.RepoConfig.  In the cloud API, repositories that are limited to
specific package sets already contain the correct package set names and
these are now copied to the internal RepoConfig when converting types in
genRepoConfig().
The user-specified repositories are only associated with the payload
package sets like before.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
5a01d6b339 dnfjson: skip dnf-json tests if dnf python module isn't available
On systems where `dnf` and the Python module aren't available, skip the
unit tests that call into the `dnf-json` script.
A test flag, `-force-dnf`, is added to skip this check and run the tests
unconditionally.  This is useful for cases where the sniff check might
fail for wrong reasons or, more importantly, for cases where we want to
be sure the tests are run and consider a missing `dnf` module to be an
error state (e.g., in CI).
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
86536f11e7 rpmmd: add Repositories list to PackageSet struct
Attach the repository configurations that are specific to a package set
directly on the PackageSet object.  This simplifies the Depsolve()
signature and avoids requiring a `nil` when no additional repositories
are required.  More importantly, it makes associating repositories to
package sets explicit, no longer relying on matching array indices or
map keys.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
1c4d8f9988 dnfjson: use repo config hash as repo ID
Defined a Hash() method on rpmmd.RepoConfig that calculates a SHA-256 ID
for a repository based on its configuration.  Identical configurations
should produce the same ID.  The Name and ImageTypeTags of a repository
aren't taken into account.  These attributes don't affect a repository's
functional configuration.

This ID lets us change the way we handle repository configurations in a
few places:
- Preparing the depsolve job arguments is simpler since we have
  predictable IDs for the repository configurations.  We don't need to
  rely on the index of a RepoConfig in a list to identify or access it,
  which prevented us from building a list of all repository
  configurations, since we needed them to be placed in the list in a
  certain order.
- Associating packages from the depsolve result with the repository
  configuration (in depsToRPMMD) no longer relies on an ID string
  converted from and back to an integer index.  Repositories define
  their own IDs.
- Tests are a bit messier now but the changes simplify the main code, so
  it's an acceptable trade-off.
    - Fixtures need to change based on the repository configuration for
      the test.
    - We need to calculate the ID for the repository configuration for
      the temporary file server URL.
2022-06-01 11:36:52 +01:00
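
A configuration hash along the lines of 1c4d8f9988 could look like the following sketch. The `repoConfig` struct is a hypothetical, much smaller stand-in for rpmmd.RepoConfig, and the choice of fields and separator is an assumption; the only details taken from the commit message are the use of SHA-256 and the exclusion of the repository name.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// repoConfig is a reduced, hypothetical stand-in for rpmmd.RepoConfig.
type repoConfig struct {
	Name      string
	BaseURL   string
	GPGKey    string
	IgnoreSSL bool
}

// Hash returns a hex-encoded SHA-256 digest of the fields that determine
// how the repository behaves. Name is deliberately left out, so two
// configurations that differ only by name get the same ID.
func (r repoConfig) Hash() string {
	parts := []string{r.BaseURL, r.GPGKey, fmt.Sprintf("%t", r.IgnoreSSL)}
	// A NUL separator keeps adjacent fields from running together.
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return fmt.Sprintf("%x", sum)
}
```

Because the ID depends only on the configuration, it can be computed anywhere without knowing the repository's position in a list.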
Achilleas Koutsou
61d7c465af dnfjson: remove single Depsolve function and command
Remove the single Depsolve function from the dnfjson package and the
depsolve command from the dnf-json tool.  The new ChainDepsolve
functions and chain-depsolve command can handle single depsolves in the
same way so there's no need to keep (and have to maintain) two versions
of very similar code.

The ChainDepsolve function (in Go) and chain-depsolve command (in
Python) have been renamed to plain Depsolve and depsolve respectively,
since they are now general purpose depsolve functions.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
d0da8fd122 dnfjson: add package tests
The rpmrepo mock contains code to be used for testing depsolving.  It
creates a file server that serves the metadata in test/data/testrepo and
can be used as a repository for depsolve tests.

The dnfjson tests perform a single depsolve with an expected response.
The chain depsolve tests perform multiple depsolves that should produce
the same expected response:
- Single transaction using the ChainDepsolve() function
- Two transactions for the same packages split in two with no extra
  repositories
- Two transactions for the same packages split in two with the main
  repository redefined

dnfjsontest: squash
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
4b289ce861 New package: dnfjson
This package is meant to serve as the interface between osbuild-composer
and the (new, upcoming) dnf-json.  It defines structures and functions
for calling the dnf-json commands ("depsolve" and "dump").  The package
uses the rpmmd types to interface with osbuild-composer and converts
them to the necessary representations (for dnf-json) internally.  New
types aren't made public unless necessary.

A lot of the functions and types are copied or adapted from the rpmmd
package and those will eventually be removed.  The rpmmd package will
remain to manage RPM package representations and conversion functions.

The FetchMetadata() function sorts the packages it will return, as does
the original implementation in rpmmd, but now the sort key is the NVR.
This is to make package order stable when multiple packages have the
same name (multiple versions of the same package).  This way, the
'builds' arrays of the resulting package infos will also have a stable
order.

The request and result structures differ from the current implementation
of dnf-json.  The change is meant to simplify handling multiple
depsolves with the same dnf.Base object and the new dnf-json tool will
be made to handle this request structure.

The dnf-json command is configurable and supports command line arguments
if necessary.

Signed-off-by: Achilleas Koutsou <achilleas@koutsou.net>
2022-06-01 11:36:52 +01:00
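
The NVR sort mentioned in 4b289ce861 can be illustrated with the sketch below. The `pkg` type and helper names are hypothetical, and the comparison is a plain string comparison of name-version-release, which gives a stable order but is not an RPM version comparison.

```go
package main

import (
	"fmt"
	"sort"
)

// pkg is a reduced, hypothetical package record.
type pkg struct {
	Name    string
	Version string
	Release string
}

// nvr returns the name-version-release string used as the sort key.
func (p pkg) nvr() string {
	return fmt.Sprintf("%s-%s-%s", p.Name, p.Version, p.Release)
}

// sortByNVR orders packages by NVR so that several versions of the same
// package always come out in the same order.
func sortByNVR(pkgs []pkg) {
	sort.Slice(pkgs, func(i, j int) bool {
		return pkgs[i].nvr() < pkgs[j].nvr()
	})
}
```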