Commit graph

6548 commits

Author SHA1 Message Date
Tomas Hozza
fa37005a32 worker/server: add JobDependencyChainErrors() method
Add new `JobDependencyChainErrors()` method for gathering a stack trace
of job errors from the job's dependencies which caused it to fail.

The `JobDependencyChainErrors()` implementation uses job-type specific
`...Status()` methods intentionally, because job-type specific status
methods check the job's result in a slightly different way and set
the result.JobError to a specific value. Due to this reason, it would
not be practical to introduce a generic `JobStatus()` method and get rid
of the `switch` block, because in reality, the new method would have
to implement an equivalent `switch` block as well.

Add unit test covering the method functionality.
2022-06-10 14:48:18 +01:00
Tomas Hozza
5bd02f2f27 worker: treat ErrorKojiFailedDependency as a dependency error
The `ErrorKojiFailedDependency` was previously not treated as a
dependency error. Fix it.
2022-06-10 14:48:18 +01:00
Tomas Hozza
d9e4889866 worker: rename HasDependencyError() -> IsDependencyError()
Rename the `HasDependencyError()` method to `IsDependencyError()` to
better express what it does.
2022-06-10 14:48:18 +01:00
Tomas Hozza
cc1ff1ee1b worker/koji-finalize: handle both osbuild and osbuild-koji results
Adjust the `koji-finalize` job implementation to be able to handle
results from both the `osbuild` and `osbuild-koji` jobs.

In case of `osbuild` job, the result is of type
`worker.OSBuildJobResult` and the important values are stored in the
Koji upload target options. For now, assume that there is only a
single upload target result.

In case of `osbuild-koji` job, the result is of type
`worker.OSBuildKojiJobResult` and the important values are already part
of the structure. Add "Old" suffix to all functions handling this case.
2022-06-10 14:48:18 +01:00
Tomas Hozza
66f7eaf440 worker/osbuild: check errors of all job dependencies
Ensure that none of the job dependencies failed. This covers the case
when there is more than one job dependency, which will be the case
for Koji composes.
2022-06-10 14:48:18 +01:00
Tomas Hozza
4032dea6d2 Cloud API: support composes/<id>/manifests endpoint for non-koji builds
Support the composes/<id>/manifests API endpoint for non-koji builds.
The endpoint will have to handle `osbuild` job results anyway once Koji
composes start using the `osbuild` job type for builds.

The endpoint previously contained a bug. If the `osbuild-koji` job had
an empty manifest attached as a static job argument (this is the default
type value), then this empty manifest was added to the endpoint
response. Since Cloud API uses the depsolve and manifest jobs, the
actual manifest was never attached to the job as a static argument. As a
result, the endpoint was always returning an empty manifest for any koji
compose. Fixing this also required adjusting the unit tests, which were
relying on the buggy behavior.

Extend the unit test testing a successful compose to test the logs
endpoint.
2022-06-10 14:48:18 +01:00
Tomas Hozza
205dcd4147 Cloud API: support composes/<id>/logs endpoint for non-koji builds
Support the composes/<id>/logs API endpoint for non-koji builds. The
endpoint will have to handle `osbuild` job results anyway once Koji
composes start using the `osbuild` job type for builds.

Extend the unit test testing a successful compose to test the logs
endpoint.
2022-06-10 14:48:18 +01:00
Tomas Hozza
97da1e7ad6 worker/osbuild: handle manifest dynamic argument index
Previously, the `OSBuild` job assumed that it could have only a single
job dependency, which could only be the `ManifestJobByID`. This won't
work well for the Koji use case, because the Koji OSBuild job also
depends on the Koji-init job.

Extend the `worker.OSBuildJob` structure with a new field, which holds
the `ManifestJobByIDResult` index in the job's dynamic arguments slice.
This value is used when there is more than one dependency
of the `OSBuild` job.
2022-06-10 14:48:18 +01:00
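A minimal sketch of the index lookup described in 97da1e7ad6, not the actual implementation. The `osbuildJob` type, the `ManifestDynArgsIdx` field name, and the use of plain strings for dynamic arguments are assumptions made for illustration; the commit only states that a new field holds the index of the manifest result in the dynamic arguments slice.

```go
package main

import (
	"errors"
	"fmt"
)

// osbuildJob is a hypothetical stand-in for worker.OSBuildJob.
type osbuildJob struct {
	// ManifestDynArgsIdx (assumed name) is the position of the manifest
	// job result in the dynamic arguments slice. nil means "not set".
	ManifestDynArgsIdx *int
}

// manifestArg returns the dynamic argument that holds the manifest result.
// With a single dependency the index is ignored; with several it is required.
func manifestArg(job osbuildJob, dynArgs []string) (string, error) {
	if len(dynArgs) == 0 {
		return "", errors.New("no dynamic arguments")
	}
	idx := 0 // single dependency: it can only be the manifest job
	if len(dynArgs) > 1 {
		if job.ManifestDynArgsIdx == nil {
			return "", errors.New("multiple dependencies but no manifest index set")
		}
		idx = *job.ManifestDynArgsIdx
	}
	if idx < 0 || idx >= len(dynArgs) {
		return "", fmt.Errorf("manifest index %d out of range", idx)
	}
	return dynArgs[idx], nil
}

func main() {
	idx := 1
	job := osbuildJob{ManifestDynArgsIdx: &idx}
	arg, err := manifestArg(job, []string{"koji-init result", "manifest result"})
	fmt.Println(arg, err)
}
```

Using a pointer for the index lets the sketch distinguish "index 0" from "no index given", which matters once a job has more than one dependency.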
Tomas Hozza
a4e6531565 worker: define job types as constants
Define supported job type names as constants and use them in all places,
instead of string literals.

There are multiple benefits to this approach. Using constants removes
the room for typos in string literals. One can use IDE autocompletion
for job types. Using a constant makes it easier to find all references
to it and thus all places that handle a specific job
type.
2022-06-10 14:48:18 +01:00
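The benefit described in a4e6531565 can be illustrated with a short sketch. The constant identifiers and the `describe()` helper are illustrative assumptions; the string values are job type names that appear in the commit messages on this page.

```go
package main

import "fmt"

// Job type names defined once as constants (identifiers are assumed).
const (
	JobTypeOSBuild      = "osbuild"
	JobTypeOSBuildKoji  = "osbuild-koji"
	JobTypeKojiFinalize = "koji-finalize"
)

// describe dispatches on the job type. A typo in a constant name fails to
// compile, whereas a typo in a string literal would silently never match.
func describe(jobType string) string {
	switch jobType {
	case JobTypeOSBuild:
		return "image build"
	case JobTypeOSBuildKoji:
		return "image build with Koji upload"
	case JobTypeKojiFinalize:
		return "Koji build finalization"
	default:
		return "unknown job type"
	}
}

func main() {
	fmt.Println(describe(JobTypeOSBuild))
	fmt.Println(describe("osbuid")) // misspelled literal: falls through to default
}
```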
Tomas Hozza
69b9f115c9 worker: allow enqueueing OSBuild job with multiple dependencies
Change the definition of `EnqueueOSBuildAsDependency()` function to
accept a slice of job IDs on which the OSBuild job depends. Previously,
only the manifest job ID was accepted as the only possible dependency.
This change is needed in order to enqueue OSBuild jobs for Koji,
which depend on two jobs.
2022-06-10 14:48:18 +01:00
Tomas Hozza
da78a76751 target/koji: delete unused fields from KojiTargetOptions
Delete unused fields from the `KojiTargetOptions` structure.
2022-06-10 14:48:18 +01:00
Tomas Hozza
42d623b743 worker/osbuild: support Koji target
Add Koji as a separate upload target to the osbuild job implementation.
2022-06-10 14:48:18 +01:00
Tomas Hozza
c7126e3c70 target: add KojiTargetResultOptions
Add `KojiTargetResultOptions`, which holds the values present in
`OSBuildKojiJobResult` but not in `OSBuildJobResult`.
2022-06-10 14:48:18 +01:00
Tomas Hozza
bb54318432 worker/osbuild: add host OS and architecture to job result
It is generally useful to have this information in the
`OSBuildJobResult`. This information is currently part of the
`OSBuildKojiJobResult`. Instead of moving it to the new
`KojiTargetResultOptions`, let's move it to the `OSBuildJobResult`
structure and set it for all jobs.
2022-06-10 14:48:18 +01:00
Tomas Hozza
c7e5e3c9c2 Move GetRedHatRelease() and GetHostDistroName() to common package
The `distro` package is now used for distro definitions supported by
osbuild-composer, not for introspecting the Host system. Move
`GetRedHatRelease()` and `GetHostDistroName()` functions to the `common`
package.
2022-06-10 14:48:18 +01:00
Tomas Hozza
804d4210df worker: standardize logging in OCI target
The OCI target used `log` instead of `logWithId` for logging messages.
Modify the code to be consistent with other targets.
2022-06-10 14:48:18 +01:00
Achilleas Koutsou
8e0db1a4e3 gen-manifests: print message about leftover caches
Print the location of the cache directory to the user in case they want
to clean or inspect it.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
9fc3f17117 dnfjson: acquire read lock when calling dnf-json 2022-06-10 12:45:41 +01:00
Achilleas Koutsou
fb34c69e91 dnfjson: lock cache directory when cleaning
Apply a RWMutex lock to a cache directory.
A global map of cache locks is maintained, keyed by the absolute path to
the cache directory, so multiple cache instances can coexist and share
locks if they use the same cache root.

Currently, the lock only prevents multiple concurrent `shrink()`
operations when multiple cache instances share the same root.
2022-06-10 12:45:41 +01:00
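A minimal sketch of the locking scheme described in fb34c69e91: a global map of `sync.RWMutex` values keyed by the absolute cache path. The names `cacheLocks`, `cacheLocksMu` and `lockFor()` are assumptions for illustration and are not taken from the dnfjson package.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

var (
	cacheLocksMu sync.Mutex                     // guards the map itself
	cacheLocks   = map[string]*sync.RWMutex{}   // one lock per cache root
)

// lockFor returns the lock shared by every cache instance that uses the
// same cache root, creating it on first use.
func lockFor(cacheRoot string) (*sync.RWMutex, error) {
	abs, err := filepath.Abs(cacheRoot) // normalise so equal roots share a key
	if err != nil {
		return nil, err
	}
	cacheLocksMu.Lock()
	defer cacheLocksMu.Unlock()
	l, ok := cacheLocks[abs]
	if !ok {
		l = &sync.RWMutex{}
		cacheLocks[abs] = l
	}
	return l, nil
}

func main() {
	a, _ := lockFor("cache")
	b, _ := lockFor("./cache")
	fmt.Println("same lock:", a == b)

	a.Lock() // exclusive: held while cleaning the cache
	a.Unlock()

	b.RLock() // shared: held by readers of the cache
	b.RUnlock()
}
```

Keying by absolute path means two cache instances constructed with different spellings of the same directory still serialise their cleanup.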
Achilleas Koutsou
31f7040e05 dnfjson: use new size-based cache management
- Update timestamps for cache elements whenever a repository is used.
- Call the new `shrink()` function instead of the old `clean()`.
- Remove the old `clean()` function.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
bd2fbee48c dnfjson: add cache unit tests
Create cache-like directory trees on disk and check that the info is
read as expected and that the expected caches are removed by `shrink()`.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
542da40844 dnfjson: skip deletion if repoID not found in repoElements
If the repoRecency and repoElements somehow become inconsistent (an ID
in repoRecency does not exist in repoElements), ignore and continue.
The repoID will be removed from the repoRecency list at the end as it's
still counted in the nDeleted.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
a7a1f1ac07 dnfjson: size-based cache management
Functions for repository cache management based on a maximum
desirable size for the entire dnf-json cache directory.
While none of the functions are currently used, the workflow should
be as follows:
- Update the timestamp of a repository whenever it's used in a
  transaction by calling `touchRepo()` with the repository ID and the
  current time.
- Update the internal cache information when desired by calling
  `updateInfo()`.  This should be called for example after multiple
  depsolve transactions are run for a single build request.
- Shrink the cache to below the configured maxSize by calling
  `shrink()`.

The most important work happens in `updateInfo()`.  It collects all the
information it needs from the on-disk cache directories and organises it
in a way that makes it convenient for the `shrink()` function to run
efficiently.  It stores three important pieces of information:
1. repoElements: a map that links a repository ID with all the
   information about a repository's cache:
    - the top-level elements (files and directories) for the cache
    - size of the repository cache (total of all elements)
    - most recent mtime from all the elements which, if the
      `touchRepo()` call is consistently used, should reflect the most
      recent time the repository was used
2. repoRecency: a list of repository IDs sorted by mtime (oldest first)
3. size: the total size of the cache (total of all repository caches)

This way, when `shrink()` is called, the paths associated with the
least-recently-used repositories can be easily deleted by iterating on
repoRecency, obtaining the repository info from the map, deleting every
path in the repoElements array, and subtracting the repository's size
from the total.  The `shrink()` function stops when the new size is
below the maxSize (or when all repositories have been deleted).
2022-06-10 12:45:41 +01:00
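The `shrink()` loop described in a7a1f1ac07 can be sketched as follows. This is an illustration under assumptions, not the dnfjson code: the `cache` and `repoInfo` types are hypothetical, and only the field names `repoElements`, `repoRecency`, `size` and `maxSize` come from the commit message. The sketch also skips IDs missing from `repoElements` while still dropping them from `repoRecency`, as described in 542da40844.

```go
package main

import (
	"fmt"
	"os"
)

// repoInfo is a hypothetical record of one repository's cache.
type repoInfo struct {
	paths []string // top-level cache elements (files and directories)
	size  int64    // total size of those elements
}

type cache struct {
	repoElements map[string]repoInfo // repository ID -> cache info
	repoRecency  []string            // repository IDs, oldest first
	size         int64               // total size of all repository caches
	maxSize      int64
}

// shrink deletes least-recently-used repository caches until the total
// size is below maxSize or no repositories are left.
func (c *cache) shrink() error {
	nDeleted := 0
	for _, id := range c.repoRecency {
		if c.size < c.maxSize {
			break
		}
		if info, ok := c.repoElements[id]; ok {
			for _, p := range info.paths {
				if err := os.RemoveAll(p); err != nil {
					c.repoRecency = c.repoRecency[nDeleted:]
					return err
				}
			}
			c.size -= info.size
			delete(c.repoElements, id)
		}
		// IDs missing from repoElements are skipped but still counted,
		// so they are dropped from the recency list below.
		nDeleted++
	}
	c.repoRecency = c.repoRecency[nDeleted:]
	return nil
}

func main() {
	// No paths are set, so this run touches nothing on disk.
	c := &cache{
		repoElements: map[string]repoInfo{
			"old": {size: 300}, "mid": {size: 200}, "new": {size: 100},
		},
		repoRecency: []string{"old", "mid", "new"},
		size:        600,
		maxSize:     250,
	}
	if err := c.shrink(); err != nil {
		fmt.Println("shrink failed:", err)
		return
	}
	fmt.Println(c.size, c.repoRecency)
}
```

Because `repoRecency` is already sorted oldest first, the loop never needs to search for the next eviction candidate; the cost of sorting is paid once, in the step the commit calls `updateInfo()`.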
Achilleas Koutsou
b8d16bc395 dnfjson: cache information and methods as a substruct
Move cache handling data and code to a substruct of the BaseSolver.
This is all internal to the dnfjson package.

Paves the way for cache management with a persistent state.
2022-06-10 12:45:41 +01:00
Achilleas Koutsou
af4b474e89 dnfjson: add docstrings to public methods and BaseSolver 2022-06-10 12:45:41 +01:00
Major Hayden
26b93f8f25 Blacklist amdgpu module on Azure images
The `amdgpu` module causes issues on certain GPU-enabled instances
on Azure and it must not be loaded by default.
Modules are sorted alphabetically.

Signed-off-by: Major Hayden <major@redhat.com>
Co-Authored-By: Christian Kellner <christian@redhat.com>
2022-06-09 14:18:45 +01:00
Sanne Raymaekers
8d5cdfdd57 osbuild-worker: Correct cast of dnfjson error in depsolve job
This error is failing to parse correctly on the workers as a
dnfjson.Error. The old rpmmd.DNFError was returned by pointer, however
the internal/dnfjson package returns the Error by value.
2022-06-08 23:07:37 +02:00
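The pointer-versus-value mismatch described in 8d5cdfdd57 is a general Go pitfall and can be reproduced in a few lines. The `Error` type and `depsolve()` function below are hypothetical stand-ins, not the dnfjson definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Error is a hypothetical stand-in for an error type returned by value.
type Error struct {
	Kind   string
	Reason string
}

// Error has a value receiver, so both Error and *Error satisfy the
// error interface.
func (e Error) Error() string { return e.Kind + ": " + e.Reason }

// depsolve returns the error by value, not by pointer.
func depsolve() error {
	return Error{Kind: "DepsolveError", Reason: "package not found"}
}

func main() {
	err := depsolve()

	// Asserting to the pointer type fails because the dynamic type
	// stored in the interface is Error, not *Error.
	_, isPtr := err.(*Error)
	_, isVal := err.(Error)
	fmt.Println("pointer assertion:", isPtr, "value assertion:", isVal)

	// errors.As with a target of the value type also matches.
	var target Error
	if errors.As(err, &target) {
		fmt.Println("kind:", target.Kind)
	}
}
```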
Sanne Raymaekers
ff408aa68f osbuild-service-maintenance: Vacuum tables
Call vacuum analyze after each chunk of updates, and dump vacuum stats
at the beginning and end of the db cleanup.

Nulling results can increase size on disk, but calling vacuum analyze
will free up space within the table (not on disk) and reuse the space
for new inserts and updates.
2022-06-08 21:12:46 +02:00
Sanne Raymaekers
8bfc6c9961 dbjobqueue: Filter maintenance queries based on results
Jobs that already had their results nulled shouldn't be included in the
maintenance job.
2022-06-08 21:12:46 +02:00
Juan Abia
4827f0e83e add cloud-image-val to aws test
cloud-image-val is a tool that performs basic validation tests on cloud
images. Incorporate this tool in the aws.sh test.
2022-06-08 16:14:35 +02:00
Juan Abia
c255267d96 save report.html from cloud-image-val as an artifact 2022-06-08 16:14:35 +02:00
Tomas Hozza
8635b7d2bb dbjobqueue-tests: fix issue introduced by PR #2618 2022-06-08 14:28:03 +02:00
Chloe Kaubisch
873798514b prometheus: add tenant label
Include a tenant label for all prometheus metrics. Modify the
jobstatus function in the worker accordingly to return the channel
so it can be passed to prometheus.
2022-06-07 16:35:03 +02:00
Ondřej Budai
5315264f2e packer: pin the vector version
See the comment inline.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-06-07 09:08:22 +02:00
Sanne Raymaekers
92ae2f7c83 osbuild-service-maintenance: Delete/update results in chunks
The results of the manifest jobs can be very big, and operating on
30-40k rows at once can starve or crash a smaller rds instance.
2022-06-06 17:49:46 +02:00
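The chunking idea in 92ae2f7c83 can be sketched as below. The `chunks()` helper, the string job IDs and the batch size are assumptions for illustration; the commit message does not state the chunk size or how the batches are issued against the database.

```go
package main

import "fmt"

// chunks splits ids into consecutive batches of at most size elements,
// so each database statement only touches a bounded number of rows.
func chunks(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	for i, batch := range chunks(ids, 2) {
		// A real maintenance run would issue one UPDATE or DELETE per
		// batch here instead of printing it.
		fmt.Println("batch", i, batch)
	}
}
```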
Alexander Todorov
daaab1742e Update dependency of osbuild to v57 2022-06-06 12:53:43 +02:00
Alexander Todorov
8e372a257e ci: Remove 8.6 & 9.0 nightly repos in Schutzfile 2022-06-06 12:53:43 +02:00
Alexander Todorov
857d352325 ci: Comment out job which doesn't have any runners 2022-06-06 12:53:43 +02:00
Alexander Todorov
84d5bc7a22 ci: Disable 8.6-nightly and 9.0-nightly test jobs
to avoid interference with 8.6 and 9.0 GA builds/repos
2022-06-06 12:53:43 +02:00
Alexander Todorov
ee044a50bb COMPOSER-1576: Start building RPMs on 8.6 and 9.0 GA before we can test 2022-06-06 12:53:43 +02:00
Alexander Todorov
807804ba54 COMPOSER-1593: Retire the use of Fedora 34 in CI
we already use Fedora 35 anyway
2022-06-06 12:53:43 +02:00
Christian Kellner
a1306a122a distro/rhel90: remove skx_edac, intel_cstate from denylist again
In commit 5c1530e we disabled `skx_edac` and `intel_cstate` but 
after further consultation with Prarit Bhargava it was agreed that 
for RHEL 9 we should indeed allow them.
2022-06-06 08:07:26 +01:00
Sanne Raymaekers
968023f950 templates/composer: Map db secrets to maintenance container 2022-06-04 12:48:17 +02:00
Sanne Raymaekers
9b119fa4cf osbuild-service-maintenance: Delete results from select jobs
Instead of deleting records, delete the results from the manifest and
depsolve jobs. This redacts sensitive data which the manifest can
contain, and this conserves space.
2022-06-03 14:38:53 +02:00
Sanne Raymaekers
eeb2238b12 osbuild-service-maintenance: Split out db cleanup 2022-06-03 14:38:53 +02:00
Sanne Raymaekers
9bff4a4f0f dbjobqueue: Alter foreign key constraints
When deleting rows from the job table, make sure the delete is cascaded
to the dependencies and heartbeat tables.
2022-06-02 18:45:24 +02:00
Ygal Blum
feb357e538 Support Generic S3 upload in Composer API
Use case
--------
If Endpoint is not set and Region is set - upload to AWS S3
If both Endpoint and Region are set - upload to Generic S3 via Weldr API
If neither Endpoint nor Region is set - upload to Generic S3 via Composer API (use configuration)

jobimpl-osbuild
---------------
Add configuration fields for Generic S3 upload
Support S3 upload requests coming from Weldr or Composer API to either AWS or Generic S3
Weldr API for Generic S3 requires that all connection parameters but the credentials be passed in the API call
Composer API for Generic S3 requires that all connection parameters are taken from the configuration
Adjust to the consolidation in Target and UploadOptions

Target and UploadOptions
------------------------
Add the fields that were specific to the Generic S3 structures to the AWS S3 one
Remove the structures for Generic S3 and always use the AWS S3 ones

Worker Main
-----------
Add Endpoint, Region, Bucket, CABundle and SkipSSLVerification to the configuration structure
Pass the values to the Server

Weldr API
---------
Keep the generic.s3 provider name to maintain the API, but unmarshal into awsS3UploadSettings

tests - api.sh
--------------
Allow the caller to specify either AWS or Generic S3 upload targets for specific image types
Implement the pieces required for testing upload to a Generic S3 service
In some cases generalize the AWS S3 functions for reuse

GitLab CI
---------
Add test case for api.sh tests with edge-commit and generic S3
2022-06-02 16:12:53 +03:00
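The "Use case" rules in feb357e538 amount to a small decision table, sketched here. The `selectUpload()` function and its return strings are hypothetical; the combination of a set Endpoint with an unset Region is not covered by the commit message, so the sketch reports it as such rather than guessing.

```go
package main

import "fmt"

// selectUpload maps the Endpoint/Region combination to the upload path
// listed in the commit message. Empty string means "not set".
func selectUpload(endpoint, region string) string {
	switch {
	case endpoint == "" && region != "":
		return "AWS S3"
	case endpoint != "" && region != "":
		return "Generic S3 via Weldr API"
	case endpoint == "" && region == "":
		return "Generic S3 via Composer API (use configuration)"
	default:
		// Endpoint set without Region: not described in the commit.
		return "not covered by the commit message"
	}
}

func main() {
	fmt.Println(selectUpload("", "us-east-1"))
	fmt.Println(selectUpload("https://s3.example.com", "us-east-1"))
	fmt.Println(selectUpload("", ""))
}
```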
schutzbot
335c597452 Post release version bump
[skip ci]
2022-06-01 13:41:41 +00:00
Achilleas Koutsou
9fda1ff55f dnfjson: cache cleanup
Added CleanCache() method to the solver that deletes all the caches if
the total size grows above a certain (configurable) limit
(default: 500 MiB).

The function is called externally to handle errors (usually log or
ignore completely) and to avoid calling multiple times for multiple
depsolves of a single request.

The cleanup is extremely simple and is meant as a placeholder for more
sophisticated cache management.  The goal is to simply avoid ballooning
cache sizes that might cause issues for users or our own services.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
8b4607c94f gen-manifests: do not return workerName from makeManifestJob
The value doesn't represent the worker name, just the top-level cache
directory for a job.  It's useful for separating caches and making the
generation faster, but it's not necessary to return from the function.
2022-06-01 11:36:52 +01:00