Commit graph

133 commits

Author SHA1 Message Date
Brian C. Lane
56fc58cca3 cloudapi: Add DeleteCompose to delete a job by UUID
This adds the handler for DELETE /composes/{id} which will delete a job and
all of its dependencies, and any artifacts.

Related: RHEL-60120
2025-06-05 10:32:56 +02:00
Brian C. Lane
f1a2c24563 worker: Add CleanupArtifacts function
This removes all artifact directories, and their contents, if there
isn't an associated Job. This is used to clean up local artifacts after
the compose job has been deleted.

Related: RHEL-60120
2025-06-05 10:32:56 +02:00
Brian C. Lane
5cddc4223d dbjobqueue: Add AllRootJobIDs implementation
Related: RHEL-60120
2025-06-05 10:32:56 +02:00
Sanne Raymaekers
17416bf60b worker: adapt to new oapi-codegen 2025-03-26 11:13:14 +01:00
Brian C. Lane
d8e9a86921 cloudapi: save and return compose request details
The original compose request contains useful details that are not
preserved when it is converted to a manifest. Things like the
distribution, arch, image type, blueprint or customizations are useful
when examining builds later.

This saves the original request json using the job id and a new
directory (ComposeRequest) under the artifacts directory. The original
request, if present, is then added to the compose/<id>/metadata response
alongside the package list.

Related: RHEL-60120
2025-03-05 12:36:36 +01:00
Brian C. Lane
199a3d31f8 worker: Expose the ArtifactsDir path
This will help make it easier to write the original compose request json
to the same directory tree.

Related: RHEL-60120
2025-03-05 12:36:36 +01:00
Brian C. Lane
3781820b45 server: Return the path alongside the error from JobArtifactLocation
Even though the file isn't there it can be useful to have the full path
that it was checking for.

Related: RHEL-60142
2025-02-11 16:09:27 +01:00
Brian C. Lane
325a018c75 cloudapi: Return the compose status of all root compose jobs
This returns the status using the same structure as it does for
requesting individual statuses for the jobs.

Related: RHEL-60120
2025-02-03 17:27:31 -08:00
Brian C. Lane
a613e8cb37 DepsolveJobResult: Remove unused Error and ErrorType
These fields are not set by the depsolve job, they are only set and used
in tests so remove them. Errors are reported in the result.JobError

Related: Related: RHEL-60125
2025-01-30 08:00:12 +01:00
Brian C. Lane
bd55670dd9 worker: Add worker server support for Search job
This adds support for sending a search job to the worker client,
gathering results, and handling errors.

The errors returned are the same as for the Depsolve job, since they
both use the osbuild-depsolve-dnf script via images/pkg/dnfjson.

Related: RHEL-60136
2025-01-30 08:00:12 +01:00
Brian C. Lane
d8df7e7cd4 worker: Add search job implementation to worker client
This is similar to the depsolve job, and it shares the solver (which
supports locking, as does DNF itself). This will allow searching for
specific package names, names with globs, or names as substrings of
other names using * as the wildcard.

Related: RHEL-60136
2025-01-30 08:00:12 +01:00
Sanne Raymaekers
7bfcac30dd cloudapi: support worker server target artifact retrieval
In order to get the artifact location from the cloudapi, add a helper
function in the worker server.
2025-01-24 15:26:15 +01:00
Florian Schüller
d3e3474fb7 internal/worker/server: return an error on depsolve timeout HMS-2989
Fixes the special case that if no worker is available and we
generate an internal timeout and cancel the depsolve including all
followup jobs, no error was propagated.
2024-11-19 13:55:38 +01:00
Sanne Raymaekers
2eb3c9f44c worker/server: add tests for job heartbeats 2024-11-07 17:18:48 +01:00
Sanne Raymaekers
a971f9340b worker/server: update metrics on requeue
When requeuing a job the next worker requesting the job would decrement
pending counter, but the pending counter only ever got incremented once,
when the job was first enqueued. Thus make sure to increment the pending
counter when a job is requeued.
2024-11-07 17:18:48 +01:00
Michael Vogt
573b349f16 clienterrors: rename WorkerClientError to clienterrors.New
The usual convention to create new object is to prefix `New*` so
this commit renames the `WorkerClientError`. Initially I thought
it would be `NewWorkerClientError()` but looking at the package
prefix it seems unneeded, i.e. `clienterrors.New()` already
provides enough context it seems and it's the only error we
construct.

We could consider renaming it to `clienterror` (singular) too
but that could be a followup.

I would also like to make `clienterror.Error` implement the
`error` interface but that should be a followup to make this
(mechanical) rename trivial to review.
2024-07-31 17:04:58 +02:00
Sanne Raymaekers
4bb61da37e Revert "prometheus: active worker gauge"
This reverts commit 68bc8e0c88.
2024-06-12 17:20:01 +02:00
Sanne Raymaekers
68bc8e0c88 prometheus: active worker gauge 2024-04-19 14:32:07 +02:00
Sanne Raymaekers
1b4935c325 jobqueue: add channel to workers
Stores the channel alongside the worker.
2024-04-19 14:32:07 +02:00
Sanne Raymaekers
e24772dc57 worker/server: check if worker is available for architecture 2023-12-14 21:25:32 +01:00
Sanne Raymaekers
850e44589b worker/server: split out jobqueue call from PostWorker handler 2023-12-14 21:25:32 +01:00
Sanne Raymaekers
ac854b7cc8 pkg/jobqueue: add arch to worker 2023-12-14 21:25:32 +01:00
Sanne Raymaekers
794acd8e34 worker: add ability to track workers serverside
Unresponsive workers (>=1 hour of no status update) are cleaned up.

Several things are enabled by keeping track of workers, in future the
worker server could:
- keep track of how many workers are active
- see if a worker for a specific architecture is available
2023-12-06 17:22:36 +01:00
Sanne Raymaekers
64e9f1a2c7 worker: don't log job not pending dequeue errors
This happens a lot when requesting a job by ID, which happens for the
manifest jobs.
2023-10-02 23:37:26 +01:00
Sanne Raymaekers
6040c10e10 worker/v1: rearrange middlewares
The duration middleware should come after the tenant channel middleware,
otherwise the tenant in the context will be empty. The status middleware
can come beforehand because it queries the request context right before
sending a response.
2023-06-29 16:41:36 +02:00
Sanne Raymaekers
2837b2a3ad prometheus: split off request timing information into separate mw
Tracks the worker api in addition to the composer api.
2023-06-28 15:08:37 +02:00
Sanne Raymaekers
9594156baf internal/worker: use TenantChannelMiddleware 2023-06-28 15:08:37 +02:00
Sanne Raymaekers
9dc0881247 internal/worker: log dequeue failures 2023-04-14 12:12:41 +02:00
Gianluca Zuccarelli
c056db4811 worker/server: add file resolver job
Add a file resolver job to the worker server in
order for us to resolve the contents of a remote
file.
2023-03-16 09:55:39 +00:00
Gianluca Zuccarelli
98d611d34f worker/server: fix container resolver job error
The container job resolve job error message was printing
the wrong error type to the error string.
2023-03-16 09:55:39 +00:00
Sanne Raymaekers
a25e0f4adb prometheus:: add arch label to dequeue metrics
Only add the arch label for osbuild job types, as the finish metrics
behave similarly. Having arch labels on dequeue metrics for any other
job type (but not on the finish metrics) would produce weird results.
2023-03-09 18:47:57 +01:00
Gianluca Zuccarelli
08aa1e99a1 worker/server: log unresponsive job removal
Re-add the logging for when unresponsive heartbeats
are being removed so we can verify that they
are correctly being logged as 5xx errors.
2023-01-10 09:29:33 +01:00
Gianluca Zuccarelli
113cda7d39 internal/worker: register status middleware
Register the custom middleware function to the worker
server. This function is responsible for recording all
the status codes of all the server's endpoints.

Due to a bug with echo/v4, a request to an endpoint using
the incorrect method should return a `405` error but returns
a `404` error instead when a middleware function is registered.
The worker `server_test` has been updated to reflect this.
2022-11-30 11:14:29 +01:00
Tom Gundersen
626530818d worker/server: requeue unresponsive jobs
If a job is unresponsive the worker has most likely crashed or been shut
down and the in-progress job been lost.

Instead of failing these jobs, requeue them up to two times. Once a job is lost
a third time it fails. This avoids infinite loops.

This is implemented by extending FinishJob to RequeuOrFinish job. It takes a
max number of requeues as an argument, and if that is 0, it has the same
behavior as FinishJob used to have.

If the maximum number of requeues has not yet been reached, then the running
job is returned to pending state to be picked up again.
2022-11-02 15:26:00 +01:00
Sanne Raymaekers
ebeb339f96 osbuild-worker: add ostree resolve job
This job resolves an ostree ref. Similar to the depsolve and container
resolve jobs, this should be a dependency of a manifest job.
2022-10-19 18:14:10 +02:00
Sanne Raymaekers
599829a3b8 worker: Return dependent jobs in OsbuildJobStatus 2022-08-30 16:14:52 +02:00
Sanne Raymaekers
0fe3f1b2ae jobqueue: Query job dependents 2022-08-30 16:14:52 +02:00
Sanne Raymaekers
099b34b301 worker: Define new jobs to handle copying and resharing of images
The copy job copies from one region to another. It does not preserve the
sharing on the ami and it's snapshot, that needs to be queued
separately.
2022-08-30 16:14:52 +02:00
Christian Kellner
388154d7f6 cloudapi: support container embedding
Add support for embedding container images via the cloud API. For
this the container resolve job was plumbed into the cloud api's
handler and the API specification updated with a new `containers`
section that mimics the blueprint section with the same name.
2022-08-04 14:37:12 +02:00
Sanne Raymaekers
111feda1f5 worker: Remove ellipsis operator from clienterrors.Error
The ellipsis operator was used as a hack to not need to pass any details
as an argument, but it makes what the end object will actually look like
less obvious. It also makes it impossible to pass an array to details
without getting a nested array.

Fixes #2874
2022-08-03 13:51:52 +02:00
Gianluca Zuccarelli
e5d9d2d045 worker/server: rename JobStatus() to JobInfo()
Since the `jobStatus` functions return a `JobInfo`
struct that contains the `JobStatus`, it makes sense
to rename the function names for the sake of consistency.
2022-07-27 13:37:14 +02:00
Gianluca Zuccarelli
95c8657f9e metrics: remove arch from osbuild type
The osbuild jobtype currently contains the
architecture as a suffix. Since the arch
is now being supplied as a label, the
`arch` suffix can be removed.
2022-07-27 13:37:14 +02:00
Gianluca Zuccarelli
967ac1c35e worker/server: job status struct
The number of return values from the `jobStatus`
function was growing and getting out of hand. Not
all return values were being used in all cases
and so returning a single struct with the information
and status of a job makes more sense. Then in each case
the resulting fields can be used as needed.
2022-07-27 13:37:14 +02:00
Gianluca Zuccarelli
9f4e765657 metrics: build jobs arch label
Add the architecture label to build jobs
which will enable filtering and monitoring
build jobs by architecture. Build job results
contain the `arch` field in the results struct,
this is then used to pass to the metrics, where
there is a value, otherwise it is set to an
empty string.
2022-07-27 13:37:14 +02:00
Gianluca Zuccarelli
8b4aff3857 worker/server: remove duplicate metrics
Remove a duplicate call to the `DequeueJobMetrics`
function in the worker server. This duplicate call
resulted in negative numbers for pending jobs in
the prometheus metrics.
2022-07-27 13:37:14 +02:00
Christian Kellner
50e630a76f worker: add new container resolve job type
This is a new job that can be used to resolve containers. It uses
the existing `container.Resolver` class to do the actual work.
2022-07-25 21:21:44 +02:00
Ondřej Budai
e779562f3c worker: remove osbuild-koji job
Koji API removed by the previous commit was the last user of osbuild-koji job.
Let's remove it since nothing uses it. This also removes all of the
compatibility code in Cloud API, see concerns below:

Compatibility concerns:
- the internal deployment was moved to a completely different composer
  instance, thus there are no old jobs
- Fedora deployment is still unused in prod, thus we don't care about keeping
  backward compatibility of the old jobs

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-07-19 16:00:52 +02:00
Sanne Raymaekers
03b57f002c jobqueue: Move jobqueue out of internal 2022-07-04 15:37:28 +02:00
Tomas Hozza
6dcadc9d20 worker/osbuild: move target errors to detail of job error
Add a new worker client error type `ErrorTargetError` representing that
at least one of job targets failed. The actual target errors are added
to the job detail.

Add a new `OSBuildJobResult.TargetErrors()` method for gathering a slice
of target errors contained within an `OSBuildJobResult` instance. Cover
the method with unit test.
2022-07-01 18:55:01 +01:00
Tomas Hozza
59ded68457 worker: delete TargetErrors from OSBuildJobResult
The `TargetErrors` is not used any more since PR#2192 [1] and there is
no need to keep the backward compatibility any more, because there are
no composer / worker instances in production, which are not running the
modified code.

In addition, delete unit tests covering this legacy error handling.

[1] https://github.com/osbuild/osbuild-composer/pull/2192
2022-07-01 18:55:01 +01:00