Commit graph

125 commits

Author SHA1 Message Date
Gianluca Zuccarelli
e31fb36d65 cloudapi: add build job dependency checks
If an osbuild or koji-osbuild job has failed, add
a check to see if it is a result of the build jobs
dependencies and return the dependency failure job
error furthest up the chain of errors & add this
error to the details filed of the build job error.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
da94f2cbeb worker/server: build job dep errors
Add a helper function to query dependency
failures of osbuild & koji-osbuild jobs.
If a build job has a dependency error the
function will check for the job error of the
manifest job. If that also has a dependency
error the function will query the depsolve
job too for a job error.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
30d75d0e74 worker/clienterrors: depenency error check
Add a helper function to check for dependency
errors for job errors. This simply returns true
if a job error has a dependency error code and
false otherwise.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
b1969ba6a6 worker/clienterrors: omit details if empty
Omit the details field if it is null/empty.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
ab98c66b9f worker/server: fix manifest-id job status
The manifest by id job status type safe function
was failing due to the jobType check which was checking
for the wrong string.
2022-04-06 21:34:02 +01:00
Gianluca Zuccarelli
b75cf30a05 worker/server: remove duplicate function
The `ManifestJobStatus` and `ManifestByIdJobStatus` both
had identical functionality. The `ManifestByIdJobStatus`
is not being referenced anywhere in the codebase and so
this function has been removed.
2022-04-06 21:34:02 +01:00
Gianluca Zuccarelli
14b006d480 worker/clienterrors: add empty packagespec error
Add an error case for an empty package spec returned
by a depsolve job and mark this with a `4xx` status.
2022-04-06 21:34:02 +01:00
Gianluca Zuccarelli
cc7d555fb2 worker/errors: consider dep errors as 4xx status
All dependency errors, whether they are 4xx or 5xx,
are currently being considered as a 5xx error in parent
jobs. This is causing some of the build alerts to fire
off when a depsolve job has failed, for example, when
in reality, this is an expected result. This commit
ensures that dependency errors are being reported as
4xx status in monitoring.
2022-04-06 10:57:37 +02:00
Gianluca Zuccarelli
8241e1f948 worker/clienterrors: add empty manifest error
If a manifest is empty we should have a specific error
code for that case and treat it as a 4xx error since
this would be bad input for a build job
2022-04-06 10:57:37 +02:00
Eng Zer Jun
00ea3eb285 test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-04-05 09:27:43 +02:00
Sanne Raymaekers
2023f7731d worker: Support client_credentials grant type in client
This will allow us to use the service accounts which work against
identity.api.openshift.com. These are much easier to manage, especially
with the new multi-tenancy, as there's a single page to create/expire
them across an account.

They also have the added benefit of not expiring automatically when
they're not used like offline tokens, and immediate expiration when
desired.
2022-03-21 09:43:43 +01:00
Sanne Raymaekers
8900bcec40 worker: Client lazy token refresh 2022-03-21 09:43:43 +01:00
Sanne Raymaekers
8a6d6ed6cf worker: Clean up worker client config 2022-03-21 09:43:43 +01:00
Sanne Raymaekers
318a4525c6 cmd/osbuild-worker: dnf-json returns MarkingErrors (plural) 2022-03-11 10:13:27 +01:00
Gianluca Zuccarelli
761aab6cac cloudapi/v2: add error object to ImageStatus
Add an error object to the ComposeStatus.ImageStatus.
The error object contains a human-readable error reason
and optional details in the case of an error.
2022-03-09 08:49:37 +00:00
Ondřej Budai
cfb756b9ba api/{cloud,worker}: used channel name based on JWT claims for new jobs
This commit implements multi-tenancy. A tenant is defined based on a value
from JWT claims. The key of this value must be specified in the configuration
file. This allows us to pick different values when using multiple SSOs.

Let me explain more in depth how this works:

Cloud API gets a new compose request. Firstly, it extracts a tenant name from
JWT claims. The considered claims are configured as an array in
cloud_api.jwt.tenant_provider_fields in composer's config file. The channel
name for all jobs belonging to this compose is created by `"org-" + tenant`.

Why is the channel prefixed by "org-"? To give us options in the future. I can
imagine the request having a channel override. This basically means that
multiple tenants can share a channel. A real use-case for this is multiple
Fedora projects sharing one pool of workers.

Why this commit adds a whole new cloud_api section to the config? Because the
current config is a mess and we should stop adding new stuff into the koji
section. As the Koji API is basically deprecated, we will need to remove it
soon nevertheless.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
c1dc58eba4 worker: NewServer: move config parameters to a new Config struct
We will have more parameters soon so let's make this prettier sooner rather
than later.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
7bfcee36f8 jobqueue: introduce the concept of channels
Channels are a concept similar to job types. Callers must specify a channel
name when queueing a new job. A list of channels is also specified when
dequeueing a job. The dequeued job's channel will always be from one of the
specified channel. Of course, the job types are also respected. The dequeued
job will also always be from one of the specified type.

Currently, all calls to jobqueue were changed so all queue operations use
an empty channel name and all dequeue operations use a list containing
an empty channel.

Thus, this is a non-functional change.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
17e809b5f7 worker: use default transport instead of "blank" one
This allows us to use the default behaviour of http.DefaultTransport
to honor HTTP_PROXY.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-02-21 14:46:49 +01:00
Achilleas Koutsou
82eedf5b82 DepsolveJob: rename struct field for consistency
We have two fields, `Repos` and `PackageSets`.  Renaming
`PackageSetsRepositories` to `PackageSetsRepos` for consistency.
The struct is for internal use only so the rename has no impact as long
as the serialised name is the same (json tag).

Also it's shorter.

Added docstring to the struct that explains the arguments in the same
way as they are described for the `depsolve()` function.

Changing the name of the argument in the internal `depsolve()` function
for the same reasons.
2022-02-14 17:38:41 +01:00
Gianluca Zuccarelli
1e443cf0fa worker: fix error status codes
The DNFDepsolveError and DNFMarking error should have
a `4xx` code instead of a `5xx` error code.
2022-02-04 19:30:25 +01:00
Gianluca Zuccarelli
290472dfdf metrics: add worker error metrics
This commit introduces the collection of error
metrics since it is now possible to differentiate
between internal errors and user input errors.
Additionally, the error status is reported for
job duration metrics.
2022-02-03 23:40:42 +00:00
Gianluca Zuccarelli
6c4caec022 metrics: move metrics to worker server
For simplicity, the collection of the job metrics
was carried out in the job queue. This was only
being done in the dbqueue and not in the fsqueue.
This pr refactors the metric collection and moves
the job metrics to the worker server, by adding a
wrapper function to enqueueing jobs so that the
metrics only have to be recorded in one place when
queueing a job.
2022-02-03 23:40:42 +00:00
Diaa Sami
7c52db1ae1 worker/api: align & improve error handlers 2022-02-02 11:15:20 +01:00
Tom Gundersen
c81d0d08ec cloudapi/v2: support logs/manifests endpoints
For now these only work for koji builds.
2022-02-01 20:28:40 +00:00
Tom Gundersen
b32ab36e1d worker/server: typesafe Job and JobStatus
Replace Job() and JobStatus() with typesafe versions, and introduce JobType()
for the rare instances where we don't know the type up front.

Additionally, catch a few more error cases:
 - if OSBuildResult is nil, then we failed to invoke osbuild
 - make sure the same JobResult handling is done for osbuild-koji, as for osbuild
2022-02-01 20:28:40 +00:00
Tom Gundersen
92c7fc2534 cloupapi/v2: add koji support
Extend the compose endpoints to have minimal koji support.

This is intended to replace the current koji API so that it
can be consumed through api.openshift.com.
2022-02-01 20:28:40 +00:00
Gianluca Zuccarelli
88b5529cc4 osbuild-worker: test error backwards compatability
Since the workers will use structured error messages
going forward, it is necessary to maintain backwards
compatability for there errors in composer. Tests have
been added to the various apis to ensure that each api
checks for both kinds of errors, old and new.
2022-01-27 16:45:14 +01:00
Gianluca Zuccarelli
cc981b887a osbuild-worker: implement structured errors
Implement the structured errors as defined by the worker client.
Every error for each of the job types now returns a structured
error with a reason and a specific error code.  This will make
it possible to differentiate between 4xx errors and 5xx errors.

This commit refactors the way errors are implemented in the workers,
but maintains backwards compatability in composer by checking for
both kinds of errors.
2022-01-27 16:45:14 +01:00
Gianluca Zuccarelli
daf24f8db3 worker: define worker errors
Define worker errors to give more structured
error messages. The error api is:
id: VALIDATION_ERROR_NUMBER, reason: STRING, details: { issues: [{...}, {...}] }

The api was agreed upon with osbuild so that,
in future, osbuild errors will share the same
structure
2022-01-27 16:45:14 +01:00
sanne
a83cf95d5b go.mod: Update oapi-codegen and kin-openapi 2022-01-12 11:35:06 +01:00
sanne
8406ada6f5 worker: Treat a non echo.HTTPError like a regular error 2021-12-17 13:13:05 +01:00
Diaa Sami
510d2ccac0 worker/server: pass more error details to handler 2021-12-16 11:58:41 +00:00
Diaa Sami
c1aeeeaf0e internal/worker: log internal details when available 2021-12-16 11:58:41 +00:00
Djebran Lezzoum
c93ea748a2 distro/depsolve/cloudapi: Add 3rd-party repository support.
Allow 3rd-party repositories to be supported and custom packages installed.
Fixes #COMPOSER-1273
2021-12-15 20:12:49 +01:00
Gianluca Zuccarelli
91f2457363 metrics: add prometheus namespaces
Make use of the prometheus namespace and subsystem
to give the metrics a consistent namespaces in openshift.
2021-11-19 22:48:25 +01:00
sanne
028eca1b26 cloudapi/v2: Use manifest-id-only job
job dependencies:
depsolve -> manifest -> osbuild

This allows the compose handler to return the osbuild job id
immediately.
2021-11-18 10:26:17 +01:00
sanne
b075cac9e3 worker: Correct servers in openapi spec
Similar to other services on api.openshift.com, the full url should be
shown.
2021-11-16 10:30:58 +00:00
Achilleas Koutsou
778b2de3c0 worker: test mixed new and old jobs in jobqueue
Two new tests, one for OSBuild and one for Koji jobs. Both follow the
same flow:
- Enqueue a job that doesn't specify PipelineNames (oldJob)
- Enqueue a job that does specify PipelineNames (newJob)
- Read the job data for the oldJob and check that the default
  PipelineNames were added
- Read the job data for the newJob and check that it's unchanged
- Finish oldJob and add results without specifying PipelineNames
- Finish newJob and add results with PipelineNames
- Read the oldJob result and check that the default PipelineNames were
  added
- Read the newJob result and check that it's unchanged

This is meant to test several scenarios that can occur when upgrading
the service:
1. The existing jobqueue has old jobs in it that were queued before the
   PipelineNames were part of the data structure. The worker should be
   able to read these and add the fallback data.
2. New jobs are added while old jobs still exist in the queue and the
   worker can read both types.
3. The existing jobqueue has old finished jobs in it that were finished
   and had results written before the PipelineNames were part of the
   result data structure. The worker should be able to read these and
   add the fallback data.
4. New jobs are finished and results are written while old jobs still
   exist in the queue and the worker can read both result types.
2021-11-16 09:49:37 +01:00
Achilleas Koutsou
51870676cc worker/json: add fallback pipeline names when reading data
When worker reading data into the job and result types, check if the
PipelineNames are populated and, if not, add the fallback values from
distro.

This makes it simpler to work with job queues that contain old data
before the introduction of the PipelineNames. In any situation where the
job or result data are read, the reader can assume that the
PipelineNames are non-nil and that if they belong to an old job, they
have the fallback names.

This assumption goes hand-in-hand with the change in v2 format for
osbuild results, since old jobs that don't have PipelineNames set *must*
contain results in the old format for the names to be valid.
2021-11-16 09:49:37 +01:00
Achilleas Koutsou
143eb5cb91 worker: add PipelineNames to Job descriptions
The names of the pipelines that make up a Manifest for a job are
attached to the job data that is stored in the queue. The pipelines are
separated into Build and Payload.

This information is useful for identifying the build pipeline results
and metadata and for the order of the pipelines as they appeared in the
manifest.
2021-11-16 09:49:37 +01:00
Achilleas Koutsou
2004c71f89 cloudapi: use osbuild v2 result struct to extract metadata
Reading stage metadata using osbuild's v2 result format.
For RPM stages we only want the core (OS) RPMs (not the build root
RPMs). Skip the build pipeline by name, but this should be handled
better since names are arbitrary.

Using type switch to convert metadata types instead of relying on the
type string of the stage result.

The rpmmd helper function isn't used anymore since that requires two
conversion passes (osbuild.StageMetadata -> rpmmd.RPM ->
cloudapi.PackageMetadata).

Signed-off-by: Achilleas Koutsou <achilleas@koutsou.net>
2021-11-16 09:49:37 +01:00
sanne
6757916c54 worker: Introduce manifest-id-only job
A job intended to run in composer itself, after which a dependant
osbuild job can parse the manifest from it's dynamic arguments.
2021-11-15 16:04:12 +01:00
sanne
d25ae71fef worker: Configurable timeout for RequestJob
This is backwards compatible, as long as the timeout is 0 (never
timeout), which is the default.

In case of the dbjobqueue the underlying timeout is due to
context.Canceled, context.DeadlineExceeded, or net.Error with Timeout()
true. For the fsjobqueue only the first two are considered.
2021-10-19 00:12:18 +01:00
Ondřej Budai
e904397fdb cloudapi/v2: Use worker to depsolve
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-10-11 13:16:51 +02:00
Tom Gundersen
0f90aa9c78 worker: Add a depsolve job type
Allow depsolving to be done in a worker through the job queue rather
than synchronously in composer.

The benefit this might unlock include:
 - no more blocking calls in the cloud/koji APIs
 - only workers accessing repositoires
   - no VPN access from composer
   - composer not needing to be subscribed to CDN, etc
 - no dnf cache managment in composer

Potential problems:
 - the version of composer (so the distro definitions) that
   triggered a depsolve, may not be the same that uses the
   result to generate a manfiset

Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-10-11 13:16:51 +02:00
sanne
ce7ac9a756 worker: Make BasePath configurable 2021-10-11 09:52:21 +02:00
Diaa Sami
12ca5325d6 worker: Use Recover middleware to handle panics
recover from panics such as out-of-bounds array access & nil
pointer access, print a stack trace and return 5xx error
2021-10-06 17:04:52 +02:00
Diaa Sami
22f151df68 worker: Improve logging
Use logrus library for logging
Use appropriate log-level for different log statements
2021-10-06 17:04:52 +02:00
sanne
2f328b0e97 workers: Backwards compatible api.openshift.com spec compliance
The main changes are:
- Kind, Href, Id fields for every object returned
- Attach operationIds to each request, return it for errors
- Errors are predefined and queryable
2021-09-27 13:10:05 +01:00