Commit graph

140 commits

Author SHA1 Message Date
Chloe Kaubisch
873798514b prometheus: add tenant label
Include a tenant label for all prometheus metrics. Modify
jobstatus function in the worker accordingly to return channel
so it can be passed to prometheus.
2022-06-07 16:35:03 +02:00
Achilleas Koutsou
c8ce3e4428 worker: test depsolve job format compatibility
Test the conversion of the new and old DepsolveJob given the custom
marshaller.
The deserialised old format is not exactly the same as it would have
been before, but it is functionally equivalent, with the added benefit
of supporting depsolve jobs where we don't want base repositories to be
used by all depsolves.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
94c7fda779 worker: make DepsolveJob serialisation backwards compatible
Add custom marshaller for DepsolveJob that serialises the struct into a
format compatible with both the new and old formats.  The format on the
wire is a superset of both the new and old format and can be
deserialised into either while retaining all information.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
c092783a70 simplify package set chain handling
Move package set chain collation to the distro package and add
repositories to the package sets while returning the package sets from
their source, i.e., the ImageType.PackageSets() method.

This also removes the concept of "base repositories".  There are no
longer repositories that are added implicitly to all package sets but
instead each package set needs to specify *all* the repositories it will
be depsolved against.

This paves the way for the requirement we have for building RHEL 7
images with a RHEL 8 build root.  The build root package set has to be
depsolved against RHEL 8 repositories without any "base repos" included.
This is now possible since package sets and repositories are explicitly
associated from the start and there is no implicit global repository
set.

The change requires adding a list of PackageSet names to the core
rpmmd.RepoConfig.  In the cloud API, repositories that are limited to
specific package sets already contain the correct package set names and
these are now copied to the internal RepoConfig when converting types in
genRepoConfig().
The user-specified repositories are only associated with the payload
package sets like before.
2022-06-01 11:36:52 +01:00
Achilleas Koutsou
8a23a77c5b worker: add new error type for RepoError
dnf-json now returns a new error kind: RepoError
Add it to the list of known error types and handle it in the worker.
2022-06-01 11:36:52 +01:00
Tom Gundersen
4eeaebd40b prometheus/job: measure time spent pending rather than queued
We are interested in the time it takes from a job could be dequeued
until it is, but if a job has dependencies that are not yet finished, it
cannot be dequeued.

Change the logic to measure the time since the last dependency was
dequeued rather than when the job was queued.

The purpose of this metric is to have an alert fire in case we have too
few workers processing jobs.
2022-05-14 17:47:38 +01:00
Tom Gundersen
4621768c14 server/requestJob: record metrics last
This ensures that only if the dequeuing is successful are metrics recorded.
2022-05-14 17:47:38 +01:00
Tom Gundersen
ac642c3d70 server/requestJob: failing to read job status is fatal
Error out early in case reading a job status fails. The state would otherwise
be inconsistent if only some of the job statuses have been read out.
2022-05-14 17:47:38 +01:00
Tomas Hozza
0bf67dfad5 Stop setting the StreamOptimized option in Weldr and Cloud APIs
The VMDK image is already produced as stream-optimized. Therefore stop
setting the `StreamOptimized` option in `OSBuildJob` structure by both,
Weldr and Cloud APIs.

Keep the handling of the option in worker for backward compatibility,
in case an older instance of Composer server is used, which does not
produce VMDK manifests as stream-optimized. In such case, the worker
needs to convert the image.
2022-05-04 16:22:29 +02:00
Ondřej Budai
6fce34a5ea worker: add proxy support to composer and oauth calls
In the internal deployment, we want to talk with composer over a http/https
proxy. This proxy adds new composer.proxy field to the worker config that
causes the worker to connect to composer and the oauth server using
a specified proxy.

NB: The proxy is not supported when connection to composer via unix sockets.

For testing this, I added a small HTTP proxy implementation, pls don't
use this in production, it's just good enough for tests.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-03 06:19:31 +01:00
Ondřej Budai
9ee3997428 worker: use custom requester also for oauth refresh
Just so we can share e.g. proxy server or other http transport settings.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-03 06:19:31 +01:00
Ondřej Budai
71a4ceecaa worker/client: factor out common testing code
Just so we don't need to care about all the server-side setup in individual
test cases and we can just reuse the setup.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-03 06:19:31 +01:00
Ondřej Budai
b4d6ec5a75 worker/client: simplify the oauth test
Firstly, let's use t.TempDir(), it's less code.

Secondly, let's remove all the code that touches distributions, we can just
use random values, both worker server and client actually do't inspect
any values so they can be completely random.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-03 06:19:31 +01:00
Ondřej Budai
ed8bcd2f49 worker: move client test to its own file
This test actually verifies that the client code for OAuth works. As this was
the only code that tests client in the file, I think it deserves its own one.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-03 06:19:31 +01:00
Tomas Hozza
e819e08098 worker: extend the depsolve job to use DepsolvePackageSets()
Extend the `DepsolveJob` worker job argument to contain package sets
chains and use `DepsolvePackageSets()` for depsolving.
2022-04-28 14:42:49 +02:00
Gianluca Zuccarelli
e31fb36d65 cloudapi: add build job dependency checks
If an osbuild or koji-osbuild job has failed, add
a check to see if it is a result of the build jobs
dependencies and return the dependency failure job
error furthest up the chain of errors & add this
error to the details filed of the build job error.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
da94f2cbeb worker/server: build job dep errors
Add a helper function to query dependency
failures of osbuild & koji-osbuild jobs.
If a build job has a dependency error the
function will check for the job error of the
manifest job. If that also has a dependency
error the function will query the depsolve
job too for a job error.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
30d75d0e74 worker/clienterrors: depenency error check
Add a helper function to check for dependency
errors for job errors. This simply returns true
if a job error has a dependency error code and
false otherwise.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
b1969ba6a6 worker/clienterrors: omit details if empty
Omit the details field if it is null/empty.
2022-04-13 10:31:53 +02:00
Gianluca Zuccarelli
ab98c66b9f worker/server: fix manifest-id job status
The manifest by id job status type safe function
was failing due to the jobType check which was checking
for the wrong string.
2022-04-06 21:34:02 +01:00
Gianluca Zuccarelli
b75cf30a05 worker/server: remove duplicate function
The `ManifestJobStatus` and `ManifestByIdJobStatus` both
had identical functionality. The `ManifestByIdJobStatus`
is not being referenced anywhere in the codebase and so
this function has been removed.
2022-04-06 21:34:02 +01:00
Gianluca Zuccarelli
14b006d480 worker/clienterrors: add empty packagespec error
Add an error case for an empty package spec returned
by a depsolve job and mark this with a `4xx` status.
2022-04-06 21:34:02 +01:00
Gianluca Zuccarelli
cc7d555fb2 worker/errors: consider dep errors as 4xx status
All dependency errors, whether they are 4xx or 5xx,
are currently being considered as a 5xx error in parent
jobs. This is causing some of the build alerts to fire
off when a depsolve job has failed, for example, when
in reality, this is an expected result. This commit
ensures that dependency errors are being reported as
4xx status in monitoring.
2022-04-06 10:57:37 +02:00
Gianluca Zuccarelli
8241e1f948 worker/clienterrors: add empty manifest error
If a manifest is empty we should have a specific error
code for that case and treat it as a 4xx error since
this would be bad input for a build job
2022-04-06 10:57:37 +02:00
Eng Zer Jun
00ea3eb285 test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-04-05 09:27:43 +02:00
Sanne Raymaekers
2023f7731d worker: Support client_credentials grant type in client
This will allow us to use the service accounts which work against
identity.api.openshift.com. These are much easier to manage, especially
with the new multi-tenancy, as there's a single page to create/expire
them across an account.

They also have the added benefit of not expiring automatically when
they're not used like offline tokens, and immediate expiration when
desired.
2022-03-21 09:43:43 +01:00
Sanne Raymaekers
8900bcec40 worker: Client lazy token refresh 2022-03-21 09:43:43 +01:00
Sanne Raymaekers
8a6d6ed6cf worker: Clean up worker client config 2022-03-21 09:43:43 +01:00
Sanne Raymaekers
318a4525c6 cmd/osbuild-worker: dnf-json returns MarkingErrors (plural) 2022-03-11 10:13:27 +01:00
Gianluca Zuccarelli
761aab6cac cloudapi/v2: add error object to ImageStatus
Add an error object to the ComposeStatus.ImageStatus.
The error object contains a human-readable error reason
and optional details in the case of an error.
2022-03-09 08:49:37 +00:00
Ondřej Budai
cfb756b9ba api/{cloud,worker}: used channel name based on JWT claims for new jobs
This commit implements multi-tenancy. A tenant is defined based on a value
from JWT claims. The key of this value must be specified in the configuration
file. This allows us to pick different values when using multiple SSOs.

Let me explain more in depth how this works:

Cloud API gets a new compose request. Firstly, it extracts a tenant name from
JWT claims. The considered claims are configured as an array in
cloud_api.jwt.tenant_provider_fields in composer's config file. The channel
name for all jobs belonging to this compose is created by `"org-" + tenant`.

Why is the channel prefixed by "org-"? To give us options in the future. I can
imagine the request having a channel override. This basically means that
multiple tenants can share a channel. A real use-case for this is multiple
Fedora projects sharing one pool of workers.

Why this commit adds a whole new cloud_api section to the config? Because the
current config is a mess and we should stop adding new stuff into the koji
section. As the Koji API is basically deprecated, we will need to remove it
soon nevertheless.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
c1dc58eba4 worker: NewServer: move config parameters to a new Config struct
We will have more parameters soon so let's make this prettier sooner rather
than later.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
7bfcee36f8 jobqueue: introduce the concept of channels
Channels are a concept similar to job types. Callers must specify a channel
name when queueing a new job. A list of channels is also specified when
dequeueing a job. The dequeued job's channel will always be from one of the
specified channel. Of course, the job types are also respected. The dequeued
job will also always be from one of the specified type.

Currently, all calls to jobqueue were changed so all queue operations use
an empty channel name and all dequeue operations use a list containing
an empty channel.

Thus, this is a non-functional change.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
17e809b5f7 worker: use default transport instead of "blank" one
This allows us to use the default behaviour of http.DefaultTransport
to honor HTTP_PROXY.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-02-21 14:46:49 +01:00
Achilleas Koutsou
82eedf5b82 DepsolveJob: rename struct field for consistency
We have two fields, `Repos` and `PackageSets`.  Renaming
`PackageSetsRepositories` to `PackageSetsRepos` for consistency.
The struct is for internal use only so the rename has no impact as long
as the serialised name is the same (json tag).

Also it's shorter.

Added docstring to the struct that explains the arguments in the same
way as they are described for the `depsolve()` function.

Changing the name of the argument in the internal `depsolve()` function
for the same reasons.
2022-02-14 17:38:41 +01:00
Gianluca Zuccarelli
1e443cf0fa worker: fix error status codes
The DNFDepsolveError and DNFMarking error should have
a `4xx` code instead of a `5xx` error code.
2022-02-04 19:30:25 +01:00
Gianluca Zuccarelli
290472dfdf metrics: add worker error metrics
This commit introduces the collection of error
metrics since it is now possible to differentiate
between internal errors and user input errors.
Additionally, the error status is reported for
job duration metrics.
2022-02-03 23:40:42 +00:00
Gianluca Zuccarelli
6c4caec022 metrics: move metrics to worker server
For simplicity, the collection of the job metrics
was carried out in the job queue. This was only
being done in the dbqueue and not in the fsqueue.
This pr refactors the metric collection and moves
the job metrics to the worker server, by adding a
wrapper function to enqueueing jobs so that the
metrics only have to be recorded in one place when
queueing a job.
2022-02-03 23:40:42 +00:00
Diaa Sami
7c52db1ae1 worker/api: align & improve error handlers 2022-02-02 11:15:20 +01:00
Tom Gundersen
c81d0d08ec cloudapi/v2: support logs/manifests endpoints
For now these only work for koji builds.
2022-02-01 20:28:40 +00:00
Tom Gundersen
b32ab36e1d worker/server: typesafe Job and JobStatus
Replace Job() and JobStatus() with typesafe versions, and introduce JobType()
for the rare instances where we don't know the type up front.

Additionally, catch a few more error cases:
 - if OSBuildResult is nil, then we failed to invoke osbuild
 - make sure the same JobResult handling is done for osbuild-koji, as for osbuild
2022-02-01 20:28:40 +00:00
Tom Gundersen
92c7fc2534 cloupapi/v2: add koji support
Extend the compose endpoints to have minimal koji support.

This is intended to replace the current koji API so that it
can be consumed through api.openshift.com.
2022-02-01 20:28:40 +00:00
Gianluca Zuccarelli
88b5529cc4 osbuild-worker: test error backwards compatability
Since the workers will use structured error messages
going forward, it is necessary to maintain backwards
compatability for there errors in composer. Tests have
been added to the various apis to ensure that each api
checks for both kinds of errors, old and new.
2022-01-27 16:45:14 +01:00
Gianluca Zuccarelli
cc981b887a osbuild-worker: implement structured errors
Implement the structured errors as defined by the worker client.
Every error for each of the job types now returns a structured
error with a reason and a specific error code.  This will make
it possible to differentiate between 4xx errors and 5xx errors.

This commit refactors the way errors are implemented in the workers,
but maintains backwards compatability in composer by checking for
both kinds of errors.
2022-01-27 16:45:14 +01:00
Gianluca Zuccarelli
daf24f8db3 worker: define worker errors
Define worker errors to give more structured
error messages. The error api is:
id: VALIDATION_ERROR_NUMBER, reason: STRING, details: { issues: [{...}, {...}] }

The api was agreed upon with osbuild so that,
in future, osbuild errors will share the same
structure
2022-01-27 16:45:14 +01:00
sanne
a83cf95d5b go.mod: Update oapi-codegen and kin-openapi 2022-01-12 11:35:06 +01:00
sanne
8406ada6f5 worker: Treat a non echo.HTTPError like a regular error 2021-12-17 13:13:05 +01:00
Diaa Sami
510d2ccac0 worker/server: pass more error details to handler 2021-12-16 11:58:41 +00:00
Diaa Sami
c1aeeeaf0e internal/worker: log internal details when available 2021-12-16 11:58:41 +00:00
Djebran Lezzoum
c93ea748a2 distro/depsolve/cloudapi: Add 3rd-party repository support.
Allow 3rd-party repositories to be supported and custom packages installed.
Fixes #COMPOSER-1273
2021-12-15 20:12:49 +01:00