Commit graph

66 commits

Author SHA1 Message Date
Ondřej Budai
cfb756b9ba api/{cloud,worker}: used channel name based on JWT claims for new jobs
This commit implements multi-tenancy. A tenant is defined based on a value
from JWT claims. The key of this value must be specified in the configuration
file. This allows us to pick different values when using multiple SSOs.

Let me explain more in depth how this works:

Cloud API gets a new compose request. Firstly, it extracts a tenant name from
JWT claims. The considered claims are configured as an array in
cloud_api.jwt.tenant_provider_fields in composer's config file. The channel
name for all jobs belonging to this compose is created by `"org-" + tenant`.

Why is the channel prefixed by "org-"? To give us options in the future. I can
imagine the request having a channel override. This basically means that
multiple tenants can share a channel. A real use-case for this is multiple
Fedora projects sharing one pool of workers.

Why this commit adds a whole new cloud_api section to the config? Because the
current config is a mess and we should stop adding new stuff into the koji
section. As the Koji API is basically deprecated, we will need to remove it
soon nevertheless.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
c1dc58eba4 worker: NewServer: move config parameters to a new Config struct
We will have more parameters soon so let's make this prettier sooner rather
than later.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
7bfcee36f8 jobqueue: introduce the concept of channels
Channels are a concept similar to job types. Callers must specify a channel
name when queueing a new job. A list of channels is also specified when
dequeueing a job. The dequeued job's channel will always be from one of the
specified channel. Of course, the job types are also respected. The dequeued
job will also always be from one of the specified type.

Currently, all calls to jobqueue were changed so all queue operations use
an empty channel name and all dequeue operations use a list containing
an empty channel.

Thus, this is a non-functional change.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Gianluca Zuccarelli
290472dfdf metrics: add worker error metrics
This commit introduces the collection of error
metrics since it is now possible to differentiate
between internal errors and user input errors.
Additionally, the error status is reported for
job duration metrics.
2022-02-03 23:40:42 +00:00
Gianluca Zuccarelli
6c4caec022 metrics: move metrics to worker server
For simplicity, the collection of the job metrics
was carried out in the job queue. This was only
being done in the dbqueue and not in the fsqueue.
This pr refactors the metric collection and moves
the job metrics to the worker server, by adding a
wrapper function to enqueueing jobs so that the
metrics only have to be recorded in one place when
queueing a job.
2022-02-03 23:40:42 +00:00
Tom Gundersen
c81d0d08ec cloudapi/v2: support logs/manifests endpoints
For now these only work for koji builds.
2022-02-01 20:28:40 +00:00
Tom Gundersen
b32ab36e1d worker/server: typesafe Job and JobStatus
Replace Job() and JobStatus() with typesafe versions, and introduce JobType()
for the rare instances where we don't know the type up front.

Additionally, catch a few more error cases:
 - if OSBuildResult is nil, then we failed to invoke osbuild
 - make sure the same JobResult handling is done for osbuild-koji, as for osbuild
2022-02-01 20:28:40 +00:00
Tom Gundersen
92c7fc2534 cloupapi/v2: add koji support
Extend the compose endpoints to have minimal koji support.

This is intended to replace the current koji API so that it
can be consumed through api.openshift.com.
2022-02-01 20:28:40 +00:00
Gianluca Zuccarelli
cc981b887a osbuild-worker: implement structured errors
Implement the structured errors as defined by the worker client.
Every error for each of the job types now returns a structured
error with a reason and a specific error code.  This will make
it possible to differentiate between 4xx errors and 5xx errors.

This commit refactors the way errors are implemented in the workers,
but maintains backwards compatability in composer by checking for
both kinds of errors.
2022-01-27 16:45:14 +01:00
Diaa Sami
510d2ccac0 worker/server: pass more error details to handler 2021-12-16 11:58:41 +00:00
Gianluca Zuccarelli
91f2457363 metrics: add prometheus namespaces
Make use of the prometheus namespace and subsystem
to give the metrics a consistent namespaces in openshift.
2021-11-19 22:48:25 +01:00
sanne
6757916c54 worker: Introduce manifest-id-only job
A job intended to run in composer itself, after which a dependant
osbuild job can parse the manifest from it's dynamic arguments.
2021-11-15 16:04:12 +01:00
sanne
d25ae71fef worker: Configurable timeout for RequestJob
This is backwards compatible, as long as the timeout is 0 (never
timeout), which is the default.

In case of the dbjobqueue the underlying timeout is due to
context.Canceled, context.DeadlineExceeded, or net.Error with Timeout()
true. For the fsjobqueue only the first two are considered.
2021-10-19 00:12:18 +01:00
Ondřej Budai
e904397fdb cloudapi/v2: Use worker to depsolve
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-10-11 13:16:51 +02:00
sanne
ce7ac9a756 worker: Make BasePath configurable 2021-10-11 09:52:21 +02:00
Diaa Sami
12ca5325d6 worker: Use Recover middleware to handle panics
recover from panics such as out-of-bounds array access & nil
pointer access, print a stack trace and return 5xx error
2021-10-06 17:04:52 +02:00
Diaa Sami
22f151df68 worker: Improve logging
Use logrus library for logging
Use appropriate log-level for different log statements
2021-10-06 17:04:52 +02:00
sanne
2f328b0e97 workers: Backwards compatible api.openshift.com spec compliance
The main changes are:
- Kind, Href, Id fields for every object returned
- Attach operationIds to each request, return it for errors
- Errors are predefined and queryable
2021-09-27 13:10:05 +01:00
sanne
7a0ea5b244 worker: Remove identity filter
Partially reverts "0ea31c39d5"
2021-09-04 02:48:52 +02:00
Chloe Kaubisch
4c800f29a7 worker: add metrics
use prometheus to gather metrics
2021-07-23 21:54:28 +02:00
sanne
4385c39d66 worker: Introduce heartbeats
An occupied worker checks about every 15 seconds if it's current job was
cancelled. Use this to introduce a heartbeat mechanism, where if
composer hasn't heard from the worker in 2 minutes, the job times out
and is set to fail.
2021-07-08 21:14:38 +01:00
sanne
0fcb44e617 worker: Move job tokens to the queue itself
This removes state from the worker server, as it no longer contains the
list of running jobs. Instead only the queue knows if jobs are running
or not.
2021-07-08 21:14:38 +01:00
sanne
8fa822c02e worker: Return basepath depending on route 2021-06-17 10:08:35 +02:00
sanne
0ea31c39d5 worker: Add identity filter and client oauth support 2021-06-17 10:08:35 +02:00
Achilleas Koutsou
668fb003ef jobqueue: Replace JobArgs() with Job()
JobArgs() function replaced with more general Job() function that
returns all the parameters used to originally define a job during
Enqueue(). This new function enables access to the type of a job in the
queue, which wasn't available until now (except when Dequeueing).
2021-01-19 10:37:51 +01:00
Achilleas Koutsou
1c5f6810be worker: Add JobArgs() method
Wraps jobqueue.JobArgs() method and unmarshals data into the provided
concrete struct value. The struct should match the requested job type of
the given ID.
2021-01-16 13:39:30 +01:00
Ondřej Budai
e10a7f1ccc {koji,worker}/server: log errors returned from handlers
Previously, we had no clue what errors were catched by the default echo's
error handler. Thus, in the case of an error, we were basically blind. Let's
log all errors so we can investigate them later.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2020-12-02 08:52:27 +01:00
Ondřej Budai
978e309153 worker/server: move it to the style of koji server
The previous code was smelling a bit (e.g. Server.server field) so I decided
to rewrite it in the style of the much nicer koji server.

Not a functional change.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2020-11-19 17:39:24 +00:00
Tom Gundersen
c777a18df0 jobqueue: expose dependencies when querying status
The status of a job may depend on the status of its dependenices,
as we do not repeat for instance the failed state in each dependent
job.

Return also the list of dependencies so these can be queried too.
2020-11-11 18:16:42 +01:00
Tom Gundersen
98fd290a08 worker: make Enqueue() specific for each job type
Most of the worker API is now untyped, but keep Enqueu() typed to
ensure the job objects match the names in the queue. This means we
must add a version of Enqueue() for each job type we support.
2020-11-11 18:16:42 +01:00
Tom Gundersen
79f87ea347 worker/RequestJob: treat 'osbuild-koji' jobs like 'osbuild' ones
We must special-case the treatment of architecture, to select the
correct remote worker for any job that requires a specific
architecture. For now this means any jobs that run osbuild.
2020-11-11 18:16:42 +01:00
Tom Gundersen
a2895376ae worker: introduce dynamicArgs
In addition to the arguments passed when scheduling a job, a job now
also takes the results of its dependencies as additional arguments. We
call these dynamic arguments for the lack of a better term.

The immediate use-case for this is to allow koji jobs to be split up
as follows:
 - koji-init: Creates a koji build, and returns us a token.
 - osbuild-koji: one job per architecture, depending on koji-init
   having succeeded. Builds the image, and uploads it to koji,
   returning metadata about the image produced.
 - koji-finalize: uses the token from koji-init and the metadata
   from osbuild-koji to import the build into koji if it succeeded
   or mark it as failed if it failed.
2020-11-11 18:16:42 +01:00
Tom Gundersen
11d0da0b5c jobqueue/JobStatus: return result as json.RawMessage
Similarly to the recent changes to Dequeue(), let the caller unmarshal the
return JSON. This allows us to pass the result on without being able
to unmarshal it.

In follow-up patches, we will pass results of jobs to dependent jobs,
but the worker API does not know about the different job types, nor how
to unmarshal them.
2020-11-11 18:16:42 +01:00
Tom Gundersen
e277501ca3 jobqueue: return dependencies on dequeue
Once a job has been enqueued, there is no way to query its dependencies.

This makes dequeue more symmetric to enqueue by returning the
dependencies that were passed to enqueue, allowing the caller to
query the dependencies and their results.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-11-11 18:16:42 +01:00
Lars Karlitski
59e73a686a worker: generalize job types in the server
The worker server was heavily tied to OSBuildJob(Result). Untie it so
that it can deal with different job types in the future.

This necessitates a change in the jobqueue: Dequeue() now returns the
job type, as well as job arguments as json.RawMessage. This is so that
the server can wait on multiple job types with different argument
types.

The weldr, composer, and koji APIs continue to use only "osbuild" jobs.
2020-11-09 14:17:19 +01:00
Lars Karlitski
299a5e52ab worker: use OSBuildJobResult consistently
Workers reported status via an `osbuild.Result`, which only includes
osbuild output. Make it report OSBuildJobResult instead, which was meant
to be used for this purpose and is already used as the result type in
the jobqueue.

While at it, add any errors produced by targets into this struct, as
well as an overall success flag.

Note that this breaks older workers returning the result of an osbuild
job to a new composer. I think this is fine in this case, for two
reasons:

1. We don't support running different versions of the worker and
composer in the weldr API, and remote workers aren't widely used yet.

2. Both osbuild.Result and worker.OSBuildJobResult have a top-level
`Success` boolean. Thus, logs are lost in such cases, but the overall
status of the compose is not.
2020-11-09 14:17:19 +01:00
Lars Karlitski
0cd7174598 worker: deprecate the local target
Add "image_name" and "stream_optimized" fields to the osbuild job as
replacement for the local target options. The former signifies the name
of the uploaded artifact and whether an artifact should be uploaded at
all (only weldr API). The latter will be deprecated at some point, when
osbuild itself can make streamoptimized vmdk images.

This change separates what have always been two distinct concepts:
artifacts that are reported back to the composer node (in practice
always running on the same machine), and upload targets to clouds and
such. Separating them makes it easier to add job types that only allow
one upload target while keeping artifacts.

Keep the local target around, so that jobs that are scheduled can still
be run after an upgrade.
2020-11-09 14:17:19 +01:00
Lars Karlitski
d1f322ec6f worker: always send status "FINISHED"
The server hasn't used common.ImageBuildState to mark a job as
successful or failed for a long time. Instead, it's using the job's
return argument for that. (Jobs don't have a high-level concept of
failing).

Drop the check in the server, and always send "FINISHED" from the client
for backwards compatibility.
2020-11-09 14:17:19 +01:00
Lars Karlitski
5d2f2402cf worker: drop unused variable 2020-11-09 14:17:19 +01:00
Lars Karlitski
669b612d96 worker: remove State from JobStatus
This state is specific to weldr. Previous commits removed it from the
other APIs, because they use different values.

Move the conversion into the weldr API.
2020-11-09 14:17:19 +01:00
Lars Karlitski
a8ba969f6e worker: prefix all routes with /api/worker/v1
Mention this in the `servers` section of the openapi.yml (relative URLs
are allowed) too, even though our generator does not consider it.
2020-09-24 21:08:56 +01:00
Lars Karlitski
9008a1defc worker: require workers to pass their architecture
Jobs are scheduled with type "osbuild:{arch}", to ensure that workers
only get jobs with the right architecture assigned.
2020-09-23 14:28:52 +01:00
Lars Karlitski
44c2144994 worker: Server.RequestJob → RequestOSBuildJob
This clarifies what it does, at least until its use is expanded to other
job types.
2020-09-23 14:28:52 +01:00
Lars Karlitski
ba6a480e32 worker: require workers to declare job types they accept
For now, workers must send `[ "osbuild" ]`.
2020-09-23 14:28:52 +01:00
Lars Karlitski
d3c99b8e93 worker: allow passing different jobs to workers
Until now, all jobs were put as "osbuild" jobs into the job queue and
the worker API hard-coded sending an osbuild manifest and upload
targets.

Change the API to take a "type" and "args" keys, which are equivalent to
the job-queue's type and args. Workers continue to support only osbuild
jobs, but this makes other jobs possible in the future.
2020-09-23 14:28:52 +01:00
Ondřej Budai
03768e5f18 api/worker, koji: fix race condition when using multiple listeners
When remote worker socket was enabled, this was happening:

e := echo.New()

go func() {
    e.Listener = listener1
    e.Start("")
}()

e.Listener = listener2
e.Start("")

Yeah, this is a race condition. None of the echo's Start methods cannot safely
handle multiple listeners.

This commit fixes this issue by using Echo only as a router for standard
http.Server which handles multiple listeners in a non-racy way.
2020-09-23 09:38:29 +02:00
Lars Karlitski
85423d403c worker: clarify 404 error message
When a token doesn't exist, the URL is invalid. Nothing to do with a
job.
2020-09-11 14:23:24 +01:00
Lars Karlitski
3bedd25087 worker/api: send job id to worker after all
Full circle. After switching the worker to not operate on jobs directly,
send the id anyway, so that workers can print it in their logs.
2020-09-11 14:23:24 +01:00
Lars Karlitski
b03e1254e9 worker/api: remove token in favor of callback URLs
Instead of sending a `token` to workers, send back to URLs:

 1. "location": URL at which the job can be inspected (GET) and updated
    (PATCH).
 2. "artifact_location": URL at which artifacts should be uploaded to.

The actual URLs remain the same, but a client does not need to stitch
them together manually (except appending the artifact's name).

Unfortunately, the client code generated by `deepmap` does not lend
itself to this style of APIs. Use standard http.Client again, which is a
partial revert of 0962fbd30.
2020-09-11 14:23:24 +01:00
Lars Karlitski
ccebfb014c worker: echo.NewHTTPError() does not take format string
Use a plain string where possible and `fmt.Errorf` for internal server
errors (echo converts all non-HTTPError errors to 500s)
2020-09-11 14:23:24 +01:00