The store is responsible for two things: user state and the compose queue. This
is problematic, because the rcm API has slightly different semantics from weldr
and only used the queue part of the store. Also, the store is simply too
complex.
This commit splits the queue part out, using the new jobqueue package in both
the weldr and the rcm package. The queue is saved to a new directory `queue/`.
The weldr package now also has access to a worker server to enqueue and list
jobs. Its store continues to track composes, but the `QueueStatus` for each
compose (and image build) is deprecated. The field in `ImageBuild` is kept for
backwards compatibility for composes which finished before this change, but a
lot of code dealing with it in package compose is dropped.
store.PushCompose() is degraded to storing a new compose. It should probably be
renamed in the future. store.PopJob() is removed.
Job ids are now independent of compose ids. Because of that, the local
target gains ComposeId and ImageBuildId fields, because a worker cannot
infer those from a job anymore. This also necessitates a change in the
worker API: the job routes are changed to expect that instead of a
(compose id, image build id) pair. The route that accepts built images
keeps that pair, because it reports the image back to weldr.
worker.Server() now interacts with a job queue instead of the store. It gains
public functions that allow enqueuing an osbuild job and getting its status,
because only it knows about the specific argument and result types in the job
queue (OSBuildJob and OSBuildJobResult). One oddity remains: it needs to report
an uploaded image to weldr. Do this with a function that's passed in for now,
so that the dependency to the store can be dropped completely.
The rcm API drops its dependencies to package blueprint and store, because it
too interacts only with the worker server now.
Fixes#342
The commit 2435163f broke sending the logs to osbuild-composer. This was
partly because of unusual error handling in the RunOSBuild function.
This commit fixes that by creating a custom error and properly propagating
the result from it.
This package does not contain an actual queue, but a server and client
implementation for osbuild's worker API. Name it accordingly.
The queue is in package `store` right now, but about to be split off.
This rename makes the `jobqueue` name free for that effort.
This makes the jobqueue package independent of forking osbuild, the
choices for which (exact invocation, location of the cache directory)
should be made in the worker.
We require passing the address from the unit file. Do the same for the
socket, using host:port syntax.
Overriding the port was broken before, because we unconditionally
appended ":8700" to every address.
Introduce a mandatory argument `address`, which is interpreted as a path
to a unix socket when `-unix` is given or a network address otherwise.
Move the default path to the service file.
Add a more useful usage message when passing `-help` or no arguments.
This moves the client code into the same package as the server code,
which makes it easier to change (and version) the two in sync. Also, it
will allow to make some structs private to the jobqueue package and to
test `Client`.
Also rename it to jobqueue.Client.
Use the default dialing functions for tcp connections and set the tls
config on the transport directly. This makes the code easier to follow,
because the only special case is overriding the DialContext() for unix
connections.
According to the new guidelines in docs/errors.md.
Note that this does not include code that marshals to a writer that
might fail (when a connection drops, for example).
There's a usecase for running workers at a different machine than
the composer. For example when there's need for making images for
architecture different then the composer is running at. Although osbuild has
some kind of support for cross-architecture builds, we still consider it
as experimental, not-yet-production-ready feature.
This commit adds a support to composer and worker to communicate using TCP.
To ensure safe communication through the wild worlds of Internet, TLS is not
only supported but even required when using TCP. Both server and client
TLS authentication are required. This means both sides must have their own
private key/certificate pair and both certificates must be signed using one
certificate authority. Examples how to generate all this fancy crypto stuff
can be found in Makefile.
Changes on the composer side:
When osbuild-remote-worker.socket is started before osbuild-composer.service,
osbuild-composer also serves jobqueue API on this socket. The unix domain
socket is not affected by this changes - it is enabled at all times
independently on the remote one. The osbuild-remote-worker.socket listens
by default on TCP port 8700.
When running the composer with remote worker socket enabled, the following
files are required:
- /etc/osbuild-composer/ca-crt.pem (CA certificate)
- /etc/osbuild-composer/composer-key.pem (composer private key)
- /etc/osbuild-composer/composer-crt.pem (composer certificate)
Changes on the worker side:
osbuild-worker has now --remote argument taking the address to a composer
instance. When present, the worker will try to establish TLS secured TCP
connection with the composer. When not present, the worker will use
the unix domain socket method. The unit template file osbuild-remote-worker
was added to simplify the spawning of workers. For example
systemctl start osbuild-remote-worker@example.com
starts a worker which will attempt to connect to the composer instance
running on the address example.com.
When running the worker with --remote argument, the following files are
required:
- /etc/osbuild-composer/ca-crt.pem (CA certificate)
- /etc/osbuild-composer/worker-key.pem (worker private key)
- /etc/osbuild-composer/worker-crt.pem (worker certificate)
By default osbuild-composer.service will always spawn one local worker.
If you don't want it you need to mask the default worker unit by:
systemctl mask osbuild-worker@1.service
Closing remarks:
Remember that both composer and worker certificate must be signed by
the same CA!
Prior this commit local target copied the image from a worker to a composer
using cp(1) command. This prevented the local target to work on remote
workers.
This commit switches the local target implementation to using the jobqueue
API introduced in the previous commit. I had some concerns about speed
of this solution (imho nothing can beat pure cp(1) implementation) but
ad hoc sanity tests showed the copying of the image using the jobqueue API
when running the worker on the same machine as the composer is still
more or less instant.
Imho it makes more sense from REST perspective. Also, in the future there
will be ROUTE for uploading image to image build. As it's not a good idea
transport file inside JSON, all the parameters (compose id and image
build id) need to be inside the URL. Therefore for the sake of consistency,
all these routes should have compose id and image build id in the URL.
There is another solution to embedding multiple values inside http body
which allows file transport - multipart/form-data. I think using form-data
is worth when doing more complex stuff, for our usecase transporting all
the metadata in the URL is more appropriate solution.
Everything that this field contained can be computed in another way:
- path: just lookup the local target and read the path from there
- mime: can be derived from distribution and compose output type
- size: can be derived from the path
Therefore it imho doesn't make much sense to store these information multiple
times.
In the future remote workers will be introduced. Obviously, the remote worker
cannot support the local target. Unfortunately, the current implementation of
storing the osbuild result is dependant on it.
This commit moves the responsibility of storing osbuild result to the
composer process instead of the worker process. The result is transferred from
a worker to a composer using extended HTTP API.
Only panic on compile-time errors (e.g., built for unsupported
architecture). Otherwise, use log.Fatalf(), which is equivalent to
printing and exiting with return code 1. Only ever do this from
main(), in all other cases pass on the error object.
This is mostly relevant when the server disconects, in which case
we'll get EOF, and will now restart cleanly instead of panicing.
Signed-off-by: Tom Gundersen <teg@jklm.no>
As a part of f4991cb1 ComposeEntry struct was removed from store package.
This change made sense because this struct is connected more with API than
with store - store uses its own Compose struct. In addition, converters
between Compose and ComposeEntry were added. Unfortunately, ComposeEntry
contains ImageSize which was not stored in Compose but retrieved from store
using GetImage method. This made those converters dependent on the store,
which was messy.
To solve this issue this commit adds image struct into Compose struct.
The content of image struct is generated on the worker side - when the worker
sets the compose status to FINISHED, it also sends Image struct with detailed
information about the result.
Return the error code of the osbuild run, and an array of errors,
one for each target provided. If a target fails, all other targets
are still attempted.
If either osbuild or one of the targets retursn an error, the worker
notifies osbuild-composer that the job failed.
Signed-off-by: Tom Gundersen <teg@jklm.no>
The main purpose of this is to share the structs between the server
and the client, and let the compiler ensure that our marshaling and
unmarshaling matches.
In the future we also want to make it easier to write unittests for
this code.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Let the store in weldr be the only one that keeps state, and push
updates directly there. This fixes a bug where there was an ID mismatch.
Change the API to not let the caller pick the UUID, but provide it
in the response. Use the same UUID as is used to identify composes,
this makes it simpler to trace what is going on.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Use the exact same status strings as is used in the API,
making it clearer that they are the same (and avoiding any
translation). Remember the creation/start/finish timestamps.
And store the output type.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Go doesn't really do variants, so we must somehow emulate it. The
json objects we use are essentially tagged unions, with a `name`
field in reverse domain name notation identifying the type and a
type specific 'options' object.
In Go we represent this by having an BarOptions interface, which
implements a private method `isBarOptions()`, making sure that only
types in the same package are able to implement it. Each type FooBar
that should belong to the variant implements the interface, and a
constructor `NewFooBar(options *FooBarOptions) *Bar` that makes sure
the `name` field is set correctly.
This would be enough to represent our types and marshal them into
JSON, but unmarshalling would not work (json does not know about
our tags, so would not know what concrete types to demarshal to).
We therefore must also implement the Unmarshall interface for Bar,
to select the right types for the Options field.
We implement his logic for Target, Stage and Assembler. A handful
of concrete types are also implemented, matching what osbuild
supports.
Signed-off-by: Tom Gundersen <teg@jklm.no>