osbuild reports failing builds in two ways: it sets the "success" field
in its output to `false` and it returns with a non-zero exit status. The
worker used both, returning an `OSBuildError` when osbuild return
non-zero, but also forwarding the resulting object with the "success"
field.
Change this to only use the "success" field and ignore the return value.
The latter is useful for people running osbuild in a terminal or script,
but is redundant for this use-case.
This makes error reporting more consistent: `RunOSBuild` only returns an
error when *running* osbuild failed, not when the build fails.
We have the same thing for AWS. The AWS target also specifies under what name
should be the image available in EC2.
As requested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
osbuild runs directly on the host, there's no intermediate container,
therefore we should set the container type to none.
As suggested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
Contrary to our assumption, we cannot initialize the build with the
link to the task. We can only update the link once the build has
completed.
This seems like a bug in koji, but we keep it like this for now.
Move to requiring CGInitBuild to be called before CGImport. In the
future we could make the former optional again, but for now we want to
allow the caller to have done CGInitBuild and for composer only to do
the CGImport using the passed in build_id and token.
Also rename and document some struct fields in the metadata struct to
make them more specific to our use-case and hopefully easier to read.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Introduce a target for Koji and hooked it up in the worker, so if koji
target is specified, the image is uploaded to koji.
[teg: use system kerberos config rather than reading from env]
Instead of sending a `token` to workers, send back to URLs:
1. "location": URL at which the job can be inspected (GET) and updated
(PATCH).
2. "artifact_location": URL at which artifacts should be uploaded to.
The actual URLs remain the same, but a client does not need to stitch
them together manually (except appending the artifact's name).
Unfortunately, the client code generated by `deepmap` does not lend
itself to this style of APIs. Use standard http.Client again, which is a
partial revert of 0962fbd30.
The job token will be deprecated in favor of URLs.
If a key is not set, use a new random UUID. Also, don't overwrite the
options struct with that new key.
Don't give out job ids to workers, but `tokens`, which serve as an
indirection. This way, restarting composer won't confuse it when a stray
worker returns a result for a job that was still running. Also,
artifacts are only moved to the final location once a job finishes.
This change breaks backwards compatibility, but we're not yet promising
a stable worker API to anyone.
This drops the transition tests in server_test.go. These don't make much
sense anymore, because there's only one allowed transition, from running
to finished. They heavily relied on job slot ids, which are not easily
accessible with the `TestRoute` API. Overall, adjusting this seemed like
too much work for their benefit.
In the same way `osbuild.Manifest` is the input to the osbuild API,
`osbuild.Result` is the output. Move it to the `osbuild` package where
it belongs.
This is not a functional change.
Signed-off-by: Tom Gundersen <teg@jklm.no>
vCenter requires images to be uploaded as vmdk StreamOptimized. Lorax
always produced images on this format, so we should make sure to do the
same for our VMWare images.
Allow LocalTarget to request the images produced by osbuild be converted
to be streamOptimized before saving in composer, and hook the weldr API
up to enable this option for vmdk images.
Ideally this should simply be an option in osbuild, but that would
require some more work, which we will not manage in time for RHEL8.3.
Therefore do this minimal fix.
Note that that means the images produced by our manifests (including in
our image-test test cases) are not on the format that the weldr API
returns, so the tests we run on them would also, for now, need to
convert before uploading to vCenter.
Signed-off-by: Tom Gundersen <teg@jklm.no>
composer uses the success field to decide whether a build succeeded or failed.
This is bad.
Unfortunately, fixing this requires kinda big code changes. This commit
changes the worker's behaviour to set the osbuild success flag to false
even on errors which weren't caused by osbuild (e.g. an upload error).
This is certainly hacky but I think it's still essential to tell the user
that an error occurred.
Fixes#789
We print messages to the log when the build fails, but we don't print
one when the build is successful.
I really want to celebrate our successes more often, so let's print a
success message when osbuild completes successfully. Since success
doesn't feel like success without an emoji, let's add one of those, too. 🎉
Signed-off-by: Major Hayden <major@redhat.com>
When osbuild crashes (e.g. when cp fails because of the machine running
out of disk space), it doesn't produce a machine-readable result. Due to
our suboptimal handling of the result struct (this is my fault), this can
lead to result == nil. However, composer expects that result != nil in all
cases because it uses the Success flag to assess the compose state. If
result == nil, it just crashes terribly.
Exit the whole worker process when a job was canceled, because osbuild
does not clean up all child processes when receiving SIGKILL.
Change the service to restart osbuild-worker also on success, and
decrease the restart timeout.
When osbuild-composer crashed, it left temporary directories in
`/var/cache`. Use `/var/tmp` for these output directories, because
systemd will clean these up (we set PrivateTmp=true).
Also, put the store into `/var/cache/osbuild-store`. The worker does not
checkpoint anything. The store is only used as a cache for rpms. That
can be shared between multiple workers and successive runs of a single
worker.
Until osbuild-14, the images were unconditionally kept in the cache,
meaning the cache could grow very large. Now only the downloaded RPMs
are saved, which greatly limits how big it can grow.
Having the RPMs cached should speed up all but the first image build a
lot, so we should take advantage of that by not flushing the cache
between each build.
The cache is still flushed when the worker is stopped / restarted.
This moves the cache from /var/tmp/osbulid-worker* to
/var/cache/osbulid-worker/osbulid-worker-*. This means that each worker
gets a dedicated cache, in case there are several on one machine. In the
future we may want to combine them and only ever have one cache, but for
that we need improvements in parallel access and cache-cleanup.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Rather than Manifest() returning an osbuild.Manifest object, introduce a
new distro.Manifest object which represents it as an opaque, JSON
serializable object. This new type has the following properties:
1) its serialization is compatible with the input to osbuild,
2) any valid osbuild input can be deserialized into it, and
3) marshalling and unmarshaling to and from JSON is lossless.
This means that even as we change the subset of valid osbulid manifests
that we support, we can still load any previous state from disk, and it
will continue to work just as before, even though we can no longer
deserialize it into our internal notion of osbuild.Manifest.
This fixes the underlying problem of which #685 was a symptom.
Signed-off-by: Tom Gundersen <teg@jklm.no>
When fdd753615 added `--output-directory` to the invocation of osbuild,
it also removed `--store`.
This was a mistake: osbuild's default store is `.osbuild`, which is not
what we want. Restore the old behavior of passing a temporary directory.
The `jobs/:job_id/builds/:build_id/image` route was awkward: the
`:jobid` was actually weldr's compose id and `:build_id` was always `0`.
Change it to `jobs/:job_id/artifacts/:name`, where `:job_id` is now a
job id, and `:name` is the name of the artifact to upload. In the
future, it could support uploading more than one artifact.
This allows removing outputs from `store`, which is now back to being a
pure JSON-store. Take care that `weldr` returns (and deletes) images
from the new (or for backwards compatibility, the old) location.
The `org.osbuild.local` target continues to exist as a marker for the
worker to know whether it should upload artifacts.
Since 2 releases `osbuild` accepts an `--output-directory=DIR` argument
which lets us decide where to place generated artifacts. Switch over to
it, rather than digging into the store, to make sure we will not access
the osbuild store when parallel cleanups are ongoing (which are not yet
a thing, though).
The store is responsible for two things: user state and the compose queue. This
is problematic, because the rcm API has slightly different semantics from weldr
and only used the queue part of the store. Also, the store is simply too
complex.
This commit splits the queue part out, using the new jobqueue package in both
the weldr and the rcm package. The queue is saved to a new directory `queue/`.
The weldr package now also has access to a worker server to enqueue and list
jobs. Its store continues to track composes, but the `QueueStatus` for each
compose (and image build) is deprecated. The field in `ImageBuild` is kept for
backwards compatibility for composes which finished before this change, but a
lot of code dealing with it in package compose is dropped.
store.PushCompose() is degraded to storing a new compose. It should probably be
renamed in the future. store.PopJob() is removed.
Job ids are now independent of compose ids. Because of that, the local
target gains ComposeId and ImageBuildId fields, because a worker cannot
infer those from a job anymore. This also necessitates a change in the
worker API: the job routes are changed to expect that instead of a
(compose id, image build id) pair. The route that accepts built images
keeps that pair, because it reports the image back to weldr.
worker.Server() now interacts with a job queue instead of the store. It gains
public functions that allow enqueuing an osbuild job and getting its status,
because only it knows about the specific argument and result types in the job
queue (OSBuildJob and OSBuildJobResult). One oddity remains: it needs to report
an uploaded image to weldr. Do this with a function that's passed in for now,
so that the dependency to the store can be dropped completely.
The rcm API drops its dependencies to package blueprint and store, because it
too interacts only with the worker server now.
Fixes#342
The commit 2435163f broke sending the logs to osbuild-composer. This was
partly because of unusual error handling in the RunOSBuild function.
This commit fixes that by creating a custom error and properly propagating
the result from it.
This package does not contain an actual queue, but a server and client
implementation for osbuild's worker API. Name it accordingly.
The queue is in package `store` right now, but about to be split off.
This rename makes the `jobqueue` name free for that effort.
This makes the jobqueue package independent of forking osbuild, the
choices for which (exact invocation, location of the cache directory)
should be made in the worker.
We require passing the address from the unit file. Do the same for the
socket, using host:port syntax.
Overriding the port was broken before, because we unconditionally
appended ":8700" to every address.
Introduce a mandatory argument `address`, which is interpreted as a path
to a unix socket when `-unix` is given or a network address otherwise.
Move the default path to the service file.
Add a more useful usage message when passing `-help` or no arguments.
This moves the client code into the same package as the server code,
which makes it easier to change (and version) the two in sync. Also, it
will allow to make some structs private to the jobqueue package and to
test `Client`.
Also rename it to jobqueue.Client.
Use the default dialing functions for tcp connections and set the tls
config on the transport directly. This makes the code easier to follow,
because the only special case is overriding the DialContext() for unix
connections.