
278 commits

Author SHA1 Message Date
Lars Karlitski
4e46eacd64 worker: handle error when closing osbuild's stdin
This will rarely happen, but when it does, it is good to know about it.
2020-11-09 14:17:19 +01:00
Lars Karlitski
89814c3107 worker: mark builds as failed based on osbuild's output
osbuild reports failing builds in two ways: it sets the "success" field
in its output to `false`, and it exits with a non-zero status. The
worker used both: it returned an `OSBuildError` when osbuild exited
non-zero, but also forwarded the resulting object with the "success"
field.

Change this to only use the "success" field and ignore the exit status.
The latter is useful for people running osbuild in a terminal or script,
but is redundant for this use-case.

This makes error reporting more consistent: `RunOSBuild` only returns an
error when *running* osbuild failed, not when the build fails.
2020-11-09 14:17:19 +01:00
Ondřej Budai
2dc0ecec73 koji: mark the osbuild version CGImport metadata as TODO
So we don't forget. Also, change the version to 0 to make it clear that
it is just a placeholder: osbuild 0 was never released.
2020-10-27 19:01:30 +00:00
Ondřej Budai
353a65356c koji: add signature to the CGImport metadata components
As suggested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
2020-10-27 19:01:30 +00:00
Ondřej Budai
befeef34a5 koji: use nvra as the filename for images
We have the same thing for AWS: the AWS target also specifies the name
under which the image should be available in EC2.

As requested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
2020-10-27 19:01:30 +00:00
Ondřej Budai
b2ed59c385 koji: use none container arch in CGImport metadata
osbuild runs directly on the host and there is no intermediate container,
so the container type should be set to none.

As suggested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
2020-10-27 19:01:30 +00:00
Ondřej Budai
a0832d22e0 koji: use the host arch as the buildroot and image arch in CGImport metadata
As suggested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
2020-10-27 19:01:30 +00:00
Ondřej Budai
c64d46416e koji: use the host name from /etc/redhat-release in CGImport metadata
As suggested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
2020-10-27 19:01:30 +00:00
Ondřej Budai
b91a63c0ad koji: fix converting rpm stage metadata to koji components
This commit adds a missing pointer and a test to verify that the conversion
is indeed fixed.
2020-10-21 11:40:01 +02:00
Tom Gundersen
c6cf9de85d koji: add config files to configure kerberos settings
Kerberos keytabs and principals are configured per koji server, in both
composer and the worker.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-09-16 00:15:02 +01:00
Tom Gundersen
a97aac5846 worker/target/koji: mark builds correctly as failed
Otherwise, we leak builds and their NVR cannot be reused.
2020-09-16 00:15:02 +01:00
Tom Gundersen
e52830f530 upload/koji: don't pass task_id to cg_init_build
Contrary to our assumption, we cannot initialize the build with the
link to the task. We can only update the link once the build has
completed.

This seems like a bug in koji, but we keep it like this for now.
2020-09-16 00:15:02 +01:00
Tom Gundersen
9a4c66db03 worker/target/koji: append RPM information
Include metadata about all RPMs in the build environment as well as in
the actual image.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-09-16 00:15:02 +01:00
Tom Gundersen
f446613d4a upload/koji: use CGInitBuild and clarify metadata structs
Require CGInitBuild to be called before CGImport. In the future we
could make the former optional again, but for now the caller is expected
to have called CGInitBuild, and composer only performs the CGImport
using the passed-in build_id and token.

Also rename and document some struct fields in the metadata struct to
make them more specific to our use-case and hopefully easier to read.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-09-16 00:15:02 +01:00
Ondřej Budai
e7fbf4b660 upload/koji: add support for uploading to Koji
Introduce a target for Koji and hook it up in the worker, so that if a
Koji target is specified, the image is uploaded to Koji.

[teg: use system kerberos config rather than reading from env]
2020-09-16 00:15:02 +01:00
Lars Karlitski
3bedd25087 worker/api: send job id to worker after all
Full circle. After switching the worker to not operate on jobs directly,
send the id anyway, so that workers can print it in their logs.
2020-09-11 14:23:24 +01:00
Lars Karlitski
b03e1254e9 worker/api: remove token in favor of callback URLs
Instead of sending a `token` to workers, send back two URLs:

 1. "location": URL at which the job can be inspected (GET) and updated
    (PATCH).
 2. "artifact_location": URL at which artifacts should be uploaded to.

The actual URLs remain the same, but a client does not need to stitch
them together manually (except appending the artifact's name).

Unfortunately, the client code generated by `deepmap` does not lend
itself to this style of APIs. Use standard http.Client again, which is a
partial revert of 0962fbd30.
2020-09-11 14:23:24 +01:00
Lars Karlitski
901d724622 osbuild-worker: don't use job token as aws key
The job token will be deprecated in favor of URLs.

If a key is not set, use a new random UUID. Also, don't overwrite the
options struct with that new key.
2020-09-11 14:23:24 +01:00
Lars Karlitski
26b36ba704 worker/api: introduce job tokens
Don't give out job ids to workers; give out `tokens` instead, which serve
as an indirection. This way, restarting composer won't confuse it when a
stray worker returns a result for a job that was still running. Also,
artifacts are only moved to their final location once a job finishes.

This change breaks backwards compatibility, but we're not yet promising
a stable worker API to anyone.

This drops the transition tests in server_test.go. These don't make much
sense anymore, because there's only one allowed transition, from running
to finished. They heavily relied on job slot ids, which are not easily
accessible with the `TestRoute` API. Overall, adjusting this seemed like
too much work for their benefit.
2020-09-11 14:23:24 +01:00
Lars Karlitski
b984fd33a8 worker: require full url to be passed to NewClient()
This lets us get rid of stitching URLs together with string
concatenation in favor of using package `url`.
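A sketch of the idea, assuming a hypothetical `client` type: the full server URL is parsed once with `url.Parse`, and routes are resolved against it with `(*url.URL).ResolveReference`. The route name `jobs` and the example URL are illustrative only.

```go
package main

import (
	"fmt"
	"net/url"
)

// client is a hypothetical worker client that keeps a parsed base URL
// instead of a string it later concatenates onto.
type client struct {
	server *url.URL
}

// newClient requires the caller to pass the full server URL.
func newClient(serverURL string) (*client, error) {
	u, err := url.Parse(serverURL)
	if err != nil {
		return nil, err
	}
	return &client{server: u}, nil
}

// endpoint resolves a route relative to the server URL, following
// RFC 3986 reference resolution.
func (c *client) endpoint(route string) (string, error) {
	ref, err := url.Parse(route)
	if err != nil {
		return "", err
	}
	return c.server.ResolveReference(ref).String(), nil
}

func main() {
	c, err := newClient("https://composer.example:8700/api/worker/v1/")
	if err != nil {
		panic(err)
	}
	u, err := c.endpoint("jobs")
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // https://composer.example:8700/api/worker/v1/jobs
}
```

Because resolution follows RFC 3986, the trailing slash on the base URL matters: resolving `jobs` against `https://composer.example:8700/api/worker/v1` yields `https://composer.example:8700/api/worker/jobs`.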
2020-09-06 18:42:23 +01:00
Alexander Todorov
e7aa9c10c2 Move openAsStreamOptimizedVmdk() into importable package
so that it can be used in tests later.
2020-08-26 14:45:31 +02:00
Tom Gundersen
ac5f69e757 osbuild: move result serialization from common
In the same way `osbuild.Manifest` is the input to the osbuild API,
`osbuild.Result` is the output. Move it to the `osbuild` package where
it belongs.

This is not a functional change.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-08-26 12:12:37 +02:00
Tom Gundersen
b0cd29f78b worker: support returning images as StreamOptimized
vCenter requires images to be uploaded as vmdk StreamOptimized. Lorax
always produced images in this format, so we should make sure to do the
same for our VMware images.

Allow LocalTarget to request the images produced by osbuild be converted
to be streamOptimized before saving in composer, and hook the weldr API
up to enable this option for vmdk images.

Ideally this should simply be an option in osbuild, but that would
require some more work, which we will not manage in time for RHEL8.3.
Therefore do this minimal fix.

Note that this means the images produced by our manifests (including in
our image-test test cases) are not in the format that the weldr API
returns, so the tests we run on them would also, for now, need to
convert them before uploading to vCenter.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-08-23 14:45:27 +02:00
Ondřej Budai
fc2788340f worker: set the osbuild success to false even on non-osbuild errors
composer uses only osbuild's success field to decide whether a build
succeeded or failed. This is bad.

Unfortunately, fixing this requires fairly big code changes. This commit
changes the worker's behaviour to set the osbuild success flag to false
even on errors which weren't caused by osbuild (e.g. an upload error).
This is certainly hacky, but I think it's still essential to tell the
user that an error occurred.

Fixes #789
2020-06-29 10:21:24 +02:00
Ondřej Budai
297dbe2fc7 worker: move nil result check to more appropriate place
Result can be nil only when there's an error. Move the code to a place where
it makes more sense.
2020-06-29 10:21:24 +02:00
Major Hayden
067726e91d Print success log line after job is done
We print messages to the log when the build fails, but we don't print
one when the build is successful.

I really want to celebrate our successes more often, so let's print a
success message when osbuild completes successfully. Since success
doesn't feel like success without an emoji, let's add one of those, too. 🎉

Signed-off-by: Major Hayden <major@redhat.com>
2020-06-26 20:30:01 +02:00
Ondřej Budai
ab0a8057bf worker: ensure that the reported result is always non-nil
When osbuild crashes (e.g. when cp fails because of the machine running
out of disk space), it doesn't produce a machine-readable result. Due to
our suboptimal handling of the result struct (this is my fault), this can
lead to result == nil. However, composer expects that result != nil in all
cases because it uses the Success flag to assess the compose state. If
result == nil, it just crashes terribly.
2020-06-12 12:47:31 +02:00
Lars Karlitski
aa0c037bb2 osbuild-worker: support canceling jobs
Exit the whole worker process when a job was canceled, because osbuild
does not clean up all child processes when receiving SIGKILL.

Change the service to restart osbuild-worker also on success, and
decrease the restart timeout.
2020-06-12 10:00:50 +02:00
Lars Karlitski
33a4c55a6f osbuild-worker: don't use /var/cache for temporary directories
When osbuild-composer crashed, it left temporary directories in
`/var/cache`. Use `/var/tmp` for these output directories, because
systemd will clean these up (we set PrivateTmp=true).

Also, put the store into `/var/cache/osbuild-store`. The worker does not
checkpoint anything; the store is only used as a cache for RPMs, which
can be shared between multiple workers and successive runs of a single
worker.
2020-06-12 00:13:37 +02:00
Tom Gundersen
6002a128b8 osbuild-worker: don't flush cache between jobs
Until osbuild-14, the images were unconditionally kept in the cache,
meaning the cache could grow very large. Now only the downloaded RPMs
are saved, which greatly limits how big it can grow.

Having the RPMs cached should considerably speed up every image build
but the first, so we should take advantage of that by not flushing the
cache between builds.

The cache is still flushed when the worker is stopped / restarted.

This moves the cache from /var/tmp/osbuild-worker* to
/var/cache/osbuild-worker/osbuild-worker-*. This means that each worker
gets a dedicated cache, in case there are several on one machine. In the
future we may want to combine them and only ever have one cache, but for
that we need improvements in parallel access and cache-cleanup.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-06-07 19:22:52 +02:00
Tom Gundersen
0417c6d8bb distro: make the osbuild package internal to the distros
Rather than Manifest() returning an osbuild.Manifest object, introduce a
new distro.Manifest object which represents it as an opaque, JSON
serializable object. This new type has the following properties:

1) its serialization is compatible with the input to osbuild,
2) any valid osbuild input can be deserialized into it, and
3) marshaling and unmarshaling to and from JSON is lossless.

This means that even as we change the subset of valid osbuild manifests
that we support, we can still load any previous state from disk, and it
will continue to work just as before, even though we can no longer
deserialize it into our internal notion of osbuild.Manifest.

This fixes the underlying problem of which #685 was a symptom.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-06-03 00:30:01 +02:00
Lars Karlitski
e503b0a4d4 worker: pass a temporary store to osbuild
When fdd753615 added `--output-directory` to the invocation of osbuild,
it also removed `--store`.

This was a mistake: osbuild's default store is `.osbuild`, which is not
what we want. Restore the old behavior of passing a temporary directory.
2020-05-26 22:16:52 +02:00
Lars Karlitski
b065d8c304 worker: handle error in defer
There's not much to do when removing the temporary directory fails.
Print a message so that people have the chance to notice.
2020-05-26 22:16:52 +02:00
Lars Karlitski
a1cf3984dc worker: introduce job artifact directory
The `jobs/:job_id/builds/:build_id/image` route was awkward: the
`:job_id` was actually weldr's compose id and `:build_id` was always `0`.

Change it to `jobs/:job_id/artifacts/:name`, where `:job_id` is now a
job id, and `:name` is the name of the artifact to upload. In the
future, it could support uploading more than one artifact.

This allows removing outputs from `store`, which is now back to being a
pure JSON-store. Make sure that `weldr` returns (and deletes) images
from the new location (or, for backwards compatibility, the old one).

The `org.osbuild.local` target continues to exist as a marker for the
worker to know whether it should upload artifacts.
2020-05-26 10:42:20 +02:00
David Rheinsberg
fdd7536152 worker: switch to --output-directory=DIR
For two releases now, `osbuild` has accepted an `--output-directory=DIR`
argument, which lets us decide where to place generated artifacts.
Switch over to it, rather than digging into the store, to make sure we
will not access the osbuild store while parallel cleanups are ongoing
(which are not yet a thing, though).
2020-05-13 13:31:23 +02:00
Lars Karlitski
b5769add2c store: move queue out of the store
The store is responsible for two things: user state and the compose queue. This
is problematic, because the rcm API has slightly different semantics from weldr
and only uses the queue part of the store. Also, the store is simply too
complex.

This commit splits the queue part out, using the new jobqueue package in both
the weldr and the rcm package. The queue is saved to a new directory `queue/`.

The weldr package now also has access to a worker server to enqueue and list
jobs. Its store continues to track composes, but the `QueueStatus` for each
compose (and image build) is deprecated. The field in `ImageBuild` is kept for
backwards compatibility for composes which finished before this change, but a
lot of code dealing with it in package compose is dropped.

store.PushCompose() is degraded to storing a new compose. It should probably be
renamed in the future. store.PopJob() is removed.

Job ids are now independent of compose ids. Because of that, the local
target gains ComposeId and ImageBuildId fields, because a worker cannot
infer those from a job anymore. This also necessitates a change in the
worker API: the job routes are changed to expect that instead of a
(compose id, image build id) pair. The route that accepts built images
keeps that pair, because it reports the image back to weldr.

worker.Server() now interacts with a job queue instead of the store. It gains
public functions that allow enqueuing an osbuild job and getting its status,
because only it knows about the specific argument and result types in the job
queue (OSBuildJob and OSBuildJobResult). One oddity remains: it needs to report
an uploaded image to weldr. Do this with a function that's passed in for now,
so that the dependency on the store can be dropped completely.

The rcm API drops its dependencies on package blueprint and store, because it
too interacts only with the worker server now.

Fixes #342
2020-05-08 14:53:00 +02:00
Ondřej Budai
6eb43c3d97 worker: add support for uploads to azure
Everything else is already implemented; this commit just connects the
bits and pieces in the worker.
2020-04-29 18:15:13 +02:00
Ondřej Budai
b916a88242 worker: fix passing the result from osbuild when it fails
I tried fixing this in 181128c5 and forgot to pass the right error in one
place. This commit fixes it.
2020-04-29 11:40:36 +02:00
Ondřej Budai
181128c5b9 worker: fix missing logs when osbuild fails
The commit 2435163f broke sending the logs to osbuild-composer. This was
partly because of unusual error handling in the RunOSBuild function.

This commit fixes that by creating a custom error and properly propagating
the result from it.
2020-04-27 19:36:22 +02:00
Lars Karlitski
ac40b0e73b jobqueue: rename to worker
This package does not contain an actual queue, but a server and client
implementation for osbuild's worker API. Name it accordingly.

The queue is in package `store` right now, but about to be split off.
This rename makes the `jobqueue` name free for that effort.
2020-04-16 01:02:16 +02:00
Lars Karlitski
2435163fc9 worker: move running osbuild into separate function
Setting up a command to run is quite involved. Separate that from the
logic of running it.
2020-04-06 12:11:54 +02:00
Lars Karlitski
1ece08414c jobqueue: move Job.Run() to the worker
This makes the jobqueue package independent of forking osbuild, the
choices for which (exact invocation, location of the cache directory)
should be made in the worker.
2020-04-06 12:11:54 +02:00
Lars Karlitski
d3b9a3515d worker: inline handleJob()
It's a small function that's only called once.
2020-04-06 12:11:54 +02:00
Lars Karlitski
db5dd1ee2c worker: remove redundant UpdateJob() call
A job is already set to be running when it is returned from the API (see
Store.PopJob()).
2020-04-06 12:11:54 +02:00
Lars Karlitski
1f06d78362 jobqueue: rename ID to ComposeID in job structs
It's not an id of the job, but the compose id.
2020-04-06 12:11:54 +02:00
Lars Karlitski
3b5d5a73d3 worker: drop default port
We require passing the address from the unit file. Do the same for the
socket, using host:port syntax.

Overriding the port was broken before, because we unconditionally
appended ":8700" to every address.
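A sketch of validating such an address (`checkAddress` is a hypothetical helper, not the worker's code). `net.SplitHostPort` rejects both a missing port and the doubled port that unconditional concatenation produces.

```go
package main

import (
	"fmt"
	"net"
)

// checkAddress requires an explicit host:port instead of appending a
// default port. Appending ":8700" unconditionally would turn
// "localhost:9000" into the invalid "localhost:9000:8700".
func checkAddress(addr string) error {
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return fmt.Errorf("invalid address %q: %w", addr, err)
	}
	return nil
}

func main() {
	fmt.Println(checkAddress("localhost:9000") == nil)      // true
	fmt.Println(checkAddress("localhost") == nil)           // false
	fmt.Println(checkAddress("localhost:9000:8700") == nil) // false
}
```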
2020-03-25 14:05:44 +01:00
Lars Karlitski
f8982f4a1a worker: don't hard code path to unix domain socket
Introduce a mandatory argument `address`, which is interpreted as a path
to a unix socket when `-unix` is given or a network address otherwise.

Move the default path to the service file.

Add a more useful usage message when passing `-help` or no arguments.
2020-03-25 14:05:44 +01:00
Lars Karlitski
b5432e78b9 worker: move ComposerClient to jobqueue package
This moves the client code into the same package as the server code,
which makes it easier to change (and version) the two in sync. Also, it
will allow making some structs private to the jobqueue package and
testing `Client`.

Also rename it to jobqueue.Client.
2020-03-25 14:05:44 +01:00
Lars Karlitski
cb4421b69f worker: remoteAddress → address
2020-03-25 14:05:44 +01:00
Lars Karlitski
94183d14a8 worker: split NewClient()
Use the default dialing functions for tcp connections and set the tls
config on the transport directly. This makes the code easier to follow,
because the only special case is overriding the DialContext() for unix
connections.
2020-03-25 14:05:44 +01:00