Previously, autoscaling group discovery was failing with this error:
Error getting the Autoscaling instances: MissingRegion MissingRegion:
could not find region configuration
Previously, the instance was vulnerable to an OTA update while processing a
request. Because there is no way to retrigger a job in Composer, it is
better to avoid this situation.
We do this by setting the `protected` flag on the instance while a job is
being processed. This way, the AWS scheduler hopefully does not shut down
the machine at the wrong time.
Main caveats of this solution (see the sketch after this list):
* Starvation: if a worker keeps accepting new jobs, it might never be
updated.
* Inconsistency: there is a window between job acceptance and protection in
which the worker can be shut down before it has time to protect itself.
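A minimal sketch of how the protection flag could be set with aws-sdk-go;
the ASG name, instance ID, and region here are placeholders, not the values
the worker actually uses:

    package main

    import (
        "log"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/autoscaling"
    )

    // setScaleInProtection toggles scale-in protection for one instance so
    // the autoscaling group does not terminate it while a job is in flight.
    func setScaleInProtection(svc *autoscaling.AutoScaling, asgName, instanceID string, protected bool) error {
        _, err := svc.SetInstanceProtection(&autoscaling.SetInstanceProtectionInput{
            AutoScalingGroupName: aws.String(asgName),
            InstanceIds:          []*string{aws.String(instanceID)},
            ProtectedFromScaleIn: aws.Bool(protected),
        })
        return err
    }

    func main() {
        // An explicit region avoids the MissingRegion error mentioned above.
        sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
        svc := autoscaling.New(sess)

        // Protect before running the job, unprotect afterwards.
        if err := setScaleInProtection(svc, "worker-asg", "i-0123456789abcdef0", true); err != nil {
            log.Fatalf("cannot set protection: %v", err)
        }
        defer setScaleInProtection(svc, "worker-asg", "i-0123456789abcdef0", false)
        // ... process the job ...
    }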
In the internal deployment, we want to talk to composer over an http/https
proxy. This adds a new `composer.proxy` field to the worker config that
makes the worker connect to composer and the OAuth server using the
specified proxy.
NB: The proxy is not supported when connecting to composer via unix sockets.
For testing this, I added a small HTTP proxy implementation. Please don't
use it in production; it's just good enough for tests.
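For reference, this is the standard-library mechanism such a proxy option
plugs into; this is just a sketch, not the actual worker code:

    package proxy

    import (
        "net/http"
        "net/url"
        "time"
    )

    // NewProxiedClient returns an HTTP client that routes all requests
    // through the given proxy URL (e.g. the composer.proxy value).
    func NewProxiedClient(proxy string) (*http.Client, error) {
        proxyURL, err := url.Parse(proxy)
        if err != nil {
            return nil, err
        }
        return &http.Client{
            Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
            Timeout:   30 * time.Second,
        }, nil
    }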
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
An address is always required, so not passing one is a clear error. Let's
return exit code 2, which Go itself returns when bad arguments are passed in.
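A sketch of the intended behavior; the flag name is illustrative:

    package main

    import (
        "flag"
        "fmt"
        "os"
    )

    func main() {
        addr := flag.String("address", "", "address of composer (required)")
        flag.Parse()

        // A missing address is a usage error: exit with code 2, the same
        // code the flag package uses for bad arguments.
        if *addr == "" {
            fmt.Fprintln(os.Stderr, "error: -address is required")
            os.Exit(2)
        }
    }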
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Refactor the handling of GCP credentials in the worker to be equivalent
to what is done for AWS. The main idea is that the code decides which
credentials to use when processing each job. This change will allow
preferring credentials passed via the upload `TargetOptions` with the job
over the credentials configured in the worker's configuration, or the
default way of authenticating implemented by the Google library.
Move the loading of GCP credentials into a new `NewFromFile()` function in
the internal `gcp` library, which accepts a path to the credentials file.
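The precedence described above boils down to something like this
(hypothetical helper, not the real function name):

    package worker

    // pickGCPCredentials prefers job-level credentials from TargetOptions,
    // then the worker configuration; returning nil lets the Google library
    // fall back to its default authentication flow.
    func pickGCPCredentials(jobCreds, configCreds []byte) []byte {
        if len(jobCreds) > 0 {
            return jobCreds
        }
        if len(configCreds) > 0 {
            return configCreds
        }
        return nil
    }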
Signed-off-by: Tomas Hozza <thozza@redhat.com>
jobimpl-osbuild
---------------
Add GenericS3Creds to struct
Add method to create AWS with Endpoint for Generic S3 (with its own credentials file)
Move uploading to S3 and result handling to a separate method (along with the special VMDK handling)
Adjust the AWS S3 case to the new method
Implement a new case for uploading to a generic S3 service
awscloud
--------
Add wrapper methods for endpoint support
Set the endpoint to the AWS session
Set s3ForcePathStyle to true if an endpoint was set (see the sketch below)
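A sketch of what the endpoint wiring could look like with aws-sdk-go
(function name and parameters are illustrative):

    package awscloud

    import (
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/credentials"
        "github.com/aws/aws-sdk-go/aws/session"
    )

    // newGenericS3Session builds a session against a non-AWS S3 endpoint.
    // Path-style addressing is forced because generic S3 servers usually
    // do not support virtual-hosted bucket URLs.
    func newGenericS3Session(endpoint, region, key, secret string) (*session.Session, error) {
        cfg := aws.NewConfig().
            WithRegion(region).
            WithCredentials(credentials.NewStaticCredentials(key, secret, ""))
        if endpoint != "" {
            cfg = cfg.WithEndpoint(endpoint).WithS3ForcePathStyle(true)
        }
        return session.NewSession(cfg)
    }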
Target
------
Define a new target type for the GenericS3Target and Options
Handle unmarshaling of the target options and result for the Generic S3
Weldr
-----
Add support for only uploading to AWS S3
Define new structures for AWS S3 and Generic S3 (based on AWS S3)
Handle unmarshaling of the providers' upload settings
main
----
Add a section to the main config for Generic S3 service credentials
If provided, pass the credentials file name to the osbuild job implementation
Upload Utility
--------------
Add upload-generic-s3 utility
Makefile
--------
Do not fail if the bin directory already exists
Tests
-----
Add test cases for both AWS and a generic S3 server
Add a generic s3_test.sh file for both test cases and add it to the tests RPM spec
Adjust the libvirt test case script to support already created images
GitLabCI
--------
Extend the libvirt test case to include the two new tests
This will allow us to use the service accounts which work against
identity.api.openshift.com. These are much easier to manage, especially
with the new multi-tenancy, as there's a single page to create/expire
them across an account.
They also have the added benefit that, unlike offline tokens, they don't
expire automatically when unused, and they can be expired immediately when
desired.
The format is now always 'JOB_ID (JOB_TYPE)'. This means that we also know
the job type when a job is finished or when it fails.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
This is backwards compatible, as long as the timeout is 0 (never
timeout), which is the default.
In the case of dbjobqueue, the underlying timeout manifests as
context.Canceled, context.DeadlineExceeded, or a net.Error with Timeout()
returning true. For fsjobqueue, only the first two are considered.
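The classification could look like this (hypothetical helper name):

    package jobqueue

    import (
        "context"
        "errors"
        "net"
    )

    // isDequeueTimeout reports whether err is one of the timeout conditions
    // described above; fsjobqueue only needs the two context errors.
    func isDequeueTimeout(err error) bool {
        if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
            return true
        }
        var netErr net.Error
        return errors.As(err, &netErr) && netErr.Timeout()
    }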
Allow depsolving to be done in a worker through the job queue rather
than synchronously in composer.
The benefits this might unlock include:
- no more blocking calls in the cloud/koji APIs
- only workers accessing repositories
- no VPN access from composer
- composer not needing to be subscribed to CDN, etc.
- no dnf cache management in composer
Potential problems:
- the version of composer (and so the distro definitions) that
triggered a depsolve may not be the same one that uses the
result to generate a manifest
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Two configurations for the listeners are now possible:
- enableJWT=false with client ssl auth
- enableJWT=true with https
Actual verification of the tokens is handled by
https://github.com/openshift-online/ocm-sdk-go.
An authentication handler is run as the top level handler, before any
routing is done. Routes which do not require authentication should be
listed as exceptions.
Authentication can be restricted using an ACL file which allows
filtering based on JWT claims. For more information see the inline
comments in ocm-sdk/authentication.
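The overall shape of such a top-level handler, sketched without the
ocm-sdk specifics (the real token verification is done by ocm-sdk-go):

    package auth

    import "net/http"

    // Middleware runs before any routing and lets listed paths through
    // unauthenticated; everything else must pass the verify callback.
    func Middleware(next http.Handler, exceptions map[string]bool, verify func(*http.Request) error) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if !exceptions[r.URL.Path] {
                if err := verify(r); err != nil {
                    http.Error(w, "unauthorized", http.StatusUnauthorized)
                    return
                }
            }
            next.ServeHTTP(w, r)
        })
    }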
As an added quirk, the `-v` flag for the osbuild-composer executable was
changed to `-verbose` to avoid a flag collision with glog, which declares
the `-v` flag in its package `init()` function. The ocm-sdk depends on
glog and pulls it in.
Make the handling of GCP credentials more consistent with what is being
done e.g. for Azure. Make the GCP section in the worker's configuration a
pointer so that it does not show up in the worker configuration printed
during startup if it was not specified in the actual configuration file.
Load the GCP credentials file, if provided, during worker startup to
prevent failure later on while processing a job with a GCP upload target.
Pass the loaded GCP credentials as []byte to the OSBuildJobImpl.
Signed-off-by: Tomas Hozza <thozza@redhat.com>
This commit adds and implements org.osbuild.azure.image target.
Let's talk about the already implemented org.osbuild.azure target first:
The purpose of this target is to authenticate using the Azure Storage
credentials and upload the image file as a Page Blob. Page Blob is basically
an object in storage and it cannot be directly used to launch a VM. To achieve
that, you need to define an actual Azure Image with the Page Blob attached.
For the cloud API, we would like to create an actual Azure Image that is
immediately available for new VMs. The new target accomplishes this.
To achieve this, it must use a different authentication method: Azure OAuth.
The other important difference is that currently, the credentials are stored
on the worker and not in the target options. This should lead to better
security because we don't send the credentials over the network. In the
future, we would like to have a credential-less setup using workers in Azure
with the right IAM policies applied, but this requires more investigation and
is not implemented in this commit.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Add support for GCP as an upload target to the internal API.
Extend the cloudapi to allow GCP as an upload target in the compose
request. Regenerate the cloudapi go code. Add a GCP-specific upload
result component in the API definition, similar to AWS. It is not yet
used, but it will be once returning a target-specific result from the
worker is supported.
Add support for GCP upload target to the worker job implementation.
Signed-off-by: Tomas Hozza <thozza@redhat.com>
Let's keep this on the same filesystem as the osbuild store, and
in particular stay away from /var/tmp and its scary semantics.
We are not aware of any issues caused by /var/tmp, but getting
rid of it means we don't have to think about that when debugging,
if nothing else.
Signed-off-by: Tom Gundersen <teg@jklm.no>
The three new job types osbuild-koji, koji-init, and koji-finalize
allow the different tasks to be split apart and, in particular, allow
several builds on different architectures as part of a given compose.
Introduce JobImplementation and turn the current RunJob() into
OSBuildJobImpl. Make main() select a job impl based on job type.
This is in preparation for adding additional impls.
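The dispatch is conceptually this simple (the method set and names here
are illustrative, not the real worker API):

    package worker

    import "fmt"

    // JobImplementation is one implementation per job type.
    type JobImplementation interface {
        Run(args []byte) error
    }

    // OSBuildJobImpl is what the former RunJob() became.
    type OSBuildJobImpl struct{}

    func (i *OSBuildJobImpl) Run(args []byte) error { return nil }

    // pickJobImpl mirrors the selection done in main(): new job types can
    // be added without touching the run loop.
    func pickJobImpl(jobType string) (JobImplementation, error) {
        switch jobType {
        case "osbuild":
            return &OSBuildJobImpl{}, nil
        default:
            return nil, fmt.Errorf("no implementation for job type %q", jobType)
        }
    }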
Move the fact that the worker is requesting jobs of type "osbuild" out
of the client library.
For one, require consumers to pass accepted job types to RequestJobs()
and allow querying for the job type with the new Type() function.
Also, make OSBuildArgs() and Update() generic, requiring callers to pass an
argument that matches the job type.
Now, main() does not deal with OSBuildJobResult anymore, and RunJob()
doesn't return it. This means we can add more job types (i.e., different
RunJob()s) now.
WatchJob() regularly checks if a job was canceled in a goroutine. It
does so by accessing composer's `/jobs/{token}` route. However, once the
main goroutine marks the job as done (by sending PATCH to that same
route), the `token` is no longer valid and thus the route not accessible
anymore.
main() does cancel the goroutine running WatchJob, but it's not
guaranteed that it gets scheduled in time to actually stop watching the
job.
Thus, don't cancel the job when fetching `/jobs/{token}` fails. This
means that the job won't be canceled anymore when the connection to
composer goes down.
Also, we will be able to move job.Update() into RunJob().
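The resulting watcher loop, roughly (the callbacks are placeholders for
the real client calls):

    package worker

    import (
        "context"
        "time"
    )

    // watchJob polls composer's /jobs/{token} route until ctx is canceled.
    // A failed fetch no longer cancels the job: the token becomes invalid
    // as soon as the main goroutine marks the job as done.
    func watchJob(ctx context.Context, fetchCanceled func() (bool, error), cancelJob func()) {
        ticker := time.NewTicker(15 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                canceled, err := fetchCanceled()
                if err != nil {
                    continue // token may already be invalid; don't cancel
                }
                if canceled {
                    cancelJob()
                    return
                }
            }
        }
    }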
Workers reported status via an `osbuild.Result`, which only includes
osbuild output. Make it report OSBuildJobResult instead, which was meant
to be used for this purpose and is already used as the result type in
the jobqueue.
While at it, add any errors produced by targets into this struct, as
well as an overall success flag.
Note that this breaks older workers returning the result of an osbuild
job to a new composer. I think this is fine in this case, for two
reasons:
1. We don't support running different versions of the worker and
composer in the weldr API, and remote workers aren't widely used yet.
2. Both osbuild.Result and worker.OSBuildJobResult have a top-level
`Success` boolean. Thus, logs are lost in such cases, but the overall
status of the compose is not.
This function is almost the same as the koji uploader, except that it
calls `CGFailBuild` instead of `CGImport` at the end.
Don't exit early from RunJob() when the job failed. Instead, go through
all the uploaders anyway. None of the others do anything when the job
fails, but now we have the chance to make the necessary `CGFailBuild` call
for koji.
This moves more logic from main() into RunJob(), so that we can support
different job kinds in the future.
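The control flow change, as a rough sketch with hypothetical names:

    package worker

    // Target is a stand-in for the real uploader interface.
    type Target interface {
        Finish(buildSucceeded bool)
    }

    // runTargets iterates over all targets even when the build failed, so
    // the koji target gets its chance to call CGFailBuild instead of
    // CGImport; the other targets simply do nothing for failed builds.
    func runTargets(targets []Target, buildSucceeded bool) {
        for _, t := range targets {
            t.Finish(buildSucceeded)
        }
    }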
Add "image_name" and "stream_optimized" fields to the osbuild job as
replacement for the local target options. The former signifies the name
of the uploaded artifact and whether an artifact should be uploaded at
all (only weldr API). The latter will be deprecated at some point, when
osbuild itself can make streamoptimized vmdk images.
This change separates what have always been two distinct concepts:
artifacts that are reported back to the composer node (in practice
always running on the same machine), and upload targets to clouds and
such. Separating them makes it easier to add job types that only allow
one upload target while keeping artifacts.
Keep the local target around, so that jobs that are scheduled can still
be run after an upgrade.
The server hasn't used common.ImageBuildState to mark a job as
successful or failed for a long time. Instead, it's using the job's
return argument for that. (Jobs don't have a high-level concept of
failing.)
Drop the check in the server, and always send "FINISHED" from the client
for backwards compatibility.
osbuild reports failing builds in two ways: it sets the "success" field
in its output to `false` and it returns with a non-zero exit status. The
worker used both, returning an `OSBuildError` when osbuild returned
non-zero, but also forwarding the resulting object with the "success"
field.
Change this to only use the "success" field and ignore the return value.
The latter is useful for people running osbuild in a terminal or script,
but is redundant for this use case.
This makes error reporting more consistent: `RunOSBuild` only returns an
error when *running* osbuild failed, not when the build fails.
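A sketch of that contract (names are illustrative, and the real code goes
through the osbuild package rather than exec directly):

    package osbuild

    import (
        "encoding/json"
        "os/exec"
    )

    // runOSBuild returns an error only when *running* osbuild fails; a
    // failed build is reported via the "success" field of the JSON output,
    // and the process exit status is otherwise ignored.
    func runOSBuild(args ...string) (bool, error) {
        out, runErr := exec.Command("osbuild", args...).Output()
        var result struct {
            Success bool `json:"success"`
        }
        if jsonErr := json.Unmarshal(out, &result); jsonErr != nil {
            // No parsable output: osbuild itself failed to run.
            if runErr != nil {
                return false, runErr
            }
            return false, jsonErr
        }
        return result.Success, nil
    }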
We have the same thing for AWS. The AWS target also specifies under what
name the image should be available in EC2.
As requested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
osbuild runs directly on the host; there's no intermediate container,
therefore we should set the container type to none.
As suggested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.