Unfortunately, `which` does not seem to be installed by default on our
F41 CI images. Instead of doing the dance with rebuilds, which has been
problematic recently, let's not rely on `which` in scripts any more,
since we can replace it with the Bash built-in `type` command.
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Previously, the worker was determining the GCE image guest OS Features
on its own, based on the OS name. This caused problems, in case the
osbuild-composer was of a newer version than the worker.
Example:
osbuild-composer contained support for c10s GCE image type and its
implementation also contained the proper guest OS Features list for it.
However, when the worker got the osbuild job, it built it and tried to
fetch the guest OS Features for the distro. Since its implementation was
too old, it didn't contain the code that added the actual support for
c10s GCE images and got no guest OS features list (which is the default
for unsupported distros). The image was successfully uploaded and
shared, but it does not boot in GCP, because it does not know that it
should use UEFI to boot it.
This behavior could be considered a bug. The worker should be dumb. It
should not be making decisions about the image features, but instead it
should take them from the upload target options. And composer should be
the authoritative source of truth for this. Because otherwise, we
basically have two components that need to be updated in sync to add
support for GCE images on a new distro.
Move the GCE image guest OS features to the GCP upload target options.
The worker will just take what is specified there and use it when
importing the image to GCP. As a compatibility layer for the case when
the composer would be older than the worker (unlikely, but still),
worker will try to determine the image guest OS features in case the
list in the upload target options is empty.
Extend the GCP functional tests to check that the imported image has at
least some guest OS features set.
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Enable testing of GCE image type on el10 / c10s. The el10 / c10s image
type temporarily uses cloud-init, because there are no GCP guest tools
for el10 / c10s yet and el9 version can't be installed. This implies
that we need to set the SSH key in the instance metadata and use SSH
directly.
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Do not schedule gcp.sh on rhel-10 and centos-stream-10. Also improve
loggin for aws.sh and azure.sh as the cloud-image-val testing is
currently not preformed there.
in many files there was a secondary call to `trap` for the sole purpose
of killing jornalctl (watching worker logs) so that GitLab CI doesn't
hang.
The issue with this is that sometimes the cleared the trap which invokes
the cleanup() function without reinstating it again (not everywhere).
Instead of doing this back-and-forth just make sure we don't leave any
journalctl processes dangling in the background!
NOTES:
- for some scripts, mainly ostree- ones there was no cleanup trap
present, but instead `trap` was configured inside the build_image() function.
The trouble is that this function is executed multiple times and
$WORKER_JOURNAL_PID changes value between these multiple executions.
That's why these scripts introduce the cleanup_on_exit() function where
we make sure to kill any possible dangling journalctl processes.
- The name `cleanup_on_exit()` is chosed because these same scripts
often have a helper function named clean_up() which is sometimes used to remove
virtual machines and other artifacts between calls of build_image().
Previously, it was expected from the user to provide the Object name
when uploading image to GCP. The object name does not matter much,
because the object is deleted once image import finishes. Make
the specification of the object name optional and generate it if not
provided.
Adjust the GCP Weldr test case to not provide the Object name when
uploading the image.
The user can still provide the Object name if needed.
Since we're sharing functions between test scripts, move greenprint(),
the most rewritten function in the history of the project, to
shared_lib.sh and source it everywhere.
- Handle the array responses from the new weldr-client (>= 35.6).
- Move the `get_build_info` function to shared_libs.sh to source and
reuse in multiple places.
All EXIT traps are cleared on line 280 so the cleanup trap is never run
and VMs are waiting for 4 hours to get cleaned by
scheduled-cloud-cleaner. Run the cleanup at the end and rely on
scheduled-cloud-cleaner only in case of failures before that.
some scripts skip the test if it's not supported for that
distro-version. Disable them in gitlab-ci.yml so we don't waste CI
resources.
To disable them, we are using the `rules` on each job with a regex
pattern. Using `=~` (pattern matches) as a WHITELIST and `!~` (pattern
does not match) as a BLACKLIST.
`tools/provision.sh` is provisioning SUT always in the same way for
both, the Service scenario and the on-premise scenario. While this is
not causing any issues, it does not realistically represent how we
expect osbuild-composer and worker to be used in these scenarios.
The script currently supports the following authentication options:
- `none`
- Intended for the on-premise scenario with Weldr API.
- NO certificates are generated.
- NO osbuild-composer configuration file is created.
- NO osbuild-worker configuration file is created. This means that no
cloud provider credentials are configured directly in the worker.
- Only the local worker is started and used.
- Only the Weldr API socker is started.
- Appropriate repository definitions are copied to
`/etc/osbuild-composer/repositories/`.
- `jwt`
- Intended for the Service scenario with Cloud API.
- Should be the only method supported in the Service scenario in the
future.
- Certificates are generated and copied to `/etc/osbuild-composer`.
- osbuild-composer configuration file is created and configured for
JWT authentication.
- osbuild-worker configuration file is created, configured for JWT
authentication and with appropriate cloud provider credentials.
- Local worker unit is masked. Only the remote worker is used (the
socket is started and one remote-worker instance is created).
- Only the Cloud API socket is started (Weldr API socket is stopped).
- NO repository definitions are copied to
`/etc/osbuild-composer/repositories/`.
- `tls`
- Intended for the Service scenario with Cloud API.
- Should eventually go away.
- Certificates are generated and copied to `/etc/osbuild-composer`.
- osbuild-composer configuration file is created and configured for
TLS client cert authentication.
- osbuild-worker configuration file is created, configured for TLS
authentication and with appropriate cloud provider credentials.
- Services and sockets are started as they used to be originally:
- Both local and remote worker sockets are started.
- Both Weldr and Cloud API sockets are started.
- Only the local worker unit will be started automatically.
- NO repository definitions are copied to
`/etc/osbuild-composer/repositories/`.
We want to be able to safely gather any artifacts without worrying about
any possible secrets leaking. Every artifacts that we want to upload
will now have to be placed in /tmp/artifacts which will then be uploaded
to S3 by the executor and link to the artifacts will be provided in the
logs. Only people with access to our AWS account can see them.
When a test script fails in CI, it's often difficult to pinpoint the
exact line in the log where the script failed and the cleanup() function
(trapped on EXIT) begins.
Adding a prominent line (with greenprint where available) at the start
of the cleanup function will make reading logs of failed jobs a lot
easier.
We currently use a single GCP Compute region when spinning up VMs using
the imported GCE image. As a result, we are often hitting the
'IN_USE_ADDRESSES' quota limit when there are multiple CI jobs running.
Google does not allow us to increase the quota limit any more.
Change the GCP test cases to use the CI `GCP_REGION` variable to list
all GCE regions with available quota and pick a random one from the
list. The `GCP_REGION` value is used as the region name prefix when
filtering available regions. This means that if you specify an exact GCE
region, such as `us-west1`, you'll always get the same region, but if a
GCP multi-region is used, such as `us`, then a random region prefixed
with 'us' will be used.
Add support for importing the GCE image into GCP using Weldr API. The
credentials to be used can be specified in the upload settings and will
be then used by the worker to authenticate with GCP.
The GCP target credentials are passed to Weldr API as base64 encoded
content of the GCP credentials JSON file. The reason is that the JSON
file contains many values and its format could change in the future.
This way, the Weldr API does not rely on the credentials file content
format in any way.
Add a new test case for the GCP upload via Weldr and run it in CI.
Signed-off-by: Tomas Hozza <thozza@redhat.com>