Unfortunately, `which` does not seem to be installed by default on our
F41 CI images. Instead of doing the dance with rebuilds, which has been
problematic recently, let's not rely on `which` in scripts any more,
since we can replace it with the Bash built-in `type` command.
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Do not schedule gcp.sh on rhel-10 and centos-stream-10. Also improve
loggin for aws.sh and azure.sh as the cloud-image-val testing is
currently not preformed there.
in many files there was a secondary call to `trap` for the sole purpose
of killing jornalctl (watching worker logs) so that GitLab CI doesn't
hang.
The issue with this is that sometimes the cleared the trap which invokes
the cleanup() function without reinstating it again (not everywhere).
Instead of doing this back-and-forth just make sure we don't leave any
journalctl processes dangling in the background!
NOTES:
- for some scripts, mainly ostree- ones there was no cleanup trap
present, but instead `trap` was configured inside the build_image() function.
The trouble is that this function is executed multiple times and
$WORKER_JOURNAL_PID changes value between these multiple executions.
That's why these scripts introduce the cleanup_on_exit() function where
we make sure to kill any possible dangling journalctl processes.
- The name `cleanup_on_exit()` is chosed because these same scripts
often have a helper function named clean_up() which is sometimes used to remove
virtual machines and other artifacts between calls of build_image().
We were using greenprint for failures, which makes it hard to quickly
find where the tests failed. This switches errors to use redprint, and
adds it to places that were simply using echo before doing an exit 1.
cloud-init and bash should be everywhere. Thus, there's no point in specifying
them as a customization. Actually, it might mask error if we ever stop
installing bash/enabling cloud-init.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Since we're sharing functions between test scripts, move greenprint(),
the most rewritten function in the history of the project, to
shared_lib.sh and source it everywhere.
- Handle the array responses from the new weldr-client (>= 35.6).
- Move the `get_build_info` function to shared_libs.sh to source and
reuse in multiple places.
We were using `latest` as tag, this can be dangerous as it's the default
tag, an anyone can accidentally update it. Using `prod` is safer.
Also use dev container image if the test script is running in CIV CI.
instead of manually updating CIV version every once in a while. Get
always the latest version.
In CIV CI, this test runs before any change can be introduced into the
container image, so no unexpedted errors should come from the CIV side.
`tools/provision.sh` is provisioning SUT always in the same way for
both, the Service scenario and the on-premise scenario. While this is
not causing any issues, it does not realistically represent how we
expect osbuild-composer and worker to be used in these scenarios.
The script currently supports the following authentication options:
- `none`
- Intended for the on-premise scenario with Weldr API.
- NO certificates are generated.
- NO osbuild-composer configuration file is created.
- NO osbuild-worker configuration file is created. This means that no
cloud provider credentials are configured directly in the worker.
- Only the local worker is started and used.
- Only the Weldr API socker is started.
- Appropriate repository definitions are copied to
`/etc/osbuild-composer/repositories/`.
- `jwt`
- Intended for the Service scenario with Cloud API.
- Should be the only method supported in the Service scenario in the
future.
- Certificates are generated and copied to `/etc/osbuild-composer`.
- osbuild-composer configuration file is created and configured for
JWT authentication.
- osbuild-worker configuration file is created, configured for JWT
authentication and with appropriate cloud provider credentials.
- Local worker unit is masked. Only the remote worker is used (the
socket is started and one remote-worker instance is created).
- Only the Cloud API socket is started (Weldr API socket is stopped).
- NO repository definitions are copied to
`/etc/osbuild-composer/repositories/`.
- `tls`
- Intended for the Service scenario with Cloud API.
- Should eventually go away.
- Certificates are generated and copied to `/etc/osbuild-composer`.
- osbuild-composer configuration file is created and configured for
TLS client cert authentication.
- osbuild-worker configuration file is created, configured for TLS
authentication and with appropriate cloud provider credentials.
- Services and sockets are started as they used to be originally:
- Both local and remote worker sockets are started.
- Both Weldr and Cloud API sockets are started.
- Only the local worker unit will be started automatically.
- NO repository definitions are copied to
`/etc/osbuild-composer/repositories/`.
We want to be able to safely gather any artifacts without worrying about
any possible secrets leaking. Every artifacts that we want to upload
will now have to be placed in /tmp/artifacts which will then be uploaded
to S3 by the executor and link to the artifacts will be provided in the
logs. Only people with access to our AWS account can see them.
When a test script fails in CI, it's often difficult to pinpoint the
exact line in the log where the script failed and the cleanup() function
(trapped on EXIT) begins.
Adding a prominent line (with greenprint where available) at the start
of the cleanup function will make reading logs of failed jobs a lot
easier.
Most test scripts don't have any documentation regarding it's purpose,
although it can be guessed by the code. There's value in adding this
small comment.
[skip-ci]
With new weldr-client package the metadata tar archive created has
permissions set to 600 instead of 644 which causes permission failures
when interacting with it. Adding sudo to resolve that.
The commonly used 'greenprint' function now adds a date + timestamp to
each message for debugging and tracking the duration of segments of each
scripts.
RHEL 9.0 isn't yet in .gitlab-ci.yml so this actually doesn't change in test
runs but it should make enabling of the tests easier.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
3a8c6c8a introduced a new logic for killing journalctl. Unfortunately, it
doesn't work properly. In ostree tests, multiple journalctls are spawned
but there can be only one trap active at a time. This caused all but the last
journalctls to hang indefinitely. Unfortunately, hanging background processes
is something that causes the GitLab CI to hang indefinitely as well.
This commit modifies the logic a bit: The trap is still set. However, there's
also an explicit kill of journalctl after the compose is finished. After the
process is successfully killed, the trap is removed.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
sudo journalctl -af -n 1 -u "${WORKER_UNIT}" &
WORKER_JOURNAL_PID=$!
In this snippet, WORKER_JOURNAL_PID is set to the PID of the sudo process.
Sudo doesn't propagate any signals - therefore the child process of sudo
(journalctl in this case) isn't killed when a signal is sent to the parent.
Use pkill -P instead which kills all processes where sudo is the parent.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>