When the worker executor starts up, many error messages and warnings are
shown in the system logs; worker-initialization.service should not run
there at all. The service crashes, which is functionally harmless, but
it clutters the log, raises questions and can be avoided by simply not
running it.
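One way to do that (a sketch; where exactly to disable the unit depends
on how the executor images are built):

```
# keep the initialization service from ever starting on executor images
systemctl disable worker-initialization.service
```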
Update the RHEL-9 Cloud Access images used for our workers from 9.0 to
9.6, which is the latest GA. We do upgrade all packages in our Ansible
playbook, but that is just a waste of resources if we can use the latest
GA images.
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Update Fedora workers from EOL F40 to F42.
Remove workarounds that should no longer be needed (e.g. the Packer
upstream issue has been closed).
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
The `cloud-init.target` in 9.6 has `After=multi-user.target` in its unit
config. The worker initialization service was set to run before
`multi-user.target`, but after `cloud-final.service`. This created an
impossible ordering cycle, and systemd reacted by simply disabling the
initialization service.
So this changes
`multi-user.target -> worker-*.service -> cloud-final.service -> multi-user.target`
to
`cloud-init.target -> worker-*.service -> cloud-final.service -> multi-user.target`,
thus resolving the loop.
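In unit-file terms, the corrected ordering of the worker initialization
service roughly amounts to (a sketch inferred from the chain above):

```
[Unit]
After=cloud-final.service
Before=cloud-init.target
```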
Imagine this scenario: the packer job is run and an AMI gets created.
We configure our deployment to use this AMI. Then someone retries the
packer job. Since we have force_deregister=true, this will not only
create a new AMI, but also remove the old one (because it has the same
name). Thus, our deployment breaks, because the source AMI no longer
exists. The ASG cannot replace any broken instances, and the secure
instance feature breaks completely because it cannot spawn new secure
instances (they "inherit" the AMI ID from their parents).
Let's remove force_deregister=true, so the AMI never gets replaced.
This might cause some pipelines to start failing because they rerun the
packer job for the same commit (currently the GA pipeline). Let's fix
those then; rerunning the packer job is just confusing.
If this causes some unexpected issues, we can always resort to unique
AMI names (by appending a timestamp to the name), but having multiple
AMIs with different names yet the same tags would cause our terraform
configuration to be reapplied every time there's a rerun, which is also
not great.
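For reference, that timestamp fallback would look roughly like this in
Packer HCL (source name and prefix illustrative, all other required
arguments omitted):

```
locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "worker" {
  # force_deregister removed; a unique name means no AMI ever gets replaced
  ami_name = "worker-${local.timestamp}"
}
```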
Ansible on Fedora 40 seems broken: the default Python 3.12 interpreter
doesn't work, 3.10 works, but then the dnf module breaks.
Use 3.10 and stop using the dnf module.
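A sketch of the two changes (where the interpreter gets pinned depends
on the inventory layout; the `packages` variable is illustrative):

```
# inventory/group_vars: pin the working interpreter
ansible_python_interpreter: /usr/bin/python3.10
```

```
# task: install via the dnf CLI instead of the broken dnf module
- name: Install worker packages
  ansible.builtin.command: dnf install -y {{ packages | join(' ') }}
```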
Fedora 38 is EOL, and packit no longer builds RPMs for it.
The current python3.12 + ansible 2.12 combination, which is the default
on Fedora 40, doesn't work, so switch to python3.9.
With the rpmcopy or rpmrepo_osbuild tags, the `Install worker rpm` stage
got skipped on RHEL and in CI. Invert the tag logic and use `--tags`
instead of `--skip-tags`.
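Roughly (playbook name illustrative):

```
# select the stages to run explicitly instead of skipping their inverse
ansible-playbook playbook.yml --tags rpmcopy
```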
This reverts commit 484c82ce55.
The AWS SDK fails to get the instance identity document when the proxy
is configured, since the document is served by the link-local instance
metadata service, which is not reachable through an external proxy. The
proxy will need to be configured explicitly for the depsolve job and the
osbuild (sources) job.
If /tmp/cloud_init_vars contained the OSBUILD_EXECUTOR_CLOUDWATCH_GROUP
variable, the worker configuration file would contain a line with an
escaped newline character at the end of the value configuring
`cloudwatch_group` for the `osbuild_executor`. This makes the worker
fail to start when loading the configuration.
Remove the newline from the value appended to the worker config by the
initialization script.
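A sketch of the fix in the initialization script:

```
# drop any newline before the value is written into the worker config
OSBUILD_EXECUTOR_CLOUDWATCH_GROUP=$(printf '%s' "$OSBUILD_EXECUTOR_CLOUDWATCH_GROUP" | tr -d '\n')
```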
Fix #4001
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Set the 'cloudwatch_group' value in the worker configuration if provided
in /tmp/cloud_init_vars, so that it is used by the worker when spinning
up an osbuild-executor instance.
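A sketch of what the initialization script appends (the config path is
assumed to be the worker's default):

```
if [ -n "${OSBUILD_EXECUTOR_CLOUDWATCH_GROUP:-}" ]; then
    cat >> /etc/osbuild-worker/osbuild-worker.toml <<EOF
[osbuild_executor]
cloudwatch_group = "$OSBUILD_EXECUTOR_CLOUDWATCH_GROUP"
EOF
fi
```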
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
The /tmp/cloud_init_vars file is not created on the worker executor, so
sourcing it makes the script fail. Comment the line out until we change
the worker implementation to inject this file into the worker executor
using cloud-init.
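Until then, a guarded source would be an alternative sketch:

```
# only source the file where cloud-init actually created it
[ -f /tmp/cloud_init_vars ] && . /tmp/cloud_init_vars
```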
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
We need to use a custom IAM policy name for the osbuild-executor on
Fedora workers (in prod vs. stage), and we have the same requirement for
the CloudWatch log group used by the osbuild-executor.
Modify the Ansible playbook used by Packer to use the values from
/tmp/cloud_init_vars if set, defaulting to the current values if not
set.
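In the sourced script this boils down to shell parameter defaults,
roughly (the IAM variable name and both default values are illustrative):

```
WORKER_EXECUTOR_IAM_POLICY="${WORKER_EXECUTOR_IAM_POLICY:-osbuild-executor}"
OSBUILD_EXECUTOR_CLOUDWATCH_GROUP="${OSBUILD_EXECUTOR_CLOUDWATCH_GROUP:-osbuild-executor}"
```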
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
The builder uses `/run/osbuild` as the default path for this argument,
yet this directory doesn't exist when the builder writes the manifest,
and osbuild should own this directory, not the builder.
Furthermore, `/run` is a tmpfs, so the executor might run into memory
issues if we use `/run` as the store and output directory (on the "host"
workers these live under `/var/cache`).
While `/tmp` might seem like a good candidate on RHEL, it's a tmpfs on
Fedora, so it's also to be avoided.
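A disk-backed location mirroring the host workers would look roughly
like this (paths illustrative):

```
# keep store and output off tmpfs so large artifacts don't eat memory
mkdir -p /var/cache/osbuild-worker/store /var/cache/osbuild-worker/output
```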
Don't allow unbound variables, but for the variables that gate whether a
part of the setup should run, default them to empty/undefined.
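A sketch of the pattern (variable name hypothetical):

```
set -u  # any other unbound variable is still an error

# gating variables default to empty so their checks keep working
WORKER_OPTIONAL_SETUP="${WORKER_OPTIONAL_SETUP:-}"
if [ -n "$WORKER_OPTIONAL_SETUP" ]; then
    echo "running optional setup"
fi
```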