Fixes the special case where, if no worker is available, we generate an
internal timeout and cancel the depsolve, including all follow-up jobs,
without propagating any error.
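As a rough illustration of the intended fix, here is a minimal sketch of recording an explicit error on the cancelled depsolve result so that follow-up jobs can propagate it; all names here are hypothetical, not the actual jobqueue API:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified result type; the real depsolve job result differs.
type depsolveResult struct {
	Err error
}

var errNoWorkerTimeout = errors.New("depsolve cancelled: no worker available before the internal timeout")

// cancelDepsolve stores an explicit error in the result instead of cancelling
// silently, so that follow-up jobs (and ultimately the API client) see it.
func cancelDepsolve(res *depsolveResult) {
	res.Err = errNoWorkerTimeout
}

func main() {
	var res depsolveResult
	cancelDepsolve(&res)
	if res.Err != nil {
		// A dependent job would fail with the same error instead of
		// finishing with an empty result.
		fmt.Println("propagating:", res.Err)
	}
}
```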
When requeuing a job, the next worker requesting the job would decrement
the pending counter, but the counter only ever got incremented once, when
the job was first enqueued. Thus, make sure to increment the pending
counter when a job is requeued.
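A minimal, self-contained sketch of the counting behaviour, using a toy queue rather than the real jobqueue implementation:

```go
package main

import "fmt"

// queue is a toy stand-in for the job queue, tracking how many jobs are
// waiting to be picked up by a worker.
type queue struct {
	pending int
}

// Enqueue adds a brand-new job: pending goes up by one.
func (q *queue) Enqueue() { q.pending++ }

// Dequeue hands a job to a worker: pending goes down by one.
func (q *queue) Dequeue() { q.pending-- }

// Requeue puts a job back after a worker gave it up. Without the increment
// here, the next Dequeue would push the counter below its true value.
func (q *queue) Requeue() { q.pending++ }

func main() {
	var q queue
	q.Enqueue() // pending == 1
	q.Dequeue() // worker takes it, pending == 0
	q.Requeue() // worker gives it up, job goes back, pending == 1
	q.Dequeue() // next worker takes it, pending == 0 (not -1)
	fmt.Println("pending:", q.pending)
}
```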
osbuild/images added an error type that's returned when the reporegistry
loader doesn't find any repository configurations to load [1]. This
lets callers decide whether to stop or continue execution based on
whether repository configurations are required.
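A sketch of the pattern a caller can use, with a local stand-in for the new error type since its exact name in osbuild/images may differ:

```go
package main

import (
	"errors"
	"fmt"
)

// noReposLoadedError is a local stand-in for the error type added in
// osbuild/images' reporegistry package; the real name and fields may differ.
type noReposLoadedError struct{ paths []string }

func (e *noReposLoadedError) Error() string {
	return fmt.Sprintf("no repository configurations found in %v", e.paths)
}

// loadRepositories is a hypothetical loader that fails with the typed error
// when none of the given paths contain repository configurations.
func loadRepositories(paths []string) error {
	return &noReposLoadedError{paths: paths}
}

func main() {
	err := loadRepositories([]string{"/usr/share/osbuild-composer/repositories"})

	var noRepos *noReposLoadedError
	if errors.As(err, &noRepos) {
		// The caller decides: a service that requires static repositories
		// would abort here, one that doesn't can log and continue.
		fmt.Println("continuing without static repositories:", err)
		return
	}
	if err != nil {
		panic(err) // any other loader failure is still fatal
	}
}
```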
A new top-level configuration option is added for osbuild-composer that
makes it possible to start the service without having static rpm
repositories configured. This is useful in certain (SaaS) modes where
build requests specify their own repository configurations.
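A sketch of how such an option could gate startup, continuing the stand-in types from the previous sketch; the option name `ignore_missing_repos` is an assumption, not necessarily the real configuration key:

```go
// Continuing the sketch above (noReposLoadedError and the errors import).
// The option name and struct tag are assumptions, not the real composer config.
type composerConfig struct {
	// IgnoreMissingRepos lets the service start even when no static rpm
	// repository definitions are found on disk, e.g. in SaaS deployments
	// where build requests bring their own repository configurations.
	IgnoreMissingRepos bool `toml:"ignore_missing_repos"`
}

// repoLoadFatal decides whether a repository-loading error should abort startup.
func repoLoadFatal(cfg composerConfig, err error) bool {
	var noRepos *noReposLoadedError
	if errors.As(err, &noRepos) {
		return !cfg.IgnoreMissingRepos
	}
	return err != nil
}
```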
To speed up `make lint`, we use a local cache.
There seems to be no sane way to check the cache and just
update it, so we provide an option to wipe it instead.
Using the cache reduces the time of `make lint` from
1 minute 15 seconds to 1 second on my PC, so for consecutive runs
the cache still makes sense.
The current spot price could be limiting the available instance pool
significantly. ARM instances specifically are experiencing a lot of
capacity errors.
This way users can at least build Fedora 41. There is currently an
issue in rpmrepo where the Fedora 41 branched repositories are very
slow, so enabling CI is not possible yet.
https://github.com/osbuild/rpmrepo/issues/111
We are on a quest to reduce clutter on our Slack channels. Thus,
I decided to simplify the daily CI notifications:
- the link to the edge pipelines was removed; it's now in the bookmarks
- several words were removed to make the message shorter
- the link to the pipeline is now a hyperlink
- the whole message should be a one-liner
- less text is now bold
I've also simplified the format in which we send the message. I think
the block format used before introduces redundant line breaks.
Unfortunately, the mentions need to be done using user IDs instead
of user names. If you ever need to find them, go to the user's profile,
click on the three dots and select "Copy member ID".
The current code path sometimes launches two instances, which is
problematic because the rest of the secure instance code expects exactly
one instance. A security group could end up attached to both instances
and would then block the worker from launching any more SIs: the worker
tries to delete the old security group first, but it is still held by a
surplus SI that didn't get terminated.
Only retry if:
- the error code is "UnfulfillableCapacity" or "InsufficientInstanceCapacity";
- no instance was launched anyway.
If either of these checks fails, do not try to launch another instance;
just fail the job (see the sketch below).
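A minimal sketch of that decision, assuming the aws-sdk-go-v2 convention of exposing API error codes through `smithy.APIError`; the function name and parameters are illustrative, not the actual worker code:

```go
package sketch

import (
	"errors"

	"github.com/aws/smithy-go"
)

// retryableLaunchError reports whether a failed launch should be retried:
// only on the two capacity-related error codes, and only when no instance
// was actually launched. Everything else should fail the job.
func retryableLaunchError(err error, instancesLaunched int) bool {
	if instancesLaunched > 0 {
		// An instance came up anyway; launching another would give us two.
		return false
	}
	var apiErr smithy.APIError
	if !errors.As(err, &apiErr) {
		return false
	}
	switch apiErr.ErrorCode() {
	case "UnfulfillableCapacity", "InsufficientInstanceCapacity":
		return true
	default:
		return false
	}
}
```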
Now and then there are leftover secure instances, probably left behind
when worker instances get terminated during builds, which is possible in
ASGs. A 2-hour cutoff should be enough: the build times out after 60
minutes and fetching the output archive after 30 minutes, so that
leaves 30 minutes for booting and connecting.
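A rough sketch of the age check the cleanup could use; the constant and function names are hypothetical:

```go
package sketch

import "time"

// Assumed timeline from above:
//   60 min build timeout + 30 min output-archive fetch timeout
//   + ~30 min slack for booting and connecting ≈ 2 h
const secureInstanceMaxAge = 2 * time.Hour

// isLeftover reports whether a secure instance launched at launchTime should
// be treated as orphaned (e.g. its parent worker was terminated by the ASG)
// and terminated by the cleanup job.
func isLeftover(launchTime, now time.Time) bool {
	return now.Sub(launchTime) > secureInstanceMaxAge
}
```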
Imagine this scenario: the packer job is run, an AMI gets created.
We configure our deployment to use this AMI. Then, someone retries the
packer job. Since we have force_deregister=true, this will not only
create a new AMI, but also remove the old one (because it has the same
name). Thus, our deployment will break, because the source AMI
no longer exists. This means that the ASG cannot replace any broken
instances, and the secure instance feature breaks completely
because it cannot spawn new secure instances (they "inherit" the AMI
ID from their parents).
Let's remove force_deregister=true, so the AMI never gets replaced.
This might cause some pipelines to start failing because they rerun the
packer job for the same commit (currently the GA pipeline).
Let's fix those instead; rerunning the packer job is just confusing.
If this causes some unexpected issues, we can always resort to using
unique AMI names (by appending a timestamp to their name), but having
multiple AMIs with different names but the same tags would cause our
Terraform configuration to be reapplied every time there's a rerun,
which is also not great.