6812 commits

Author SHA1 Message Date
Florian Schüller
1f8da3bd83 templates/dashboards/grafana: enable shared crosshair
shared crosshair makes it easier to see the affected time
in other charts/panels
2024-11-06 11:55:22 +01:00
Florian Schüller
3ff8308389 templates/dashboards/grafana: introduce number of pending jobs 2024-11-06 11:55:22 +01:00
Achilleas Koutsou
c9e412f320 test: enable ignore_missing_repos in service configs 2024-11-05 08:21:42 +01:00
Achilleas Koutsou
5d6a4b762b templates: enable ignore_missing_repos in openshift 2024-11-05 08:21:42 +01:00
Achilleas Koutsou
af48971981 osbuild-composer: fail weldr init when repos are nil
If weldr tries to initialise when there are no repositories set and
ignore_missing_repos is enabled, return with an error.
2024-11-05 08:21:42 +01:00
Achilleas Koutsou
51287ea57e osbuild-composer/config: new option: ignore_missing_repos
osbuild/images added an error type that's returned when the reporegistry
loader doesn't find any repository configurations to load [1].  This
lets callers decide whether to stop or continue execution based on
whether repository configurations are required.

A new top-level configuration option is added for osbuild-composer that
makes it possible to start the service without having static rpm
repositories configured.  This is useful in certain (SaaS) modes where
build requests specify their own repository configurations.
2024-11-05 08:21:42 +01:00
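The behaviour described in 51287ea57e can be sketched roughly as follows. This is a minimal illustration, not the actual osbuild-composer code: `noReposError`, `loadRepos` and `initRepos` are hypothetical names, and the real error type lives in osbuild/images.

```go
package main

import (
	"errors"
	"fmt"
)

// noReposError is a hypothetical stand-in for the error type that the
// repository loader returns when it finds no configurations to load.
type noReposError struct{ paths []string }

func (e *noReposError) Error() string {
	return fmt.Sprintf("no repository configurations found in %v", e.paths)
}

// loadRepos is a hypothetical loader: it fails with *noReposError when
// nothing was found in the searched paths.
func loadRepos(found map[string][]string, paths []string) (map[string][]string, error) {
	if len(found) == 0 {
		return nil, &noReposError{paths: paths}
	}
	return found, nil
}

// initRepos lets the caller decide whether missing repositories are fatal.
// With ignoreMissingRepos set, the "nothing found" error is swallowed and
// the service continues without static repositories.
func initRepos(found map[string][]string, paths []string, ignoreMissingRepos bool) (map[string][]string, error) {
	repos, err := loadRepos(found, paths)
	if err != nil {
		var noRepos *noReposError
		if errors.As(err, &noRepos) && ignoreMissingRepos {
			return nil, nil
		}
		return nil, err
	}
	return repos, nil
}

func main() {
	paths := []string{"/etc/example/repositories"} // placeholder path
	_, err := initRepos(nil, paths, false)
	fmt.Println("default:", err)
	_, err = initRepos(nil, paths, true)
	fmt.Println("ignore_missing_repos:", err)
}
```

Any other loader error is still returned unchanged, so only the "no configurations" case is affected by the option.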
Achilleas Koutsou
161a263b45 go.mod: update osbuild/images to v0.95.0 2024-11-05 08:21:42 +01:00
Florian Schüller
69525b7ce6 Makefile: clean golangci-lint cache on make clean
To speed up `make lint` we use a local cache.
It seems that there is no sane way to check and just
update the cache, so we'll just provide an option to wipe it.

Using the cache does reduce the time of `make lint` from
1 minute 15sec to 1 sec on my PC. So for consecutive runs,
the cache still makes sense.
2024-11-04 18:53:21 +01:00
Tomáš Hozza
a7cd521325 Test/data: add test repo configs for RHEL-9.6
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
2024-11-01 14:45:02 +01:00
Sanne Raymaekers
aeba9d5a68 cloud/awscloud: don't specify max spot price
The current spot price could be limiting the available instance pool
significantly. ARM instances specifically are experiencing a lot of
capacity errors.
2024-10-30 15:41:09 +01:00
schutzbot
5ff0e9ac4e Post release version bump
[skip ci]
2024-10-30 08:16:54 +00:00
Sanne Raymaekers
f6e82ba403 templates/packer: fix fedora 40 aarch64 base image
The old one disappeared.
2024-10-29 17:42:10 +01:00
schutzbot
d29f4665ae schutzfile: Update snapshots to 20241015 2024-10-29 09:29:27 +01:00
Sanne Raymaekers
c1b67440c4 cmd/osbuild-service-maintenance: respect dry run
Respect dry run when terminating leftover SIs.
2024-10-28 10:59:25 +01:00
Sanne Raymaekers
0ef0ae7c97 templates/packer: allow setting executor type in worker config
Currently the worker images always have to use aws.ec2, this way we can
use the host executor for fedora.
2024-10-28 10:51:34 +01:00
Sanne Raymaekers
4afcd8c3fd cloud/awscloud: fix another nilpointer in maintenance functions 2024-10-25 17:46:49 +02:00
Sanne Raymaekers
73536b7743 repositories: add fedora 41
This way users could at least build fedora 41. There is currently an
issue in rpmrepo where the fedora 41 branched repositories are very
slow, so enabling CI is currently not possible.

https://github.com/osbuild/rpmrepo/issues/111
2024-10-25 11:34:04 +02:00
Simon de Vlieger
bccd1639af deps: update images to 0.94
Signed-off-by: Simon de Vlieger <supakeen@redhat.com>
2024-10-25 11:23:16 +02:00
Sanne Raymaekers
4f90a757dc cloud/awscloud: fix retrying to create secure instances
Set the correct target capacity specification type, just setting the
spot options to nil doesn't result in an on demand instance.
2024-10-24 20:25:48 +02:00
Sanne Raymaekers
6ccfc7f818 cloud/awscloud: fix nil pointer dereference in maintenance fns
The maintenance pod is crashing when describing the images by tag, most
likely something else is failing.
2024-10-24 12:05:42 +02:00
Lukas Zapletal
350ad58c31 worker: use the new resolver API 2024-10-24 11:53:04 +02:00
Ondřej Budai
eaf90f5aea schutzbot: shorten the slack notification
We are on a quest to reduce clutter on our Slack channels. Thus,
I decided to simplify the daily CI notifications:
  - the link to the edge pipelines got removed, it's now in bookmarks
  - several words were removed to make the message shorter
  - the link to the pipeline is now a hyperlink
  - the whole message should be a one liner
  - less text is now bold

I've also simplified the format in which we send the message. I think
that the block format used before makes redundant line-breaks.
Unfortunately, the mentions need to be done using user IDs instead
of user names. If you ever need to find them, go to the user's profile,
click on the three dots and select "Copy member ID".
2024-10-24 10:45:24 +02:00
Sanne Raymaekers
d5912259a0 cloud/awscloud: rework create fleet retry logic
The current path sometimes launches two instances, which is problematic
because the rest of the secure instance code expects exactly one
instance. A security group could be attached to both instances, and
would block the worker from launching any more SIs, as it tries to
delete the old security group first, which is still held by one of the
surplus SIs which didn't get terminated.

Only retry if:
- the error code is "UnfulfillableCapacity" or "InsufficientInstanceCapacity";
- there wasn't an instance launched anyway.

If either of these checks fails, do not try to launch another one, and
just fail the job.
2024-10-24 10:29:26 +02:00
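The retry condition from d5912259a0 can be written as a small predicate. This is a sketch under assumptions: `shouldRetry` and its parameters are hypothetical, and the real code inspects the CreateFleet response rather than plain strings.

```go
package main

import "fmt"

// retryableCodes holds the two error codes named in the commit message.
var retryableCodes = map[string]bool{
	"UnfulfillableCapacity":        true,
	"InsufficientInstanceCapacity": true,
}

// shouldRetry reports whether another launch attempt is allowed.
// Both checks must pass: no instance was launched, and at least one of
// the returned error codes is a capacity error.
func shouldRetry(errorCodes []string, launchedInstances int) bool {
	if launchedInstances > 0 {
		// An instance exists already; launching another could leave a
		// surplus instance holding the security group.
		return false
	}
	for _, code := range errorCodes {
		if retryableCodes[code] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldRetry([]string{"UnfulfillableCapacity"}, 0))
	fmt.Println(shouldRetry([]string{"UnfulfillableCapacity"}, 1))
	fmt.Println(shouldRetry([]string{"SomeOtherError"}, 0))
}
```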
Sanne Raymaekers
661f39cbb9 cmd/osbuild-service-maintenance: add test for filtering SIs 2024-10-23 10:32:57 +02:00
Sanne Raymaekers
04a5ca6965 cmd/osbuild-service-maintenance: clean up secure instances
Now and then there are leftover secure instances, probably from worker
instances that get terminated during builds, which is possible in ASGs.
2 hours as a cutoff should be enough: the build times out after 60
minutes and fetching the output archive after 30 minutes, so that
leaves 30 minutes for booting and connection.
2024-10-23 10:32:57 +02:00
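The two-hour cutoff in 04a5ca6965 adds up as 60 + 30 + 30 minutes. A minimal sketch of such a filter follows, assuming a hypothetical `instance` type and that "older than the cutoff" means strictly greater. The real maintenance code works on AWS instance descriptions.

```go
package main

import (
	"fmt"
	"time"
)

// cutoff is the two-hour limit from the commit message:
// 60 min build timeout + 30 min archive fetch + 30 min boot and connection.
const cutoff = 60*time.Minute + 30*time.Minute + 30*time.Minute

// instance is a hypothetical, minimal view of a secure instance.
type instance struct {
	ID         string
	LaunchTime time.Time
}

// isLeftover reports whether an instance of the given age is past the cutoff.
func isLeftover(age time.Duration) bool {
	return age > cutoff
}

// leftoverIDs returns the IDs of all instances older than the cutoff at time now.
func leftoverIDs(instances []instance, now time.Time) []string {
	var ids []string
	for _, inst := range instances {
		if isLeftover(now.Sub(inst.LaunchTime)) {
			ids = append(ids, inst.ID)
		}
	}
	return ids
}

func main() {
	now := time.Now()
	instances := []instance{
		{ID: "i-old", LaunchTime: now.Add(-3 * time.Hour)},
		{ID: "i-new", LaunchTime: now.Add(-20 * time.Minute)},
	}
	fmt.Println(leftoverIDs(instances, now))
}
```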
Sanne Raymaekers
1c7a276d6f cloud/aws: add maintenance functions for secure instance cleanup 2024-10-23 10:32:57 +02:00
Sanne Raymaekers
8fc91d1c6d cloud/aws: move maintenance calls to separate file 2024-10-23 10:32:57 +02:00
Achilleas Koutsou
66c2c31a1c blueprint: add kickstart contents to conversion test
The option was added in f5c6cdd9cf but a
value was never added to the conversion test.
2024-10-22 22:08:39 +02:00
Achilleas Koutsou
654a6ad8f5 blueprint: enable the anaconda modules customization
This has been available since v0.74.0 of osbuild/images but was never
connected to the frontend blueprint.

See https://github.com/osbuild/images/pull/799
2024-10-22 22:08:39 +02:00
Tom Koscielniak
fb7a2aab96 Disable Packer job in scheduled GA pipelines 2024-10-21 14:43:18 +02:00
Ondřej Budai
1b169a150c packer: don't deregister old AMIs
Imagine this scenario: the packer job is run, an AMI gets created.
We configure our deployment to use this AMI. Then, someone retries the
packer job. Since we have force_deregister=true, this will not only
create a new AMI, but also remove the old one (because it has the same
name). Thus, our deployment will get broken, because the source AMI
no longer exists. This means that the ASG cannot replace any broken
instances, and the secure instance feature gets absolutely broken
because it cannot spawn new secure instances (they "inherit" the AMI
ID from their parents).

Let's remove force_deregister=true, so the AMI never gets replaced.
This might cause some pipelines to start failing because they are
rerunning the packer job for the same commit (the GA pipeline currently).
Let's fix those then, rerunning the packer job is just confusing.

If this causes some unexpected issues, we can always resort to using
unique AMI names (by appending a timestamp to their name), but having
multiple AMIs with different names, but the same tags will cause our
terraform configuration to be reapplied every time there's a rerun,
which is also not great.
2024-10-21 11:48:02 +02:00
schutzbot
5eedccfc1a Post release version bump
[skip ci]
2024-10-16 08:15:51 +00:00
Sanne Raymaekers
5eb8227bf3 cloud/awscloud: retry CreateFleet regardless of the error code
The errors returned by create fleet are not entirely clear. It seems it
also returns `InsufficientInstanceCapacity` in addition to
`UnfulfillableCapacity`. Let's just retry three times regardless of the
create fleet error, that way there's no need to chase error codes which
aren't clearly defined.
2024-10-15 16:04:19 +02:00
Sanne Raymaekers
73968236bd repositories: add rhel-9.6 2024-10-14 09:23:19 +02:00
Mario Cattamo
425583c1fd test: disable ostree-remount service checking since /sysroot is ro and /var rw already 2024-10-11 16:31:41 +02:00
Sanne Raymaekers
905df418aa cloud/aws: add a third secure instance fallback across AZs
In case the on demand option failed as well, retry one more time across
availability zones. This significantly increases the pool of available
instances, but increases network related costs, as transferring data
between AZs is not free.
2024-10-07 15:56:07 +02:00
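The fallback chain from 905df418aa can be pictured as an ordered list of launch strategies tried one after another. This sketch is illustrative only: the `attempt` type and the strategy names are hypothetical, and treating the first attempt as a spot request is an assumption drawn from the neighbouring commits.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoCapacity is a placeholder error used by the demo strategies.
var errNoCapacity = errors.New("no capacity")

// attempt pairs a strategy name with a hypothetical launch function that
// returns an instance ID on success.
type attempt struct {
	Name   string
	Launch func() (string, error)
}

// launchWithFallbacks tries each strategy in order and stops at the first
// success. If all fail, the individual errors are joined and returned.
func launchWithFallbacks(attempts []attempt) (string, string, error) {
	if len(attempts) == 0 {
		return "", "", errors.New("no launch strategies given")
	}
	var errs []error
	for _, a := range attempts {
		id, err := a.Launch()
		if err == nil {
			return id, a.Name, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", a.Name, err))
	}
	return "", "", errors.Join(errs...)
}

func main() {
	fail := func() (string, error) { return "", errNoCapacity }
	ok := func() (string, error) { return "i-0123", nil }

	id, used, err := launchWithFallbacks([]attempt{
		{Name: "spot", Launch: fail},
		{Name: "on-demand", Launch: fail},
		{Name: "on-demand across AZs", Launch: ok},
	})
	fmt.Println(id, used, err)
}
```

Placing the cross-AZ strategy last reflects the trade-off in the commit message: it widens the instance pool but adds inter-AZ transfer costs, so it is only worth trying once the cheaper options have failed.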
Jakub Rusz
78d3b2fde5 tests/filesystem: increase /usr size
The test started failing on 8.10 GA. It seems that something changed in
the system repos and the size we had originally set was not enough.
2024-10-07 15:02:42 +03:00
Jakub Rusz
a54ac303a3 templates: fix apiVersion
There were errors using the latest oc 4.17 version:

error: failed to read input object (not a Template?): unable to decode
"templates/openshift/composer.yml": no kind "Template" is registered for
version "v1" in scheme "k8s.io/kubectl/pkg/scheme/scheme.go:28"
2024-10-03 16:27:21 +02:00
Jakub Rusz
07a18a5d49 tests/regression: Add config for v3 certificates
When generating x509 v3 certs we need to explicitly set "CA:TRUE"
otherwise they're not trusted to be used. Also start running the tests
on RHEL-9.5 and RHEL-10.0
2024-10-03 16:27:21 +02:00
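For comparison, this is how the same "CA:TRUE" basic constraint is expressed with Go's standard library. It is only an illustration of the constraint mentioned in 07a18a5d49. The regression test itself presumably configures its certificate tooling differently, and `newSelfSignedCA` is a hypothetical helper.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedCA creates a throwaway self-signed certificate whose
// basic constraints extension marks it as a CA.
func newSelfSignedCA(commonName string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
		// These two fields together emit the basic constraints
		// extension with CA:TRUE.
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	// Template and parent are the same, so the certificate is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newSelfSignedCA("example test CA")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("version:", cert.Version, "CA:", cert.IsCA)
}
```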
schutzbot
b9d6dd342d Post release version bump
[skip ci]
2024-10-02 08:15:56 +00:00
Jakub Rusz
763cc2ffb0 CI: integration test rules fixup
Just making it more clear and scheduling aws.sh on RHEL-10 and GA
runners.
2024-09-30 07:43:42 +02:00
Lukas Zapletal
65d5f48847 cloud: fixed typo UnfulfillableCapacity 2024-09-26 18:09:45 +02:00
schutzbot
b2548f5b1a schutzfile: Update snapshots to 20240924 2024-09-25 12:41:52 +02:00
Jakub Rusz
d0ac2f1a37 tests/CI: enable oci api test on rhel-10 2024-09-25 08:30:45 +02:00
Jakub Rusz
ec4aff7e58 test/cases: Use openscap customization on RHEL-10 2024-09-25 08:30:45 +02:00
Sanne Raymaekers
8cf9a542ab Revert "repositories: add fedora-41"
This reverts commit 9c68a82d2e.
2024-09-24 14:46:58 +02:00
Sanne Raymaekers
84d916dd96 Revert ".gitlab-ci.yml: add fedora-41"
This reverts commit 75cd8ee780.
2024-09-24 14:46:58 +02:00
Sanne Raymaekers
2bdeede4b8 Revert "schutzbot/terraform: new fedora-41 runners"
This reverts commit 8485481c90.
2024-09-24 14:46:58 +02:00
Sanne Raymaekers
4cf488376d Revert "Schutzfile: add fedora-41"
This reverts commit 8ef5bac4b9.
2024-09-24 14:46:58 +02:00
Sanne Raymaekers
3f636467ff Revert "test/data: add fedora-41"
This reverts commit 9782abe184.
2024-09-24 14:46:58 +02:00